More IDNA roundtrippability issues #760

Closed
TimothyGu opened this issue Mar 9, 2023 · 4 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. topic: idna

Comments

@TimothyGu
Member

TimothyGu commented Mar 9, 2023

Here are a few more issues (from @valenting in #603 (comment)). We need to sort out whose fault this is: the spec or the whatwg-url implementation. (I've also included a few additional examples to defeat the ASCII-only fast path in Chrome.)

| input | whatwg-url | Chrome | WebKit | Live URL Viewer |
|---|---|---|---|---|
| http://a.xn--xn-----/ | http://a.xn----/ | http://a.xn--xn-----/ | error | link |
| http://é.xn--xn-----/ | http://xn--9ca.xn----/ | error | error | link |
| http://a.xn----/ | http://a.-/ | http://a.xn----/ | error | link |
| http://é.xn----/ | http://xn--9ca.-/ | error | error | link |
| http://a.xn--/ | http://a./ | http://a.xn----/ | error | link |
| http://é.xn--/ | http://xn--9ca./ | error | error | link |

Without digging too deep, it seems that Punycode-decoding each of these labels results in an all-ASCII label that should never have been Punycode-encoded in the first place. However, RFC 3492 says the following:

Using hyphen-minus as the delimiter implies that the encoded string can end with a hyphen-minus only if the Unicode string consists entirely of basic code points, but IDNA forbids such strings from being encoded.

I'm not yet sure where in IDNA this requirement is set, but if Unicode IDNA included this requirement then that'd probably solve this issue.
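To make the observation about all-ASCII results concrete, here is a minimal Python sketch that strips the ACE prefix and Punycode-decodes the three problematic labels from the table. It relies on Python's built-in `punycode` codec and does no validation at all; the helper name `decode_ace_label` is made up for this illustration and is not part of whatwg-url or any spec.

```python
ACE_PREFIX = "xn--"


def decode_ace_label(label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the rest, with no validation.

    Hypothetical helper for illustration only.
    """
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError(f"not an ACE label: {label!r}")
    payload = label[len(ACE_PREFIX):]
    # Python's stdlib "punycode" codec implements the RFC 3492 decoder.
    return payload.encode("ascii").decode("punycode")


if __name__ == "__main__":
    for label in ("xn--xn-----", "xn----", "xn--"):
        decoded = decode_ace_label(label)
        print(f"{label!r} -> {decoded!r} (all ASCII: {decoded.isascii()})")
```

Each decoded label should come out all-ASCII and line up with the last label in the whatwg-url column of the table (`xn----`, `-`, and the empty string).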


Update: Indeed, IDNA2003's ToUnicode (https://www.rfc-editor.org/rfc/rfc3490#section-4.2) includes:

  3. Verify that the sequence begins with the ACE prefix, and save a copy of the sequence.

  4. Remove the ACE prefix.

  5. Decode the sequence using the decoding algorithm in [PUNYCODE] and fail if there is an error. Save a copy of the result of this step.

  6. Apply ToASCII.

  7. Verify that the result of step 6 matches the saved copy from step 3, using a case-insensitive ASCII comparison.

Basically, it includes a roundtrippability test as part of the ToUnicode algorithm. This test is absent from UTS 46's ToUnicode and processing steps.
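The roundtrip test in the quoted ToUnicode steps can be sketched roughly as follows. This is a simplification under stated assumptions, not the RFC 3490 algorithm in full: `simplified_to_ascii` is a hypothetical stand-in for ToASCII that skips Nameprep, the STD3 rules, and the length checks, and `passes_roundtrip` is likewise a name invented here.

```python
ACE_PREFIX = "xn--"


def simplified_to_ascii(label: str) -> str:
    """Hypothetical stand-in for ToASCII: no Nameprep, STD3 or length checks."""
    if label.isascii():
        # All-ASCII labels are returned unchanged, never Punycode-encoded.
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def passes_roundtrip(ace_label: str) -> bool:
    """Return True if decoding and re-encoding reproduces the input label."""
    # Verify the ACE prefix and keep a copy of the input.
    if not ace_label.lower().startswith(ACE_PREFIX):
        raise ValueError(f"not an ACE label: {ace_label!r}")
    saved = ace_label
    # Remove the prefix and Punycode-decode; a decoding error is a failure.
    try:
        decoded = ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # Re-apply (simplified) ToASCII and compare ASCII case-insensitively.
    return simplified_to_ascii(decoded).lower() == saved.lower()
```

Under these assumptions, `passes_roundtrip("xn--9ca")` holds, while `xn--xn-----`, `xn----`, and `xn--` all fail because their decoded forms are all-ASCII and so would never be re-encoded with the ACE prefix.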


Update 2: IDNA2008's Domain Name Lookup Protocol (https://www.rfc-editor.org/rfc/rfc5891.html#section-5) has the same roundtrippability test. Section 5.3 has:

If the input to this procedure appears to be an A-label (i.e., it starts in "xn--", interpreted case-insensitively), the lookup application MAY attempt to convert it to a U-label … If the label is converted to Unicode (i.e., to U-label form) using the Punycode decoding algorithm, then the processing specified in [the following] two sections MUST be performed, and the label MUST be rejected if the resulting label is not identical to the original.

The following two sections would basically validate the U-label, and then convert the U-label back into an A-label using Punycode. So this test is essentially equivalent to the IDNA2003 version.

@rmisev
Member

rmisev commented Mar 9, 2023

AFAIK WebKit uses the ICU library for IDNA. In ICU a check is added to report failure on xn-- and xn--ASCII- labels after the "If the label starts with “xn--”" step. This check hasn't been added to UTS 46 standard yet. More info here:

This explains why these tests return an error in WebKit but succeed in whatwg-url.
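For illustration, a check along the lines described above (fail on `xn--` and `xn--ASCII-` labels) might look like the sketch below. This is not ICU's actual code; the function name and the purely syntactic test are assumptions, based on the RFC 3492 remark that an encoded string ends with a hyphen-minus only when the decoded string consists entirely of basic code points.

```python
ACE_PREFIX = "xn--"


def is_suspect_ace_label(label: str) -> bool:
    """Flag ACE labels whose Punycode part can only decode to ASCII.

    Hypothetical check: the part after "xn--" is empty, or it ends with the
    delimiter "-", meaning there are no encoded non-ASCII code points.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return False  # not an ACE label, nothing to check here
    payload = label[len(ACE_PREFIX):]
    return payload == "" or payload.endswith("-")
```

With this check, `xn--`, `xn----`, and `xn--xn-----` are flagged, while `xn--9ca` is not.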

@annevk
Member

annevk commented Mar 10, 2023

Interesting, per comments on the second issue @markusicu already submitted feedback for this, but it apparently hasn't been processed yet? @macchiati do you happen to know if that feedback is still pending or did it get lost?

@markusicu

Sorry, my fault. It's approved but I am behind on UTC action items.

[165-A48] Action Item for Markus Scherer, Editorial Committee: Update UTS #46 to validate ACE label edge cases, see L2/20-240 item F7. For Unicode 14.

There are a couple of others relevant for UTS46... :-/

@xfq xfq added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Dec 8, 2023
rmisev added a commit to upa-url/idna that referenced this issue Jul 21, 2024
@annevk
Member

annevk commented Nov 29, 2024

Let's close this in favor of web-platform-tests/wpt#48301 and #836 that will end up resolving this.

@annevk annevk closed this as completed Nov 29, 2024
5 participants