Should we forbid U+226E (≮) and U+226F (≯) in hosts? #733
Comments
On my computer |
Fundamentally, I'm not sure why the decomposition of these characters is even relevant: UTS46 normalises them to a composed form and Punycodes that, so none of these characters should ever result in naked ASCII. So I see no technical reason why these characters should be disallowed. And I see no non-technical reason why we should disallow characters such as
|
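To make the normalisation point above concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It illustrates the NFD/NFC round trip for the three code points; it is not UTS46 processing, and the helper name `nfc_roundtrip` is made up for this example.

```python
import unicodedata

# The three code points discussed in this thread, mapped to their ASCII bases.
CODE_POINTS = {"\u2260": "=", "\u226e": "<", "\u226f": ">"}


def nfc_roundtrip(ch: str) -> tuple:
    """Return (NFD form, NFC form of that NFD form) for a single character."""
    decomposed = unicodedata.normalize("NFD", ch)
    recomposed = unicodedata.normalize("NFC", decomposed)
    return decomposed, recomposed


for ch, base in CODE_POINTS.items():
    decomposed, recomposed = nfc_roundtrip(ch)
    # NFD yields the ASCII base followed by U+0338 COMBINING LONG SOLIDUS OVERLAY;
    # NFC composes that pair back into the single original code point.
    print(f"U+{ord(ch):04X}",
          [f"U+{ord(c):04X}" for c in decomposed],
          recomposed == ch)
```

Since the output of UTS46 processing has to be in NFC, the ASCII base character only shows up in the intermediate decomposed form.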
Thanks! I suppose this is another issue where it would be great to get input from @markusicu @macchiati. |
These are good points. Markus, see any good reason to disallow, given that the result has to be NFC? |
I am not vested in these three characters, or possible future ones with this behavior. Clearly the UTS46 rule is based on their Decomposition_Mapping, but UTS46 does use NFC compositions, and there are no compositions with other combining marks that could block these. Who decides on these things? Consensus of browser makers? For a formal request to change this, please use https://www.unicode.org/reporting.html --> UTC / Report Error in Publication/Data |
Thanks, I'll file feedback as well as for #543 in time for Unicode's April meeting. In my experience of trying to make IDNA interoperable over the past decade, browsers have not been super opinionated on ToASCII. (Now ToUnicode is another matter, but that algorithm isn't directly exposed.) As long as we err on the side of compatibility, i.e., making hosts resolve, I think it should work out. And apparently the IETF hasn't been opinionated enough either since, according to a comment in that other issue, they gave up on standardizing the details of client behavior with IDNA2008. So I'm very thankful we have UTS46. |
Tentative feedback (not submitted yet):
|
Sounds reasonable to me; what do you think, Markus?
On Mon, Jan 16, 2023 at 5:46 AM Anne van Kesteren wrote:
Tentative feedback (not submitted yet):
Please change U+2260 (≠), U+226E (≮), and U+226F (≯) from
disallowed_STD3_valid to valid.
These code points are not decomposed so they can never conflict with =, <,
and >. And they are not inherently more confusing than any of the other
allowed code points, which include hieroglyphics and emoji. These code
points also work as-is in all browser engines (while < and > are
forbidden) and on balance preference ought to be given to retaining
compatibility so end users are not prevented from visiting websites or
seeing subresources that might use these code points in their domain for
one reason or another.
For further background and discussion please see
#733.
Thank you!
|
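As a rough illustration of the claim in the feedback that these code points can never conflict with =, <, and >, the sketch below NFC-normalises a label and Punycode-encodes it with Python's built-in `punycode` codec. It is only an approximation: it skips the UTS46 mapping and validity steps entirely, and both the function name `label_to_ascii_sketch` and the sample label are hypothetical.

```python
import unicodedata


def label_to_ascii_sketch(label: str) -> str:
    """NFC-normalise a label, then Punycode-encode it with the ACE prefix.

    Illustrative only: the UTS46 mapping and validity checks are omitted.
    """
    normalized = unicodedata.normalize("NFC", label)
    if normalized.isascii():
        return normalized
    return "xn--" + normalized.encode("punycode").decode("ascii")


# Hypothetical label written in decomposed form: "a", "<", U+0338, "b".
encoded = label_to_ascii_sketch("a<\u0338b")
print(encoded)
# The "<" is absorbed into U+226E by NFC, so it does not survive as ASCII.
print("<" in encoded)
```

In this sketch, a bare `<` could only appear in the result if the input already contained one that was not followed by U+0338, which is the case the forbidden host code point check handles separately.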
tentative feedback lgtm |
This has already been fixed in UTS 46 15.1.0, see https://www.unicode.org/reports/tr46/tr46-31.html#Modifications |
I guess we were already testing this? If so, agreed. |
Yes, there are tests for these characters, but we test with |
That seems correct, no? |
Yes, the tests are correct. |
From https://www.unicode.org/reports/tr46/#UseSTD3ASCIIRules:
We allow =, but < and > are forbidden. All of the three non-ASCII code points listed above work fine in WebKit and I personally might not see the problem as strongly as UTS46 does. I added tests for them in web-platform-tests/wpt#37907. (The tests reflect the status quo.)
Thoughts?
cc @karwa @ricea @achristensen07 @valenting