Issues with UTS #46 tests #341
Thanks @TimothyGu. I don't think there's any need to worry just yet; they replied to my May 24 email (about changing processing_option to a boolean) on Aug 11. It might be that they first need to have another meeting to figure out what to do. As for your bug report, if you always applied ToASCII before ToUnicode (which I think is the intent), would you still run into issues? |
I see. Well, at this point I just hope I wrote my email address correctly in the feedback form… In the URL Standard we do use ToASCII before ToUnicode, but I don't think the Unicode tests do. |
An update was pushed out. http://www.unicode.org/draft/reports/tr46/tr46.html#Format lists changes to the test format, including questions they'd like feedback on. I'm told it would help if we could get back to them quickly. |
I won’t be able to look at this until late next week. |
@TimothyGu any chance you could poke at this again? |
As an update: the new tests look good, and we were able to adopt them successfully in tr46. On the other hand, it looks like the newest version of the UTS 46 tests (for Unicode 13) introduced new errors, but we don't need to track that here. |
Well, let me reopen this for a bit. Is there a way we could import these tests into web-platform-tests? Happy to track that as a new issue; just wondering. |
I apologise for the bump, and I'd be happy to open a new issue if that would be better, but I've also run into a problem with the UTS46 tests. Is anybody successfully using the latest version of the tests? For example, I've been getting error "P1", which corresponds to processing step 1, the mapping table. The specific test that was failing was […]. Looking at the section on mapping, it appears that this specific code point is marked […].
We apparently use the not-normal case, so my implementation behaves correctly in a URL context, but I have no way to ignore only those errors that are due to this difference. I managed to find the offending code, and sure enough, it treats these code points as disallowed and emits P1 errors regardless of the value of […]. I was about to file a bug, but this is the first time I'm encountering this algorithm, so I'm not entirely sure whether I messed up. Everything seems to line up, and the evidence appears to corroborate the story. But it obviously raises questions about what everybody else is doing to test conformance. Why did nobody else hit this before me? |
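To make the P1 discrepancy concrete: in the IdnaMappingTable.txt versions discussed in this thread (before Unicode 15.1), some code points carry the statuses `disallowed_STD3_valid` or `disallowed_STD3_mapped`, which UTS #46 treats as valid/mapped when UseSTD3ASCIIRules is false and as disallowed when it is true. The minimal Python sketch below shows only the mapping step under that rule. The function names and the three-entry table are hypothetical, and the status given for ≠ (U+2260) is assumed from that older table data; a real implementation loads the full file.

```python
# Hypothetical excerpt of IdnaMappingTable.txt (assumed pre-Unicode 15.1 data).
# Maps a code point to (status, mapping); a real implementation loads the whole table.
MAPPING_TABLE = {
    "a": ("valid", None),
    "A": ("mapped", "a"),
    "\u2260": ("disallowed_STD3_valid", None),  # NOT EQUAL TO
}


def effective_status(status: str, use_std3_ascii_rules: bool) -> str:
    """Collapse the two STD3 statuses according to UseSTD3ASCIIRules."""
    if status == "disallowed_STD3_valid":
        return "disallowed" if use_std3_ascii_rules else "valid"
    if status == "disallowed_STD3_mapped":
        return "disallowed" if use_std3_ascii_rules else "mapped"
    return status


def map_label(label: str, use_std3_ascii_rules: bool):
    """Processing step 1 (mapping) only; returns (mapped_text, errors)."""
    out, errors = [], []
    for ch in label:
        # Sketch shortcut: code points missing from the excerpt count as disallowed.
        status, mapping = MAPPING_TABLE.get(ch, ("disallowed", None))
        status = effective_status(status, use_std3_ascii_rules)
        if status == "disallowed":
            errors.append("P1")  # keep the code point, record the error
            out.append(ch)
        elif status == "mapped":
            out.append(mapping)
        elif status == "ignored":
            continue
        else:  # valid, or deviation under nontransitional processing
            out.append(ch)
    return "".join(out), errors
```

With `use_std3_ascii_rules=False` (the URL parser's setting), `map_label("a\u2260", False)` yields no P1 error; a generator that always takes the `True` branch would report P1 for the same input, which is the behaviour described above.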
I'm not sure. According to https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly/iiaAuLw==&base=YWJvdXQ6Ymxhbms= it seems that browsers and the reference implementation agree on that input (it becomes "[…]").
@macchiati @srl295 @markusicu, could you perhaps help us out with these tests or provide guidance on how best to give feedback on them? Thanks! |
Note that per jsdom/whatwg-url#239 (also by @karwa) it seems we might not be able to trust the reference implementation for this sort of thing 😢. https://jsdom.github.io/whatwg-url/#url=aHR0cDovL3huLS1sczhoPT09Lw==&base=YWJvdXQ6Ymxhbms= (from that issue) is another interesting case, where Chrome/Firefox/reference implementation all differ. |
Hmm, I also found what I believe to be an ambiguity in UTS46 that is causing actual implementation divergence, so I sent a Unicode Error Report asking for it to be clarified. I think it underscores why integrating these tests into WPT is important, and it aligns with the overall goal of minimising divergence and promoting interoperability. As we well know, neither the standards nor the implementations are perfect, but having everybody use the same test suite, and ensuring that they all run it, is a good way to catch imperfections in both early.
The issues that I'm seeing make me think that perhaps not everybody is running the tests, or they are running older versions, or they have hacks to exclude certain buggy tests, or (and this one can be really subtle) they are testing a different code path (via a different set of flags) from the one actually used by the URL Standard. Integrating with WPT would help guard against those issues, and make it simpler even for non-WPT implementations to match web behaviour.
Or maybe I'm missing something. Unicode is complex, and when you notice an issue, I find it's either because (a) you're just starting to understand how things work, or (b) you totally misunderstood or forgot something. It's hard to tell. It would be easier to tell if there were a single, correct way of running the tests, and I could see proof that nobody else has a problem with it.
Anyway, here's the report I sent on the (possible) UTS46 ambiguity, for the curious.
Unicode Error Report: UTS 46
I only just started writing my own implementation of this recently, so apologies if I'm misunderstanding, but there are two locations where code points are checked. Using the same format as the IdnaTestV2.txt file for describing those locations, they would be P1 and V6.
Here is the text of Section 4.1, Validity Criteria (https://www.unicode.org/reports/tr46/#Validity_Criteria), Step 6:
It is not clear whether these status values are supposed to take the value of UseSTD3ASCIIRules into account. As described above, if V6 does not consider UseSTD3ASCIIRules, "≠ᢙ≯.com" and "xn--jbf911clb.com" will always be invalid domains. It does not matter that P1 considers UseSTD3ASCIIRules, because the label will be caught by V6 later anyway. I'll have to apologise again because I am not very familiar with the codebases I am about to cite, but from what I can glean, this is leading to divergence in the wild:
|
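The two readings of V6 that the report contrasts can be sketched side by side. This is a hypothetical Python illustration, not any implementation's code: `v6_ignoring_flag` checks the raw table status, while `v6_honoring_flag` first resolves `disallowed_STD3_valid` the same way the mapping step does. Only ≠ (U+2260) and ≯ (U+226F) are included, with the `disallowed_STD3_valid` status assumed from pre-Unicode 15.1 table data; everything else is simplified.

```python
# Hypothetical status excerpt (values assumed from pre-Unicode 15.1 data).
STATUS = {
    "\u2260": "disallowed_STD3_valid",  # NOT EQUAL TO
    "\u226f": "disallowed_STD3_valid",  # NOT GREATER-THAN
}
ASCII_LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def status_of(ch: str) -> str:
    # Sketch shortcut: lowercase ASCII letters, digits and '-' are valid;
    # anything else not in the excerpt counts as disallowed.
    if ch in ASCII_LDH:
        return "valid"
    return STATUS.get(ch, "disallowed")


def allowed_statuses(transitional: bool) -> set:
    # V6: transitional processing allows only "valid"; nontransitional also allows "deviation".
    return {"valid"} if transitional else {"valid", "deviation"}


def v6_ignoring_flag(label: str, transitional: bool = False) -> bool:
    """Reading 1: V6 looks at the raw table status."""
    allowed = allowed_statuses(transitional)
    return all(status_of(ch) in allowed for ch in label)


def v6_honoring_flag(label: str, use_std3_ascii_rules: bool,
                     transitional: bool = False) -> bool:
    """Reading 2: V6 resolves disallowed_STD3_valid like the mapping step."""
    allowed = allowed_statuses(transitional)
    for ch in label:
        status = status_of(ch)
        # disallowed_STD3_mapped needs no special case: "mapped" is never allowed in V6.
        if status == "disallowed_STD3_valid" and not use_std3_ascii_rules:
            status = "valid"
        if status not in allowed:
            return False
    return True
```

Under reading 1, a label containing ≠ fails V6 even when UseSTD3ASCIIRules is false and the mapping step accepted it, which is why, as the report notes, "≠ᢙ≯.com" would always be invalid.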
Thank you @karwa. Really appreciate the time you are putting into this. |
How do people feel about our own coverage in |
Identified in whatwg/url#341 by karwa.
I created #733 and web-platform-tests/wpt#37907 based on the more recent discussion here. I guess we'll keep this open for people to report more issues they run into with the UTS46 tests, and hopefully those get clarified upstream; if not, we'll cover them in WPT somehow. |
…ases, a=testonly Automatic update from web-platform-tests IDNA: add a couple interesting ToASCII cases Identified in whatwg/url#341 by karwa. -- wpt-commits: 3d997a3ff43a545b3d36e12d55478fb264e6d0df wpt-pr: 37907
I've now created a test runner again for IdnaTestV2.txt: web-platform-tests/wpt#38080. I also wrote down a bunch of feedback there in a separate file I plan to submit to Unicode. Feedback appreciated! |
It appears (at a first glance!) that most of the issues are mislabeling of the reasons for rejection. Is that correct?
|
Yeah, mislabeling and conflicts with the URL parser:
- Hosts that end with an ASCII digit label get parsed as an IPv4 address, and thus fail, whereas some of these are expected not to fail.
- Some inputs contain ?, which I could perhaps percent-encode first.
With that, a lot of the remaining failures in WebKit are #733. I haven't gone through all of them yet, as there are still >100 failures. |
I didn't think you were done yet!
|
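The IPv4 conflict described above comes from the URL parser sending any host whose last label looks numeric to the IPv4 parser, so IdnaTestV2.txt entries of that shape cannot pass through the URL parser as plain domains. Below is a minimal Python sketch of the URL Standard's "ends in a number checker" as it is currently specified (the wording at the time of this comment may have differed). The function name is made up, and the idea of a runner using it to set such entries aside is hypothetical.

```python
import string


def ends_in_a_number(host: str) -> bool:
    """Sketch of the URL Standard's "ends in a number checker" (ASCII input assumed)."""
    parts = host.split(".")
    # A single trailing dot is tolerated: drop the empty last label.
    if parts[-1] == "":
        if len(parts) == 1:
            return False
        parts.pop()
    last = parts[-1]
    # A non-empty label made only of ASCII digits.
    if last != "" and all(c in string.digits for c in last):
        return True
    # Parsing as an IPv4 number also succeeds for "0x"/"0X" plus zero or more hex digits.
    if last[:2] in ("0x", "0X") and all(c in string.hexdigits for c in last[2:]):
        return True
    return False
```

A runner could, for example, skip any test whose expected ToASCII output satisfies `ends_in_a_number`, since the URL parser would hand that host to the IPv4 parser rather than treat it as a domain.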
I analyzed the remaining WebKit failures, and they all seemed to be about UseSTD3ASCIIRules in one way or another: mainly a status annotation problem, coupled with #733. I went ahead and submitted feedback to Unicode on the […]. Per https://unicode.org/timesens/calendar.html, it looks like it will be discussed at the end of April, so we'll have to wait a bit before we get a resolution here. |
This excludes various tests for now due to the open issues mentioned at the top of IdnaTestV2-parser.py. For whatwg/url#341.
Automatic update from web-platform-tests URL: run a subset of IdnaTestV2.txt in WPT This excludes various tests for now due to the open issues mentioned at the top of IdnaTestV2-parser.py. For whatwg/url#341. -- wpt-commits: 9216115f5621b04a27e0f2e9bbf1ce44dd7d3b9e wpt-pr: 38080
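The commits above run a subset of IdnaTestV2.txt in WPT. For anyone wiring the file into their own runner, here is a minimal Python sketch of a line parser. It is not WPT's IdnaTestV2-parser.py; the function names are made up, and the column rules are my reading of the file's header (seven semicolon-separated columns; a blank toUnicode means the source, a blank toUnicodeStatus means no errors, and each later blank column inherits from the previous value or status column; `""` denotes an empty string; `\uXXXX` and `\x{XXXX}` escapes). Check them against the header of the version you download.

```python
import re

# \uXXXX and \x{XXXX} escapes (assumed from the file header).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\x\{([0-9A-Fa-f]+)\}")


def _unescape(text: str) -> str:
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def _string(field: str, fallback: str) -> str:
    if field == "":
        return fallback  # blank: inherit from an earlier column
    if field == '""':
        return ""  # explicit empty string
    return _unescape(field)


def _statuses(field: str, fallback: list) -> list:
    if field == "":
        return list(fallback)  # blank: inherit from an earlier column
    inner = field[1:-1]  # strip the surrounding [ ]
    return [code.strip() for code in inner.split(",") if code.strip()]


def parse_idna_test_line(line: str):
    """Parse one data line; returns None for blank and comment-only lines."""
    data = line.split("#", 1)[0].strip()  # assumes '#' never occurs in data columns
    if not data:
        return None
    cols = [c.strip() for c in data.split(";")]
    cols += [""] * (7 - len(cols))
    source = _string(cols[0], "")
    to_unicode = _string(cols[1], source)
    to_unicode_status = _statuses(cols[2], [])
    to_ascii_n = _string(cols[3], to_unicode)
    to_ascii_n_status = _statuses(cols[4], to_unicode_status)
    to_ascii_t = _string(cols[5], to_ascii_n)
    to_ascii_t_status = _statuses(cols[6], to_ascii_n_status)
    return {
        "source": source,
        "to_unicode": to_unicode,
        "to_unicode_status": to_unicode_status,
        "to_ascii_n": to_ascii_n,
        "to_ascii_n_status": to_ascii_n_status,
        "to_ascii_t": to_ascii_t,
        "to_ascii_t_status": to_ascii_t_status,
    }
```

For a made-up line in that format, `parse_idna_test_line("Bücher.de; bücher.de; ; xn--bcher-kva.de; ; ; # bücher.de")` fills both ToASCII columns with `xn--bcher-kva.de` and all three status lists with `[]`. Filtering out tests affected by open issues would then be a separate step on the parsed records.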
Let's fold this into the aforementioned issues and web-platform-tests/wpt#48301. I don't think there's anything actionable remaining here. |
I wrote to Unicode through their feedback form on July 30 with the details. I haven't heard back, despite their promise that "You can expect an acknowledgement of your report within 2-3 business days."