-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Opaque Host and domain encoding #220
Comments
After some more digging, most of the fail tests in this issue #206 are related to this opaque host issue. I think the idea was that these test should (at least) fail when checking forbidden domain code point in the host parser. This checking is only done in the URL parser for non opaque host. As for the domain encoding issue, all these cases produce URL encoded hostname: new URLPattern({ hostname : "café.com" }); // hostname pattern string: "caf%C3%A9.com"
new URLPattern({ protocol: "https", hostname : "café.com" }); // hostname pattern string: "caf%C3%A9.com"
new URLPattern("https://café.com" }); // hostname pattern string: "caf%C3%A9.com" The first one seems logic (except if we consider that every non-adress hostname with unknown protocol should be considered domain) but the other two should be IDNA encoded domain name since they are non opaque/special URLs. It seems everything that goes into the pattern compiler is parsed separately, without context, so the hostname is always considered opaque. We can see the right behaviour when the domain is passed in the base URL : new URLPattern("", "https://café.com" }); // hostname pattern string: "xn--caf-dma.com" @jeremyroman Maybe we should pass necessary URL context (the protocol) to the canonicalize a hostname method, for it to be properly interpreted by the URL parser. |
I guess ideally you can get both. |
In fact, it's exactly what canonicalize a port is already doing : passing the protocol as context to normalize default port values. And as pointed out above process pathname for init already fallback on "special URL" processing when protocol is unspecified. So it would only make sense hostname are treated the same ;) |
hi @annevk, I finally tried to look into this. It's pretty simple to fix the hostname parsing for regular processing but there's no easy solution to fix it for pattern processing. The canonicalization functions are passed as callback parameter to the pattern parser. Internally, the pattern parser have no context for what it's currently parsing (scheme, hostname, port...) or what kind of protocol is used for the current URL pattern. So every hostname is always URL encoded. Without adding URL context to the pattern parser, the easiest (but not so elegant) way would be to URL decode the hostname pattern after pattern parsing and if an empty or special scheme re-encoded it with the host parser. Would that be OK ? |
I wonder if @domenic @wanderview or @jeremyroman have some ideas. I don't currently have the bandwidth to look into this. |
@jeremyroman @domenic I found a decent way to fix this issue :
It encodes properly hostname and hostname patterns. If it's OK for you I can make a PR. |
What is the issue with the URL Pattern Standard?
Here's a test case from the WPT test suite, it's expected to succeed (Chrome 123 pass).
The test expect the hostname "café.com" to be converted to "xn--caf-dma.com" and to match the pattern.
But as per the spec, without a special scheme the host is considered an opaque host and not a domain name, therefore it should be percent-encoded to "caf%C3%A9.com" and not IDNA encoded to "xn--caf-dma.com". The pattern won't match.
Is it an oversight in the test suite/Chrome implementation or should hostname always be considered domain name ?
In some case the specs already allows opaque URL to be treated as special (cf. process pathname for init) and default the URL processing to the most common (most expected??) behaviour.
Should it be the case here too ? Or maybe only in some cases... Like forcing the URL to a special scheme when the host is not an address and domain labels are detected ?
The text was updated successfully, but these errors were encountered: