-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add URLPatternList
#166
base: main
Are you sure you want to change the base?
Add URLPatternList
#166
Conversation
This commit adds support for a URLPatternList class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this!
spec.bs
Outdated
<xmp class="idl"> | ||
[Exposed=(Window,Worker)] | ||
interface URLPatternList { | ||
constructor(sequence<URLPatternListEntry> entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of the feedback in #30 suggested it would be nice to be able to dynamically mutate the list. So add new entries to the list and possibly delete entries from the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I didn't want to add this right away until we figure out the internal sort order a little better.
Also:
I doubt the API shape here will really dictate whether a radix tree implementation is possible or not. Instead I expect the path-to-regexp style pattern matching will impose some constraints which make a pure radix solution difficult. That does not mean optimizations will not be possible, though. I think there are clear optimizations a list object like this can enable; e.g. log(n) searching of a fixed string prefix in the patterns. I have also theorized it might be possible to construct an optimization where the patterns are deconstructed into automata similar to regex compilation. Then something akin to a radix tree could be built from the automata. We could then evaluate the automata against the input in O(n) time where n is the input length. |
Yeah, that makes total sense. My first pass was going to just deal with speeding up fixed text prefix matches (actually also for regular
I was also thinking about that today! :) I was actually planning to do exactly this for a v2 of the What do you think about including this intermediate format in the spec? Doing that would allow us to make sure future spec additions / changes don't break non-regexp matcher algorithms. The spec would still only specify the naive regexp based matcher, but it'd be much easier for implementations to implement alternative faster matchers based on this intermediate format. |
I don't think we should put any intermediate representation in the spec as it then makes future optimizations more difficult in the implementation. I think we want the flexibility to change the intermediate representation in the future. Also, none of this stuff should be observable to sites. |
Also, when this is further along it would be nice to include a "mini explainer" in the pull request description. Something like whatwg/dom#1032. And submit to TAG review. Besides helping to get feedback on the proposal this would also make it easier for chromium to adopt the API more quickly. |
@wanderview I simplified the proposed API. There is no more const patterns: Array<[URLPattern, string]>;
const keys = new Map(patterns);
const list = new URLPatternList(patterns.map(r => r[0]));
const res = list.exec(...)
if (res !== null) console.log("Matched", map.get(res.pattern)); |
I think @domenic had previously advocated the map approach to me. If this is ergonomic enough for developers, that works for me. |
I have added a mini explainer to the PR description. |
@lucacasonato What is the status of this? How did your prototyping in deno go? I'm thinking of prototyping this in chromium using re2's set implementation. |
I was also wondering if we should rename this to |
I filed an intent to prototype for chromium: https://groups.google.com/a/chromium.org/g/blink-dev/c/QrPrveVyFnA/m/PkaPKfJ4AwAJ |
@domenic Do you think we should include Edit: Note, the URLPatternSet will be ordered, but not insertion order. It will instead be sorted based on a spec defined algorithm. |
Also, I wonder if the Map oriented approach to associating data with entries will make it hard to have a constructor that takes dictionaries instead of full URLPatterns. Not sure if the dictionaries shortcut is worth optimizing for at the cost of added complexity elsewhere. |
I've always been -1 on the map-oriented approach, myself. Regarding setlike APIs, I think the main question is whether you want people to introspect and possibly mutate these things after creation. The spec PR here gives an immutable, non-introspectable object, and that seems to accomplish the main use cases. On the other hand, add introspection APIs from setlike (entries, forEach, has, keys, size, values) doesn't seem to hurt. And adding add, clear, and delete also doesn't seem that bad. Note that if you're not using insertion order, you'll need to override the implementation of add(), in order to put the given value in the right place in the set entries. |
Really? I thought you suggested it to me before. I must have been confused. Should we use a maplike API instead? While read-only is all we technically need, web developers have said making it mutable would be nice. Maybe starting read-only would be safest and add mutators later if needed. (For example I'm unsure if putting a duplicate entry in should overwrite or fail.) |
I'm sorry, I misunderstood what you meant. I was -1 on changing the API of URLPatternList to be map-oriented. I am +1 on developers using maps separately as a side table. |
Ok, I recall you were positive on passing in an array of init dictionaries to the constructor. But that doesn't seem to work with the external map approach. Does that seem ok to you? |
To clarify, if you pass in an init dictionary and the constructor makes the URLPattern for you, then you don't have the pattern as a key in your Map. |
Yeah, that seems OK to me. I think people who want to associate data that way can use explicit URLPattern objects. |
}; | ||
|
||
dictionary URLPatternListResult : URLPatternResult { | ||
URLPattern pattern; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how it works internally but shouldn't it contain the match result as well? Since the point is to pick a pattern to match and it's likely doing some work behind the scenes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does — the : URLPatternResult
part means this dictionary extends URLPatternResult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaah i didn’t see it. (Not familiar with this spec language 😂)
<div algorithm> | ||
The <dfn constructor for=URLPatternList lt="URLPatternList(patterns)">new URLPatternList(|patterns|)</dfn> constructor steps are: | ||
|
||
1. [=list/For each=] |pattern| in |patterns|, run the following steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this imply that the pattern list is traversed as an array and is potentially slower? #30 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iteration steps here and elsewhere don’t appear to be observable, so engines can potentially optimize them however they see fit, right? Spec steps usually aim for the simplest and clearest explanation of observable behaviors, not the most efficient ones (because observable behaviors are the only things they can actually specify, further complexity is noise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation!
Does this only do stateless longest matching? Should it be possible to resume pattern matching from a specified pattern ( |
Closes #30
Mini-explainer:
This adds a new
URLPatternList
interface that is can be used to match manyURLPattern
s at once. This new interface can be used to easily create URL routers with a large number of patterns, without compromising on performance.An example of how a user could use this API:
The main motivation for this addition is the performance improvements this can unlock compared to a naive matching algorithm based on the existing
URLPattern
interface. Existing user-land router implementation usingURLPattern
slow down linearly with the addition of patterns.URLPatternList
is designed to be able to deliver sub linear performance characteristics for routers of any size.TODOs
Preview | Diff