[fr] More than 9 capture groups? #369

Open
Eugenekoh12 opened this issue Oct 4, 2023 · 8 comments

Comments

@Eugenekoh12

Is it possible to have more than 9 capture groups and use them in the resulting redirect URL?
I tried to use $10 and above, but the output is the value of the first capture group followed by a literal 0, so I'm not sure if I'm just missing something. Sorry if this is a silly question.

@Gitoffthelawn
Collaborator

It's not a silly question at all!

In fact, I seem to remember that Einar and I once discussed it here on GitHub, or I meant to discuss it with him (I no longer remember, but you can search and refresh my memory if you feel like it!).

Going through all my Redirector rules, I see that I wrote a rule that goes all the way up to the 9th regex capture group and would benefit from more capture groups. That likely means that I did not find a way to go beyond 9 capture groups in Redirector (at least when I wrote that rule, which I think was about 2 years ago).

@Rhys-T

Rhys-T commented Oct 30, 2023

It looks like this was previously reported as #210, and fixed in #212 (by reversing the loop, so that $10 would get replaced before $1 did). There just hasn't been a release since that fix was added.
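
To illustrate why the loop order matters, here is a minimal Python sketch. It is not Redirector's actual code (the extension is written in JavaScript); the function names, the template URL, and the single-letter group values are all made up for the example. It only assumes that placeholders are substituted one at a time with a plain string replace.

```python
def substitute_forward(template: str, groups: list[str]) -> str:
    """Replace $1, $2, ... in ascending order (the problematic order)."""
    result = template
    for i, value in enumerate(groups, start=1):
        # "$1" is also a prefix of "$10", so this eats part of "$10".
        result = result.replace(f"${i}", value)
    return result


def substitute_reversed(template: str, groups: list[str]) -> str:
    """Replace the highest-numbered placeholder first and $1 last."""
    result = template
    for i in range(len(groups), 0, -1):
        result = result.replace(f"${i}", groups[i - 1])
    return result


# Ten hypothetical captured values: group 1 is "a", group 10 is "j".
groups = list("abcdefghij")
template = "https://example.com/$1/$10"

print(substitute_forward(template, groups))   # https://example.com/a/a0
print(substitute_reversed(template, groups))  # https://example.com/a/j
```

The forward version reproduces the symptom from the original report: the first group's value followed by a literal 0.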

@Gitoffthelawn
Collaborator

@Rhys-T Thank you. I've added it to my list to include in the next release.

@Rhys-T

Rhys-T commented Nov 13, 2023

@Eugenekoh12 Here are a couple of workarounds you might be able to use for now, until that fix gets released:

  • You may already know this, but if there are groups that you're just using to e.g. make something optional, and not to extract values, you should be able to make them non-capturing groups (so they won't count toward the limit) by starting them with (?: instead of just (.
  • You might also be able to process the URL in multiple steps: Make one rule that redirects to a deliberately invalid URL starting with e.g. https://[ff00::]/redirector-temp/ [1], with groups 1-8 processed into whatever form you need for the final URL, and the rest of the original URL matched as group 9 and passed through as-is. Then make a second rule that matches that URL, processes the remaining pieces, and redirects to the actual destination. Note that there's currently a hard-coded limit on the number of redirect rules the extension will apply to a request - see "[fr] Need limited recursive redirection" (#152).
    Edit: Apparently this doesn't actually work. See below.
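
As a rough illustration of the first workaround, the sketch below compares a pattern that uses plain groups with one that uses non-capturing groups. It uses Python's `re` module rather than the JavaScript regex engine that Redirector runs on, but the `(?:` syntax is the same in both. The URL and both patterns are invented for the example.

```python
import re

url = "https://www.example.com/articles/2023/some-title"

# Every parenthesised group captures, so four numbers are used up.
capturing = re.compile(
    r"^https://(www\.)?example\.com/(articles|posts)/(\d{4})/(.+)$"
)

# Groups that only provide optionality or alternation are marked (?:...),
# so only the two values that are actually needed get a number.
non_capturing = re.compile(
    r"^https://(?:www\.)?example\.com/(?:articles|posts)/(\d{4})/(.+)$"
)

print(capturing.groups)                   # 4
print(non_capturing.groups)               # 2
print(non_capturing.match(url).groups())  # ('2023', 'some-title')
```

In the second pattern, the year is $1 and the title is $2, instead of $3 and $4.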

I'm not sure what you're doing that involves that many capturing groups, so I'm not sure how to give concrete examples of how to apply either of those approaches.

Footnotes

  1. That [ff00::] is a multicast IPv6 address. Since HTTP can't ever be multicast, that makes sure that no actual network traffic will be generated for that fake URL. The NoScript extension uses similar fake URLs at that address for some of its internal messaging.
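
For anyone who wants to confirm the multicast claim, a quick check with Python's standard `ipaddress` module looks like this. It only verifies that the address falls in the IPv6 multicast range; it says nothing about how a browser or Redirector treats such a URL.

```python
import ipaddress

addr = ipaddress.ip_address("ff00::")

# ff00::/8 is the IPv6 multicast range.
print(addr.is_multicast)                         # True
print(addr in ipaddress.ip_network("ff00::/8"))  # True
```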

@Gitoffthelawn
Collaborator

@Rhys-T The first point you wrote definitely works, as it is included in the regex standard we use. It also has the benefit of using less memory, as less data needs to be stored internally.

But regarding the second point, are you sure that works? I definitely think we should make it work, but I'm pretty sure Redirector doesn't currently process its own output. I'm pressed for time right now, but there is the issue you found (#152), and perhaps another one on the topic.

@Rhys-T

Rhys-T commented Nov 13, 2023

…huh. I could have sworn that was working. I have a rule set up to redirect Medium articles to https://scribe.rip, and another rule (earlier in the list) that fixes the result up when it ends up at https://scribe.rip/m/global-identity-2?redirectUrl=…, since scribe.rip doesn't handle that kind of URL [1], and that combination has been working as far as I can tell. However, I just did a very simple pair of rules redirecting https://a.example/* → https://b.example/* → https://c.example/*, and if I enter an a.example URL, it only redirects to b, not c, regardless of what order I put them in. Maybe in the Scribe case, there was a redirect being done by the server in between the two being done by Redirector that made it work? I apologize for the misinformation. I should have checked first, but I couldn't come up with an example with lots of groups like that, and didn't think to test the basic mechanism since I thought I had already seen it work.

Footnotes

  1. Or didn't at the time - I haven't checked in a while.
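
The difference between what the a.example test showed and what the two-step workaround would need can be sketched as follows. This is a toy model in Python, not a description of Redirector's internals: the rule list, the function names, and the `max_hops` parameter are all hypothetical, and the single-pass behaviour is only an assumption that matches the test result described above.

```python
import re

# Hypothetical rule list: (include pattern, redirect template) pairs.
RULES = [
    (re.compile(r"^https://a\.example/(.*)$"), r"https://b.example/\1"),
    (re.compile(r"^https://b\.example/(.*)$"), r"https://c.example/\1"),
]


def apply_once(url: str) -> str:
    """Apply the first matching rule a single time (assumed behaviour)."""
    for pattern, template in RULES:
        match = pattern.match(url)
        if match:
            return match.expand(template)
    return url


def apply_chained(url: str, max_hops: int = 3) -> str:
    """Feed each result back into the rules, up to max_hops times."""
    for _ in range(max_hops):
        redirected = apply_once(url)
        if redirected == url:
            break
        url = redirected
    return url


print(apply_once("https://a.example/page"))     # https://b.example/page
print(apply_chained("https://a.example/page"))  # https://c.example/page
```

The multi-step workaround would only be usable if rules were applied the way `apply_chained` does, which is roughly what #152 asks for.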

@Gitoffthelawn
Collaborator

@Rhys-T No problem at all. Historically, I encountered a similar situation, but I couldn't reproduce it either. If you ever are able to generate a situation that shows both of us to be mistaken, please educate us both on how you do it. :) I say that smiling due to the (hopefully) obvious humour, but I'm actually quite interested.

Gitoffthelawn changed the title from "More than 9 capture groups?" to "[fr] More than 9 capture groups?" on Oct 25, 2024
@Gitoffthelawn
Collaborator

See also #302
