C++ std::regex syntax incompatibility #841
Comments
I recommend keeping the std::regex solution and, on top of that, using the regex engine proposal so that others can properly support ECMAScript regex through the plugin support.
@lemire One edge case I've found is that std::regex and v8::RegExp differ in how they handle match results that return an empty string. I opened a PR, but I couldn't find a good way to solve this problem by updating only std_regexp_provider: #850. I've implemented a V8 regex provider and it works right now (with 22 tests failing at the moment). I'll be working on fixing those in the coming days. Here's the implementation of regex_search: https://github.com/nodejs/node/blob/d489ce104856408f1fc34393a3b8a48765d36833/src/node_url_pattern.cc#L77 This shows that our regex provider works really well! PS: Here's the list of URLPattern WPT tests that are currently failing on the Node.js PR: https://github.com/nodejs/node/blob/d489ce104856408f1fc34393a3b8a48765d36833/test/wpt/status/urlpattern.json
Youhou!
The Node.js PR now passes all valid WPT tests. We can close this issue. I'm planning to put std_regex_provider behind the "ADA_USE_UNSAFE_STD_REGEX_PROVIDER" flag in a follow-up pull request.
std::regex in C++ supports a modified ECMAScript syntax which differs from the actual ECMAScript regex syntax, causing compatibility issues with JavaScript-like patterns, notably in URL pattern matching.
This problem arises in the tests of the following PR: Node.js Pull Request #56452
The implementation of std::regex rejects certain regex constructs that are valid in JavaScript ECMAScript syntax:
Example Pattern:
"/([\\d&&[0-1]])"
Reproducible Example:
C++ Code to Simulate Failure:
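The original snippet is not reproduced here. As a stand-in, the following is a minimal sketch of how one might check whether a given standard library accepts the pattern. The helper name `try_compile` is hypothetical and is not part of ada or Node.js. Whether the example pattern is rejected may depend on the standard library implementation in use, so no particular outcome is assumed.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (an assumption for illustration, not an ada API):
// tries to compile `pattern` with std::regex in ECMAScript mode.
// Returns std::nullopt on success, or the exception text on failure.
std::optional<std::string> try_compile(const std::string& pattern) {
  try {
    // Construction parses the pattern and throws std::regex_error
    // if the implementation considers it invalid.
    std::regex re(pattern, std::regex::ECMAScript);
    (void)re;
    return std::nullopt;
  } catch (const std::regex_error& e) {
    return std::string(e.what());
  }
}
```

Calling `try_compile("/([\\d&&[0-1]])")` returns an error message on implementations that reject the pattern and an empty optional on those that accept it.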
The regex is somewhat confusing: [[0-1] is a character class made of three characters, followed by ]. JavaScript handles this by interpreting the final ] as a literal character, since it does not close a valid character class.
C++ rejects the regex.
To achieve the intended regex behavior in C++, the pattern needs to be adjusted:
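The adjusted pattern from the report is not reproduced here. One possible rewrite, assuming the JavaScript interpretation described above (a character class followed by a literal ]), is to escape the inner [ and the trailing ] explicitly. The pattern and the function name `matches_adjusted` below are assumptions for illustration, not the fix adopted in ada or Node.js. The sketch also does not cover JavaScript's `v` flag, under which `&&` denotes class intersection.

```cpp
#include <regex>
#include <string>

// Hypothetical rewrite (an assumption, not the pattern from the report).
// Regex text: /([\d&\[0-1]\])
//   [\d&\[0-1]  a class containing digits, '&', '[', and the range 0-1
//   \]          a literal closing bracket after the class
bool matches_adjusted(const std::string& input) {
  static const std::regex re("/([\\d&\\[0-1]\\])", std::regex::ECMAScript);
  // regex_match requires the whole input to match the pattern.
  return std::regex_match(input, re);
}
```

With this rewrite, inputs such as "/1]" or "/&]" match, while "/1" (no trailing ]) does not.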
This discrepancy can lead to failures when C++ and JavaScript regex compatibility is assumed or required.