C++ std::regex syntax incompatibility #841

Closed
lemire opened this issue Jan 15, 2025 · 4 comments

Comments

@lemire
Member

lemire commented Jan 15, 2025

std::regex in C++ supports a modified ECMAScript syntax which differs from the actual ECMAScript regex syntax, causing compatibility issues with JavaScript-like patterns, notably in URL pattern matching.

This problem arises in the tests of the following PR: Node.js Pull Request #56452

The implementation of std::regex rejects certain regex constructs that are valid in JavaScript ECMAScript syntax:

Example Pattern: "/([\\d&&[0-1]])"

Reproducible Example:

{
  "pattern": [{ "pathname": "/([\\d&&[0-1]])" }],
  "inputs": [{ "pathname": "/0" }],
  "expected_match": {
    "pathname": { "input": "/0", "groups": { "0": "0" } }
  }
}

C++ Code to Simulate Failure:

#include <iostream>
#include <regex>
#include "ada.h" // Assuming ada is a library for URL parsing

int main() {
    ada::url_pattern_init init{.pathname = "/([\\d&&[0-1]])"};
    auto url_pattern = ada::parse_url_pattern(init, nullptr, nullptr);
    if (!url_pattern) {
        std::cout << "URL pattern parsing failure" << std::endl;
    }

    // Simplified regex test
    try {
        std::regex regexPattern("[[0-1]]");
    } catch (const std::regex_error& e) {
        std::cout << "Regex construction error: " << e.what() << std::endl;
    }

    return 0;
}

The regex is somewhat confusing: [[0-1] is a character class matching any of three characters ([, 0, and 1), and it is followed by a ].

JavaScript handles this by interpreting the final ] as a literal character, because it does not close a character class.

C++ rejects the regex.

To achieve the intended regex behavior in C++, the pattern needs to be adjusted:

std::regex regexPattern("[[0-1]\\]");

This discrepancy can lead to failures when C++ and JavaScript regex compatibility is assumed or required.
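
A minimal sketch for checking this locally, assuming nothing beyond the standard library. The helper names `std_regex_compile_error` and `matches_escaped_form` are hypothetical and are not part of ada. Whether a given pattern is rejected can depend on the standard library implementation, so the sketch probes at runtime instead of assuming an outcome:

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: try to compile `pattern` with std::regex using the
// (modified) ECMAScript grammar. Returns the library's error message if
// construction throws, or std::nullopt if the pattern is accepted.
std::optional<std::string> std_regex_compile_error(const std::string& pattern) {
  try {
    std::regex re(pattern, std::regex::ECMAScript);
    (void)re;  // only construction matters here
    return std::nullopt;
  } catch (const std::regex_error& e) {
    return std::string(e.what());
  }
}

// Hypothetical helper: the escaped form from this issue, i.e. a character
// class containing '[', '0' and '1', followed by a literal ']'.
bool matches_escaped_form(const std::string& input) {
  static const std::regex re("[[0-1]\\]");
  return std::regex_match(input, re);
}
```

Calling `std_regex_compile_error("[[0-1]]")` reports how the local standard library treats the unescaped pattern, while `matches_escaped_form("0]")` exercises the adjusted one.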

@lemire lemire changed the title C++ std::regex Syntax Incompatibility C++ std::regex syntax incompatibility Jan 15, 2025
@anonrig
Member

anonrig commented Jan 17, 2025

I recommend keeping the std::regex solution and, on top of that, using the regex engine proposal so that others can properly support ECMAScript regex through plugin support.
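
A rough sketch of what such a pluggable engine could look like, assuming a compile-time provider type. All names here (`std_regex_engine`, `create`, `matches`, `pattern_matches`) are hypothetical and do not reflect ada's actual interface. A V8-backed engine would supply the same members with its own regex type:

```cpp
#include <optional>
#include <regex>
#include <string_view>

// Hypothetical engine backed by std::regex. Any type exposing the same
// static members could be swapped in at compile time.
struct std_regex_engine {
  using regex_type = std::regex;

  // Returns std::nullopt when the engine rejects the pattern.
  static std::optional<regex_type> create(std::string_view pattern,
                                          bool ignore_case) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case) flags = flags | std::regex::icase;
    try {
      return std::regex(pattern.data(), pattern.size(), flags);
    } catch (const std::regex_error&) {
      return std::nullopt;
    }
  }

  // True when the whole input matches the compiled pattern.
  static bool matches(const regex_type& re, std::string_view input) {
    return std::regex_match(input.begin(), input.end(), re);
  }
};

// Generic caller: compiles the pattern with the chosen engine and matches.
template <typename Engine>
bool pattern_matches(std::string_view pattern, std::string_view input) {
  auto re = Engine::create(pattern, /*ignore_case=*/false);
  return re.has_value() && Engine::matches(*re, input);
}
```

With this shape, the caller never names std::regex directly, so patterns that one engine cannot parse become a matter of selecting another engine type.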

@anonrig
Member

anonrig commented Jan 25, 2025

@lemire One possible edge case I've found is related to how std::regex and v8::RegExp differ for match results that return an empty string. I've opened a PR, but I couldn't find a good way to solve this problem by only updating std_regexp_provider: #850
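
The exact mismatch is not spelled out here. One place where such a difference can arise is capture groups: in std::regex, calling `str()` on a group that did not participate in the match yields an empty string, the same as a group that matched empty text, whereas JavaScript reports the former as undefined. A minimal sketch that keeps the two cases apart using `sub_match::matched` (the helper name `capture_groups` is hypothetical):

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect capture groups 1..N of a full match, keeping
// "did not participate" (std::nullopt) distinct from "matched the empty
// string" (an empty std::string). Returns an empty vector on no match.
std::vector<std::optional<std::string>> capture_groups(const std::string& input,
                                                       const std::regex& re) {
  std::vector<std::optional<std::string>> groups;
  std::smatch m;
  if (!std::regex_match(input, m, re)) return groups;
  for (std::size_t i = 1; i < m.size(); ++i) {
    if (m[i].matched) {
      groups.emplace_back(m[i].str());
    } else {
      groups.emplace_back(std::nullopt);
    }
  }
  return groups;
}
```

For example, with the pattern `(a)?(b*)` and an empty input, the first group is reported as absent and the second as present but empty.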

I've implemented a V8 regex provider and it works right now (with 22 tests failing at the moment). I'll be working on fixing those in the upcoming days. Here's the implementation for regex_search: https://github.com/nodejs/node/blob/d489ce104856408f1fc34393a3b8a48765d36833/src/node_url_pattern.cc#L77

This shows that our regex provider is working really well!

PS: Here's the list of URLPattern WPT that are failing on the Node.js PR at the moment: https://github.com/nodejs/node/blob/d489ce104856408f1fc34393a3b8a48765d36833/test/wpt/status/urlpattern.json

@lemire
Member Author

lemire commented Jan 25, 2025

Youhou!

@anonrig
Member

anonrig commented Jan 27, 2025

The Node.js PR now passes all valid WPT. We can close this issue. I'm planning on putting std_regex_provider behind the "ADA_USE_UNSAFE_STD_REGEX_PROVIDER" flag in a follow-up pull request.

@anonrig anonrig closed this as completed Jan 27, 2025