-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
<regex>
: Fix character range bounds in case-insensitive regexes
#5164
<regex>
: Fix character range bounds in case-insensitive regexes
#5164
Conversation
if (static_cast<typename _RxTraits::_Uelem>(_Val) < static_cast<typename _RxTraits::_Uelem>(_Chr1)) { | ||
if (static_cast<typename _RxTraits::_Uelem>(_Chr2) < static_cast<typename _RxTraits::_Uelem>(_Chr1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is still a (pre-existing) bug in this error check when the collate
flag is set: The bounds should be transformed by _Traits.transform()
before performing the comparison.
However, the matcher fails to do this transform()
dance as well, so I think this should rather be fixed for both matcher and parser in a coordinated fashion in a follow-up PR. (But I will adjust the PR if you rather prefer to fix this check completely now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow-up PR will be great.
if (static_cast<typename _RxTraits::_Uelem>(_Val) < static_cast<typename _RxTraits::_Uelem>(_Chr1)) { | ||
if (static_cast<typename _RxTraits::_Uelem>(_Chr2) < static_cast<typename _RxTraits::_Uelem>(_Chr1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow-up PR will be great.
Thanks - I really appreciate the detailed explanations in your PRs! 🐱 |
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Thanks! 😻+ |
This PR deals with three related problems for character ranges in case-insensitive mode:
_Builder::_Add_range()
casts the character bounds tounsigned int
. As a consequence, characters with negative numeric values are not added to the bitmap, but rather to the_Large
list of characters. This means that these characters are not found during matching. (Note the suspiciously different casts in the else branch for the case-sensitive case.)[Z-a]
(=[z-a]
).collate
andicase
flags are set, there is an unnecessary call to translate the bounds by_Traits.translate()
first before passing them to_Traits.translate_nocase()
. The standard says in [re.grammar]/14.1 and 14.2 that it is sufficient to calltranslate_nocase()
only. (See also_Builder::_Add_char()
, which already follows the Standard in this regard.)The PR moves the entire character translation into the parser so that empty ranges can be reliably diagnosed there in case-insensitive mode as well. It also fixes the unsigned cast and removes the unnecessary
translate()
call.The test deliberately does not use any manual signed/unsigned casts, but leaves all of these casts to
char_traits
to avoid getting the casts similarly wrong in<regex>
and the test.