Opened 6 years ago
Last modified 5 years ago
#12959 new Bugs
Regex class negation
Reported by: | | Owned by: | John Maddock |
---|---|---|---|
Milestone: | To Be Determined | Component: | regex |
Version: | Boost 1.61.0 | Severity: | Showstopper |
Keywords: | | Cc: | |
Description
Pertains to boost::regex. Tested on version 1.61.
Flags: Perl
Target string: abc092efg
Regex: [^\W\D]+
Function: regex_search
Matches: abc092efg
Should match: 092
Notes
Negated class resolution: 'Not-Not Word' AND 'Not-Not Digit'.
The intersection of word characters AND digits is digits.
Other regex engines do this correctly.
This includes Perl, PCRE, JS, C++11, Python, etc.
In this engine, [^\W\D] matches what [\w\d] does.
[^\W\D] appears not to be treated as an intersection, even though the operator in a negated class is AND.
FWIW, this behavior is seen with all negated shorthand elements of a negated class, e.g. [^\S\W] matches all whitespace OR all word characters.
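The expected and reported behaviors can be compared without any regex library. The following is a minimal sketch, not Boost code: `is_word`, `is_digit`, `expected_class`, `reported_class` and `first_run` are hypothetical helper names, and the classes are ASCII-only approximations of \w and \d.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hand-written stand-ins for the Perl shorthand classes (ASCII only, assumption).
inline bool is_word(unsigned char c)  { return std::isalnum(c) != 0 || c == '_'; }
inline bool is_digit(unsigned char c) { return std::isdigit(c) != 0; }

// Expected: [^\W\D] rejects c if c is in \W or in \D,
// so it accepts !(!word || !digit), which is word && digit (De Morgan).
inline bool expected_class(unsigned char c) { return !(!is_word(c) || !is_digit(c)); }

// Reported: the class behaves like [\w\d], i.e. word || digit.
inline bool reported_class(unsigned char c) { return is_word(c) || is_digit(c); }

// First maximal run of characters accepted by pred.
// This models CLASS+ searched from the left, as regex_search would do.
template <class Pred>
std::string first_run(const std::string& s, Pred pred) {
    std::string::size_type i = 0;
    while (i < s.size() && !pred(static_cast<unsigned char>(s[i]))) ++i;
    std::string::size_type j = i;
    while (j < s.size() && pred(static_cast<unsigned char>(s[j]))) ++j;
    return s.substr(i, j - i);
}

// The target string from the description, under both interpretations.
inline void check_ticket_example() {
    assert(first_run("abc092efg", expected_class) == "092");
    assert(first_run("abc092efg", reported_class) == "abc092efg");
}
```

Under the same model, [^\S\W] would accept only characters that are both whitespace and word characters, which is the empty set.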
Change History (3)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Sorry, there were some typos in my last comment. The result, as actually used now:
<boost\regex\v4\basic_regex_creator.hpp>
    if ( m_negate )
    {
       // if it's not already there, add it ..
       if ( false == (std::find(m_NegNeg_Class.begin(), m_NegNeg_Class.end(), m) != m_NegNeg_Class.end()) )
          m_NegNeg_Class.insert( m_NegNeg_Class.end(), (unsigned __int64)m );
    }
    else
       m_negated_classes |= m;
    m_empty = false;
<boost\regex\v4\perl_matcher.hpp>
    // try and match a single character from the neg-neg classes
    if ( set_->cNegNegClasses ) //&& set_->isnot )
    {
       for(i = 0; i < set_->cNegNegClasses; i++)
       {
          uint64_t mask = *((uint64_t*)p);
          if(traits_inst.isctype(col, (mask_type)mask) == false)
             return set_->isnot ? next : ++next;
          p += (sizeof(uint64_t) / sizeof(charT));
       }
    }
A solution could be to keep a vector of individual classes
instead of a single mask for all classes (cnclasses).
This is needed only in a negated class, and only the negated classes need to be tracked.
The rest remains unchanged.
Something like this works (tested):
<boost\regex\v4\basic_regex_creator.hpp>
<boost\regex\v4\perl_matcher.hpp>
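The difference between one combined mask and a list of individual negated classes can be illustrated outside of Boost. This is not the patch referred to above, only a simplified model built on `std::ctype<char>`: the names `word_mask`, `digit_mask`, `is_class`, `negated_set_combined` and `negated_set_per_class` are hypothetical, \w is approximated by `alnum` (the underscore is ignored), and the way the combined mask is tested is an assumption about why the reported result arises.

```cpp
#include <locale>
#include <vector>

using mask = std::ctype_base::mask;

const mask word_mask  = std::ctype_base::alnum;  // approximates \w (assumption)
const mask digit_mask = std::ctype_base::digit;  // \d

// True if c carries any of the bits in m, using the classic "C" locale.
inline bool is_class(mask m, char c) {
    return std::use_facet<std::ctype<char>>(std::locale::classic()).is(m, c);
}

// Model of [^\W\D] with all negated classes OR-ed into one mask.
// "c has none of the bits" means "not word AND not digit",
// so the negated set ends up accepting word OR digit.
inline bool negated_set_combined(mask all_negated, char c) {
    bool inside = !is_class(all_negated, c);
    return !inside;  // apply the outer [^...]
}

// Model of [^\W\D] with one mask per negated class.
// c is inside the set as soon as it fails any single class,
// so the negated set accepts only characters that pass every class.
inline bool negated_set_per_class(const std::vector<mask>& negated, char c) {
    bool inside = false;
    for (mask m : negated) {
        if (!is_class(m, c)) { inside = true; break; }
    }
    return !inside;  // apply the outer [^...]
}
```

With `{word_mask, digit_mask}`, the per-class version accepts '0' and rejects 'a', while the combined version with `word_mask | digit_mask` accepts both, which is the behavior described in this ticket.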