Opened 10 years ago
Closed 10 years ago
#7744 closed Bugs (fixed)
make_u32regex() performs insufficient UTF-8 validation
| Reported by: | anonymous | Owned by: | John Maddock |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | regex |
| Version: | Boost 1.52.0 | Severity: | Problem |
| Keywords: | | Cc: | |
Description
The program below segfaults on the regular expression ".*\xf6.*". AFAIK the largest value allowed as the leading byte of a 4-byte sequence is 0xF4, so I would expect an exception instead.
The regular expression ".*\xe4.*" is created without an exception. However, 0xE4 starts a 3-byte sequence and no trailing bytes are present, so I would expect an exception here too.
We use Boost 1.52.0 together with ICU 50.1. The behavior is the same on Linux and Windows.
#include <boost/regex/icu.hpp>

int main(void)
{
    // this line does not throw an exception although this is not valid UTF-8
    boost::u32regex(boost::make_u32regex(".*\xe4.*"));
    // this line segfaults
    boost::u32regex(boost::make_u32regex(".*\xf6.*"));
    return 0;
}
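For reference, here is a minimal sketch of the kind of check the description expects to happen before a pattern is decoded. It follows the well-formed byte ranges from RFC 3629 and uses only the standard library. The name `is_valid_utf8` is hypothetical: it is not part of Boost.Regex or ICU, and this is not a claim about how the library validates input or how the issue was fixed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Regex or ICU function): returns true only
// if `s` is well-formed UTF-8 according to the byte ranges in RFC 3629.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead <= 0x7F) {             // single-byte (ASCII) character
            ++i;
            continue;
        }

        std::size_t trail = 0;          // continuation bytes required
        unsigned char lo = 0x80;        // allowed range of the FIRST
        unsigned char hi = 0xBF;        // continuation byte

        if (lead >= 0xC2 && lead <= 0xDF)      { trail = 1; }
        else if (lead == 0xE0)                 { trail = 2; lo = 0xA0; } // no overlongs
        else if (lead >= 0xE1 && lead <= 0xEC) { trail = 2; }
        else if (lead == 0xED)                 { trail = 2; hi = 0x9F; } // no surrogates
        else if (lead >= 0xEE && lead <= 0xEF) { trail = 2; }
        else if (lead == 0xF0)                 { trail = 3; lo = 0x90; } // no overlongs
        else if (lead >= 0xF1 && lead <= 0xF3) { trail = 3; }
        else if (lead == 0xF4)                 { trail = 3; hi = 0x8F; } // <= U+10FFFF
        else { return false; }          // 0x80..0xC1 and 0xF5..0xFF never lead

        // The sequence must not run past the end of the input.
        if (n - i <= trail) {
            return false;
        }

        // The first continuation byte has a lead-dependent range.
        const unsigned char first = static_cast<unsigned char>(s[i + 1]);
        if (first < lo || first > hi) {
            return false;
        }

        // Remaining continuation bytes must be in 0x80..0xBF.
        for (std::size_t k = 2; k <= trail; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) {
                return false;
            }
        }
        i += trail + 1;
    }
    return true;
}
```

With this sketch both patterns from the program above are rejected: ".*\xe4.*" because 0xE4 is followed by '.' rather than a continuation byte, and ".*\xf6.*" because 0xF6 is above 0xF4. As a caller-side workaround on affected versions, one could run such a check on the pattern and throw before calling boost::make_u32regex().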

Fixed in Trunk rev #81614