Opened 10 years ago

Closed 10 years ago

#7744 closed Bugs (fixed)

make_u32regex() performs insufficient UTF-8 validation

Reported by: anonymous Owned by: John Maddock
Milestone: To Be Determined Component: regex
Version: Boost 1.52.0 Severity: Problem
Keywords: Cc:

Description

The program below segfaults when constructing the regular expression ".*\xf6.*". AFAIK the highest value allowed as the leading byte of a 4-byte sequence is 0xF4, so 0xF6 can never appear in valid UTF-8. I would expect an exception instead of a crash.

The regular expression ".*\xe4.*" is created without an exception. However, 0xE4 starts a 3-byte sequence and no trailing bytes are present (the next byte is '.', which is not a continuation byte). I would expect an exception here too.

We use Boost 1.52.0 together with ICU 50.1. The behavior is the same on Linux and Windows.

#include <boost/regex/icu.hpp>

int main(void)
{
    // this line does not throw an exception although this is not valid UTF-8
    boost::u32regex(boost::make_u32regex(".*\xe4.*"));
    // this line segfaults
    boost::u32regex(boost::make_u32regex(".*\xf6.*"));
    return 0;
}
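
For reference, the well-formedness rules the report relies on (lead bytes no higher than 0xF4, and every multi-byte lead followed by the right number of continuation bytes) can be sketched as a standalone check. This is only an illustration following the byte ranges in the Unicode Standard's table of well-formed UTF-8 sequences; `is_valid_utf8` is a hypothetical helper name, not part of Boost.Regex or ICU, and it is not the fix that was applied to the library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Regex or ICU API): returns true if `s`
// is well-formed UTF-8 according to the Unicode Standard's byte ranges.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead <= 0x7F) { ++i; continue; }   // single-byte (ASCII)

        std::size_t trail = 0;                 // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;    // range for the FIRST continuation byte

        if (lead >= 0xC2 && lead <= 0xDF)      { trail = 1; }
        else if (lead == 0xE0)                 { trail = 2; lo = 0xA0; } // no overlongs
        else if (lead >= 0xE1 && lead <= 0xEC) { trail = 2; }
        else if (lead == 0xED)                 { trail = 2; hi = 0x9F; } // no surrogates
        else if (lead >= 0xEE && lead <= 0xEF) { trail = 2; }
        else if (lead == 0xF0)                 { trail = 3; lo = 0x90; } // no overlongs
        else if (lead >= 0xF1 && lead <= 0xF3) { trail = 3; }
        else if (lead == 0xF4)                 { trail = 3; hi = 0x8F; } // max U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        if (n - i - 1 < trail) return false;   // sequence truncated at end of input

        for (std::size_t k = 1; k <= trail; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < lo || c > hi) return false;
            lo = 0x80; hi = 0xBF;              // later continuation bytes use the full range
        }
        i += trail + 1;
    }
    return true;
}
```

Under this check, both patterns from the program above are rejected: ".*\xf6.*" because 0xF6 is not a permitted lead byte, and ".*\xe4.*" because 0xE4 is followed by '.' rather than two continuation bytes.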

Change History (1)

comment:1 by John Maddock, 10 years ago

Resolution: fixed
Status: new → closed

Fixed in Trunk rev #81614
