Opened 9 years ago

Closed 9 years ago

#9473 closed Bugs (fixed)

make_u32regex() accepts illegal UTF-8

Reported by: Peter Klotz <peter.klotz@…>
Owned by: John Maddock
Milestone: To Be Determined
Component: regex
Version: Boost 1.54.0
Severity: Problem
Keywords:
Cc:

Description

The attached example shows that make_u32regex() accepts two kinds of illegal UTF-8.

It accepts codepoints reserved for UTF-16 surrogate pairs encoded as 3-byte UTF-8 characters, e.g. "\xed\xa0\x80" representing U+D800.
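To make the surrogate case concrete, here is a minimal sketch of the arithmetic involved. The helper names decode_3byte and is_surrogate are hypothetical and exist only for illustration; they are not part of Boost.Regex and do not reflect how the library decodes or validates its input.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a code point.
// Hypothetical helper: it does not check the lead or continuation byte patterns.
inline std::uint32_t decode_3byte(const std::string& s) {
    assert(s.size() == 3);
    const auto b0 = static_cast<unsigned char>(s[0]);
    const auto b1 = static_cast<unsigned char>(s[1]);
    const auto b2 = static_cast<unsigned char>(s[2]);
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12)
         | (static_cast<std::uint32_t>(b1 & 0x3F) << 6)
         |  static_cast<std::uint32_t>(b2 & 0x3F);
}

// U+D800..U+DFFF are reserved for UTF-16 surrogates and are not valid in UTF-8.
inline bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

With these helpers, decode_3byte("\xed\xa0\x80") yields 0xD800, which is_surrogate() flags, so a strict decoder would reject the sequence instead of passing it on.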

It accepts overlong UTF-8 encodings, in which the codepoint value is padded on the left with additional zero bits and therefore occupies more bytes than necessary, e.g. "\xc0\x80" representing U+0000, whose correct 1-byte encoding is "\x00".

Boost.Locale already contains code to protect against overlong encodings (see method width() in https://svn.boost.org/svn/boost/trunk/boost/locale/utf.hpp).
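A width-based overlong check can be sketched roughly as follows. This is an illustration of the general idea only, not a copy of the Boost.Locale code and not the fix that was applied to Boost.Regex; the names decode_2byte, min_utf8_width and is_overlong are hypothetical.

```cpp
#include <cstdint>

// Decode a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
// Hypothetical helper: it does not check the lead or continuation byte patterns.
inline std::uint32_t decode_2byte(unsigned char b0, unsigned char b1) {
    return (static_cast<std::uint32_t>(b0 & 0x1F) << 6)
         |  static_cast<std::uint32_t>(b1 & 0x3F);
}

// Minimal number of UTF-8 bytes needed for a code point; 0 if out of range.
inline int min_utf8_width(std::uint32_t cp) {
    if (cp <= 0x7F)     return 1;
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

// A sequence is overlong when it used more bytes than the code point requires.
inline bool is_overlong(std::uint32_t cp, int bytes_used) {
    const int w = min_utf8_width(cp);
    return w != 0 && bytes_used > w;
}
```

For "\xc0\x80", decode_2byte(0xC0, 0x80) gives 0, whose minimal width is 1, so is_overlong(0, 2) is true and the sequence would be rejected.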

Attachments (1)

main.cpp (1.6 KB) - added by Peter Klotz <peter.klotz@…> 9 years ago.

Change History (2)

by Peter Klotz <peter.klotz@…>, 9 years ago

Attachment: main.cpp added

comment:1 by John Maddock, 9 years ago

Resolution: fixed
Status: new → closed

Fixed in Git develop.
