Opened 9 years ago
Closed 9 years ago
#9473 closed Bugs (fixed)
make_u32regex() accepts illegal UTF-8
Reported by: | | Owned by: | John Maddock |
---|---|---|---|
Milestone: | To Be Determined | Component: | regex |
Version: | Boost 1.54.0 | Severity: | Problem |
Keywords: | | Cc: | |
Description
The attached example shows that make_u32regex() accepts two kinds of illegal UTF-8.
It accepts code points reserved for UTF-16 surrogates (U+D800 to U+DFFF) encoded as 3-byte UTF-8 sequences, e.g. "\xed\xa0\x80" representing U+D800.
It accepts overlong UTF-8 encodings, where the code point value has been padded on the left with extra zero bits, e.g. "\xc0\x80" representing U+0000, whose correct 1-byte encoding is "\x00".
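To make the two byte sequences concrete, here is a minimal sketch of a decoder that only strips the UTF-8 marker bits and performs no validity checks. The function naive_decode is a hypothetical helper written for this illustration; it is not taken from Boost.Regex and is not claimed to match what make_u32regex() does internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Boost code): decode the first UTF-8 sequence
// in `s` by stripping marker bits only, with no validity checks.
// Returns 0xFFFFFFFF for empty or truncated input, or an unusable lead byte.
std::uint32_t naive_decode(const std::string& s)
{
    if (s.empty()) return 0xFFFFFFFFu;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return b0;                       // 1-byte (ASCII) form

    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return 0xFFFFFFFFu;                        // not a lead byte

    if (s.size() < len) return 0xFFFFFFFFu;         // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        // Append the low six payload bits of each trailing byte.
        cp = (cp << 6) | (static_cast<unsigned char>(s[k]) & 0x3F);
    }
    return cp;
}
```

With this sketch, naive_decode("\xed\xa0\x80") yields 0xD800 and naive_decode("\xc0\x80") yields 0, so a decoder without extra checks silently produces a surrogate and a NUL from input that is not legal UTF-8.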
Boost.Locale already contains code to protect against overlong encodings (see method width() in https://svn.boost.org/svn/boost/trunk/boost/locale/utf.hpp).
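For illustration, the checks that reject both kinds of input can be sketched as a standalone validator. This is not the code from Boost.Locale and not the fix that was applied to Boost.Regex; is_valid_utf8 is a hypothetical name, and the sketch assumes the usual rules: each sequence length has a minimum code point, surrogates are excluded, and the maximum is U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Boost code): returns true only if `s` is
// well-formed UTF-8, rejecting overlong forms and UTF-16 surrogates.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        std::uint32_t cp = b0;
        std::uint32_t min_cp = 0;   // smallest code point allowed for this length

        if (b0 < 0x80)                { /* 1-byte form, nothing more to do */ }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;          // stray trailing byte or invalid lead byte

        if (s.size() - i < len) return false;       // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a trailing byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

Under these assumptions, is_valid_utf8("\xed\xa0\x80") and is_valid_utf8("\xc0\x80") both return false, while ordinary text such as "\xe2\x82\xac" (U+20AC) is accepted.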
Attachments (1)
Change History (2)
comment:1, 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in Git develop.