Opened 9 years ago
Closed 9 years ago
#9473 closed Bugs (fixed)
make_u32regex() accepts illegal UTF-8
| Reported by: | | Owned by: | John Maddock |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | regex |
| Version: | Boost 1.54.0 | Severity: | Problem |
| Keywords: | | Cc: | |
Description
The attached example shows that make_u32regex() accepts two kinds of illegal UTF-8.
It accepts code points reserved for UTF-16 surrogates (U+D800 to U+DFFF) encoded as 3-byte UTF-8 sequences, e.g. "\xed\xa0\x80" representing U+D800.
It accepts overlong UTF-8 encodings, where the code point value has been padded on the left with extra zero bits, e.g. "\xc0\x80" representing U+0000, whose only valid encoding is the single byte "\x00".
Boost.Locale already contains code to protect against overlong encodings (see method width() in https://svn.boost.org/svn/boost/trunk/boost/locale/utf.hpp).
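For illustration, the two rejections asked for above can be sketched as a standalone check. This is a minimal sketch only: is_strict_utf8() and check_ticket_examples() are hypothetical names, and the code is neither the Boost.Regex fix nor the Boost.Locale implementation. It decodes each sequence, then rejects it if the decoded value is below the minimum for its length (overlong), falls in the surrogate range, or exceeds U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Boost.Regex or Boost.Locale.
// Returns true only if `s` is well-formed UTF-8: no overlong forms,
// no surrogate code points (U+D800..U+DFFF), nothing above U+10FFFF.
bool is_strict_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;       // total bytes in this sequence
        std::uint32_t cp = 0;      // decoded code point
        std::uint32_t min_cp = 0;  // smallest value that needs `len` bytes

        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// The two inputs from this ticket, plus their nearest valid neighbours.
void check_ticket_examples()
{
    assert(!is_strict_utf8("\xed\xa0\x80"));          // U+D800, surrogate
    assert(!is_strict_utf8("\xc0\x80"));              // overlong U+0000
    assert(is_strict_utf8(std::string("\x00", 1)));   // correct U+0000
    assert(is_strict_utf8("\xed\x9f\xbf"));           // U+D7FF, just below surrogates
}
```

Comparing the decoded value against the minimum for its length catches every overlong form with one test per sequence, rather than enumerating forbidden lead bytes such as 0xC0 and 0xC1.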
Attachments (1)
Change History (2)
comment:1, 9 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | new → closed | 


Fixed in Git develop.