Opened 10 years ago
Closed 10 years ago
#7743 closed Bugs (fixed)
utf_traits::decode does not check for correct UTF-8 trailing bytes
| Reported by: | | Owned by: | Artyom Beilis |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | locale |
| Version: | Boost 1.52.0 | Severity: | Showstopper |
| Keywords: | | Cc: | |
Description
This program demonstrates the erroneous behavior. An exception should be thrown because 0xdf is the lead byte of a 2-byte UTF-8 sequence, yet it is followed by 'A', a 1-byte character, instead of a trailing (continuation) byte.
Boost.Locale does not throw an exception.
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <boost/locale/utf.hpp>

    template<typename T>
    std::basic_string<T> checkUtf(const std::basic_string<T>& p_str) {
        const std::string encodingType = sizeof(T) == 1 ? "UTF-8" : (sizeof(T) == 2 ? "UTF-16" : "UTF-32");
        typename std::basic_string<T>::const_iterator it = p_str.begin();
        // decode() advances `it` past each code point it reads.
        while (it != p_str.end()) {
            const boost::locale::utf::code_point cp = boost::locale::utf::utf_traits<T>::decode(it, p_str.end());
            if (cp == boost::locale::utf::illegal)
                throw std::runtime_error("Source string contains illegal " + encodingType + " byte sequences");
            else if (cp == boost::locale::utf::incomplete)
                throw std::runtime_error("Source string contains incomplete " + encodingType + " byte sequences");
        }
        return p_str;
    }

    int main(void) {
        try {
            // 'A', lead byte 0xdf, 'A': the lead byte has no trailing byte.
            checkUtf("A"+std::string(1,0xdf)+"A");
            return 0;  // reached only if no exception was thrown (the bug)
        } catch (const std::exception& e) {
            std::cout << e.what() << std::endl;
        }
        return 1;
    }
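
For reference, the check the description asks for can be sketched without Boost. This is not the attached patch and not Boost.Locale's implementation; the helper names `is_utf8_trail`, `utf8_trail_count` and `has_valid_utf8_structure` are hypothetical. The sketch only verifies that each lead byte is followed by the right number of trailing bytes of the form 10xxxxxx, and does not reject overlong 3- and 4-byte forms or surrogates.

```cpp
#include <cassert>   // only needed for the assertion in the usage note
#include <cstddef>
#include <string>

// Hypothetical helper: a UTF-8 trailing byte has the bit pattern 10xxxxxx.
inline bool is_utf8_trail(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

// Hypothetical helper: number of trailing bytes implied by a lead byte,
// or -1 if the byte cannot start a sequence.
inline int utf8_trail_count(unsigned char lead) {
    if (lead < 0x80) return 0;    // 0xxxxxxx: single-byte (ASCII)
    if (lead < 0xC2) return -1;   // stray trailing byte, or overlong lead 0xC0/0xC1
    if (lead < 0xE0) return 1;    // 110xxxxx: 2-byte sequence (0xdf falls here)
    if (lead < 0xF0) return 2;    // 1110xxxx: 3-byte sequence
    if (lead <= 0xF4) return 3;   // 11110xxx: 4-byte sequence
    return -1;                    // 0xF5..0xFF never appear in UTF-8
}

// Structural check: every lead byte must be followed by exactly the
// number of trailing bytes it announces.
inline bool has_valid_utf8_structure(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const int trail = utf8_trail_count(static_cast<unsigned char>(s[i]));
        if (trail < 0) return false;
        ++i;
        for (int k = 0; k < trail; ++k, ++i) {
            if (i >= s.size()) return false;  // sequence cut off at end of input
            if (!is_utf8_trail(static_cast<unsigned char>(s[i]))) return false;
        }
    }
    return true;
}
```

With these helpers, the string from the ticket fails the check, for example `assert(!has_valid_utf8_structure("A" + std::string(1, static_cast<char>(0xdf)) + "A"));`, because the byte after 0xdf is 'A' (0x41) and not a trailing byte.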
Attachments (1)
Change History (3)
by , 10 years ago
| Attachment: | utf.hpp.patch added | 
|---|
comment:1 by , 10 years ago
| Severity: | Problem → Showstopper | 
|---|---|
| Status: | new → assigned | 
Yes, you are 100% right.
I'll apply the patch ASAP.
comment:2 by , 10 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | assigned → closed | 
Fixed in trunk in changeset [81590].


Patch that adds the necessary checks