Opened 10 years ago
Closed 10 years ago
#7743 closed Bugs (fixed)
utf_traits::decode does not check for correct UTF-8 trailing bytes
| Reported by: | | Owned by: | Artyom Beilis |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | locale |
| Version: | Boost 1.52.0 | Severity: | Showstopper |
| Keywords: | | Cc: | |
Description
This program demonstrates the erroneous behavior. An exception should be thrown because 0xDF is the lead byte of a 2-byte sequence, yet it is followed by 'A' (0x41), a 1-byte character rather than a continuation byte of the form 10xxxxxx.
Boost.Locale does not throw an exception.
```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <boost/locale/utf.hpp>

// Walks p_str code point by code point and throws if the encoding is invalid.
template<typename T> std::basic_string<T> checkUtf(const std::basic_string<T>& p_str) {
    const std::string encodingType =
        sizeof(T) == 1 ? "UTF-8" : (sizeof(T) == 2 ? "UTF-16" : "UTF-32");
    typename std::basic_string<T>::const_iterator it = p_str.begin();
    while (it != p_str.end()) {
        // decode() advances `it` past one encoded code point.
        const boost::locale::utf::code_point cp =
            boost::locale::utf::utf_traits<T>::decode(it, p_str.end());
        if (cp == boost::locale::utf::illegal)
            throw std::runtime_error("Source string contains illegal " + encodingType + " byte sequences");
        else if (cp == boost::locale::utf::incomplete)
            throw std::runtime_error("Source string contains incomplete " + encodingType + " byte sequences");
    }
    return p_str;
}

int main() {
    try {
        // 0xDF starts a 2-byte sequence, but it is followed by 'A', not a continuation byte.
        checkUtf("A" + std::string(1, static_cast<char>(0xdf)) + "A");
        return 0;
    } catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
    }
    return 1;
}
```
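For comparison, the check the reporter expects can be sketched in plain C++ without Boost. This is a minimal standalone decoder under stated assumptions, not the contents of utf.hpp.patch (which the ticket does not reproduce). The names `decode_utf8`, `is_valid_utf8`, `kIllegal` and `kIncomplete` are hypothetical; the sentinel values are chosen here to mirror `boost::locale::utf::illegal` and `incomplete`. The key step is that after a lead byte such as 0xDF, every following byte must match 10xxxxxx before it is folded into the code point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sentinels, mirroring the idea of utf::illegal / utf::incomplete.
constexpr std::uint32_t kIllegal = 0xFFFFFFFFu;
constexpr std::uint32_t kIncomplete = 0xFFFFFFFEu;

// True if b has the form 10xxxxxx (a UTF-8 continuation byte).
inline bool is_trail(unsigned char b) { return (b & 0xC0u) == 0x80u; }

// Hypothetical helper: decodes one code point at s[pos] and advances pos past it.
// Every trailing byte is checked, so 0xDF followed by 'A' is rejected.
std::uint32_t decode_utf8(const std::string& s, std::size_t& pos) {
    assert(pos < s.size());  // precondition: pos points inside s
    const unsigned char lead = static_cast<unsigned char>(s[pos++]);
    if (lead < 0x80) return lead;  // 1-byte (ASCII)
    int trail_count = 0;
    std::uint32_t cp = 0;
    if (lead >= 0xC2 && lead <= 0xDF)      { trail_count = 1; cp = lead & 0x1Fu; }
    else if (lead >= 0xE0 && lead <= 0xEF) { trail_count = 2; cp = lead & 0x0Fu; }
    else if (lead >= 0xF0 && lead <= 0xF4) { trail_count = 3; cp = lead & 0x07u; }
    else return kIllegal;  // stray continuation byte, 0xC0/0xC1, or above 0xF4
    for (int i = 0; i < trail_count; ++i) {
        if (pos == s.size()) return kIncomplete;  // input ends mid-sequence
        const unsigned char b = static_cast<unsigned char>(s[pos]);
        if (!is_trail(b)) return kIllegal;        // the check this ticket asks for
        cp = (cp << 6) | (b & 0x3Fu);
        ++pos;
    }
    // Reject overlong 3/4-byte forms, UTF-16 surrogates, and values above U+10FFFF.
    if ((trail_count == 2 && cp < 0x800) || (trail_count == 3 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return kIllegal;
    return cp;
}

// Hypothetical helper: true if the whole string is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::uint32_t cp = decode_utf8(s, pos);
        if (cp == kIllegal || cp == kIncomplete) return false;
    }
    return true;
}
```

With this sketch, `is_valid_utf8` on the reporter's input `"A" + '\xDF' + "A"` returns false because 'A' (0x41) is not a continuation byte, which is the outcome the ticket expects Boost.Locale to report as illegal.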
Attachments (1)
Change History (3)
by , 10 years ago
| Field | Change |
|---|---|
| Attachment: | utf.hpp.patch added |
comment:1 by , 10 years ago
| Field | Change |
|---|---|
| Severity: | Problem → Showstopper |
| Status: | new → assigned |
Yes, you are 100% right.
I'll apply the patch ASAP.
comment:2 by , 10 years ago
| Field | Change |
|---|---|
| Resolution: | → fixed |
| Status: | assigned → closed |
Fixed in trunk in changeset [81590].
utf.hpp.patch: patch that adds the necessary checks.