Boost C++ Libraries: Ticket #9435: Erroneous character set conversions of strings with more than int32 bytes
https://svn.boost.org/trac10/ticket/9435
<p> To internationalize our software, we use Boost.Locale together with ICU for character set conversions. During our tests we found that strings larger than the maximum value of int32_t (2^31 - 1 bytes, about 2 GiB) cannot be converted. The cause is icu::UnicodeString, which boost::locale::conv::to_utf and boost::locale::conv::from_utf use to perform character set conversions and which is limited to strings of at most that size. Because Boost.Locale does not check whether the size of the given string exceeds that limit, the behavior of boost::locale::conv::to_utf and boost::locale::conv::from_utf is undefined for larger strings. </p>
<p> PS: We have already contacted the ICU support mailing list. They told us that the UText API (<a class="ext-link" href="http://icu-project.org/apiref/icu4c/utext_8h.html">http://icu-project.org/apiref/icu4c/utext_8h.html</a>) might be able to handle strings with more than int32_t bytes. Another possibility, according to the ICU support mailing list, would be to use ICU's lower-level conversion API (uconv). </p>
Artyom Beilis, Tue, 26 Nov 2013 14:31:07 GMT: status changed; resolution set
https://svn.boost.org/trac10/ticket/9435#comment:1
<ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">wontfix</span> </li> </ul>
<ul><li>This is a limitation of ICU. </li><li>It is a bad idea to convert "huge chunks of text" via the to_utf API, as it allocates the entire text in memory. 
</li></ul><p> However, you can use the std::codecvt facet of std::locale for stream-based conversions, which integrates with iostreams: </p> <p> <a href="http://www.boost.org/doc/libs/1_55_0/libs/locale/doc/html/charset_handling.html#codecvt_codecvt">http://www.boost.org/doc/libs/1_55_0/libs/locale/doc/html/charset_handling.html#codecvt_codecvt</a> </p> <p> Of course, this is not as simple as calling to_utf or from_utf; however, allocating a buffer of more than 2 GB for a string is not a good idea either. </p> <p> Closing this bug. </p>
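<p> Since the ticket was closed as wontfix, the size check that the report says is missing would have to live on the caller's side. The following is a minimal sketch of such a guard, not part of Boost.Locale or ICU: the names kMaxIcuLength, fits_icu_length and checked_convert are hypothetical, and the converter is passed in as an arbitrary callable (for example a lambda wrapping boost::locale::conv::to_utf). It compares the byte length against the int32_t maximum, as the report describes the limit; ICU itself counts UTF-16 code units, so this should be read as an approximation. </p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Largest length that an int32_t-based length field can represent.
// (Hypothetical name; not defined by Boost.Locale or ICU.)
constexpr std::size_t kMaxIcuLength =
    static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());

// Returns true if a buffer of `size` bytes fits into an int32_t length.
constexpr bool fits_icu_length(std::size_t size) noexcept {
    return size <= kMaxIcuLength;
}

// Hypothetical wrapper: rejects oversized input with an exception instead of
// handing it to a converter whose behavior is undefined for such sizes.
// `convert` is any callable taking a const std::string&.
template <typename Convert>
auto checked_convert(const std::string& input, Convert convert)
    -> decltype(convert(input)) {
    if (!fits_icu_length(input.size())) {
        throw std::length_error("input exceeds the int32_t length limit");
    }
    return convert(input);
}
```

<p> A caller would wrap the real conversion in a lambda and pass it as the second argument. Input above the limit then fails with std::length_error and can be routed to the stream-based codecvt approach instead. </p>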