Boost C++ Libraries: Ticket #9827: Missing support for some code page(e.g 949, 950) in windows conversion with std backend

https://svn.boost.org/trac10/ticket/9827
<p> The table windows_encoding all_windows_encodings[] in wconv_codepage.ipp contains several code page definitions. However, it is missing some code pages, such as the Korean code page (949) and the Traditional Chinese Big5 code page (950). On a Windows system that uses one of those code pages, the following code throws invalid_charset_error: </p>
<pre class="wiki">#include <boost/locale.hpp>
#include <locale>
#include <string>

// Assuming we are using the std backend so it supports ANSI encodings
boost::locale::generator gen;
gen.use_ansi_encoding(true);
std::locale loc(gen(""));
// Throws invalid_charset_error when running on Korean Windows but works on English Windows.
// The charset is "windows949" on Korean Windows, which is not in the table.
std::string us = boost::locale::conv::to_utf<char>("abcdefg", loc);
</pre>
<p> The root cause of this exception is that the generated code page string is not in the table. When the locale generator with the std backend generates a locale on Windows, it calls boost::locale::util::get_system_locale(bool use_utf8). This function uses the following code (in default_locale.cpp) to build the locale string: </p>
<pre class="wiki">if(GetLocaleInfoA(LOCALE_USER_DEFAULT,LOCALE_IDEFAULTANSICODEPAGE,buf,sizeof(buf))!=0) {
    if(atoi(buf)==0)
        lc_name+=".UTF-8";
    else {
        lc_name +=".windows-";
        lc_name +=buf;
    }
}
</pre>
<p> So the encoding part of lc_name is windows-(code page). On a system with the Korean (949) or Traditional Chinese (950) code page, this generates an encoding string like "windows-949" or "windows-950". However, when wconv_from_utf::open() initializes, it searches the array all_windows_encodings[] for the normalized name (with the hyphen stripped), "windows949" or "windows950". The string is not in the table, so open() fails and the exception is thrown.
</p>
<p> As a quick fix, I suggest adding the missing code pages to the table: </p>
<pre class="wiki">{ "cp949", 949, 0 }, // Korean
{ "uhc", 949, 0 }, // From "iconv -l"
{ "windows949", 949, 0 }, // Korean
// "big5" already in the table
{ "windows950", 950, 0 }, // TC, big5
</pre>
<p> However, the list may still be incomplete, and the same problem will occur on any system whose code page is not in the list. So we could also add the following code to the function int encoding_to_windows_codepage(char const *ccharset) in wconv_codepage.ipp: </p>
<pre class="wiki">--- E:\Build1\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 16:34:52.000000000 +0800
+++ E:\Build2\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 17:31:37.000000000 +0800
@@ -206,12 +206,18 @@
             return ptr->codepage;
         }
         else {
             return -1;
         }
     }
+    if(ptr==end && charset.size()>7 && charset.substr(0,7)=="windows") {
+        int cp = atoi(charset.substr(7).c_str());
+        if(IsValidCodePage(cp)) {
+            return cp;
+        }
+    }
     return -1;
 }

 template<typename CharType>
 bool validate_utf16(CharType const *str,unsigned len)
</pre>
<p> This code parses and validates the encoding string directly. One concern is that the call to IsValidCodePage may reduce performance (not tested). </p>
<p> Artyom Beilis, Thu, 13 Jul 2017 14:51:24 GMT: status changed (https://svn.boost.org/trac10/ticket/9827#comment:1) </p>
<ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul>
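<p> For illustration, the fallback lookup proposed in this ticket could be sketched portably as follows. This is only a sketch under stated assumptions, not the Boost.Locale implementation: the names normalize_encoding_name and parse_windows_codepage are hypothetical, and the is_valid callback stands in for the Win32 IsValidCodePage call so that the logic can be exercised on any platform. Unlike the atoi-based patch, it rejects suffixes that are not purely numeric. </p>

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: keep only ASCII letters and digits and lowercase them,
// so that "windows-949" and "Windows_949" both become "windows949".
std::string normalize_encoding_name(const std::string& name)
{
    std::string out;
    out.reserve(name.size());
    for (char c : name) {
        if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z'))
            out += c;
        else if (c >= 'A' && c <= 'Z')
            out += static_cast<char>(c - 'A' + 'a');
    }
    return out;
}

// Hypothetical fallback: accept names of the form "windows<digits>" and
// return the code page number, or -1 if the name does not match or the
// code page is rejected. `is_valid` is an assumption standing in for
// IsValidCodePage on Windows.
int parse_windows_codepage(const std::string& raw_name,
                           const std::function<bool(int)>& is_valid)
{
    const std::string name = normalize_encoding_name(raw_name);
    const std::string prefix = "windows";
    if (name.size() <= prefix.size() ||
        name.compare(0, prefix.size(), prefix) != 0)
        return -1;

    int cp = 0;
    for (std::size_t i = prefix.size(); i < name.size(); ++i) {
        if (name[i] < '0' || name[i] > '9')
            return -1;              // suffix must be purely numeric
        cp = cp * 10 + (name[i] - '0');
        if (cp > 65535)
            return -1;              // avoid overflow on long digit runs
    }
    return is_valid(cp) ? cp : -1;
}
```

<p> On Windows, the callback would presumably wrap IsValidCodePage, for example [](int cp) { return IsValidCodePage(cp) != 0; }. If the cost of that call is a concern, the result could be cached per code page, which is a design choice the ticket leaves open. </p>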