id summary reporter owner description type status milestone component version severity resolution keywords cc
9827 Missing support for some code page(e.g 949, 950) in windows conversion with std backend hucaonju@… Artyom Beilis "There is a table, windows_encoding all_windows_encodings[], in wconv_codepage.ipp that contains a number of code page definitions. However, it is missing some code pages, such as the Korean code page (949) and the Traditional Chinese Big5 code page (950). On a Windows system that uses one of these code pages, the following code throws invalid_charset_error:

{{{
// Assuming the std backend is used, so ANSI encodings are supported
boost::locale::generator gen;
gen.use_ansi_encoding(true);
std::locale loc(gen(""));
// Throws invalid_charset_error on Korean Windows but works on English Windows.
// The charset is "windows949" on Korean Windows, which is not in the table.
std::string us = boost::locale::conv::to_utf<char>("abcdefg", loc);
}}}

The root cause of the exception is that the generated encoding name is not in the table. When the locale generator with the std backend creates a locale on Windows, it calls boost::locale::util::get_system_locale(bool use_utf8). This function builds the locale string with the following code (in default_locale.cpp):

{{{
if(GetLocaleInfoA(LOCALE_USER_DEFAULT, LOCALE_IDEFAULTANSICODEPAGE, buf, sizeof(buf)) != 0) {
    if(atoi(buf) == 0)
        lc_name += ".UTF-8";
    else {
        lc_name += ".windows-";
        lc_name += buf;
    }
}
}}}

So the encoding part of lc_name is windows-NNN, where NNN is the ANSI code page number. On a system whose ANSI code page is Korean (949) or Traditional Chinese (950), this produces an encoding string such as "windows-949" or "windows-950". However, when wconv_from_utf::open() initializes, it searches all_windows_encodings[] for "windows949" or "windows950". Neither name is in the table, so open() fails and the exception is thrown.
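The lookup failure can be reproduced without Windows. The following is a hypothetical, simplified model, not the Boost code. It makes three assumptions. First, the charset name is normalized by keeping only lowercased letters and digits, which is how "windows-949" becomes "windows949". Second, the table is sorted by name and searched with std::lower_bound. Third, table_excerpt is an illustrative excerpt rather than the real contents of all_windows_encodings[]. The names encoding_entry, table_excerpt, normalize_charset and lookup_codepage are made up for this sketch.

{{{
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical, simplified model of all_windows_encodings[]: {name, code page}.
// The entries are an illustrative excerpt, kept sorted by name because
// lookup_codepage() below binary-searches them with std::lower_bound.
struct encoding_entry {
    char const *name;
    int codepage;
};

static encoding_entry const table_excerpt[] = {
    {"big5", 950},
    {"cp1252", 1252},
    {"shiftjis", 932},
    {"utf8", 65001},
    {"windows1252", 1252},
    {"windows932", 932},
    {"windows936", 936},
};

static bool name_less(encoding_entry const &a, encoding_entry const &b) {
    return std::strcmp(a.name, b.name) < 0;
}

// Assumed normalization: keep only letters and digits, lowercased,
// so "windows-949" becomes "windows949".
std::string normalize_charset(std::string const &name) {
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c))
            out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Returns the code page for a charset name, or -1 if the normalized
// name is not in the table (the case that ends in invalid_charset_error).
int lookup_codepage(std::string const &charset) {
    std::string const key = normalize_charset(charset);
    encoding_entry const ref = {key.c_str(), 0};
    encoding_entry const *begin = table_excerpt;
    encoding_entry const *end =
        table_excerpt + sizeof(table_excerpt) / sizeof(table_excerpt[0]);
    encoding_entry const *ptr = std::lower_bound(begin, end, ref, name_less);
    if (ptr != end && std::strcmp(ptr->name, key.c_str()) == 0)
        return ptr->codepage;
    return -1;
}
}}}

With this excerpt, lookup_codepage("windows-1252") returns 1252, while lookup_codepage("windows-949") and lookup_codepage("windows-950") both return -1. The second call fails even though "big5" maps to 950, because the lookup matches names, not code page numbers. If the real table is binary-searched like this, any entries added to it also have to go at their alphabetical position.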
For a quick fix, I suggest adding the missing code pages to the table:

{{{
{ "cp949",      949, 0 }, // Korean
{ "uhc",        949, 0 }, // From "iconv -l"
{ "windows949", 949, 0 }, // Korean
// "big5" is already in the table
{ "windows950", 950, 0 }, // Traditional Chinese (Big5)
}}}

However, such a list may never be complete, and the same problem will occur on any system whose code page is not listed. So we could also add the following code to int encoding_to_windows_codepage(char const *ccharset) in wconv_codepage.ipp:

{{{
--- E:\Build1\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 16:34:52.000000000 +0800
+++ E:\Build2\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp 2014-04-02 17:31:37.000000000 +0800
@@ -206,12 +206,18 @@
             return ptr->codepage;
         }
         else {
             return -1;
         }
     }
+    if(ptr==end && charset.size()>7 && charset.substr(0,7)=="windows") {
+        int cp = atoi(charset.substr(7).c_str());
+        if(IsValidCodePage(cp)) {
+            return cp;
+        }
+    }
     return -1;
 }
 
 template<typename CharType>
 bool validate_utf16(CharType const *str,unsigned len)
}}}

This code parses the code page number directly out of the encoding name and validates it with IsValidCodePage instead of relying on the table. The concern is that calling IsValidCodePage for every name that is not in the table may reduce performance (not measured)." Bugs assigned To Be Determined locale Boost 1.55.0 Problem locale,code page,Korean,Traditional Chinese,exception
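The windows-NNN fallback proposed in this ticket can also be written as a standalone function. The following is a sketch under stated assumptions, not a tested patch. windows_codepage_fallback is a hypothetical name, and the validity check is passed in as a callable so the sketch compiles on any platform; on Windows it would wrap IsValidCodePage.

The sketch deliberately differs from the patch in two ways:
- It is meant to run whenever the table lookup fails, not only when ptr==end. This assumes the table is searched with std::lower_bound, as the ptr/end names in the patch context suggest. In that case a missing name that sorts before the last table entry leaves ptr pointing at a neighbouring entry, and the patch's fallback would be skipped.
- It requires the suffix after "windows" to be all digits instead of relying on atoi, which silently returns 0 for a name like "windowsxp".

{{{
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical fallback for a *normalized* charset name such as "windows949".
// It returns the code page if the name is "windows" followed only by digits
// and is_valid(cp) accepts the number; otherwise it returns -1.
// On Windows, is_valid would wrap IsValidCodePage; it is a parameter here
// so that the sketch compiles and can be exercised on any platform.
template <typename Validator>
int windows_codepage_fallback(std::string const &normalized, Validator is_valid) {
    static std::string const prefix = "windows";
    if (normalized.size() <= prefix.size() ||
        normalized.compare(0, prefix.size(), prefix) != 0)
        return -1;
    int cp = 0;
    for (std::string::size_type i = prefix.size(); i < normalized.size(); ++i) {
        unsigned char const c = static_cast<unsigned char>(normalized[i]);
        if (!std::isdigit(c))
            return -1;  // reject names like "windowsxp", which atoi() would turn into 0
        cp = cp * 10 + (c - '0');
        if (cp > 65535)
            return -1;  // overflow guard; assumes identifiers never exceed 16 bits
    }
    if (cp == 0 || !is_valid(cp))
        return -1;
    return cp;
}
}}}

On Windows the call might look like windows_codepage_fallback(charset, [](int cp) { return IsValidCodePage(static_cast<UINT>(cp)) != 0; }), where charset is the already-normalized name. Because this path is reached only for names that miss the table, IsValidCodePage is called only for code pages that are not listed.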