id	summary	reporter	owner	description	type	status	milestone	component	version	severity	resolution	keywords	cc
9827	Missing support for some code pages (e.g. 949, 950) in Windows conversion with std backend	hucaonju@…	Artyom Beilis	"wconv_codepage.ipp defines a table, windows_encoding all_windows_encodings[], that maps encoding names to Windows code pages. The table is missing some code pages, such as Korean (949) and Traditional Chinese Big5 (950). On a Windows system that uses one of these as its ANSI code page, the following code throws invalid_charset_error:

{{{
// Assuming we are using the std backend so it supports ansi encodings
boost::locale::generator gen;
gen.use_ansi_encoding(true);

std::locale loc(gen(""""));
// Throws invalid_charset_error on Korean Windows but works on English Windows.
// On Korean Windows the charset is ""windows949"", which is not in the table.
std::string us = boost::locale::conv::to_utf<char>(""abcdefg"", loc);
}}}

The root cause is that the generated encoding name is not in the table. When the locale generator with the std backend creates a locale on Windows, it calls boost::locale::util::get_system_locale(bool use_utf8), which builds the locale string with the following code (in default_locale.cpp):
{{{
if(GetLocaleInfoA(LOCALE_USER_DEFAULT,LOCALE_IDEFAULTANSICODEPAGE,buf,sizeof(buf))!=0) {
    if(atoi(buf)==0)
        lc_name+="".UTF-8"";
    else {
        lc_name +="".windows-"";
        lc_name +=buf;
    }
}
}}}
So the encoding part of lc_name is windows-(code page). On a system whose ANSI code page is Korean (949) or Traditional Chinese (950), this produces ""windows-949"" or ""windows-950"". However, when wconv_from_utf::open() initializes, it looks up the normalized name, ""windows949"" or ""windows950"", in all_windows_encodings[]. Neither name is in the table, so open() fails and the exception is thrown.
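
The mismatch is easier to see with the name normalization written out. The sketch below assumes the lookup key is formed by dropping every character that is not an ASCII letter or digit and lowercasing the rest, which is consistent with ""windows-949"" being searched as ""windows949"". The name normalize_charset_name is hypothetical and is not the library's own function.

```cpp
#include <string>

// Hypothetical helper, not the library's actual code: keep only ASCII
// letters and digits, and fold upper-case letters to lower case.
std::string normalize_charset_name(const std::string& name)
{
    std::string out;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        char c = name[i];
        if (c >= 'A' && c <= 'Z') {
            // Fold to lower case.
            out += static_cast<char>(c - 'A' + 'a');
        } else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
            out += c;
        }
        // Anything else (hyphen, underscore, dot, space) is dropped.
    }
    return out;
}
```

Under this assumption, any table entry meant to match the system default has to be spelled without the hyphen, which is why the suggested entries below use ""windows949"" rather than ""windows-949"".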

For a quick fix, I suggest adding the missing code pages to the table:
{{{
{ ""cp949"",      949, 0 }, // Korean
{ ""uhc"",        949, 0 }, // From ""iconv -l""
{ ""windows949"", 949, 0 }, // Korean
// ""big5"" is already in the table
{ ""windows950"", 950, 0 }, // Traditional Chinese, Big5
}}}

However, the list may still be incomplete, so the same failure can occur on a system whose code page is not listed. As a more general fix, the following fallback could be added to int encoding_to_windows_codepage(char const *ccharset) in wconv_codepage.ipp:

{{{
--- E:\Build1\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp	2014-04-02 16:34:52.000000000 +0800
+++ E:\Build2\boost_1_55_0\libs\locale\src\encoding\wconv_codepage.ipp	2014-04-02 17:31:37.000000000 +0800
@@ -206,12 +206,18 @@
                 return ptr->codepage;
             }
             else {
                 return -1;
             }
         }
+        if(ptr==end && charset.size()>7 && charset.substr(0,7)==""windows"") {
+            int cp = atoi(charset.substr(7).c_str());
+            if(IsValidCodePage(cp)) {
+                return cp;
+            }
+        }
         return -1;
         
     }
 
     template<typename CharType>
     bool validate_utf16(CharType const *str,unsigned len)
}}}
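
The fallback can also be sketched as a standalone function, which makes it testable away from Windows. This is only an illustration under stated assumptions: parse_windows_codepage, codepage_validator and is_known_cjk_codepage are hypothetical names, the validator parameter stands in for IsValidCodePage, and the digits-only check is an addition to the patch above, since atoi returns 0 for non-numeric input and ignores trailing characters.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical callback type; on Windows a thin wrapper around
// IsValidCodePage would be passed here.
typedef bool (*codepage_validator)(int);

// Stand-in validator for demonstration only: accepts a few CJK code pages.
bool is_known_cjk_codepage(int cp)
{
    return cp == 932 || cp == 936 || cp == 949 || cp == 950;
}

// Returns the code page encoded in a normalized name of the form
// windows<digits>, or -1 if the name has another shape or the
// validator rejects the number.
int parse_windows_codepage(const std::string& charset, codepage_validator is_valid)
{
    static const char prefix[] = { 'w', 'i', 'n', 'd', 'o', 'w', 's' };
    const std::string::size_type prefix_len = sizeof(prefix);

    // Need at least one digit, and at most five (enough for 65001).
    if (charset.size() <= prefix_len || charset.size() > prefix_len + 5)
        return -1;
    if (charset.compare(0, prefix_len, prefix, prefix_len) != 0)
        return -1;

    // Require digits only, so atoi cannot turn garbage into 0.
    for (std::string::size_type i = prefix_len; i < charset.size(); ++i) {
        if (charset[i] < '0' || charset[i] > '9')
            return -1;
    }

    int cp = std::atoi(charset.c_str() + prefix_len);
    return is_valid(cp) ? cp : -1;
}
```

With the stand-in validator, ""windows949"" yields 949, while ""windows94x"" and ""windows"" yield -1 without the validator being called.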

This code parses the encoding name directly and validates the resulting code page. One concern is that the call to IsValidCodePage may hurt performance (not tested)."	Bugs	assigned	To Be Determined	locale	Boost 1.55.0	Problem		locale,code page,Korean,Traditional Chinese,exception