Boost C++ Libraries: Ticket #3332: boost::filesystem::path has trouble in locale Chinese_Taiwan.950 (Windows)
https://svn.boost.org/trac10/ticket/3332
<p> My test environment is Windows XP with the default locale Chinese_Taiwan.950. Code page 950, which extends the Big5 encoding, was created by Microsoft and is used mostly in Taiwan. The cause is that cp950 encodes each Chinese character in two bytes, and for some characters the second byte is \ (0x5c), which is also the escape character in C/C++ and the path separator on Windows. More information about the Big5 encoding: <a class="ext-link" href="http://en.wikipedia.org/wiki/Big5">http://en.wikipedia.org/wiki/Big5</a> </p>
<p> The attachment is a partial fix for path.hpp, but it does not cover every case. It only checks whether a '\' is a real path separator or part of a Big5 character. </p>
Ching Yi, Chan <chingyichan.tw@…> Tue, 11 Aug 2009 09:30:39 GMT attachment set https://svn.boost.org/trac10/ticket/3332 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">code.txt</span> </li> </ul> <p> An attempt to fix the problem, but it does not really work. </p> Ticket
Ching Yi, Chan <chingyichan.tw@…> Tue, 11 Aug 2009 09:52:41 GMT component changed; owner set https://svn.boost.org/trac10/ticket/3332#comment:1 <ul> <li><strong>owner</strong> set to <span class="trac-author">Beman Dawes</span> </li> <li><strong>component</strong> <span class="trac-field-old">None</span> → <span class="trac-field-new">filesystem</span> </li> </ul> <p> Line 22 of the attached code can be deleted. 
It was pasted in by mistake. </p> Ticket
Ching Yi, Chan <chingyichan.tw@…> Tue, 11 Aug 2009 10:13:31 GMT attachment set https://svn.boost.org/trac10/ticket/3332 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test-code-and-data.zip</span> </li> </ul> <p> A minimal test app and test folder. </p> Ticket
Ching Yi, Chan <chingyichan.tw@…> Tue, 11 Aug 2009 10:32:00 GMT <link>https://svn.boost.org/trac10/ticket/3332#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3332#comment:2</guid> <description> <p> test-code-and-data.zip contains test.cpp and test_folder. test.cpp uses recursive_directory_iterator to traverse test_folder, but the iterator cannot traverse it completely because the path is converted incorrectly. </p>
<p> test_folder has this structure: </p>
<pre class="wiki">test_folder\ 功/ foo.txt 功能總覽/ a.txt b.txt 另一個資料夾/ a.txt b.txt </pre>
<p> The path name contains 「功」, a Chinese character meaning 'function'. Its last byte is 0x5c ('\'). When path converts that '\' to '/', the character is broken apart, and the leftover first byte of 「功」 can then combine with the following byte to form a different Chinese character. That is why we have problems with cp950 on Windows. </p>
<p> The test app output is: </p>
<pre class="wiki">C:\demo-room-workspace\native.impl&gt;test
test_folder\功 [directory]
boost::filesystem::basic_directory_iterator constructor: 系統找不到指定的路徑。: "test_folder\功功功能總覽"
0.00 s
</pre>
<p> The message says that the system cannot find the path "test_folder\功功功能總覽". When the path object invokes remove_filename(), it looks for the last '\', but 「功」 also contains a '\' byte, so remove_filename() yields <strong>test_folder\功 (half of 功)</strong> instead of <strong>test_folder</strong>. </p>
<p> Then, when path appends the next '\', the half of 功 becomes the whole Big5 character 「功」 again. 
The path is then "test_folder\功" rather than "test_folder\". </p> </description> <category>Ticket</category> </item>
<item> <dc:creator>Beman Dawes</dc:creator> <pubDate>Tue, 18 Aug 2009 01:57:51 GMT</pubDate> <title>status changed https://svn.boost.org/trac10/ticket/3332#comment:3 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> Ticket
nemg2004@… Fri, 16 Mar 2012 02:25:37 GMT <link>https://svn.boost.org/trac10/ticket/3332#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3332#comment:4</guid> <description> <p> Hello, does the current filesystem library support Chinese paths now? </p> </description> <category>Ticket</category> </item>
<item> <dc:creator>Beman Dawes</dc:creator> <pubDate>Wed, 24 Dec 2014 13:22:00 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/3332#comment:5 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">worksforme</span> </li> </ul>
<p> Sorry for the 5 year delay in closing this. </p>
<p> The problem does not reproduce with current versions of boost.filesystem. The path has already been converted to UTF-16 by the time operations begin, so the 0x5C byte in cp950 is immaterial. </p>
<p> Here is an updated test program, using the codepage 950 codecvt facet that ships with recent versions of VC++: </p>
<pre class="wiki">#include &lt;boost/filesystem.hpp&gt;
#include &lt;cvt/cp950&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;locale&gt;

namespace fs = boost::filesystem;

int main()
{
  // Imbue path with the VC++ codepage 950 codecvt facet so that narrow
  // strings are converted between cp950 and the native wide encoding.
  std::locale global_loc = std::locale();
  std::locale loc(global_loc, new stdext::cvt::codecvt_cp950&lt;wchar_t&gt;);
  fs::path::imbue(loc);

  std::cout &lt;&lt; "HEADS UP! PIPE OUTPUT TO FILE AND INSPECT WITH HEX OR CP950 EDITOR.\n"
    "WINDOWS COMMAND PROMPT FONTS DON'T SUPPORT CHINESE,\n"
    "EVEN WITH CODEPAGE SET AND EVEN AS OF WIN 10 TECH PREVIEW." &lt;&lt; std::endl;

  // Walk the test folder and print every directory and regular file.
  fs::recursive_directory_iterator end;
  fs::recursive_directory_iterator iter
    ("C:/boost/modular/develop/libs/filesystem/test/issues/3332/test_folder");

  while (iter != end)
  {
    if (fs::is_directory(*iter))
    {
      std::cout &lt;&lt; "[directory] " &lt;&lt; iter-&gt;path().generic_string() &lt;&lt; std::endl;
    }
    else if (fs::is_regular_file(*iter))  // is_regular() is the deprecated spelling
    {
      std::cout &lt;&lt; " [file] " &lt;&lt; iter-&gt;path().generic_string() &lt;&lt; std::endl;
    }
    ++iter;
  }
  return 0;
}
</pre>
<p> A hex dump of the output shows that it does correctly handle the Big5 characters. </p>
<p> I've also tested a UTF-8 version of the above, and checked the results with a UTF-8 aware text editor. </p>
<p> Thanks, </p>
<p> --Beman </p> Ticket
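<p> The failure mode described in the original report can be sketched without Boost. This is a minimal illustration, not the code from the attachment or from boost.filesystem: the helper names (is_cp950_lead_byte, naive_last_separator, cp950_last_separator) are hypothetical, and the lead-byte range 0x81 to 0xFE is an assumption about cp950. It compares a byte-wise search for the last separator with one that skips the trail byte of each double-byte character. </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if b can start a two-byte cp950 character.
// Assumption: cp950 lead bytes lie in the range 0x81..0xFE.
inline bool is_cp950_lead_byte(unsigned char b)
{
  return b >= 0x81 && b <= 0xFE;
}

// Naive search: treats every 0x5C ('\') or '/' byte as a separator,
// including a 0x5C that is really the trail byte of a Big5 character.
inline std::size_t naive_last_separator(const std::string& s)
{
  return s.find_last_of("\\/");
}

// Lead-byte-aware search: scans forward and skips the byte that follows
// each lead byte, so a trail byte of 0x5C is never taken as a separator.
inline std::size_t cp950_last_separator(const std::string& s)
{
  std::size_t last = std::string::npos;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    const unsigned char b = static_cast<unsigned char>(s[i]);
    if (is_cp950_lead_byte(b) && i + 1 < s.size())
    {
      ++i;  // skip the trail byte, even if it is 0x5C
    }
    else if (b == '\\' || b == '/')
    {
      last = i;  // a genuine single-byte separator
    }
  }
  return last;
}
```

<p> For the cp950 bytes of "test_folder\功" (「功」 is 0xA5 0x5C in Big5), the naive search lands on the trail byte of 「功」 at index 13, while the lead-byte-aware search returns the real separator at index 11. </p>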