Opened 13 years ago
Closed 8 years ago
#3332 closed Bugs (worksforme)
boost::filesystem::path has trouble in locale Chinese_Taiwan.950 (Windows)
Reported by: | Owned by: | Beman Dawes | |
---|---|---|---|
Milestone: | Boost 1.40.0 | Component: | filesystem |
Version: | Boost Development Trunk | Severity: | Problem |
Keywords: | encoding cp950 Big5 0x5c | Cc: |
Description
My test environment is Windows XP with the default locale Chinese_Taiwan.950. Code page 950 (CP950) is Microsoft's extension of the Big5 encoding and is used mostly in Taiwan. The cause of the problem is that CP950 uses two bytes to encode a character, and the second byte of some characters is \ (0x5c), which is also the escape character in C/C++ and the path separator on Microsoft operating systems. More information about the Big5 encoding: http://en.wikipedia.org/wiki/Big5
The attachment is a partial fix in path.hpp; some cases are still not handled. It only checks whether a '\' is a real path separator or part of a Big5 character.
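For illustration only, here is a minimal sketch of the kind of check described above. It is not the code from the attached patch. The helper names are hypothetical, and the lead-byte range 0x81..0xFE is an assumption about CP950. The idea is to walk the string from the start, skipping two bytes whenever a lead byte is seen, so that a 0x5c in the trail position is never mistaken for a separator.

```cpp
#include <cstddef>
#include <string>

// Assumption: CP950 lead bytes fall in the range 0x81..0xFE.
inline bool is_cp950_lead_byte(unsigned char c) {
    return c >= 0x81 && c <= 0xFE;
}

// Hypothetical helper: true only if the byte at `pos` is a backslash that
// sits on a character boundary, i.e. it is not the trail byte of a
// double-byte character.
bool is_real_separator(const std::string& s, std::size_t pos) {
    if (pos >= s.size() || s[pos] != '\\') return false;
    std::size_t i = 0;
    while (i < pos) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        // A lead byte consumes the following byte as its trail byte.
        i += (is_cp950_lead_byte(c) && i + 1 < s.size()) ? 2 : 1;
    }
    // If the scan stepped over `pos`, that byte was a trail byte.
    return i == pos;
}
```

With the CP950 bytes of "test_folder\功" (功 is 0xA5 0x5C), the backslash at index 11 is reported as a real separator and the 0x5C at index 13 is not.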
Attachments (2)
Change History (7)
by , 13 years ago
comment:1 by , 13 years ago
Component: | None → filesystem |
---|---|
Owner: | set to |
Line 22 of the code can be deleted; it was pasted in by mistake.
comment:2 by , 13 years ago
test-code-and-data.zip contains test.cpp and test_folder. test.cpp uses recursive_directory_iterator to traverse test_folder, but the iterator cannot traverse it completely because the path is converted incorrectly.
test_folder contains this structure:
test_folder\ 功/ foo.txt 功能總覽/ a.txt b.txt 另一個資料夾/ a.txt b.txt
The path name contains 「功」, a Chinese character that means 'function'. Its last byte is 0x5c ('\'). When path tries to convert that '\' to '/', the character is broken, and the remaining half can combine with the following byte to form another Chinese character whose first byte is the first byte of 「功」. That is the reason we get this problem with cp950 on Windows.
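The breakage can be reproduced on raw bytes without any Windows API. The sketch below is an illustration, not Boost's implementation. Both function names are hypothetical, and the lead-byte range 0x81..0xFE is an assumption about CP950. It compares a byte-wise separator conversion with one that skips trail bytes.

```cpp
#include <cstddef>
#include <string>

// Naive conversion: every 0x5C byte becomes '/', including trail bytes.
std::string naive_to_generic(std::string s) {
    for (char& c : s) {
        if (c == '\\') c = '/';
    }
    return s;
}

// Lead-byte-aware conversion (assumes CP950 lead bytes are 0x81..0xFE):
// the byte after a lead byte is part of the character and is left alone.
std::string cp950_to_generic(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= 0x81 && c <= 0xFE && i + 1 < s.size()) {
            ++i;  // skip the trail byte
            continue;
        }
        if (s[i] == '\\') s[i] = '/';
    }
    return s;
}
```

For the bytes of "test_folder\功", the naive version turns the final 0xA5 0x5C into 0xA5 0x2F, which is no longer 「功」, while the lead-byte-aware version changes only the real separator.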
the test app output is:
C:\demo-room-workspace\native.impl>test test_folder\功 [directory] boost::filesystem::basic_directory_iterator constructor: 系統找不到指定的路徑。: "test_folder\功功功能總覽" 0.00 s
The message says the system cannot find the path "test_folder\功功功能總覽". When the path object invokes remove_filename(), it looks for the last '\', but 「功」 also contains a '\' byte, so remove_filename() yields test_folder\功 (half of 功) instead of test_folder.
Then, appending the next path element with a '\' turns the half of 功 back into the whole Big5 character 「功」, so the path becomes "test_folder\功" instead of "test_folder\".
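The remove_filename() failure described here can be imitated with plain strings. This is a sketch, not the Boost code. `naive_parent` and `cp950_parent` are hypothetical names, and the lead-byte range 0x81..0xFE is an assumption about CP950.

```cpp
#include <cstddef>
#include <string>

// Naive: cut at the last 0x5C byte, even if it is a trail byte.
std::string naive_parent(const std::string& s) {
    std::size_t pos = s.find_last_of('\\');
    return pos == std::string::npos ? std::string() : s.substr(0, pos);
}

// Lead-byte-aware: remember only backslashes found on character
// boundaries (assumes CP950 lead bytes are 0x81..0xFE).
std::string cp950_parent(const std::string& s) {
    std::size_t last = std::string::npos;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= 0x81 && c <= 0xFE && i + 1 < s.size()) {
            ++i;  // skip the trail byte
            continue;
        }
        if (s[i] == '\\') last = i;
    }
    return last == std::string::npos ? std::string() : s.substr(0, last);
}
```

On the bytes of "test_folder\功", the naive version keeps the lone lead byte 0xA5 at the end, and appending a '\' to that result rebuilds 「功」, which matches the doubled 功 in the error message. The lead-byte-aware version returns "test_folder".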
comment:3 by , 13 years ago
Status: | new → assigned |
---|
comment:5 by , 8 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
Sorry for the 5 year delay in closing this.
The problem does not reproduce with current versions of boost.filesystem. The path has already been converted to UTF-16 by the time operations begin, so the 0x5C byte in cp950 is immaterial.
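A small sketch of why the conversion helps, using std::u16string as a stand-in for the wide path storage. `parent_utf16` is a hypothetical helper, not part of Boost. In UTF-16, 「功」 is the single code unit U+529F, so no code unit inside it can compare equal to the separator U+005C.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: cut at the last separator code unit. In UTF-16 a
// code unit equal to u'\\' or u'/' is always a real separator.
std::u16string parent_utf16(const std::u16string& s) {
    std::size_t pos = s.find_last_of(u"\\/");
    return pos == std::u16string::npos ? std::u16string() : s.substr(0, pos);
}
```

Calling `parent_utf16(u"test_folder\\\u529F")` yields `u"test_folder"`, with no special handling for the character.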
Here is an updated test program, using the codepage 950 codecvt facet that ships with recent versions of VC++:
    #include <boost/filesystem.hpp>
    #include <cvt/cp950>
    #include <iostream>
    #include <string>
    #include <locale>

    namespace fs = boost::filesystem;

    int main(void)
    {
      std::locale global_loc = std::locale();
      std::locale loc(global_loc, new stdext::cvt::codecvt_cp950<wchar_t>);
      fs::path::imbue(loc);

      std::cout <<
        "HEADS UP! PIPE OUTPUT TO FILE AND INSPECT WITH HEX OR CP950 EDITOR.\n"
        "WINDOWS COMMAND PROMPT FONTS DON'T SUPPORT CHINESE,\n"
        "EVEN WITH CODEPAGE SET AND EVEN AS OF WIN 10 TECH PREVIEW." << std::endl;

      fs::recursive_directory_iterator end;
      fs::recursive_directory_iterator iter
        ("C:/boost/modular/develop/libs/filesystem/test/issues/3332/test_folder");

      while (iter != end)
      {
        if (fs::is_directory(*iter))
        {
          std::cout << "[directory] " << iter->path().generic_string() << std::endl;
        }
        else if (fs::is_regular(*iter))
        {
          std::cout << " [file] " << iter->path().generic_string() << std::endl;
        }
        ++iter;
      }
      return 0;
    }
A hex dump of the output shows that it does correctly handle the Big5 characters.
I've also tested a UTF-8 version of the above, and checked those with a UTF-8 aware text editor.
Thanks,
--Beman
An attempt to fix the problem, but it does not really work.