Opened 15 years ago
Last modified 12 years ago
#1836 assigned Bugs
bug in serializing wide character strings
Reported by: | | Owned by: | Robert Ramey |
---|---|---|---|
Milestone: | To Be Determined | Component: | serialization |
Version: | Boost 1.34.1 | Severity: | Problem |
Keywords: | wstring wchar_t | Cc: | jeff@…, Sohail Somani |
Description
We've discovered an issue in how Boost writes and reads wide character strings (wchar_t* and std::wstring) to and from non-wide character file streams (std::ifstream and std::ofstream). The issue stems from the fact that wide characters are written and read as a sequence of narrow characters (in text_oarchive_impl.ipp and text_iarchive_impl.ipp, respectively). On Windows, an EOF character terminates the reading of a file opened as a text stream. Some wide characters have EOF (value = 26 decimal) as one of their bytes, so reading that byte causes early termination of the read. We have worked around the issue by deriving our own input and output archives from text_i|oarchive_impl<Archive> and overriding load_override() and save_override() for std::wstring and wchar_t*. Our implementation just steps through the wide characters and writes them one by one as wchar_t to the archive. This isn't very elegant and is even less readable in the file than the current implementation, but it does resolve the problem.
Although the test test_simple_class does test wstrings, it only uses the characters 'a'-'z', which do not expose this problem.
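To illustrate which strings are affected, here is a minimal sketch that checks whether the in-memory bytes of a wide string include the value 26 (0x1A). It assumes, as described above, that the narrow text archive emits the raw bytes of each wchar_t; the helper names has_byte and contains_ctrl_z are hypothetical and are not part of Boost.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true if any byte of wc's in-memory representation equals b.
bool has_byte(wchar_t wc, unsigned char b) {
    unsigned char bytes[sizeof(wchar_t)];
    std::memcpy(bytes, &wc, sizeof wc);  // inspect the object representation
    for (unsigned char x : bytes) {
        if (x == b) return true;
    }
    return false;
}

// Hypothetical helper: true if writing the raw bytes of ws would emit a
// 0x1A (decimal 26) byte, which a Windows text-mode read treats as EOF.
bool contains_ctrl_z(const std::wstring& ws) {
    for (wchar_t wc : ws) {
        if (has_byte(wc, 0x1A)) return true;
    }
    return false;
}
```

For example, U+041A (Cyrillic capital letter Ka) has 0x1A as its low byte, so a string containing it would be flagged, while a string made only of 'a'-'z' would not.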
Change History (10)
follow-up: 2 comment:1 by , 14 years ago
comment:2 by , 14 years ago
Replying to Ostap Kutsyy <ostapkl@gmail.com>:
Why don't you use text_wi|oarchive? This was designed for wide characters and strings.
Frankly, I've never seen those in the documentation. Now that I look, there they are... in the "Archive Concepts" and "Implementation Notes" sections. The wide character classes are not even part of the "Text Archive Class Diagram".
I've asked the developer working on this if this will solve our problem. I'll follow up after he looks into it.
Thanks for the help!
Jeff
comment:3 by , 14 years ago
Ostap,
Using text_w?archive does address our problem. Thanks for the help. However, I still consider this a bug. Attempting to serialize a wstring should fail to compile, in the same way that "cout << wstring();" fails to compile. As it is currently, it fails at runtime in frustratingly subtle ways.
Jeff
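A minimal sketch of the compile-time rejection proposed above, using a toy archive rather than the real Boost classes. The class toy_narrow_oarchive and the trait can_save are hypothetical names for illustration only, and this is not how text_oarchive behaves.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical toy archive (not a Boost class): accepts narrow strings and
// rejects wide ones at compile time through deleted overloads.
class toy_narrow_oarchive {
public:
    explicit toy_narrow_oarchive(std::ostream& os) : os_(os) {}

    // Narrow strings are written as "<length> <characters>".
    toy_narrow_oarchive& operator<<(const std::string& s) {
        os_ << s.size() << ' ' << s;
        return *this;
    }

    // Any attempt to save wide data is a compile error.
    toy_narrow_oarchive& operator<<(const std::wstring&) = delete;
    toy_narrow_oarchive& operator<<(const wchar_t*) = delete;

private:
    std::ostream& os_;
};

// Detection trait: can_save<T>::value is true only if "archive << T" compiles.
template <class...>
struct make_void { using type = void; };

template <class T, class = void>
struct can_save : std::false_type {};

template <class T>
struct can_save<T, typename make_void<decltype(
    std::declval<toy_narrow_oarchive&>() << std::declval<const T&>())>::type>
    : std::true_type {};
```

With these definitions, can_save<std::string>::value is true and can_save<std::wstring>::value is false, so the mistake surfaces when building rather than when reading the file back.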
comment:4 by , 14 years ago
Status: | new → assigned |
---|
follow-up: 7 comment:6 by , 13 years ago
Cc: | added |
---|
Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)
If you agree, I have a patch in the works for this issue.
comment:7 by , 13 years ago
Replying to sohail:
Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)
Should not support wide characters...
follow-up: 9 comment:8 by , 13 years ago
I don't think that data types should be coupled to archives.
That is, any data that can be serialized to one kind of archive should be serializable to ALL kinds of archives. That is why std::wstring must be serializable into a text or xml_archive. The real fix is to make adjustments so that all characters are rendered. This is not trivial to do without a big performance hit. So it's an open issue for now.
comment:9 by , 12 years ago
Milestone: | → To Be Determined |
---|
Replying to ramey:
This is not trivial to do without a big performance hit. So it's an open issue for now.
Do you have a suggestion as to how this should be addressed? For example, should serializing a std::wstring to a text archive yield an appropriately encoded (maybe Base64) string, then when read back be appropriately decoded? How will this work on binary archives and in other user-provided archives (like the ones Boost.MPI provides)?
comment:10 by , 12 years ago
Speaking from memory of when I last looked at this, I concluded that what was needed was an escape mechanism. This would render some characters as \134 or something like that. It would be implemented in the "stack" of iterator adaptors which are used to save/load the string. This would entail character-by-character processing, which I was thinking would be a performance hit, and it would be. BUT, now I realize that serialization of a wstring to a char archive is not a common operation, so performance really isn't an issue. The way to address this is to look at the documentation for "dataflow" iterators and the related code. It wouldn't be too hard to craft another "escape" layer and insert it into the iterator stack which handles this. Feel free to take this on.
Robert Ramey
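As a rough illustration of the escape idea, the following sketch renders the raw bytes of a wide string as printable ASCII, writing every other byte (and the backslash itself) as a three-digit octal escape such as \032. It is written as two standalone functions with the hypothetical names escape_wide and unescape_wide; it is not a dataflow iterator adaptor and is not taken from the Boost sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: encode the bytes of ws so that the output contains
// only printable, non-space ASCII. Other bytes become \ooo (octal).
std::string escape_wide(const std::wstring& ws) {
    std::string out;
    for (wchar_t wc : ws) {
        unsigned char bytes[sizeof(wchar_t)];
        std::memcpy(bytes, &wc, sizeof wc);
        for (unsigned char b : bytes) {
            if (b > 0x20 && b < 0x7F && b != '\\') {
                out.push_back(static_cast<char>(b));
            } else {
                char buf[5];  // backslash + 3 octal digits + terminating NUL
                std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(b));
                out += buf;
            }
        }
    }
    return out;
}

// Hypothetical helper: inverse of escape_wide. Assumes well-formed input
// produced by escape_wide on a platform with the same wchar_t layout.
std::wstring unescape_wide(const std::string& s) {
    std::string raw;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 3 < s.size()) {
            int v = (s[i + 1] - '0') * 64 + (s[i + 2] - '0') * 8 + (s[i + 3] - '0');
            raw.push_back(static_cast<char>(v));
            i += 3;  // skip the three octal digits
        } else {
            raw.push_back(s[i]);
        }
    }
    assert(raw.size() % sizeof(wchar_t) == 0);
    std::wstring ws(raw.size() / sizeof(wchar_t), L'\0');
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::memcpy(&ws[i], raw.data() + i * sizeof(wchar_t), sizeof(wchar_t));
    }
    return ws;
}
```

Because the byte 26 is always written as \032, it never appears literally in the escaped text. The output is verbose (every zero byte is escaped too), which matches the observation above that performance and compactness are secondary for this uncommon operation.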