Opened 15 years ago

Last modified 12 years ago

#1836 assigned Bugs

bug in serializing wide character strings

Reported by: Jeff Faust <jeff@…>
Owned by: Robert Ramey
Milestone: To Be Determined
Component: serialization
Version: Boost 1.34.1
Severity: Problem
Keywords: wstring wchar_t
Cc: jeff@…, Sohail Somani

Description

We've found a problem with how Boost writes and reads wide character strings (wchar_t* and std::wstring) through narrow file streams (std::ifstream and std::ofstream). The problem stems from the fact that wide characters are written and read as a sequence of narrow characters (in text_oarchive_impl.ipp and text_iarchive_impl.ipp, respectively). On Windows, an EOF character (value = 26 decimal) terminates the reading of a file opened as a text stream. Some wide characters have 26 as one of their bytes, so reading that byte ends the read early. We have worked around the issue by deriving our own input and output archives from text_iarchive_impl<Archive> and text_oarchive_impl<Archive> and overriding load_override() and save_override() for std::wstring and wchar_t*. Our implementation steps through the wide characters and writes them one by one as wchar_t to the archive. This isn't very elegant and is even less readable in the file than the current implementation, but it does resolve the problem.

Although the test test_simple_class does test wstrings, it uses only the characters 'a'-'z', which do not expose this problem.
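To make the failure condition in the description concrete, here is a minimal sketch that checks whether the in-memory bytes of a wide string include the value 26 (0x1A). It assumes, as the description suggests, that the narrow text archive writes the raw bytes of each wchar_t. The helper name contains_ctrl_z_byte is hypothetical and is not part of Boost.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a Boost API): returns true if the in-memory
// representation of `ws` contains a byte equal to 26 (0x1A), the value
// that a Windows text-mode stream treats as end of file.
bool contains_ctrl_z_byte(const std::wstring& ws) {
    for (wchar_t wc : ws) {
        unsigned char bytes[sizeof(wchar_t)];
        std::memcpy(bytes, &wc, sizeof(wchar_t));  // view the raw bytes of one wide character
        for (std::size_t i = 0; i < sizeof(wchar_t); ++i) {
            if (bytes[i] == 0x1A) {
                return true;
            }
        }
    }
    return false;
}
```

A string made only of 'a'-'z' never trips this check, which is consistent with test_simple_class not exposing the problem. A character such as U+011A has 0x1A as one of its bytes and does trip it.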

Change History (10)

comment:1 by Ostap Kutsyy <ostapkl@…>, 14 years ago

Why don't you use text_wi|oarchive? This was designed for wide characters and strings.

in reply to:  1 comment:2 by jefffaust, 14 years ago

Replying to Ostap Kutsyy <ostapkl@gmail.com>:

Why don't you use text_wi|oarchive? This was designed for wide characters and strings.

Frankly, I've never seen those in the documentation. Now that I look, there they are... in the "Archive Concepts" and "Implementation Notes" sections. The wide character classes are not even part of the "Text Archive Class Diagram".

I've asked the developer working on this if this will solve our problem. I'll follow up after he looks into it.

Thanks for the help!

Jeff

comment:3 by jefffaust, 14 years ago

Ostap,

Using text_w?archive does address our problem. Thanks for the help. However, I still consider this a bug. Attempting to serialize a wstring should fail to compile, in the same way that "cout << wstring();" fails to compile. As it is currently, it fails at runtime in frustratingly subtle ways.

Jeff
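As an illustration of the compile-time rejection proposed in comment:3, the sketch below uses a toy archive whose wide-string overloads are deleted. The type narrow_archive_sketch and the trait can_save are hypothetical names invented for this example. They are not Boost classes, and this is not how the serialization library is structured.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical narrow archive, for illustration only; not a Boost class.
struct narrow_archive_sketch {
    std::string buffer;

    // Narrow strings are appended to the buffer.
    void save(const std::string& s) { buffer += s; }

    // Deleting the wide overloads turns the subtle runtime failure into a
    // compile-time error, much like `std::cout << std::wstring();`.
    void save(const std::wstring&) = delete;
    void save(const wchar_t*) = delete;
};

// Detection helper: true if `archive.save(T)` is a valid expression.
template <class Archive, class T, class = void>
struct can_save : std::false_type {};

template <class Archive, class T>
struct can_save<Archive, T,
                decltype(void(std::declval<Archive&>().save(std::declval<T>())))>
    : std::true_type {};

static_assert(can_save<narrow_archive_sketch, std::string>::value,
              "narrow strings are accepted");
static_assert(!can_save<narrow_archive_sketch, std::wstring>::value,
              "wide strings are rejected at compile time");
```

With this shape, a call such as ar.save(std::wstring(L"x")) does not compile, so the mistake is caught before any file is written.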

comment:4 by Robert Ramey, 14 years ago

Status: new → assigned

comment:5 by (none), 14 years ago

Milestone: Boost 1.35.1

Milestone Boost 1.35.1 deleted

comment:6 by Sohail Somani, 13 years ago

Cc: Sohail Somani added

Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)

If you agree, I have a patch in the works for this issue.

in reply to:  6 comment:7 by Sohail Somani, 13 years ago

Replying to sohail:

Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...)

Should not support wide characters...

comment:8 by Robert Ramey, 13 years ago

I don't think that data types should be coupled to archives.

That is, any data that can be serialized to one kind of archive should be serializable to ALL kinds of archives. That is why std::wstring must be serializable into a text or xml_archive. The real fix is to make adjustments so that all characters are rendered. This is not trivial to do without a big performance hit. So it's an open issue for now.

in reply to:  8 comment:9 by Dean Michael Berris, 12 years ago

Milestone: To Be Determined

Replying to ramey:

This is not trivial to do without a big performance hit. So it's an open issue for now.

Do you have a suggestion as to how this should be addressed? For example, should serializing a std::wstring to a text archive yield an appropriately encoded (maybe Base64) string, then when read back be appropriately decoded? How will this work on binary archives and in other user-provided archives (like the ones Boost.MPI provides)?

comment:10 by Robert Ramey, 12 years ago

Speaking from memory: when I last looked at this, I concluded that what was needed was an escape mechanism. This would render some characters as \134 or something like that. It would be implemented in the "stack" of iterator adaptors which are used to save/load the string. This would entail character-by-character processing, which I was thinking would be a performance hit, and it would be. BUT, now I realize that serialization of a wstring to a char archive is not a common operation, so performance really isn't an issue. The way to address this is to look at the documentation for "dataflow" iterators and the related code. It wouldn't be too hard to craft another "escape" layer and insert it into the iterator stack to handle this. Feel free to take this on.

Robert Ramey
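The escape mechanism described in comment:10 could look roughly like the sketch below. It works on a plain byte string instead of the library's dataflow iterator stack, and the names escape_bytes and unescape_bytes are hypothetical. The choice of which bytes to escape (control characters, the backslash, and anything outside printable 7-bit ASCII) is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Boost API. Renders every control byte,
// backslash, and non-ASCII byte as a backslash followed by exactly three
// octal digits, so byte 26 becomes "\032" and a backslash becomes "\134".
std::string escape_bytes(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (c < 0x20 || c == '\\' || c >= 0x7F) {
            out += '\\';
            out += static_cast<char>('0' + ((c >> 6) & 7));
            out += static_cast<char>('0' + ((c >> 3) & 7));
            out += static_cast<char>('0' + (c & 7));
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Inverse of escape_bytes. Assumes well-formed input, i.e. text that was
// produced by escape_bytes.
std::string unescape_bytes(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 3 < text.size()) {
            // Decode the three octal digits that follow the backslash.
            int value = (text[i + 1] - '0') * 64 +
                        (text[i + 2] - '0') * 8 +
                        (text[i + 3] - '0');
            out += static_cast<char>(value);
            i += 3;
        } else {
            out += text[i];
        }
    }
    return out;
}
```

Because the escaped form never contains byte 26, a text-mode reader on Windows would not stop early on it. The cost is the character-by-character processing mentioned in the comment.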
