Boost C++ Libraries: Ticket #1836: bug in serializing wide character strings https://svn.boost.org/trac10/ticket/1836 <p> We've discovered a problem in how Boost writes and reads wide character strings (wchar_t* and std::wstring) through narrow file streams (std::ifstream and std::ofstream). The problem stems from the fact that wide characters are written and read as a sequence of narrow characters (in text_oarchive_impl.ipp and text_iarchive_impl.ipp, respectively). On Windows, an EOF character (value = 26 decimal) terminates the reading of a file opened as a text stream. Some wide characters have that value as one of their bytes, so reading that byte ends the read early. We have worked around the issue by deriving our own input and output archives from text_i|oarchive_impl&lt;Archive&gt; and overriding load_override() and save_override() for std::wstring and wchar_t*. Our implementation just steps through the wide characters and writes them one by one as wchar_t to the archive. This isn't very elegant, and the result is even less readable in the file than the current implementation's output, but it does resolve the problem. </p> <p> Although the test test_simple_class does test wstrings, it only uses the characters 'a'-'z', which does not expose this problem. </p> <item> <dc:creator>Ostap Kutsyy &lt;ostapkl@…&gt;</dc:creator> <pubDate>Fri, 23 May 2008 12:40:13 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:1</guid> <description> <p> Why don't you use text_<strong>w</strong>i|oarchive? This was designed for wide characters and strings. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>jefffaust</dc:creator> <pubDate>Fri, 23 May 2008 14:23:32 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:2</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/1836#comment:1" title="Comment 1">Ostap Kutsyy &lt;ostapkl@gmail.com&gt;</a>: </p> <blockquote class="citation"> <p> Why don't you use text_<strong>w</strong>i|oarchive? This was designed for wide characters and strings. </p> </blockquote> <p> Frankly, I've never seen those in the documentation. Now that I look, there they are... in the "Archive Concepts" and "Implementation Notes" sections. The wide character classes are not even part of the "Text Archive Class Diagram". </p> <p> I've asked the developer working on this if this will solve our problem. I'll follow up after he looks into it. </p> <p> Thanks for the help! </p> <p> Jeff </p> </description> <category>Ticket</category> </item> <item> <dc:creator>jefffaust</dc:creator> <pubDate>Fri, 23 May 2008 17:48:30 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:3</guid> <description> <p> Ostap, </p> <p> Using text_w?archive does address our problem. Thanks for the help. However, I still consider this a bug. Attempting to serialize a wstring should fail to compile, in the same way that "cout &lt;&lt; wstring();" fails to compile. As it is currently, it fails at runtime in frustratingly subtle ways. 
</p> <p> Jeff </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Mon, 09 Jun 2008 19:16:14 GMT</pubDate> <title>status changed</title> <link>https://svn.boost.org/trac10/ticket/1836#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:4</guid> <description> <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> </description> <category>Ticket</category> </item> <item> <pubDate>Mon, 03 Nov 2008 14:20:34 GMT</pubDate> <title>milestone deleted</title> <link>https://svn.boost.org/trac10/ticket/1836#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:5</guid> <description> <ul> <li><strong>milestone</strong> <span class="trac-field-deleted">Boost 1.35.1</span> </li> </ul> <p> Milestone Boost 1.35.1 deleted </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Sohail Somani</dc:creator> <pubDate>Sat, 30 May 2009 04:02:39 GMT</pubDate> <title>cc changed</title> <link>https://svn.boost.org/trac10/ticket/1836#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:6</guid> <description> <ul> <li><strong>cc</strong> <span class="trac-author">Sohail Somani</span> added </li> </ul> <p> Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...) </p> <p> If you agree, I have a patch in the works for this issue. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Sohail Somani</dc:creator> <pubDate>Sat, 30 May 2009 04:03:12 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:7</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/1836#comment:6" title="Comment 6">sohail</a>: </p> <blockquote class="citation"> <p> Robert, do you have any thoughts on this? My thoughts are that narrow archives should definitely not support wide character streams for this reason (among others...) </p> </blockquote> <p> Should not support wide characters... 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Sat, 30 May 2009 04:59:13 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:8</guid> <description> <p> I don't think that data types should be coupled to archives. </p> <p> That is, any data that can be serialized to one kind of archive should be serializable to ALL kinds of archives. That is why std::wstring must be serializable into a text or xml_archive. The real fix is to make adjustments so that all characters are rendered. This is not trivial to do without a big performance hit. So it's an open issue for now. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Dean Michael Berris</dc:creator> <pubDate>Mon, 29 Nov 2010 02:41:35 GMT</pubDate> <title>milestone set</title> <link>https://svn.boost.org/trac10/ticket/1836#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:9</guid> <ul> <li><strong>milestone</strong> → <span class="trac-field-new">To Be Determined</span> </li> </ul> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/1836#comment:8" title="Comment 8">ramey</a>: </p> <blockquote class="citation"> <p> This is not trivial to do without a big performance hit. So it's an open issue for now. </p> </blockquote> <p> Do you have a suggestion as to how this should be addressed? For example, should serializing a std::wstring to a text archive yield an appropriately encoded (maybe Base64) string, which is then decoded appropriately when read back? How will this work with binary archives and with other user-provided archives (like the ones Boost.MPI provides)? 
</p> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Mon, 29 Nov 2010 20:32:54 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1836#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1836#comment:10</guid> <description> <p> Speaking from memory: when I last looked at this, I concluded that what was needed was an escape mechanism. This would render some characters as \134 or something like that. It would be implemented in the "stack" of iterator adaptors that are used to save/load the string. That entails character-by-character processing, which I thought would be a performance hit, and it would be. BUT now I realize that serializing a wstring to a char archive is not a common operation, so performance really isn't an issue. The way to address this is to look at the documentation for the "dataflow" iterators and the related code. It wouldn't be too hard to craft another "escape" layer and insert it into the iterator stack to handle this. Feel free to take this on. </p> <p> Robert Ramey </p> </description> <category>Ticket</category> </item> </channel> </rss>
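The escape layer suggested in the last comment could look roughly like the sketch below. It is a standalone illustration only, not Boost.Serialization code and not wired into the dataflow iterator stack. The names escape_wide, unescape_wide and check_round_trip, and the backslash-hex-semicolon escape format, are assumptions made for this example; the ticket only mentions an octal-style form such as \134. The idea is that every wide character outside printable ASCII (and the backslash itself) is written as ASCII text, so a byte with value 26 never reaches a narrow text stream. For instance, the wide character with value 0x041A, whose low byte is 0x1A, is written as the five characters \41a; rather than as raw bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost): encode a wide string as printable
// ASCII. Printable ASCII passes through; everything else, including the
// backslash, becomes "\<hex>;".
std::string escape_wide(const std::wstring& in) {
    std::string out;
    for (wchar_t wc : in) {
        // Assumes wchar_t is at most 32 bits wide, as on common platforms.
        const std::uint32_t code = static_cast<std::uint32_t>(wc);
        if (code >= 0x20 && code < 0x7F && code != 0x5C) {
            out.push_back(static_cast<char>(code));
        } else {
            char buf[16];  // '\' + up to 8 hex digits + ';' + NUL fits easily
            std::snprintf(buf, sizeof buf, "\\%lx;",
                          static_cast<unsigned long>(code));
            out += buf;
        }
    }
    return out;
}

// Hypothetical inverse of escape_wide.
std::wstring unescape_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\') {
            out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(in[i])));
            continue;
        }
        const std::size_t end = in.find(';', i + 1);
        if (end == std::string::npos) {
            throw std::runtime_error("unterminated escape sequence");
        }
        // std::stoul throws if the text between '\' and ';' is not hex.
        const unsigned long code =
            std::stoul(in.substr(i + 1, end - i - 1), nullptr, 16);
        out.push_back(static_cast<wchar_t>(code));
        i = end;  // continue after the ';'
    }
    return out;
}

// Self-check: the escaped form holds no raw 0x1A byte and decodes back.
void check_round_trip(const std::wstring& s) {
    const std::string narrow = escape_wide(s);
    assert(narrow.find('\x1a') == std::string::npos);
    assert(unescape_wide(narrow) == s);
}
```

A caller would pass the result of escape_wide to the narrow archive as an ordinary std::string and apply unescape_wide after loading. Processing is character by character, which matches the performance trade-off discussed in the thread.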