Boost C++ Libraries: Ticket #1849: Deserialization of std::string overwrites non-copied contents. https://svn.boost.org/trac10/ticket/1849 <p> When a string is initialized with another string it doesn't copy the other strings contents but refers to it: </p> <pre class="wiki"> const std::string Original( "Hello" ); std::string Foostring = Original; // Lazy copy. </pre><p> When that copied string is deserialized, it overwrites the original string's contents. Instead it should always allocate new memory for a string when deserializing. </p> <p> One workaround is to initialize the string with the c_str() of the original string to enforce a copy: </p> <pre class="wiki"> std::string Foostring = Original.c_str(); // Really copy the string. </pre><p> The example was tested on Ubuntu 8.04 with gcc 4.2.3. </p> <hr /> <p> Code: </p> <pre class="wiki">#include &lt;string&gt; #include &lt;iostream&gt; #include &lt;sstream&gt; #include &lt;boost/archive/binary_oarchive.hpp&gt; #include &lt;boost/archive/binary_iarchive.hpp&gt; #include &lt;boost/serialization/base_object.hpp&gt; #include &lt;boost/serialization/utility.hpp&gt; #include &lt;boost/serialization/vector.hpp&gt; void Test1() { const std::string Original( "Hello" ); std::string Foostring = Original; // Lazy copy. std::string Barstring( "BarST" ); std::cout &lt;&lt; "Original:" &lt;&lt; Original &lt;&lt; " Foostring:" &lt;&lt; Foostring &lt;&lt; " Barstring:" &lt;&lt; Barstring &lt;&lt; std::endl; std::ostringstream OS(std::ios::binary); boost::archive::binary_oarchive OA( OS ); OA &lt;&lt; Barstring; std::istringstream IS( OS.str(), std::ios::binary); boost::archive::binary_iarchive IA( IS ); IA &gt;&gt; Foostring; std::cout &lt;&lt; "Original:" &lt;&lt; Original &lt;&lt; " Foostring:" &lt;&lt; Foostring &lt;&lt; " Barstring:" &lt;&lt; Barstring &lt;&lt; std::endl; } void Test2() { const std::string Original( "Hello" ); std::string Foostring = Original.c_str(); // Really copy the string. std::string Barstring( "BarST" ); std::cout &lt;&lt; "Original:" &lt;&lt; Original &lt;&lt; " Foostring:" &lt;&lt; Foostring &lt;&lt; " Barstring:" &lt;&lt; Barstring &lt;&lt; std::endl; std::ostringstream OS(std::ios::binary); boost::archive::binary_oarchive OA( OS ); OA &lt;&lt; Barstring; std::istringstream IS( OS.str(), std::ios::binary); boost::archive::binary_iarchive IA( IS ); IA &gt;&gt; Foostring; std::cout &lt;&lt; "Original:" &lt;&lt; Original &lt;&lt; " Foostring:" &lt;&lt; Foostring &lt;&lt; " Barstring:" &lt;&lt; Barstring &lt;&lt; std::endl; } int main() { std::cout &lt;&lt; "Test1()" &lt;&lt; std::endl; Test1(); std::cout &lt;&lt; "Test2()" &lt;&lt; std::endl; Test2(); } </pre><hr /> <p> Output: </p> <pre class="wiki">Test1() Original:Hello Foostring:Hello Barstring:BarST Original:BarST Foostring:BarST Barstring:BarST Test2() Original:Hello Foostring:Hello Barstring:BarST Original:Hello Foostring:BarST Barstring:BarST </pre> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/1849 Trac 1.4.3 Robert Ramey Sun, 08 Jun 2008 04:41:04 GMT <link>https://svn.boost.org/trac10/ticket/1849#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:1</guid> <description> <p> add </p> <p> #include &lt;boost/serialization/string.hpp&gt; </p> <p> and run your test again. </p> <p> Robert Ramey </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Sun, 08 Jun 2008 04:41:17 GMT</pubDate> <title>status changed https://svn.boost.org/trac10/ticket/1849#comment:2 https://svn.boost.org/trac10/ticket/1849#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> Ticket dev@… Mon, 09 Jun 2008 10:11:51 GMT <link>https://svn.boost.org/trac10/ticket/1849#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:3</guid> <description> <p> Added the string.hpp and got the same result. </p> <p> Using g++ version 4.3.1 also got the same result. (don't know if the 4.3.1 stdlib from gcc-snapshot is automatically used when switching the compiler in kdevelop) </p> <p> Adding a character to "BarST" -&gt; "BarSTx" leads to _no_ overwriting. Removing a character from "BarST" -&gt; "BarS" leads to _no_ overwriting. </p> <p> So this problem only exists when the _lengths of the strings match_ like they did in my program when the bug first occured to me. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Mon, 09 Jun 2008 19:20:40 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1849#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:4</guid> <description> <p> I presume that "to _no_ overwriting" refers to an exception. </p> <p> This looks very much like a bug in the standard library implementation being used. I'll have look at the serialization string implemention. I wouldn't want to have to create a new string on all platforms just to work around this. Keep me posted if you find a good (portable?) fix for this. </p> <p> Robert Ramey </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Mon, 09 Jun 2008 21:35:34 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1849#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:5</guid> <description> <p> No, there was no exception thrown or anything. </p> <p> It must have to do something with the way the deserialization writes into that string. </p> <p> If the lengths match, it doesn't allocate new memory and also doesn't check if the memory referred to belongs to another object and should be copied on write. </p> <p> Also because the original string was a const, the constness must have been cast away somewhere. </p> <p> I don't understand this with the actual implementation because of that in string.hpp: </p> <pre class="wiki">... BOOST_CLASS_IMPLEMENTATION(std::string, boost::serialization::primitive_type) #ifndef BOOST_NO_STD_WSTRING BOOST_CLASS_IMPLEMENTATION(std::wstring, boost::serialization::primitive_type) #endif // left over from a previous incarnation - strings are now always primitive types #if 0 ... old implementation </pre><p> as far as i came in the code, the old implementation does the reasonable thing (call .clear(), .reserve() and .push_back()). </p> <p> But i don't understand where the loading with the new implementation is actually happening. </p> <p> What does that "primitive_type" mean when loading an object? Maybe a string sometimes ain't as primitive as expected. </p> <p> The obvious fix would be to use the old implementation again, but i don't know if that raises other problems. </p> <p> Maybe the problem is in the gnu-stl, maybe someone else can reconfirm it with other compilers/platforms because i only have access to the gnu compilers and the error occurs on linux and mingw. But let alone that it happens on multiple versions of gnu there should be great interest of this bug to go away. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Tue, 10 Jun 2008 01:50:49 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1849#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:6</guid> <description> <p> as far as serialization is concerned std::string is primitive - that is its implemented as code intrinsic to the library - look at the documentation regarding serialization traits. The implementation has been move to text_iarchive_impl.ipp </p> <pre class="wiki">template&lt;class Archive&gt; BOOST_ARCHIVE_DECL(void) text_iarchive_impl&lt;Archive&gt;::load(std::string &amp;s) { std::size_t size; * this-&gt;This() &gt;&gt; size; // skip separating space is.get(); // borland de-allocator fixup #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101)) if(NULL != s.data()) #endif s.resize(size); if(0 &lt; size) is.read(&amp;(*s.begin()), size); } </pre><p> We ARE breaking the rules here. I think we had a discussion regarding this point some time ago. The concensus was that this hack was worth the risk to improve performance. Performance is a BIG issue - but of course the program has to work - I'll look into what to do about this. My first thought is another hack that it clear out the string if the sizes are the same. </p> <p> Robert Ramey </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 10 Jun 2008 04:46:52 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1849#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1849#comment:7</guid> <description> <p> You're pointing to the right location. </p> <p> Obviously your code is correct with: </p> <pre class="wiki">is.read(&amp;(*s.begin()), size); </pre><p> The call to .begin() causes the string to copy and .data() changes locations. </p> <p> But in 1.35.0 release this line is executed. </p> <pre class="wiki">is.read(const_cast&lt;char *&gt;(s.data()), size); </pre><p> .data() obviously doesn't copy the string and there's also the bad cast. </p> <p> So the issue is fixed in the next release. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Tue, 10 Jun 2008 05:23:23 GMT</pubDate> <title>status, milestone changed; resolution set https://svn.boost.org/trac10/ticket/1849#comment:8 https://svn.boost.org/trac10/ticket/1849#comment:8 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> <li><strong>milestone</strong> <span class="trac-field-old">Boost 1.35.1</span> → <span class="trac-field-new">Boost 1.36.0</span> </li> </ul> Ticket