Opened 15 years ago

Closed 14 years ago

#1849 closed Bugs (fixed)

Deserialization of std::string overwrites non-copied contents.

Reported by: Siegfried Kettlitz <dev@…> Owned by: Robert Ramey
Milestone: Boost 1.36.0 Component: serialization
Version: Boost 1.35.0 Severity: Showstopper
Keywords: Cc:

Description

When a string is initialized with another string it doesn't copy the other strings contents but refers to it:

    const std::string Original( "Hello" );
    std::string Foostring = Original; // Lazy copy.

When that copied string is deserialized, it overwrites the original string's contents. Instead it should always allocate new memory for a string when deserializing.

One workaround is to initialize the string with the c_str() of the original string to enforce a copy:

    std::string Foostring = Original.c_str(); // Really copy the string.

The example was tested on Ubuntu 8.04 with gcc 4.2.3.


Code:

#include <string>
#include <iostream>
#include <sstream>

#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>

#include <boost/serialization/base_object.hpp>
#include <boost/serialization/utility.hpp>
#include <boost/serialization/vector.hpp>

void Test1()
{
    const std::string Original( "Hello" );
    std::string Foostring = Original; // Lazy copy.
    std::string Barstring( "BarST" );

    std::cout
        << "Original:" << Original 
        << " Foostring:" << Foostring 
        << " Barstring:" << Barstring
        << std::endl;


    std::ostringstream OS(std::ios::binary);
    boost::archive::binary_oarchive OA( OS );
    OA << Barstring;

    std::istringstream IS( OS.str(), std::ios::binary);
    boost::archive::binary_iarchive IA( IS );
    IA >> Foostring;

    std::cout
        << "Original:" << Original 
        << " Foostring:" << Foostring 
        << " Barstring:" << Barstring
        << std::endl;

}

void Test2()
{
    const std::string Original( "Hello" );
    std::string Foostring = Original.c_str(); // Really copy the string.
    std::string Barstring( "BarST" );

    std::cout
        << "Original:" << Original 
        << " Foostring:" << Foostring 
        << " Barstring:" << Barstring
        << std::endl;


    std::ostringstream OS(std::ios::binary);
    boost::archive::binary_oarchive OA( OS );
    OA << Barstring;

    std::istringstream IS( OS.str(), std::ios::binary);
    boost::archive::binary_iarchive IA( IS );
    IA >> Foostring;

    std::cout
        << "Original:" << Original 
        << " Foostring:" << Foostring 
        << " Barstring:" << Barstring
        << std::endl;

}


int main()
{
    std::cout << "Test1()" << std::endl;
    Test1();
    std::cout << "Test2()" << std::endl;
    Test2();
}

Output:

Test1()
Original:Hello Foostring:Hello Barstring:BarST
Original:BarST Foostring:BarST Barstring:BarST
Test2()
Original:Hello Foostring:Hello Barstring:BarST
Original:Hello Foostring:BarST Barstring:BarST

Change History (8)

comment:1 by Robert Ramey, 14 years ago

add

#include <boost/serialization/string.hpp>

and run your test again.

Robert Ramey

comment:2 by Robert Ramey, 14 years ago

Status: newassigned

comment:3 by dev@…, 14 years ago

Added the string.hpp and got the same result.

Using g++ version 4.3.1 also got the same result. (don't know if the 4.3.1 stdlib from gcc-snapshot is automatically used when switching the compiler in kdevelop)

Adding a character to "BarST" -> "BarSTx" leads to _no_ overwriting. Removing a character from "BarST" -> "BarS" leads to _no_ overwriting.

So this problem only exists when the _lengths of the strings match_ like they did in my program when the bug first occured to me.

comment:4 by Robert Ramey, 14 years ago

I presume that "to _no_ overwriting" refers to an exception.

This looks very much like a bug in the standard library implementation being used. I'll have look at the serialization string implemention. I wouldn't want to have to create a new string on all platforms just to work around this. Keep me posted if you find a good (portable?) fix for this.

Robert Ramey

comment:5 by anonymous, 14 years ago

No, there was no exception thrown or anything.

It must have to do something with the way the deserialization writes into that string.

If the lengths match, it doesn't allocate new memory and also doesn't check if the memory referred to belongs to another object and should be copied on write.

Also because the original string was a const, the constness must have been cast away somewhere.

I don't understand this with the actual implementation because of that in string.hpp:

...
BOOST_CLASS_IMPLEMENTATION(std::string, boost::serialization::primitive_type)
#ifndef BOOST_NO_STD_WSTRING
BOOST_CLASS_IMPLEMENTATION(std::wstring, boost::serialization::primitive_type)
#endif

// left over from a previous incarnation - strings are now always primitive types
#if 0
... old implementation

as far as i came in the code, the old implementation does the reasonable thing (call .clear(), .reserve() and .push_back()).

But i don't understand where the loading with the new implementation is actually happening.

What does that "primitive_type" mean when loading an object? Maybe a string sometimes ain't as primitive as expected.

The obvious fix would be to use the old implementation again, but i don't know if that raises other problems.

Maybe the problem is in the gnu-stl, maybe someone else can reconfirm it with other compilers/platforms because i only have access to the gnu compilers and the error occurs on linux and mingw. But let alone that it happens on multiple versions of gnu there should be great interest of this bug to go away.

comment:6 by Robert Ramey, 14 years ago

as far as serialization is concerned std::string is primitive - that is its implemented as code intrinsic to the library - look at the documentation regarding serialization traits. The implementation has been move to text_iarchive_impl.ipp

template<class Archive>
BOOST_ARCHIVE_DECL(void)
text_iarchive_impl<Archive>::load(std::string &s)
{
    std::size_t size;
    * this->This() >> size;
    // skip separating space
    is.get();
    // borland de-allocator fixup
    #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101))
    if(NULL != s.data())
    #endif
        s.resize(size);
    if(0 < size)
        is.read(&(*s.begin()), size);
}

We ARE breaking the rules here. I think we had a discussion regarding this point some time ago. The concensus was that this hack was worth the risk to improve performance. Performance is a BIG issue - but of course the program has to work - I'll look into what to do about this. My first thought is another hack that it clear out the string if the sizes are the same.

Robert Ramey

comment:7 by anonymous, 14 years ago

You're pointing to the right location.

Obviously your code is correct with:

is.read(&(*s.begin()), size);

The call to .begin() causes the string to copy and .data() changes locations.

But in 1.35.0 release this line is executed.

is.read(const_cast<char *>(s.data()), size);

.data() obviously doesn't copy the string and there's also the bad cast.

So the issue is fixed in the next release.

comment:8 by Robert Ramey, 14 years ago

Milestone: Boost 1.35.1Boost 1.36.0
Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.