Opened 15 years ago
Closed 14 years ago
#1849 closed Bugs (fixed)
Deserialization of std::string overwrites non-copied contents.
Reported by: | Owned by: | Robert Ramey | |
---|---|---|---|
Milestone: | Boost 1.36.0 | Component: | serialization |
Version: | Boost 1.35.0 | Severity: | Showstopper |
Keywords: | Cc: |
Description
When a string is initialized with another string it doesn't copy the other strings contents but refers to it:
const std::string Original( "Hello" ); std::string Foostring = Original; // Lazy copy.
When that copied string is deserialized, it overwrites the original string's contents. Instead it should always allocate new memory for a string when deserializing.
One workaround is to initialize the string with the c_str() of the original string to enforce a copy:
std::string Foostring = Original.c_str(); // Really copy the string.
The example was tested on Ubuntu 8.04 with gcc 4.2.3.
Code:
#include <string> #include <iostream> #include <sstream> #include <boost/archive/binary_oarchive.hpp> #include <boost/archive/binary_iarchive.hpp> #include <boost/serialization/base_object.hpp> #include <boost/serialization/utility.hpp> #include <boost/serialization/vector.hpp> void Test1() { const std::string Original( "Hello" ); std::string Foostring = Original; // Lazy copy. std::string Barstring( "BarST" ); std::cout << "Original:" << Original << " Foostring:" << Foostring << " Barstring:" << Barstring << std::endl; std::ostringstream OS(std::ios::binary); boost::archive::binary_oarchive OA( OS ); OA << Barstring; std::istringstream IS( OS.str(), std::ios::binary); boost::archive::binary_iarchive IA( IS ); IA >> Foostring; std::cout << "Original:" << Original << " Foostring:" << Foostring << " Barstring:" << Barstring << std::endl; } void Test2() { const std::string Original( "Hello" ); std::string Foostring = Original.c_str(); // Really copy the string. std::string Barstring( "BarST" ); std::cout << "Original:" << Original << " Foostring:" << Foostring << " Barstring:" << Barstring << std::endl; std::ostringstream OS(std::ios::binary); boost::archive::binary_oarchive OA( OS ); OA << Barstring; std::istringstream IS( OS.str(), std::ios::binary); boost::archive::binary_iarchive IA( IS ); IA >> Foostring; std::cout << "Original:" << Original << " Foostring:" << Foostring << " Barstring:" << Barstring << std::endl; } int main() { std::cout << "Test1()" << std::endl; Test1(); std::cout << "Test2()" << std::endl; Test2(); }
Output:
Test1() Original:Hello Foostring:Hello Barstring:BarST Original:BarST Foostring:BarST Barstring:BarST Test2() Original:Hello Foostring:Hello Barstring:BarST Original:Hello Foostring:BarST Barstring:BarST
Change History (8)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Status: | new → assigned |
---|
comment:3 by , 14 years ago
Added the string.hpp and got the same result.
Using g++ version 4.3.1 also got the same result. (don't know if the 4.3.1 stdlib from gcc-snapshot is automatically used when switching the compiler in kdevelop)
Adding a character to "BarST" -> "BarSTx" leads to _no_ overwriting. Removing a character from "BarST" -> "BarS" leads to _no_ overwriting.
So this problem only exists when the _lengths of the strings match_ like they did in my program when the bug first occured to me.
comment:4 by , 14 years ago
I presume that "to _no_ overwriting" refers to an exception.
This looks very much like a bug in the standard library implementation being used. I'll have look at the serialization string implemention. I wouldn't want to have to create a new string on all platforms just to work around this. Keep me posted if you find a good (portable?) fix for this.
Robert Ramey
comment:5 by , 14 years ago
No, there was no exception thrown or anything.
It must have to do something with the way the deserialization writes into that string.
If the lengths match, it doesn't allocate new memory and also doesn't check if the memory referred to belongs to another object and should be copied on write.
Also because the original string was a const, the constness must have been cast away somewhere.
I don't understand this with the actual implementation because of that in string.hpp:
... BOOST_CLASS_IMPLEMENTATION(std::string, boost::serialization::primitive_type) #ifndef BOOST_NO_STD_WSTRING BOOST_CLASS_IMPLEMENTATION(std::wstring, boost::serialization::primitive_type) #endif // left over from a previous incarnation - strings are now always primitive types #if 0 ... old implementation
as far as i came in the code, the old implementation does the reasonable thing (call .clear(), .reserve() and .push_back()).
But i don't understand where the loading with the new implementation is actually happening.
What does that "primitive_type" mean when loading an object? Maybe a string sometimes ain't as primitive as expected.
The obvious fix would be to use the old implementation again, but i don't know if that raises other problems.
Maybe the problem is in the gnu-stl, maybe someone else can reconfirm it with other compilers/platforms because i only have access to the gnu compilers and the error occurs on linux and mingw. But let alone that it happens on multiple versions of gnu there should be great interest of this bug to go away.
comment:6 by , 14 years ago
as far as serialization is concerned std::string is primitive - that is its implemented as code intrinsic to the library - look at the documentation regarding serialization traits. The implementation has been move to text_iarchive_impl.ipp
template<class Archive> BOOST_ARCHIVE_DECL(void) text_iarchive_impl<Archive>::load(std::string &s) { std::size_t size; * this->This() >> size; // skip separating space is.get(); // borland de-allocator fixup #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101)) if(NULL != s.data()) #endif s.resize(size); if(0 < size) is.read(&(*s.begin()), size); }
We ARE breaking the rules here. I think we had a discussion regarding this point some time ago. The concensus was that this hack was worth the risk to improve performance. Performance is a BIG issue - but of course the program has to work - I'll look into what to do about this. My first thought is another hack that it clear out the string if the sizes are the same.
Robert Ramey
comment:7 by , 14 years ago
You're pointing to the right location.
Obviously your code is correct with:
is.read(&(*s.begin()), size);
The call to .begin() causes the string to copy and .data() changes locations.
But in 1.35.0 release this line is executed.
is.read(const_cast<char *>(s.data()), size);
.data() obviously doesn't copy the string and there's also the bad cast.
So the issue is fixed in the next release.
comment:8 by , 14 years ago
Milestone: | Boost 1.35.1 → Boost 1.36.0 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
add
#include <boost/serialization/string.hpp>
and run your test again.
Robert Ramey