Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#10740 closed Bugs (invalid)

Multi-level containers do not cooperate with address tracking

Reported by: Simon Etter <ettersi@…> Owned by: Robert Ramey
Milestone: To Be Determined Component: serialization
Version: Boost 1.56.0 Severity: Problem
Keywords: Address tracking, STL containers Cc:

Description

Serializing nested sequence containers does not track the object addresses correctly. See the attached example.

Non-sequence containers may also be subject to this bug, I did not check them.

The problem is the push_back in collections_load_imp.hpp:65. If t is itself a container, all elements of t get copied, but their address is not updated in the archive. In C++11, using emplace() and std::move() should solve the problem (did not test this).

Attachments (4)

bug.cpp (1.8 KB ) - added by Simon Etter <ettersi@…> 8 years ago.
Minimal example
bug.2.cpp (2.5 KB ) - added by Simon Etter <ettersi@…> 8 years ago.
bug.cpp with more comments
test_z.cpp (1.5 KB ) - added by Robert Ramey 8 years ago.
altered version of bug.cpp
vector.hpp (1.3 KB ) - added by Simon Etter <ettersi@…> 8 years ago.
Partial fix.

Download all attachments as: .zip

Change History (21)

by Simon Etter <ettersi@…>, 8 years ago

Attachment: bug.cpp added

Minimal example

comment:1 by Robert Ramey, 8 years ago

Resolution: invalid
Status: newclosed

I looked at your example and I think it's wrong. I think that you've assumed incorrectly that the lower level container will be reloaded to the original address. There is no reason to believe this. I've modified you program to check the contents of the lower level container and verify that they have been restored correctly

If you're not convinced, feel free to look into this and make another small example

by Simon Etter <ettersi@…>, 8 years ago

Attachment: bug.2.cpp added

bug.cpp with more comments

comment:2 by Simon Etter <ettersi@…>, 8 years ago

I am not convinced, but I most admit my initial bug report did not point out the problem very clearly. I apologize for that.

I uploaded a new version of the minimal non-working example which I commented more extensively. I hope this makes it clear where I see the problem.

I am aware that stored and reloaded objects may be located at different memory addresses. To my understanding, Boost.Serialization wishes to maintain the invariant that if a pointer pointed to an object before the serialization/deserialization process, it also does so afterwards. My example shows a setting where this is not the case.

comment:3 by Robert Ramey, 8 years ago

"To my understanding, Boost.Serialization wishes to maintain the invariant that if a pointer pointed to an object before the serialization/deserialization process, it also does so afterwards."

nope. loading an object serialized through a pointer creates a NEW pointer which is "equivalent" to the old one. It's equivalent in that it points to a new and disjoint versions of the original.

       // Again, we would like pd to point to the dummy object, independent
       // of where it is loaded. This is not the case, however.
       assert(&l.back().back() == pd); // Does fire!!!	

Nope: we expect pd to point to some new object - disjoint but equivalent to the original one.

        // Again, let pd point to the dummy object
        dummy* pd = &l.back().back();

        // Same as before...
        {
            std::ofstream ofs("two_level.xml");
            boost::archive::xml_oarchive oa(ofs);
            oa << BOOST_SERIALIZATION_NVP(l) << BOOST_SERIALIZATION_NVP(pd);
        }
        dummy* pd1;
        {
            std::ifstream ifs("two_level.xml");
            boost::archive::xml_iarchive ia(ifs);
            ia >> BOOST_SERIALIZATION_NVP(l) >> BOOST_SERIALIZATION_NVP(pd1);
        }
        assert(*pd == *pd1); // should be true

comment:4 by Robert Ramey, 8 years ago

Actually I just did this and my assertion fails. This would suggest a failure to track individual members of the array. I'll look into this

comment:5 by Simon Etter <ettersi@…>, 8 years ago

I don't understand your posted code snippet. The second line lets pd point to some element in l. On the third line from below, you read a new vector into l, which calls clear() on l (see collections_load_imp.hpp : 140) and therefore invalidates pd. On the last line, you nevertheless dereference pd. This is undefined behaviour.

I think we are still talking past each other. Let me describe the situation once more for the one-level case. Assume oa is any output archive, and l and pd are defined and initialized as follows:

std::vector<dummy> l(1);
dummy* pd = &l.back();

I call d the object of type dummy which is located at the address &l.back(). I emphasize that the type and address of d are the only relevant properties here. We first serialize l:

oa << l;

The implementation of serialization for std::vector<> calls serialize on every element of l, thus also on d. Next, we serialize pd.

oa << pd;

According to http://www.boost.org/doc/libs/1_57_0/libs/serialization/doc/serialization.html#pointeroperators, the serialization code checks whether an object of type dummy at address &d has already been serialized. Since d was already serialized as an element of l, this is indeed the case. Thus, we only store some special tag for pd, no actual object information.

Next, we create some input archive ia which reads the same file as oa wrote to. We first deserialize the l from above, which we now call l_:

std::vector<dummy> l_;
ia >> l_;

This creates a new object of type dummy at some arbitrary address which we can get through &l_.back(). We define d_ to be this object identified by the combination of type and address. When deserializing the pd from above into pd_,

dummy* pd_;
ia >> pd_;

we encounter the special tag that we wrote to the archive instead of the proper object. By the same section of the documentation as mentioned above, this should allow the serialization library to recognize that it does not need to create a new object but rather let pd_ refer to d_. We check this through the following assert:

assert(pd_ = &l_.back());

As mentioned, for a single level of std::vector, this indeed works.

The key point is that oa << l; and oa << pd try to serialize an object of the same type at the same address. According to the documentation on serializing pointers, the library detects this situation and stores only one object. But exactly the same situations occurs for multilevel containers! From the user's perspective, there is thus no reason to expect this situation not to work.


In the remainder, I'll try to explain why in fact it does not work with the current implementation. The problem is on lines 61 to 67 in collections_load_imp.hpp, and the line numbers below refer to this section. Assume we serialized by running the following code

std::vector<std::vector<dummy>> l(1);
oa << l;

and are now about to deserialize this object. The lines

std::vector<std::vector<dummy>> l_;
ia >> l_;

cause the following to happen. We read that l contained a single element. We therefore create a temporary (line 62), which we call tll_, and deserialize this single element into it (line 64). Now tll_ is a std::vector<dummy> of length 1. Next, we call l_.push_back(tll_) (line 65). Since we would like future pointers to this "logical" object to point to &l_.back() and not &tll_, we tell the new address to the archive by calling reset_pointer_address(). At this point, the error happens: We would also have to tell the archive that we want future pointers to reference &l_.back().back() and not &tll_.back(). We cannot do this, however, because at this point in the code we don't know that tll_ (which corresponds to the variable s in the actual code) is in fact a vector. In conclusion, I thus know where the bug is (and I am pretty sure it is in fact a bug), but I don't know how to solve it.

by Robert Ramey, 8 years ago

Attachment: test_z.cpp added

altered version of bug.cpp

comment:6 by Robert Ramey, 8 years ago

OK - I got the code snippet wrong. I was disturbed by your example re-using the variable "l" which introduced other possibilities. I've pared your example down to the simplest one that I think should demonstrate any failure and uploaded it under the same name as before test_z.cpp.

I used the simpler text_?archive I separated the output variables l and p from l1 and p1 I just included the failing case.

I run this program and sometimes it asserts and sometimes it doesn't. This has to be a bug - and one that is going to be damn difficult to find. Is this equivalent to your bug?

comment:7 by Robert Ramey, 8 years ago

" Since we would like future pointers to this "logical" object to point to &l_.back() and not &tll_, we tell the new address to the archive by calling reset_pointer_address(). At this point, the error happens: We would also have to tell the archive that we want future pointers to reference &l_.back().back() and not &tll_.back(). We cannot do this, however, because at this point in the code we don't know that tll_ (which corresponds to the variable s in the actual code) is in fact a vector."

Not quite that simple

given: std::vector<dummy> t1; std::vector<std::vector<dummy>> t2(t1)

ia >> t2

for each t in t2

ia >> t

for each x in t

ia >> x resetobjectaddress (x) moves pointer to everythingf t1

resetobjectaddress(t) moves pointers to everything under t2

note that the addresses in t1 get moved twice ! std::vector<dummy> t

So this is pretty subtle - the fact that sometimes it works and sometimes it doesn't means it's going to be a bitch to find.

in reply to:  6 comment:8 by Simon Etter <ettersi@…>, 8 years ago

Regarding comment:6: Let's say it is a consequence of my bug. The point is that pd1 does not point to &l1[0][0]. Rather it points to what I called &tll_.back(). Since tll_ was only a temporary and is deleted by now, dereferencing pd1 results once again in undefined behaviour.

On your machine, it seems that sometimes your "lucky" and l1[0][0] gets allocated at the location where tll_[0] was, which is the one pd1 points to. On my machine, I am not that lucky and the assert fires every time. So if you want to do debugging, I suggest you look for a machine that behaves as mine. I use g++ version 4.8.1 with the --std=c++11 switch running on a Ubuntu 14.04. If you need further information, you know where to find me :-)

Yes, the addresses of the dummy objects (x, in your pseudocode) get moved twice. But the first move is covered by the resetobjectaddress(x), so this does not cause a problem. The problem is the second move, for which you only call resetobjectaddress(t), but you should also be calling

for each x in t
    resetobjectaddress(x)

by Simon Etter <ettersi@…>, 8 years ago

Attachment: vector.hpp added

Partial fix.

comment:9 by Simon Etter <ettersi@…>, 8 years ago

I uploaded a fix for std::vector. If I include this header instead of the Boost.Serialization code for std::vector, test_z.cpp runs without problems.

The fix could be easily incorporated into collections_load_imp.hpp. The only problem is that it requires C++11 emplace and move semantics.

comment:10 by Robert Ramey, 8 years ago

OK - I'm pretty sure I've definitively fixed this for non-indexed containers - vector, list, etc. A fix for indexed containers is still pending and would be trickier.

comment:11 by Simon Etter, 8 years ago

Thanks a lot for your work! I had a look at your changes to std::vector deserialization and agree that this fixes the problem. However, the use of std::vector::resize() (vector.hpp:88) breaks support for non-default-constructible types, doesn't it?

comment:12 by Robert Ramey, 8 years ago

Damn!

This was probably why I implemented it the way I did originally. When I made this fix, I use presumed that the the type T in vector<T> had to be default constructible as a requirement. This turned out to be wrong - the requirement is CopyConststructable. But it's worse, the requirements for c++11 are different. In fact, the requirements vary by function so it's even trickier to get right.

To get this really right the following needs to be considered: a) Go back to the previous method - but use move and/or swap to place the item in the array. move is only available with C++11 though there is a boost move which works with previous versions but it may require some more declarations so I'm not sure it would work. b) use the current method and live with the new restriction. Or possible use boost::has_trivial_constructor to condition which method is used. c) one has to be sure that it's as fast as possible. The reason for this isn't performance, it's that some bonehead will (and have) made a "benchmark" which makes a really simple case then uses it to make a very broad statement that the serialization library is really slow. d) I would like to keep a c++03 implementation available - conditioned on a config flag.

For now I'm setting this aside - but you're free to invest some effort in it if you want. I'm happy to incorporate improvements from other authors. But note that all the above has to be addressed - not just a fix for a specific case.

Robert Ramey

comment:13 by anonymous, 8 years ago

uh oh - your comment made me realize that I created a huge blunder. I went and looked this up in the code. It turns out that that original implementation used stack_construct which in turn used load_construct_data which permitted non-default constructors. So the current implementation does use this - so it's likely breaking reads on existing archives !!! So This can't be ignored. I'll see what I can do about it. As usual, making any kind of change turns out to have a lot more implications than first meet the eye. The difficulty and effort required to write and maintain this library are way under-estimated.

Robert Ramey

comment:14 by indrek@…, 7 years ago

I just wasted an entire day of my life with this bug. Why has it not been fixed? Just use this patch. It fixes the problem, makes deserialize faster, and should not break anything existing:

--- original.collections_load_imp.hpp   2015-09-23 16:17:14.772000000 +0300
+++ collections_load_imp.hpp    2015-09-23 16:26:08.396000000 +0300
@@ -60,10 +60,9 @@
     ){
         typedef BOOST_DEDUCED_TYPENAME Container::value_type type;
         detail::stack_construct<Archive, type> t(ar, v);
-        // borland fails silently w/o full namespace
-        ar >> boost::serialization::make_nvp("item", t.reference());
         s.push_back(t.reference());
-        ar.reset_object_address(& s.back() , & t.reference());
+        // borland fails silently w/o full namespace
+        ar >> boost::serialization::make_nvp("item", s.back());
         return hint;
     }
 };

comment:15 by indrek@…, 7 years ago

Resolution: invalid
Status: closedreopened

Reopening. The fix seems really simple and straightforward. So why not?

in reply to:  15 comment:16 by Robert Ramey, 7 years ago

Resolution: invalid
Status: reopenedclosed

Replying to indrek@…:

Reopening. The fix seems really simple and straightforward. So why not?

Well, it might look simple and straight forward to you - but not to me.

Since version 1.56, this file has undergone some changes so I don't think that your patch can be applied. I think these changes were actually to address this specific problem. So if you try the latest version you might have better luck.

Robert Ramey

comment:17 by indrek@…, 7 years ago

I clicked on the top left corner of this site, saw "The development branch is available at http://svn.boost.org/svn/boost/trunk", got the development branch of boost, and it was still broken there. Apparently I got the wrong development branch. Would have helped if there was a note that it is no longer used -- I did try my best to report on the current version. I confirm it has been fixed in 1.58.

Note: See TracTickets for help on using tickets.