Opened 12 years ago

Last modified 12 years ago

#4307 new Patches

basic_oarchive optimization

Reported by: anonymous Owned by: Robert Ramey
Milestone: Boost 1.43.0 Component: serialization
Version: Boost 1.44.0 Severity: Optimization
Keywords: Cc:

Description

attached are patches that optimizes memory allocation and runtime complexity of basic_oarchive. it should not change any serialization behaviour, and archives by the patched basic_oarchive can be read by the unpatched basic_iarchive.

what the patch does in detail:

  • replace log(O) lookup of "cobject" class info for each serialized class with a log(1) vector lookup. avoids container node allocation.
  • replace log(O) lookup of "aobject" object tracking info with log(1) hashing. avoids container node allocation.
  • use allocation pools for "aobject" object tracking info
  • store whether a class was stored as a pointer inside the "cobject" class info, to remove std::set stored_pointers

before:

  • 4 allocations on construction
  • 1 allocation for each serialized class, tracked or not
  • 1 allocation for each tracked object

after:

  • 1 allocation on construction(PIMPL)
  • no allocation for classes if less than 256 classes are serialized. 1 allocation for 512 classes, 2 for 1024, ...
  • no allocation for object tracking if less than 256 objects are serialized. after that, about 1 allocation for 16 tracked objects

Attachments (1)

basic_oarchive_patch.zip (6.0 KB ) - added by anonymous 12 years ago.
patches create boost/archive_patched and libs/serialization/src_patched

Download all attachments as: .zip

Change History (7)

by anonymous, 12 years ago

Attachment: basic_oarchive_patch.zip added

patches create boost/archive_patched and libs/serialization/src_patched

in reply to:  description comment:1 by anonymous, 12 years ago

  • replace O(log N) lookup of "cobject" class info for each serialized class with a O(1) vector lookup. avoids container node allocation.
  • replace O(log N) lookup of "aobject" object tracking info with O(1) hashing. avoids container node allocation.

comment:2 by anonymous, 12 years ago

I've take a very quick look at this.

a) Looks very promising to me. b) TODO - this part needs to be looked at. I made singleton with the idea that it only be called prior to main(..) invocation. During the loading of DLLS this is not true. So in fact the serialization library is only thread safe if one avoids loading/unloading DLLS while simultaneasly saving/loading an archive. This is my compromise to avoid requiring thread syncronization inside the library. c) The above begs the question: what about basic_iarchive? Can it be improved as well? d) What is the status of Autobuffer? is incorporation in to boost expected to be soon?. If not, I would like to see a simple web page documenting it similar to BOOST_SERIALIZATION_STRONG_TYPE, BOOST_SERIALIZATION_STATE_SAVER, etc. If/when it get's into boost, this can be dropped/moved. e) Running all the tests is indispensable for something like this. My procedure is:

1) start with boost release 2) include patches in serialization library 3) cd boostroot/libs/serialization/test 4) ../../../tools/regression/src/library_status.sh (or.bat) toolset=msvc-7.1 &

This should create an html table with all the test results.

f) Finally, it would be interesting if there were some preformance results for the patch. It's pretty obvious to me that this would result in improved performance, so it's not strictly necessary if it turned out to be a lot of work. You might look at boostroot/libs/serialization/performance for an example.

g) it seems you rely on the presumption that every class_id/object_id is assigned in sequence. This seems correct to me. But I'm a little concerned if I had a reason for using std::set in the first place. I don't remember now if I did. You might want to include some "assert" if there is some assumption you're relying on that isn't totally obvious.

Again, looks very promising to me. I can't test it now as I'm bogged down in fixing some extremely arcane issue related to versioning.

Robert Ramey

comment:3 by anonymous, 12 years ago

b) thread synchronization: that's the same compromise the code marked with TODO makes, it only modifies data before main() - TODO can be removed.

c) basic_iarchive: haven't looked at it in detail, but yes. will provide patch as soon as we've figured out what parts of this patch can make it into boost.

d) I don't know the status of AutoBuffer. the supplied archive::detail::auto_buffer only implements a small part of it. I'd rather keep this an implementation detail. fully implementing and documenting it would duplicate Boost.Autobuffer.

e) tests: yep

f) performance test: likewise

g) class_id/object_id in sequence: I don't make the assumption that class_id is allocated in sequence, I assume that it is ok that they are NOT in sequence. so while the unpatched basic_oarchive might have procudec an output like

<class id="1">...</class><class id="2">...</class>

the patched one might produce

<class id="43">...</class><class id="25">...</class>

wrt your concern about std::set: a std::set/std::map is indeed necessary, I've only changed the point of construction of this map. previously, when a class was serialized it was added to the map. patched, every class for which serialization is instantiated is assigned a type_id before main(), which is then used to index a vector.

before: class A{}; class B{}; class C{};

int main(){

ar << c; assigns class id 1 to C

ar << a; assigns class id 2 to A

}

after:

class A{}; class B{}; class C{};

before main: assign class id 1 to A, 2 to B, 3 to C

int main(){

ar << c; serializes c with class id 3

ar << a; serializes a with class id 1

}

the downside of this: if serialization is instantiated for say 10000 classes, and then an archive is used to serialize exactly one object whose type happens to have been assign type_id 9999, a vector<cobject> with size 10000 is initialized.

comment:4 by Robert Ramey, 12 years ago

Does this not presume that all types are exported? The library doesn't currently require this.

Robert Ramey

comment:5 by anonymous, 12 years ago

why? because of instantiation?

the type_id itself can change between saving and loading, e.g. when loading an archive the internal type_id can be 5 while the class_id saved in the archive is 10.

the type_id is only used for internal bookkeeping. before the patch the mapping was

type_info -> cobject{ class_id } (through a std::set keyed on type_info)

after the patch it is:

type_info -> type_id -> cobject{ class_id }

with type_id incidentally used as class_id, because I assumed the order of the class_id doesn't matter. but that can be changed, if you want the class_ids to be assigned sequentially.

I think I'm going to map the type_ids to "cobject" by an intrusive unordered_map, too, though. that is almost as good as vector lookup(almost perfect hash function), and doesn't have the 10000-initialization problem described above.

comment:6 by anonymous, 12 years ago

some simple performance tests:

my focus was on speeding up "archive construction - serialize few or 1 object - destruction", but you're probably most interested in bulk performance into a file, so this is binary_oarchive into a std::ofstream filestream.

numbers are throughput increase, 100% means it's twice as fast.

unique untracked objects: ~ 35% unique tracked objects: ~130 % tracked duplicates (serializing pointers to the same objects over and over again): ~ 30%

"unique tracked objects" is probably closer to the other ones for fewer objects, this was lots of objects so I could measure it in seconds. but it does make sense, as that's the case that has to constantly add to the object tracking set, while the others only read from the data structures, when all classes/objects are registered.

Note: See TracTickets for help on using tickets.