id summary reporter owner description type status milestone component version severity resolution keywords cc
1418 polymorphic archive performance improvement Kim Barrett Robert Ramey "The attached patch (against the boost-1.34.1 release) addresses a performance problem when using boost.serialization polymorphic archives. To simplify the description, we discuss only the output (serialization) side here, although everything described has a symmetrical counterpart on the input (deserialization) side.

When an object is serialized, the oserializer class template's save_object_data is called with a reference to a basic_oarchive and a void* pointing to the object's data. Among other things, that function must convert the archive argument to the ""most specialized"" type of the archive, as given by the oserializer class's Archive template argument. In boost-1.34.1 (and earlier), this conversion is performed with boost::smart_cast_reference, which converts the basic_oarchive& to an Archive&.

When using a ""normal"" (non-polymorphic) archive, the Archive type is the most specialized type of the archive object, and that type is (indirectly) derived from basic_oarchive. The conversion can therefore be performed safely with a downward static_cast. (In debug builds, smart_cast instead uses a checked downward dynamic_cast.)

When using a polymorphic archive, the Archive type is polymorphic_oarchive (or perhaps a more specialized variant of it). The polymorphic_oarchive class is not in a base/derived relationship with basic_oarchive in either direction. The smart_cast-based conversion therefore always uses a dynamic_cast, and that dynamic_cast is in fact a cross-cast.

Note that all conversions involving a given archive instance are to the same Archive type. In the non-polymorphic case, Archive must always be the most specialized type of the archive.
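To make the two cases concrete, here is a minimal sketch using simplified, hypothetical stand-ins for the Boost classes. Only the inheritance relationships described above are modeled; polymorphic_text_oarchive, text_oarchive_impl and the to_text, to_polymorphic and casts_agree helpers are illustrative assumptions, not the real library code. The non-polymorphic conversion is a downward static_cast, while the polymorphic one has to be a dynamic_cast cross-cast, because the target type is neither a base nor a derived class of basic_oarchive.

```cpp
#include <cassert>

// Hypothetical stand-ins for the boost.serialization classes; they only
// model the inheritance relationships described in the ticket.
struct basic_oarchive {
    virtual ~basic_oarchive() {}
};

// Non-polymorphic case: the most specialized archive type derives
// (indirectly) from basic_oarchive.
struct text_oarchive_impl : basic_oarchive {};
struct text_oarchive : text_oarchive_impl {};

// Polymorphic case: polymorphic_oarchive is unrelated to basic_oarchive.
struct polymorphic_oarchive {
    virtual ~polymorphic_oarchive() {}
};

// Assumed shape of a concrete polymorphic archive: it derives from both
// branches, so the two interfaces meet only in the most-derived object.
struct polymorphic_text_oarchive : polymorphic_oarchive, text_oarchive_impl {};

// Downcast: Archive derives from basic_oarchive, so static_cast is valid
// and costs at most a constant pointer adjustment.
text_oarchive & to_text(basic_oarchive & ar) {
    return static_cast<text_oarchive &>(ar);
}

// Cross-cast: no base/derived relationship exists, so only dynamic_cast
// can perform the conversion, using run-time type information.
polymorphic_oarchive & to_polymorphic(basic_oarchive & ar) {
    return dynamic_cast<polymorphic_oarchive &>(ar);
}

// Checks that both conversions land on the expected subobjects.
bool casts_agree() {
    text_oarchive t;
    basic_oarchive & bt = t;
    polymorphic_text_oarchive p;
    basic_oarchive & bp = p;
    return &to_text(bt) == &t
        && &to_polymorphic(bp) == static_cast<polymorphic_oarchive *>(&p);
}
```

The cheap static_cast path is only possible because the compiler can see the inheritance path at compile time; the cross-cast has to search the object's run-time type information on every call.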
In the polymorphic case, Archive must always be polymorphic_oarchive; otherwise there would be little point in using a polymorphic archive.

This archive conversion must be performed for each subobject of each object that is saved, recursively down through the subobjects, so it is performed very frequently. For non-polymorphic archives this is not a problem, since a downward static_cast is cheap. For polymorphic archives, however, it is a serious problem, because dynamic_cast in general, and cross-casts in particular, can be very expensive on some platforms. Our measurements show that in boost-1.34.1 the performance of serialization with polymorphic archives is completely dominated by this conversion on some platforms (gcc3.x) and strongly affected by it on others (gcc4.x). Separate measurements suggest that Windows compilers may be closer to gcc3.x than to gcc4.x in this respect.

This patch introduces a new operation, archive_cast, which the serializers use to perform this conversion. archive_cast uses the preexisting smart_cast-based conversion when the Archive type is derived from the source type. Otherwise, it performs the expensive dynamic_cast once and caches the result in the archive for later use. The result is a dramatic speedup of polymorphic archive serialization on gcc3.x platforms and a significant speedup on gcc4.x platforms.

This patch is against the boost-1.34.1 release and has only been tested against that release. Some adjustments might be required to apply it to the current trunk, although we don't expect it to need any major changes. We have only tested it on gcc3.4 and gcc4.1. Although its use of templates is not especially complex, it might still run afoul of some compiler limitation on a platform currently supported by the serialization library.
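A minimal sketch of the archive_cast idea, using partial class template specialization over a bool non-type parameter to choose between the cheap static_cast path and a cached cross-cast. This is not the code from the attached patch: the cached_archive member, the archive_caster helper, the dynamic_cast_count counter, the caching_works check and the use of std::is_base_of (standing in for whatever trait the patch uses) are all hypothetical, and the classes are simplified stand-ins for the Boost ones. A single untyped cache slot is enough only because every conversion for a given archive instance targets the same Archive type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; names mirror the ticket, not the real Boost code.
struct basic_oarchive {
    // Hypothetical cache slot. One slot suffices only because all
    // conversions for a given archive instance target the same Archive type.
    void * cached_archive = nullptr;
    virtual ~basic_oarchive() {}
};
struct polymorphic_oarchive { virtual ~polymorphic_oarchive() {} };
struct text_oarchive : basic_oarchive {};
struct polymorphic_text_oarchive : polymorphic_oarchive, basic_oarchive {};

// Counts the expensive casts so that the caching can be observed.
int dynamic_cast_count = 0;

// Primary template, selected on whether Archive derives from basic_oarchive.
template <class Archive, bool IsDerived>
struct archive_caster;

// Archive derives from basic_oarchive: a downward static_cast is enough
// (corresponds to the preexisting smart_cast path).
template <class Archive>
struct archive_caster<Archive, true> {
    static Archive & cast(basic_oarchive & ar) {
        return static_cast<Archive &>(ar);
    }
};

// Unrelated types: perform the cross-cast once, then reuse the cached result.
template <class Archive>
struct archive_caster<Archive, false> {
    static Archive & cast(basic_oarchive & ar) {
        if (ar.cached_archive == nullptr) {
            ++dynamic_cast_count;
            ar.cached_archive = &dynamic_cast<Archive &>(ar);
        }
        return *static_cast<Archive *>(ar.cached_archive);
    }
};

template <class Archive>
Archive & archive_cast(basic_oarchive & ar) {
    return archive_caster<
        Archive, std::is_base_of<basic_oarchive, Archive>::value>::cast(ar);
}

// Exercises both paths and checks that the second cross-cast hits the cache.
bool caching_works() {
    text_oarchive t;
    text_oarchive & tt = archive_cast<text_oarchive>(t);
    polymorphic_text_oarchive p;
    basic_oarchive & bp = p;
    int before = dynamic_cast_count;
    polymorphic_oarchive & a = archive_cast<polymorphic_oarchive>(bp);
    polymorphic_oarchive & b = archive_cast<polymorphic_oarchive>(bp);
    return &tt == &t
        && &a == &b
        && &a == static_cast<polymorphic_oarchive *>(&p)
        && dynamic_cast_count - before == 1;
}
```

In this sketch the second archive_cast call on the polymorphic archive returns the cached pointer without another dynamic_cast, so the counter advances by exactly one per archive instance. Dispatching through a class template rather than a runtime if is what lets the static_cast branch exist at all: static_cast between unrelated types is ill-formed, so that branch must never be instantiated for polymorphic_oarchive.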
The patch does use partial class template specialization over a non-type template parameter, and according to boost/config, some very old compilers don't support partial specialization of class templates. If no workaround is available for such a compiler, a reasonable fallback is simply to use the preexisting smart_cast-based conversion, i.e. have archive_cast always use smart_cast_reference and accept the performance issue.

Because we have only tested this patch on gcc-based platforms, there is a good chance that some of the Windows declspec-related code in it is wrong. We've made a good-faith but largely uninformed attempt at handling it, so beware. " Patches closed Boost 1.49.0 serialization Boost Release Branch Optimization invalid Saitec Eclipse