Opened 15 years ago

Closed 8 years ago

Last modified 8 years ago

#1418 closed Patches (invalid)

polymorphic archive performance improvement

Reported by: Kim Barrett <kab@…> Owned by: Robert Ramey
Milestone: Boost 1.49.0 Component: serialization
Version: Boost Release Branch Severity: Optimization
Keywords: Saitec Eclipse Cc:

Description

The attached patch (against boost-1.34.1 release) addresses a performance problem when using boost.serialization polymorphic archives.

To simplify the description, we'll only discuss here the output side (i.e. serialization), although everything described here has a symmetrical part on the input (deserialization) side.

When serializing an object, the oserializer class template's save_object_data is called with arguments of a basic_oarchive and a void* referring to the object's data. That function must, among other things, convert the archive argument to the "most specialized" type of the archive, as specified by the oserializer class's Archive template argument. In boost_1.34.1 (and before), this conversion is performed using boost::smart_cast_reference, to convert the basic_oarchive& to an Archive&.

When using a "normal" (non-polymorphic) archive, the Archive type is the most specialized type for the archive object, and that type is (indirectly) derived from basic_oarchive. Thus, the conversion can be safely performed using a downward static_cast. (The smart_cast mechanism uses a checked downward dynamic_cast when compiled in debug mode rather than release mode.)

When using a polymorphic archive, the Archive type is polymorphic_oarchive (or perhaps a more specialized variant of polymorphic_oarchive). The polymorphic_oarchive class is not in a base/derived relationship with basic_oarchive, in either direction. Thus, the smart_cast-based conversion always uses a dynamic_cast, and that dynamic_cast is in fact a cross-cast.

Note that all conversions involving a given archive instance are to the same Archive type. In the non-polymorphic case, Archive must always be the most specialized type of the archive. In the polymorphic case, Archive must always be polymorphic_oarchive, else there would be little point to using a polymorphic archive.

Note that this archive conversion must be performed for each subobject of each object that is saved, recursively down through the subobjects, which means that it gets performed a lot.

For non-polymorphic archives, there is no problem, since a downward static_cast is cheap.

For polymorphic archives though, there is a serious problem, because dynamic_cast in general and especially cross-casts may be very expensive on some platforms. Our measurements have shown that in boost-1.34.1 the performance of serialization with polymorphic archives is completely dominated by this conversion on some platforms (gcc3.x), and strongly impacted by it on others (gcc4.x). Separate measurements seem to indicate that Windows compilers may be closer to gcc3.x than gcc4.x in this respect.

This patch introduces a new operation, archive_cast, which is used by the serializers to perform this conversion. archive_cast uses the preexisting smart_cast-based conversion when the Archive type is derived from the source type. Otherwise, it performs the expensive dynamic cast once and then caches the result in the archive for later use. The result is a dramatic speedup of polymorphic archive serialization on gcc3.x platforms and a significant speedup on gcc4.x platforms.

This patch is against the boost-1.34.1 release, and has only been tested against that release. There might be some adjustments required in order to apply it to the current trunk, though it looks like it shouldn't require any major changes to the patch.

This patch has only been tested by us on gcc3.4 and gcc4.1. Though it doesn't do anything that appears all that complex in the way of templates, there is still a possibility that it might run afoul of some compiler limitation on some platform presently supported by the serialization library. It does use partial class template specialization over a non-type template parameter. According to boost/config, some very old compilers don't support partial specialization of class templates. A reasonable fallback in such a case, if some workaround isn't available, is to just use the preexisting smart_cast-based conversion, i.e. have archive_cast always use smart_cast_reference and continue to have the performance issue.

Because this patch has only been tested by us on gcc-based platforms, there is a pretty good chance that some of the Windows declspec-related stuff is wrong in the patch. We've made a good-faith but largely uninformed attempt at dealing with that, so beware.

Attachments (6)

kab-071107-cross-cast.patch (16.9 KB ) - added by Kim Barrett <kab@…> 15 years ago.
kab-071225-measure.cpp (4.4 KB ) - added by Kim Barrett <kab@…> 15 years ago.
kab-080910-cross-cast.patch (17.3 KB ) - added by kab@… 14 years ago.
updated patch fixing inverted assertions tests. still against boost 1.34.1
archive-cast.patch.boost_1_42_0 (16.7 KB ) - added by kab@… 13 years ago.
update patch for boost 1.42
measure-archive-cast.cpp.boost_1_42_0 (4.4 KB ) - added by kab@… 13 years ago.
update test program for boost 1.42
archive-cast.patch.boost_1_43_0 (16.6 KB ) - added by kab@… 12 years ago.
update patch for boost 1.43 - no real changes from 1.42, just regenerated so it applies cleanly

Download all attachments as: .zip

Change History (19)

by Kim Barrett <kab@…>, 15 years ago

Attachment: kab-071107-cross-cast.patch added

comment:1 by Robert Ramey, 15 years ago

Status: newassigned

This is very interesting. I'm impressed with your understanding of the implementation of the library. I would be curious to see some tests which show how much time is consumed in the "smart_cast" and how much improvement is obtained. I'm also curious as to how common/popular the polymorphic archive actually is.

I'm still digesting this for now.

Robert Ramey

comment:2 by Kim Barrett <kab@…>, 15 years ago

I don't presently have a standalone performance test. I'll try to make one soon, but am in the midst of a release crunch at work right now, so it might be a few weeks. This issue was discovered when we measured (using callgrind) an application that was badly blowing its time budget, and were astonished to discover that more than 1/3 of the entire application time was being spent in the dynamic_cast in question (and this application does a lot more than just serialization).

I can't provide any insight on how much use polymorphic archives are getting. We're presently using them to avoid instantiating serialization code for 3 different archive classes, though we're also looking at revisiting that decision and more carefully investigating the space/time tradeoffs and their impact on us. That investigation likely won't happen real soon though.

by Kim Barrett <kab@…>, 15 years ago

Attachment: kab-071225-measure.cpp added

comment:3 by Kim Barrett <kab@…>, 15 years ago

I've added an example program for use in comparing the performance with and without the patch. With gcc4.1 about 25% of the unpatched time is spent in the cross-cast. I don't have access to a gcc3.x system at the moment to measure there, but expect the results on such a platform to be *much* worse. I'll be interested to hear what the difference is on a Windows platform.

comment:4 by kab@…, 14 years ago

In kab-071107-cross-cast.patch there are two occurrences of a backward test in an assertion. Mostly we use boost built in release mode, so only recently tripped over this. The change is in basic_iarchive.cpp and basic_oarchive.cpp, replacing

assert(caster != ar.pimpl->m_cached_archive_caster);

with

assert(caster == ar.pimpl->m_cached_archive_caster);

I'm also uploading a fresh patch with this change. Someday soon I hope that we'll be upgrading to boost 1.36, in which case I'll supply a fresh patch if this issue has not yet been addressed.

by kab@…, 14 years ago

Attachment: kab-080910-cross-cast.patch added

updated patch fixing inverted assertions tests. still against boost 1.34.1

comment:5 by pharmaceuticals Cialis, 13 years ago

I would like to electrocute everyone who uses the word 'fair' in connection with income tax policies.

-- William F. Buckley

viagra on line fioricet uk order cialis NkgjzCH tramadol com

by kab@…, 13 years ago

update patch for boost 1.42

by kab@…, 13 years ago

update test program for boost 1.42

comment:6 by kab@…, 13 years ago

I've just added updated versions of the patch and the test program. The patch is against boost 1.42, which required some changes to account for some refactoring in the serialization library. I think the only change to the test program was a minor fix to the g++ build instructions.

We're still using this technique, and the test program shows that it still makes a fairly significant difference, even on fairly recent versions of g++. I haven't tested the patch on windows at all, not having ready access to such.

by kab@…, 12 years ago

update patch for boost 1.43 - no real changes from 1.42, just regenerated so it applies cleanly

comment:7 by Kim Barrett <kab@…>, 12 years ago

The following discussion is with respect to boost 1.43. I haven't checked whether there are any relevant changes in more recent versions of boost.

The proposed implementation of archive_cast involves some relatively intrusive changes to base_iarchive and base_oarchive to record the cached archive caster and cached archive pointer. The reason for this is that there is no common base class between polymorphic_[i,o]archive and the non-polymorphic-archive implementation archives.

polymorphic_iarchive_impl derives only from detail::interface_iarchive<polymorphic_iarchive>.

polymorphic_iarchive derives only from polymorphic_iarchive_impl.

binary_iarchive derives from binary_iarchive_impl<binary_iarchive, ...>.

binary_iarchive_impl<Archive, ...> derives from basic_binary_iprimitive<Archive, ...> and basic_binary_iarchive<Archive>.

basic_binary_iarchive<Archive> derives from detail::common_iarchive<Archive>.

detail::common_iarchive<Archive> derives from detail::basic_iarchive and from detail::interface_iarchive<Archive>.

So there are no common base classes between polymorphic_iarchive and binary_iarchive.

If there were such a common base class (which would have to be a virtual base class on both inheritance branches), then the archive_cast conversion could consist of implicit up-conversion from the non-polymorphic-archive implementation class to that common base class, and then a static (or dynamic when debugging) down-cast to the polymorphic-archive class.

The obvious place to put such a common base class would be to add a virtual non-template base class to detail::interface_iarchive<>. This would make detail::interface_iarchive polymorphic (as in boost::is_polymorphic<>, not to be confused with the polymorphic-archive classes), which it presently isn't, so there is some cost there.

[Note that the detail::interface_iarchive<> non-virtual default destructor is presently public, which is probably a mistake. And the default public copy constructor and assignment operator are also present here, which also seems like a mistake. I think a polymorphic_iarchive can improperly be copied or assigned.]

A different attachment point for such a common virtual base class would be basic_iarchive and polymorphic_iarchive_impl, both of which are already polymorphic.

If it is a base class for detail::interface_iarchive<> then the obvious name is detail::basic_interface_iarchive, and it provides no functionality (beyond non-public default constructor, non-public destructor, and suppressed copy constructor and assignment operator). Not sure what the name should be if the attachment point is at the polymorphic_iarchive_impl and base_iarchive layer. Also likely requires more documentation there. Presently, all archives derive from detail::interface_iarchive<>, though perhaps that really is an implementation detail.

comment:8 by fgahek, 10 years ago

Keywords: Saitec Eclipse added
Milestone: To Be DeterminedBoost 1.49.0
Version: Boost 1.34.1Boost Release Branch

comment:9 by Robert Ramey, 8 years ago

"and then a static (or dynamic when debugging) down-cast to the polymorphic-archive class."

a static downcast from a virtual base is not permitted. see N3797 paragraph 5.2.9(2)

comment:10 by Robert Ramey, 8 years ago

Resolution: invalid
Status: assignedclosed

in reply to:  9 ; comment:11 by anonymous, 8 years ago

Replying to ramey:

"and then a static (or dynamic when debugging) down-cast to the polymorphic-archive class."

a static downcast from a virtual base is not permitted. see N3797 paragraph 5.2.9(2)

Oops. That does potentially kill off that suggested alternative that could be used if there were a common base class. But since there isn't such a base class, it's pretty much moot anyway, just meaning there may not be a reason to add such a base class in order to support this alternative implementation. It could be that adding such a base class and using a dynamic cast would be an improvement on the current implementation. My recollection of my testing way back then indicated that it was cross-casts that were very expensive, and down-casts not so bad. But that was only with (now) fairly old versions of gcc; newer versions of gcc or other compilers might be significantly different.

in reply to:  10 comment:12 by kab@…, 8 years ago

Replying to ramey:

I don't understand why this was set invalid? The problem noted in comment 9 does not exist in the current implementation of the patch, only in a suggested possible alternative.

in reply to:  11 comment:13 by kab@…, 8 years ago

Replying to anonymous:

Replying to ramey:

"and then a static (or dynamic when debugging) down-cast to the polymorphic-archive class."

a static downcast from a virtual base is not permitted. see N3797 paragraph 5.2.9(2)

Oops. [...]

That reply was me; forgot to set my email when submitting the reply.

Note: See TracTickets for help on using tickets.