Opened 10 years ago

Closed 10 years ago

#6830 closed Patches (fixed)

make_shared slower than shared_ptr(new) on VC++9 and 10

Reported by: ierceg@…
Owned by: Peter Dimov
Milestone: To Be Determined
Component: smart_ptr
Version: Boost 1.48.0
Severity: Optimization
Keywords: make_shared
Cc:

Description

I created a simple benchmark for measuring raw allocation throughput for three classes of different sizes with a common base class (trivial constructors and destructors). The number of allocations was set to 40,000,000, which gave roughly 10 seconds of running time per test.

It turns out that on VC++9 (release target with default optimizations) boost::make_shared is significantly slower than the boost::shared_ptr(new) idiom. Here's the benchmark output:

TestBoostMakeShared 10.577s 3.78179e+006 allocs/s

TestBoostSharedPtrNew 8.907s 4.49085e+006 allocs/s

As you can see, boost::make_shared is over 15% slower than the boost::shared_ptr(new) idiom (about 16% lower throughput, or about 19% more running time).
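The benchmark source is not attached to this ticket, so the following is only a minimal sketch of the kind of loop described above. It assumes std::shared_ptr and std::make_shared as stand-ins for the Boost equivalents so that it builds without Boost; Base, Medium, Timing, run_test, with_new, with_make_shared and allocs_per_second are hypothetical names, not taken from the original benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical payload types: a common base class and one derived class
// with trivial constructors and destructors.
struct Base { int id; };
struct Medium : Base { char payload[64]; };

struct Timing {
    double seconds;         // wall-clock time for the whole loop
    std::size_t allocated;  // number of non-null pointers produced
};

// Calls make() count times; each pointer is released at the end of its iteration.
template <class Make>
Timing run_test(std::size_t count, Make make) {
    std::size_t allocated = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        std::shared_ptr<Base> p = make();
        if (p) ++allocated;  // observe the result of each allocation
    }
    const auto stop = std::chrono::steady_clock::now();
    return Timing{std::chrono::duration<double>(stop - start).count(), allocated};
}

// The two idioms under comparison (std:: used here in place of boost::).
inline std::shared_ptr<Base> with_new() { return std::shared_ptr<Base>(new Medium()); }
inline std::shared_ptr<Base> with_make_shared() { return std::make_shared<Medium>(); }

// Throughput in the same unit as the output above (allocs/s).
inline double allocs_per_second(const Timing& t) {
    return t.seconds > 0.0 ? static_cast<double>(t.allocated) / t.seconds : 0.0;
}
```

Calling run_test(40000000, with_new) and run_test(40000000, with_make_shared) and printing seconds and allocs_per_second for each would give output in the same shape as the figures above; the actual numbers depend on the compiler, the standard library and the allocator.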

One suggested solution:

boost::shared_ptr has no way to retrieve the deleter without using RTTI, and the RTTI lookup is what slows down execution on VC++9/10. I decided to add such a way and use it from an alternative boost::make_shared. I did the following:

  1. I added a virtual function to detail::sp_counted_base (detail\sp_counted_base_w32.hpp):

    virtual void * get_raw_deleter( ) = 0;

  2. I implemented get_raw_deleter() function in sp_counted_impl_p (detail\sp_counted_impl.hpp):

    virtual void * get_raw_deleter( ) {
        return 0;
    }

  3. I implemented get_raw_deleter() function in sp_counted_impl_pd (detail\sp_counted_impl.hpp):

    virtual void * get_raw_deleter( ) {
        return &reinterpret_cast<char&>( del );
    }

  4. I implemented get_raw_deleter() function in sp_counted_impl_pda (detail\sp_counted_impl.hpp):

    virtual void * get_raw_deleter( ) {
        return &reinterpret_cast<char&>( d_ );
    }

  5. I added the following function to detail::shared_count:

    void * get_raw_deleter( ) const {
        return pi_? pi_->get_raw_deleter( ): 0;
    }

  6. I added the following function to shared_ptr<>:

    void * _internal_get_raw_deleter( ) const {
        return pn.get_raw_deleter( );
    }

  7. I made a separate copy of boost::make_shared function and replaced a single line from:

    boost::detail::sp_ms_deleter< T > * pd = boost::get_deleter< boost::detail::sp_ms_deleter< T > >( pt );

to:

    boost::detail::sp_ms_deleter< T > * pd = static_cast<boost::detail::sp_ms_deleter< T > *>(pt._internal_get_raw_deleter());
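To make the difference between the two lookups concrete, here is a simplified sketch of the idea behind the steps above. It is not the Boost source: counted_base, counted_impl_pd, ms_deleter, typed_get_deleter and raw_get_deleter are hypothetical stand-ins for sp_counted_base, sp_counted_impl_pd, sp_ms_deleter< T >, boost::get_deleter and the proposed raw accessor. It also uses std::addressof where the patch uses the reinterpret_cast<char&> trick, on the assumption that both serve to take the deleter's address.

```cpp
#include <memory>
#include <typeinfo>

// Simplified stand-in for a control block base class (not the real Boost class).
struct counted_base {
    virtual ~counted_base() {}
    // RTTI route: the caller names the deleter type it expects.
    virtual void* get_deleter(const std::type_info& ti) = 0;
    // Proposed route: return the deleter's address with no type check.
    virtual void* get_raw_deleter() = 0;
};

// Control block that stores a deleter of type D.
template <class D>
struct counted_impl_pd : counted_base {
    D del;
    explicit counted_impl_pd(const D& d) : del(d) {}

    void* get_deleter(const std::type_info& ti) override {
        // The type_info comparison is the per-call cost the ticket attributes to RTTI.
        return ti == typeid(D) ? static_cast<void*>(std::addressof(del)) : nullptr;
    }
    void* get_raw_deleter() override {
        return std::addressof(del);
    }
};

// Hypothetical deleter playing the role of sp_ms_deleter< T >.
struct ms_deleter { bool initialized = false; };

// Checked lookup: returns a null pointer when the stored deleter is not a D.
template <class D>
D* typed_get_deleter(counted_base& cb) {
    return static_cast<D*>(cb.get_deleter(typeid(D)));
}

// Unchecked lookup: only valid when the caller already knows the deleter type.
template <class D>
D* raw_get_deleter(counted_base& cb) {
    return static_cast<D*>(cb.get_raw_deleter());
}
```

The unchecked variant looks acceptable inside make_shared because make_shared creates the control block itself and therefore knows the deleter type; for any other caller the static_cast would be unsafe, which is presumably why the accessor carries the _internal_ prefix.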

Benchmarking the results afterwards gave me the following results on VC++9:

TestBoostSharedPtrNew 9.204s 4.34594e+006 allocs/s

TestBoostMakeShared 10.499s 3.80989e+006 allocs/s

TestBoostMakeSharedAlt 7.831s 5.1079e+006 allocs/s

These changes translated into an almost 35% improvement in allocation speed over the current implementation of boost::make_shared. To put it differently, they amount to a 25+% decrease in running time, as we could have supposed from the profiling results.
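For reference, both percentages can be recomputed from the TestBoostMakeShared and TestBoostMakeSharedAlt rows above; the rounding is approximate.

```latex
\frac{5.1079 \times 10^{6}}{3.80989 \times 10^{6}} \approx 1.341
\quad\Rightarrow\quad \text{about } 34\% \text{ more allocations per second}

1 - \frac{7.831\,\mathrm{s}}{10.499\,\mathrm{s}} \approx 1 - 0.746 = 0.254
\quad\Rightarrow\quad \text{about } 25\% \text{ less running time}
```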

Change History (6)

comment:1 by Dave Abrahams, 10 years ago

I don't understand; new is a keyword, and shared_ptr is a class template, so boost::shared_ptr(new) is not legal C++. Could you please attach your benchmarks so we can reproduce your results?

Last edited 10 years ago by Dave Abrahams

comment:2 by ierceg@…, 10 years ago

I meant "boost::shared_ptr(new)" as an idiom and not as literal code, i.e. boost::shared_ptr<int>(new int) vs. boost::make_shared<int>()

comment:3 by ierceg@…, 10 years ago

This is a duplicate of #6829, so I suggest we close it and move the discussion there, especially since the solution proposed there is significantly more efficient.

comment:4 by Peter Dimov, 10 years ago

(In [81860]) Change make_shared to use the new _internal_get_untyped_deleter. Refs #6830.

comment:5 by Peter Dimov, 10 years ago

I was unable to reproduce the timing results, by the way. make_shared was always faster than shared_ptr(new), on both VC++8.0 and 10.0. I haven't tested with 9.0.

comment:6 by Peter Dimov, 10 years ago

Resolution: fixed
Status: new → closed

(In [81899]) Merged revision(s) 81860-81861 from trunk: Change make_shared to use the new _internal_get_untyped_deleter. Fixes #6830. ........ Add allocate_shared_noinit. ........
