Opened 13 years ago

Last modified 13 years ago

#3408 assigned Tasks

gamma distribution's quantile performance

Reported by: chhenning@… Owned by: John Maddock
Milestone: Boost 1.41.0 Component: math
Version: Boost 1.40.0 Severity: Optimization
Keywords: Cc:

Description

Hi there, I have created a test which times the performance of boost::math::quantile( ... ) when using a gamma distribution. I ran it against source code we have used here at work for ages. It can be found here:

http://people.sc.fsu.edu/~burkardt/cpp_src/dcdflib/dcdflib.html

The old source code is about 2x faster than the boost version.

MS Visual Studio 2005: boost: 35.4 sec  att_bell: 19 sec

Intel 11.1: boost: 21.4 sec  att_bell: 11.2 sec

The question is whether there is a way to incorporate such a function into boost::math. As far as I can tell, the results are almost identical.

Here is the code:

#include <dcdflib.h>

#include <boost/math/distributions/gamma.hpp>
#include <boost/timer.hpp>

#include <cstddef>
#include <cstdlib>
#include <iostream>

double min_mean = 2000;               // 2,000
double max_mean = 500000000;          // 500,000,000

double min_std  = 10000;              // 10,000
double max_std  = 100000000;          // 100,000,000

double min_max  = 600000000;          // 600,000,000
double max_max  = 1000000000;         // 1,000,000,000

const std::size_t max_year = 5000000; // 5,000,000

const double right = 0.999;
const double left  = 0.001;

inline double get_rand()
{
    return static_cast< double >( std::rand() )
         / static_cast< double >( RAND_MAX );
}

inline void boost_( boost::math::gamma_distribution<>& d, double q )
{
    double value = boost::math::quantile( d, q );
}

inline void att_bell( double alpha, double beta, double q )
{
    double q_Minus1 = 1 - q;
    double value    = 0.0;
    double bound    = 0.0;

    int which  = 2;
    int status = 0;

    cdfgam( &which, &q, &q_Minus1, &value, &alpha, &beta, &status, &bound );
}

int main()
{
    // boost
    {
        std::srand( 0 );

        boost::timer timer;
        for( std::size_t y = 0; y < max_year; ++y )
        {
            if(( y % 100000 ) == 0 )
                std::cout << y << std::endl;

            double mean = get_rand() * ( max_mean - min_mean ) + min_mean;
            double std  = get_rand() * ( max_std  - min_std  ) + min_std;

            double alpha = mean * mean / std / std; // shape parameter
            double beta  = mean / alpha;            // scale parameter

            boost::math::gamma_distribution<> d( alpha, beta );
            boost_( d, right );
            boost_( d, left );
        }

        std::cout << "Boost - Time elapsed: " << timer.elapsed() << " sec" << std::endl;
    }

    // att bell
    {
        std::srand( 0 );

        boost::timer timer;
        for( std::size_t y = 0; y < max_year; ++y )
        {
            if(( y % 100000 ) == 0 )
                std::cout << y << std::endl;

            double mean = get_rand() * ( max_mean - min_mean ) + min_mean;
            double std  = get_rand() * ( max_std  - min_std  ) + min_std;

            double alpha = mean * mean / std / std; // shape parameter
            double beta  = mean / alpha;            // scale parameter

            att_bell( alpha, beta, right );
            att_bell( alpha, beta, left );
        }

        std::cout << "ATT Bell - Time elapsed: " << timer.elapsed() << " sec" << std::endl;
    }

    return 0;
}

Change History (11)

comment:1 by Steven Watanabe, 13 years ago

This doesn't seem to affect the timings for me, but you're not computing the same thing with Boost that you are with dcdflib in this test case, since boost::math::gamma_distribution takes a parameter 1/beta instead of beta.

comment:2 by chhenning@…, 13 years ago

Steven, are you saying both tests run at the same speed on your machine?

Also, I'm not sure about the 1/beta parameter. My code results in the same values when using boost and ATT Bell.

in reply to:  2 comment:3 by Steven Watanabe, 13 years ago

Replying to chhenning@…:

Steven, are you saying both tests run at the same speed on your machine?

Actually Boost.Math is about 4 times slower on my machine with msvc 9.0. I was just saying that using 1/beta vs. beta didn't affect the timings.

Also, I'm not sure about the 1/beta parameter. My code results in the same values when using boost and ATT Bell.

I'll check again. All I did was to change att_bell and boost_ to return value and compare the results for the same arguments. They were totally different until I switched boost_ to 1/beta.

comment:4 by John Maddock, 13 years ago

Status: new → assigned

Thanks for reminding me about this, I'm back now (sort of anyway!) so I'll try and investigate soon. Your test showed about a 2x difference on my machine BTW.

As for integrating the dcdflib code into Boost... there would likely be licence issues preventing that... in any case the two libraries implement *the same algorithm*, so it's a question of tracking down why they're showing a difference.

John.

comment:5 by John Maddock, 13 years ago

I thought I'd add a short update: I'm concentrating on the igamma function and its inverse at this stage, as ultimately these are what the stats functions of both libraries depend upon.

Accuracy-wise the two libraries are broadly similar:

Tests run with Microsoft Visual C++ version 9.0, Dinkumware standard library version 505, Win32

Testing tgamma(a, z) medium values with type double

boost::math::gamma_q<double>  Max = 23.69  RMS Mean = 3.955
  worst case at row: 600 { 91.615684509277344, 183.23136901855469, 6.0094745549395681e+125, 2.5194958887453263e-014, 2.3851892681325433e+139, 0.9999999999999748 }

other::gamma_q(double)  Max = 125.1  RMS Mean = 9.671
  worst case at row: 27 { 4.053, 405.3, 8.563e-169, 1.334e-169, 6.418, 1 }

boost::math::gamma_p<double>  Max = 35.09  RMS Mean = 6.961
  worst case at row: 651 { 96.78564453125, 0.96785646677017212, 3.7245981312945362e+149, 1, 0.00016783328089080209, 4.5060775679568222e-154 }

other::gamma_p(double)  Max = 426.3  RMS Mean = 46.18
  worst case at row: 518 { 80.13, 0.8013, 1.566e+117, 1, 1.104e-010, 7.05e-128 }

Testing tgamma(a, z) small values with type double

boost::math::gamma_q<double>  Max = 2.898  RMS Mean = 1.05
  worst case at row: 73 { 1.0221641311147778e-009, 1.0221641311147778e-009, 20.124127878209329, 2.0570161699209098e-008, 978316444.88368392, 0.99999997942983831 }

other::gamma_q(double)  Max = 31.86  RMS Mean = 7.103
  worst case at row: 223 { 0.001962, 100, 3.717e-046, 7.303e-049, 509, 1 }

boost::math::gamma_p<double>  Max = 0.5133  RMS Mean = 0.03234
  worst case at row: 226 { 0.0055537549778819084, 0.0049983793869614601, 4.6545082571173033, 0.025932342985794662, 174.83209885998627, 0.97406765701420539 }

other::gamma_p(double)  Max = 0.7172  RMS Mean = 0.2328
  worst case at row: 245 { 0.05124, 0.0005124, 5.75, 0.3029, 13.24, 0.6971 }

Testing tgamma(a, z) large values with type double

boost::math::gamma_q<double>  Max = 469.8  RMS Mean = 31.51
  worst case at row: 222 { 2057.796630859375, 4115.59326171875, 0.1184802275824791, 5.1589245564542783e-277, 0.22966070987459913, 1 }

other::gamma_q(double)  Max = 310.7  RMS Mean = 35.14
  worst case at row: 277 { 7.884e+005, 7.884e+005, 0.1624, 0.499, 0.1631, 0.501 }

boost::math::gamma_p<double>  Max = 244.4  RMS Mean = 19.38
  worst case at row: 211 { 1169.2916259765625, 584.64581298828125, 0.22117898880238315, 1, 0.42515811737132952, 1.9222355598668354e-100 }

other::gamma_p(double)  Max = 310.3  RMS Mean = 36.91
  worst case at row: 277 { 7.884e+005, 7.884e+005, 0.1624, 0.499, 0.1631, 0.501 }

Testing tgamma(a, z) integer and half integer values with type double

boost::math::gamma_q<double>  Max = 8.485  RMS Mean = 1.351
  worst case at row: 138 { 37, 74, 2.7177865101844111e+035, 7.306008776118165e-007, 3.7199305501125022e+041, 0.99999926939912243 }

other::gamma_q(double)  Max = 103.1  RMS Mean = 12.59
  worst case at row: 20 { 4.5, 450, 7.196e-187, 6.187e-188, 11.63, 1 }

boost::math::gamma_p<double>  Max = 13.03  RMS Mean = 2.932
  worst case at row: 133 { 37, 0.37000000476837158, 3.7199332678990125e+041, 1, 1.9898558488031992e-018, 5.3491708197417564e-060 }

other::gamma_p(double)  Max = 123.4  RMS Mean = 20.06
  worst case at row: 91 { 25, 0.25, 6.204e+023, 1, 2.794e-017, 4.503e-041 }

So for the most part the Boost version can be a bit more accurate.

And for the inverse:

Tests run with Microsoft Visual C++ version 9.0, Dinkumware standard library version 505, Win32

Testing incomplete gamma inverse(a, z) medium values with type double

boost::math::gamma_p_inv<double>  Max = 1.652  RMS Mean = 0.3353
  worst case at row: 2 { 9.7540397644042969, 0.22103404998779297, 7.2622991072035479, 11.97587613793354 }

boost::math::gamma_q_inv<double>  Max = 2.254  RMS Mean = 0.3822
  worst case at row: 49 { 22.103404998779297, 0.96886777877807617, 31.653035526093614, 14.199388115898186 }

other::gamma_p_inv<double>  Max = 9.755  RMS Mean = 2.085
  worst case at row: 23 { 13.547700881958008, 0.30816704034805298, 11.480744407137443, 15.117986636057008 }

other::gamma_q_inv<double>  Max = 24.23  RMS Mean = 2.871
  worst case at row: 49 { 22.103404998779297, 0.96886777877807617, 31.653035526093614, 14.199388115898186 }

Testing incomplete gamma inverse(a, z) large values with type double

boost::math::gamma_p_inv<double>  Max = 0.9242  RMS Mean = 0.1082
  worst case at row: 10 { 581.3642578125, 0.12698681652545929, 553.96697937820386, 608.96241888374834 }

boost::math::gamma_q_inv<double>  Max = 0.8143  RMS Mean = 0.1072
  worst case at row: 70 { 40010.84375, 0.12698681652545929, 39782.764004009827, 40239.124371052443 }

other::gamma_p_inv<double>  Max = 1.22  RMS Mean = 0.4401
  worst case at row: 88 { 106978.296875, 0.9133758544921875, 107424.0055670203, 106533.15792214176 }

other::gamma_q_inv<double>  Max = 1.07  RMS Mean = 0.4135
  worst case at row: 30 { 3758.09765625, 0.12698681652545929, 3688.2692219050818, 3828.1269667349475 }

Testing incomplete gamma inverse(a, z) small values with type double

boost::math::gamma_p_inv<double>  Max = 1163  RMS Mean = 103.7
  worst case at row: 86 { 0.000398525211494416, 0.83500856161117554, 1.7877325072629434e-197, 0 }

boost::math::gamma_q_inv<double>  Max = 962.6  RMS Mean = 82.05
  worst case at row: 82 { 0.000398525211494416, 0.22103404998779297, 0, 3.4835777677025903e-273 }

other::gamma_p_inv<double>  Max = 1229  RMS Mean = 108
  worst case at row: 86 { 0.000398525211494416, 0.83500856161117554, 1.7877325072629434e-197, 0 }

other::gamma_q_inv<double>  Max = 1145  RMS Mean = 135.6
  worst case at row: 80 { 0.000398525211494416, 0.12698681652545929, 0, 5.6992492776090963e-149 }

So broadly similar again.

Time wise, I've modified our performance tests to test dcdflib as well and get:

Testing igamma                                            8.611e-007
Testing igamma-dcd                                        5.849e-007
Testing igamma_inv                                        3.626e-006
Testing igamma_inv-dcd                                    1.810e-006

which confirms what you're seeing.

After the first round of changes/optimisations (now in Trunk) I get:

Testing igamma                                            7.067e-007
Testing igamma-dcd                                        5.849e-007
Testing igamma_inv                                        3.003e-006
Testing igamma_inv-dcd                                    1.855e-006

So a bit better, but still some way to go. Unfortunately, many of the remaining differences appear to come from either:

  • Less effective optimisation when calling boilerplate code (external "inline" routines), compared to the declared-all-inline spaghetti code of the original Fortran.
  • An apparently lower overhead in calling the functions and going through the algorithm-selection logic in the dcd code.

However, I'm not particularly keen to replace the current, almost readable, structured code with a bunch of goto's and spaghetti code :-(

Still looking for more low-hanging fruit. Yours, John.

comment:6 by chhenning@…, 13 years ago

John, some quick questions.

  1. Are the underlying algorithms for the two functions the same?
  2. Is it possible to change the internal function variables to static storage?
  3. How about using forceinline on Windows? Dunno what it's called on Linux.

Thanks for your work on this.

Christian

comment:7 by John Maddock, 13 years ago

  1. Are the underlying algorithms for the two functions the same?

Yes. However, there may be detail differences that make one version or the other slightly faster.

  2. Is it possible to change the internal function variables to static storage?

No, that wouldn't be thread safe (obviously constants are already static).

  3. How about using forceinline on Windows? Dunno what it's called on Linux.

Maybe — that's sort of a last resort; there are still plenty of other things to work on first!

John.

comment:8 by John Maddock, 13 years ago

OK another round of changes and now I see:

Testing igamma 5.849e-007
Testing igamma-dcd 5.857e-007
Testing igamma_inv 2.651e-006
Testing igamma_inv-dcd 1.855e-006

So still some work to do on the inverse, but the forward function is looking pretty healthy now. These changes affect some of the underlying boilerplate code, and so should improve times right throughout the library :-)

BTW, these times were obtained with -Ox and -DNDEBUG, the latter is particularly important, as there are quite a few asserts in our code!

Committed to trunk with revision #56503.

Onwards to work on the inverse now.... John.

comment:9 by John Maddock, 13 years ago

Another update to Trunk and I think this is almost fixed now:

Testing igamma                                            5.849e-007
Testing igamma-dcd                                        5.857e-007
Testing igamma_inv                                        1.943e-006
Testing igamma_inv-dcd                                    1.855e-006

The Boost inverse is somewhat slower as it computes more digits; changing the policy to 10 decimal places (which is what the Didonato and Morris code promises) then gives:

Testing igamma_inv                                        1.459e-006
Testing igamma_inv-dcd                                    1.855e-006

And your original test code gives:

Boost                 30.5s
Boost (10 digits)     21.3s
DCDFLIB               27s

Unfortunately, I've found a few other functions that appear to be slower than they should be, and the speedup achieved on Linux is very disappointing so far :-(

So still more to do, but this basically fixes the original issue I believe. If you're able to give the new code in Trunk a test on your machine that would be much appreciated, as results will vary from machine to machine a bit...

HTH, John.

comment:10 by chhenning@…, 13 years ago

John, thanks for all of your work. I did update my boost trunk and reran my test. Boost and ATT are now pretty much on par with each other, with Att still being slightly faster.

The test compares boost's gamma quantile function with Att's cdfgam function.

Boost - 19.968 sec
Att   - 18.547 sec

This is very good! I used VS2005 producing 64-bit code.

Can you summarise how you almost doubled the boost performance?

Thanks again for your great work. I can now propose to use boost::math at my work.

Christian

comment:11 by John Maddock, 13 years ago

Can you summarise how you almost doubled the boost performance?

Lots of very small tweaks :-)

To the forward function:

Look for opportunities to better reuse intermediate results so there are fewer special-function calls made.

To the inverse (and hence quantile):

  • Adjust/fine tune iteration limits so that fewer Halley iterations are used to find the inverse.
  • Fixed a couple of bugs in the evaluation of the "guess" used for the initial Halley iteration.
  • Improved re-use of intermediate results (fewer function calls).

Regards, John.
