Opened 13 years ago
Last modified 13 years ago
#3408 assigned Tasks
gamma distribution's quantile performance
Reported by: | Owned by: John Maddock
Milestone: Boost 1.41.0 | Component: math
Version: Boost 1.40.0 | Severity: Optimization
Keywords: | Cc:
Description
Hi there, I have created a test which times the performance of boost::math::quantile( ... ) when using a gamma distribution. I ran it against source code we have used here at work for ages. It can be found here:
http://people.sc.fsu.edu/~burkardt/cpp_src/dcdflib/dcdflib.html
The old source code is about 2x faster than the boost version.
MS Visual Studio 2005: boost: 35.4 sec, att_bell: 19 sec
Intel 11.1: boost: 21.4 sec, att_bell: 11.2 sec
The question is whether there is a way to incorporate such a function into boost::math. As far as I can tell, the results are almost identical.
Here is the code:
#include <dcdflib.h>
#include <boost/math/distributions/gamma.hpp>
#include <boost/timer.hpp>
#include <cstdlib>   // std::rand, std::srand, RAND_MAX
#include <iostream>  // std::cout
double min_mean = 2000;               // 2,000
double max_mean = 500000000;          // 500,000,000
double min_std  = 10000;              // 10,000
double max_std  = 100000000;          // 100,000,000
double min_max  = 600000000;          // 600,000,000
double max_max  = 1000000000;         // 1,000,000,000
const std::size_t max_year = 5000000; // 5,000,000
const double right = 0.999;
const double left  = 0.001;
inline double get_rand() {
return static_cast< double >( std::rand() )
/ static_cast< double >( RAND_MAX );
}
inline void boost_( boost::math::gamma_distribution<>& d, double q ) {
    double value = boost::math::quantile( d, q );
    (void)value; // silence unused-variable warning; we only time the call
}
inline void att_bell( double alpha, double beta, double q ) {
    double q_Minus1 = 1 - q;
    double value = 0.0;
    double bound = 0.0;
    int which = 2;
    int status = 0;
    cdfgam( &which, &q, &q_Minus1, &value, &alpha, &beta, &status, &bound );
}
int main() {
    { // boost
        std::srand( 0 );
        boost::timer timer;
        for( std::size_t y = 0; y < max_year; ++y ) {
            if(( y % 100000 ) == 0 )
                std::cout << y << std::endl;
            double mean = get_rand() * ( max_mean - min_mean ) + min_mean;
            double std  = get_rand() * ( max_std - min_std ) + min_std;
            double alpha = mean * mean / std / std; // shape parameter
            double beta  = mean / alpha;            // scale parameter
            boost::math::gamma_distribution<> d( alpha, beta );
            boost_( d, right );
            boost_( d, left );
        }
        std::cout << "Boost - Time elapsed: " << timer.elapsed() << " sec" << std::endl;
    }
    { // att bell
        std::srand( 0 );
        boost::timer timer;
        for( std::size_t y = 0; y < max_year; ++y ) {
            if(( y % 100000 ) == 0 )
                std::cout << y << std::endl;
            double mean = get_rand() * ( max_mean - min_mean ) + min_mean;
            double std  = get_rand() * ( max_std - min_std ) + min_std;
            double alpha = mean * mean / std / std; // shape parameter
            double beta  = mean / alpha;            // scale parameter
            att_bell( alpha, beta, right );
            att_bell( alpha, beta, left );
        }
        std::cout << "ATT Bell - Time elapsed: " << timer.elapsed() << " sec" << std::endl;
    }
return 0;
}
Change History (11)
comment:1 by , 13 years ago
This doesn't seem to affect the timings for me, but you're not computing the same thing with Boost that you are with dcdflib in this test case, since boost::math::gamma_distribution takes a parameter 1/beta instead of beta.
follow-up: 3 comment:2 by , 13 years ago
Steven, are you saying both tests run at the same speed on your machine?
Also, I'm not sure about the 1/beta parameter. My code results in the same values when using boost and ATT Bell.
comment:3 by , 13 years ago
Replying to chhenning@…:
Steven, are you saying both tests run at the same speed on your machine?
Actually Boost.Math is about 4 times slower on my machine with msvc 9.0. I was just saying that using 1/beta vs. beta didn't affect the timings.
Also, I'm not sure about the 1/beta parameter. My code results in the same values when using boost and ATT Bell.
I'll check again. All I did was change att_bell and boost_ to return their values and compare the results for the same arguments. They were totally different until I switched boost_ to 1/beta.
comment:4 by , 13 years ago
Status: new → assigned
Thanks for reminding me about this, I'm back now (sort of anyway!) so I'll try and investigate soon. Your test showed about a 2x difference on my machine BTW.
As for integrating the dcdflib code into Boost... there would likely be licence issues preventing that... in any case the two libraries implement *the same algorithm*, so it's a question of trying to track down why they're showing a difference.
John.
comment:5 by , 13 years ago
I thought I'd add a short update: I'm concentrating on the igamma function and its inverse at this stage, as ultimately these are what the stats functions of both libraries depend upon.
Accuracy-wise the two libraries are broadly similar:
Tests run with Microsoft Visual C++ version 9.0, Dinkumware standard library version 505, Win32
Testing tgamma(a, z) medium values with type double
boost::math::gamma_q<double> Max = 23.69 RMS Mean=3.955
worst case at row: 600 { 91.615684509277344, 183.23136901855469, 6.0094745549395681e+125, 2.5194958887453263e-014, 2.3851892681325433e+139, 0.9999999999999748 }
other::gamma_q(double) Max = 125.1 RMS Mean=9.671
worst case at row: 27 { 4.053, 405.3, 8.563e-169, 1.334e-169, 6.418, 1 }
boost::math::gamma_p<double> Max = 35.09 RMS Mean=6.961
worst case at row: 651 { 96.78564453125, 0.96785646677017212, 3.7245981312945362e+149, 1, 0.00016783328089080209, 4.5060775679568222e-154 }
other::gamma_p(double) Max = 426.3 RMS Mean=46.18
worst case at row: 518 { 80.13, 0.8013, 1.566e+117, 1, 1.104e-010, 7.05e-128 }
Testing tgamma(a, z) small values with type double
boost::math::gamma_q<double> Max = 2.898 RMS Mean=1.05
worst case at row: 73 { 1.0221641311147778e-009, 1.0221641311147778e-009, 20.124127878209329, 2.0570161699209098e-008, 978316444.88368392, 0.99999997942983831 }
other::gamma_q(double) Max = 31.86 RMS Mean=7.103
worst case at row: 223 { 0.001962, 100, 3.717e-046, 7.303e-049, 509, 1 }
boost::math::gamma_p<double> Max = 0.5133 RMS Mean=0.03234
worst case at row: 226 { 0.0055537549778819084, 0.0049983793869614601, 4.6545082571173033, 0.025932342985794662, 174.83209885998627, 0.97406765701420539 }
other::gamma_p(double) Max = 0.7172 RMS Mean=0.2328
worst case at row: 245 { 0.05124, 0.0005124, 5.75, 0.3029, 13.24, 0.6971 }
Testing tgamma(a, z) large values with type double
boost::math::gamma_q<double> Max = 469.8 RMS Mean=31.51
worst case at row: 222 { 2057.796630859375, 4115.59326171875, 0.1184802275824791, 5.1589245564542783e-277, 0.22966070987459913, 1 }
other::gamma_q(double) Max = 310.7 RMS Mean=35.14
worst case at row: 277 { 7.884e+005, 7.884e+005, 0.1624, 0.499, 0.1631, 0.501 }
boost::math::gamma_p<double> Max = 244.4 RMS Mean=19.38
worst case at row: 211 { 1169.2916259765625, 584.64581298828125, 0.22117898880238315, 1, 0.42515811737132952, 1.9222355598668354e-100 }
other::gamma_p(double) Max = 310.3 RMS Mean=36.91
worst case at row: 277 { 7.884e+005, 7.884e+005, 0.1624, 0.499, 0.1631, 0.501 }
Testing tgamma(a, z) integer and half integer values with type double
boost::math::gamma_q<double> Max = 8.485 RMS Mean=1.351
worst case at row: 138 { 37, 74, 2.7177865101844111e+035, 7.306008776118165e-007, 3.7199305501125022e+041, 0.99999926939912243 }
other::gamma_q(double) Max = 103.1 RMS Mean=12.59
worst case at row: 20 { 4.5, 450, 7.196e-187, 6.187e-188, 11.63, 1 }
boost::math::gamma_p<double> Max = 13.03 RMS Mean=2.932
worst case at row: 133 { 37, 0.37000000476837158, 3.7199332678990125e+041, 1, 1.9898558488031992e-018, 5.3491708197417564e-060 }
other::gamma_p(double) Max = 123.4 RMS Mean=20.06
worst case at row: 91 { 25, 0.25, 6.204e+023, 1, 2.794e-017, 4.503e-041 }
So for the most part the Boost version can be a bit more accurate.
And for the inverse:
Tests run with Microsoft Visual C++ version 9.0, Dinkumware standard library version 505, Win32
Testing incomplete gamma inverse(a, z) medium values with type double
boost::math::gamma_p_inv<double> Max = 1.652 RMS Mean=0.3353
worst case at row: 2 { 9.7540397644042969, 0.22103404998779297, 7.2622991072035479, 11.97587613793354 }
boost::math::gamma_q_inv<double> Max = 2.254 RMS Mean=0.3822
worst case at row: 49 { 22.103404998779297, 0.96886777877807617, 31.653035526093614, 14.199388115898186 }
other::gamma_p_inv<double> Max = 9.755 RMS Mean=2.085
worst case at row: 23 { 13.547700881958008, 0.30816704034805298, 11.480744407137443, 15.117986636057008 }
other::gamma_q_inv<double> Max = 24.23 RMS Mean=2.871
worst case at row: 49 { 22.103404998779297, 0.96886777877807617, 31.653035526093614, 14.199388115898186 }
Testing incomplete gamma inverse(a, z) large values with type double
boost::math::gamma_p_inv<double> Max = 0.9242 RMS Mean=0.1082
worst case at row: 10 { 581.3642578125, 0.12698681652545929, 553.96697937820386, 608.96241888374834 }
boost::math::gamma_q_inv<double> Max = 0.8143 RMS Mean=0.1072
worst case at row: 70 { 40010.84375, 0.12698681652545929, 39782.764004009827, 40239.124371052443 }
other::gamma_p_inv<double> Max = 1.22 RMS Mean=0.4401
worst case at row: 88 { 106978.296875, 0.9133758544921875, 107424.0055670203, 106533.15792214176 }
other::gamma_q_inv<double> Max = 1.07 RMS Mean=0.4135
worst case at row: 30 { 3758.09765625, 0.12698681652545929, 3688.2692219050818, 3828.1269667349475 }
Testing incomplete gamma inverse(a, z) small values with type double
boost::math::gamma_p_inv<double> Max = 1163 RMS Mean=103.7
worst case at row: 86 { 0.000398525211494416, 0.83500856161117554, 1.7877325072629434e-197, 0 }
boost::math::gamma_q_inv<double> Max = 962.6 RMS Mean=82.05
worst case at row: 82 { 0.000398525211494416, 0.22103404998779297, 0, 3.4835777677025903e-273 }
other::gamma_p_inv<double> Max = 1229 RMS Mean=108
worst case at row: 86 { 0.000398525211494416, 0.83500856161117554, 1.7877325072629434e-197, 0 }
other::gamma_q_inv<double> Max = 1145 RMS Mean=135.6
worst case at row: 80 { 0.000398525211494416, 0.12698681652545929, 0, 5.6992492776090963e-149 }
So broadly similar again.
Time-wise, I've modified our performance tests to test dcdflib as well and get:
Testing igamma 8.611e-007
Testing igamma-dcd 5.849e-007
Testing igamma_inv 3.626e-006
Testing igamma_inv-dcd 1.810e-006
which confirms what you're seeing.
After the first round of changes/optimisations (now in Trunk) I get:
Testing igamma 7.067e-007
Testing igamma-dcd 5.849e-007
Testing igamma_inv 3.003e-006
Testing igamma_inv-dcd 1.855e-006
So a bit better, but still some way to go. Unfortunately, many of the remaining differences appear to come from either:
- Not such good optimisations when calling boilerplate code (external "inline" routines) as compared to the declared-all-inline spaghetti code of the original Fortran.
- Apparently a lower overhead in calling the functions and going through the algorithm-selection logic in the dcd code.
However, I'm not particularly keen to replace the current, almost readable, structured code with a bunch of gotos and spaghetti code :-(
Still looking for more low-hanging fruit. Yours, John.
comment:6 by , 13 years ago
John, some quick questions.
- Are the underlying algorithms for the two functions the same?
- Is it possible to change the internal function variables to static storage?
- How about using forceinline on Windows? Dunno what it is on Linux.
Thanks for your work on this.
Christian
comment:7 by , 13 years ago
- Are the underlying algorithms for the two functions the same?
Yes. However, there may be detail differences that make one version or the other slightly faster.
- Is it possible to change the internal function variables to static storage?
No, that wouldn't be thread safe (obviously constants are already static).
- How about using forceinline on Windows? Dunno what it is on Linux.
Maybe, that's sort of a last resort, there are still plenty of other things to work on first!
John.
comment:8 by , 13 years ago
OK another round of changes and now I see:
Testing igamma 5.849e-007
Testing igamma-dcd 5.857e-007
Testing igamma_inv 2.651e-006
Testing igamma_inv-dcd 1.855e-006
So still some work to do on the inverse, but the forward function is looking pretty healthy now. These changes affect some of the underlying boilerplate code, and so should improve times right throughout the library :-)
BTW, these times were obtained with -Ox and -DNDEBUG, the latter is particularly important, as there are quite a few asserts in our code!
Committed to trunk with revision #56503.
Onwards to work on the inverse now.... John.
comment:9 by , 13 years ago
Another update to Trunk and I think this is almost fixed now:
Testing igamma 5.849e-007
Testing igamma-dcd 5.857e-007
Testing igamma_inv 1.943e-006
Testing igamma_inv-dcd 1.855e-006
The Boost inverse is somewhat slower as it computes more digits; changing the policy to 10 decimal places (which is what the Didonato and Morris code promises) then gives:
Testing igamma_inv 1.459e-006 Testing igamma_inv-dcd 1.855e-006
And your original test code gives:
Boost 30.5s
Boost (10 digits) 21.3s
DCDFLIB 27s
Unfortunately, I've found a few other functions that appear to be slower than they should be, and the speedup achieved on Linux is very disappointing so far :-(
So still more to do, but this basically fixes the original issue I believe. If you're able to give the new code in Trunk a test on your machine that would be much appreciated, as results will vary from machine to machine a bit...
HTH, John.
comment:10 by , 13 years ago
John, thanks for all of your work. I did update my boost trunk and rerun my test. Both boost and ATT are pretty much on par with each other, with ATT still slightly faster.
The test compares boost's gamma quantile function vs. ATT's cdfgam function.
Boost - 19.968 sec
ATT - 18.547 sec
This is very good! I used VS2005 producing 64-bit code.
Can you summarise how you almost doubled the boost performance?
Thanks again for your great work. I can now propose to use boost::math at my work.
Christian
comment:11 by , 13 years ago
Can you summarise how you almost doubled the boost performance?
Lots of very small tweaks :-)
To the forward function:
- Look for opportunities to better reuse intermediate results so there are fewer special-function calls made.
To the inverse (and hence quantile):
- Adjust/fine tune iteration limits so that fewer Halley iterations are used to find the inverse.
- Fixed a couple of bugs in the evaluation of the "guess" used for the initial Halley iteration.
- Improved re-use of intermediate results (fewer function calls).
Regards, John.