Opened 9 years ago
Last modified 9 years ago
#9393 new Feature Requests
quantile for hypergeometric distribution - precision/rounding issue
Reported by: | anonymous | Owned by: | John Maddock |
---|---|---|---|
Milestone: | To Be Determined | Component: | math |
Version: | Boost Development Trunk | Severity: | Showstopper |
Keywords: | Cc: |
Description
Hi (John),
This is the same as Ticket #9183 (https://svn.boost.org/trac/boost/ticket/9183) but applies to the hypergeometric distribution. For example,
#define BOOST_MATH_DISCRETE_QUANTILE_POLICY integer_round_up #include <cstdio> #include <boost/math/distributions/hypergeometric.hpp> int main() { unsigned int xmin = 71; unsigned int length = 9; boost::math::hypergeometric_distribution<> dist( 79u, 101u, 109u ); for ( unsigned int i = 0; i < length; ++i ) { double x = xmin + i; double y = boost::math::cdf( dist, x ); // boost::math::cdf( boost::math::complement( dist, x ) ); std::printf( " x = %g\tCDF(x) = %f\tinverse-CDF(CDF(x)) = %g\n", x, y, boost::math::quantile( dist, y ) ); } return 0; }
returns
x = 71 CDF(x) = 0.068671 inverse-CDF(CDF(x)) = 72 x = 72 CDF(x) = 0.297575 inverse-CDF(CDF(x)) = 72 x = 73 CDF(x) = 0.615846 inverse-CDF(CDF(x)) = 73 x = 74 CDF(x) = 0.856699 inverse-CDF(CDF(x)) = 74 x = 75 CDF(x) = 0.965083 inverse-CDF(CDF(x)) = 75 x = 76 CDF(x) = 0.994746 inverse-CDF(CDF(x)) = 77 x = 77 CDF(x) = 0.999561 inverse-CDF(CDF(x)) = 77 x = 78 CDF(x) = 0.999985 inverse-CDF(CDF(x)) = 78 x = 79 CDF(x) = 1.000000 inverse-CDF(CDF(x)) = 79
I was hoping/expecting to obtain:
x y z [1,] 71 0.06867117 71 [2,] 72 0.29757505 72 [3,] 73 0.61584552 73 [4,] 74 0.85669885 74 [5,] 75 0.96508285 75 [6,] 76 0.99474584 76 [7,] 77 0.99956126 77 [8,] 78 0.99998459 78 [9,] 79 1.00000000 79
which I was used to on R.
Thank you!
Change History (5)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Oops, just realized that I am "anonymous" ... so never got any indication that you have replied.
Thank you for thinking about it.
comment:3 by , 9 years ago
Hi John,
When computing the lower tail probabilities (inverse cdf) I set the rounding policy to integer_round_down, and when computing the upper tail probabilities (inverse survival function) I set the rounding policy to integer_round_up. This seems most consistent in the context of lower or upper tail tests.
In my very limited testing, when I use these rounding policies for hypergeometric quantiles I find I get the most accurate, or at least the most consistent, results in boost::math if I set the fudge_factor defined in hypergeometric_quantile_imp() an passed to round_x_from_p() to 1. That is, I remove the fudge_factor from the computations.
Perhaps this can help suggest an approach to this issue.
Best regards, Paul
comment:4 by , 9 years ago
John,
Here are some examples illustrating the effect of setting the fudge_factor to 1 with different quantile policies (thank you Anonymous).
#include "stdafx.h" #define BOOST_MATH_DISCRETE_QUANTILE_POLICY integer_round_down #include <cstdio> #include <boost/math/distributions/hypergeometric.hpp> int main() { unsigned int xmin = 0; unsigned int length = 9; boost::math::hypergeometric_distribution<> dist( 8u, 8u, 16u ); for ( unsigned int i = 0; i < length; ++i ) { double x = xmin + i; double y = boost::math::cdf( dist, x ); // boost::math::cdf( boost::math::complement( dist, x ) ); std::printf( " x = %g\tCDF(x) = %f\tinverse-CDF(CDF(x)) = %g\n", x, y, boost::math::quantile( dist, y ) ); } return 0; }
With fudge_factor unchanged, and quantile policy integer_round_down I get:
x = 0 CDF(x) = 0.000078 inverse-CDF(CDF(x)) = 0 x = 1 CDF(x) = 0.005051 inverse-CDF(CDF(x)) = 0 <-- x = 2 CDF(x) = 0.065967 inverse-CDF(CDF(x)) = 1 <-- x = 3 CDF(x) = 0.309635 inverse-CDF(CDF(x)) = 2 <-- x = 4 CDF(x) = 0.690365 inverse-CDF(CDF(x)) = 3 <-- x = 5 CDF(x) = 0.934033 inverse-CDF(CDF(x)) = 4 <-- x = 6 CDF(x) = 0.994949 inverse-CDF(CDF(x)) = 5 <-- x = 7 CDF(x) = 0.999922 inverse-CDF(CDF(x)) = 7 x = 8 CDF(x) = 1.000000 inverse-CDF(CDF(x)) = 8
Changing the quantile policy to integer_round_up I get:
x = 0 CDF(x) = 0.000078 inverse-CDF(CDF(x)) = 1 <-- x = 1 CDF(x) = 0.005051 inverse-CDF(CDF(x)) = 2 <-- x = 2 CDF(x) = 0.065967 inverse-CDF(CDF(x)) = 3 <-- x = 3 CDF(x) = 0.309635 inverse-CDF(CDF(x)) = 4 <-- x = 4 CDF(x) = 0.690365 inverse-CDF(CDF(x)) = 5 <-- x = 5 CDF(x) = 0.934033 inverse-CDF(CDF(x)) = 6 <-- x = 6 CDF(x) = 0.994949 inverse-CDF(CDF(x)) = 7 <-- x = 7 CDF(x) = 0.999922 inverse-CDF(CDF(x)) = 8 <-- x = 8 CDF(x) = 1.000000 inverse-CDF(CDF(x)) = 8
However, if I force fudge_factor to 1.0 in hypergeometric_quantile.hpp, then with quantile policy integer_round_down I get:
x = 0 CDF(x) = 0.000078 inverse-CDF(CDF(x)) = 0 x = 1 CDF(x) = 0.005051 inverse-CDF(CDF(x)) = 1 x = 2 CDF(x) = 0.065967 inverse-CDF(CDF(x)) = 2 x = 3 CDF(x) = 0.309635 inverse-CDF(CDF(x)) = 3 x = 4 CDF(x) = 0.690365 inverse-CDF(CDF(x)) = 4 x = 5 CDF(x) = 0.934033 inverse-CDF(CDF(x)) = 5 x = 6 CDF(x) = 0.994949 inverse-CDF(CDF(x)) = 5 <-- x = 7 CDF(x) = 0.999922 inverse-CDF(CDF(x)) = 7 x = 8 CDF(x) = 1.000000 inverse-CDF(CDF(x)) = 8
and with quantile policy integer_round_up I get:
x = 0 CDF(x) = 0.000078 inverse-CDF(CDF(x)) = 0 x = 1 CDF(x) = 0.005051 inverse-CDF(CDF(x)) = 1 x = 2 CDF(x) = 0.065967 inverse-CDF(CDF(x)) = 2 x = 3 CDF(x) = 0.309635 inverse-CDF(CDF(x)) = 4 <-- x = 4 CDF(x) = 0.690365 inverse-CDF(CDF(x)) = 4 x = 5 CDF(x) = 0.934033 inverse-CDF(CDF(x)) = 5 x = 6 CDF(x) = 0.994949 inverse-CDF(CDF(x)) = 6 x = 7 CDF(x) = 0.999922 inverse-CDF(CDF(x)) = 8 <-- x = 8 CDF(x) = 1.000000 inverse-CDF(CDF(x)) = 8
Oh :(
At present I don't know how to fix that without breaking other things, I need to think about this some more, it's possible too that it's basically unfixable without completely hammering the efficiency.