Opened 6 years ago

Last modified 6 years ago

#12688 new Bugs

Boost.Accumulators test/median.cpp testing method is flawed

Reported by: A. Sinan Unur <sinan@…> Owned by: Eric Niebler
Milestone: To Be Determined Component: accumulator
Version: Boost 1.62.0 Severity: Problem
Keywords: testing, median, algorithm, accumulator Cc:

Description

I am going to stick with statistics terminology to avoid confusion: I use the word sample to mean a collection of data points (draws, observations), not a single one.

  1. In the test, you generate a sample of 100,000 observations from N(1,1) and check whether the medians computed by the implementations are close enough to the true population median of 1.

That is flawed: the algorithms compute the median of the set of numbers you actually have. Therefore, the medians they produce should be compared to the actual median of the sample you just generated, which can be obtained exactly by storing the draws in a vector.

This is not hard to do. I am not, however, submitting a pull request to do that because I am not convinced this approach is the right way to test the algorithms.

  1. In the case of P2, the original paper contains a worked-out example by the authors using 20 observations, showing exactly what the algorithm should produce at each step. Your implementation does not match those steps.

The following short program demonstrates this:

#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>

namespace bacc = boost::accumulators;

int main(void)
{
    bacc::accumulator_set<double,
        bacc::stats<bacc::tag::median(bacc::with_p_square_quantile)> > acc;

    // See http://www.cse.wustl.edu/~jain/papers/psqr.htm

    // First five observations
    acc(0.02);
    acc(0.5);
    acc(0.74);
    acc(3.39);
    acc(0.83);

    const std::vector<std::pair<double, double> > jain_chlamtac {
        { 22.37, 0.74 },
        { 10.15, 0.74 },
        { 15.43, 2.18 },
        { 38.62, 4.75 },
        { 15.92, 4.75 },
        { 34.60, 9.28 },
        { 10.28, 9.28 },
        {  1.47, 9.28 },
        {  0.40, 9.28 },
        {  0.05, 6.30 },
        { 11.39, 6.30 },
        {  0.27, 6.30 },
        {  0.42, 6.30 },
        {  0.09, 4.44 },
        { 11.37, 4.44 },
    };

    for (auto p: jain_chlamtac)
    {
        acc(p.first);
        std::cout << "calculated= " << bacc::median(acc)
            << "\texpected= " << p.second << '\n';
    }

    return 0;
}

This produces

calculated= 0.74        expected= 0.74
calculated= 0.74        expected= 0.74
calculated= 2.06167     expected= 2.18
calculated= 4.55176     expected= 4.75
calculated= 4.55176     expected= 4.75
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 9.15196     expected= 9.28
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 6.17976     expected= 6.3
calculated= 4.24624     expected= 4.44
calculated= 4.24624     expected= 4.44

For a more detailed exposition, see my blog post https://www.nu42.com/2016/12/cpp-boost-median-test.html

This is the first time I am looking at Boost.Accumulators and I haven't figured everything out yet, so I do not have a fix (and I cannot promise one soon), but I thought I would get the ball rolling. Maybe the issue will be more transparent to you.

To summarize:

  1. The implementations should be tested against cases where we know what they are supposed to do.
  1. It looks like there is a problem in the implementation of the P2 algorithm.

HTH,

-- Sinan

Change History (2)

comment:1 by A. Sinan Unur <sinan@…>, 6 years ago

Actually, looking again at Table I in the paper, there is the possibility that the intermediate calculations were made with rounded figures, and that may explain the discrepancy. I need to verify my paper-and-pencil calculations again before I can say that conclusively.

-- Sinan

comment:2 by A. Sinan Unur <sinan@…>, 6 years ago

This is the final update: while verifying the intermediate calculations by hand, I noticed that the body of the paper lists the first five observations as

0.02, 0.5, 0.74, 3.39, 0.83

whereas they use

0.02, 0.15, 0.74, 3.39, 0.83

to produce Table I.

With that adjustment, the program:

#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>

namespace bacc = boost::accumulators;

int main(void)
{
    bacc::accumulator_set<double,
        bacc::stats<bacc::tag::median(bacc::with_p_square_quantile)> > acc;

    // See http://www.cse.wustl.edu/~jain/papers/psqr.htm

    // First five observations
    acc(0.02);
    acc(0.15);
    acc(0.74);
    acc(3.39);
    acc(0.83);

    const std::vector<std::pair<double, double> > jain_chlamtac {
        {22.37, 0.74},
        {10.15, 0.74},
        {15.43, 2.18},
        {38.62, 4.75},
        {15.92, 4.75},
        {34.60, 9.28},
        {10.28, 9.28},
        {1.47,  9.28},
        {0.40,  9.28},
        {0.05,  6.30},
        {11.39, 6.30},
        {0.27,  6.30},
        {0.42,  6.30},
        {0.09,  4.44},
        {11.37, 4.44},
    };

    for (auto p: jain_chlamtac)
    {
        acc(p.first);
        std::printf("calculated= %.3f\texpected= %.2f\n", bacc::median(acc), p.second);
    }

    return 0;
}

produces the output:

calculated= 0.740       expected= 0.74
calculated= 0.740       expected= 0.74
calculated= 2.178       expected= 2.18
calculated= 4.753       expected= 4.75
calculated= 4.753       expected= 4.75
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 9.275       expected= 9.28
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 6.297       expected= 6.30
calculated= 4.441       expected= 4.44
calculated= 4.441       expected= 4.44

which is definitely good enough.

I sent an email to Dr. Jain pointing out the discrepancy between the text of the paper and the numbers used to produce Table I.

This removes my concern that there was something subtly wrong with the implementation in p_square_quantile.hpp (I did rewrite the key sections several times trying to find the problem) and establishes that the implementation replicates the output in the original paper.

I still think the test needs to be improved, but now I feel free to focus on that.

I will submit a pull request as soon as I have some time to put it together.

HTH,

-- Sinan
