Boost C++ Libraries: Ticket #6992: accumulator's median feature skips 1st two data points. https://svn.boost.org/trac10/ticket/6992 <p> Hi, </p> <p> I used the accumulator's median feature to calculate the mean and median of input data. I also used the Armadillo library's mean and median. The means from the two libraries always agree; the medians do not. I fed the inputs 1, 2, 3, 4, 5 to the program and found that the accumulator's median outputs 0 if the number of input data points is 1 or 2. It starts to output a non-zero median when given more than 2 data points; however, the first two data points are not used. </p> <p> Please check what is going on. </p> <p> Thanks, yu </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/6992 Trac 1.4.3 polyactis@… Sat, 16 Jun 2012 01:11:12 GMT attachment set https://svn.boost.org/trac10/ticket/6992 https://svn.boost.org/trac10/ticket/6992 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">CalculateMedianMeanOfInputColumn.cc</span> </li> </ul> <p> source code </p> Ticket polyactis@… Sat, 16 Jun 2012 01:11:33 GMT attachment set https://svn.boost.org/trac10/ticket/6992 https://svn.boost.org/trac10/ticket/6992 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">CalculateMedianMeanOfInputColumn.h</span> </li> </ul> <p> header file </p> Ticket Eric Niebler Thu, 15 Nov 2012 18:45:12 GMT owner changed https://svn.boost.org/trac10/ticket/6992#comment:1 https://svn.boost.org/trac10/ticket/6992#comment:1 <ul> <li><strong>owner</strong> changed from <span class="trac-author">Eric Niebler</span> to <span class="trac-author">Matthias Troyer</span> </li> </ul> <p> Matthias, can you have a look at this one when you get a chance? 
</p> Ticket Matthias Troyer Thu, 15 Nov 2012 20:43:41 GMT <link>https://svn.boost.org/trac10/ticket/6992#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/6992#comment:2</guid> <description> <p> Eric, having dinner with you in Redmond is dangerous, especially if I agree to take a look at the open accumulator tickets. Here is my conclusion after looking at the issue: </p> <p> First, median estimates are notoriously hard, and you never get an exact median unless you store at least half of the samples. Hence, unlike the mean, which can easily and unambiguously be estimated (as long as the variance is finite), median estimation is harder, and different algorithms to estimate the median will give different results. </p> <p> Our default estimator is a P<sup>2</sup> quantile estimator, which only stores and updates five numbers and hence has a minimal memory footprint. However, it requires at least five samples before it gives sensible output, and I am thus not surprised that using fewer than five samples does not work. </p> <p> Shall we throw an exception if the count is less than 5, or just document it more clearly? </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Eric Niebler</dc:creator> <pubDate>Thu, 15 Nov 2012 23:57:15 GMT</pubDate> <title>owner, status changed https://svn.boost.org/trac10/ticket/6992#comment:3 https://svn.boost.org/trac10/ticket/6992#comment:3 <ul> <li><strong>owner</strong> changed from <span class="trac-author">Matthias Troyer</span> to <span class="trac-author">Eric Niebler</span> </li> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> <p> There are <em>many</em> open accumulator tickets. I only gave you the two that required knowledge of the algorithms that you wrote. :-) Thanks for having a look. 
</p> <p> If the precondition of the p<sup>2</sup> quantile accumulator is that it must have at least 5 samples, then the right thing to do is assert the condition in the accessor. And document it. I'll add this to my to-do list, unless you beat me to it. </p> <p> Thanks again. </p> Ticket A. Sinan Unur <sinan@…> Wed, 14 Dec 2016 11:44:04 GMT <link>https://svn.boost.org/trac10/ticket/6992#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/6992#comment:4</guid> <description> <p> Leaving aside the fact that the point of the P<sup>2</sup> algorithm is to deal with much larger sample sizes, there is no reason the implementation cannot return the exact median for <em>n</em> &lt;= 5. </p> <p> It is the P<sup>2</sup> algorithm's <em>approximation</em> that kicks in when the 6<sup>th</sup> observation arrives. Your implementation could simply return the exact median until then. </p> </description> <category>Ticket</category> </item> </channel> </rss>
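The suggestion in the last comment (return the exact median while only a handful of samples have been seen) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Boost.Accumulators implementation: `exact_median` is a hypothetical helper name, it uses only the standard library, and it takes the buffered samples as a plain `std::vector<double>`. The non-empty check mirrors the idea of asserting the precondition in the accessor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Accumulators): exact median of a
// small buffered sample. The vector is taken by value so the caller's data
// is left untouched by the sort.
double exact_median(std::vector<double> samples)
{
    // Precondition: the median of an empty sample is undefined.
    assert(!samples.empty());

    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const std::size_t mid = n / 2;

    // Odd count: the middle element.
    // Even count: the mean of the two middle elements.
    return (n % 2 == 1) ? samples[mid]
                        : (samples[mid - 1] + samples[mid]) / 2.0;
}
```

With the inputs from the original report, this returns 1 for a single sample `{1}`, 1.5 for `{1, 2}`, and 3 for `{1, 2, 3, 4, 5}`, rather than 0 for the first two cases. An estimator could plausibly use such a fallback until the sixth observation arrives and only then switch to the P<sup>2</sup> approximation.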