Boost C++ Libraries: Ticket #12039: cpp_bin_float convert_to<double>() rounding mistake https://svn.boost.org/trac10/ticket/12039 <p> There are cases where cpp_bin_float convert_to&lt;double&gt;() does not produce the nearest 'double' result. It happens when the source argument differs form the middle point between two double-precision values in 65th bit or further. I didn't look at boost source code, but pretty sure that the mistakes occurs due to double rounding. I.e. instead of direct rounding to 53-bit precision of 'double' the number is initially converted to 64-bit 'long double' and then converted from 'long double' to 'double'. </p> <p> In particular, on Microsoft compilers 'long double' and 'double' refer to the same type and the mistake does not happen. But on the same machine/OS with gcc compiler the mistake does happen. </p> <p> Below attached a simple test case that prints 'good' on platforms without bug and prints bad on platforms with bug. </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/12039 Trac 1.4.3 Michael Shatz <shatz@…> Wed, 02 Mar 2016 15:33:58 GMT attachment set https://svn.boost.org/trac10/ticket/12039 https://svn.boost.org/trac10/ticket/12039 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">convert_test.cpp</span> </li> </ul> <p> convert_to&lt;double&gt;() bug reproducer </p> Ticket John Maddock Sat, 12 Mar 2016 16:52:59 GMT component changed; owner set https://svn.boost.org/trac10/ticket/12039#comment:1 https://svn.boost.org/trac10/ticket/12039#comment:1 <ul> <li><strong>owner</strong> set to <span class="trac-author">John Maddock</span> </li> <li><strong>component</strong> <span class="trac-field-old">None</span> → <span class="trac-field-new">multiprecision</span> </li> </ul> Ticket John Maddock Tue, 15 Mar 2016 08:16:47 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/12039#comment:2 https://svn.boost.org/trac10/ticket/12039#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> Fixed in <a class="ext-link" href="https://github.com/boostorg/multiprecision/commit/8a8b2211d4348477dead2b4f4a21a23ccbd84a48"><span class="icon">​</span>https://github.com/boostorg/multiprecision/commit/8a8b2211d4348477dead2b4f4a21a23ccbd84a48</a> </p> <p> Thanks for the report! </p> Ticket Michael Shatz <shatz@…> Tue, 15 Mar 2016 11:45:29 GMT attachment set https://svn.boost.org/trac10/ticket/12039 https://svn.boost.org/trac10/ticket/12039 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">convert32_test.cpp</span> </li> </ul> Ticket Michael Shatz <shatz@…> Tue, 15 Mar 2016 11:47:31 GMT <link>https://svn.boost.org/trac10/ticket/12039#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12039#comment:3</guid> <description> <p> Hi John </p> <p> There exists another more obscure but very similar problem: double rounding in conversion to 'float' due to initial conversion to 'double'. Practically, due to huge difference in precision between 'double' and 'float', this problem is far less severe than the previous one. It is very unlikely that it negatively affect real-world applications. Still, when you know where to look, it can be easily demonstrated. </p> <p> May be, when you are at it, it makes sense to fix this minor problem as well? </p> <p> A test case (convert32_test.cpp) attached above. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>John Maddock</dc:creator> <pubDate>Tue, 15 Mar 2016 18:25:51 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12039#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12039#comment:4</guid> <description> <p> That case was already fixed by the patch above - however there was a different one involving rounding of ties - test case and patch for that in: <a class="ext-link" href="https://github.com/boostorg/multiprecision/commit/a96bea66e191ba827626a75c9721f3018c7fb1f3"><span class="icon">​</span>https://github.com/boostorg/multiprecision/commit/a96bea66e191ba827626a75c9721f3018c7fb1f3</a> </p> </description> <category>Ticket</category> </item> <item> <author>shatz@…</author> <pubDate>Sat, 19 Mar 2016 22:02:26 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/12039 https://svn.boost.org/trac10/ticket/12039 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">convert_to_double_core.cpp</span> </li> </ul> Ticket shatz@… Sat, 19 Mar 2016 22:05:30 GMT <link>https://svn.boost.org/trac10/ticket/12039#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12039#comment:5</guid> <description> <p> May be, I don't know how fetch your patch (my github skills are below basic), but it seems to me that instead of fixing bad case you had broken good case. Since my stupid test only compares two results of conversion to each other, it is happy. But it shouldn't. </p> <p> I can't say that I fully understood your cpp_bin_float format (in particular, I can't figure out the business with guards) but assuming that I didn't misunderstood too badly, I recommend the attached core routine for conversion to double. In this routine rounding/ties handling is done by compiler/hardware rather than by us. Sometimes it does a better job. As additional advantage, it's likely several times (or many times for wide numbers) faster than your variant. Of course, it only works when arg.backend().bits().limbs() has a type 'uint64_t*' or its equivalent. I didn't figure out if it's a case on all supported platforms or only on mine (x64). But even if it's the later, it still makes sense to specialize, because I think it's safe to assume that x64 platform is by far the most important for your customers. </p> <p> Another thing that I didn't figure out is what happens when # of binary digits is not an integer multiple of 64. But I would believe, that you will have no trouble to handle this case as well. At worst it will take a simple mask applied to the first (i.e. least significant) word. </p> <p> Best regards, Michael </p> </description> <category>Ticket</category> </item> <item> <author>shatz@…</author> <pubDate>Mon, 21 Mar 2016 23:24:44 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12039#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12039#comment:6</guid> <description> <p> Hi, </p> <p> My previous comment was wrong. Because of my unfamiliarity with github, I was testing the version after the first patch, not after the second patch. The second version looks o.k. Actually, it is pretty close to my proposed variant above. A bit slower than mine, but a difference in speed is not much above noise level. </p> <p> Good job, John. Thank you. </p> <p> Regards, Michael </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 22 Mar 2016 08:22:40 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12039#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12039#comment:7</guid> <description> <p> OK good, I just pushed a few additional commits, adds an exhaustive convert-to-float test program (takes about half a day to run, so not part of the regular tests). This uncovered a fencepost error in the subtraction code, plus a double-rounding error in mpfr_float_backend (both also fixed). </p> <p> The main advantage of the new code is that it uses cpp_bin_float's existing rounding code to ensure a correct result, and is completely agnostic as to the widest integer size, or the size of the float being converted to. In trivial cases it basically degenerates to the same principle as yours - the temporary cpp_bin_float becomes a trivial wrapper around a single unsigned integer, and the bit-extraction loop executes just the once. </p> </description> <category>Ticket</category> </item> </channel> </rss>