Opened 8 years ago

Closed 8 years ago

#10639 closed Bugs (fixed)

lexical_cast<double>(string) wrong in C++11

Reported by: richard.kreckel@… Owned by: Antony Polukhin
Milestone: Boost 1.58.0 Component: lexical_cast
Version: Boost 1.56.0 Severity: Problem
Keywords: Cc: pbristow@…

Description

It's a well-known fact that printing a double-precision floating-point number to 17 decimal digits is enough to recover the original double-precision number (cf. Theorem 15 in David Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic").

This useful property has been broken somewhere between Boost 1.40 and Boost 1.55 if the program is compiled with -std=c++11 or -std=c++14 (GCC or Clang compiler switches).

Here is a test program which should print the same number three times:

#include <cstdio>
#include <iostream>
#include <string>
#include <boost/lexical_cast.hpp>
#include <boost/math/special_functions/next.hpp>
#include <boost/format.hpp>

int main()
{
    double x = 1.0316536643319239e-255;
    std::printf("x == %.17g\n", x);

    std::string s = (boost::format("%.17g") % x).str();
    std::printf("s == %s\n", s.c_str());

    double y = boost::lexical_cast<double>(s);  // ERROR
    std::printf("y == %.17g\n", y);

    if (x != y) {
        std::cout << "x and y are " 
                  << boost::math::float_distance(x, y) 
                  << " ULP apart" << std::endl;
        return 1;
    }

    return 0;
}

However, with Boost 1.55 or 1.56 it prints:

x == 1.0316536643319239e-255
s == 1.0316536643319239e-255
y == 1.031653664331924e-255
x and y are 1 ULP apart

I'm testing this on an AMD64 machine running Linux, with both G++ 4.9.1 and Clang 3.5.0. In both cases, the compiler switch -std=c++11 or -std=c++14 has to be set in order to trigger the problem.

Change History (5)

comment:1 by John Maddock, 8 years ago

Cc: pbristow@… added

I had a quick look at this and can confirm the issue. I'm really pretty surprised to see that the code is parsing the number itself rather than relying on std::num_get or whatever.

Here's the thing: you absolutely cannot do this (base-10 to base-2 conversion) correctly without arbitrary-precision arithmetic. Indeed, in the worst case there is essentially no limit to how many digits you may need in order to carry out the conversion correctly (though such cases are extremely rare, as in they will never ever occur in practice!). In fact getting this right is really bloody hard: Boost.Multiprecision has an algorithm under boost/multiprecision/cpp_bin_float/io.hpp based on MPFR's largely brute-force approach, but honestly I wouldn't use that either for conversion to a built-in type.

There's more information at http://www.exploringbinary.com/real-c-rounding-is-perfect-gcc-now-converts-correctly/, which shows that even many respected standard libraries often don't get this right in certain cases. I'm adding Paul Bristow to the CC list, as I know he has an interest in this.

comment:2 by Antony Polukhin, 8 years ago

Status: new → assigned

The problem seems to be much worse than it looks.

Clang-3.4 and GCC-4.8.2 produce the same results in C++03 and C++11 mode. Moreover, the code in LexicalCast that does the conversion must work exactly the same way in C++11 and C++03. It is pretty simple and does not use real number types until the very end:

    const wide_result_t result = std::pow(static_cast<wide_result_t>(10.0), pow_of_10) * mantissa;
    value = static_cast<T>( has_minus ? (boost::math::changesign)(result) : result);

This makes me think that there's probably some precision error in std::pow.

Unfortunately I have no access to GCC-4.9 and Clang-3.5 at this moment, so I cannot investigate this issue further. Please, could someone do it?

Thanks for the link! I'll add the test cases from it to the lexical_cast autotests and, in case of errors, will fall back to something like std::num_get. Maybe the fallback to num_get will be done even with the tests passing: the current algorithm heavily relies on hardware precision and does not work in some cases (issue #6975).

comment:3 by John Maddock, 8 years ago

Given:

const wide_result_t result = std::pow(static_cast<wide_result_t>(10.0), pow_of_10) * mantissa;

Then you have two floating-point operations, which is to say that even if std::pow is accurate to 0.5 ULP, and the multiplication likewise, you can still be off by 1 ULP in the final result. Note that this is true even if long double is wider than double, due to the "double rounding" problem: http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/.

You are correct that your code is the same in C++03 and C++11 modes which makes me wonder what's changed - my guess is that because of the issues outlined above, your code will be very susceptible to choice of floating-point registers used, and/or the level of compiler optimization used. Which is to say, the compiler only has to output slightly different code at the machine level, and stuff which worked before - largely by accident - will now break.

Fun isn't it? ;-)

HTH, John.

in reply to: comment:3 | comment:4 by Antony Polukhin, 8 years ago

Replying to johnmaddock:

> Given:
>
>     const wide_result_t result = std::pow(static_cast<wide_result_t>(10.0), pow_of_10) * mantissa;
>
> Then you have two floating point operations - which is to say, even if std::pow is accurate to 0.5ulp, and the multiplication likewise, you can still be wrong to 1ulp in the final result. Note that this is true even if long double is wider than double due to the "double rounding" problem: http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/.

Seems like the final nail in the coffin of my naive implementation. The sad part is that std::num_get and friends either work slowly because of memory allocations or do not respect locale-specific separators. I'll make lexical_cast use std::stream-based conversions, but this change probably won't make it into the 1.57 release (it is a significant change that requires more testing).

> You are correct that your code is the same in C++03 and C++11 modes which makes me wonder what's changed - my guess is that because of the issues outlined above, your code will be very susceptible to choice of floating-point registers used, and/or the level of compiler optimization used. Which is to say, the compiler only has to output slightly different code at the machine level, and stuff which worked before - largely by accident - will now break.

There's almost no chance that two compilers maintained by two different teams changed their code generation between two minor releases at the same time, and only for the same specific set of input options.

This looks more like a regression in a Standard Library implementation. As I understand it, both test cases (Clang and GCC) used the same Standard Library, which makes it the first candidate for inspection.

> Fun isn't it? ;-)

It makes me think that libc++ developers never stop laughing because of such fun... :-)

comment:5 by Antony Polukhin, 8 years ago

Milestone: To Be Determined → Boost 1.58.0
Resolution: fixed
Status: assigned → closed