Opened 14 years ago

Closed 12 years ago

#2659 closed Bugs (wontfix)

optional_io insertion operator assumes presence of whitespace on stream

Reported by: Andrew Troschinetz <ast@…> Owned by: Fernando Cacciola
Milestone: Boost 1.38.0 Component: optional
Version: Boost 1.37.0 Severity: Problem
Keywords: Cc:

Description

In optional_io.hpp, the insertion operator is written as:

if ( in.good() )
{
  int d = in.get();
  if ( d == ' ' )
  {
    T x ;
    in >> x;
    v = x ;
  }
  else
    v = optional<T>() ;
}

However the assumption that there must be a space on the stream means something like this fails:

istringstream in ("test");
optional<string> s;
assert (s);

It fails because we eat 'd', throw it away, and then return an empty optional object because we didn't get the space we were expecting.

If the intent was to eat up spare whitespace, I would prepose something like this:

if (in)
{
  T x;
  if (in >> std::ws >> x)
    v = x;
  else
    v = optional<T>();
}
return in;

However doesn't that interfere with the intent of the skipws flag? For example:

istringstream in ("  test1   test2");

optional<string> s;
in >> s;
assert (s);

in.unsetf (ios::skipws);
in >> s;
assert (!s);

This doesn't work with "in >> std::skipws >> x" because the second assertion will fail.

Instead you'd need to use "in >> x", which I believe would is the best solution.

Change History (5)

comment:1 by Andrew Troschinetz <ast@…>, 14 years ago

Sorry, just noticed my slip of the tongue. I meant to say that the problem was with the extraction operator, not the insertion operator. Hopefully that was obvious from the explanation of the problem.

comment:2 by Andrew Troschinetz <ast@…>, 14 years ago

I think I understand why optional_io is coded the way it is. I also think it can be improved upon.

For example:

    istringstream sin (" 1 this 5.5 is a test -3.2 a");
    optional<double> token;
    while (sin >> token) cout << token.get_value_or (numeric_limits<double>::quiet_NaN ()) << endl;

This will print out 1 and then exit the loop, but I think it would be nicer, and less surprising if it printed out:

    1
    nan
    5.5
    nan
    nan
    nan
    -3.2
    nan

This can be accomplished by changing operator>>() to the following:

    if (in)
        {
            std::string token;
            in >> token;

            if (!in)
            {
                v = optional<T> ();
                return in;
            }

            std::istringstream sin (token);
            T x;
            if (sin >> x) v = x;
            else v = optional<T> ();
        }
        else
        {
            v = optional<T> ();
        }
    return in;

We protect the istream from its failbit getting set by first extracting the next token as a string, which is sure to succeed if there is anything at all on the stream.

This way, we don't set the failbit on the istream just because we fail to extract a value for our optional type. I think this fits with the semantics of optional types because it's a valid state for an them to be uninitialized, unlike just about any other type.

This change also nicely fixes the problem I initially reported.

comment:3 by Andrew Troschinetz <ast@…>, 14 years ago

If I'm being a nuisance by continuing to reply to this ticket please let me know. I'm learning a lot about streams from trying to figure out this problem; allow me to refine the description of the problem I'm trying to fix and the code I posted in the previous reply.

I think any given optional type should behave as much like the type it wraps as possible. For example:

// This:
stringstream sin ("1.2.3.4.5");
optional<double> token;
while (sin >> token) cout << token << endl;

// Should have the same output as this:
stringstream sin ("1.2.3.4.5");
double token;
while (sin >> token) cout << token << endl;

Also this should work (as of boost 1.37.0 it does not):

stringstream sin ("test");
optional<string> token;
sin >> token; // fails to read "test", instead we get an empty optional type

However I think that using an optional type should offer a little more than just that (call me crazy). Since it's semantically valid to have an un-initialized optional type, I think failure to extract something off the stream should return an empty optional type without setting the failbit on the stream. So like:

// outputs 1 nan 5.5 nan nan nan -3.2 nan
istringstream sin (" 1 this 5.5 is a test -3.2 a");
optional<double> token;
const double nan = numeric_limits<double>::quiet_NaN ();
while (sin >> token) cout << token.get_value_or (nan) << " ";

// outputs 1
istringstream sin (" 1 this 5.5 is a test -3.2 a");
double token;
while (sin >> token) cout << token << " ";

With that in mind, here's my preposed refinement:

if (in)
{
    T x;
    if (in >> x)
    {
        v = x;
    }
    else
    {
        if (in.fail ())
        {
            in.clear ();
            std::string dump;
            in >> dump;
        }
        v = optional<T> ();
    }
}
else
{
    v = optional<T> ();
}
return in;

comment:4 by Andrew Troschinetz <ast@…>, 14 years ago

And once again, even better:

if (in)
{
    T x;
    if (in >> x)
    {
        v = x;
    }
    else
    {
        if (in.fail () && !in.eof ())
        {
            in.clear ();
            in.get ();
        }
        v = optional<T> ();
    }
}
else
{
    v = optional<T> ();
}
return in;

Instead of reading in a std::string, we should really just skip over the 1 character that gave us trouble and keep trying. Also added a check to make sure we didn't fail b/c we tried to read past eof, because if that's the case we really should fail and not try to recover.

comment:5 by Andrey Semashev, 12 years ago

Resolution: wontfix
Status: newclosed

IMO, if reading a value from stream failed, the stream should be marked with failbit and reading must be stopped. This behavior is not only valid for boost::optional, but also for other types. This, in particular, allows user to detect invalid character sequences and not accidentally read invalid data (which could be a security issue).

Extraction operator has been changed recently ([67183]) to include more robust input checking and not to miss invalid input.

PS: Regarding the initial test case, the input "test" is not valid for boost::optional and thus reading it must fail. The basic rule about IO is that whatever is written with insertion operator may be read with extraction operator. And insertion operator always writes either space or dash as leading character.

Note: See TracTickets for help on using tickets.