Opened 10 years ago

Closed 18 months ago

#8261 closed Bugs (duplicate)

lexical_cast<unsigned> returns unexpected result when using with split_iterator<std::wstring::iterator>

Reported by: s.a.moreno.a.s@… Owned by: Antony Polukhin
Milestone: To Be Determined Component: lexical_cast
Version: Boost 1.52.0 Severity: Problem
Keywords: Cc:

Description

The following code will print "4948.4948" instead of expected "10.10" (This are ascii codes of the symbols instead of values):

std::wstring wstr(L"10.10"); typedef boost::split_iterator<std::wstring::iterator> wsplit_iter_t; wsplit_iter_t wdot_iter = boost::make_split_iterator( wstr, boost::first_finder(L ".")); std::cout<<boost::lexical_cast<unsigned>(*wdot_iter++)<<'.'<<boost::lexical_cast<unsigned>(*wdot_iter++);

Attachments (2)

boost_1_55_0_lexical_cast.patch (451 bytes ) - added by Troy Korjuslommi <troykor@…> 9 years ago.
Patch is against boost_1_55_0 lexical_cast.hpp.
boost_1_55_0_lexical_cast.2.patch (724 bytes ) - added by Troy Korjuslommi <troykor@…> 9 years ago.
v2 of patch to handle iterator_range types.

Download all attachments as: .zip

Change History (10)

comment:1 by Antony Polukhin, 10 years ago

This is a known problem, which is very hard to solve in s good, portable and generic way.

As a temporary workaround you may use the following solution:

#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>

int main() {
    const wchar_t wstr[] = L"20.20"; 
    typedef boost::split_iterator<const wchar_t*> wsplit_iter_t; 
    wsplit_iter_t wdot_iter = boost::make_split_iterator( wstr, boost::first_finder(L"."));    
    std::cout << boost::lexical_cast<unsigned>(*wdot_iter++) 
              << '.' 
              << boost::lexical_cast<unsigned>(*wdot_iter++); 
}

comment:2 by Troy Korjuslommi <troykor@…>, 9 years ago

The attached patch is against boost 1.55.0's lexical_cast.hpp.

This patch catches iterator_range objects and calls a version of lexical_cast specialized on the correct internal char type.

The root cause is that the templates in the default lexical_cast function do not consider iterator_range objects. It is more efficient to catch all iterator_range objects in a specialized function.

Another issue which might be of concern is that the member variables start and finish (start_ and finish_ would be a nice change) are pointers to char types. Using an iterator abstraction would be more generic. It looks like a big change at first peak, and its merits would have to evaluated too.

by Troy Korjuslommi <troykor@…>, 9 years ago

Patch is against boost_1_55_0 lexical_cast.hpp.

in reply to:  2 comment:3 by Antony Polukhin, 9 years ago

Replying to Troy Korjuslommi <troykor@…>:

The attached patch is against boost 1.55.0's lexical_cast.hpp.

This patch catches iterator_range objects and calls a version of lexical_cast specialized on the correct internal char type.

The root cause is that the templates in the default lexical_cast function do not consider iterator_range objects. It is more efficient to catch all iterator_range objects in a specialized function.

This is a good attempt, however it will break one of the usecaes: &lexical_cast<int, char*>. This situation is covered by the regression test, which could be run by ./b2 in boost/libs/conversion/tests/ folder. Breaking existing use cases is unfordable.

Another issue which might be of concern is that the member variables start and finish (start_ and finish_ would be a nice change) are pointers to char types. Using an iterator abstraction would be more generic. It looks like a big change at first peak, and its merits would have to evaluated too.

This will increase the size of a resulting binary without actually adding new functionality.

The issue in this ticket comes from the ability of iterator_range<std::wstring::iterator> to stream using std::stream and std::wstream. lexical_cast chooses the std::stream version, according to the assumption that non wide character streaming is more optimized and used more often by users, leading to smaller binary size.

Same issue occurs with any wide&non-wide streamable classes.

A simple solution would be to specialize detail::stream_char_common (can be found in boost/lexical_cast.hpp) for std::wstring::iterator/std::string::iterator/std::6string::const_iterator/std::wstring::const_iterator and ensure that lexical_cast works well with std::wstring::iterator and iterator_range<std::wstring::const_iterator>.

A super cool solution would be to additionally resolve some of the cases with wide&non-wide streamable classes by introducing a boost::has_right_shift<std::basic_istream<char>, T > type trait that does not allow implicit conversions and type promotions for T. Not know how to do it thou.

comment:4 by Troy Korjuslommi <troykor@…>, 9 years ago

I didn't get a failure on any of the tests on 1_55_0. Must a dev branch new test or something.

My understanding of the problem is as follows. The eventual problem is that the code interprets wchar_t* as char*. The reason this happens is that the traits template says the char type for CharT is char, not wchar_t. The reason it gets it wrong is that it doesn't have any checks for iterator_range types, so it just fumbles. It catches those iterator_range types which have a parameter char or wchar_t, but iterator_range<iterator/const_iterator> types have one more level of embedded types, so it misses those.

One could add those checks, but that would just add complexity without any real benefit. A specialization for iterator_range would just get added somewhere else anyway. It is much simpler and safer to catch all iterator_range types in their own lexical_cast function. If something breaks, then write more special cases. I am attaching another patch which adds another level of indirection, so specializations can be written as needed.

I know this is slightly different take on what you mentioned. I do see your point too with the stream types. I ended up taking a different route. Please give it some thought.

by Troy Korjuslommi <troykor@…>, 9 years ago

v2 of patch to handle iterator_range types.

comment:5 by Troy Korjuslommi <troykor@…>, 9 years ago

I tried your stream_char_common specialization solution. Didn't work. Probably some typo or other brain fart type thing. The template should define wchar_t as the type, but it somehow doesn't. requires_stringbuf gets set to false in lexical_cast_stream_traits, which it shouldn't if the type is char or wchar_t.

I have to look at it again later. Maybe it'll come to me after a break. Anyway, if you feel like having a go at it, here's the one that fails (but matches).

template < class Traits, class Alloc, template<class,class> class Iter > struct stream_char_common< boost::iterator_range< Iter <wchar_t*, std::basic_string<wchar_t,Traits,Alloc> > > > {

typedef wchar_t type;

};

in reply to:  4 comment:6 by Antony Polukhin, 9 years ago

Replying to Troy Korjuslommi <troykor@…>:

I didn't get a failure on any of the tests on 1_55_0. Must a dev branch new test or something.

Just out of curiosity downloaded the 1.55. Test is in there: boost_1_55_0/libs/conversion/lexical_cast_test.cpp. With your patches test void test_getting_pointer_to_function() fail to compile. Breaking existing use cases is unfordable.

My understanding of the problem is as follows. The eventual problem is that the code interprets wchar_t* as char*. The reason this happens is that the traits template says the char type for CharT is char, not wchar_t. The reason it gets it wrong is that it doesn't have any checks for iterator_range types, so it just fumbles. It catches those iterator_range types which have a parameter char or wchar_t, but iterator_range<iterator/const_iterator> types have one more level of embedded types, so it misses those.

One could add those checks, but that would just add complexity without any real benefit. A specialization for iterator_range would just get added somewhere else anyway. It is much simpler and safer to catch all iterator_range types in their own lexical_cast function. If something breaks, then write more special cases. I am attaching another patch which adds another level of indirection, so specializations can be written as needed.

I know this is slightly different take on what you mentioned. I do see your point too with the stream types. I ended up taking a different route. Please give it some thought.

Your approach can be used with developer branch. try_lexical_convert function does not guarantee unambiguity while getting pointer to function, so it can be specialized in your way. Fork the conversion repo, apply your patch, add tests to ensure that new feature work and make sure that all tests pass.

Iter <wchar_t*, std::basic_string<wchar_t,Traits,Alloc> >

This is not a portable solution (it won't work with SCARY iterators).

comment:7 by Troy Korjuslommi <troykor@…>, 9 years ago

The stream_char_common template function I posted was meant just for testing this special case. I made it as specific as possible, so I could test it easily.

I didn't get the errors with either patch, so this must be an environment issue. My test machine is g++-4.4 on x86_64 (Debian).

I will test the fork.

comment:8 by Antony Polukhin, 18 months ago

Resolution: duplicate
Status: newclosed
Note: See TracTickets for help on using tickets.