Opened 15 years ago

Closed 13 years ago

#1273 closed Bugs (fixed)

CR+LF newlines in position_iterator

Reported by: slehuitouze@…
Owned by: Joel de Guzman
Milestone: To Be Determined
Component: spirit
Version:
Severity: Problem
Keywords:
Cc:

Description

On September 13th, I sent a mail to the "spirit-general" mailing list, entitled "Various newline styles and position_iterator", describing a bug I ran into when using position_iterator. Rather than rewrite everything here, I'll go straight to the conclusion: "position_iterator< file_iterator<char> >" has iterator category "random_access_iterator_tag", although direct pointer arithmetic is not possible on it (because the LF is eaten when a CR+LF newline is encountered). As a consequence, one may end up with an uninitialized character when copying a range delimited by two position_iterators into a "std::vector<char>". This is demonstrated by the attached C++ source code, whose output (in part) on my machine is as follows:

BEGINNING OF OUTPUT
We have read following characters in a 'vector<char>' container from a file:
#0: 65 (A)
#1: 66 (B)
#2: 13 (\r)
#3: 87 (W)
#4: 205 (unexpected character)
END OF OUTPUT

As you will see when reading the code, I have provided two versions: one dealing with a file (i.e. type "position_iterator< file_iterator<char> >") and one dealing with a plain character buffer (i.e. type "position_iterator<const char*>"). Both exhibit the bug. I also tried a variant (activated by commenting out line #2) that uses a "std::string" instead of a "std::vector<char>", and it does not exhibit the problem. I have not looked in detail, but this is probably because "std::string" copying is implemented as a pre-reservation followed by a loop of "insert" and "push_back", rather than a pre-allocation followed by a loop of assignment and increment (as in "std::vector"). Using a "std::string" rather than a "std::vector" is not a practical workaround for me, since the problem is inside Spirit itself (more precisely, at lines 246-248 of the 1.8.3 file "spirit/tree/common.hpp"), where the member "text" has type "std::vector<char>":

node_val_data(IteratorT const& _first, IteratorT const& _last)
    : text(_first, _last), is_root_(false), parser_id_(), value_() {}
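The two copy strategies contrasted above can be imitated without Boost. What follows is only a rough sketch under stated assumptions: "next_skipping_lf", "copy_presized" and "copy_appending" are hypothetical names, the input "AB\r\nW" is assumed from the output shown earlier, and a '?' sentinel stands in for the slot that a real "std::vector" would leave uninitialized. Actual standard library implementations may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the increment described in the ticket:
// a CR is emitted, and an LF directly after it is skipped.
inline const char* next_skipping_lf(const char* p, const char* last) {
    assert(p != last);
    const char current = *p++;
    if (current == '\r' && p != last && *p == '\n') {
        ++p;  // eat the LF of a CR+LF pair
    }
    return p;
}

// Strategy 1: size the container from the pointer distance, then assign.
// The sentinel '?' marks slots that the loop never writes.
inline std::vector<char> copy_presized(const char* first, const char* last) {
    std::vector<char> out(static_cast<std::size_t>(last - first), '?');
    std::size_t i = 0;
    for (const char* p = first; p != last; p = next_skipping_lf(p, last)) {
        out[i++] = *p;
    }
    return out;
}

// Strategy 2: reserve only, then append each emitted character.
inline std::vector<char> copy_appending(const char* first, const char* last) {
    std::vector<char> out;
    out.reserve(static_cast<std::size_t>(last - first));
    for (const char* p = first; p != last; p = next_skipping_lf(p, last)) {
        out.push_back(*p);
    }
    return out;
}
```

With the five-character input "AB\r\nW", the pre-sized copy ends up with five slots of which the last is never written, while the appending copy simply holds the four characters that were actually emitted.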

As I said in my original mail, a quick solution is simply to change the iterator category of "position_iterator" to "forward_iterator_tag".
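To illustrate why the category matters, here is a minimal sketch. "CrLfSkippingIterator" and "reported_distance" are hypothetical and are not Boost's position_iterator; the class only mimics the LF-eating increment. The point is that "std::distance" trusts "operator-" when the category is random access, and counts increments otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical minimal iterator, parameterised on the category it advertises.
template <typename Category>
class CrLfSkippingIterator {
public:
    using iterator_category = Category;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    CrLfSkippingIterator(const char* pos, const char* end) : pos_(pos), end_(end) {}

    reference operator*() const { return *pos_; }

    // Emits a CR, then skips an LF that immediately follows it.
    CrLfSkippingIterator& operator++() {
        assert(pos_ != end_);
        const char current = *pos_++;
        if (current == '\r' && pos_ != end_ && *pos_ == '\n') {
            ++pos_;
        }
        return *this;
    }

    CrLfSkippingIterator operator++(int) {
        CrLfSkippingIterator previous = *this;
        ++*this;
        return previous;
    }

    friend bool operator==(const CrLfSkippingIterator& a, const CrLfSkippingIterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const CrLfSkippingIterator& a, const CrLfSkippingIterator& b) {
        return a.pos_ != b.pos_;
    }
    // Raw pointer difference: disagrees with the number of increments
    // whenever a CR+LF pair lies in the range.
    friend difference_type operator-(const CrLfSkippingIterator& a, const CrLfSkippingIterator& b) {
        return a.pos_ - b.pos_;
    }

private:
    const char* pos_;
    const char* end_;
};

// Distance as the standard library computes it for the advertised category.
template <typename Category>
std::ptrdiff_t reported_distance(const char* first, const char* last) {
    CrLfSkippingIterator<Category> begin(first, last);
    CrLfSkippingIterator<Category> end(last, last);
    return std::distance(begin, end);
}
```

For the assumed input "AB\r\nW", "reported_distance<std::random_access_iterator_tag>" gives 5, whereas "reported_distance<std::forward_iterator_tag>" gives 4, which matches the number of characters the iterator really yields.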

But I think a more fundamental question should also be considered: is it normal that the stream of characters coming out of a "position_iterator< file_iterator<char> >" may differ from the one coming out of a "file_iterator<char>"? I'm not sure of the answer... In the above-mentioned mail, I suggested a correction to the "increment" method (requiring an extra member variable "_crJustSeen") that would leave the stream unchanged; this might serve as the basis for a new implementation.
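The correction itself is in the mail and not reproduced here, so the following is only a guess at its general shape, not the actual patch. "LineTracker" and "line_after" are hypothetical names; the flag mirrors the "_crJustSeen" idea. Every character is passed through unchanged, and a CR+LF pair still counts as a single newline.

```cpp
#include <cassert>

// Hypothetical line tracker: no character is ever skipped.
struct LineTracker {
    int line = 1;
    bool cr_just_seen = false;

    // Call once for every character of the stream, in order.
    void consume(char c) {
        if (c == '\r') {
            ++line;  // CR always ends a line
            cr_just_seen = true;
        } else {
            if (c == '\n' && !cr_just_seen) {
                ++line;  // lone LF ends a line; LF after CR was already counted
            }
            cr_just_seen = false;
        }
    }
};

// Line number reached after consuming every character of [first, last).
inline int line_after(const char* first, const char* last) {
    assert(first != nullptr && last != nullptr);
    LineTracker tracker;
    for (; first != last; ++first) {
        tracker.consume(*first);
    }
    return tracker.line;
}
```

Because the underlying position advances by exactly one per increment in this scheme, pointer arithmetic and increment counts would agree again, at the cost of the LF becoming visible to the parser.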

Regards.

--Serge Le Huitouze

Change History (2)

comment:1 by Joel de Guzman, 15 years ago

Serge, can you attach the test file instead? It's garbled.

comment:2 by anonymous, 13 years ago

Resolution: fixed
Status: new → closed

This was fixed in [49232]
