Opened 15 years ago
Closed 13 years ago
#1273 closed Bugs (fixed)
CR+LF newlines in position_iterator
| Reported by: | | Owned by: | Joel de Guzman |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | spirit |
| Version: | | Severity: | Problem |
| Keywords: | | Cc: | |
Description
On September 13th, I sent a mail to the "spirit-general" mailing list describing a bug I ran into while using position_iterator; it is entitled "Various newline styles and position_iterator". I'm not sure it is useful to rewrite everything here, so I'll just come to the conclusion: "position_iterator< file_iterator<char> >" has iterator category "random_access_iterator_tag", whereas direct pointer arithmetic is not possible on it (because the LF is eaten when a CR+LF newline is encountered). As a consequence, one may end up with an uninitialized character when one tries to copy a range delimited by two position_iterators into a "std::vector<char>". This is demonstrated by the attached C++ source code, whose output on my machine is (in part) as follows:

BEGINNING OF OUTPUT
We have read following characters in a 'vector<char>' container from a file:
#0: 65 (A)
#1: 66 (B)
#2: 13 (\r)
#3: 87 (W)
#4: 205 (unexpected character)
END OF OUTPUT
You will see while perusing the code that I have provided two versions: one dealing with a file (i.e. type "position_iterator< file_iterator<char> >"), and one dealing with a mere character buffer (i.e. type "position_iterator<const char*>"). Both of them trigger the bug. I also tried a variant (which can be activated by commenting out line #2) that uses a "std::string" instead of a "std::vector<char>", and it does not exhibit the problem. I have not looked in detail, but this is probably because the "std::string" copy is implemented as a pre-reservation followed by a loop of "insert" and "push_back", rather than a pre-allocation followed by a loop of assignments and increments (as in "std::vector"). Using a "std::string" rather than a "std::vector" is not a practical workaround for me, since the problem is inside spirit itself (more precisely, at lines 246-248 of the 1.8.3 file "spirit/tree/common.hpp"), where the variable "text" has type "std::vector<char>":
node_val_data(IteratorT const& _first, IteratorT const& _last)
: text(_first, _last), is_root_(false), parser_id_(), value_() {}
As I said in my original mail, a quick fix is simply to change the iterator category of "position_iterator" to "forward_iterator_tag".
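To illustrate why the category matters, here is a minimal sketch that does not use Spirit at all. "CrLfSkippingIterator", "measure" and "copy_range" are hypothetical names invented for this example; the iterator only imitates the behaviour described above (CR is yielded, the following LF is skipped on increment) and is tagged "forward_iterator_tag", as in the proposed fix. The sketch compares the distance that arithmetic on the underlying pointers would report with the number of increments actually needed. The reported output (five elements, the last one never written) appears consistent with a container that was sized from the first number and filled using the second.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for position_iterator<const char*>; NOT Spirit code.
// Dereferencing a CR yields '\r'; incrementing past it also skips a following LF.
class CrLfSkippingIterator {
public:
    using iterator_category = std::forward_iterator_tag;  // the proposed fix
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    CrLfSkippingIterator() = default;
    CrLfSkippingIterator(const char* cur, const char* end) : cur_(cur), end_(end) {}

    reference operator*() const { return *cur_; }

    CrLfSkippingIterator& operator++() {
        const bool was_cr = (*cur_ == '\r');
        ++cur_;
        // Eat the LF of a CR+LF pair, taking care not to read past the end.
        if (was_cr && cur_ != end_ && *cur_ == '\n') ++cur_;
        return *this;
    }
    CrLfSkippingIterator operator++(int) {
        CrLfSkippingIterator old = *this;
        ++*this;
        return old;
    }

    const char* base() const { return cur_; }

    friend bool operator==(const CrLfSkippingIterator& a, const CrLfSkippingIterator& b) {
        return a.cur_ == b.cur_;
    }
    friend bool operator!=(const CrLfSkippingIterator& a, const CrLfSkippingIterator& b) {
        return !(a == b);
    }

private:
    const char* cur_ = nullptr;
    const char* end_ = nullptr;
};

struct Distances {
    std::ptrdiff_t by_arithmetic;  // what "last - first" on the base pointers gives
    std::ptrdiff_t by_stepping;    // how many increments reach the end
};

inline Distances measure(const char* begin, const char* end) {
    CrLfSkippingIterator first(begin, end), last(end, end);
    Distances d{last.base() - first.base(), 0};
    for (CrLfSkippingIterator it = first; it != last; ++it) ++d.by_stepping;
    return d;
}

// With a forward tag, the range constructor has to count by stepping,
// so the vector is sized to what is actually copied.
inline std::vector<char> copy_range(const char* begin, const char* end) {
    return std::vector<char>(CrLfSkippingIterator(begin, end),
                             CrLfSkippingIterator(end, end));
}
```

For the five-character buffer 'A', 'B', CR, LF, 'W', the two distances disagree, which is exactly the situation a random-access tag promises cannot happen.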
But I think a more fundamental question should also be considered: is it normal that the stream of chars coming out of a "position_iterator< file_iterator<char> >" may differ from the one coming out of a "file_iterator<char>"? I'm not sure of the answer... In the above-mentioned mail, I suggested a correction to the "increment" method (which needs an extra member variable "_crJustSeen") that would not change the stream; this might be the basis for a new implementation that you could write.
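The mail itself is not reproduced here, so the following is only a rough sketch of the general idea, under my own assumptions, and not the code that was actually proposed. "LineTracker", "advance" and "final_position" are hypothetical names; the flag "cr_just_seen" plays the role of "_crJustSeen". Every character is passed through unchanged, and the flag merely prevents the LF of a CR+LF pair from being counted as a second newline.

```cpp
#include <string>

// Hypothetical line/column bookkeeping; NOT Spirit's position_iterator.
struct LineTracker {
    int line = 1;
    int column = 1;             // column of the next character to be read
    bool cr_just_seen = false;  // plays the role of "_crJustSeen"

    // Call once per character, in stream order. No character is skipped.
    void advance(char c) {
        if (c == '\r') {
            ++line;
            column = 1;
            cr_just_seen = true;
        } else if (c == '\n') {
            // An LF right after a CR belongs to the same newline,
            // which was already counted when the CR was seen.
            if (!cr_just_seen) {
                ++line;
                column = 1;
            }
            cr_just_seen = false;
        } else {
            ++column;
            cr_just_seen = false;
        }
    }
};

// Feed a whole string through the tracker and return the final state.
inline LineTracker final_position(const std::string& text) {
    LineTracker t;
    for (char c : text) t.advance(c);
    return t;
}
```

Because the bookkeeping never consumes input, an iterator built this way could step through its base iterator one character at a time, so arithmetic on the base and stepping would agree again.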
Regards.
--Serge Le Huitouze
Serge, can you attach the test file instead? It's garbled.