Boost C++ Libraries: Ticket #1273: CR+LF newlines in position_iterator https://svn.boost.org/trac10/ticket/1273 <p> On September 13th, I sent a mail to the "spirit-general" mailing list, entitled "Various newline styles and position_iterator", describing a bug I ran into when using position_iterator. Rather than rewrite everything here, I will go straight to the conclusion: "position_iterator&lt; file_iterator&lt;char&gt; &gt;" has iterator category "random_access_iterator_tag", although direct pointer arithmetic is not valid on it (because it eats the LF when it encounters a CR+LF newline). As a consequence, one may end up with an uninitialized character when copying a range delimited by two position_iterators into a "std::vector&lt;char&gt;". This is demonstrated by the attached C++ source code, part of whose output on my machine is as follows: <strong>BEGINNING OF OUTPUT</strong> We have read following characters in a 'vector&lt;char&gt;' container from a file: #0: 65 (A) #1: 66 (B) #2: 13 (\r) #3: 87 (W) #4: 205 (unexpected character) <strong>END OF OUTPUT</strong> </p> <p> You will see while perusing the code that I have provided two versions: one dealing with a file (i.e. type "position_iterator&lt; file_iterator&lt;char&gt; &gt;"), and one dealing with a mere character buffer (i.e. type "position_iterator&lt;const char*&gt;"). Both of them exhibit the bug. I also tried a variant (which can be activated by commenting out line #2) that uses a "std::string" instead of a "std::vector&lt;char&gt;", and which does not exhibit the problem. I have not looked in detail, but this is probably because the "std::string" range copy is implemented as a pre-reservation followed by a loop of "insert" and "push_back" calls, rather than a pre-allocation followed by a loop of assignments and increments (as in "std::vector"). This approach (i.e.
using a "std::string" rather than a "std::vector") is not a practical workaround for my problem, since the problem is inside Spirit itself (more precisely, at lines 246-248 of the 1.8.3 file "spirit/tree/common.hpp"), where variable "text" has type "std::vector&lt;char&gt;": </p> <blockquote> <p> node_val_data(IteratorT const&amp; _first, IteratorT const&amp; _last) </p> <blockquote> <p> : text(_first, _last), is_root_(false), parser_id_(), value_() {} </p> </blockquote> </blockquote> <p> As I said in my original mail, a quick solution is simply to change the iterator category of "position_iterator" to "forward_iterator_tag". </p> <p> But I think a more fundamental question should also be considered: is it normal that the stream of chars coming out of a "position_iterator&lt; file_iterator&lt;char&gt; &gt;" may differ from the one coming out of a "file_iterator&lt;char&gt;"? I'm not sure of the answer... In the above-mentioned mail, I suggested a correction to method "increment" (which needs an extra member variable "_crJustSeen") that would leave the stream unchanged; this might serve as the basis for a new implementation. </p> <p> Regards. </p> <p> --Serge Le Huitouze </p> Joel de Guzman Wed, 28 Nov 2007 17:16:59 GMT <link>https://svn.boost.org/trac10/ticket/1273#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1273#comment:1</guid> <description> <p> Serge, can you attach the test file instead? It's garbled.
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Thu, 25 Mar 2010 23:27:32 GMT</pubDate> <title>status changed; resolution set</title> https://svn.boost.org/trac10/ticket/1273#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> This was fixed in <a class="changeset" href="https://svn.boost.org/trac10/changeset/49232" title="position_iterator is a forward iterator, so tag it appropriately. ...">[49232]</a> </p>
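<p> To illustrate why the iterator category matters here, the following is a minimal sketch, not Spirit code. It assumes a hypothetical "CrLfSkippingIterator" that stands in for "position_iterator&lt;const char*&gt;" and steps over the LF of a CR+LF pair on increment, as described in the ticket. The helper names "iterated_length" and "copy_range" are also made up for the illustration. Because the iterator is tagged "forward_iterator_tag", "std::distance" and the "std::vector" range constructor count elements by incrementing rather than by subtracting positions, so the container is sized to the characters actually visited. </p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for position_iterator<const char*> (NOT the Spirit
// class): on increment it steps over the LF of a CR+LF pair, so it visits
// fewer elements than the underlying pointer range contains.
class CrLfSkippingIterator {
public:
    // The honest category: positions cannot be subtracted meaningfully.
    using iterator_category = std::forward_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    CrLfSkippingIterator() = default;
    CrLfSkippingIterator(const char* pos, const char* end)
        : pos_(pos), end_(end) {}

    reference operator*() const { return *pos_; }
    pointer operator->() const { return pos_; }

    CrLfSkippingIterator& operator++() {
        const char current = *pos_;
        ++pos_;
        // Eat the LF that immediately follows a CR.
        if (current == '\r' && pos_ != end_ && *pos_ == '\n') {
            ++pos_;
        }
        return *this;
    }

    CrLfSkippingIterator operator++(int) {
        CrLfSkippingIterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const CrLfSkippingIterator& a,
                           const CrLfSkippingIterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const CrLfSkippingIterator& a,
                           const CrLfSkippingIterator& b) {
        return !(a == b);
    }

private:
    const char* pos_ = nullptr;
    const char* end_ = nullptr;
};

// Number of elements the iterator actually yields over [begin, end).
// With a forward tag, std::distance counts increments.
std::ptrdiff_t iterated_length(const char* begin, const char* end) {
    return std::distance(CrLfSkippingIterator(begin, end),
                         CrLfSkippingIterator(end, end));
}

// Range-construct a vector, as node_val_data does with its "text" member.
std::vector<char> copy_range(const char* begin, const char* end) {
    return std::vector<char>(CrLfSkippingIterator(begin, end),
                             CrLfSkippingIterator(end, end));
}
```

<p> For the five-character buffer "AB\r\nW", the underlying pointers are 5 apart but the iterator visits only 4 characters. A container sized from the pointer difference, which is what a random-access tag invites, would therefore have one slot that is never written. </p>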