Boost C++ Libraries: Ticket #484: Boost.Iostreams and newline translation https://svn.boost.org/trac10/ticket/484 <pre class="wiki">I like the Boost.Iostreams framework in Boost 1.33. But I haven't found a way to make it do exactly what I want.

I maintain software that parses files out of disk images instead of the local file system. I'd like to be able to wrap a C++ istream around my raw binary read/seek APIs to make it easier to parse data from disk images and interface with existing code. Boost.Iostreams makes this job a piece of cake--except that I haven't found a way to make the stream act like an ifstream opened in text mode (I'm on Win32, btw). In other words, I'd like to have CR, LF, and CRLF line endings translated into \n when I read the data. There is a newline_filter class provided which does nearly exactly what I want--but it doesn't support seeking. I still need seekg() and tellg() to work.

I tried writing my own filter, but I ran into some problems. It seems the Boost stream buffer class is built on the assumption that this will hold:

std::streamsize P0 = str.tellg();
str.read(buf, 100);
std::streamsize P1 = str.tellg();
assert(P1 - P0 == 100);

But with an ifstream opened in text mode, this assertion can fail (P1 - P0 &gt; 100) if two-character CRLF combinations were translated into a one-character newline in the intervening data.

Is it possible to implement an input-seekable filter that will let a Boost.Iostream behave this way?
</pre> Jonathan Turkanis Fri, 23 Sep 2005 01:02:20 GMT <link>https://svn.boost.org/trac10/ticket/484#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/484#comment:1</guid> <description> <pre class="wiki">Logged In: YES user_id=811799
Thanks for your interest in Boost.Iostreams.
SourceForge.net wrote:
&gt; Submitted By: jpstanley (jpstanley)
&gt; I maintain software that parses files out of disk
&gt; images instead of the local file system. I'd like to
&gt; be able to wrap a C++ istream around my raw binary
&gt; read/seek APIs to make it easier to parse data from
&gt; disk images and interface with existing code.
&gt; Boost.Iostreams makes this job a piece of cake--except
&gt; that I haven't found a way to make the stream act like
&gt; an ifstream opened in text mode (I'm on Win32, btw).
&gt; In other words, I'd like to have CR, LF, and CRLF line
&gt; endings translated into \n when I read the data. There
&gt; is a newline_filter class provided which does nearly
&gt; exactly what I want--but it doesn't support seeking. I
&gt; still need seekg() and tellg() to work.
&gt;
&gt; I tried writing my own filter, but I ran into some
&gt; problems. It seems the Boost stream buffer class is
&gt; built on the assumption that this will hold:
&gt;
&gt; std::streamsize P0 = str.tellg();
&gt; str.read(buf, 100);
&gt; std::streamsize P1 = str.tellg();
&gt; assert(P1 - P0 == 100);

Not quite: you could reach EOF before reading 100 characters ;-) But I know what you mean.

When you are reading a filtered sequence, there are two file pointers to worry about: there's the current position in the filtered sequence, and the current position in the unfiltered sequence. When you successfully read 100 characters from the filtered sequence, as here:

str.read(buf, 100);

the current position in the *filtered* sequence is advanced by exactly 100 characters. This may correspond to more than 100 or fewer than 100 characters in the unfiltered sequence.

&gt; But with an ifstream opened in text mode, this
&gt; assertion can fail (P1 - P0 &gt; 100) if two-character
&gt; CRLF combinations were translated into a one-character
&gt; newline in the intervening data.

Here you're talking about the current position in the unfiltered sequence.
To query this value, you can't use str.tellg(); you have to call seek() on the underlying filter.

&gt; Is it possible to implement an input-seekable filter
&gt; that will let a Boost.Iostream behave this way?

If you want to be able to seek within the filtered sequence, it's possible, but it will be inefficient. If you want to be able to seek with offsets interpreted relative to the unfiltered sequence, I'm not sure it can be done.

I'd like you to describe more exactly what you want to do.

--
Jonathan Turkanis
www.kangaroologic.com
</pre> </description> <category>Ticket</category> </item> <item> <dc:creator>jpstanley</dc:creator> <pubDate>Fri, 23 Sep 2005 04:48:07 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/484#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/484#comment:2</guid> <description> <pre class="wiki">Logged In: YES user_id=1326483
Thank you for your timely response.

The main reason I want seekg() and tellg() to work is so that I can remember where in the file I found a piece of data and come back to it later. For example, suppose I want to build an index of an mbox file (essentially a large text file with one email message following another). When I have the read head positioned at the start of a message, I use tellg() to retrieve a pointer to it. Then I can use seekg() later on when I want to retrieve the message.

A byte offset in the unfiltered sequence would be the most natural form of pointer--I could retrieve a single message quickly, without filtering everything in front of it. Of course I couldn't compare pointers to learn the size of a message in filtered characters, but that's not my primary objective.

To try and make the behavior I expect a little clearer, here's an example program:

// test.txt contains the sequence "Hello\x0D\x0AWorld!"
// Compile on MSVC++ with CL /GX newline_test.cpp
#include &lt;iostream&gt;
#include &lt;fstream&gt;

int main(int argc, char **argv)
{
    std::ifstream infile("test.txt");
    infile.seekg(5, std::ios::beg);
    std::cout &lt;&lt; infile.tellg() &lt;&lt; std::endl;
    infile.get();
    std::cout &lt;&lt; infile.tellg() &lt;&lt; std::endl;
    infile.close();
    return 0;
}

The seekg() statement positions the read head at the CR character. The first tellg() returns, unsurprisingly, 5. Then get() extracts both the CR and the LF from the unfiltered sequence and returns '\n'. The second tellg() call returns 7, even though there is only one character in the filtered sequence between the two points.
</pre> </description> <category>Ticket</category> </item> <item> <dc:creator>Daryle Walker</dc:creator> <pubDate>Fri, 03 Aug 2007 11:57:58 GMT</pubDate> <title>component changed; severity set https://svn.boost.org/trac10/ticket/484#comment:3 https://svn.boost.org/trac10/ticket/484#comment:3 <ul> <li><strong>component</strong> <span class="trac-field-old">None</span> → <span class="trac-field-new">iostreams</span> </li> <li><strong>severity</strong> → <span class="trac-field-new">Problem</span> </li> </ul> Ticket Jonathan Turkanis Wed, 26 Dec 2007 01:28:10 GMT type, severity changed; milestone set https://svn.boost.org/trac10/ticket/484#comment:4 https://svn.boost.org/trac10/ticket/484#comment:4 <ul> <li><strong>type</strong> <span class="trac-field-old">Support Requests</span> → <span class="trac-field-new">Feature Requests</span> </li> <li><strong>severity</strong> <span class="trac-field-old">Problem</span> → <span class="trac-field-new">Optimization</span> </li> <li><strong>milestone</strong> → <span class="trac-field-new">Boost 1.36.0</span> </li> </ul> <p> I'm sorry I let this go so long. 
</p> <p> I will consider it as a feature request for a form of seekability weaker than that currently supported, in which you can only request that the file pointer be restored to a location that was previously saved. A weak-seekable newline filter would then not have to worry about seeking relative to the current location, but would just have to remember the file offsets of the downstream device at various previously queried locations. </p> <p> It may already be possible to implement this in the current library, but introducing a new concept might clarify the situation. </p> <p> I will consider implementing this in 1.36. </p> Ticket Jonathan Turkanis Sun, 25 May 2008 22:36:21 GMT status, resolution changed https://svn.boost.org/trac10/ticket/484#comment:5 https://svn.boost.org/trac10/ticket/484#comment:5 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> <span class="trac-field-old">None</span> → <span class="trac-field-new">fixed</span> </li> </ul> <p> I've decided the correct way to solve this problem is to write a seekable filter adapter that provides an implementation of seek in terms of user-supplied implementations of read and write. I have added this idea to the list of possible new filters and devices in the <a class="wiki" href="https://svn.boost.org/trac10/wiki/IostreamsFiltersAndDevices">Iostreams Roadmap</a>. </p> <p> In case the wiki entry changes, here is the current description: "A seekable_filter_adapter that provides an implementation of seek() when the user has defined read() and write(). seek() would work by checking whether the offset is relative to the beginning of the stream, and if so, whether it corresponds to a previously saved offset. If so, it fetches the stored offset in the unfiltered stream and performs a seek on the downstream device. 
This would solve the problem raised by <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/484" title="#484: Feature Requests: Boost.Iostreams and newline translation (closed: fixed)">#484</a>" </p> Ticket
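
The "weak seekability" idea discussed in this ticket can be illustrated with a minimal sketch. It is not the proposed seekable_filter_adapter and it uses no Boost.Iostreams API: the class name weak_seekable_newline_reader and its members get(), tell() and seek() are hypothetical, and the "device" is assumed to be an in-memory std::string of raw bytes. The reader translates CR, LF and CRLF into '\n', reports positions as byte offsets in the unfiltered data, and only allows seeking back to offsets that tell() has previously handed out.

```cpp
#include <cassert>   // for callers that want to check results with assert()
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical illustration only; not part of Boost.Iostreams.
class weak_seekable_newline_reader {
public:
    explicit weak_seekable_newline_reader(std::string raw)
        : raw_(std::move(raw)), pos_(0) {}

    // Returns the next filtered character, or -1 at end of input.
    // CR, LF and CRLF in the raw data all come out as a single '\n'.
    int get() {
        if (pos_ >= raw_.size()) return -1;
        char c = raw_[pos_++];
        if (c == '\r') {
            // Swallow the LF of a CRLF pair so it is reported only once.
            if (pos_ < raw_.size() && raw_[pos_] == '\n') ++pos_;
            return '\n';
        }
        return static_cast<unsigned char>(c);
    }

    // Returns the current offset in the UNFILTERED data and remembers it,
    // so that a later seek() to this value is permitted.
    std::size_t tell() {
        saved_.insert(pos_);
        return pos_;
    }

    // Restores a position previously returned by tell().
    // Returns false (and does not move) for any other offset.
    bool seek(std::size_t offset) {
        if (saved_.count(offset) == 0) return false;
        pos_ = offset;
        return true;
    }

private:
    std::string raw_;               // unfiltered bytes (stand-in for the device)
    std::size_t pos_;               // current offset in the unfiltered bytes
    std::set<std::size_t> saved_;   // offsets handed out by tell()
};
```

With the raw data "Hello\r\nWorld!" from the example program in this ticket, reading five characters and calling tell() gives 5; the next get() returns '\n' and the following tell() gives 7, which matches the text-mode ifstream behaviour described by the reporter. Restricting seek() to saved offsets means the reader never has to map an arbitrary filtered position back to an unfiltered one.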