Opened 17 years ago
Closed 14 years ago
#484 closed Feature Requests (fixed)
Boost.Iostreams and newline translation
Reported by: | jpstanley | Owned by: | Jonathan Turkanis |
---|---|---|---|
Milestone: | Boost 1.36.0 | Component: | iostreams |
Version: | None | Severity: | Optimization |
Keywords: | Cc: |
Description
I like the Boost.Iostreams framework in Boost 1.33, but I haven't found a way to make it do exactly what I want. I maintain software that parses files out of disk images instead of the local file system. I'd like to wrap a C++ istream around my raw binary read/seek APIs to make it easier to parse data from disk images and to interface with existing code. Boost.Iostreams makes this job a piece of cake, except that I haven't found a way to make the stream act like an ifstream opened in text mode (I'm on Win32, by the way). In other words, I'd like CR, LF, and CRLF line endings to be translated into \n when I read the data.

The provided newline_filter class does nearly exactly what I want, but it doesn't support seeking. I still need seekg() and tellg() to work. I tried writing my own filter, but I ran into some problems. The Boost stream buffer class seems to be built on the assumption that the following holds:

```cpp
std::streamsize P0 = str.tellg();
str.read(buf, 100);
std::streamsize P1 = str.tellg();
assert(P1 - P0 == 100);
```

With an ifstream opened in text mode, however, this assertion can fail (P1 - P0 > 100) if two-character CRLF combinations in the intervening data were translated into a one-character newline. Is it possible to implement an input-seekable filter that lets a Boost.Iostreams stream behave this way?
Change History (5)
comment:2 by , 17 years ago
Logged In: YES user_id=1326483

Thank you for your timely response. The main reason I want seekg() and tellg() to work is so that I can remember where in the file I found a piece of data and come back to it later. For example, suppose I want to build an index of an mbox file (essentially a large text file with one email message following another). When the read head is positioned at the start of a message, I use tellg() to retrieve a pointer to it. Later, I can use seekg() when I want to retrieve the message. A byte offset in the unfiltered sequence would be the most natural form of pointer: I could retrieve a single message quickly, without filtering everything in front of it. Of course, I couldn't compare pointers to learn the size of a message in filtered characters, but that's not my primary objective.

To make the behavior I expect a little clearer, here's an example program:

```cpp
// test.txt contains the sequence "Hello\x0D\x0AWorld!"
// Compile on MSVC++ with CL /GX newline_test.cpp
#include <iostream>
#include <fstream>

int main(int argc, char **argv)
{
    std::ifstream infile("test.txt");
    infile.seekg(5, std::ios::beg);
    std::cout << infile.tellg() << std::endl;
    infile.get();
    std::cout << infile.tellg() << std::endl;
    infile.close();
    return 0;
}
```

The seekg() statement positions the read head at the CR character. The first tellg() returns, unsurprisingly, 5. Then get() extracts both the CR and the LF from the unfiltered sequence and returns '\n'. The second tellg() call returns 7, even though there is only one character in the filtered sequence between the two points.
comment:3 by , 15 years ago
Component: | None → iostreams |
---|---|
Severity: | → Problem |
comment:4 by , 15 years ago
Milestone: | → Boost 1.36.0 |
---|---|
Severity: | Problem → Optimization |
Type: | Support Requests → Feature Requests |
I'm sorry I let this go for so long.
I will consider it as a feature request for a form of seekability weaker than the one currently supported, in which you can only request that the file pointer be restored to a location that was previously saved. A weak-seekable newline filter would then not have to worry about seeking relative to the current location; it would just have to remember the file offsets of the downstream device at various previously queried locations.
It may already be possible to implement this in the current library, but introducing a new concept might clarify the situation.
I will consider implementing this in 1.36.
comment:5 by , 14 years ago
Resolution: | None → fixed |
---|---|
Status: | assigned → closed |
I've decided the correct way to solve this problem is to write a seekable filter adapter that provides an implementation of seek in terms of user supplied implementations of read and write. I have added this idea to the list of possible new filters and devices in the Iostreams Roadmap.
In case the wiki entry changes, here is the current description: "A seekable_filter_adapter that provides an implementation of seek() when the user has defined read() and write(). seek() would work by checking whether the offset is relative to the beginning of the stream, and if so, whether it corresponds to a previously saved offset. If so, it fetches the stored offset in the unfiltered stream and performs a seek on the downstream device. This would solve the problem raised by #484."