Opened 14 years ago

Closed 12 years ago

Last modified 12 years ago

#1896 closed Bugs (fixed)

boost::iostreams::gzip_decompressor silently ignores multiple members

Reported by: Bruce MacDonald <bruce@…>
Owned by: Jonathan Turkanis
Milestone: Boost 1.41.0
Component: iostreams
Version: Boost 1.35.0
Severity: Problem
Keywords: iostreams gzip gzip_decompressor
Cc:

Description

The gzip file format (RFC 1952) allows multiple "members" to be concatenated in one gzip file. I have a provider who, unfortunately, sends me gzipped files consisting of about 8 members of 500M (uncompressed) each. Why they are doing this I don't know, but this seems to be a long-standing weirdness with gzip. The gzip command line utility simply decompresses all the members into a single stream, and this is the behaviour I would expect from gzip_decompressor.

The gzip_decompressor only processes the first member and silently ignores the others. In fact, the implementation of read_footer attempts to slurp the rest of the compressed file into a string which it then discards.

In order to read the rest of the members all we have to do is read and process the actual trailer (8 bytes) and then recursively process the rest of the input (perhaps after invoking close() on ourself?).
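As a rough illustration of the trailer step described above: RFC 1952 ends each member with a 4-byte CRC32 followed by a 4-byte ISIZE, both little-endian, and any following member starts with the magic bytes 0x1f 0x8b. The sketch below is not the Boost.Iostreams implementation; the names GzipTrailer, read_le32, parse_trailer and starts_new_member are hypothetical helpers made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type (not part of Boost): the 8-byte gzip member trailer.
struct GzipTrailer {
    std::uint32_t crc32;  // CRC-32 of the member's uncompressed data
    std::uint32_t isize;  // uncompressed size modulo 2^32
};

// Read a 32-bit little-endian value from four bytes.
inline std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Parse exactly the 8 trailer bytes: CRC32 first, then ISIZE.
inline GzipTrailer parse_trailer(const unsigned char* p, std::size_t n) {
    assert(n >= 8);  // the caller must supply the whole trailer
    GzipTrailer t;
    t.crc32 = read_le32(p);
    t.isize = read_le32(p + 4);
    return t;
}

// After the trailer, another member is present if the gzip magic bytes follow.
inline bool starts_new_member(const unsigned char* p, std::size_t n) {
    return n >= 2 && p[0] == 0x1f && p[1] == 0x8b;
}
```

A decompressor following this approach would read only those 8 bytes, compare them with its own running CRC and byte count, and then check whether another member header follows instead of discarding the rest of the input.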

I have attempted to write a fix for this myself but have been defeated by the complexity of the library.

Attachments (1)

t.gz (111 bytes) - added by Bruce MacDonald <bruce@…> 14 years ago.
Toy example gzip file containing two strings.


Change History (7)

by Bruce MacDonald <bruce@…>, 14 years ago

Attachment: t.gz added

Toy example gzip file containing two strings.

comment:1 by Jonathan Turkanis, 14 years ago

Status: new → assigned

Yes, the implementation is tricky because when the end of a deflated sequence is reached, the symmetric filter will usually have some unconsumed characters in the buffer, which need to be fed back through the decompressor as part of the next member.

Now that I've figured out the problem, it shouldn't be too hard to implement.
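The buffering issue can be shown with a toy model. This is only a sketch under loud assumptions: instead of real deflate data, each "member" here is one length byte N followed by N payload bytes, and decode_member and decode_all are hypothetical names, not Boost or zlib functions. The point it illustrates is that bytes left in the buffer after one member ends must be kept and treated as the start of the next member.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one compressed member: a length byte N, then N payload bytes.
// Decodes one member starting at pos; returns false if the member is incomplete.
bool decode_member(const std::string& buf, std::size_t& pos, std::string& out) {
    if (pos >= buf.size()) return false;
    std::size_t len = static_cast<unsigned char>(buf[pos]);
    if (buf.size() - pos - 1 < len) return false;  // payload not fully buffered yet
    out.append(buf, pos + 1, len);
    pos += 1 + len;
    return true;
}

// Feed the input chunk by chunk, as a filter would see it.
std::string decode_all(const std::vector<std::string>& chunks) {
    std::string pending;  // unconsumed bytes carried between chunks
    std::string out;
    for (const std::string& chunk : chunks) {
        pending += chunk;
        std::size_t pos = 0;
        while (decode_member(pending, pos, out)) {
            // keep decoding members that are completely buffered
        }
        assert(pos <= pending.size());
        pending.erase(0, pos);  // keep the leftover bytes for the next member
    }
    return out;
}
```

Discarding the leftover bytes at the end of a member, rather than carrying them in pending, would lose the beginning of the following member, which corresponds to the behaviour reported in this ticket.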

comment:2 by Jonathan Turkanis, 14 years ago

Resolution: fixed
Status: assigned → closed

(In [46001]) added support for archives with multiple members; added tests for metadata and for multiple members (fixes #1896)

comment:3 by come.raczy@…, 13 years ago

Milestone: Boost 1.36.0 → Boost 1.41.0
Resolution: fixed
Status: closed → reopened

It doesn't look like change 46001 (https://svn.boost.org/trac/boost/changeset/46001) made it into boost_1_36_0 or any other release.

Since the change seems to work, would it be possible to push it into 1.41.0?

comment:4 by Steven Watanabe, 12 years ago

Resolution: fixed
Status: reopened → closed

iostreams was fully merged to the release branch in [56830].

comment:5 by bruce@…, 12 years ago

Yes, but multiple members don't work in 1.43.0. An exception is thrown during footer processing on the second member because the CRCs don't match.

I have fixed this locally by adding:

crc_ = 0;

in zlib_base::reset() on line 145 of libs/iostreams/src/zlib.cpp.
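To see why a stale running CRC breaks the second member, here is a minimal standalone sketch. It does not use Boost or zlib: crc32_update is a plain bitwise CRC-32 written for this example, and MemberChecksum is a hypothetical stand-in for the decompressor's crc_ state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), continuing from a previous value.
// Pass 0 as crc to start a fresh checksum.
std::uint32_t crc32_update(std::uint32_t crc, const std::string& data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int i = 0; i < 8; ++i) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical stand-in for the per-member checksum state of a decompressor.
struct MemberChecksum {
    std::uint32_t crc_ = 0;
    void consume(const std::string& s) { crc_ = crc32_update(crc_, s); }
    void reset() { crc_ = 0; }  // the one-line fix: start each member from zero
};

// Sanity check against the standard CRC-32 check value for "123456789".
inline void self_check() {
    assert(crc32_update(0, "123456789") == 0xCBF43926u);
}
```

Without the reset, the checksum computed while reading the second member covers the first member's data as well, so it cannot match the CRC stored in the second member's trailer.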

comment:6 by Steven Watanabe, 12 years ago

I fixed the crc problem a few weeks ago.
