Boost C++ Libraries: Ticket #1896: boost::iostreams::gzip_decompressor silently ignores multiple members https://svn.boost.org/trac10/ticket/1896 <p> The gzip file format (RFC 1952) allows multiple "members" to be concatenated in one gzip file. I have a provider who, unfortunately, sends me gzipped files which consist of about eight 500 MB (uncompressed) members. Why they are doing this I don't know, but this seems to be a long-standing quirk of gzip. The gzip command-line utility simply decompresses all the members into a single stream, and this is the behaviour I would expect from gzip_decompressor. </p> <p> The gzip_decompressor only processes the first member and silently ignores the others. In fact, the implementation of read_footer attempts to slurp the rest of the compressed file into a string which it then discards. </p> <p> In order to read the rest of the members, all we have to do is read and process the actual trailer (8 bytes) and then recursively process the rest of the input (perhaps after invoking close() on ourselves?). </p> <p> I have attempted to write a fix for this myself but have been defeated by the complexity of the library. </p> Bruce MacDonald <bruce@…> Mon, 05 May 2008 10:07:43 GMT attachment set https://svn.boost.org/trac10/ticket/1896 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">t.gz</span> </li> </ul> <p> Toy example gzip file containing two strings. 
</p> Ticket Jonathan Turkanis Thu, 29 May 2008 22:44:56 GMT status changed https://svn.boost.org/trac10/ticket/1896#comment:1 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> <p> Yes, the implementation is tricky because when the end of a deflated sequence is reached, the symmetric filter will usually have some unconsumed characters in the buffer, which need to be fed back through the decompressor as part of the next member. </p> <p> Now that I've figured out the problem, it shouldn't be too hard to implement. </p> Ticket Jonathan Turkanis Sat, 31 May 2008 22:53:59 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/1896#comment:2 <ul> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/46001" title="added support for archives with multiple members; added tests for ...">[46001]</a>) added support for archives with multiple members; added tests for metadata and for multiple members (fixes <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/1896" title="#1896: Bugs: boost::iostreams::gzip_decompressor silently ignores multiple members (closed: fixed)">#1896</a>) </p> Ticket come.raczy@… Mon, 12 Oct 2009 12:42:39 GMT status, milestone changed; resolution deleted https://svn.boost.org/trac10/ticket/1896#comment:3 <ul> <li><strong>status</strong> <span class="trac-field-old">closed</span> → <span class="trac-field-new">reopened</span> </li> <li><strong>resolution</strong> <span class="trac-field-deleted">fixed</span> </li> <li><strong>milestone</strong> <span 
class="trac-field-old">Boost 1.36.0</span> → <span class="trac-field-new">Boost 1.41.0</span> </li> </ul> <p> It doesn't look like the change 46001 <a class="ext-link" href="https://svn.boost.org/trac/boost/changeset/46001"><span class="icon"></span>https://svn.boost.org/trac/boost/changeset/46001</a> made it into boost_1_36_0 or any other release. </p> <p> Since the change seems to work, would it be possible to push it into 1.41.0? </p> Ticket Steven Watanabe Thu, 24 Jun 2010 21:32:04 GMT status changed; resolution set https://svn.boost.org/trac10/ticket/1896#comment:4 <ul> <li><strong>status</strong> <span class="trac-field-old">reopened</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> iostreams was fully merged to the release branch in <a class="changeset" href="https://svn.boost.org/trac10/changeset/56830" title="iostreams: merge trunk per Jonathan">[56830]</a>. </p> Ticket bruce@… Fri, 25 Jun 2010 11:03:14 GMT <link>https://svn.boost.org/trac10/ticket/1896#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1896#comment:5</guid> <description> <p> Yes, but multiple members don't work in 1.43.0. An exception is thrown during footer processing on the second member because the CRCs don't match. </p> <p> I have fixed this locally by adding: </p> <p> crc_ = 0; </p> <p> in zlib_base::reset() on line 145 of libs/iostreams/src/zlib.cpp. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Steven Watanabe</dc:creator> <pubDate>Fri, 25 Jun 2010 14:36:36 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/1896#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/1896#comment:6</guid> <description> <p> I fixed the CRC problem a few weeks ago. </p> </description> <category>Ticket</category> </item> </channel> </rss>