Boost C++ Libraries: Ticket #3853: bzip2_decompressor reads only one block of a pbzip2-created files https://svn.boost.org/trac10/ticket/3853 <p> reading bz2 files created with the parallel version "pbzip2" stops exactly after correctly decompressing the first bzip block (900k). this same bz2 file is correctly decompressed by the non-parallel bunzip2 version, so imho the bzip2 filter should do the same or throw an exception if this is some kind of unrecognized format. </p> <p> i've been able to reproduce this bug with boost 1.35.1, 1.36.0, 1.40.0, 1.41.0, so i guess it is present in all versions. the bug can be seen by running the attached programs: </p> <blockquote> <p> c++ spit.cc -o spit c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor ./spit &gt;test.txt bzip2 -c test.txt &gt;test.txt.bz2 pbzip2 -c test.txt &gt;test.txt.parallel.bz2 </p> </blockquote> <p> running </p> <blockquote> <p> ./test_bzip2_decompressor test.txt.bz2 | wc </p> </blockquote> <p> outputs the correct 500 lines: </p> <blockquote> <p> 500 250000 995395 </p> </blockquote> <p> while </p> <blockquote> <p> ./test_bzip2_decompressor test.txt.parallel.bz2 | wc </p> </blockquote> <p> finds only 452 lines and only 900k in total: </p> <blockquote> <p> 452 226164 900000 </p> </blockquote> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/3853 Trac 1.4.3 Darko Veberic <darko.veberic@…> Thu, 21 Jan 2010 08:50:51 GMT attachment set https://svn.boost.org/trac10/ticket/3853 https://svn.boost.org/trac10/ticket/3853 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">spit.cc</span> </li> </ul> <p> creates test file </p> Ticket Darko Veberic <darko.veberic@…> Thu, 21 Jan 2010 08:52:22 GMT attachment set https://svn.boost.org/trac10/ticket/3853 https://svn.boost.org/trac10/ticket/3853 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test_bzip2_decompressor.cc</span> </li> </ul> <p> decompressor using boost iostream filter </p> Ticket Darko 
Veberic <darko.veberic@…> Thu, 21 Jan 2010 08:56:10 GMT <link>https://svn.boost.org/trac10/ticket/3853#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3853#comment:1</guid> <description> <p> sorry, the line in the original bug description got concatenated. it should read: </p> <blockquote> <p> c++ spit.cc -o spit </p> </blockquote> <blockquote> <p> c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor </p> </blockquote> <blockquote> <p> ./spit &gt;test.txt </p> </blockquote> <blockquote> <p> bzip2 -c test.txt &gt;test.txt.bz2 </p> </blockquote> <blockquote> <p> pbzip2 -c test.txt &gt;test.txt.parallel.bz2 </p> </blockquote> </description> <category>Ticket</category> </item> <item> <author>jeff.gilchrist@…</author> <pubDate>Wed, 27 Jan 2010 17:50:49 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/3853#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3853#comment:2</guid> <description> <p> It seems like boost does not support multiple bz2 streams in a file. Both bzip2 and pbzip2 support reading multiple bz2 streams, so when you see the end of a bz2 sequence, you have to check if it is the EOF as well; if not, look for another bz2 sequence and concatenate the decompressed results if one is found. </p> <p> So you can have files like: </p> <p> |bz1| </p> <p> or </p> <p> |bz1|bz2|bz3|bz4| etc... </p> <p> And you need to support both if you want to be compatible with bzip2 and pbzip2 since they both support this and it is a valid format. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Steven Watanabe</dc:creator> <pubDate>Thu, 17 Jun 2010 19:14:17 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/3853#comment:3 https://svn.boost.org/trac10/ticket/3853#comment:3 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/63057" title="Allow bzip2_decompressor to process multiple concatenated streams. ...">[63057]</a>) Allow bzip2_decompressor to process multiple concatenated streams. Fixes <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/3853" title="#3853: Bugs: bzip2_decompressor reads only one block of a pbzip2-created files (closed: fixed)">#3853</a>. </p> Ticket
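The loop described in comment 2 (after each end-of-stream, check for EOF, and if input remains, decode the next stream and append its output) can be sketched roughly as below. This is only an illustration, not the code from changeset [63057]: the names decode_one_stream and decode_all_streams are hypothetical, and a toy format (one length byte followed by that many payload bytes) stands in for a real bz2 stream so that the example stays self-contained.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a single-stream decoder. A toy "stream" is one
// length byte followed by that many payload bytes; real code would call a
// bzip2 decoder here instead. Returns the decoded payload and advances `pos`
// past the stream that was consumed.
std::string decode_one_stream(const std::string& in, std::size_t& pos) {
    if (pos >= in.size()) throw std::runtime_error("no stream at offset");
    const std::size_t len = static_cast<unsigned char>(in[pos]);
    if (in.size() - pos - 1 < len) throw std::runtime_error("truncated stream");
    std::string out = in.substr(pos + 1, len);
    pos += 1 + len;
    return out;
}

// Decode every stream in the input: end-of-stream is not treated as
// end-of-file. While input remains, decode another stream and concatenate.
std::string decode_all_streams(const std::string& in) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        out += decode_one_stream(in, pos);
    }
    return out;
}
```

Calling decode_one_stream once on a two-stream input returns only the first payload, which mirrors the truncated output reported above, while decode_all_streams returns the concatenation of both payloads.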