Boost C++ Libraries: Ticket #9749: bzip2_decompressor input filter gives data_error_magic https://svn.boost.org/trac10/ticket/9749 <p> The bzip2_decompressor input filter is not able to read some bz2 files properly. It can read only the first few lines of the file, then terminates with data_error_magic (BZ_DATA_ERROR_MAGIC in bzlib.h). </p> <p> After examining the problematic bz2 files, I noticed that the first compressed stream is much shorter (536 bytes long) than the following streams. Debugging led me to the conclusion that the symmetric_filter class's read method fills the buffer with new data even when the buffer hasn't been consumed completely. With the proposed solution below, the files can be read to the end without further problems. </p> <p> The original code segment: </p> <pre class="wiki">if (status == f_good) status = fill(src); </pre><p> The proposed modification: </p> <pre class="wiki">if (status == f_good &amp;&amp; buf.ptr() == buf.eptr()) status = fill(src); </pre><p> Could someone please make this modification? Regards, </p> <blockquote> <p> Balazs </p> </blockquote> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/9749 Trac 1.4.3 Balazs Andor Zalanyi <zalanyi.balazs@…> Thu, 06 Mar 2014 20:26:17 GMT <link>https://svn.boost.org/trac10/ticket/9749#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9749#comment:1</guid> <description> <p> To reproduce the issue: download the wikidata dump - for example: <a class="ext-link" href="http://dumps.wikimedia.org/wikidatawiki/20140123/wikidatawiki-20140123-pages-articles-multistream.xml.bz2"><span class="icon">​</span>http://dumps.wikimedia.org/wikidatawiki/20140123/wikidatawiki-20140123-pages-articles-multistream.xml.bz2</a> and try to print it line by line. 
</p> <pre class="wiki">std::ifstream df("wikidatawiki-20140123-pages-articles-multistream.xml.bz2", std::ios::in | std::ios::binary); boost::iostreams::filtering_stream&lt;boost::iostreams::input&gt; in; in.push(boost::iostreams::bzip2_decompressor()); in.push(df); std::string s; while (std::getline(in, s)) { std::cout &lt;&lt; s &lt;&lt; std::endl; } </pre> </description> <category>Ticket</category> </item> <item> <author>jtwang@…</author> <pubDate>Wed, 20 May 2015 03:22:27 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/9749 https://svn.boost.org/trac10/ticket/9749 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">bzip2.patch</span> </li> </ul> Ticket jtwang@… Wed, 20 May 2015 03:33:11 GMT <link>https://svn.boost.org/trac10/ticket/9749#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9749#comment:2</guid> <description> <p> This also affects me. The attached patch provides a workaround within bzip2_decompressor rather than changing symmetric_filter. Please forgive the bad indentation; I couldn't be bothered to change my editor. </p> </description> <category>Ticket</category> </item> <item> <author>disp.reg.misc.boost@…</author> <pubDate>Fri, 06 Nov 2015 02:21:00 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/9749#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/9749#comment:3</guid> <description> <p> The patch in comment 2 works. Is there any chance this gets integrated into Boost? </p> </description> <category>Ticket</category> </item> </channel> </rss>