Opened 9 years ago

Last modified 7 years ago

#9749 new Bugs

bzip2_decompressor input filter gives data_error_magic

Reported by: Balazs Andor Zalanyi <zalanyi.balazs@…>
Owned by: Jonathan Turkanis
Milestone: To Be Determined
Component: iostreams
Version: Boost 1.55.0
Severity: Problem
Keywords: bzip2_decompressor
Cc:

Description

The bzip2_decompressor input filter cannot read some .bz2 files correctly. It reads only the first few lines of the file and then fails with data_error_magic (BZ_DATA_ERROR_MAGIC in bzlib.h).

After examining the problematic .bz2 files, I noticed that the first compressed stream is much shorter (536 bytes) than the following streams. Debugging led me to the conclusion that the read method of the symmetric_filter class refills the buffer with new data even when the buffer has not been fully consumed. With the modification proposed below, the files can be read to the end without further problems.
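
The stream layout described above can be checked directly. Each stream in a multistream .bz2 file starts with the bytes "BZh" followed by a block-size digit '1' to '9' (BZ_DATA_ERROR_MAGIC is what libbzip2 reports when its input does not begin that way), and in a non-empty stream the first block header, the 48-bit magic 0x314159265359 (ASCII "1AY&SY"), follows byte-aligned. The sketch below lists candidate stream offsets; the helper names looks_like_stream_start and find_stream_offsets are hypothetical, and the scan is only a heuristic: it skips empty streams and could, in principle, match bytes inside compressed data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Heuristic check (hypothetical helper): does a bzip2 stream appear to start
// at `pos`? Requires "BZh" + digit '1'..'9', followed by the byte-aligned
// first block magic "1AY&SY" (0x314159265359). Empty streams, which have no
// block, are not matched.
bool looks_like_stream_start(const std::string& data, std::size_t pos) {
    static const std::string block_magic = "1AY&SY";
    if (pos + 4 + block_magic.size() > data.size()) return false;
    return data[pos] == 'B' && data[pos + 1] == 'Z' && data[pos + 2] == 'h' &&
           data[pos + 3] >= '1' && data[pos + 3] <= '9' &&
           data.compare(pos + 4, block_magic.size(), block_magic) == 0;
}

// Returns every byte offset at which a candidate stream header was found.
std::vector<std::size_t> find_stream_offsets(const std::string& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t pos = 0; pos < data.size(); ++pos) {
        if (looks_like_stream_start(data, pos)) offsets.push_back(pos);
    }
    return offsets;
}
```

Run over the first few kilobytes of a dump, the difference between the first two offsets should give the length of the first stream, which the report puts at 536 bytes.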

The original code segment:

// Refills the buffer even if buf still holds unconsumed input
if (status == f_good)
  status = fill(src);

The proposed modification:

// Refill only once every buffered byte has been consumed
if (status == f_good && buf.ptr() == buf.eptr())
  status = fill(src);
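
Why the extra condition matters can be shown with a simplified, self-contained model. The Buffer struct and the refill_buffer and run functions below are hypothetical stand-ins written for illustration, not Boost code. They assume, as the report describes, that a refill reads fresh data into the start of the buffer, so any bytes between ptr and eptr are lost unless the refill waits until the buffer is empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical model of a symmetric_filter-style input buffer (not Boost code).
struct Buffer {
    std::string data;
    std::size_t ptr = 0;   // next unconsumed byte
    std::size_t eptr = 0;  // one past the last valid byte
};

// Reads up to `cap` bytes of `source`, starting at `pos`, into the buffer,
// overwriting it from the beginning (any unconsumed bytes are discarded).
void refill_buffer(Buffer& buf, const std::string& source, std::size_t& pos,
                   std::size_t cap) {
    std::size_t n = std::min(cap, source.size() - pos);
    buf.data = source.substr(pos, n);
    buf.ptr = 0;
    buf.eptr = n;
    pos += n;
}

// Simulates a consumer that takes at most `consume_per_step` bytes per step
// (must be > 0), for example a decompressor that stops early. Returns the
// bytes it actually saw.
std::string run(const std::string& source, std::size_t cap,
                std::size_t consume_per_step, bool refill_only_when_empty) {
    Buffer buf;
    std::size_t pos = 0;
    std::string consumed;
    while (true) {
        std::size_t take = std::min(consume_per_step, buf.eptr - buf.ptr);
        consumed += buf.data.substr(buf.ptr, take);
        buf.ptr += take;
        if (pos == source.size() && buf.ptr == buf.eptr) break;
        // Original behaviour: refill unconditionally.
        // Proposed behaviour: refill only when the buffer is empty.
        if (!refill_only_when_empty || buf.ptr == buf.eptr) {
            if (pos < source.size()) refill_buffer(buf, source, pos, cap);
        }
    }
    return consumed;
}
```

With a 10-byte input, a 4-byte buffer and a consumer that takes 3 bytes per step, run("abcdefghij", 4, 3, false) returns "abcefgij" (the unconditional refill drops "d" and "h"), while run("abcdefghij", 4, 3, true) returns the whole input. When the consumer always drains the buffer, both variants agree, which may explain why only some files trigger the error.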

Could someone please make this modification?

Regards,

Balazs

Attachments (1)

bzip2.patch (894 bytes) - added by jtwang@… 7 years ago.


Change History (4)

comment:1 by Balazs Andor Zalanyi <zalanyi.balazs@…>, 9 years ago

To reproduce the issue, download a Wikidata dump, for example http://dumps.wikimedia.org/wikidatawiki/20140123/wikidatawiki-20140123-pages-articles-multistream.xml.bz2, and try to print it line by line:

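#include <fstream>
#include <iostream>
#include <string>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>

int main() {
  std::ifstream df("wikidatawiki-20140123-pages-articles-multistream.xml.bz2", std::ios::in | std::ios::binary);
  boost::iostreams::filtering_stream<boost::iostreams::input> in;
  in.push(boost::iostreams::bzip2_decompressor());  // filter first
  in.push(df);                                       // then the compressed source
  std::string s;
  while (std::getline(in, s)) {
    std::cout << s << std::endl;
  }
}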

by jtwang@…, 7 years ago

Attachment: bzip2.patch added

comment:2 by jtwang@…, 7 years ago

This also affects me. The attached patch provides a workaround within bzip2_decompressor rather than changing symmetric_filter. Please forgive the bad indentation; I couldn't be bothered to change my editor.

comment:3 by disp.reg.misc.boost@…, 7 years ago

The patch in comment 2 works. Is there any chance it will be integrated into Boost?
