Opened 9 years ago
Last modified 7 years ago
#9749 new Bugs
bzip2_decompressor input filter gives data_error_magic
Reported by: | | Owned by: | Jonathan Turkanis |
---|---|---|---|
Milestone: | To Be Determined | Component: | iostreams |
Version: | Boost 1.55.0 | Severity: | Problem |
Keywords: | bzip2_decompressor | Cc: | |
Description
The bzip2_decompressor input filter is not able to read some bz2 files properly. It can read only the first few lines of the file and then terminates with data_error_magic (BZ_DATA_ERROR_MAGIC in bzlib.h).
After examining the problematic bz2 files, I noticed that the first compressed stream is much shorter (536 bytes long) than the following streams. Debugging led me to the conclusion that the read method of the symmetric_filter class fills the buffer with new data even when the buffer has not been consumed completely, so the unconsumed input is lost. With the proposed solution below, the files can be read to the end without further problems.
The original code segment:
if (status == f_good) status = fill(src);
The proposed modification:
if (status == f_good && buf.ptr() == buf.eptr()) status = fill(src);
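The effect of the extra condition can be illustrated with a small standalone model. This is only a sketch, not Boost's code: ToyBuffer, fill, filter and run_read_loop are hypothetical names, the '|' character stands in for the end of one compressed stream, and the loop is a simplified imitation of a buffered read loop that assumes a refill overwrites the buffer from its start.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy input buffer: [ptr, eptr) is the unconsumed region.
// The member names mirror buf.ptr() / buf.eptr() from the ticket.
struct ToyBuffer {
    std::string data;
    std::size_t ptr = 0;
    std::size_t eptr = 0;
};

// Refill from the start of the buffer, overwriting whatever it held.
// Returns false once the source is exhausted.
bool fill(ToyBuffer& buf, const std::string& src, std::size_t& pos,
          std::size_t capacity) {
    const std::size_t n = std::min(capacity, src.size() - pos);
    buf.data = src.substr(pos, n);
    pos += n;
    buf.ptr = 0;
    buf.eptr = n;
    return n != 0;
}

// Toy filter: copies bytes to the output, but returns right after a '|'
// (assumed here to mark a stream boundary), leaving the rest unconsumed.
void filter(ToyBuffer& buf, std::string& out) {
    while (buf.ptr != buf.eptr) {
        const char c = buf.data[buf.ptr++];
        if (c == '|') return;
        out += c;
    }
}

// Simplified read loop. With guard == false the buffer is refilled on every
// pass (the original behaviour described in the ticket); with guard == true
// it is refilled only once it has been fully consumed (the proposed check).
std::string run_read_loop(const std::string& src, std::size_t capacity,
                          bool guard) {
    ToyBuffer buf;
    std::size_t pos = 0;
    std::string out;
    bool good = true;
    while (true) {
        if (buf.ptr != buf.eptr) filter(buf, out);
        if (!good && buf.ptr == buf.eptr) break;
        if (good && (!guard || buf.ptr == buf.eptr))
            good = fill(buf, src, pos, capacity);
    }
    return out;
}
```

In this model, run_read_loop("ab|cdefgh", 6, false) returns "abfgh" because the bytes "cde" are overwritten by the early refill, while run_read_loop("ab|cdefgh", 6, true) returns "abcdefgh".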
Could someone please make this modification? Regards,
Balazs
Attachments (1)
Change History (4)
comment:1, 9 years ago
Attachment bzip2.patch added, 7 years ago
comment:2, 7 years ago
This also affects me. The attached patch provides a workaround within bzip2_decompressor rather than changing symmetric_filter. Please forgive the bad indentation; I couldn't be bothered to change my editor.
comment:3, 7 years ago
The patch in comment 2 works. Is there any chance this gets integrated into Boost?
To reproduce the issue, download a Wikidata multistream dump, for example http://dumps.wikimedia.org/wikidatawiki/20140123/wikidatawiki-20140123-pages-articles-multistream.xml.bz2, and try to print it line by line through the bzip2_decompressor filter.