Opened 13 years ago
Closed 12 years ago
#3853 closed Bugs (fixed)
bzip2_decompressor reads only the first block of pbzip2-created files
Reported by: | Owned by: | Jonathan Turkanis | |
---|---|---|---|
Milestone: | Boost 1.42.0 | Component: | iostreams |
Version: | Boost 1.41.0 | Severity: | Problem |
Keywords: | Cc: |
Description
reading bz2 files created with the parallel version "pbzip2" stops right after correctly decompressing the first bzip2 block (900k). the same bz2 file is decompressed correctly by the non-parallel bunzip2, so imho the bzip2 filter should either do the same or throw an exception if this is some kind of unrecognized format.
i've been able to reproduce this bug with boost 1.35.1, 1.36.0, 1.40.0, 1.41.0 so i guess it is present in all versions. the bug can be seen by running the attached programs:
c++ spit.cc -o spit c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor ./spit >test.txt bzip2 -c test.txt >test.txt.bz2 pbzip2 -c test.txt >test.txt.parallel.bz2
running
./test_bzip2_decompressor test.txt.bz2 | wc
outputs the correct 500 lines:
500 250000 995395
while
./test_bzip2_decompressor test.txt.parallel.bz2 | wc
finds only 452 lines and only 900k in total:
452 226164 900000
Attachments (2)
Change History (5)
by , 13 years ago
Attachment: | test_bzip2_decompressor.cc added |
---|
decompressor using boost iostream filter
comment:1 by , 13 years ago
sorry, the line in the original bug description got concatenated. it should read:
c++ spit.cc -o spit
c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor
./spit >test.txt
bzip2 -c test.txt >test.txt.bz2
pbzip2 -c test.txt >test.txt.parallel.bz2
comment:2 by , 13 years ago
It seems like boost does not support multiple bz2 streams in a file. Both bzip2 and pbzip2 support reading multiple bz2 streams, so when you reach the end of a bz2 stream, you have to check whether it is also the EOF. If it is not, look for another bz2 stream and, if one is found, append its decompressed output to the result.
So you can have files like:
|bz1|
or
|bz1|bz2|bz3|bz4| etc...
And you need to support both if you want to be compatible with bzip2 and pbzip2 since they both support this and it is a valid format.
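The loop described in this comment can be sketched as follows. This is only an illustration using Python's standard `bz2` module, not the Boost.Iostreams code and not the fix that was eventually applied; the function name `decompress_all_streams` and the sample data are made up for the example.

```python
import bz2


def decompress_all_streams(data: bytes) -> bytes:
    """Decompress every bzip2 stream concatenated in `data` (hypothetical helper)."""
    chunks = []
    while data:
        # A BZ2Decompressor handles exactly one stream, so use a fresh one each time.
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            # Input ended before the end-of-stream marker was seen.
            raise ValueError("truncated bzip2 stream")
        # Bytes left over after the end of this stream: either empty (real EOF)
        # or the start of the next stream.
        data = decoder.unused_data
    return b"".join(chunks)


# Two independently compressed streams back to back: |bz1|bz2|
multi = bz2.compress(b"first block\n") + bz2.compress(b"second block\n")

# A single decoder stops at the end of the first stream.
single = bz2.BZ2Decompressor().decompress(multi)

# Restarting on the leftover bytes recovers the whole file.
combined = decompress_all_streams(multi)
```

Here `single` holds only the output of the first stream, which mirrors the truncated result in the report, while `combined` holds the output of both streams.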
comment:3 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
creates test file