Opened 13 years ago

Closed 12 years ago

#3853 closed Bugs (fixed)

bzip2_decompressor reads only one block of pbzip2-created files

Reported by: Darko Veberic <darko.veberic@…> Owned by: Jonathan Turkanis
Milestone: Boost 1.42.0 Component: iostreams
Version: Boost 1.41.0 Severity: Problem
Keywords: Cc:

Description

Reading bz2 files created with the parallel version, pbzip2, stops right after the first bzip2 block (900k) has been correctly decompressed. The same bz2 file is decompressed correctly by the non-parallel bunzip2, so IMHO the bzip2 filter should do the same, or throw an exception if this is some kind of unrecognized format.

I've been able to reproduce this bug with Boost 1.35.1, 1.36.0, 1.40.0 and 1.41.0, so I guess it is present in all versions. The bug can be seen by running the attached programs:

c++ spit.cc -o spit

c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor

./spit >test.txt

bzip2 -c test.txt >test.txt.bz2

pbzip2 -c test.txt >test.txt.parallel.bz2

running

./test_bzip2_decompressor test.txt.bz2 | wc

outputs the correct 500 lines:

500 250000 995395

while

./test_bzip2_decompressor test.txt.parallel.bz2 | wc

finds only 452 lines and only 900000 bytes in total, i.e. exactly the first 900k block:

452 226164 900000
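The ticket does not reproduce the source of the attached spit.cc, so the following is only a hypothetical stand-in (make_test_text is an illustrative name, not taken from the attachment). What matters for the bug is that the uncompressed file is larger than one 900k block, so that pbzip2 writes more than one bzip2 stream. This sketch builds 500 lines of 500 words, matching the line and word counts reported by wc above, although its size (972500 bytes) differs from the original 995395.

```cpp
#include <string>

// Hypothetical stand-in for the attached spit.cc (its source is not shown in
// the ticket). Builds 500 lines of 500 space-separated numbers each, about
// 970 kB in total, i.e. more than one 900k bzip2 block.
std::string make_test_text() {
    const int kLines = 500;
    const int kWordsPerLine = 500;
    std::string text;
    for (int line = 0; line < kLines; ++line) {
        for (int word = 0; word < kWordsPerLine; ++word) {
            // Cycle through 0..999 so the text is not one repeated token.
            text += std::to_string((line * kWordsPerLine + word) % 1000);
            // Separate words with spaces and end each line with a newline.
            text += (word + 1 < kWordsPerLine) ? ' ' : '\n';
        }
    }
    return text;
}
```

Writing the returned string to standard output from a small main plays the role of "./spit >test.txt" in the steps above.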

Attachments (2)

spit.cc (227 bytes) - added by Darko Veberic <darko.veberic@…> 13 years ago.
creates test file
test_bzip2_decompressor.cc (519 bytes) - added by Darko Veberic <darko.veberic@…> 13 years ago.
decompressor using boost iostream filter


Change History (5)

by Darko Veberic <darko.veberic@…>, 13 years ago

Attachment: spit.cc added

creates test file

by Darko Veberic <darko.veberic@…>, 13 years ago

Attachment: test_bzip2_decompressor.cc added

decompressor using boost iostream filter

comment:1 by Darko Veberic <darko.veberic@…>, 13 years ago

Sorry, the command lines in the original bug description got concatenated. They should read:

c++ spit.cc -o spit

c++ test_bzip2_decompressor.cc -o test_bzip2_decompressor

./spit >test.txt

bzip2 -c test.txt >test.txt.bz2

pbzip2 -c test.txt >test.txt.parallel.bz2

comment:2 by jeff.gilchrist@…, 13 years ago

It seems like Boost does not support multiple bz2 streams in a file. Both bzip2 and pbzip2 support reading multiple bz2 streams: when you reach the end of a bz2 stream, you have to check whether it is also the end of the file. If not, look for another bz2 stream and, if one is found, append its decompressed output to what you have so far.

So you can have files like:

|bz1|

or

|bz1|bz2|bz3|bz4| etc...

And you need to support both if you want to be compatible with bzip2 and pbzip2, since they both support this and it is a valid format.

comment:3 by Steven Watanabe, 12 years ago

Resolution: fixed
Status: new → closed

(In [63057]) Allow bzip2_decompressor to process multiple concatenated streams. Fixes #3853.
