Opened 6 years ago

Last modified 5 years ago

#12471 new Bugs

gzip compressor decompressor broken

Reported by: anonymous Owned by: Jonathan Turkanis
Milestone: To Be Determined Component: iostreams
Version: Boost 1.61.0 Severity: Problem
Keywords: Cc:

Description

Hi guys, when I use iostreams with the gzip filters, the data I get back after the final decompression is wrong. My data is all small integers separated by commas, and in the recovered data the commas and numbers are out of order, as if it came from a badly synchronized multithreaded program. The result is worse than useless. This happens with big files (up to GB size).

Looking at the compressor code (gzip.hpp), I see many 32-bit vs. 64-bit type mismatches. Currently I am looking at one assignment that is problematic, in the compressor's write member:

    template<typename Sink>
    std::streamsize write(Sink& snk, const char_type* s, std::streamsize n)
    {
        if (!(flags_ & f_header_done)) {
            std::streamsize amt = 
                static_cast<std::streamsize>(header_.size() - offset_);
            offset_ += boost::iostreams::write(snk, header_.data() + offset_, amt);
            if (offset_ == header_.size())
                flags_ |= f_header_done;
            else
                return 0;
        }
        return base_type::write(snk, s, n);
    }

offset_ is a size_t, while boost::iostreams::write returns a streamsize, a signed 64-bit type (long long). So in 32-bit code there is a mismatch. streamsize appears in many places, as does size_t, but streamsize seems to be 64-bit on every platform, while offset_ is 32-bit in 32-bit builds. I do not think this is the cause of the gzip issues, but short of rewriting the code I don't know where to start.

I would like to use, and have confidence in, this compression code, but so far it has only given me grief...

Change History (2)

comment:1 by Jonathan Turkanis, 6 years ago

Thank you for reporting this issue. Would you please post a code example that produces an incorrect outcome?

comment:2 by anonymous, 5 years ago

While I don't like the code much, I did review most, if not all, of those cases. boost::iostreams::write cannot return a value larger than amt, amt is header_.size() - offset_, and header_.size() is a size_t, so offset_ can never exceed header_.size() and there is no issue in this case. At least, not correctness-wise: the code is annoying and confusing, and it may trigger compiler warnings. So yes, a specific test case that reproduces an issue would be useful.
