Opened 11 years ago
Last modified 9 years ago
#5629 assigned Bugs
base64 encode/decode for std::istreambuf_iterator/std::ostreambuf_iterator
Reported by: | Owned by: | Robert Ramey | |
---|---|---|---|
Milestone: | To Be Determined | Component: | serialization |
Version: | Boost 1.45.0 | Severity: | Problem |
Keywords: | Cc: |
Description
Compiler: MSVS 2008. The code:
    #include <algorithm>
    #include <fstream>
    #include <iterator>
    #include "boost/archive/iterators/base64_from_binary.hpp"
    #include "boost/archive/iterators/binary_from_base64.hpp"
    #include "boost/archive/iterators/transform_width.hpp"

    //typedefs
    typedef std::istreambuf_iterator<char> my_istream_iterator;
    typedef std::ostreambuf_iterator<char> my_ostream_iterator;

    typedef boost::archive::iterators::base64_from_binary<
        boost::archive::iterators::transform_width< my_istream_iterator, 6, 8>
    > bin_to_base64;

    typedef boost::archive::iterators::transform_width<
        boost::archive::iterators::binary_from_base64< my_istream_iterator >, 8, 6
    > base64_to_bin;

    void test()
    {
        {
            //INPUT FILE!!!
            std::ifstream ifs("test.zip", std::ios_base::in|std::ios_base::binary);
            std::ofstream ofs("test.arc", std::ios_base::out|std::ios_base::binary);
            // encode: binary -> base64
            std::copy(
                bin_to_base64( my_istream_iterator(ifs >> std::noskipws) ),
                bin_to_base64( my_istream_iterator() ),
                my_ostream_iterator(ofs)
            );
        }
        {
            std::ifstream ifs("test.arc", std::ios_base::in|std::ios_base::binary);
            std::ofstream ofs("test.rez", std::ios_base::out|std::ios_base::binary);
            // decode: base64 -> binary
            std::copy(
                base64_to_bin( my_istream_iterator(ifs >> std::noskipws) ),
                base64_to_bin( my_istream_iterator() ),
                my_ostream_iterator(ofs)
            );
        }
    }
Result:
1) If the INPUT FILE is a ZIP file, the result is:
a) _DEBUG_ERROR("istreambuf_iterator is not dereferencable"); this can be disabled or ignored.
b) The output file "test.rez" has one byte more than the INPUT FILE.
2) If the INPUT FILE is any other file (binary or text), everything works.
Change History (11)
comment:1 by , 11 years ago
comment:2 by , 10 years ago
Not so much a bug but a missing feature - no function to add/remove "=" padding. See http://stackoverflow.com/questions/8033942/boost-base64-url-encode-decode
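As a rough illustration of the missing add/remove step, here is a minimal sketch that uses only the standard library. The names pad_base64 and unpad_base64 are hypothetical helpers, not part of Boost; they assume the encoder output is unpadded base64 text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Boost API): append '=' until the length is a
// multiple of four, which is the padding rule for standard base64.
inline std::string pad_base64(std::string encoded) {
    // Unpadded base64 text can never have a length of 4k + 1.
    assert(encoded.size() % 4 != 1);
    while (encoded.size() % 4 != 0) {
        encoded.push_back('=');
    }
    return encoded;
}

// Hypothetical helper (not a Boost API): drop trailing '=' characters so
// the text can be handed to a decoder that does not understand padding.
inline std::string unpad_base64(std::string encoded) {
    while (!encoded.empty() && encoded.back() == '=') {
        encoded.pop_back();
    }
    return encoded;
}
```

One possible use is to call pad_base64 on the string produced by base64_from_binary and unpad_base64 on the input before passing it to binary_from_base64.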
comment:4 by , 10 years ago
Owner: | changed from | to
---|
comment:5 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 10 years ago
Component: | iterator → serialization |
---|
comment:7 by , 10 years ago
The root cause is that input sequences whose length is not a multiple of four cause a buffer overrun. Here is my workaround.
    #include <algorithm>
    #include <cassert>
    #include <cstdlib>
    #include <cstring>
    #include <iterator>
    #include <sstream>
    #include <string>

    struct to_base64 : public std::stringstream
    {
        to_base64(const std::string& str);
        to_base64(const char* begin, const char* end);
    };

    struct from_base64 : public std::stringstream
    {
        from_base64(const std::string& str);
        from_base64(const char* begin, const char* end);
    };

    #include <boost/archive/iterators/binary_from_base64.hpp>
    #include <boost/archive/iterators/base64_from_binary.hpp>
    #include <boost/archive/iterators/transform_width.hpp>
    #include <boost/archive/iterators/ostream_iterator.hpp>

    // slightly generalized version of the example here:
    // http://stackoverflow.com/questions/7053538/how-do-i-encode-a-string-to-base64-using-only-boost
    template <typename TransformIterator>
    static void apply(const char* begin, const char* end, std::stringstream& target)
    {
        std::copy(TransformIterator(begin), TransformIterator(end),
                  std::ostreambuf_iterator<char>(target));
    }

    template <typename TransformIterator>
    static void applyTwice(const char* begin, const char* end, std::stringstream& target)
    {
        long size = end - begin;
        int remainder = size % 4;
        const char* truncated = end - remainder;

        // convert the part whose length is a multiple of four
        apply<TransformIterator>(begin, truncated, target);

        if (remainder)
        {
            assert(remainder != 1); /* it can never be =1 if this whole thing about dividing by four is correct */

            // copy the tail into a full group of four, filled up with 'A' (which decodes to zero bits)
            char padded[4] = { 'A', 'A', 'A', 'A' };
            const char* src = truncated;
            char* dest = &padded[0];
            while (src != end)
                *(dest++) = *(src++);

            apply<TransformIterator>(&padded[0], &padded[sizeof(padded)], target);

            // move the write position back over the bytes produced by the filler
            std::streampos pos = target.tellp();
            pos -= (4 - remainder);
            target.seekp(pos);
        }
    }

    using namespace boost::archive::iterators;

    typedef base64_from_binary<transform_width<const char*, 6, 8> > to;

    to_base64::to_base64(const char* begin, const char* end)
    {
        apply<to>(begin, end, *this);
    }

    to_base64::to_base64(const std::string& str)
    {
        apply<to>(str.c_str(), str.c_str() + str.length(), *this);
    }

    typedef transform_width<binary_from_base64<const char*>, 8, 6> from;

    from_base64::from_base64(const char* begin, const char* end)
    {
        applyTwice<from>(begin, end, *this);
    }

    from_base64::from_base64(const std::string& str)
    {
        applyTwice<from>(str.c_str(), str.c_str() + str.length(), *this);
    }

    int main()
    {
        // buffers are sized by the loop bound; the original RAND_MAX-sized
        // arrays can overflow the stack where RAND_MAX is large
        const size_t max_length = 100;
        size_t length = 0;
        do
        {
            // generate source bytes
            char source[max_length + 1];
            for (size_t pos = 0; pos < length; ++pos)
                source[pos] = '0' + char(rand() % 32);
            source[length] = '\0';

            // convert them to base64
            to_base64 b(&source[0], &source[length]);
            std::string b64 = b.str();

            // and convert them back
            from_base64 result(b64.c_str(), b64.c_str() + b64.size());

            // compare as binary
            size_t size = (size_t)result.tellp();
            assert(size == length);
            char dest[max_length];
            result.read(&dest[0], size);
            for (size_t pos = 0; pos < length; ++pos)
                assert(source[pos] == dest[pos]);

            // compare as text
            std::string asString = result.str();
            assert(!strcmp(asString.c_str(), &source[0]));
        } while (++length < max_length);
        return 0;
    }
comment:9 by , 10 years ago
Replying to anonymous:
Already fixed?
Not entirely. Although a fix was put into boost 1.53 so that '=' characters will not cause the decoder to crash, it still doesn't treat them as padding. The fix will cause the decoder to add nulls to the end of the decoded value, which is probably not what you want. Granted, you should be able to figure out the right thing to do given the number of '=' characters on the end of the encoded stream, but you shouldn't have to.
Also, the encoder still won't produce a padded encoding.
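For illustration, the "right thing to do" can be derived from the '=' count alone. The sketch below is an assumption about how a caller might post-process the output, not something Boost provides; decoded_size is a hypothetical name, and the input is assumed to be fully padded base64 text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): number of payload bytes encoded by a
// padded base64 string. Every group of four characters carries three bytes,
// and each trailing '=' (at most two) stands for one missing byte.
inline std::size_t decoded_size(const std::string& padded) {
    assert(padded.size() % 4 == 0);  // padded base64 comes in groups of four
    std::size_t pad = 0;
    while (pad < 2 && pad < padded.size() &&
           padded[padded.size() - 1 - pad] == '=') {
        ++pad;
    }
    return padded.size() / 4 * 3 - pad;
}
```

A caller could decode with the existing iterators and then truncate the result to decoded_size(encoded) bytes, which drops the trailing nulls described above.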
comment:10 by , 9 years ago
The problem of padding could be solved by making the transform_width stateful. If you want to encode/decode to/from BASE64 and your input comes in chunks or as a stream, you need to save your previous state to be able to continue with the next chunk correctly.
The transform_width currently goes "blindly" for the next item pointed to by the iterator, although there may not be enough items to finish the minimum quantum. It even ends up with a buffer overflow if your input sequence is not zero-padded. Also, this eager reading of the zero padding makes it useful only for converting a complete buffer in one pass.
If the transform_width "knew" that a minimum quantum of input units has to be available before it can produce an output unit, that would prevent the buffer overflow and allow transforming chunked input.
I added the end-iterator and state to the transform_width (either directly or to the transformed iterator by an iterator adaptor). Reading ahead and storing the next value makes the code a little longer and not so compact. Also, it runs significantly slower than a hand-coded BASE64 encoder; I'm not sure why. Maybe copying of the iterators around?
Would it make sense to include a stateful transform_width in boost?
comment:11 by , 9 years ago
I did spend a significant amount of time on this while (apparently) not getting it quite right.
How about:
a) updating the current test so that it fails
b) suggesting a patch
Robert Ramey
In case it helps: the workaround code and an example of how to use it can be found here: http://rsdn.ru/forum/cpp.applied/4317966.1.aspx