Opened 11 years ago

Last modified 9 years ago

#5629 assigned Bugs

base64 encode/decode for std::istreambuf_iterator/std::ostreambuf_iterator

Reported by: nen777w@…
Owned by: Robert Ramey
Milestone: To Be Determined
Component: serialization
Version: Boost 1.45.0
Severity: Problem
Keywords:
Cc:

Description

Compiler: MSVS 2008. The code:

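#include <algorithm>  // std::copy
#include <fstream>    // std::ifstream, std::ofstream
#include <iterator>   // std::istreambuf_iterator, std::ostreambuf_iterator

#include "boost/archive/iterators/base64_from_binary.hpp"
#include "boost/archive/iterators/binary_from_base64.hpp"
#include "boost/archive/iterators/transform_width.hpp"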

//typedefs
typedef  std::istreambuf_iterator<char>    my_istream_iterator;
typedef  std::ostreambuf_iterator<char>    my_ostream_iterator;

typedef boost::archive::iterators::base64_from_binary<
          boost::archive::iterators::transform_width< my_istream_iterator, 6, 8>
> bin_to_base64;

typedef boost::archive::iterators::transform_width<
    boost::archive::iterators::binary_from_base64< my_istream_iterator >, 8, 6
> base64_to_bin;

void test()
{
   {
        //INPUT FILE!!!
    std::ifstream ifs("test.zip", std::ios_base::in|std::ios_base::binary);
    std::ofstream ofs("test.arc", std::ios_base::out|std::ios_base::binary);

    std::copy(
        bin_to_base64( my_istream_iterator(ifs >> std::noskipws) ),
        bin_to_base64( my_istream_iterator() ),
        my_ostream_iterator(ofs)
    );
  }

  {
    std::ifstream ifs("test.arc", std::ios_base::in|std::ios_base::binary);
    std::ofstream ofs("test.rez", std::ios_base::out|std::ios_base::binary);

    std::copy(
        base64_to_bin( my_istream_iterator(ifs >> std::noskipws) ),
        base64_to_bin( my_istream_iterator() ),
        my_ostream_iterator(ofs)
    );
  }
}

Result:

1) If the INPUT FILE is a ZIP file:

a) _DEBUG_ERROR("istreambuf_iterator is not dereferencable") is raised (it can be disabled or ignored).
b) The output file "test.rez" is one byte longer than the INPUT FILE.

2) If the INPUT FILE is any other file (binary or text), everything works.

Change History (11)

comment:1 by nen777w@…, 11 years ago

In case it helps, workaround code and a usage example are available here: http://rsdn.ru/forum/cpp.applied/4317966.1.aspx

comment:2 by anonymous, 10 years ago

Not so much a bug but a missing feature - no function to add/remove "=" padding. See http://stackoverflow.com/questions/8033942/boost-base64-url-encode-decode
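For illustration, the missing padding step can be done outside the iterators. The sketch below assumes you already hold the unpadded base64 text in a std::string; add_base64_padding and remove_base64_padding are hypothetical helper names, not Boost functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost): append '=' until the length is a
// multiple of four, which is what padded base64 (RFC 4648) expects.
std::string add_base64_padding(std::string encoded) {
    // Valid unpadded base64 never has a length of 4n + 1.
    assert(encoded.size() % 4 != 1);
    encoded.append((4 - encoded.size() % 4) % 4, '=');
    return encoded;
}

// Hypothetical helper: strip trailing '=' before passing the text to a
// decoder that does not understand padding.
std::string remove_base64_padding(std::string encoded) {
    // find_last_not_of returns npos when nothing but '=' is left;
    // npos + 1 wraps to 0, so the whole string is erased in that case.
    encoded.erase(encoded.find_last_not_of('=') + 1);
    return encoded;
}
```

Both helpers work on a complete string, so they do not help with streamed input.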

comment:3 by Ruslan Teliuk <nen777w@…>, 10 years ago

comment:4 by Dave Abrahams, 10 years ago

Owner: changed from Dave Abrahams to jeffrey.hellrung

comment:5 by Robert Ramey, 10 years ago

Owner: changed from jeffrey.hellrung to Robert Ramey
Status: new → assigned

comment:6 by Dave Abrahams, 10 years ago

Component: iterator → serialization

comment:7 by iGene, 10 years ago

The root cause is that sequences whose size is not a multiple of four cause a buffer overrun. Here is my workaround.

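#include <algorithm>  // std::copy
#include <cassert>
#include <cstdlib>    // rand
#include <cstring>    // strcmp
#include <iterator>   // std::ostreambuf_iterator
#include <sstream>
#include <string>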

struct to_base64 : public std::stringstream {
	to_base64(const std::string& str);
	to_base64(const char* begin, const char* end);
};

struct from_base64 : public std::stringstream {
	from_base64(const std::string& str);
	from_base64(const char* begin, const char* end);
};

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/archive/iterators/ostream_iterator.hpp>

// slightly generalized version of the example here:
// http://stackoverflow.com/questions/7053538/how-do-i-encode-a-string-to-base64-using-only-boost

template <typename TransformIterator>
static void apply(const char* begin, const char* end, std::stringstream& target) {
	std::copy(TransformIterator(begin), TransformIterator(end), std::ostreambuf_iterator<char>(target));
}
template <typename TransformIterator>
static void applyTwice(const char* begin, const char* end, std::stringstream& target) {
	long size = end - begin;
	int remainder = size % 4;
	const char* truncated = end - remainder;
	apply<TransformIterator>(begin, truncated, target);
	if (remainder) {
		assert(remainder != 1); /* it can never be =1 if this whole thing about dividing by four is correct */
		char padded[4] = { 'A', 'A', 'A', 'A' };
		const char* src = truncated; 
		char* dest = &padded[0];
		while (src != end)
			*(dest++) = *(src++);
		apply<TransformIterator>(&padded[0], &padded[sizeof(padded)], target);
		std::ios::streampos pos = target.tellp();
		pos -= (4 - remainder);
		target.seekp(pos);
	}
}

using namespace boost::archive::iterators;

typedef base64_from_binary<transform_width<const char*, 6, 8> > to;
to_base64::to_base64(const char* begin, const char* end) { apply<to>(begin, end, *this); }
to_base64::to_base64(const std::string& str) { apply<to>(str.c_str(), str.c_str() + str.length(), *this); }

typedef transform_width<binary_from_base64<const char*>, 8, 6> from;
from_base64::from_base64(const char* begin, const char* end) { applyTwice<from>(begin, end, *this); }
from_base64::from_base64(const std::string& str) { applyTwice<from>(str.c_str(), str.c_str() + str.length(), *this); }

int main()
{
	size_t length = 0;
	do {
		// generate source bytes
		char source[100]; // large enough: length stays below 100, plus the terminating '\0'
		for (size_t pos = 0; pos < length; ++pos)
			source[pos] = '0' + char(rand() % 32);
		source[length] = '\0';
		// convert them to base64
		to_base64 b(&source[0], &source[length]);
		std::string b64 = b.str();
		// and convert them back
		from_base64 result(b64.c_str(), b64.c_str() + b64.size());
		// compare as binary
		size_t size = (size_t)result.tellp();
		assert(size == length);
		char dest[100]; // large enough: size == length, which stays below 100
		result.read(&dest[0], size);
		for (size_t pos = 0; pos < length; ++pos)
			assert(source[pos] == dest[pos]);
		// compare as text
		std::string asString = result.str();
		assert(!strcmp(asString.c_str(), &source[0]));
	} while (++length < 100);

	return 0;
}

comment:8 by anonymous, 10 years ago

Already fixed?

in reply to:  8 comment:9 by anonymous, 10 years ago

Replying to anonymous:

Already fixed?

Not entirely. Although a fix was put into Boost 1.53 so that '=' characters no longer cause the decoder to crash, it still doesn't treat them as padding. The fix causes the decoder to add nulls to the end of the decoded value, which is probably not what you want. Granted, you should be able to figure out the right thing to do given the number of '=' characters at the end of the encoded stream, but you shouldn't have to.

Also, the encoder still won't produce a padded encoding.
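As a rough sketch of "figuring out the right thing": the number of real payload bytes follows from the encoded length and the count of trailing '=' characters. decoded_size is a hypothetical helper name, and the sketch assumes the input is complete, padded base64 with a length that is a multiple of four.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost): number of payload bytes encoded
// by a complete, '='-padded base64 string.
std::size_t decoded_size(const std::string& padded) {
    // Assumption: padded input always comes in groups of four characters.
    assert(padded.size() % 4 == 0);

    // Count at most two trailing '=' characters.
    std::size_t pad = 0;
    while (pad < 2 && pad < padded.size() &&
           padded[padded.size() - 1 - pad] == '=')
        ++pad;

    // Each group of four characters carries three bytes, minus one byte
    // per padding character.
    return padded.size() / 4 * 3 - pad;
}
```

If the decoder emits one byte per three-quarters of an input character, trailing nulls included, truncating its output to decoded_size(encoded) bytes should drop them; that is an assumption about the decoder's output length, not something verified against a particular Boost release.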

comment:10 by prantlf@…, 9 years ago

The problem of padding could be solved by making the transform_width stateful. If you want to encode/decode to/from BASE64 and your input comes in chunks or as a stream, you need to save your previous state to be able to continue with the next chunk correctly.

The transform_width currently goes "blindly" for the next item pointed to by the iterator, although there may not be enough items to finish the minimum quantum. It can even overrun the buffer if your input sequence is not zero-padded. Also, this eager reading of the zero padding makes it useful only for converting a complete buffer in one pass.

If the transform_width "knew" the minimum quantum of input units that must be available to produce an output unit, it would prevent the buffer overflow and allow transforming chunked input.

I added the end iterator and state to the transform_width (either directly or to the transformed iterator by an iterator adaptor). Reading ahead and storing the next value makes the code a little longer and not so compact. Also, it runs significantly slower than a hand-coded BASE64 encoder; I'm not sure why. Maybe because of copying the iterators around?

Would it make sense to include a stateful transform_width in boost?
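To make the idea of carrying state between chunks concrete, here is a minimal hand-coded sketch. It is not the modified transform_width described above (that code was not posted here); chunked_base64_encoder and its members are hypothetical names, and the class only illustrates keeping leftover bits until the next chunk arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class, not a Boost component: a base64 encoder that keeps
// leftover bits between calls so that input may arrive in chunks.
class chunked_base64_encoder {
public:
    chunked_base64_encoder() : buffer_(0), bits_(0), emitted_(0), finished_(false) {}

    // Encode one chunk. Bits that do not fill a 6-bit unit stay buffered.
    void write(const char* data, std::size_t size, std::string& out) {
        assert(!finished_);
        for (std::size_t i = 0; i < size; ++i) {
            buffer_ = (buffer_ << 8) | static_cast<unsigned char>(data[i]);
            bits_ += 8;
            while (bits_ >= 6) {
                bits_ -= 6;
                emit((buffer_ >> bits_) & 0x3F, out);
            }
            buffer_ &= (1u << bits_) - 1;  // keep only the bits not yet emitted
        }
    }

    // Flush the remaining bits (zero-filled) and append '=' padding.
    void finish(std::string& out) {
        assert(!finished_);
        finished_ = true;
        if (bits_ > 0) {
            emit((buffer_ << (6 - bits_)) & 0x3F, out);
            buffer_ = 0;
            bits_ = 0;
        }
        for (; emitted_ % 4 != 0; ++emitted_)
            out += '=';
    }

private:
    void emit(unsigned value, std::string& out) {
        static const char alphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        out += alphabet[value];
        ++emitted_;
    }

    unsigned buffer_;      // leftover bits from previous input bytes
    unsigned bits_;        // number of valid bits in buffer_ (0, 2 or 4 between bytes)
    std::size_t emitted_;  // characters produced so far, including padding
    bool finished_;        // set once finish() has been called
};
```

Calling write once per chunk and finish at the end should give the same text as encoding the whole input at once, because the only state that crosses a chunk boundary is the 0, 2 or 4 leftover bits.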

comment:11 by Robert Ramey, 9 years ago

I did spend a significant amount of time on this while (apparently) not getting it quite right.

How about:

a) updating the current test so that it fails

b) suggesting a patch

Robert Ramey
