Opened 7 years ago

Last modified 5 years ago

#11981 reopened Bugs

boost::archive::xml_woarchive with locale doesn't work

Reported by: anonymous
Owned by: Robert Ramey
Milestone: To Be Determined
Component: serialization
Version: Boost 1.65.0
Severity: Regression
Keywords: locale, xml_woarchive, serialization
Cc:

Description

The new locale library seems to have a bug, possibly related to this change: "Implemented generic codecvt facet and add general purpose utf8_codecvt facet".

#include <string>
#include <locale>
#include <fstream>
#include <boost/serialization/string.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/archive/xml_woarchive.hpp>
#include <boost/archive/xml_wiarchive.hpp>

int wmain(int argc, wchar_t* argv[])
{
    std::locale::global(std::locale("japanese"));

    std::wofstream wofs("output.xml");
    boost::archive::xml_woarchive oa(wofs);  // exception in 1.60
    oa << boost::serialization::make_nvp("string", std::string("日本語文字列"));
    wofs.close();

    std::string str;
    std::wifstream wifs("output.xml");
    boost::archive::xml_wiarchive ia(wifs);
    ia >> boost::serialization::make_nvp("string", str);
    wifs.close();

    return 0;
}

With Boost 1.60 and Visual Studio 2013, this code throws an exception: "invalid multbyte/wide char conversion".

The exception doesn't occur with Boost 1.59, but there the code produces invalid XML: the output is encoded as Shift_JIS (SJIS), not UTF-8.

With Boost 1.57, it produces valid UTF-8 encoded XML.
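
One way to tell which of these cases a given output.xml falls into is to check whether its bytes are well-formed UTF-8. The following is a minimal sketch; is_valid_utf8 is a hypothetical helper, not part of Boost or the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) { ++i; continue; }  // plain ASCII

        std::size_t extra = 0;     // continuation bytes expected
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest code point valid for this length
        if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
            if ((bk & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (bk & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

Shift_JIS output fails this check quickly: the Shift_JIS encoding of 日 begins with the byte 0x93, which cannot start a UTF-8 sequence.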

Change History (7)

comment:1 by JeremyS3DS, 6 years ago

Severity: Problem → Regression

comment:2 by Artyom Beilis, 6 years ago

Component: locale → serialization
Owner: changed from Artyom Beilis to Robert Ramey

comment:3 by anonymous, 6 years ago

If we run this code with Boost 1.61 (Visual Studio 2013, x86, Windows 10), abort() is called:

Assertion failed: std::codecvt_base::ok == r, file D:\Project\boost_1_61_0\boost/archive/iterators/wchar_from_mb.hpp, line 175

comment:4 by Robert Ramey, 6 years ago

Resolution: wontfix
Status: new → closed

The reason for the regression is that I improved the test. That is, the problem was always there but was not tested as exhaustively as it is now. When you say the encoding is SJIS, what do you mean? The test uses UTF-8 characters. I've had a lot of problems with this test on various platforms, so any information you can give would be appreciated.

I'm marking this "wontfix", but that's not entirely accurate: I would like to say "can't fix", but that choice is not offered.

comment:5 by anonymous, 6 years ago

Quoting comment:4: "When you say the encoding is SJIS what do you mean? The test uses UTF-8 characters."

The output XML should be UTF-8 encoded, but with Boost 1.59 it was not. With Boost 1.61, abort() is called and no XML is written at all.

I want to use a newer Boost, but I can't because of this bug. Is it possible to use Boost 1.57 for serialization and Boost 1.61 for everything else? How can we mix different versions?

Platform: Windows 10 Japanese 64-bit, Visual Studio 2013 Update 5.

comment:6 by anonymous, 6 years ago

If I copy these files from boost 1.57.0 into boost 1.61.0, the test code works well.

boost\serialization\pfto.hpp
boost\archive\iterators\mb_from_wchar.hpp
boost\archive\iterators\wchar_from_mb.hpp

comment:7 by matsu, 5 years ago

Resolution: wontfix → (cleared)
Status: closed → reopened
Version: Boost 1.60.0 → Boost 1.65.0

This problem still exists in Boost 1.65.1. Platform: Windows 10 Japanese 64-bit, Visual Studio 2017 Update 1.

abort() is called at this line:

oa << boost::serialization::make_nvp("string", std::string("日本語文字列"));

This is because a std::string is not always UTF-8 encoded. Its encoding depends on the locale; in my case it is Shift_JIS. However, xml_woarchive_impl.ipp always uses utf8_codecvt_facet.

I have replaced utf8_codecvt_facet with mbstowcs_s, and it works well. mbstowcs_s consults the current locale and converts accordingly.

xml_woarchive_impl.ipp

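#define BOOST_NO_UTF8 // my change
#ifdef BOOST_NO_UTF8
#include <stdlib.h>     // ::mbstowcs_s
#include <cerrno>       // errno
#include <system_error> // std::system_error
#include <vector>       // std::vector
#else
#include <boost/archive/iterators/wchar_from_mb.hpp>
#endif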

// copy chars to output escaping to xml and widening characters as we go
template<class InputIterator>
void save_iterator(std::wostream &os, InputIterator begin, InputIterator end){
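#ifdef BOOST_NO_UTF8
    // Widen with mbstowcs_s, which follows the current locale. This assumes
    // begin is a pointer to a null-terminated char buffer. Unlike the #else
    // branch, this path does not apply xml_escape.
    std::size_t len = end - begin + 1; // at most one wchar_t per byte, plus terminator
    std::vector<wchar_t> dst(len);
    if (::mbstowcs_s(&len, dst.data(), len, begin, len - 1) != 0) {
        throw std::system_error(errno, std::system_category());
    }
    // len now holds the number of wide characters written, terminator included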
    std::copy(
        dst.data(),
        dst.data() + len - 1,
        boost::archive::iterators::ostream_iterator<wchar_t>(os)
    );
#else
    typedef iterators::wchar_from_mb<
        iterators::xml_escape<InputIterator>
    > xmbtows;
    std::copy(
        xmbtows(begin),
        xmbtows(end),
        boost::archive::iterators::ostream_iterator<wchar_t>(os)
    );
#endif
}
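
One thing to note about the BOOST_NO_UTF8 branch above is that it no longer passes the text through xml_escape, so characters such as < and & would be written unescaped. A minimal sketch of escaping the widened text before copying it to the stream follows. escape_xml_wide is a hypothetical helper, not a Boost function, and it assumes that replacing the five predefined XML entities is sufficient.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): replace the five characters
// that have predefined XML entities with those entities.
std::wstring escape_xml_wide(const std::wstring& in) {
    std::wstring out;
    out.reserve(in.size());  // lower bound; grows if anything is escaped
    for (wchar_t c : in) {
        switch (c) {
            case L'<':  out += L"&lt;";   break;
            case L'>':  out += L"&gt;";   break;
            case L'&':  out += L"&amp;";  break;
            case L'"':  out += L"&quot;"; break;
            case L'\'': out += L"&apos;"; break;
            default:    out += c;         break;
        }
    }
    return out;
}
```

In the patch, this would be applied to the contents of dst after the mbstowcs_s call and before the std::copy to the output stream.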

xml_wiarchive_impl.ipp

#define BOOST_NO_UTF8 // my change
#ifdef BOOST_NO_UTF8
#include <stdlib.h>     // ::wcstombs_s
#include <cerrno>       // errno
#include <system_error> // std::system_error
#else
#include <boost/archive/iterators/wchar_from_mb.hpp>
#endif

void copy_to_ptr(char * s, const std::wstring & ws){
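#ifdef BOOST_NO_UTF8
    // Narrow with wcstombs_s, which follows the current locale. This assumes
    // the caller's buffer s can hold ws.size() * sizeof(wchar_t) + 1 bytes.
    std::size_t len = ws.size() * sizeof(wchar_t) + 1;
    if (::wcstombs_s(&len, s, len, ws.c_str(), len - 1) != 0) {
        throw std::system_error(errno, std::system_category());
    }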
#else
    std::copy(
        iterators::mb_from_wchar<std::wstring::const_iterator>(
            ws.begin()
        ), 
        iterators::mb_from_wchar<std::wstring::const_iterator>(
            ws.end()
        ), 
        s
    );
    s[ws.size()] = 0;
#endif
}

template<class Archive>
BOOST_WARCHIVE_DECL void
xml_wiarchive_impl<Archive>::load(std::string & s){
    std::wstring ws;
    bool result = gimpl->parse_string(is, ws);
    if(! result)
        boost::serialization::throw_exception(
            xml_archive_exception(xml_archive_exception::xml_archive_parsing_error)
        );
    #if BOOST_WORKAROUND(_RWSTD_VER, BOOST_TESTED_AT(20101))
    if(NULL != s.data())
    #endif
        s.resize(0);
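#ifdef BOOST_NO_UTF8
    // Narrow with wcstombs_s, which follows the current locale.
    std::size_t len = ws.size() * sizeof(wchar_t) + 1; // size estimate in bytes, plus terminator
    s.resize(len);
    if (::wcstombs_s(&len, &s[0], len, ws.c_str(), _TRUNCATE) != 0) {
        throw std::system_error(errno, std::system_category());
    }
    s.resize(len - 1); // len includes the terminator; drop it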
#else
    s.reserve(ws.size());
    std::copy(
        iterators::mb_from_wchar<std::wstring::iterator>(
            ws.begin()
        ), 
        iterators::mb_from_wchar<std::wstring::iterator>(
            ws.end()
        ), 
        std::back_inserter(s)
    );
#endif
}
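
An alternative that leaves the Boost sources untouched is to widen the locale-encoded std::string in user code and serialize the resulting std::wstring instead. The following is a minimal sketch under two assumptions: the global C locale matches the encoding of the narrow string, and serializing a std::wstring through xml_woarchive avoids the narrow-to-wide step that triggers the assertion. widen_with_locale is a hypothetical helper, and it uses the standard std::mbsrtowcs rather than the MSVC-specific mbstowcs_s.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a string in the current C locale's
// multibyte encoding to a wide string. Input is treated as a C string,
// so anything after an embedded '\0' is ignored.
std::wstring widen_with_locale(const std::string& narrow) {
    const std::size_t failed = static_cast<std::size_t>(-1);

    // First pass: with a null destination, mbsrtowcs only counts the
    // wide characters needed (terminator not included).
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();
    const std::size_t count = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (count == failed)
        throw std::runtime_error("input is not valid in the current locale's encoding");

    // Second pass: convert into a buffer with room for the terminator.
    std::wstring wide(count + 1, L'\0');
    state = std::mbstate_t();
    src = narrow.c_str();
    const std::size_t written = std::mbsrtowcs(&wide[0], &src, wide.size(), &state);
    assert(written == count);
    (void)written;       // silence unused-variable warnings when NDEBUG is set
    wide.resize(count);  // drop the terminator
    return wide;
}
```

The result can be stored in a std::wstring variable and passed to make_nvp in place of the std::string used in the original test. Calling std::locale::global with a named locale, as the test does, also sets the C locale that std::mbsrtowcs consults.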