Opened 6 years ago
Last modified 6 years ago
#12787 new Bugs
how to read non-utf8 strings with boost::property_tree
Reported by: | | Owned by: | Sebastian Redl |
---|---|---|---|
Milestone: | To Be Determined | Component: | property_tree |
Version: | Boost 1.61.0 | Severity: | Showstopper |
Keywords: | | Cc: | steinbac@… |
Description
Hi, I have been having this problem with boost::property_tree::read_json since 1.59. My JSON tree looks like this (code attached):
static std::string reduced("{\n \"pipename\": \"quantiser(decode_lut_string=<verbatim>\\u0000@\\u0000\200\\u0000<\\/verbatim>)\",\n \"raw\": {\n \"type\": \"t\",\n \"rank\": \"3\",\n \"shape\": {\n \"dim\": \"256\",\n \"dim\": \"128\",\n \"dim\": \"128\"\n }\n },\n \"encoded\": {\n \"bytes\": \"4194304\"\n }\n}\n",296);
If you check, there is a section between <verbatim>..</verbatim> that I'd like to parse as is. However, the read_json function throws an exception saying:
<unspecified file>(4): invalid code sequence
My guess is that the characters quoted above are not valid UTF-8 and hence property_tree throws. Is this a bug or a feature?
If it is a feature, i.e. property_tree is meant to only yield UTF-8 encoded strings, how would I store an arbitrary string in the property tree? NB: the string above works with Boost 1.58 and older!
Best, P
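One possible way to carry arbitrary bytes through a parser that insists on UTF-8 is to encode them as plain ASCII before storing them and to decode them after reading. The sketch below uses hex encoding for that. `to_hex` and `from_hex` are hypothetical helper names introduced here for illustration; they are not part of Boost, and this is not a statement of how property_tree is meant to be used.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Boost API): encode arbitrary bytes as lowercase hex.
// The result is pure ASCII, so it is also valid UTF-8.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);    // high nibble
        out.push_back(digits[c & 0x0f]);  // low nibble
    }
    return out;
}

// Convert one hex digit to its numeric value; throws on any other character.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper (not a Boost API): decode the output of to_hex.
std::string from_hex(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd number of hex digits");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<char>(hex_value(hex[i]) * 16 + hex_value(hex[i + 1])));
    }
    return out;
}
```

With helpers like these, the value stored under "pipename" would hold `to_hex(payload)` instead of the raw bytes, and the reader would call `from_hex` on the value it gets back. The cost is that the stored string doubles in size and is no longer human-readable.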
Attachments (2)
Change History (5)
by , 6 years ago
Attachment: | test_json_fails.cpp added |
---|
by , 6 years ago
Attachment: | test_json_fails.2.cpp added |
---|
update to yield only the problematic string
comment:1 by , 6 years ago
Apparently the problem lies in
boost::property_tree::json_parser::detail::utf8_utf8_encoding
located in narrow_encoding.hpp. The function
utf8_utf8_encoding::trail_table(unsigned char c) returns -1 for the '\200' character that is contained in my problematic string. I guess it would be nice to have a custom encoding, as I see no other way around it. Besides, can someone elaborate on what this function does? It's not clear to me and it's not documented.
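On the question of what such a function does: in UTF-8, the first byte of a character determines how many continuation ("trail") bytes must follow it, and some byte values can never start a character. The sketch below shows that general classification. It is a standalone illustration of the UTF-8 rules, not a copy of the Boost implementation; `trail_count` and `first_invalid_byte` are names made up for this example. Under these rules a byte such as 0x80 (octal `\200`, which appears in the string from the description) has the bit pattern of a continuation byte, so it yields -1 when it shows up where a lead byte is expected.

```cpp
#include <cstddef>
#include <string>

// Sketch only (hypothetical name): number of continuation bytes that must
// follow `c` if `c` starts a UTF-8 character, or -1 if `c` cannot start one.
int trail_count(unsigned char c) {
    if (c < 0x80) return 0;   // 0xxxxxxx: ASCII, stands alone
    if (c < 0xC0) return -1;  // 10xxxxxx: continuation byte, not a lead byte
    if (c < 0xE0) return 1;   // 110xxxxx: one trailing byte
    if (c < 0xF0) return 2;   // 1110xxxx: two trailing bytes
    if (c < 0xF8) return 3;   // 11110xxx: three trailing bytes
    return -1;                // 11111xxx: never valid in UTF-8
}

// Sketch only (hypothetical name): index of the first byte that breaks the
// UTF-8 lead/continuation structure, or std::string::npos if there is none.
// Overlong encodings and surrogate code points are not checked here.
std::size_t first_invalid_byte(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const int trail = trail_count(static_cast<unsigned char>(s[i]));
        if (trail < 0) return i;  // not a valid lead byte
        if (s.size() - i <= static_cast<std::size_t>(trail)) return i;  // truncated sequence
        for (int k = 1; k <= trail; ++k) {
            const unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80) return i;  // expected a 10xxxxxx byte
        }
        i += static_cast<std::size_t>(trail) + 1;
    }
    return std::string::npos;
}
```

Running `first_invalid_byte` over a candidate string before handing it to the parser is one way to find out which byte would trigger an "invalid code sequence" style error.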
comment:2 by , 6 years ago
I played around a bit and I guess the easy solution would be to store everything with wchar_t (i.e. std::wstring et al). I already converted the example and the code works as expected. If there are any alternatives to this approach, feel free to suggest them.
comment:3 by , 6 years ago
OK, I resolved this problem for myself by not using boost::property_tree. :/ In all honesty, I have to wonder why the value of a key inside the JSON needs to be UTF-8 compliant.
Example code that reproduces the problem (in reduced form) with Boost 1.59 and newer; the original string that caused the problem is also included.