Opened 6 years ago

Last modified 6 years ago

#12787 new Bugs

how to read non-utf8 strings with boost::property_tree

Reported by: steinbac@… Owned by: Sebastian Redl
Milestone: To Be Determined Component: property_tree
Version: Boost 1.61.0 Severity: Showstopper
Keywords: Cc: steinbac@…

Description

Hi, I have had a problem with boost::property_tree::read_json since Boost 1.59. My JSON document looks like this (code attached):

static std::string reduced("{\n    \"pipename\": \"quantiser(decode_lut_string=<verbatim>\\u0000@\\u0000\200\\u0000<\\/verbatim>)\",\n    \"raw\": {\n        \"type\": \"t\",\n        \"rank\": \"3\",\n        \"shape\": {\n            \"dim\": \"256\",\n            \"dim\": \"128\",\n            \"dim\": \"128\"\n        }\n    },\n    \"encoded\": {\n        \"bytes\": \"4194304\"\n    }\n}\n",296);

Note the section between <verbatim>..</verbatim>, which I'd like to parse as is. However, read_json throws an exception with the message:

<unspecified file>(4): invalid code sequence

My guess is that the characters quoted above are not valid UTF-8 and hence property_tree throws. Is this a bug or a feature?

If it is a feature, i.e. property_tree is meant to yield only UTF-8 encoded strings, how would I store an arbitrary string in the property tree? NB: the string above works with Boost 1.58 and older.
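One possible way to carry arbitrary bytes through a parser that insists on UTF-8 is to encode them as plain ASCII before storing them and to decode them after reading. The sketch below uses hex encoding for this; `to_hex`, `from_hex` and `hex_value` are hypothetical helper names and are not part of Boost. It assumes the stored value may be transformed at all, which may not hold if other tools consume the same JSON.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost): encode arbitrary bytes as
// lowercase hex, so the result contains ASCII characters only.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);    // high nibble
        out.push_back(digits[c & 0x0f]);  // low nibble
    }
    return out;
}

// Hypothetical helper: numeric value of one hex digit; throws otherwise.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: inverse of to_hex; throws on malformed input.
std::string from_hex(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd length");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
        out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

The encoded string could then be stored with the tree's `put` and recovered with `get<std::string>` followed by `from_hex`. The cost is that the value is no longer human-readable in the JSON file and doubles in size.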

Best, P

Attachments (2)

test_json_fails.cpp (7.1 KB ) - added by steinbac@… 6 years ago.
example code that reproduces the problem (in reduced fashion) with boost 1.59 and newer, the original string that caused the problem is also contained
test_json_fails.2.cpp (6.8 KB ) - added by Peter Steinbach 6 years ago.
update to yield only the problematic string


Change History (5)

by steinbac@…, 6 years ago

Attachment: test_json_fails.cpp added

example code that reproduces the problem (in reduced fashion) with boost 1.59 and newer, the original string that caused the problem is also contained

by Peter Steinbach, 6 years ago

Attachment: test_json_fails.2.cpp added

update to yield only the problematic string

comment:1 by steinbac@…, 6 years ago

Apparently the problem lies in

boost::property_tree::detail::json_parser::utf8_utf8_encoding

located in narrow_encoding.hpp. The function

utf8_utf8_encoding::trail_table(unsigned char c)

returns -1 for the '\002' character that is contained in my problematic string. I guess it would be nice to be able to supply a custom encoding, as I see no other way around it. Also, can someone explain what this function does? It is not clear to me and it is not documented.
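For context, UTF-8 decoders commonly classify the first byte of a sequence to learn how many continuation ("trail") bytes must follow, and reject bytes that cannot start a sequence. The sketch below illustrates that general idea only. It is an assumption about the purpose of such a table, not a copy of Boost's implementation, and `trailing_bytes` and `looks_like_utf8` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Boost's code: number of continuation bytes
// expected after lead byte c, or -1 if c cannot start a UTF-8 sequence.
int trailing_bytes(unsigned char c) {
    if (c < 0x80) return 0;            // 0xxxxxxx: single byte (ASCII)
    if ((c & 0xE0) == 0xC0) return 1;  // 110xxxxx: two-byte sequence
    if ((c & 0xF0) == 0xE0) return 2;  // 1110xxxx: three-byte sequence
    if ((c & 0xF8) == 0xF0) return 3;  // 11110xxx: four-byte sequence
    return -1;                         // 10xxxxxx (continuation) or invalid
}

// Structural check only: verifies lead/continuation byte layout, but does
// not reject overlong encodings or surrogate code points.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        int n = trailing_bytes(static_cast<unsigned char>(s[i]));
        if (n < 0) return false;       // byte cannot start a sequence
        ++i;
        for (int k = 0; k < n; ++k, ++i) {
            if (i >= s.size()) return false;  // sequence cut short
            if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) return false;
        }
    }
    return true;
}
```

Under this scheme the raw byte written as `\200` (0x80) in the reduced string is a continuation byte with no lead byte before it, so a strict UTF-8 decoder has no valid way to interpret it.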

comment:2 by steinbac@…, 6 years ago

I played around a bit, and I guess the easy solution would be to store everything as wchar_t (i.e. std::wstring et al.). I have already converted the example, and the code works as expected. If there are any alternatives to this approach, feel free to suggest them.

comment:3 by steinbac@…, 6 years ago

OK, I resolved this problem for myself by not using boost::property_tree. :/ In all honesty, I have to wonder why the value of a key inside the JSON needs to be UTF-8 compliant.
