Boost C++ Libraries: Ticket #12787: how to read non-utf8 strings with boost::property_tree https://svn.boost.org/trac10/ticket/12787 <p> Hi, I have been hitting this problem with boost::property_tree::read_json since Boost 1.59. My JSON tree looks like this (code attached): </p> <pre class="wiki">static std::string reduced("{\n \"pipename\": \"quantiser(decode_lut_string=&lt;verbatim&gt;\\u0000@\\u0000\200\\u0000&lt;\\/verbatim&gt;)\",\n \"raw\": {\n \"type\": \"t\",\n \"rank\": \"3\",\n \"shape\": {\n \"dim\": \"256\",\n \"dim\": \"128\",\n \"dim\": \"128\"\n }\n },\n \"encoded\": {\n \"bytes\": \"4194304\"\n }\n}\n",296); </pre><p> Note the section between &lt;verbatim&gt;..&lt;/verbatim&gt;, which I'd like to parse as is. The read_json method, however, throws an exception saying: </p> <pre class="wiki">&lt;unspecified file&gt;(4): invalid code sequence </pre><p> My guess is that the characters quoted above are not valid UTF-8 and hence property_tree throws. Is this a bug or a feature? </p> <p> If it is a feature, i.e. property_tree is meant to yield only UTF-8 encoded strings, how would I store an arbitrary string in the property tree? NB: the string above works with Boost 1.58 and older! 
</p> <p> Best, P </p> steinbac@… Wed, 25 Jan 2017 09:10:27 GMT attachment set https://svn.boost.org/trac10/ticket/12787 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test_json_fails.cpp</span> </li> </ul> <p> Example code that reproduces the problem (in reduced form) with Boost 1.59 and newer; the original string that caused the problem is also included. </p> Ticket Peter Steinbach Wed, 25 Jan 2017 09:14:04 GMT attachment set https://svn.boost.org/trac10/ticket/12787 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">test_json_fails.2.cpp</span> </li> </ul> <p> Updated to contain only the problematic string. </p> Ticket steinbac@… Wed, 25 Jan 2017 10:10:43 GMT <link>https://svn.boost.org/trac10/ticket/12787#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12787#comment:1</guid> <description> <p> Apparently the problem lies in </p> <pre class="wiki">boost::property_tree::detail::json_parser::utf8_utf8_encoding </pre><blockquote> <p> located in narrow_encoding.hpp. The function </p> <pre class="wiki">utf8_utf8_encoding::trail_table(unsigned char c) </pre><p> returns -1 for the '\002' character that is contained in my problematic string. I guess it would be nice to be able to use a custom encoding, as I see no other way around it. Besides, can someone elaborate on what this function does? It's not clear to me and it's not documented. 
</p> </blockquote> </description> <category>Ticket</category> </item> <item> <author>steinbac@…</author> <pubDate>Wed, 25 Jan 2017 10:39:13 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12787#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12787#comment:2</guid> <description> <p> I played around a bit, and I guess the easy solution would be to store everything as wchar_t (i.e. std::wstring et al.). I have already converted the example, and the code works as expected. If there are any alternatives to this approach, feel free to suggest them. </p> </description> <category>Ticket</category> </item> <item> <author>steinbac@…</author> <pubDate>Thu, 26 Jan 2017 13:46:44 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/12787#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/12787#comment:3</guid> <description> <p> OK, I resolved this problem for myself by not using boost::property_tree. :/ In all honesty, I have to wonder why the value of a key inside the JSON needs to be UTF-8 compliant. </p> </description> <category>Ticket</category> </item> </channel> </rss>