Boost C++ Libraries: Ticket #7303: XML Serialization - Skip/Ignore unexpected data. https://svn.boost.org/trac10/ticket/7303 <p> Please find attached an extension to boost::serialization. The purpose of this work is to add some support for forward compatibility of boost::serialization XML files; specifically the ability to skip/ignore unexpected data. </p> <p> I would describe the patch as a "first working version", the tests all pass (gcc 4.6) with some expected failures (see below), but further work is required. I guess I'm trying to gauge interest, get some feedback on the implementation, and get inspired enough to invest more time in it. </p> <h2 class="section" id="Implementation">Implementation</h2> <p> Two new archive types, <code>rapidxml_iarchive</code>, and <code>rapidxml_wiarchive</code>, have been created. Their implementation is based on <code>xml_[w]iarchive</code> with the XML parsing provided by the rapidxml parser used in boost::property_tree. </p> <p> This seemed the best approach to the problem as it avoided issues with <code>ungetc</code>. </p> <p> Polymorphic versions of <code>rapidxml_[w]iarchive</code> have not been implemented. </p> <h2 class="section" id="Teststatus">Test status</h2> <p> All tests are passing with the following caveats: </p> <ul><li>Polymorphic rapidxml archives have not been implemented resulting in 6 tests failing to compile </li><li>The following tests have had to be tweaked to accommodate rapidxml_[w]iarchive not ignoring element names <ul><li>test_derived_class </li><li>test_recursion </li><li>test_nvp </li><li>test_non_default_ctor2 </li><li>test_diamond </li><li>test_diamond_complex </li></ul></li></ul><h2 class="section" id="Notesandfurtherwork">Notes and further work</h2> <p> The current implementation is a "first working version" and requires some polishing. 
There are a number of things that require further investigation, broadly speaking they can be categorized as: </p> <ul><li>Better reuse <ul><li>Factor out a base class (templated on char type) for <code>rapidxml_iarchive</code> and <code>rapidxml_wiarchive</code> </li><li>Some code could be shared between <code>rapidxml_[w]iarchive</code> and <code>xml_[w]iarchive</code> <ul><li>See <code>[rapid]xml_iarchive::load(std::wstring&amp;)</code> </li><li>See <code>[rapid]xml_wiarchive::load(std::string&amp;)</code> </li></ul></li></ul></li><li>Better error handling </li><li>Flag support <ul><li>Currently there are no plans to support any kind of flags/alternative behaviour so the existing flag code may need to be removed </li></ul></li><li>Miscellany <ul><li>Go through comments to see what's still relevant </li><li>Replace history map with vector </li></ul></li></ul> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/7303 Trac 1.4.3 anonymous Thu, 30 Aug 2012 12:40:53 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">rapidxml_work.zip</span> </li> </ul> Ticket anonymous Thu, 30 Aug 2012 12:43:27 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">svn.9.diff.zip</span> </li> </ul> <p> Diff against release branch </p> Ticket anonymous Thu, 30 Aug 2012 13:05:25 GMT <link>https://svn.boost.org/trac10/ticket/7303#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:1</guid> <description> <p> Related links: </p> <p> <a class="ext-link" href="http://boost.2283326.n4.nabble.com/Re-serialization-Skipping-unexpected-elements-inXMLarchive-td2559559.html"><span 
class="icon">​</span>http://boost.2283326.n4.nabble.com/Re-serialization-Skipping-unexpected-elements-inXMLarchive-td2559559.html</a> </p> <p> <a class="ext-link" href="http://lists.boost.org/boost-users/2012/04/74081.php"><span class="icon">​</span>http://lists.boost.org/boost-users/2012/04/74081.php</a> </p> <p> <a class="ext-link" href="http://boost.2283326.n4.nabble.com/serialization-skipping-data-td2559099.html"><span class="icon">​</span>http://boost.2283326.n4.nabble.com/serialization-skipping-data-td2559099.html</a> </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Mon, 03 Sep 2012 08:47:36 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:2</guid> <description> <p> Correction: 4 tests are failing because polymorphic archives haven't been implemented. 2 are failing because I hadn't implemented <code>load_binary</code>. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Fri, 07 Sep 2012 10:03:59 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">rapidxml_work_16.zip</span> </li> </ul> <p> Updated patch </p> Ticket anonymous Fri, 07 Sep 2012 10:05:44 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">svn.16.diff.zip</span> </li> </ul> <p> Updated patch </p> Ticket anonymous Fri, 07 Sep 2012 10:10:35 GMT <link>https://svn.boost.org/trac10/ticket/7303#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:3</guid> <description> <p> <strong> Update </strong> </p> <p> I've fixed the two tests that were failing due to <code>load_binary</code> not being implemented. </p> <p> I've still not looked into implementing polymorphic rapidxml archives. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Mon, 10 Sep 2012 09:11:15 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">rapidxml_work_17.zip</span> </li> </ul> <p> Source files (v17) </p> Ticket anonymous Mon, 10 Sep 2012 09:11:33 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">svn.17.diff.zip</span> </li> </ul> <p> Diff (v17) </p> Ticket anonymous Mon, 10 Sep 2012 09:11:50 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">rapidxml_work_18.zip</span> </li> </ul> <p> Source files (v18) </p> Ticket anonymous Mon, 10 Sep 2012 09:12:06 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">svn.18.diff.zip</span> </li> </ul> <p> Diff (v18) </p> Ticket anonymous Mon, 10 Sep 2012 09:20:06 GMT <link>https://svn.boost.org/trac10/ticket/7303#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:4</guid> <description> <p> <strong>Update</strong> </p> <ul><li>Added polymorphic rapidxml archive support (much easier than I anticipated) </li><li>Added support for flags </li></ul><p> <span class="underline">All tests now pass</span> (gcc 4.6 and clang 3.0) </p> <p> I think the next thing I'll do is go through the tests in detail, see what's covered, and what needs covering. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Robert Ramey</dc:creator> <pubDate>Mon, 10 Sep 2012 16:08:58 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:5</guid> <description> <p> why have you not used the spirit parser as xml_iarchive does? </p> <p> Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive. This would have guaranteed passing of all current tests out of the box. </p> <p> Missing: a) <a class="missing wiki">Documentation/Explanation</a> of what new features are offered and how they are used b) Tests of the new features </p> <p> Robert Ramey </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 11 Sep 2012 11:40:25 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:6</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/7303#comment:5" title="Comment 5">ramey</a>: </p> <blockquote class="citation"> <p> why have you not used the spirit parser as xml_iarchive does? </p> </blockquote> <p> For the poor reason that I don't have any experience with spirit. </p> <blockquote class="citation"> <p> Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive. </p> </blockquote> <p> So would I be right in thinking that you would extend the grammar to support element content (either data or child elements) and extend <code>basic_xml_grammar</code> with a new method <code>parse_content</code>? In this way I could skip to the end tag of the start tag just read if the name didn't match. 
</p> <blockquote class="citation"> <p> This would have guaranteed passing of all current tests out of the box. </p> </blockquote> <p> A feature of <code>xml_iarchive</code> is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable. </p> <blockquote class="citation"> <p> Missing: a) <a class="missing wiki">Documentation/Explanation</a> of what new features are offered and how they are used b) Tests of the new features </p> </blockquote> <p> Indeed. I am well aware that at this time it falls short of being a patch, which is why I didn't label it as such. I was really just after feedback, which you have provided, and for which I am grateful. </p> <p> It would be extremely useful for me to be able to skip extra data in my applications' configuration file. It would allow some support for ver.8 applications opening ver.9 config files, though the limitations would be many: I could only ever add extra fields, and I suspect I wouldn't be able to skip data if it was referenced by other parts of the xml. But even with these limitations it would still be a useful feature to me. And since others have suggested it, I thought it worth pursuing. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Wed, 12 Sep 2012 08:27:09 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:7</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/7303#comment:6" title="Comment 6">anonymous</a>: </p> <blockquote class="citation"> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/7303#comment:5" title="Comment 5">ramey</a>: </p> <blockquote class="citation"> <p> This would have guaranteed passing of all current tests out of the box. </p> </blockquote> <p> A feature of <code>xml_iarchive</code> is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable. </p> </blockquote> <p> Actually, not being able to skip top level elements wouldn't be a significant limitation for me. </p> <p> I'll give it some more thought. Thanks for your comments. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 18 Sep 2012 11:58:33 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">svn.23.diff.zip</span> </li> </ul> <p> Diff against release branch </p> Ticket anonymous Tue, 18 Sep 2012 11:58:58 GMT attachment set https://svn.boost.org/trac10/ticket/7303 https://svn.boost.org/trac10/ticket/7303 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">work_23.zip</span> </li> </ul> <p> Source files </p> Ticket anonymous Tue, 18 Sep 2012 12:15:39 GMT <link>https://svn.boost.org/trac10/ticket/7303#comment:8 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:8</guid> <description> <p> I've done as you suggested. But in thinking about the documentation, I've started to wonder if this really is a good idea. I certainly don't think it should be documented as a <em>feature</em> because of the restrictions on its use. At best it could be described in terms of some forgiveness/tolerance in the load. </p> <p> Anyway, I'll leave it for your consideration. </p> <p> Reference -&gt; Special Considerations -&gt; XML Archives </p> <p> In addition, the XML format permits skipping unexpected content. XML archives will skip unexpected data, but if that data is required by other parts of the archive the load will fail. In particular: </p> <ul><li>Objects at the top of the archive may not be skipped. </li><li>It is not possible to skip the first occurrence of a tracked object. 
</li></ul> </description> <category>Ticket</category> </item> <item> <dc:creator>Ramki T</dc:creator> <pubDate>Mon, 28 Jan 2013 06:32:07 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:9 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:9</guid> <description> <p> Hi, </p> <blockquote> <p> I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know. </p> </blockquote> <p> Thanks &amp; Regards, Ramki. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 19 Feb 2013 12:51:23 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/7303#comment:10 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/7303#comment:10</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/7303#comment:9" title="Comment 9">Ramki T</a>: </p> <blockquote class="citation"> <p> Hi, </p> <blockquote> <p> I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know. </p> </blockquote> </blockquote> <p> You might be better off watching <a class="ext-link" href="https://svn.boost.org/trac/boost/ticket/8088"><span class="icon">​</span>https://svn.boost.org/trac/boost/ticket/8088</a> </p> <p> In the end I converted my project to use boost::property_tree. Older versions of the application, written to use boost::serialization, couldn't load the boost::property_tree XML, but since they didn't have future compatibility anyway, it wasn't much of a loss. However applications written to use boost::property_tree can easily load boost::serialization XML. And once you've made the jump, boost::property_tree provides better support for skipping extra data, and handling missing data. All-in-all it was a fairly painless conversion. 
</p> <p> There are caveats: </p> <ul><li>If you need object tracking then boost::property_tree is of no use to you. But as I noted above I couldn't work out a way to modify boost::serialization to support future compatibility <em>and</em> object tracking (not fully at any rate). Perhaps others are more insightful. </li></ul><ul><li>The two libraries almost certainly have different memory/performance characteristics but again that didn't affect my project. </li></ul><ul><li>If you need UTF-8 support you'll need to imbue your input/output stream before calling boost::property_tree::read_xml/write_xml. (UTF-8 support isn't out-of-the-box.) </li></ul><p> I hope that's of some use to you. </p> </description> <category>Ticket</category> </item> </channel> </rss>