Opened 10 years ago

Last modified 10 years ago

#7303 new Feature Requests

XML Serialization - Skip/Ignore unexpected data.

Reported by: anonymous Owned by: Robert Ramey
Milestone: To Be Determined Component: serialization
Version: Boost 1.51.0 Severity: Not Applicable
Keywords: Cc:

Description

Please find attached an extension to boost::serialization. The purpose of this work is to add some support for forward compatibility of boost::serialization XML files; specifically the ability to skip/ignore unexpected data.

I would describe the patch as a "first working version": the tests all pass (gcc 4.6) with some expected failures (see below), but further work is required. I guess I'm trying to gauge interest, get some feedback on the implementation, and get inspired enough to invest more time in it.

Implementation

Two new archive types, rapidxml_iarchive and rapidxml_wiarchive, have been created. Their implementation is based on xml_[w]iarchive, with the XML parsing provided by the rapidxml parser used in boost::property_tree.

This seemed the best approach to the problem as it avoided issues with ungetc.

Polymorphic versions of rapidxml_[w]iarchive have not been implemented.

Test status

All tests are passing with the following caveats:

  • Polymorphic rapidxml archives have not been implemented resulting in 6 tests failing to compile
  • The following tests have had to be tweaked to accommodate rapidxml_[w]iarchive not ignoring element names
    • test_derived_class
    • test_recursion
    • test_nvp
    • test_non_default_ctor2
    • test_diamond
    • test_diamond_complex

Notes and further work

The current implementation is a "first working version" and requires some polishing. A number of things require further investigation; broadly speaking, they can be categorized as:

  • Better reuse
    • Factor out a base class (templated on char type) for rapidxml_iarchive and rapidxml_wiarchive
    • Some code could be shared between rapidxml_[w]iarchive and xml_[w]iarchive
      • See [rapid]xml_iarchive::load(std::wstring&)
      • See [rapid]xml_wiarchive::load(std::string&)
  • Better error handling
  • Flag support
    • Currently there are no plans to support any kind of flags/alternative behaviour so the existing flag code may need to be removed
  • Miscellany
    • Go through comments to see what's still relevant
    • Replace history map with vector

Attachments (10)

rapidxml_work.zip (27.9 KB ) - added by anonymous 10 years ago.
svn.9.diff.zip (7.3 KB ) - added by anonymous 10 years ago.
Diff against release branch
rapidxml_work_16.zip (30.8 KB ) - added by anonymous 10 years ago.
Updated patch
svn.16.diff.zip (7.8 KB ) - added by anonymous 10 years ago.
Updated patch
rapidxml_work_17.zip (35.4 KB ) - added by anonymous 10 years ago.
Source files (v17)
svn.17.diff.zip (8.8 KB ) - added by anonymous 10 years ago.
Diff (v17)
rapidxml_work_18.zip (35.6 KB ) - added by anonymous 10 years ago.
Source files (v18)
svn.18.diff.zip (8.5 KB ) - added by anonymous 10 years ago.
Diff (v18)
svn.23.diff.zip (3.4 KB ) - added by anonymous 10 years ago.
Diff against release branch
work_23.zip (11.6 KB ) - added by anonymous 10 years ago.
Source files

Change History (20)

by anonymous, 10 years ago

Attachment: rapidxml_work.zip added

by anonymous, 10 years ago

Attachment: svn.9.diff.zip added

Diff against release branch

comment:2 by anonymous, 10 years ago

Correction: 4 tests are failing because polymorphic archives haven't been implemented. 2 are failing because I hadn't implemented load_binary.

by anonymous, 10 years ago

Attachment: rapidxml_work_16.zip added

Updated patch

by anonymous, 10 years ago

Attachment: svn.16.diff.zip added

Updated patch

comment:3 by anonymous, 10 years ago

Update

I've fixed the two tests that were failing due to load_binary not being implemented.

I've still not looked into implementing polymorphic rapidxml archives.

by anonymous, 10 years ago

Attachment: rapidxml_work_17.zip added

Source files (v17)

by anonymous, 10 years ago

Attachment: svn.17.diff.zip added

Diff (v17)

by anonymous, 10 years ago

Attachment: rapidxml_work_18.zip added

Source files (v18)

by anonymous, 10 years ago

Attachment: svn.18.diff.zip added

Diff (v18)

comment:4 by anonymous, 10 years ago

Update

  • Added polymorphic rapidxml archive support (much easier than I anticipated)
  • Added support for flags

All tests now pass (gcc 4.6 and clang 3.0)

I think the next thing I'll do is go through the tests in detail, see what's covered, and what needs covering.

comment:5 by Robert Ramey, 10 years ago

why have you not used the spirit parser as xml_iarchive does?

Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive. This would have guaranteed passing of all current tests out of the box.

Missing: a) Documentation/Explanation of what new features are offered and how they are used b) Tests of the new features

Robert Ramey

in reply to:  5 ; comment:6 by anonymous, 10 years ago

Replying to ramey:

why have you not used the spirit parser as xml_iarchive does?

For the poor reason that I don't have any experience with spirit.

Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive.

So would I be right in thinking that you would extend the grammar to support element content (either data or child elements), and extend basic_xml_grammar with a new method, parse_content? That way, if the name didn't match, I could skip to the end tag that matches the start tag just read.
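A minimal sketch of that skipping idea, using only the standard library. It is not the spirit grammar and not part of Boost: skip_element_content and load_fields are hypothetical helpers, and the input is assumed to be well formed, with no comments, CDATA sections, or '>' characters inside attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical helper (not Boost API): given the offset just past a start tag,
// return the offset just past its matching end tag, or npos if unbalanced.
std::size_t skip_element_content(const std::string& xml, std::size_t pos) {
    assert(pos <= xml.size());
    int depth = 1;  // we are inside one open element
    while (depth > 0) {
        std::size_t open = xml.find('<', pos);
        if (open == std::string::npos) return std::string::npos;
        std::size_t close = xml.find('>', open);
        if (close == std::string::npos) return std::string::npos;
        if (xml[open + 1] == '/') {
            --depth;  // end tag
        } else if (xml[close - 1] != '/') {
            ++depth;  // nested start tag (self-closing tags leave depth alone)
        }
        pos = close + 1;
    }
    return pos;
}

// Hypothetical loader: walk sibling elements, keep the text of the expected
// ones, and skip everything else, including nested children.
std::map<std::string, std::string> load_fields(const std::string& body,
                                               const std::set<std::string>& expected) {
    std::map<std::string, std::string> out;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = body.find('<', pos);
        if (open == std::string::npos) break;
        std::size_t close = body.find('>', open);
        if (close == std::string::npos) break;
        if (body[close - 1] == '/') {  // empty element such as <x/>
            pos = close + 1;
            continue;
        }
        std::size_t name_end = body.find_first_of(" >", open);
        std::string name = body.substr(open + 1, name_end - open - 1);
        std::size_t end = skip_element_content(body, close + 1);
        if (end == std::string::npos) break;  // unbalanced input: stop
        if (expected.count(name)) {
            std::size_t end_tag = body.rfind('<', end - 1);
            out[name] = body.substr(close + 1, end_tag - (close + 1));
        }
        pos = end;  // known or not, continue after the end tag
    }
    return out;
}
```

Because nesting depth is counted rather than element names compared, an unexpected element is skipped together with any children it contains.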

This would have guaranteed passing of all current tests out of the box.

A feature of xml_iarchive is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable.

Missing: a) Documentation/Explanation of what new features are offered and how they are used b) Tests of the new features

Indeed. I am well aware that at this time it falls short of being a patch, which is why I didn't label it as such. I was really just after feedback, which you have provided, and for which I am grateful.

It would be extremely useful for me to be able to skip extra data in my application's configuration file. It would allow some support for ver.8 applications opening ver.9 config files, though the limitations would be many: I could only ever add extra fields, and I suspect I wouldn't be able to skip data if it was referenced by other parts of the XML. But even with these limitations it would still be a useful feature to me. And since others have suggested it, I thought it worth pursuing.

in reply to:  6 comment:7 by anonymous, 10 years ago

Replying to anonymous:

Replying to ramey:

This would have guaranteed passing of all current tests out of the box.

A feature of xml_iarchive is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable.

Actually, not being able to skip top level elements wouldn't be a significant limitation for me.

I'll give it some more thought. Thanks for your comments.

by anonymous, 10 years ago

Attachment: svn.23.diff.zip added

Diff against release branch

by anonymous, 10 years ago

Attachment: work_23.zip added

Source files

comment:8 by anonymous, 10 years ago

I've done as you suggested. But in thinking about the documentation, I've started to wonder if this really is a good idea. I certainly don't think it should be documented as a feature because of the restrictions on its use. At best it could be described in terms of some forgiveness/tolerance in the load.

Anyway, I'll leave it for your consideration.

Reference -> Special Considerations -> XML Archives

In addition, the XML format permits skipping unexpected content. XML archives will skip unexpected data, but if that data is required by other parts of the archive the load will fail. In particular:

  • Objects at the top of the archive may not be skipped.
  • It is not possible to skip the first occurrence of a tracked object.
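The second restriction can be illustrated with a toy model. This is not Boost's implementation: Entry and load_archive are hypothetical, and the model only assumes that a tracked object is stored in full once and that later occurrences refer back to it by id.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical archive entry: either the first (full) occurrence of a tracked
// object, or a later reference back to it by id.
struct Entry {
    std::string name;
    int object_id;
    bool is_reference;
    int value;  // payload, only meaningful when is_reference is false
};

// Toy loader: entries whose names are not expected are skipped. Returns false
// if an expected reference points at an object that was skipped.
bool load_archive(const std::vector<Entry>& archive,
                  const std::set<std::string>& expected,
                  std::map<std::string, int>& out) {
    std::map<int, int> tracked;  // object_id -> value, filled on first occurrence
    for (const Entry& e : archive) {
        assert(e.object_id >= 0);
        if (!expected.count(e.name)) continue;  // skipped, so never registered
        if (!e.is_reference) {
            tracked[e.object_id] = e.value;
            out[e.name] = e.value;
        } else {
            std::map<int, int>::const_iterator it = tracked.find(e.object_id);
            if (it == tracked.end()) return false;  // first occurrence was skipped
            out[e.name] = it->second;
        }
    }
    return true;
}
```

In this model, skipping a later reference is harmless, while skipping the first occurrence leaves every later reference to that id unresolvable.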

comment:9 by Ramki T, 10 years ago

Hi,

I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know.

Thanks & Regards, Ramki.

in reply to:  9 comment:10 by anonymous, 10 years ago

Replying to Ramki T:

Hi,

I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know.

You might be better off watching https://svn.boost.org/trac/boost/ticket/8088

In the end I converted my project to use boost::property_tree. Older versions of the application, written to use boost::serialization, couldn't load the boost::property_tree XML, but since they didn't have future compatibility anyway, it wasn't much of a loss. However, applications written to use boost::property_tree can easily load boost::serialization XML. And once you've made the jump, boost::property_tree provides better support for skipping extra data and handling missing data. All in all, it was a fairly painless conversion.

There are caveats:

  • If you need object tracking then boost::property_tree is of no use to you. But as I noted above I couldn't work out a way to modify boost::serialization to support future compatibility and object tracking (not fully at any rate). Perhaps others are more insightful.
  • The two libraries almost certainly have different memory/performance characteristics but again that didn't affect my project.
  • If you need UTF-8 support you'll need to imbue your input/output stream before calling boost::property_tree::read_xml/write_xml. (UTF-8 support isn't out-of-the-box.)

I hope that's of some use to you.
