Opened 10 years ago
Last modified 10 years ago
#7303 new Feature Requests
XML Serialization - Skip/Ignore unexpected data.
| Reported by: | anonymous | Owned by: | Robert Ramey |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | serialization |
| Version: | Boost 1.51.0 | Severity: | Not Applicable |
| Keywords: | | Cc: | |
Description
Please find attached an extension to boost::serialization. The purpose of this work is to add some support for forward compatibility of boost::serialization XML files; specifically the ability to skip/ignore unexpected data.
I would describe the patch as a "first working version": the tests all pass (gcc 4.6) with some expected failures (see below), but further work is required. I'm trying to gauge interest, get some feedback on the implementation, and get inspired enough to invest more time in it.
Implementation
Two new archive types, rapidxml_iarchive and rapidxml_wiarchive, have been created. Their implementation is based on xml_[w]iarchive, with the XML parsing provided by the rapidxml parser used in boost::property_tree. This seemed the best approach to the problem as it avoided issues with ungetc.
Polymorphic versions of rapidxml_[w]iarchive have not been implemented.
Test status
All tests are passing with the following caveats:
- Polymorphic rapidxml archives have not been implemented, resulting in 6 tests failing to compile
- The following tests have had to be tweaked to accommodate rapidxml_[w]iarchive not ignoring element names:
  - test_derived_class
  - test_recursion
  - test_nvp
  - test_non_default_ctor2
  - test_diamond
  - test_diamond_complex
Notes and further work
The current implementation is a "first working version" and requires some polishing. A number of things require further investigation. Broadly speaking, they can be categorized as:
- Better reuse
  - Factor out a base class (templated on char type) for rapidxml_iarchive and rapidxml_wiarchive
  - Some code could be shared between rapidxml_[w]iarchive and xml_[w]iarchive
    - See [rapid]xml_iarchive::load(std::wstring&)
    - See [rapid]xml_wiarchive::load(std::string&)
- Better error handling
- Flag support
  - Currently there are no plans to support any kind of flags/alternative behaviour, so the existing flag code may need to be removed
- Miscellany
  - Go through comments to see what's still relevant
  - Replace history map with vector
Attachments (10)
Change History (20)
by , 10 years ago
Attachment: rapidxml_work.zip added
by , 10 years ago
Attachment: svn.9.diff.zip added
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Correction: 4 tests are failing because polymorphic archives haven't been implemented. 2 are failing because I hadn't implemented load_binary.
comment:3 by , 10 years ago
Update
I've fixed the two tests that were failing due to load_binary not being implemented.
I've still not looked into implementing polymorphic rapidxml archives.
comment:4 by , 10 years ago
Update
- Added polymorphic rapidxml archive support (much easier than I anticipated)
- Added support for flags
All tests now pass (gcc 4.6 and clang 3.0)
I think the next thing I'll do is go through the tests in detail, see what's covered, and what needs covering.
follow-up: 6 comment:5 by , 10 years ago
why have you not used the spirit parser as xml_iarchive does?
Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive. This would have guaranteed passing of all current tests out of the box.
Missing: a) Documentation/Explanation of what new features are offered and how they are used b) Tests of the new features
Robert Ramey
follow-up: 7 comment:6 by , 10 years ago
Replying to ramey:
> why have you not used the spirit parser as xml_iarchive does?
For the poor reason that I don't have any experience with spirit.
> Seems to me that all this could have been achieved in a much simpler way with less code and less future maintenance requirement by updating the grammar on xml_iarchive.
So would I be right in thinking that you would extend the grammar to support element content (either data or child elements), and extend basic_xml_grammar with a new method, parse_content? That way, if the name didn't match, I could skip to the end tag that matches the start tag just read.
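As a rough illustration of the skipping idea discussed here (not the attached patch, and not the spirit grammar used by xml_iarchive), the following stdlib-only C++ sketch shows one way to skip to the end tag that matches a start tag just read. The names skip_to_matching_end and load_expected are hypothetical, and the scanner deliberately ignores comments, CDATA, processing instructions, and '>' characters inside attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper: 'pos' is the position just past a start tag.
// Returns the position just past the matching end tag, stepping over any
// nested child elements on the way.
std::size_t skip_to_matching_end(const std::string& xml, std::size_t pos) {
    assert(pos <= xml.size());
    int depth = 1;  // we are inside exactly one open element
    while (depth > 0) {
        const std::size_t lt = xml.find('<', pos);
        if (lt == std::string::npos) throw std::runtime_error("unterminated element");
        const std::size_t gt = xml.find('>', lt);
        if (gt == std::string::npos) throw std::runtime_error("unterminated tag");
        if (xml[lt + 1] == '/') {
            --depth;  // end tag closes one level
        } else if (xml[gt - 1] != '/') {
            ++depth;  // start tag opens one level (self-closing tags change nothing)
        }
        pos = gt + 1;
    }
    return pos;
}

// Hypothetical loader: reads a flat run of <name>text</name> elements and keeps
// only the names an older reader expects; anything else is skipped whole.
std::map<std::string, std::string> load_expected(const std::string& xml,
                                                 const std::set<std::string>& expected) {
    std::map<std::string, std::string> out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t lt = xml.find('<', pos);
        if (lt == std::string::npos) break;  // no more elements
        const std::size_t gt = xml.find('>', lt);
        if (gt == std::string::npos) throw std::runtime_error("unterminated tag");
        const std::string name = xml.substr(lt + 1, gt - lt - 1);
        pos = gt + 1;
        if (!name.empty() && name.back() == '/') continue;  // self-closing: nothing to read
        if (expected.count(name) == 0) {
            pos = skip_to_matching_end(xml, pos);  // unexpected element: skip its subtree
            continue;
        }
        const std::size_t end = xml.find("</" + name + ">", pos);
        if (end == std::string::npos) throw std::runtime_error("missing end tag");
        out[name] = xml.substr(pos, end - pos);
        pos = end + name.size() + 3;  // step past "</name>"
    }
    return out;
}
```

Given the input `<width>800</width><theme><name>dark</name></theme><height>600</height>` and the expected names width and height, the theme subtree is skipped as a unit and only the two known fields are returned. Note that this sketch has nothing to say about object tracking, which is the harder problem raised later in the thread.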
> This would have guaranteed passing of all current tests out of the box.
A feature of xml_iarchive is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable.
> Missing: a) Documentation/Explanation of what new features are offered and how they are used b) Tests of the new features
Indeed. I am well aware that at this time it falls short of being a patch, which is why I didn't label it as such. I was really just after feedback, which you have provided, and for which I am grateful.
It would be extremely useful for me to be able to skip extra data in my application's configuration files. It would allow some support for ver.8 applications opening ver.9 config files, though the limitations would be many: I could only ever add extra fields, and I suspect I wouldn't be able to skip data if it was referenced by other parts of the XML. But even with these limitations it would still be a useful feature to me. And since others have suggested it, I thought it worth pursuing.
comment:7 by , 10 years ago
Replying to anonymous:
> Replying to ramey:
>> This would have guaranteed passing of all current tests out of the box.
> A feature of xml_iarchive is that top level elements don't have their name checked. If there are multiple top level elements and I wish to skip any of them I will need to inspect their names. So changing the tests in the way that I have seems inevitable.
Actually, not being able to skip top level elements wouldn't be a significant limitation for me.
I'll give it some more thought. Thanks for your comments.
comment:8 by , 10 years ago
I've done as you suggested. But in thinking about the documentation, I've started to wonder if this really is a good idea. I certainly don't think it should be documented as a feature because of the restrictions on its use. At best it could be described in terms of some forgiveness/tolerance in the load.
Anyway, I'll leave it for your consideration.
Reference -> Special Considerations -> XML Archives
In addition, the XML format permits skipping unexpected content. XML archives will skip unexpected data, but if that data is required by other parts of the archive the load will fail. In particular:
- Objects at the top of the archive may not be skipped.
- It is not possible to skip the first occurrence of a tracked object.
follow-up: 10 comment:9 by , 10 years ago
Hi,
I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know.
Thanks & Regards, Ramki.
comment:10 by , 10 years ago
Replying to Ramki T:
> Hi,
> I require forward compatibility in boost's xml serialization. So what is the status of this work? Please let me know.
You might be better off watching https://svn.boost.org/trac/boost/ticket/8088
In the end I converted my project to use boost::property_tree. Older versions of the application, written to use boost::serialization, couldn't load the boost::property_tree XML, but since they didn't have future compatibility anyway, it wasn't much of a loss. However, applications written to use boost::property_tree can easily load boost::serialization XML. And once you've made the jump, boost::property_tree provides better support for skipping extra data and handling missing data. All in all it was a fairly painless conversion.
There are caveats:
- If you need object tracking then boost::property_tree is of no use to you. But as I noted above I couldn't work out a way to modify boost::serialization to support future compatibility and object tracking (not fully at any rate). Perhaps others are more insightful.
- The two libraries almost certainly have different memory/performance characteristics but again that didn't affect my project.
- If you need UTF-8 support you'll need to imbue your input/output stream before calling boost::property_tree::read_xml/write_xml. (UTF-8 support isn't out-of-the-box.)
I hope that's of some use to you.