Boost C++ Libraries: Ticket #11600: boost property_tree exponential newline growth

attachment set

Sun, 30 Aug 2015 14:07:21 GMT

attachment → property_tree_bugreport.tar.bz2

Testcase and diff

Mon, 26 Oct 2015 15:08:15 GMT

I made a pull request that fixes the introduction of extra newlines: https://github.com/boostorg/property_tree/pull/16

The newline and tab translation behaviour is unchanged, but I think it should still be fixed. I can send a pull request for that immediately, too, if required.

Sebastian Redl — Wed, 10 Feb 2016 12:56:49 GMT

XML whitespace behavior is a mess, but anything that gives greater roundtrip fidelity when trim_whitespace is not set is an improvement. If you still have that second pull request, please send it.

Wed, 10 Feb 2016 16:01:29 GMT

Thank you for merging!

I made a pull request for the second issue and explained it in more detail there: https://github.com/boostorg/property_tree/pull/18

status changed; resolution set

Sebastian Redl — Thu, 11 Feb 2016 09:35:04 GMT

status new → closed
resolution → invalid

I've reverted this. After thinking it over, it doesn't make sense to parse XML without stripping whitespace, writing it out in pretty-print mode, and expecting this to roundtrip.

Thu, 11 Feb 2016 09:51:54 GMT

Without my code you currently cannot round-trip the following XML using Property Tree:

<XML>
<Text>AB  CD</Text>
</XML>

There are two spaces between AB and CD that must be preserved. I need both of them.

With the previous Boost 1.59 code, you end up with:

<?xml version="1.0" encoding="utf-8"?>
<XML>


 
 
 

 

 
 <Text>AB  CD</Text>

</XML>

This is good, because the text is not broken, but if the XML is rewritten several million times, you end up with a GB of '&#10;' entities (or actual newlines in the case of pull request #18).

Using trim_whitespace you end up with:

<?xml version="1.0" encoding="utf-8"?>
<XML>
<Text>AB CD</Text>
</XML>

which makes everything look nice, but the double space in the middle is gone.

The problem here is that property_tree's trim_whitespace flag is translated into rapidxml's parse_normalize_whitespace (together with parse_trim_whitespace), and rapidxml's trim flag is not exposed on its own.
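To make the difference between the two rapidxml flags concrete, here is a minimal sketch that models their documented effect on a data node's text using plain std::string. The helper names is_xml_ws, trim_ws and normalize_ws are hypothetical and are not part of Boost or rapidxml; the sketch assumes rapidxml's definition of whitespace (space, tab, CR, LF).

```cpp
#include <cassert>
#include <string>

// rapidxml counts only these four characters as whitespace.
bool is_xml_ws(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Models parse_trim_whitespace: drop leading and trailing whitespace only.
std::string trim_ws(const std::string& s) {
    std::string::size_type b = 0, e = s.size();
    while (b < e && is_xml_ws(s[b])) ++b;
    while (e > b && is_xml_ws(s[e - 1])) --e;
    return s.substr(b, e - b);
}

// Models parse_normalize_whitespace: collapse every whitespace run to one space.
std::string normalize_ws(const std::string& s) {
    std::string out;
    bool in_run = false;
    for (char c : s) {
        if (is_xml_ws(c)) {
            if (!in_run) out += ' ';  // first char of a run becomes a single space
            in_run = true;
        } else {
            out += c;
            in_run = false;
        }
    }
    return out;
}
```

Trimming alone keeps "AB  CD" intact, while adding the normalize step (which, per the comment above, property_tree's trim_whitespace also turns on) collapses it to "AB CD".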

My change (incorrectly, you are right) trims whitespace, even if trim_whitespace is not enabled. So my suggestion would be to send a third pull request that adds a new option, boost::property_tree::xml_parser::trim_but_dont_normalize_whitespace, which enables rapidxml's parse_trim_whitespace but not parse_normalize_whitespace.

Would this be acceptable for you?
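A sketch of how the proposed option could be wired into the reader's flag translation, assuming a simple bitmask scheme. The flag values below are hypothetical placeholders, trim_but_dont_normalize_whitespace exists only as a proposal in this ticket, and to_rapidxml_flags is an illustrative name, not Boost's internal code.

```cpp
// Hypothetical flag values for illustration; the real constants in
// boost::property_tree::xml_parser and rapidxml may differ.
namespace ptree_flags {
constexpr int trim_whitespace = 1 << 0;
constexpr int trim_but_dont_normalize_whitespace = 1 << 1;  // proposed, not in Boost
}  // namespace ptree_flags

namespace rx_flags {
constexpr int parse_trim_whitespace = 1 << 0;
constexpr int parse_normalize_whitespace = 1 << 1;
}  // namespace rx_flags

// Translate property_tree reader flags into rapidxml parse flags (C++14).
// trim_whitespace keeps its current meaning (trim + normalize); the new
// option requests trimming only, so inner runs such as "AB  CD" survive.
constexpr int to_rapidxml_flags(int flags) {
    int rx = 0;
    if (flags & ptree_flags::trim_whitespace)
        rx |= rx_flags::parse_trim_whitespace | rx_flags::parse_normalize_whitespace;
    if (flags & ptree_flags::trim_but_dont_normalize_whitespace)
        rx |= rx_flags::parse_trim_whitespace;
    return rx;
}
```

Because the new bit only ever adds parse_trim_whitespace, callers that already pass trim_whitespace would see no change in behaviour.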

Thu, 11 Feb 2016 09:57:29 GMT

Quick remark: "My change (incorrectly, you are right) trims whitespace, even if trim_whitespace is not enabled."

This is not completely correct: it actually only trims if the XML element contains ONLY whitespace. So I could add it as an option named boost::property_tree::xml_parser::prune_whitespace_xml_data instead.
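The narrower behaviour described here can be sketched as a check on an element's text: drop the data only when it is entirely whitespace, and otherwise leave it untouched. prune_ws_data and is_xml_ws_only are hypothetical names for illustration; this is not the code from either pull request.

```cpp
#include <cassert>
#include <string>

// True if the text consists only of XML whitespace (space, tab, CR, LF).
// An empty string also counts as whitespace-only.
bool is_xml_ws_only(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Hypothetical prune step: whitespace-only data (e.g. indentation between
// tags) is discarded; any other data is returned verbatim, inner and outer
// whitespace included.
std::string prune_ws_data(const std::string& data) {
    return is_xml_ws_only(data) ? std::string() : data;
}
```

With this check, the indentation-only text between <XML> and <Text> would be discarded on read and so could not accumulate across rewrites, while "AB  CD" keeps both spaces.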