Opened 12 years ago

Closed 7 years ago

#4523 closed Patches (fixed)

fix for #4501

Reported by: ookami1@… Owned by: No-Maintainer
Milestone: Boost 1.44.0 Component: preprocessor
Version: Boost 1.44.0 Severity: Problem
Keywords: Cc:

Description

The following three patches fix issue #4501. detail_auto_rec.patch and repetition_for.patch extend the maximum loop count in BOOST_PP_FOR from 255 to 256. seq_for_each.patch modifies the state of BOOST_PP_SEQ_FOR_EACH in such a way that the calculation of the sequence size is done just once now, and its cached value is used for loop control instead. This cuts down on the evaluation cost especially for long sequences. Small sequences benefit from an early-out execution path, in the frequent case, when no outer for loop blocks iteration slots.

I manually enabled configuration flags 0x0004, 0x0010, 0x0020 (MSVC, BCC, EDG resp.) to enforce the execution of compiler specific code under GCC, and they compiled and executed fine, but that is only a poor replacement for the native compilers, of course. The code for MWCC and DMG does not conform to C90 or later, so GNU gcc could not be fooled to emulate them.

cheers

Wolf Lammen

complexity:

BOOST_CONFIG_FLAGS 0x0001 (strict)

boost 1.43.0:                     patched:

  0 elements   345 replacements      67 replacements
  1            388                  105
  2            431                  142
  4            520                  216
  8            710                  364
 16           1138                  660
 32           2186                 1252
 64           5050                 2436
128          13850                 4804
255 elements 43441 replacements    9503
256          failure               9540


Attachments (6)

detail_auto_rec.patch (1.3 KB ) - added by ookami1@… 12 years ago.
repetition_for.patch (7.2 KB ) - added by ookami1@… 12 years ago.
seq_for_each.patch (3.9 KB ) - added by ookami1@… 12 years ago.
detail_auto_rec.2.patch (100.0 KB ) - added by ookami1@… 12 years ago.
repetition_for.2.patch (5.9 KB ) - added by ookami1@… 12 years ago.
repetition_for.optimize.patch (122.6 KB ) - added by anonymous 12 years ago.

Download all attachments as: .zip

Change History (21)

by ookami1@…, 12 years ago

Attachment: detail_auto_rec.patch added

by ookami1@…, 12 years ago

Attachment: repetition_for.patch added

by ookami1@…, 12 years ago

Attachment: seq_for_each.patch added

comment:1 by Steven Watanabe, 12 years ago

For the future, it would have been better to have added this as a comment to #4501 instead of creating a new ticket.

comment:2 by ookami1@…, 12 years ago

Hold on a second here. This cannot be applied unmodified without breaking other code.

If I was to design the preprocessor code from scratch, I would start looping indices from 0, as this would simplify their handling in several places. Unfortunately, there is already code out there, that expects them starting from 1, so BOOST_PP_FOR must not shift the start index to 0, as suggested by the patch. I'll come back to this, once I have some spare time again.

cheers Wolf Lammen

comment:3 by ookami1@…, 12 years ago

As a first step, the following patch modifies the automatic detection of free loop slots. The 1.44 implementation returns values between 1 and 256 only for calls to BOOST_PP_AUTO_REC(predicate, 256). This is an insufficient range for loops handling sequences of maximum length.

This patch extends the range of returned values for BOOST_PP_AUTO_REC(predicate, n) from 1 to n+1. This is in preparation for other patches following, for now, you normally should not see any difference. The extra value is delivered only when a loop overflows anyway, and so, at most, error cases are affected.

The patch speeds up the expension of all outermost loops, but increases the number of expansions by a few percent for all inner loops.

cheers

Wolf Lammen

by ookami1@…, 12 years ago

Attachment: detail_auto_rec.2.patch added

comment:4 by ookami1@…, 12 years ago

The following patch allows BOOST_PP_FOR to run up to 256 iterations, one more than 1.44 code does. This extension is nessecary to make a couple of macros in preprocessor/seq expand successfully on sequences of maximum length, and it brings BOOST_PP_FOR in line with BOOST_PP_LIMIT_FOR indicating 256 successful iterations.

In a loop of n iterations, you need n loop control checks for each iteration, and one final extra check to exit the loop. This last extra check is not always provided for in the 1.44 BOOST_PP_FOR implementation. If the loop count reaches 256, the maximum allowed, the 257-th cycle expansion unconditionally results in an error, and, thus, omits the final check.

The idea here is to run into an error only, if a final check indicates continuation of the loop. As a consequence, the 257-th loop cycle BOOST_PP_FOR_257 becomes a meaningful entry point into the loop, which in turn requires an extension of BOOST_PP_AUTO_REC. The previous patch does the nessecary modifications, and needs to be applied as well.

It would have had been nice, if indexing of loop iterations had started with 0, so reserving the range 0 - 255 for successful iterations. It would have meant, all indices were from the preprocessor integer range and would have appeared in natural order. Unfortunately, it was once decided to start with 1, and meanwhile there is code out there, that relies on this fact.

In order to not break outside code, I decided to leave the indexing as is, i.e running from 1 to 256, then index 257 assigned to the ultimate final check cycle, and index 0 reserved for the unconditionally failing error macro.

The proposed changes here should not affect code that runs successfully under 1.44 code, as it extends BOOST_PP_FOR only. Code triggering errors in BOOST_PP_FOR is likely to see different behavior, as previously failing loops may now succeed, and error indices have been moved. I was unable to locate an instance inside Boost itself, where this hurts.

The following code fails under the current 1.44 implementation, but will run successfully after having the patches applied:

#include <boost/preprocessor.hpp>

#define P(r, s) s
#define O(r, s) BOOST_PP_DEC(s)
#define M(r, s) (s)
char const a[] = BOOST_PP_STRINGIZE(BOOST_PP_FOR(256, P, O, M));

int main()
{
	printf("%s", a);
	return 0;
}


Cheers

Wolf Lammen

by ookami1@…, 12 years ago

Attachment: repetition_for.2.patch added

comment:5 by ookami1@…, 12 years ago

While I am at BOOST_PP_FOR, I suggest the following simplification, that reduces

  • file size;
  • text to store in macros;
  • evaluation costs, measured in replacements;

for all preprocessors complying to either C90 or C99. Broken preprocessors are not affected, as their code is held in separate files.

This is pure optimization, the behavior should not change, and these modifications are orthogonal to the changes in the previous patch.

cheers

Wolf Lammen

by anonymous, 12 years ago

comment:6 by ookami1@…, 12 years ago

The complete revisited set of patches now comprises

  • detail_auto_rec.2.patch
  • repetition_for.2.patch
  • seq_for_each.patch

If you like, add

  • repetition_for.optimize.patch

which further cuts down on evaluation costs (last line in the table above: from 9540 down to 8247)

Cheers

Wolf Lammen

comment:7 by Edward Diener, 7 years ago

I am not seeing this example failing in the latest version of Boost, which is Boost 1.58. I do see a failure in VC++ stating:

fatal error C1009: compiler limit : macros nested too deeply

but that is a compiler problem which Boost PP cannot solve.

Your test passes with gcc and clang.

comment:8 by ookami1@…, 7 years ago

Weird.

I checked my example program against boost 1.58, and still got following output (the obvious middle part replaced by ...):

(256) (255) (254)...(5) (4) (3) (2) (1) BOOST_PP_ERROR(0x0002, BOOST_PP_FOR_OVERFLOW).

I cannot confirm that the problem is fixed.

Cheers Wolf

in reply to:  7 comment:9 by anonymous, 7 years ago

Seems to be not fixed, see my comment below.

Some more information:

gcc -v yields ... gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

uname -a: Linux wolf-tux 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Replying to eldiener:

I am not seeing this example failing in the latest version of Boost, which is Boost 1.58. I do see a failure in VC++ stating:

fatal error C1009: compiler limit : macros nested too deeply

but that is a compiler problem which Boost PP cannot solve.

Your test passes with gcc and clang.

comment:10 by Edward Diener, 7 years ago

Both of you are correct. My test passed but the output for the seq_256 test is junk. I will look into this further.

comment:11 by Edward Diener, 7 years ago

I have fixed this on the latest 'develop' branch of Boost PP. I have used a different technique than the patches submitted as I did not want to make as drastic a change as was suggested to auto_rec. I did use the technique suggested in the seq_for_each patch of caching the seq size rather than having to recalculate it each time. The fix further consists of accepting the 257th inner-loop in BOOST_PP_FOR without forcing an error as long the op(r,state) returned 0, meaning the BOOST_PP_FOR processing is finished.

I tested this successfully with gcc and clang with the example given in #4501. The VC++ compiler issues a compiler error:

fatal error C1009: compiler limit : macros nested too deeply

but Boost PP can do nothing about that. VC++ succeeds when the number of sequence elements are more reasonable, although I did not experiment with the exact maximum number on which VC++ will process a BOOST_PP_SEQ_FOR_EACH correctly.

Last edited 7 years ago by Edward Diener (previous) (diff)

comment:12 by Edward Diener, 7 years ago

I have had to back out this change in 'develop' because of a problem encountered. I will be working on another fix for this problem.

comment:13 by Edward Diener, 7 years ago

I have corrected the fix for this bug and it is now on the 'develop' branch. The previous fix was not wrong itself but led to potential warnings from various compilers in C++03 mode. The corrected fix avoids any warnings in any C++ mode.

comment:14 by Edward Diener, 7 years ago

This has now been fixed in the latest Boost 1.59 release.

comment:15 by Edward Diener, 7 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.