Opened 13 years ago

Last modified 11 years ago

#3030 new Patches

[crc] optimization of crc_optimal::process_block

Reported by: qehgt0@… Owned by: Daryle Walker
Milestone: Boost 1.40.0 Component: crc
Version: Boost 1.39.0 Severity: Optimization
Keywords: crc Cc: phprus@…

Description

The function 'process_block()' can easily be optimized (I measured a performance gain of about 20% on ARM9 with GCC 3.4.3 and 4.2.0).

Attachments (1)

my.patch (1003 bytes ) - added by qehgt0@… 13 years ago.
patch file for crc.hpp

Change History (8)

by qehgt0@…, 13 years ago

Attachment: my.patch added

patch file for crc.hpp

comment:1 by Olaf van der Spek <olafvdspek@…>, 11 years ago

Daryle?

comment:2 by Daryle Walker, 11 years ago

The patch replaces using a data member with using a local variable, assigning between the two at the starting and finishing points of the member function. How does this speed things up instead of slowing them down (by using an extra object)? Is member object access that slow, or is there some cache issue?! This seems to be configuration-specific, and I don't want to add something that may help on some systems but hurt on others.

comment:3 by Vasily Titskiy <qehgt0@…>, 11 years ago

The compiler assumes (and from its point of view this is a correct assumption) that the memory location of the member variable 'this->rem' may lie between 'bytes_begin' and 'bytes_end'.

So the compiler _must_ generate non-optimal code for this loop, storing the temporary value of 'rem' to memory before processing each next byte.

It should be the same or faster on every system.

I can check it later on msvc2008, armv7-gcc3.2/gcc4.6, and intel x86/amd64, if needed.
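The aliasing argument above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from my.patch or from boost/crc.hpp: the type `toy_crc`, the `step` function, and both `process_block_*` names are hypothetical, and only the member name `rem` is taken from the ticket.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy checksum used only to illustrate the aliasing issue;
// this is NOT the boost::crc_optimal implementation.
struct toy_crc {
    std::uint32_t rem = 0;

    // Made-up per-byte update standing in for the real table lookup.
    static std::uint32_t step(std::uint32_t r, unsigned char b) {
        return ((r << 5) | (r >> 27)) ^ b;
    }

    // Member-based loop: an unsigned char pointer may alias any object,
    // including this->rem, so the compiler generally has to write rem back
    // to memory before it reads the next byte through p.
    void process_block_member(const unsigned char* first, const unsigned char* last) {
        assert(first <= last);
        for (const unsigned char* p = first; p != last; ++p)
            rem = step(rem, *p);
    }

    // Local-copy loop: r is a local whose address is never taken, so nothing
    // read through p can alias it and it can stay in a register.
    void process_block_local(const unsigned char* first, const unsigned char* last) {
        assert(first <= last);
        std::uint32_t r = rem;
        for (const unsigned char* p = first; p != last; ++p)
            r = step(r, *p);
        rem = r;  // single store after the loop
    }
};
```

Both functions produce the same 'rem' as long as the input buffer does not overlap the object itself. Whether the second form is actually faster depends on the compiler and target.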

comment:4 by Vasily Titskiy <qehgt0@…>, 11 years ago

It should be the same or faster on every system

After the patch, I mean.

comment:5 by Vasily Titskiy <qehgt0@…>, 11 years ago

I made a quick-and-dirty test: just two source files that measure the time of the original and the optimized versions of the crc32 calculation.

http://pastebin.com/BQDyBmNx http://pastebin.com/ZUmLB9Cc

I ran this test on four platforms: linux-amd64 with gcc 4.6, linux-x86 with gcc 4.6, armv7 with gcc 4.6, and win32 with msvc 2008. On Linux the "-O3" optimization option was used; on win32 the "-O2 -EHsc -MD" options were used.

Results:

  • linux-amd64 with gcc 4.6: same time (no difference),
  • linux-x86 with gcc 4.6: 105 ms before optimization, 93 ms after (12% better)
  • armv7 with gcc 4.6: 360 ms before optimization, 305 ms after (18% better)
  • win32 with msvc 2008: 173 ms before optimization, 87 ms after (98% better)

Pretty good results for changing just a few lines of code.
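For readers who cannot reach the pastebin links, a timing harness of this general kind might look like the sketch below. It is not the code from those links: `checksum` is a hypothetical stand-in for the routine under test, and `time_ms` is a hypothetical helper built on std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the routine under test (not the Boost CRC code).
inline std::uint32_t checksum(const std::vector<unsigned char>& buf) {
    std::uint32_t r = 0;
    for (unsigned char b : buf)
        r = ((r << 5) | (r >> 27)) ^ b;
    return r;
}

// Runs f(buf) `rounds` times and returns the elapsed wall-clock milliseconds.
// Each result is added to `sink` so that the computed values are used;
// a serious benchmark may need stronger measures against optimization.
template <class F>
double time_ms(F f, const std::vector<unsigned char>& buf, int rounds, std::uint32_t& sink) {
    assert(rounds > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i)
        sink += f(buf);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_ms` once with the original routine and once with the optimized one, on the same buffer and with the same number of rounds, gives two figures that can be compared like the ones listed above.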

comment:6 by Daryle Walker, 11 years ago

I did a major re-factoring of the reflection and CRC table code. By moving the actual CRC computation step to a new function, I think I implemented your suggestion here. The timing tests from crc_test.cpp suggest a doubling in speed (from 39% of the reference's speed to 85%)! The changes are at [76197].

comment:7 by Vladislav <phprus@…>, 11 years ago

Cc: phprus@… added

MS Visual C++ 2010 does not support the boolean operator "not" without custom compiler settings.

Error:

...\boost/crc.hpp(591) : error C2146: syntax error : missing ')' before identifier 'reflect'
...\boost/crc.hpp(592) : error C2059: syntax error : ')'

The same code appears on line 710.

To fix this bug, replace "not reflect," with "!reflect," in the file boost/crc.hpp
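A small sketch of the portable spelling follows. The function `pick` is hypothetical and is not the expression at crc.hpp line 591 or 710; only the name `reflect` comes from the error message. As far as I know, MSVC accepts the alternative token `not` only when language extensions are disabled (/Za) or when <iso646.h> is included, so `!` is the form that compiles without special settings.

```cpp
// Hypothetical helper, loosely modelled on the failing expression.
// 'not reflect' is standard C++, but MSVC in its default mode does not
// treat 'not' as an operator; '!reflect' is accepted by every compiler.
constexpr unsigned pick(bool reflect, unsigned plain, unsigned reflected) {
    return !reflect ? plain : reflected;
}

// Compile-time checks of both branches.
static_assert(pick(false, 1u, 2u) == 1u, "plain value expected");
static_assert(pick(true, 1u, 2u) == 2u, "reflected value expected");
```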
