Opened 13 years ago
Last modified 11 years ago
#3030 new Patches
[crc] optimization of crc_optimal::process_block
Reported by: | | Owned by: | Daryle Walker |
---|---|---|---|
Milestone: | Boost 1.40.0 | Component: | crc |
Version: | Boost 1.39.0 | Severity: | Optimization |
Keywords: | crc | Cc: | phprus@… |
Description
The function 'process_block()' can easily be optimized (I measured about a 20% performance improvement on ARM9 with GCC 3.4.3 and GCC 4.2.0).
Attachments (1)
Change History (8)
by , 13 years ago
comment:2 by , 11 years ago
The patch replaces a data member with a local variable, copying between the two at the start and end of the member function. How does this speed things up instead of slowing them down (by using an extra object)? Is member object access that slow, or is there some cache issue? This seems to be configuration-specific, and I don't want to add something that may help on some systems but hurt on others.
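A minimal sketch of the two styles being compared, assuming a simplified, hypothetical 'crc32_sketch' class (not Boost's crc_optimal) with a bitwise reflected CRC-32 step (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). 'process_block_member' updates the data member on every byte, like the original code; 'process_block_local' copies it into a local, loops, and writes it back once, like the patch.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for crc_optimal (not the Boost code):
// reflected CRC-32, polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF.
class crc32_sketch {
public:
    // Original style: the data member rem_ is read and written for every byte.
    void process_block_member(const void* bytes_begin, const void* bytes_end) {
        const unsigned char* p = static_cast<const unsigned char*>(bytes_begin);
        const unsigned char* e = static_cast<const unsigned char*>(bytes_end);
        for (; p != e; ++p) rem_ = step(rem_, *p);
    }

    // Patched style: copy the member into a local once, loop on the local,
    // and store it back once at the end.
    void process_block_local(const void* bytes_begin, const void* bytes_end) {
        const unsigned char* p = static_cast<const unsigned char*>(bytes_begin);
        const unsigned char* e = static_cast<const unsigned char*>(bytes_end);
        std::uint32_t rem = rem_;
        for (; p != e; ++p) rem = step(rem, *p);
        rem_ = rem;
    }

    std::uint32_t checksum() const { return rem_ ^ 0xFFFFFFFFu; }

private:
    // One byte of a bitwise (table-free) reflected CRC-32 update.
    static std::uint32_t step(std::uint32_t rem, unsigned char byte) {
        rem ^= byte;
        for (int i = 0; i < 8; ++i)
            rem = (rem >> 1) ^ (0xEDB88320u & (0u - (rem & 1u)));
        return rem;
    }

    std::uint32_t rem_ = 0xFFFFFFFFu;
};
```

For an ordinary input buffer both methods produce the standard CRC-32 check value 0xCBF43926 for "123456789"; they differ only in where the running remainder lives during the loop.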
comment:3 by , 11 years ago
The compiler assumes (correctly, from its point of view) that the member variable 'this->rem' may be located in memory between 'bytes_begin' and 'bytes_end', because a pointer to unsigned char is allowed to alias any object.
So the compiler _must_ generate suboptimal code for this loop: it stores the current value of 'rem' to memory before processing each subsequent byte.
A local variable whose address is never taken cannot be aliased this way, so the patched version should be as fast or faster on every system.
I can check it later on MSVC 2008, ARMv7 with GCC 3.2/4.6, and Intel x86/AMD64, if needed.
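The aliasing argument can be made concrete with a contrived case in which the input range is the object's own remainder. This sketch uses a hypothetical 'accumulator' with a trivial additive update (not a real CRC step) so the arithmetic can be followed by hand. Because the member version's later loads really do observe its earlier stores, a compiler translating such a loop in general cannot keep 'rem' only in a register.

```cpp
#include <cstdint>

// Hypothetical accumulator with a trivial additive update (NOT a CRC step),
// chosen so the effect of aliasing can be checked by hand.
struct accumulator {
    std::uint32_t rem;

    // Member version: every iteration writes rem. Since [b, e) may overlap rem
    // (unsigned char may alias any object), each write must be visible to the
    // next load through b.
    void add_member(const unsigned char* b, const unsigned char* e) {
        for (; b != e; ++b) rem += *b;
    }

    // Local-copy version: rem is read once and written once; the loop's
    // writes go to a local that no pointer can reach.
    void add_local(const unsigned char* b, const unsigned char* e) {
        std::uint32_t r = rem;
        for (; b != e; ++b) r += *b;
        rem = r;
    }
};

// Feeds an accumulator (starting at 0x80) the bytes of its own rem member.
inline std::uint32_t feed_own_bytes(bool use_local) {
    accumulator acc{0x80};
    const unsigned char* b = reinterpret_cast<const unsigned char*>(&acc.rem);
    const unsigned char* e = b + sizeof acc.rem;
    if (use_local)
        acc.add_local(b, e);
    else
        acc.add_member(b, e);
    return acc.rem;
}
```

On a little-endian machine, feed_own_bytes(false) returns 0x101 and feed_own_bytes(true) returns 0x100. The local-copy version changes the result only when the input overlaps the object's own state, which ordinary callers never do.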
comment:5 by , 11 years ago
I wrote a quick-and-dirty test: just two source files that measure the time of the original and the optimized version of the CRC-32 calculation.
http://pastebin.com/BQDyBmNx http://pastebin.com/ZUmLB9Cc
I ran this test on four platforms: linux-amd64 with GCC 4.6, linux-x86 with GCC 4.6, ARMv7 with GCC 4.6, and Win32 with MSVC 2008. On Linux the "-O3" optimization option was used; on Win32 the options were "-O2 -EHsc -MD".
Results:
- linux-amd64 with GCC 4.6: same time (no difference)
- linux-x86 with GCC 4.6: 105 ms before the optimization, 93 ms after (12% faster)
- ARMv7 with GCC 4.6: 360 ms before the optimization, 305 ms after (18% faster)
- Win32 with MSVC 2008: 173 ms before the optimization, 87 ms after (98% faster)
Pretty good results for changing just a few lines of code.
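A rough sketch of how such a comparison might be timed, assuming C++11 'std::chrono'; this is not the code from the pastebin links, and 'time_ms' and 'percent_faster' are hypothetical helpers. The percentages in the results appear to be computed as (before / after - 1) × 100 and truncated: for example, 173 / 87 - 1 ≈ 0.988, which is reported as 98%.

```cpp
#include <chrono>

// Hypothetical helper: runs f() 'repetitions' times and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& f, int repetitions) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: relative speedup in percent, (before / after - 1) * 100.
inline double percent_faster(double before_ms, double after_ms) {
    return (before_ms / after_ms - 1.0) * 100.0;
}
```

The timed callable should produce an observable result (for example, by writing to a volatile variable); otherwise an optimizing compiler may remove the very work being measured.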
comment:6 by , 11 years ago
I did a major refactoring of the reflection and CRC table code. By moving the actual CRC computation step into a new function, I think I have implemented your suggestion here. The timing tests in crc_test.cpp suggest a doubling in speed (from 39% of the reference's speed to 85%)! The changes are in [76197].
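One way to picture moving the CRC computation step into a separate function is a helper that receives the running remainder by value and returns the updated one, so the loop never touches a data member. The names 'crc32_update' and 'crc32_state' are hypothetical, the step is the same bitwise reflected CRC-32 used for illustration, and the actual changes in [76197] may well differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical free function: the running remainder comes in and goes out by
// value, so it can stay in a register for the whole loop.
inline std::uint32_t crc32_update(std::uint32_t rem,
                                  const unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        rem ^= p[i];
        for (int k = 0; k < 8; ++k)  // bitwise reflected CRC-32 step
            rem = (rem >> 1) ^ (0xEDB88320u & (0u - (rem & 1u)));
    }
    return rem;
}

// Hypothetical caller: reads the member once and writes it once per call.
class crc32_state {
public:
    void process_bytes(const void* buffer, std::size_t byte_count) {
        rem_ = crc32_update(rem_, static_cast<const unsigned char*>(buffer),
                            byte_count);
    }
    std::uint32_t checksum() const { return rem_ ^ 0xFFFFFFFFu; }

private:
    std::uint32_t rem_ = 0xFFFFFFFFu;
};
```

Feeding "1234" and then "56789" gives the same checksum as feeding "123456789" in one call (0xCBF43926), since the remainder carries all state between calls.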
comment:7 by , 11 years ago
Cc: | added |
---|
MS Visual C++ 2010 does not support the alternative boolean operator token "not" (a synonym for "!") without custom compiler settings.
Error:
...\boost/crc.hpp(591) : error C2146: syntax error : missing ')' before identifier 'reflect'
...\boost/crc.hpp(592) : error C2059: syntax error : ')'
The same code appears again on line 710.
To fix this bug, replace "not reflect," with "!reflect," in boost/crc.hpp.
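The fix only changes how the negation is spelled. A minimal sketch with hypothetical names ('pick', 'call_with_inverse'), mirroring a "not reflect," inside a parenthesized argument list as the C2146 message suggests:

```cpp
// Hypothetical helper taking a bool flag as one of several arguments.
inline unsigned pick(unsigned value, bool reflect, unsigned fallback) {
    return reflect ? value : fallback;
}

inline unsigned call_with_inverse(bool reflect) {
    // return pick(1u, not reflect, 2u);  // standard C++, but MSVC 2010 rejects it by default
    return pick(1u, !reflect, 2u);         // same meaning, accepted by every compiler
}
```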
patch file for crc.hpp