Opened 11 years ago
Closed 10 years ago
#6308 closed Feature Requests (fixed)
Add sp_counted_base_aix.hpp using AIX atomic operations
| Reported by: | | Owned by: | Peter Dimov |
|---|---|---|---|
| Milestone: | To Be Determined | Component: | smart_ptr |
| Version: | Boost 1.48.0 | Severity: | Optimization |
| Keywords: | | Cc: | |
Description
Add sp_counted_base_aix.hpp that uses fetch_and_add and compare_and_swap atomic operations available on AIX (see http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.kerneltechref%2Fdoc%2Fktechrf1%2Ffetch_and_add.htm).
The average time for each iteration in the attached test.cpp was reduced from 23 microseconds to 4.7 microseconds on a machine running AIX 5.3 with VisualAge C++ 8.0.
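For readers without access to the attached patch, the two operations the request relies on can be sketched portably. This is a minimal sketch only, not the contents of sp_counted_base_aix.hpp: std::atomic stands in for the AIX fetch_and_add and compare_and_swap calls, and the function names are assumptions modeled on the atomic_* helpers mentioned later in this ticket. No memory barriers are included here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration: std::atomic is used as a portable stand-in
// for the AIX fetch_and_add / compare_and_swap calls named in the ticket.

// Unconditionally add 1 to the counter (the fetch_and_add role).
inline void atomic_increment(std::atomic<int>& count)
{
    count.fetch_add(1, std::memory_order_relaxed);
}

// Add 1 only if the counter is nonzero (the compare_and_swap role).
// Returns the new value, or 0 if the counter had already dropped to zero.
inline int atomic_conditional_increment(std::atomic<int>& count)
{
    int expected = count.load(std::memory_order_relaxed);
    while (expected != 0)
    {
        // On failure, compare_exchange_weak reloads `expected` with the
        // current value, so the loop retries against fresh data.
        if (count.compare_exchange_weak(expected, expected + 1,
                                        std::memory_order_relaxed))
        {
            return expected + 1;
        }
    }
    return 0;
}
```

The conditional form matters when a weak reference is promoted to a strong one: the count must not be revived once it has reached zero.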
Attachments (6)
Change History (23)
by , 11 years ago
Attachment: | sp_counted_base_aix.patch added |
---|
comment:2 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
by , 11 years ago
Attachment: | sp_counted_base_aix_memory_barrier.patch added |
---|
Patch to add memory barrier to sp_counted_base_aix.hpp
comment:3 by , 11 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
As documented in smart_ptr/detail/atomic_count.hpp, in cases when the reference count is decremented to 0 we need a memory barrier before destroying the pointed-to object.
sp_counted_base_aix_memory_barrier.patch adds this using the isync instruction.
comment:4 by , 11 years ago
If fetch_and_add doesn't contain any memory barriers, a trailing isync is not enough; we need a leading (lw)sync as well. See sp_counted_base_gcc_ppc.hpp. Does AIX only work on PPC?
by , 11 years ago
Attachment: | sp_counted_base_aix_122311.hpp added |
---|
comment:5 by , 11 years ago
AIX only runs on POWER and PowerPC.
fetch_and_add is not documented to do any memory barrier. I looked at the instructions for fetch_and_add with listi in dbx. It does lwarx and stwcx but does not do lwsync, sync, or isync. So I believe it does not contain a barrier.
I am attaching a new version of sp_counted_base_aix.hpp (sp_counted_base_aix_122311.hpp). This version is written more in the style of sp_counted_base_gcc_ppc.hpp. I added a leading sync instruction to atomic_decrement and a trailing isync for a full barrier.
The performance of this version is a bit worse than my original no-barrier version. Each iteration of the test program now takes about 8.8 microseconds. This is still 2.5X faster than using pthread_mutex.
Thanks for your patience in getting this correct.
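The leading and trailing barrier arrangement described in this comment can be illustrated with standard C++ atomics. This is a sketch under assumptions, not the attached file: the `counted` type and its members are hypothetical, release ordering on the decrement plays roughly the role of the leading sync, and the acquire fence plays roughly the role of the trailing isync.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted control block, for illustration only.
struct counted
{
    std::atomic<int> use_count{1};
    bool disposed = false;  // stand-in for "the pointed-to object was destroyed"

    void add_ref()
    {
        // Taking a reference needs no ordering of its own.
        use_count.fetch_add(1, std::memory_order_relaxed);
    }

    void release()
    {
        // Release ordering: this thread's earlier writes to the object
        // become visible before the decrement does (leading barrier role).
        if (use_count.fetch_sub(1, std::memory_order_release) == 1)
        {
            // Acquire fence: the destroying thread sees every other
            // thread's writes before tearing the object down
            // (trailing barrier role).
            std::atomic_thread_fence(std::memory_order_acquire);
            disposed = true;
        }
    }
};
```

Only the thread that drops the count to zero pays for the acquire fence, but every decrement carries the release ordering, which is consistent with the slowdown reported relative to the no-barrier version.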
comment:7 by , 11 years ago
lwsync should be faster. We didn't use it in the PPC version because, as I recall, GCC didn't support it. I'm not sure which compiler you are targeting on AIX.
by , 11 years ago
Attachment: | sp_counted_base_aix_builtin_ns.patch added |
---|
follow-up: 10 comment:9 by , 11 years ago
I experimented with lwsync. It was slightly faster than sync in my tests (around 300 nanoseconds). lwsync does not enforce ordering of stores followed by loads (http://www.ibm.com/developerworks/systems/articles/powerpc.html). So I think I'll stick with sync for this reason, and to be consistent with sp_counted_base_gcc_ppc.hpp.
I am attaching one last (hopefully!) patch which does the following:
- Uses the builtin functions __sync() and __isync() instead of inline asm.
- Puts the atomic_* functions inside the boost::detail namespace - I accidentally put them in the boost namespace previously.
Thanks again for all your help on this.
comment:10 by , 11 years ago
Replying to aaron.riekenberg@…:
I experimented with lwsync. It was slightly faster than sync in my tests (around 300 nanoseconds). lwsync does not enforce ordering of stores followed by loads (http://www.ibm.com/developerworks/systems/articles/powerpc.html).
Yes, I know what lwsync does. You don't need sync here.
comment:11 by , 11 years ago
I convinced myself you are correct about lwsync.
Average performance of all approaches in test.cpp per iteration:
- Original pthread_mutex implementation: 23 microseconds
- Atomic operations, barrier using sync: 8.8 microseconds
- Atomic operations, barrier using lwsync: 8.2 microseconds
- Atomic operations, no barrier (broken): 4.7 microseconds
comment:14 by , 10 years ago
What is the status of sp_counted_base_aix.hpp? It is not #included in sp_counted_base.hpp, but it is a part of the official sources and there is no warning that it is e.g. experimental or for testing only.
comment:15 by , 10 years ago
Hmm. It seems that the patch in #6667 removed the #ifdef _AIX part of sp_counted_base.hpp for some reason and I didn't notice.
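For context, the kind of preprocessor dispatch being discussed can be sketched as follows. This is a simplified, hypothetical illustration: the SP_SKETCH_BACKEND macro and the sp_sketch_backend function are invented for the example, and the real sp_counted_base.hpp selects among many more platforms by including a header rather than defining a string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical dispatch: pick an implementation based on the platform macro.
// _AIX is predefined by compilers targeting AIX.
#if defined(_AIX)
#  define SP_SKETCH_BACKEND "aix"      // the real header would include sp_counted_base_aix.hpp here
#else
#  define SP_SKETCH_BACKEND "generic"  // fallback, e.g. a mutex-based implementation
#endif

// Reports which branch the preprocessor selected.
inline const char* sp_sketch_backend()
{
    return SP_SKETCH_BACKEND;
}
```

If the `_AIX` branch is dropped from such a chain, the AIX-specific header still ships with the sources but is never selected, which matches the situation reported in comment:14.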
comment:17 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Patch