Opened 11 years ago

Closed 10 years ago

#6308 closed Feature Requests (fixed)

Add sp_counted_base_aix.hpp using AIX atomic operations

Reported by: aaron.riekenberg@… Owned by: Peter Dimov
Milestone: To Be Determined Component: smart_ptr
Version: Boost 1.48.0 Severity: Optimization
Keywords: Cc:

Description

Add sp_counted_base_aix.hpp that uses fetch_and_add and compare_and_swap atomic operations available on AIX (see http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.kerneltechref%2Fdoc%2Fktechrf1%2Ffetch_and_add.htm).

The average time for each iteration in the attached test.cpp was reduced from 23 microseconds to 4.7 microseconds on a machine running AIX 5.3 with VisualAge C++ 8.0.

Attachments (6)

sp_counted_base_aix.patch (3.3 KB ) - added by aaron.riekenberg@… 11 years ago.
Patch
test.cpp (1.3 KB ) - added by aaron.riekenberg@… 11 years ago.
Benchmark program
sp_counted_base_aix_memory_barrier.patch (622 bytes ) - added by aaron.riekenberg@… 11 years ago.
Patch to add memory barrier to sp_counted_base_aix.hpp
sp_counted_base_aix_122311.hpp (2.9 KB ) - added by aaron.riekenberg@… 11 years ago.
sp_counted_base_aix_builtin_ns.patch (744 bytes ) - added by aaron.riekenberg@… 11 years ago.
sp_counted_base_aix_lwsync.patch (746 bytes ) - added by aaron.riekenberg@… 11 years ago.
Switch from sync to lwsync

Download all attachments as: .zip

Change History (23)

by aaron.riekenberg@…, 11 years ago

Attachment: sp_counted_base_aix.patch added

Patch

by aaron.riekenberg@…, 11 years ago

Attachment: test.cpp added

Benchmark program

comment:1 by Peter Dimov, 11 years ago

(In [76086]) Apply AIX patch from #6308. Refs #6308.

comment:2 by Peter Dimov, 11 years ago

Resolution: fixed
Status: newclosed

(In [76087]) Merge [76086] to release. Fixes #6308.

by aaron.riekenberg@…, 11 years ago

Patch to add memory barrier to sp_counted_base_aix.hpp

comment:3 by aaron.riekenberg@…, 11 years ago

Resolution: fixed
Status: closedreopened

As documented in smart_ptr/detail/atomic_count.hpp, in cases when the reference count is decremented to 0 we need a memory barrier before destroying the pointed-to object.

sp_counted_base_aix_memory_barrier.patch adds this using the isync instruction.

comment:4 by Peter Dimov, 11 years ago

If fetch_and_add doesn't contain any memory barriers, a trailing isync is not enough, we need a leading (lw)sync as well. See sp_counted_base_gcc_ppc.hpp. Does AIX only work on PPC?

by aaron.riekenberg@…, 11 years ago

comment:5 by aaron.riekenberg@…, 11 years ago

AIX only runs on POWER and PowerPC.

fetch_and_add is not documented to do any memory barrier. I looked at the instructions for fetch_and_add with listi in dbx. It does lwarx and stwcx but does not do lwsync, sync, or isync. So I believe it does not contain a barrier.

I am attaching a new version of sp_counted_base_aix.hpp (sp_counted_base_aix_122311.hpp). This version is written more in the style of sp_counted_base_gcc_ppc.hpp. I added a leading sync instruction to atomic_decrement and a trailing isync for a full barrier.

The performance of this version is a bit worse than my original no-barrier version. Each iteration of the test program now takes about 8.8 microseconds. This is still 2.5X faster than using pthread_mutex.

Thanks for your patience in getting this correct.

comment:6 by Peter Dimov, 11 years ago

(In [76119]) Add memory barriers to sp_counted_base_aix.hpp. Refs #6308.

comment:7 by Peter Dimov, 11 years ago

lwsync should be faster. We didn't use it in the PPC version because, as I recall, GCC didn't support it. I'm not sure what compiler are you targeting on AIX.

comment:8 by Peter Dimov, 11 years ago

(In [76120]) Merge [76119] to release. Refs #6308.

by aaron.riekenberg@…, 11 years ago

comment:9 by aaron.riekenberg@…, 11 years ago

I experimented with lwsync. It was slightly faster than sync in my tests (around 300 nanoseconds). lwsync does not enforce ordering of stores followed by loads (http://www.ibm.com/developerworks/systems/articles/powerpc.html). So I think I'll stick with sync for this reason, and to be consistent with sp_counted_base_gcc_ppc.hpp.

I am attaching one last (hopefully!) patch which does the following:

  1. Uses builtin functions sync() and isync() instead of inline asm.
  1. Puts the atomic_* functions inside the boost::detail namespace - I accidentally put them in the boost namespace previously.

Thanks again for all your help on this.

in reply to:  9 comment:10 by Peter Dimov, 11 years ago

Replying to aaron.riekenberg@…:

I experimented with lwsync. It was slightly faster than sync in my tests (around 300 nanoseconds). lwsync does not enforce ordering of stores followed by loads (http://www.ibm.com/developerworks/systems/articles/powerpc.html).

Yes, I know what lwsync does. You don't need sync here.

by aaron.riekenberg@…, 11 years ago

Switch from sync to lwsync

comment:11 by aaron.riekenberg@…, 11 years ago

I convinced myself you are correct about lwsync.

Average performance of all approaches in test.cpp per iteration:

Original pthread_mutex implementation: 23 microseconds Atomic operations, barrier using sync: 8.8 microseconds Atomic operations, barrier using lwsync: 8.2 microseconds Atomic operations, no barrier (broken): 4.7 microseconds

comment:12 by Peter Dimov, 11 years ago

(In [76123]) sp_counted_base_aix.hpp: switch to lwsync and builtins. Refs #6308.

comment:13 by Peter Dimov, 11 years ago

(In [76124]) Merge [76123] to release. Refs #6308.

comment:14 by vfried@…, 10 years ago

What is the status of sp_counted_base_aix.hpp? It is not #included in sp_counted_base.hpp, but it is a part of the official sources and there is no warning that it is e.g. experimental or for testing only.

comment:15 by Peter Dimov, 10 years ago

Hmm. It seems that the patch in #6667 removed the #ifdef _AIX part of sp_counted_base.hpp for some reason and I didn't notice.

comment:16 by Peter Dimov, 10 years ago

(In [81134]) Add back _AIX-specific #ifdef that was mistakenly removed. Refs #6308. Refs #6667.

comment:17 by Peter Dimov, 10 years ago

Resolution: fixed
Status: reopenedclosed

(In [81336]) Merged [81134] from trunk. Fixes #6308. Refs #6667.

Note: See TracTickets for help on using tickets.