Opened 7 years ago
Last modified 5 years ago
#11798 assigned Tasks
Implementation of boost::shared_mutex on POSIX is suboptimal
Reported by: | Owned by: | viboes | |
---|---|---|---|
Milestone: | To Be Determined | Component: | thread |
Version: | Boost 1.59.0 | Severity: | Optimization |
Keywords: | pthread shared_mutex performance concurrency spinlock | Cc: | Andrey.Semashev@… |
Description
The current (as of boost 1.59) implementation of boost::shared_mutex for 'pthread' is pretty suboptimal as it's using a heavyweight mutex to guard the internal mutex state. This is more evident when shared locks are used to guard state whose access concurrency is high, due to contention on the mutex state lock (in these cases, the shared mutex is effectively exclusive). In comparison, the 'win32' implementation uses lightweight spinlocks underneath.
There are a couple options to fix this for 'pthread', e.g. using a spinlock or leveraging pthreads_rwlock. I'm happy to provide with an initial patch for this.
Attachments (2)
Change History (17)
comment:1 by , 7 years ago
Owner: | changed from | to
---|---|
Severity: | Problem → Optimization |
Status: | new → assigned |
Type: | Bugs → Tasks |
comment:2 by , 7 years ago
Sounds good, I'll push a patch. I meant "initial" as I've not committed patches against boost before, so I'll be learning the process as I go :)
The code in the patch will be guarded by compile-time flags.
comment:3 by , 7 years ago
A very initial patch is below. Notably, it's lacking a couple things that will prevent this to be production-ready:
- Doesn't implement the timeout-based locking public methods. This should be relatively straightforward using pthread_rwlock_timedrdlock(3) and pthread_rwlock_timedwrlock(3).
- Doesn't implement any of the upgrade methods. This will be somewhat more convoluted, but perhaps we can leverage the existing boost::mutex to implement those upgrades atomically.
- I'm unclear how this would interact with thread interruption. Have not looked into this enough to have insights at this moment.
Also, as for code style wrt per-platform ifdefs, exceptions (this patch throws if the pthread_* methods fail, which is unexpected), etc, I'm open to pointers so that I can polish the patch.
comment:4 by , 7 years ago
Updated the patch with a minor fix.
Also note that as mentioned in the original description, we can alternatively keep most of the existing implementation the same and use a pthread_spin_lock instead of a boost::mutex to guard the internal state. This will make the implementation quite similar to the win32 one.
comment:5 by , 7 years ago
Hi,
if you prefer, you can create your own git fork and propose a PR on github. This will allow me to check it without needing to do any merge.
Hmm, I missed that you talked of pthread_rwlock_xxx. I believe that this direction will be incompatible with the whole set of function (interruptions, upgrade, downgrade, ...), but who knows.
comment:6 by , 7 years ago
Was unaware that boost has moved to github - a pleasant surprise :)
WRT the pthread_rwlock_xxx interface - I'm also starting to see that the alternate route (i.e. having a lightweight spin lock instead of a mutex, similar to the win32 implementation) might be more viable. I'll have a PR on github with this approach soon.
comment:7 by , 7 years ago
Hi, please, don't provide a patch with a lightweight spin lock for this class. I believe that it would merit another name. BTW, there are already spin lock in Boost (I don't remember if in detail or in smart_ptr).
comment:8 by , 6 years ago
I think I just ran into this bug with this repo: https://github.com/silverhammermba/sanity
Basically I'm timing how long it takes a bunch of threads to do concurrent reads on a std::unordered_map. In one case the reads are protected by std::lock_guard<std::mutex>, so each thread has to wait its turn. In the other case, the reads are protected by boost::shared_lock<boost::shared_mutex>, so other than checking in with the shared_mutex the reads should be concurrent.
One would think this is the ideal use-case for a shared lock: lots of reads, no writes. But on my Arch Linux box the shared_lock version is almost twice as slow as a naive mutex. My friend compiled the same code on Windows and saw the exact opposite, with shared_lock being almost twice as fast (as we would hope).
This is not just a matter of optimization, this makes boost::shared_mutex completely useless on Linux. Please upgrade the severity of this report.
comment:11 by , 6 years ago
Hi, I'm the maintainer of Boost.Thread, but I didn't wrote the code for shared mutex.
Any help on this issue will be recognized by the whole Boost community.
I've no time to work on this issue now.
Please, fork the git repository and prepare a PR.
comment:14 by , 6 years ago
Cc: | added |
---|
Hi,
any patch that makes the code more efficient is welcome. Boost.Thread is an old library that doesn't make use of spinlocks.
Ideally, I will prefer a patch that allows to switch to the new optimized version usedg a compiler flag. The reason is that we have a lot of platforms and we need to verify the new code before removing completely the old one.
What do you mean by an initial patch?
Ah, BTW, I've moved this to a task/optimization ticket, as it is not a bug/problem.