Opened 17 years ago

Closed 16 years ago

#479 closed Support Requests (None)

Multithreaded process pausing but not deadlocking or crashing

Reported by: nobody Owned by: Roland Schwarz
Milestone: Component: None
Version: None Severity:
Keywords: Cc:

Description

Hi,

I am writing a heavily multithreaded Linux program (20-60 
threads) on Fedora Core 2, using glibc version 
2.3.3-27. In addition, I am using the Boost libraries 
(version 1.32.0) for threading and locking. 
Specifically, I am using mutexes, recursive_mutexes, 
scoped locking, and conditions.

My problem is that the process will suddenly cease 
activity for random lengths of time (1 sec to minutes). 
However, it never crashes or produces incorrect results. 
Also, I do not think that it is deadlocking because it 
always resumes its activity. 

I have done some profiling of the locks, and it shows 
very strange behavior. For instance, threads will block 
for long periods (as long as the inactivity itself) even 
though no thread holds the corresponding mutex for more 
than a fraction of a second. When I explored this further, 
it appeared that thread A is blocking on a mutex while 
thread B holds it. I am using 
boost::recursive_mutex::scoped_lock objects for the 
locking. The weird thing is that thread B pauses at the 
very end of the lock's scope (again for the length of the 
inactivity), as though releasing the mutex fails to wake 
thread A while thread B itself stays descheduled for a 
long time.
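
One way to separate "waiting to acquire" from "holding" from "stuck in the release" is to time all three phases around each lock. The sketch below is a hypothetical instrumented guard, not part of Boost or of the reporter's code: `TimedRecursiveLock`, `LockStats` and `update_max` are illustrative names, and C++11 `std::recursive_mutex` stands in for `boost::recursive_mutex` (whose 1.32 API differs). The unlock timing is approximate, since a thread descheduled just after `unlock()` returns also shows up there.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical per-mutex statistics, in microseconds.
struct LockStats {
    std::atomic<std::int64_t> max_wait_us{0};    // longest wait to acquire
    std::atomic<std::int64_t> max_hold_us{0};    // longest time held
    std::atomic<std::int64_t> max_unlock_us{0};  // longest time around unlock()
};

// Raise `slot` to `value` if `value` is larger, without taking a lock.
inline void update_max(std::atomic<std::int64_t>& slot, std::int64_t value) {
    std::int64_t current = slot.load();
    while (value > current && !slot.compare_exchange_weak(current, value)) {
        // On failure compare_exchange_weak reloads `current`; re-check and retry.
    }
}

// Hypothetical RAII guard used like a scoped lock, but recording
// wait, hold and unlock durations into a shared LockStats.
class TimedRecursiveLock {
public:
    using Clock = std::chrono::steady_clock;

    TimedRecursiveLock(std::recursive_mutex& m, LockStats& stats)
        : mutex_(m), stats_(stats) {
        const Clock::time_point requested = Clock::now();
        mutex_.lock();
        acquired_ = Clock::now();
        update_max(stats_.max_wait_us, to_us(acquired_ - requested));
    }

    ~TimedRecursiveLock() {
        const Clock::time_point releasing = Clock::now();
        update_max(stats_.max_hold_us, to_us(releasing - acquired_));
        mutex_.unlock();
        // The report describes thread B stalling at the end of the scope;
        // this roughly measures how long the release took to return.
        update_max(stats_.max_unlock_us, to_us(Clock::now() - releasing));
    }

    TimedRecursiveLock(const TimedRecursiveLock&) = delete;
    TimedRecursiveLock& operator=(const TimedRecursiveLock&) = delete;

private:
    static std::int64_t to_us(Clock::duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    }

    std::recursive_mutex& mutex_;
    LockStats& stats_;
    Clock::time_point acquired_;
};
```

Writing `TimedRecursiveLock guard(some_mutex, some_stats);` where a scoped lock was used, and periodically printing the three maxima, would indicate whether the long pauses occur while acquiring the mutex or while releasing it.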

I created a test program that spawns 30 threads that 
simply acquire these Boost scoped locks repeatedly and 
yield. This program, too, shows the same periods of 
inactivity (again without crashing or deadlocking), 
though less frequently (I suspect because its locking 
pattern differs from my program's). It also always 
resumes its activity eventually. I have attached the 
test program code.
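
The attached test program is not reproduced in this ticket. As a rough, hypothetical reconstruction of the pattern described (many threads repeatedly taking nested scoped locks on a shared recursive mutex and yielding), the sketch below uses C++11 `std::thread`, `std::recursive_mutex`, `std::lock_guard` and `std::this_thread::yield()` in place of Boost. The `run_stress_test` name and its thread and iteration parameters are assumptions. Instead of relying on total runtime, it reports the longest single wait to acquire the lock, which is the number a multi-second stall would show up in.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

struct StressResult {
    std::int64_t total_increments;
    std::int64_t max_wait_ms;  // longest single wait to acquire the lock, any thread
};

// Hypothetical stress test: each of `num_threads` threads takes a nested
// (recursive) scoped lock `iterations` times, bumps a shared counter, and yields.
StressResult run_stress_test(int num_threads, int iterations) {
    using Clock = std::chrono::steady_clock;
    std::recursive_mutex mutex;
    std::int64_t counter = 0;                              // guarded by `mutex`
    std::vector<std::int64_t> worst_wait(num_threads, 0);  // one slot per thread
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            for (int i = 0; i < iterations; ++i) {
                const Clock::time_point requested = Clock::now();
                {
                    std::lock_guard<std::recursive_mutex> outer(mutex);
                    const std::int64_t waited =
                        std::chrono::duration_cast<std::chrono::milliseconds>(
                            Clock::now() - requested).count();
                    worst_wait[t] = std::max(worst_wait[t], waited);
                    std::lock_guard<std::recursive_mutex> inner(mutex);  // re-entrant
                    ++counter;
                }  // both guards release here, at the end of the scope
                std::this_thread::yield();
            }
        });
    }
    for (std::thread& th : threads) th.join();

    // Every increment happened under the lock, so none may be lost.
    assert(counter == static_cast<std::int64_t>(num_threads) * iterations);
    const std::int64_t max_wait =
        worst_wait.empty() ? 0 : *std::max_element(worst_wait.begin(), worst_wait.end());
    return {counter, max_wait};
}
```

Running, for example, `run_stress_test(30, 100000)` on the affected machine and printing `max_wait_ms` would show directly whether individual acquisitions stall for seconds.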

I'm not sure whether this is a problem with the Boost 
libraries, with Linux, or with my own code. 

I was wondering if anyone has experienced similar 
behavior and might be able to offer some insight or 
guidance. Thanks!

Matt


Change History (1)

comment:1 by Roland Schwarz, 16 years ago

Status: assigned → closed
Logged In: YES 
user_id=541730
Originator: NO

I don't know whether the "problem" has been resolved; however, I was not able to reproduce the described behaviour. The submitted code also does not clearly show what is going wrong.
I tested the code on Linux and Windows, and it gave comparable results. The code does, however, run for a _very_ long time. I compared it to a loop that only locked, unlocked, and calculated i*i+1 like the example. This ran for 95 minutes.
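
For reference, a baseline of the kind described (a loop that only locks, unlocks and computes i*i+1) might look like the sketch below. The original comparison code is not included in the ticket; `baseline_loop`, `BaselineResult` and the choice of an uncontended `std::mutex` are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

struct BaselineResult {
    std::uint64_t checksum;  // sum of i*i+1, kept so the compiler cannot drop the work
    double seconds;          // wall-clock time for the whole loop
};

// Hypothetical baseline: lock, compute i*i+1, unlock, with no other threads.
BaselineResult baseline_loop(std::uint64_t iterations) {
    std::mutex m;
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(m);
        checksum += i * i + 1;  // unsigned arithmetic wraps on overflow; fine for a checksum
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return {checksum, elapsed.count()};
}
```

For instance, `baseline_loop(4).checksum` is 1 + 2 + 5 + 10 = 18. Comparing its `seconds` against the multithreaded run separates the cost of the locking itself from any time lost to scheduling stalls.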

If the question is still open, please try to be more specific.

Regards Roland
