Opened 6 years ago

#12828 new Bugs

Fatal crash on multiple irecv with the same communicator, sender, tag

Reported by: thomas.ilsche@… Owned by: Matthias Troyer
Milestone: To Be Determined Component: mpi
Version: Boost 1.63.0 Severity: Problem
Keywords: Cc:

Description

Posting two irecv calls for the same sender and tag with a non-primitive data type results in a crash. The problem is the 'two-phase' receive: the irecv for the actual data is posted only during wait, once the size is known.
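To make the two-phase receive concrete, here is a minimal C++ sketch modelled on the description above, not on Boost.MPI's actual source. It assumes that all messages from one sender to one receiver with the same communicator and tag travel through a single FIFO (MPI's non-overtaking order). `Channel`, `send_serialized`, `recv_size`, `recv_payload` and `recv_serialized` are hypothetical names. When the receiver runs phase 1 and phase 2 back to back, as a blocking receive does, every size is paired with its own payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of one MPI "channel": all messages from one sender to one
// receiver with the same communicator and tag, delivered in send order (FIFO).
using Message = std::vector<char>;
using Channel = std::deque<Message>;

// Sender side: one logical send of a serialized object becomes two messages,
// first the byte count, then the payload itself.
void send_serialized(Channel& ch, const std::string& payload) {
    std::size_t n = payload.size();
    Message size_msg(sizeof n);
    std::memcpy(size_msg.data(), &n, sizeof n);
    ch.push_back(size_msg);
    ch.push_back(Message(payload.begin(), payload.end()));
}

// Receiver, phase 1: take the next message and decode it as a byte count.
std::size_t recv_size(Channel& ch) {
    Message m = ch.front();
    ch.pop_front();
    assert(m.size() == sizeof(std::size_t));
    std::size_t n;
    std::memcpy(&n, m.data(), sizeof n);
    return n;
}

// Receiver, phase 2: only now is the length known, so the buffer can be sized
// and the second receive posted.
std::string recv_payload(Channel& ch, std::size_t n) {
    Message m = ch.front();
    ch.pop_front();
    assert(m.size() == n);  // holds only if the two phases pair up correctly
    return std::string(m.begin(), m.end());
}

// Blocking-style receive: phase 2 immediately follows phase 1.
std::string recv_serialized(Channel& ch) {
    std::size_t n = recv_size(ch);
    return recv_payload(ch, n);
}
```

For example, two `send_serialized` calls followed by two `recv_serialized` calls return the original strings in order, because nothing is interleaved between the phases.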

The sender always sends in this order (regardless of whether the sends are blocking or nonblocking):

send(&msg0.size)
send(msg0.buffer)
send(&msg1.size)
send(msg1.buffer)

However, a nonblocking receiver will post:

irecv(&msg0.size)
irecv(&msg1.size)
// wait_all or similar
irecv(msg0.buffer)
irecv(msg1.buffer)
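The mismatch follows directly from the matching rule. A simplified C++ sketch, assuming every receive can match every message (same communicator, sender and tag, no wildcards), so that MPI's non-overtaking rule pairs the k-th posted receive with the k-th sent message. `Match`, `match_in_order` and `count_mismatches` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (receive that was posted, message it actually matched)
using Match = std::pair<std::string, std::string>;

// Simplified model of MPI matching on one (communicator, sender, tag) triple:
// the k-th posted receive is paired with the k-th message sent.
std::vector<Match> match_in_order(const std::vector<std::string>& sent,
                                  const std::vector<std::string>& posted) {
    assert(sent.size() == posted.size());
    std::vector<Match> matches;
    for (std::size_t k = 0; k < sent.size(); ++k) {
        matches.emplace_back(posted[k], sent[k]);
    }
    return matches;
}

// Counts receives that were filled with a message meant for a different receive.
int count_mismatches(const std::vector<Match>& matches) {
    int bad = 0;
    for (const Match& m : matches) {
        if (m.first != m.second) ++bad;
    }
    return bad;
}
```

Feeding in the sender order and the nonblocking posting order from above gives two mismatches: the receive for `msg1.size` is filled with `msg0.buffer`, whose bytes are then treated as a size. A receiver that posts in the sender's order, as a blocking one does, gets none.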

A trivial example that crashes is attached.

MPI provides a well-defined, deterministic ordering guarantee (MPI standard, Section 3.5) and states specifically that "Nonblocking communication operations are ordered according to the execution order of the calls that initiate the communication." (Section 3.7.4)

I cannot find any mention of this severe limitation in the Boost.MPI documentation.

I can think of two possible ways out:

Create a hidden duplicate communicator for each boost::mpi::communicator, so that sizes and buffers are sent on two different communicators. That seems quite clean, but requires many changes to the interface.
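The first option works because a communicator created with MPI_Comm_dup has its own matching context, so messages sent on it never match receives posted on the original. A minimal C++ sketch of that idea, assuming one FIFO per communicator with sender and tag fixed; `Network`, `kUserComm`, `kHiddenDup` and `send_object` are hypothetical names, not Boost.MPI API.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical model: messages on different communicators never match each
// other, so each communicator gets its own FIFO (sender and tag are fixed).
struct Network {
    std::map<int, std::deque<std::string>> in_flight;  // communicator id -> messages

    void send(int comm, const std::string& msg) { in_flight[comm].push_back(msg); }

    // Receives on one communicator are matched in the order they are posted.
    std::string recv(int comm) {
        std::deque<std::string>& q = in_flight[comm];
        assert(!q.empty() && "no matching message in flight");
        std::string msg = q.front();
        q.pop_front();
        return msg;
    }
};

// Hypothetical ids for the user's communicator and its hidden duplicate.
constexpr int kUserComm = 0;
constexpr int kHiddenDup = 1;

// Sender: the size goes on the user communicator, the payload on the duplicate.
void send_object(Network& net, const std::string& name) {
    net.send(kUserComm, name + ".size");
    net.send(kHiddenDup, name + ".buffer");
}
```

With this split, the receiver can post both size receives first and both payload receives later, exactly as the nonblocking receiver above does, and each receive still gets the message meant for it.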

One could also split the tag space in half and use the upper half for the additional messages. That, of course, is really nasty and can result in behavior that does not conform to the MPI standard.
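The tag-splitting option amounts to remapping tags. A minimal C++ sketch, assuming valid tags are 0..tag_ub, where tag_ub is the value of the MPI_TAG_UB attribute (which the MPI standard guarantees to be at least 32767); the helper names are hypothetical. It also makes the drawback visible: the application loses half of its tag range, so a program that legally uses a tag in the upper half breaks, and a receive with MPI_ANY_TAG could still match an internal payload message.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: first tag of the reserved upper half of [0, tag_ub].
// Equal to (tag_ub + 1) / 2, written so it cannot overflow when tag_ub == INT_MAX.
int first_reserved_tag(int tag_ub) {
    assert(tag_ub >= 32767 && "the MPI standard guarantees MPI_TAG_UB >= 32767");
    return tag_ub / 2 + tag_ub % 2;
}

// Tags the application may still use: only the lower half.
bool is_user_tag(int tag, int tag_ub) {
    return tag >= 0 && tag < first_reserved_tag(tag_ub);
}

// Tag on which the payload message belonging to user_tag would be sent.
int payload_tag(int user_tag, int tag_ub) {
    if (!is_user_tag(user_tag, tag_ub)) {
        throw std::out_of_range("tag lies in the reserved upper half");
    }
    return user_tag + first_reserved_tag(tag_ub);
}
```

With the minimum tag_ub of 32767, user tags are limited to 0..16383 and payloads use 16384..32767.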

Attachments (1)

ipanic.cpp (496 bytes) - added by anonymous 6 years ago.


Change History (1)

by anonymous, 6 years ago

Attachment: ipanic.cpp added