Boost C++ Libraries: Ticket #11728: interprocess::message_queue deadlocked when process exits unexpectedly on Windows https://svn.boost.org/trac10/ticket/11728 <p> Suppose two processes communicate via a <code>message_queue</code>. On Windows operating systems, when either of the two exits unexpectedly (e.g., via a crash or a kill process command), the other gets deadlocked. Attached is code that reproduces the deadlock reliably. First, launch the server process (reader), and then the client process (writer). Kill the server, and the client gets deadlocked within <code>try_send()</code>. </p> en-us Boost C++ Libraries /htdocs/site/boost.png https://svn.boost.org/trac10/ticket/11728 Trac 1.4.3 Lingxi Li <lilingxi.cs@…> Thu, 15 Oct 2015 07:40:41 GMT attachment set https://svn.boost.org/trac10/ticket/11728 https://svn.boost.org/trac10/ticket/11728 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">example.zip</span> </li> </ul> <p> Code that reproduces the issue reliably </p> Ticket Lingxi Li <lilingxi.cs@…> Thu, 15 Oct 2015 09:56:30 GMT <link>https://svn.boost.org/trac10/ticket/11728#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11728#comment:1</guid> <description> <p> In other words, when a process exits unexpectedly, locks it owns that have kernel or file-system persistence are not guaranteed to be released, at least on Windows platforms. This could lead to very serious problems. For example, suppose multiple processes are sending log records to a viewer process using an interprocess::message_queue. When any process, be it a logger or the viewer, crashes, all processes involved may be deadlocked. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Ion Gaztañaga</dc:creator> <pubDate>Tue, 27 Oct 2015 10:18:42 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11728#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11728#comment:2</guid> <description> <p> Thanks for the ticket and the test case. This is a known issue. There is no guarantee of deadlock detection: even on non-Windows systems, robust mutexes are not mandatory, so there is no easy fix for this. In case this helps you, you can define the following before including Interprocess: </p> <p> BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING </p> <p> and optionally define a timeout (by default 10 seconds) </p> <p> BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS </p> <p> This converts any infinite mutex lock into a timed lock and throws an exception if the timeout passes. This won't help if the message queue is waiting on the condition variable (one can in theory wait for a message for a long time). This could detect dead processes when trying to send a message. You can use timed receives as a workaround on the receiving side. </p> <p> It's the best we can do now. It's similar to inter-thread communication: if a thread dies or deadlocks, you are lost. </p> <p> In any case BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING, which was experimental, should be documented. Let me know if this at least alleviates the problem. </p> </description> <category>Ticket</category> </item> <item> <author>Lingxi Li <lilingxi.cs@…></author> <pubDate>Tue, 27 Oct 2015 13:40:00 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/11728#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11728#comment:3</guid> <description> <p> Thanks for the reply. I had thought the library was no longer maintained. </p> <p> As to the issue, I think there is a not-so-difficult fix at least on Windows platforms. 
The key point is that the mutex kernel object in Windows is indeed robust. Please see the description of the return value <code>WAIT_ABANDONED</code> on this MSDN page: <a class="ext-link" href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx"><span class="icon">​</span>https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032(v=vs.85).aspx</a>. </p> <p> With a robust mutex mechanism, the fix is fairly straightforward. First, we synthesize a name for the mutex kernel object based on the name of the message queue, which is supplied by the user. Then, every time a process tries to access the message queue, it first creates or opens the mutex with the synthesized name. Note that this does not affect the kernel or file-system persistence of the message queue. </p> </description> <category>Ticket</category> </item> <item> <author>daniel.kruegler@…</author> <pubDate>Fri, 02 Sep 2016 11:39:08 GMT</pubDate> <title>cc changed https://svn.boost.org/trac10/ticket/11728#comment:4 https://svn.boost.org/trac10/ticket/11728#comment:4 <ul> <li><strong>cc</strong> <span class="trac-author">daniel.kruegler@…</span> added </li> </ul> Ticket Arne.Brix@… Fri, 02 Sep 2016 12:52:46 GMT <link>https://svn.boost.org/trac10/ticket/11728#comment:5 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11728#comment:5</guid> <description> <p> We are suffering from this problem. </p> <p> On our systems (Windows 7), the workaround mentioned above does not help with either Boost 1.57 or 1.60. </p> <p> Also, the problem is triggered by timed_send() in the same way as by try_send(). </p> <p> This is rather critical for us, so any help would be greatly appreciated! 
</p> </description> <category>Ticket</category> </item> <item> <author>Arne.Brix@…</author> <pubDate>Fri, 02 Sep 2016 12:58:02 GMT</pubDate> <title>attachment set https://svn.boost.org/trac10/ticket/11728 https://svn.boost.org/trac10/ticket/11728 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">reproducer.zip</span> </li> </ul> <p> Reproducer with workaround and using timed_send() </p> Ticket davids@… Tue, 11 Jul 2017 22:34:02 GMT <link>https://svn.boost.org/trac10/ticket/11728#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/11728#comment:6</guid> <description> <p> Can you elaborate on this solution? How does using a Windows named mutex prevent a process from trying to lock an abandoned file-persistent mutex? </p> </description> <category>Ticket</category> </item> </channel> </rss>