Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#4010 closed Bugs (fixed)

Boost message queue bug

Reported by: rusty0831 <rusty_lai@…>
Owned by: Ion Gaztañaga
Milestone: Boost 1.45.0
Component: interprocess
Version: Boost 1.42.0
Severity: Problem
Keywords: bug message queue temp folder bootstamp
Cc: anders.widen@…

Description

There is a serious bug in the message queue. The Boost message queue is meant to create its temp files under a temp folder whose name is generated from the system bootstamp, which Boost obtains through undocumented Windows APIs. The folder name looks like "C:\Documents and Settings\All Users\Application Data/boost_interprocess/D0F325BE8579CA01/". Unfortunately, the method used to generate the bootstamp is buggy: the bootstamp can change even without rebooting!

As a result, if a message queue has been running for hours, new requests from clients cannot connect to it, because the newly generated bootstamp differs from the one that was in use when the queue was created.

This bug can be reproduced as follows:

[Method 1]

  1. Write a test program (A) that creates a message queue, and let it run for hours (e.g. 3 hours). When the message queue is created, the folder "C:\Documents and Settings\All Users\Application Data\boost_interprocess\D0F325BE8579CA01" is created. Note the folder name "D0F325BE8579CA01".
  2. Write another test program (B) that connects to the message queue created by test program (A). You will notice that it is unable to connect to the message queue created by program (A).

You can also find that another folder "C:\Documents and Settings\All Users\Application Data\boost_interprocess\9053E2F2EBC0CA01" is created. Notice that the folder name "9053E2F2EBC0CA01" is different from "D0F325BE8579CA01".
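To compare the two folder names, it can help to turn them back into numbers. The sketch below assumes (this is my reading, not something stated in the ticket) that the name is the hex dump of eight bytes stored least-significant byte first; `decode_bootstamp` is a hypothetical helper, not a Boost function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode a 16-character hex folder name into a 64-bit integer.
// ASSUMPTION: the name is the hex dump of eight bytes, least-significant
// byte first. The ticket does not confirm this layout.
std::uint64_t decode_bootstamp(const std::string& name)
{
    assert(name.size() == 16);  // eight bytes, two hex digits each
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Each pair of hex digits is one byte; byte i has weight 256^i.
        const std::uint64_t byte =
            static_cast<std::uint64_t>(std::stoul(name.substr(2 * i, 2), nullptr, 16));
        value |= byte << (8 * i);
    }
    return value;
}
```

Under that assumption, "D0F325BE8579CA01" and "9053E2F2EBC0CA01" decode to two different 64-bit values, which is consistent with two different bootstamps having been computed.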

[Method 2]

There is a simpler way to reproduce the issue that does not require waiting for hours. The steps are mostly the same as in [Method 1]; the difference is that you change the system time before running test program (B).

Afterwards, test program (B) is no longer able to connect to test program (A).

Change History (10)

comment:1 by rusty0831 <rusty_lai@…>, 13 years ago

*Note: I am referring to the "boost::interprocess::message_queue" class.

comment:2 by Anders Widén <anders.widen@…>, 13 years ago

This is even more serious! The problem seems to apply to all named Boost.Interprocess resources (e.g. shared memory and named semaphores).

As a workaround I have rebuilt my code without the preprocessor symbols BOOST_INTERPROCESS_HAS_WINDOWS_KERNEL_BOOTTIME and BOOST_INTERPROCESS_HAS_KERNEL_BOOTTIME. I believe this gives me filesystem persistence, but that should be OK since the documentation states that all named resources may have either filesystem or kernel persistence.

comment:3 by Anders Widén <anders.widen@…>, 13 years ago

Cc: anders.widen@… added

in reply to:  description comment:4 by anonymous, 12 years ago

Thanks for reporting this bug; it saved us a lot of time tracing a problem of our own: after the applications have been running for a few hours, they no longer communicate correctly with each other if the child process is created dynamically.

I have a simple fix for this that only works on Windows: use windows_shared_memory instead of shared_memory_object. It works when the system time is changed; I have not tested leaving it running for a few hours. If anyone is interested in the changes: replace detail::managed_open_or_create_impl<shared_memory_object> m_shmem; with detail::managed_open_or_create_impl<windows_shared_memory, false> m_shmem;, change the header include accordingly, and comment out message_queue::remove.

Hope this helps someone on Windows work around the problem for now.

comment:5 by dxj19831029@…, 12 years ago

Thanks for reporting this bug; it saved us a lot of time tracing a problem of our own: after the applications have been running for a few hours, they no longer communicate correctly with each other if the child process is created dynamically.

I have a simple fix for this that only works on Windows: use windows_shared_memory instead of shared_memory_object. It works when the system time is changed; I have not tested leaving it running for a few hours. If anyone is interested in the changes: replace detail::managed_open_or_create_impl<shared_memory_object> m_shmem; with detail::managed_open_or_create_impl<windows_shared_memory, false> m_shmem;, change the header include accordingly, and comment out message_queue::remove.

Hope this helps someone on Windows work around the problem for now.

comment:6 by klaas@…, 12 years ago

I also ran into this bug. It happened when using a shared_memory_object. We have a Windows service that runs continuously and that we communicate with. After a while, we had clients that failed to communicate with it.

I tracked down the bug. It seems that NtQuerySystemInformation is used in get_system_time_of_day_information to build the path where the files backing the shared memory are stored. I noticed that after Windows changed the time/date, either through Windows Time synchronization or manually, NtQuerySystemInformation returned a different boot time than before. Because of this, when another process was started and tried to communicate with the Windows service, it failed because it was looking for the files in a different directory.

I don't really have a suggestion for a decent fix. A workaround could be to disable the Windows Time service that performs automatic synchronization (this workaround is untested).
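The failure mode described here (the reported boot time shifting when the wall clock changes, so that a later process computes a different directory) can be illustrated with a small model. This is only a sketch, assuming the reported boot time behaves like "current wall-clock time minus uptime"; `Clock`, `derived_boot_time` and `shared_dir` are hypothetical names and are not Boost or Windows APIs.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical model, all values in seconds.
struct Clock {
    std::int64_t wall_now;  // wall-clock time; can be changed by the user or by time sync
    std::int64_t uptime;    // time since boot; unaffected by wall-clock changes
};

// ASSUMPTION: the reported boot time is tied to the wall clock.
std::int64_t derived_boot_time(const Clock& c)
{
    return c.wall_now - c.uptime;
}

// Build a per-boot directory name from the derived boot time,
// formatted as 16 upper-case hex digits.
std::string shared_dir(const std::string& base, const Clock& c)
{
    std::ostringstream out;
    out << base << '/' << std::hex << std::uppercase << std::setfill('0')
        << std::setw(16) << static_cast<std::uint64_t>(derived_boot_time(c));
    return out.str();
}
```

With `Clock server{1000000, 500}` and, one minute and a one-hour clock adjustment later, `Clock client{1003660, 560}`, `shared_dir` returns different directories for the two processes. Without the adjustment (`Clock{1000060, 560}`), both processes agree on the directory.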

comment:7 by Ion Gaztañaga, 12 years ago

Try latest trunk code. NtQuerySystemInformation has been replaced with a call to WMI (slower, but I think it's much more robust).

comment:8 by Ion Gaztañaga, 12 years ago

Milestone: Boost 1.43.0 → Boost-1.45.0
Resolution: fixed
Status: new → closed

Fixed for Boost 1.45 in release branch

comment:9 by marek, 12 years ago

I have also been hit by this bug for a long time and have always used a workaround.

I am testing the 1.45 beta release now.

It now seems to work on my Vista (32-bit) OS, but it still fails on legacy XP (32-bit). I don't know about other versions.

So I went back to my workaround for this problem, and it is working smoothly again:

\boost\boost\interprocess\detail\tmp_dir_helpers.hpp

#if defined (BOOST_INTERPROCESS_HAS_WINDOWS_KERNEL_BOOTTIME)
inline void get_bootstamp(std::string &s, bool add = false)
{
   std::string bootstamp;
   winapi::get_last_bootup_time(bootstamp);
+   bootstamp = "";
   if(add){
      s += bootstamp;
   }
   else{
      s = bootstamp;
   }
}
#elif defined(BOOST_INTERPROCESS_HAS_BSD_KERNEL_BOOTTIME)

If anyone else is seeing this, it may be worth reopening this bug.

Thanks.
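For readers wondering why blanking the bootstamp helps, here is a rough model of the directory lookup. It is a sketch only: `resource_dir` is a hypothetical function, and the exact path layout inside Boost is an assumption based on the folder names quoted in the description.

```cpp
#include <string>

// Hypothetical model of how the shared directory is chosen.
// ASSUMPTION: the bootstamp, when present, is appended as a subfolder.
std::string resource_dir(const std::string& base, const std::string& bootstamp)
{
    std::string dir = base + "/boost_interprocess";
    if (!bootstamp.empty()) {
        // Per-boot subfolder: processes agree only if they compute the same stamp.
        dir += "/" + bootstamp;
    }
    return dir;
}
```

With two different stamps, e.g. `resource_dir(base, "D0F325BE8579CA01")` and `resource_dir(base, "9053E2F2EBC0CA01")`, the two processes look in different folders. With an empty stamp, every process resolves to the same folder regardless of what the boot time query returns. The likely cost is that files left over from before a reboot stay visible, i.e. the resources get filesystem persistence, as mentioned in comment:2.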

comment:10 by Andrey Semashev, 12 years ago

I took a look at the WMI code and noticed that in a few places you pass wide strings to COM methods. Strictly speaking, this is not correct because COM methods accept BSTRs, which are binary incompatible with wide C strings, unless the method implementation treats them as wide C strings. In particular, if it calls SysStringLen on the argument, the result will be undefined.
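To make the incompatibility concrete, here is a portable model of the BSTR memory layout: a 4-byte length prefix (in bytes) sits immediately before the character data, and SysStringLen reads that prefix. The sketch does not use any Windows headers; `ModelBstr` and `model_sys_string_len` are hypothetical names that only imitate the real layout.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Portable model of the BSTR layout:
//   [4-byte length in bytes][UTF-16 code units ...][2-byte null terminator]
// A BSTR pointer points at the first code unit, just past the prefix.
class ModelBstr {
public:
    explicit ModelBstr(const std::u16string& text)
        : storage_(sizeof(std::uint32_t) + (text.size() + 1) * sizeof(char16_t))
    {
        const std::uint32_t byte_len =
            static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
        std::memcpy(storage_.data(), &byte_len, sizeof byte_len);
        std::memcpy(storage_.data() + sizeof byte_len, text.data(), byte_len);
        // The terminator is already zero: the vector zero-initialises its bytes.
    }

    // Pointer to the character data, as a caller of a COM method would pass it.
    const unsigned char* chars() const
    {
        return storage_.data() + sizeof(std::uint32_t);
    }

private:
    std::vector<unsigned char> storage_;
};

// Model of SysStringLen: read the prefix stored *before* the pointer
// and convert the byte count to a character count.
std::uint32_t model_sys_string_len(const unsigned char* bstr_chars)
{
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len, bstr_chars - sizeof byte_len, sizeof byte_len);
    return byte_len / sizeof(char16_t);
}
```

For `ModelBstr b(u"hello")`, `model_sys_string_len(b.chars())` yields 5. A plain wide string literal has no such prefix, so the same read would touch whatever bytes happen to precede the literal, which is why the result is undefined.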

FWIW, from this thread it seems that using WMI has its drawbacks, let alone the complexity. Perhaps, using performance counters would suffice? Here I found an example of reading a few system counters, of which "\Process(System)\Elapsed Time" might be what you need. The registry key also looks interesting but I didn't find a way to interpret it (looks like it changed its format in Vista). I did not dig deep enough into the code though, so this may be of little help to you.

Last edited 12 years ago by Andrey Semashev