Boost C++ Libraries: Ticket #3306: Bug in serializing MPI datatypes/classes with arrays https://svn.boost.org/trac10/ticket/3306
<p> Hi, </p>
<blockquote> <p> We received a bug report from a user using BOOST_MPI + MPICH2 on Windows. On further investigation we found that the bug seems to be in the Boost MPI serialization code. The code in 1.39.0 does not handle serialization of arrays (for MPI datatypes) correctly. The following change in the Boost header file "boost\mpi\detail\mpi_datatype_primitive.hpp" is required for programs that serialize arrays to behave correctly. The fix below is a *quick-fix*. You can integrate the following patch or fix the bug *correctly*. I replaced the save_array() member function (lines 61-66 in mpi_datatype_primitive.hpp @ 1.39.0) with the code below to get my test case working: </p> </blockquote>
<pre class="wiki">
// fast saving of arrays of MPI types
template&lt;class T&gt;
void save_array(serialization::array&lt;T&gt; const&amp; x, unsigned int /* version */)
{
  BOOST_ASSERT (addresses.size() &gt; 0);
  BOOST_ASSERT (types.size() &gt; 0);
  BOOST_ASSERT (lengths.size() &gt; 0);
  // We don't need the array size. Pop it !
  addresses.pop_back();
  types.pop_back();
  lengths.pop_back();
  if (x.count())
    save_impl(x.address(), boost::mpi::get_mpi_datatype(*x.address()), x.count());
}
</pre>
<blockquote> <p> I used the code below for my testing (run it with 1 MPI process: mpiexec -n 1 foo.exe): </p> </blockquote>
<pre class="wiki">
#include &lt;iostream&gt;
#include &lt;boost/mpi/environment.hpp&gt;
#include &lt;boost/mpi/communicator.hpp&gt;
#include &lt;boost/serialization/array.hpp&gt;
#include &lt;boost/serialization/base_object.hpp&gt;
#include &lt;boost/serialization/utility.hpp&gt;

using namespace std;
namespace mpi = boost::mpi;

class TFoo {
private:
  friend class boost::serialization::access;
  template&lt;class Archive&gt;
  void serialize(Archive &amp; ar, const unsigned int version)
  {
    ar &amp; a_;
    ar &amp; d_;
    ar &amp; f_;
    ar &amp; foo_arr;
  }
  int a_;
  double d_;
  float f_;
  int foo_arr[5];
public:
  TFoo(): a_(0), d_(0.0),f_(0.0){
    for(int i=0; i&lt;5; i++){ foo_arr[i] = i; }
  }
  TFoo(int a, double d, float f): a_(a),d_(d),f_(f){
    for(int i=0; i&lt;5; i++){ foo_arr[i] = i; }
  }
  TFoo(const TFoo &amp;foo){
    a_ = foo.a_;
    d_ = foo.d_;
    f_ = foo.f_;
    for(int i=0; i&lt;5; i++){ foo_arr[i] = foo.foo_arr[i]; }
  }
  void print(void ){
    std::cout &lt;&lt; a_ &lt;&lt; " , " &lt;&lt; d_ &lt;&lt; " , " &lt;&lt; f_ &lt;&lt; endl;
    for(int i=0; i&lt;5; i++){ std::cout &lt;&lt; foo_arr[i] &lt;&lt; " , "; }
    std::cout &lt;&lt; std::endl;
  }
  void printAddresses(void){
    std::cout &lt;&lt; &amp;a_ &lt;&lt; " , " &lt;&lt; &amp;d_ &lt;&lt; " , " &lt;&lt; &amp;f_ &lt;&lt; endl;
  }
};
BOOST_IS_MPI_DATATYPE(TFoo)

class TBar {
private:
  friend class boost::serialization::access;
  int h_[2];
  TFoo q_[2];
  template&lt;class Archive&gt;
  void serialize(Archive &amp; ar, const unsigned int version)
  {
    ar &amp; h_;
    ar &amp; q_;
  }
public:
  TBar(){ h_[0] = 0; h_[1] = 0; }
  TBar(int h_0, int h_1, TFoo q_0, TFoo q_1){
    h_[0] = h_0;
    h_[1] = h_1;
    q_[0] = q_0;
    q_[1] = q_1;
  }
  void print(void ){
    cout &lt;&lt; h_[0] &lt;&lt; " , " &lt;&lt; h_[1] &lt;&lt; endl;
    q_[0].print();
    q_[1].print();
  }
  void printAddresses(void ){
    std::cout &lt;&lt; " &amp;h_ = " &lt;&lt; &amp;h_ &lt;&lt; " &amp;q_ = " &lt;&lt; &amp;q_ &lt;&lt; std::endl;
    std::cout &lt;&lt; &amp;h_[0] &lt;&lt; " , " &lt;&lt; &amp;h_[1] &lt;&lt; std::endl;
    q_[0].printAddresses();
    q_[1].printAddresses();
  }
};
BOOST_IS_MPI_DATATYPE(TBar)

int main(int argc, char* argv[])
{
  int i=0;
  mpi::environment env(argc, argv);
  mpi::communicator world;
  std::cout &lt;&lt; "I am process " &lt;&lt; world.rank() &lt;&lt; " of " &lt;&lt; world.size() &lt;&lt; "." &lt;&lt; std::endl;
  TFoo foo(1234, 3.14, 3.14f);
  TFoo foo_next(5678, 4.12, 4.12f);
  TBar bar(1234, 1234, foo, foo_next);
  try{
    if (world.rank() == 0) {
      TFoo foo1;
      TBar bar1;
      world.isend(0, 0, foo);//AV here!!!
      world.isend(0, 0, bar);//AV here!!!
      world.recv(0, 0, foo1);
      world.recv(0, 0, bar1);
      std::cout &lt;&lt; "FOO ======================" &lt;&lt; std::endl;
      foo1.print();
      std::cout &lt;&lt; "BAR ======================" &lt;&lt; std::endl;
      bar1.print();
    }
  }catch(mpi::exception &amp;exception){
    cout &lt;&lt; "Error :" &lt;&lt; exception.what();
  }
  return 0;
}
</pre>
<blockquote> <p> We recommended that the user contact the BOOST MPI devs regarding the *correct* fix. </p> </blockquote>
<p> Regards, </p>
<p> Jayesh Krishna MPICH2 team </p>
anonymous Tue, 04 Aug 2009 20:26:15 GMT <link>https://svn.boost.org/trac10/ticket/3306#comment:1 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3306#comment:1</guid> <description> <p> What was the original error?
</p> </description> <category>Ticket</category> </item>
<item> <dc:creator>anonymous</dc:creator> <pubDate>Tue, 04 Aug 2009 20:26:54 GMT</pubDate> <title>owner changed https://svn.boost.org/trac10/ticket/3306#comment:2 <ul> <li><strong>owner</strong> changed from <span class="trac-author">Douglas Gregor</span> to <span class="trac-author">Matthias Troyer</span> </li> </ul> Ticket
anonymous Tue, 04 Aug 2009 20:33:41 GMT <link>https://svn.boost.org/trac10/ticket/3306#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3306#comment:3</guid> <description> <p> The BOOST MPI library was creating an invalid MPI datatype when trying to serialize MPI datatypes with arrays. This caused a segfault when using the shared memory channel (shm channel) in MPICH2. </p> <p> The test code would fail when using the shared memory channel of MPICH2 on Windows (mpiexec -n 2 -channel shm foo.exe). </p> <p> Regards, Jayesh </p> </description> <category>Ticket</category> </item>
<item> <dc:creator>Matthias Troyer</dc:creator> <pubDate>Tue, 04 Aug 2009 23:12:28 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/3306#comment:4 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> Looking into this, I believe that a patch I submitted for Boost.Serialization yesterday should already have fixed the issue and actually made your fix invalid. The size of fixed-length arrays was not correctly marked as part of the skeleton, but now it is. Could you please check whether this solves your problem? </p> Ticket
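For readers following the quick fix from the ticket description, here is a minimal sketch of the bookkeeping it relies on. It keeps three parallel vectors named after the members the patch touches (addresses, types, lengths). Everything else is hypothetical and is not Boost.MPI code: `layout_recorder`, `save_array_quick_fix`, `foo_like`, `record_foo_like`, the string type tags, and the `-1` placeholder offset are stand-ins, and no MPI calls are made. The assumption that the array count arrives as its own entry just before the array data is inferred from the patch comment ("We don't need the array size. Pop it !"), not from the Boost sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the datatype builder; this is NOT Boost.MPI code.
struct layout_recorder {
    std::vector<std::ptrdiff_t> addresses; // member offsets (-1 marks a non-member)
    std::vector<std::string>    types;     // string tags standing in for MPI_Datatype
    std::vector<int>            lengths;   // number of elements per entry

    // Append one (offset, type, length) triple, keeping the vectors parallel.
    void save_impl(std::ptrdiff_t offset, const std::string& type, int length) {
        addresses.push_back(offset);
        types.push_back(type);
        lengths.push_back(length);
    }

    // Mirrors the shape of the quick fix: drop the most recent entry
    // (assumed to be the array count), then record the array as one entry.
    void save_array_quick_fix(std::ptrdiff_t offset, const std::string& type, int count) {
        assert(!addresses.empty() && !types.empty() && !lengths.empty());
        addresses.pop_back();
        types.pop_back();
        lengths.pop_back();
        if (count > 0)
            save_impl(offset, type, count);
    }
};

// Simplified analogue of the ticket's TFoo, with public members so offsetof applies.
struct foo_like {
    int a;
    double d;
    float f;
    int arr[5];
};

// Record the layout of foo_like the way the sketch assumes serialization would.
layout_recorder record_foo_like() {
    layout_recorder r;
    r.save_impl(static_cast<std::ptrdiff_t>(offsetof(foo_like, a)), "int", 1);
    r.save_impl(static_cast<std::ptrdiff_t>(offsetof(foo_like, d)), "double", 1);
    r.save_impl(static_cast<std::ptrdiff_t>(offsetof(foo_like, f)), "float", 1);
    // Assumption: the array count shows up as its own entry before the data.
    r.save_impl(-1, "array-count", 1);
    r.save_array_quick_fix(static_cast<std::ptrdiff_t>(offsetof(foo_like, arr)), "int", 5);
    return r;
}
```

Calling `record_foo_like()` leaves one entry per member and no separate entry for the array count. As the last comment in the ticket notes, the maintainer expected a Boost.Serialization patch to make this pop unnecessary, so the sketch only illustrates the workaround, not the final behaviour.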