Opened 13 years ago
Closed 13 years ago
#3306 closed Bugs (fixed)
Bug in serializing MPI datatypes/classes with arrays
Reported by: | | Owned by: | Matthias Troyer |
---|---|---|---|
Milestone: | Boost 1.40.0 | Component: | mpi |
Version: | Boost 1.39.0 | Severity: | Problem |
Keywords: | | Cc: | |
Description
Hi,
We received a bug report from a user running Boost.MPI with MPICH2 on Windows. On further investigation, the bug appears to be in the Boost.MPI serialization code: version 1.39.0 does not handle serialization of arrays (for MPI datatypes) correctly. Programs that serialize arrays need the following change to the Boost header file "boost\mpi\detail\mpi_datatype_primitive.hpp" to behave correctly. The fix below is a *quick-fix*; you can integrate the patch as is or fix the bug *correctly*. To get my test case working, I replaced the save_array() member function (lines 61-66 of mpi_datatype_primitive.hpp in 1.39.0) with the code below:
    // fast saving of arrays of MPI types
    template<class T>
    void save_array(serialization::array<T> const& x, unsigned int /* version */)
    {
        BOOST_ASSERT (addresses.size() > 0);
        BOOST_ASSERT (types.size() > 0);
        BOOST_ASSERT (lengths.size() > 0);
        // We don't need the array size. Pop it !
        addresses.pop_back();
        types.pop_back();
        lengths.pop_back();
        if (x.count())
            save_impl(x.address(), boost::mpi::get_mpi_datatype(*x.address()), x.count());
    }
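The effect of the three pop_back() calls can be illustrated with a small stand-alone toy model. This is only a sketch and not Boost.MPI code: the member names addresses, types and lengths mirror the patch, while ToyRecorder, TypeTag, record() and save_fixed_array() are hypothetical names invented for illustration. It assumes, based on the comment in the patch, that the array size is recorded as its own entry before the array elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: this is NOT Boost.MPI code. The member names addresses,
// types and lengths mirror the patch in the ticket; ToyRecorder, TypeTag,
// record() and save_fixed_array() are hypothetical names for illustration.
enum class TypeTag { Size, Int };

struct ToyRecorder {
    std::vector<const void*> addresses;
    std::vector<TypeTag> types;
    std::vector<int> lengths;

    // Append one (address, type, length) entry to the bookkeeping.
    void record(const void* address, TypeTag type, int length) {
        addresses.push_back(address);
        types.push_back(type);
        lengths.push_back(length);
    }

    // Assumption: the element count is recorded as its own entry before the
    // elements. With pop_size_entry == true that entry is dropped again,
    // mirroring the pop_back() calls in the quick-fix.
    void save_fixed_array(const int* data, std::size_t count, bool pop_size_entry) {
        record(&count, TypeTag::Size, 1);  // the address is stored, never dereferenced
        if (pop_size_entry) {
            assert(!addresses.empty() && !types.empty() && !lengths.empty());
            addresses.pop_back();
            types.pop_back();
            lengths.pop_back();
        }
        if (count > 0) {
            record(data, TypeTag::Int, static_cast<int>(count));
        }
    }
};
```

In this model, calling save_fixed_array(arr, 5, false) on a fresh ToyRecorder leaves two entries (the size and the elements), while save_fixed_array(arr, 5, true) leaves only the entry that describes the five elements.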
I used the code below for my testing (run it with 1 MPI process: mpiexec -n 1 foo.exe):
    #include <iostream>
    #include <boost/mpi/environment.hpp>
    #include <boost/mpi/communicator.hpp>
    #include <boost/serialization/array.hpp>
    #include <boost/serialization/base_object.hpp>
    #include <boost/serialization/utility.hpp>

    using namespace std;
    namespace mpi = boost::mpi;

    class TFoo {
    private:
        friend class boost::serialization::access;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int version)
        {
            ar & a_;
            ar & d_;
            ar & f_;
            ar & foo_arr;
        }
        int a_;
        double d_;
        float f_;
        int foo_arr[5];
    public:
        TFoo(): a_(0), d_(0.0),f_(0.0){
            for(int i=0; i<5; i++){
                foo_arr[i] = i;
            }
        }
        TFoo(int a, double d, float f): a_(a),d_(d),f_(f){
            for(int i=0; i<5; i++){
                foo_arr[i] = i;
            }
        }
        TFoo(const TFoo &foo){
            a_ = foo.a_;
            d_ = foo.d_;
            f_ = foo.f_;
            for(int i=0; i<5; i++){
                foo_arr[i] = foo.foo_arr[i];
            }
        }
        void print(void ){
            std::cout << a_ << " , " << d_ << " , " << f_ << endl;
            for(int i=0; i<5; i++){
                std::cout << foo_arr[i] << " , ";
            }
            std::cout << std::endl;
        }
        void printAddresses(void){
            std::cout << &a_ << " , " << &d_ << " , " << &f_ << endl;
        }
    };

    BOOST_IS_MPI_DATATYPE(TFoo)

    class TBar {
    private:
        friend class boost::serialization::access;
        int h_[2];
        TFoo q_[2];
        template<class Archive>
        void serialize(Archive & ar, const unsigned int version)
        {
            ar & h_;
            ar & q_;
        }
    public:
        TBar(){
            h_[0] = 0;
            h_[1] = 0;
        }
        TBar(int h_0, int h_1, TFoo q_0, TFoo q_1){
            h_[0] = h_0;
            h_[1] = h_1;
            q_[0] = q_0;
            q_[1] = q_1;
        }
        void print(void ){
            cout << h_[0] << " , " << h_[1] << endl;
            q_[0].print();
            q_[1].print();
        }
        void printAddresses(void ){
            std::cout << " &h_ = " << &h_ << " &q_ = " << &q_ << std::endl;
            std::cout << &h_[0] << " , " << &h_[1] << std::endl;
            q_[0].printAddresses();
            q_[1].printAddresses();
        }
    };

    BOOST_IS_MPI_DATATYPE(TBar)

    int main(int argc, char* argv[])
    {
        int i=0;
        mpi::environment env(argc, argv);
        mpi::communicator world;
        std::cout << "I am process " << world.rank() << " of " << world.size() << "." << std::endl;
        TFoo foo(1234, 3.14, 3.14f);
        TFoo foo_next(5678, 4.12, 4.12f);
        TBar bar(1234, 1234, foo, foo_next);
        try{
            if (world.rank() == 0) {
                TFoo foo1;
                TBar bar1;
                world.isend(0, 0, foo);//AV here!!!
                world.isend(0, 0, bar);//AV here!!!
                world.recv(0, 0, foo1);
                world.recv(0, 0, bar1);
                std::cout << "FOO ======================" << std::endl;
                foo1.print();
                std::cout << "BAR ======================" << std::endl;
                bar1.print();
            }
        }catch(mpi::exception &exception){
            cout << "Error :" << exception.what();
        }
        return 0;
    }
We recommended that the user contact the Boost.MPI developers regarding the *correct* fix.
Regards,
Jayesh Krishna MPICH2 team
Change History (4)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Owner: | changed from | to
---|
comment:3 by , 13 years ago
The Boost.MPI library was creating an invalid MPI datatype when serializing MPI datatypes that contain arrays. This caused a segfault when using the shared memory channel (shm channel) in MPICH2.
The test code failed when using the shared memory channel of MPICH2 on Windows (mpiexec -n 2 -channel shm foo.exe).
Regards, Jayesh
comment:4 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Looking into this, I believe that a patch I submitted for Boost.Serialization yesterday should already have fixed the issue and actually made your fix invalid. The size of fixed-length arrays was not correctly marked as part of the skeleton, but now is. Could you please check whether this solves your problem?
What was the original error?