Opened 13 years ago

Closed 13 years ago

#3306 closed Bugs (fixed)

Bug in serializing MPI datatypes/classes with arrays

Reported by: jayesh@…
Owned by: Matthias Troyer
Milestone: Boost 1.40.0
Component: mpi
Version: Boost 1.39.0
Severity: Problem
Keywords:
Cc:

Description

Hi,

We received a bug report from a user running Boost.MPI with MPICH2 on Windows. On further investigation, the bug appears to be in the Boost.MPI serialization code: version 1.39.0 does not correctly handle serialization of arrays (for MPI datatypes). Programs that serialize arrays need the following change to the Boost header file "boost\mpi\detail\mpi_datatype_primitive.hpp" to behave correctly. The fix below is a *quick-fix*; you can integrate this patch or fix the bug *correctly*. To get my test case working, I replaced the save_array() member function (lines 61-66 of mpi_datatype_primitive.hpp in 1.39.0) with the code below:

    // fast saving of arrays of MPI types
    template<class T>
    void save_array(serialization::array<T> const& x, unsigned int /* version */)
    {
      BOOST_ASSERT (addresses.size() > 0);
      BOOST_ASSERT (types.size() > 0);
      BOOST_ASSERT (lengths.size() > 0);
      // We don't need the array size. Pop it !
      addresses.pop_back();
      types.pop_back();
      lengths.pop_back();

      if (x.count())
        save_impl(x.address(), boost::mpi::get_mpi_datatype(*x.address()), x.count());
    }
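
To illustrate why the extra entry matters, here is a minimal, self-contained sketch. It is not Boost code: `ToyCollector`, `record`, `save_fixed_array`, and `entries_for_array` are hypothetical names, and the premise that a fixed-size array is saved as a count followed by its elements is an assumption inferred from the "Pop it" comment in the patch above. The three parallel vectors mirror the `addresses`, `types`, and `lengths` members that the patch pops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the datatype builder: one entry per recorded field.
struct ToyCollector {
    std::vector<const void*> addresses;
    std::vector<const char*> types;    // type names instead of MPI_Datatype
    std::vector<std::size_t> lengths;

    void record(const void* p, const char* type, std::size_t n) {
        addresses.push_back(p);
        types.push_back(type);
        lengths.push_back(n);
    }

    // Assumed behaviour: the array is saved as "count, then elements".
    // The count is a local variable, so the address recorded for it does
    // not belong to the object being described.
    template <std::size_t N>
    void save_fixed_array(const int (&a)[N], bool quick_fix) {
        std::size_t count = N;
        record(&count, "size", 1);
        if (quick_fix) {
            // Same idea as the patch: drop the entry for the size.
            assert(!addresses.empty());
            addresses.pop_back();
            types.pop_back();
            lengths.pop_back();
        }
        record(a, "int", N);
    }
};

// Number of entries recorded for a single 5-element array.
inline std::size_t entries_for_array(bool quick_fix) {
    int arr[5] = {0, 1, 2, 3, 4};
    ToyCollector c;
    c.save_fixed_array(arr, quick_fix);
    return c.addresses.size();
}
```

In this model, `entries_for_array(false)` returns 2 (the size plus the array) and `entries_for_array(true)` returns 1 (the array only). The pop is only safe when the entry immediately before the array really is the size, which is why the patch is described as a quick-fix.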

I used the code below for my testing (run it with one MPI process: mpiexec -n 1 foo.exe):

 #include <iostream>
 #include <boost/mpi/environment.hpp>
 #include <boost/mpi/communicator.hpp>

 #include <boost/serialization/array.hpp>
 #include <boost/serialization/base_object.hpp>
 #include <boost/serialization/utility.hpp>
 
 using namespace std;
 namespace mpi = boost::mpi;

 class TFoo
 {
 private:
     friend class boost::serialization::access;

     template<class Archive>
     void serialize(Archive & ar, const unsigned int version)
     {
         ar & a_;
         ar & d_;
         ar & f_;
         ar & foo_arr;
     }

     int      a_;
     double   d_;
     float    f_;
     int      foo_arr[5];
 public:
     TFoo(): a_(0), d_(0.0),f_(0.0){
         for(int i=0; i<5; i++){
             foo_arr[i] = i;
         }
     }
     TFoo(int a, double d, float f): a_(a),d_(d),f_(f){
         for(int i=0; i<5; i++){
             foo_arr[i] = i;
         }
     }
     TFoo(const TFoo &foo){
         a_ = foo.a_;
         d_ = foo.d_;
         f_ = foo.f_;
         for(int i=0; i<5; i++){
             foo_arr[i] = foo.foo_arr[i];
         }
     }
     void print(void ){
         std::cout << a_ << " , " << d_ << " , " << f_ << endl;
         for(int i=0; i<5; i++){
             std::cout << foo_arr[i] << " , ";
         }
         std::cout << std::endl;
     }
     void printAddresses(void){
         std::cout << &a_ << " , " << &d_ << " , " << &f_ << endl;
     }
 };

 BOOST_IS_MPI_DATATYPE(TFoo)

 class TBar
 {
 private:
     friend class boost::serialization::access;

     int  h_[2];
     TFoo q_[2];

     template<class Archive>
     void serialize(Archive & ar, const unsigned int version)
     {
         ar & h_;
         ar & q_;
     }
 public:
     TBar(){
         h_[0] = 0; h_[1] = 0;
     }
     TBar(int h_0, int h_1, TFoo q_0, TFoo q_1){
         h_[0] = h_0;
         h_[1] = h_1;
         q_[0] = q_0;
         q_[1] = q_1;
     }
     void print(void ){
         cout << h_[0] << " , " << h_[1] << endl;
         q_[0].print(); q_[1].print();
     }
     void printAddresses(void ){
         std::cout << " &h_ = " << &h_ << " &q_ = " << &q_  << std::endl;
         std::cout << &h_[0] << " , " << &h_[1] << std::endl;
         q_[0].printAddresses(); q_[1].printAddresses();
     }
 };

 BOOST_IS_MPI_DATATYPE(TBar)

 int main(int argc, char* argv[])
 {
    int i=0;
    mpi::environment env(argc, argv);
    mpi::communicator world;
    std::cout << "I am process " << world.rank() << " of " << world.size()
              << "." << std::endl;

    TFoo foo(1234, 3.14, 3.14f);
    TFoo foo_next(5678, 4.12, 4.12f);
    TBar bar(1234, 1234, foo, foo_next);

    try{
        if (world.rank() == 0)
        {
            TFoo foo1;
            TBar bar1;

            world.isend(0, 0, foo);//AV here!!!
            world.isend(0, 0, bar);//AV here!!!

            world.recv(0, 0, foo1);
            world.recv(0, 0, bar1);
            std::cout << "FOO ======================" << std::endl;
            foo1.print();
            std::cout << "BAR ======================" << std::endl;
            bar1.print();
        }
    }catch(mpi::exception &exception){
        cout << "Error :" << exception.what();
    }

    return 0;
 }

We recommended that the user contact the Boost.MPI developers regarding the *correct* fix.

Regards,

Jayesh Krishna, MPICH2 team

Change History (4)

comment:1 by anonymous, 13 years ago

What was the original error?

comment:2 by anonymous, 13 years ago

Owner: changed from Douglas Gregor to Matthias Troyer

comment:3 by anonymous, 13 years ago

The Boost.MPI library was creating an invalid MPI datatype when serializing MPI datatypes that contain arrays. This caused a segfault when using the shared memory channel (shm channel) in MPICH2.

The test code fails when run with the shared memory channel of MPICH2 on Windows (mpiexec -n 2 -channel shm foo.exe).

Regards, Jayesh

comment:4 by Matthias Troyer, 13 years ago

Resolution: fixed
Status: new → closed

Looking into this, I believe that a patch I submitted for Boost.Serialization yesterday should already have fixed the issue, and it actually makes your fix invalid. The size of fixed-length arrays was not correctly marked as part of the skeleton, but now it is. Could you please check whether this solves your problem?
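
As a rough illustration of what "marked as part of the skeleton" could mean, the sketch below gives the array size its own type so that a datatype-building archive can skip it by overload instead of popping it afterwards. This is a toy model under stated assumptions, not the Boost.Serialization patch itself: `ToySize`, `ToyDatatypeArchive`, `save`, `save_ints`, `save_fixed_array`, and `toy_entries` are all hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical strong type that marks a value as structural (an array size).
struct ToySize {
    std::size_t value;
};

struct ToyDatatypeArchive {
    std::vector<const void*> addresses;
    std::vector<std::size_t> lengths;

    // Structural information: the datatype builder records nothing for it.
    void save(const ToySize&) {}

    // Content: record where the elements live and how many there are.
    void save_ints(const int* p, std::size_t n) {
        addresses.push_back(p);
        lengths.push_back(n);
    }

    // Assumed order: the size is saved first, then the elements.
    template <std::size_t N>
    void save_fixed_array(const int (&a)[N]) {
        ToySize count{N};
        save(count);       // skipped because of its type
        save_ints(a, N);   // recorded
    }
};

// Number of entries recorded for a single 5-element array.
inline std::size_t toy_entries() {
    int arr[5] = {0, 1, 2, 3, 4};
    ToyDatatypeArchive ar;
    ar.save_fixed_array(arr);
    return ar.addresses.size();
}
```

Here `toy_entries()` returns 1, because only the array elements are recorded. In a model like this, popping the last entry before saving the array would remove a legitimate entry, which is consistent with the remark that the earlier quick-fix becomes invalid.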
