Opened 11 years ago

Last modified 10 years ago

#5849 new Bugs

all_to_all() passes invalid args to allocate when sending lots of data

Reported by: Kelly Davis <kdavis@…> Owned by: Matthias Troyer
Milestone: To Be Determined Component: mpi
Version: Boost 1.47.0 Severity: Problem
Keywords: Cc: kdavis@…

Description

For boost 1.47.0 code of the following form fails...

boost::mpi::communicator communicator;
std::vector< std::list<Thing*> > in_values(communicator.size());
std::vector< std::list<Thing*> > out_values(communicator.size());
// Point the lists in in_values to more than numeric_limits<int>::max() bytes of data
boost::mpi::all_to_all(communicator,in_values,out_values);

Such code ends up passing the allocator, using openmpi-1.4.3, an invalid argument.

Looking at all_to_all.hpp and the method...

template<typename T>
void
all_to_all_impl(const communicator& comm, const T* in_values, int n,
                T* out_values, mpl::false_)
{
    ...
}

I think I see why this happens. The method begins as follows...

  std::vector<int> send_sizes(size);

  // The displacements for each outgoing value.
  std::vector<int> send_disps(size);

  // The buffer that will store all of the outgoing values
  std::vector<char, allocator<char> > outgoing;

  // Pack the buffer with all of the outgoing values.
  for (int dest = 0; dest < size; ++dest) {
    // Keep track of the displacements
    send_disps[dest] = outgoing.size();

    // Our own value will never be transmitted, so don't pack it.
    if (dest != rank) {
      packed_oarchive oa(comm, outgoing);
      for (int i = 0; i < n; ++i)
        oa << in_values[dest * n + i];
    }

    // Keep track of the sizes
    send_sizes[dest] = outgoing.size() - send_disps[dest];
  }

If outgoing.size() or (outgoing.size() - send_disps[dest]) is greater than numeric_limits<int>::max(), then send_disps[dest] or send_sizes[dest] flips over to negative values or is simply junk. After that happens the rest of the code in all_to_all_impl() doesn't really make any sense.

My guess is that instead of using int here...

std::vector<int> send_sizes(size);
std::vector<int> send_disps(size);
std::vector<int> recv_sizes(size);
...

you should use allocator::size_type.

Change History (1)

comment:1 by Matthias Troyer, 10 years ago

Owner: changed from Douglas Gregor to Matthias Troyer
Note: See TracTickets for help on using tickets.