Opened 14 years ago

Closed 14 years ago

#2220 closed Bugs (invalid)

Stack overflow in get_mpi_datatype() for struct declared as primitive_type

Reported by: Evgeny
Owned by: Matthias Troyer
Milestone: Boost 1.36.0
Component: mpi
Version: Boost 1.36.0
Severity: Problem
Keywords:
Cc:

Description

struct MyStruct { int dummy; };

BOOST_IS_MPI_DATATYPE(MyStruct);
BOOST_CLASS_IMPLEMENTATION(MyStruct, primitive_type);
...
get_mpi_datatype<MyStruct>() produces a stack overflow.

Call stack:

get_mpi_datatype<MyStruct>();
mpi_datatype_map.datatype<MyStruct>(...);
mpi_datatype_oarchive::mpi_datatype_oarchive(const T& x);
*this << x;

This serialization in turn calls get_mpi_datatype().

Possible solution: use a different mpi_datatype_oarchive constructor that does no serialization, so that only MPI_Type_struct() is called.

Change History (6)

comment:1 by Douglas Gregor, 14 years ago

Owner: changed from Douglas Gregor to Matthias Troyer

Matthias, could you take a look at this?

comment:2 by Matthias Troyer, 14 years ago

The problem is that the serialization mechanism is not used for primitive types. This struct will actually not serialize with a text or XML-based archive either. Why is it declared primitive?

To make it work you have to overload get_mpi_datatype for this "primitive" type.
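The cycle in the reported call stack, and why a dedicated overload breaks it, can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyDatatype`, `toy_get_datatype`, `toy_build_by_serializing`, `Unhandled`, and the depth counter are hypothetical names invented for this sketch and are not part of Boost.MPI or MPI. The sketch uses no Boost headers at all.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Toy stand-in for an MPI datatype handle (hypothetical, not MPI_Datatype).
struct ToyDatatype { std::size_t size; };

struct MyStruct { int dummy; };
struct Unhandled { int dummy; };  // same layout, but no dedicated overload

// Dedicated overload for MyStruct: describes the type directly (here only by
// its size) and never enters the generic, serialization-based path.
inline ToyDatatype toy_get_datatype(const MyStruct&) {
    return ToyDatatype{sizeof(MyStruct)};
}

// Depth counter so the demo throws instead of exhausting the real stack.
static int toy_depth = 0;

template <typename T>
ToyDatatype toy_get_datatype(const T& x);  // generic path, defined below

// Models the archive: a "primitive" value has no members to visit, so the
// archive asks for the datatype of the value itself.
template <typename T>
ToyDatatype toy_build_by_serializing(const T& x) {
    return toy_get_datatype(x);
}

// Generic path: builds the datatype by "serializing" the value.
template <typename T>
ToyDatatype toy_get_datatype(const T& x) {
    if (++toy_depth > 100) {
        toy_depth = 0;
        throw std::runtime_error("unbounded recursion");
    }
    return toy_build_by_serializing(x);
}

// Returns true if the generic path recurses without terminating.
inline bool toy_generic_path_recurses() {
    try {
        toy_get_datatype(Unhandled{0});
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// Runs both cases: the unhandled type recurses, the overloaded one returns.
inline void toy_demo() {
    assert(toy_generic_path_recurses());
    assert(toy_get_datatype(MyStruct{7}).size == sizeof(MyStruct));
}
```

In the toy model the non-template overload wins overload resolution for `MyStruct`, so the generic path is never entered for that type. A real overload would have to return an actual MPI datatype; how to construct it is not shown here.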

comment:3 by anonymous, 14 years ago (in reply to comment:2)

If this behavior is by design, then sorry for this bug ticket.

My intention is to transport binary structures on a platform-dependent binary basis, without any member-wise serialization. This should lower CPU utilization and thus increase transfer rates. As far as I know, MPI itself offers this feature, so I expected the same behavior from the Boost wrapper.

Specifically, by declaring structures as primitive_type I expected to avoid the serialization procedure altogether (and indeed, after such a declaration, serialize() is no longer needed at compile time). But the implementation fails with a stack overflow.

Can you advise?

comment:4 by Matthias Troyer, 14 years ago

This is not supported since it might fail on heterogeneous machines. We actually have an undocumented feature doing just what you want. You need to declare your type as bitwise serializable and specify at build time that you build for a homogeneous machine. Then at least arrays of this type are transmitted without serialization. We could add further optimizations for single instances.
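What "transmitted without serialization" means for an array can be sketched in plain C++, without Boost or MPI. The names `pack_bitwise`, `unpack_bitwise`, and `bitwise_demo` are hypothetical and exist only in this sketch; `std::is_trivially_copyable` is used here as a stand-in for the "bitwise serializable" property and is an assumption, not the Boost trait itself. The copy is only meaningful when sender and receiver share the same object layout, which is the homogeneous-machine condition mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct MyStruct { int dummy; };

// Copies the raw object representation of an array into a byte buffer.
// No member-wise serialization takes place.
template <typename T>
std::vector<unsigned char> pack_bitwise(const T* data, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a bitwise copy is only valid for trivially copyable types");
    std::vector<unsigned char> bytes(count * sizeof(T));
    if (count != 0) {
        std::memcpy(bytes.data(), data, bytes.size());
    }
    return bytes;
}

// Rebuilds the array from the byte buffer. Assumes the buffer was produced
// on a machine with the same layout (size, padding, endianness).
template <typename T>
std::vector<T> unpack_bitwise(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a bitwise copy is only valid for trivially copyable types");
    std::vector<T> out(bytes.size() / sizeof(T));
    if (!out.empty()) {
        std::memcpy(out.data(), bytes.data(), out.size() * sizeof(T));
    }
    return out;
}

// Round trip of a small array through the byte buffer.
inline void bitwise_demo() {
    const MyStruct in[3] = {{1}, {2}, {3}};
    const std::vector<unsigned char> bytes = pack_bitwise(in, 3);
    assert(bytes.size() == 3 * sizeof(MyStruct));
    const std::vector<MyStruct> out = unpack_bitwise<MyStruct>(bytes);
    assert(out.size() == 3);
    assert(out[0].dummy == 1 && out[1].dummy == 2 && out[2].dummy == 3);
}
```

On a heterogeneous cluster the same bytes could be misread by the receiver, which is the failure mode given above as the reason this is not supported by default.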

comment:5 by Evgeny, 14 years ago

Thank you. I think that an option to transfer without serialization could be extremely valuable in homogeneous environments. So it may be a good idea to add a corresponding serialization level (since primitive_type has different semantics).

I will try to build the way you advised.

comment:6 by Matthias Troyer, 14 years ago

Resolution: invalid
Status: new → closed