Opened 11 years ago
Last modified 9 years ago
#6704 new Bugs
Boost version of mpi_in_place
Reported by: | Owned by: | Matthias Troyer | |
---|---|---|---|
Milestone: | To Be Determined | Component: | mpi |
Version: | Boost 1.48.0 | Severity: | Problem |
Keywords: | Cc: |
Description
In some mpi collective operations, like all_reduce, that have an input and output bufer of the same size, it is possible to specify the same buffer for both input and output by using the MPI_IN_PLACE constant as a sendbuffer (see above link).
That feature does not seems to be supported through boost::mpi.
This can be illustrated through the attached program:
alainm@vai:~/code/boost/mpibug$ mpiexec -np 2 ./a.out This is P0 good This is P1 good alainm@vai:~/code/boost/mpibug$
alainm@vai:~/code/boost/mpibug$ mpiexec -np 2 ./a.out bug terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >' what(): MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer [vai:03464] *** Process received signal *** terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >' what(): MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer [vai:03465] *** Process received signal *** [vai:03465] Signal: Aborted (6) [vai:03465] Signal code: (-6) [vai:03464] Signal: Aborted (6) [vai:03464] Signal code: (-6) [vai:03465] [ 0] [0x4002040c] [vai:03465] *** End of error message *** [vai:03464] [ 0] [0x4002040c] [vai:03464] *** End of error message *** -------------------------------------------------------------------------- mpiexec noticed that process rank 1 with PID 3465 on node vai exited on signal 6 (Aborted). -------------------------------------------------------------------------- 2 total processes killed (some possibly by mpiexec during cleanup) alainm@vai:~/code/boost/mpibug$
Attachments (6)
Change History (27)
by , 11 years ago
Attachment: | allreduce.cpp added |
---|
comment:1 by , 11 years ago
One solution could be to transparently test for equality of the two buffers and silently pass MPI_IN_PLACE to MPI_Allreduce, another could to provide a boost mpi_in_place object with automatic conversion to T*. The first one is probably a no-brainner for user, but then, something explicit in MPI is made implicit in boost mpi. I can provide the patch, other operation could be concerned.
Regards,
comment:2 by , 11 years ago
Another issue with th e implicit solution is that the input buffer is not left unchanged, while it is documented as such. So I guess a acceptable solution would be to simply provide a global:
namespace boost{ namespace mpi { template<typename T> T const* mpi_in_place(T const* p = 0) { return static_cast<T const*>(MPI_IN_PLACE); } } }
function ?
by , 11 years ago
Attachment: | mpi_in_place_tck6704_rev77427.patch added |
---|
A proposed patch for the missing (?) feature)
comment:3 by , 11 years ago
The attached path propose a fix for this issue. It add the 'missing' all_reduce prototypes with only one input/output buffer. When appropriate, it is implemented by passing MPI_IN_PLACE to the existing boost implementations as a sendbuffer. When it is not possible (user types and user op), it re-introduces the missing buffer and uses a reduce/broadcast combination. The all_reduce test have been updated to check for in place and out of place array all_reduce operations.
Please let me know if there is a problem wit this patch.
Regards
comment:4 by , 11 years ago
Note that since the new prototype are just an override of existing prototype with only one parameter, the existing documentation probably still holds.
follow-up: 6 comment:5 by , 10 years ago
I think the nicest would be a version that just takes a single inout value.
comment:6 by , 10 years ago
Replying to troyer:
I think the nicest would be a version that just takes a single inout value.
Agreed, but should we do something about the situation where the user explicitly provides the MPI_IN_PLACE argument ? Provided we let the code fail at runtime, should we throw an exception ?
comment:7 by , 10 years ago
Why would the user want to provide it explicitly if we give a nicer interface to all_reduce that takes just a single inout parameter?
comment:8 by , 10 years ago
Maybe someone coming from the C api.... but agreed, it's (probably too) far fetched.
I will go for the one inout param along with a clarification in the documentation then.
follow-up: 13 comment:9 by , 10 years ago
One more question:
Right now, in the attached patch, we have
template<typename T, typename Op> void all_reduce(const communicator & comm, const T & in_value, T & out_value, Op op); template<typename T, typename Op> T all_reduce(const communicator & comm, const T & in_value, Op op); template<typename T, typename Op> void all_reduce(const communicator & comm, const T * in_values, int n, T * out_values, Op op);
plus
template<typename T, typename Op> void all_reduce(const communicator & comm, T * inout_values, int n, Op op);
but we do not have:
template<typename T, typename Op> void all_reduce(const communicator & comm, T & in_value, Op op);
that's based on the assumption that there isn't much usage for it and that it would cause confusing overload issues.
I am still trying to find out how to modify the documentation :-)
comment:10 by , 10 years ago
So, the attachment mpi_all_reduce_in_place.patch should implement the proposed modification. It include a inout version of the function for arrays, a modification of the doxygen documentation (what I was unable to test because I have issues with the doc generation) and updated test suite to exercise the feature.
Here is a summary of the context in which the patch was produced:
[alainm@login02 trunk]$ svn up At revision 80953. [alainm@login02 trunk]$ svn diff > mpi_all_reduce_in_place.patch [alainm@login02 trunk]$ svn status -q M boost/mpi/collectives.hpp M boost/mpi/collectives/all_reduce.hpp M libs/mpi/test/all_reduce_test.cpp M libs/mpi/doc/mpi.qbk [alainm@login02 trunk]$ svn info . Path: . URL: http://svn.boost.org/svn/boost/trunk Repository Root: http://svn.boost.org/svn/boost Repository UUID: b8fc166d-592f-0410-95f2-cb63ce0dd405 Revision: 80953 Node Kind: directory Schedule: normal Last Changed Author: viboes Last Changed Rev: 80950 Last Changed Date: 2012-10-11 01:24:24 +0200 (Thu, 11 Oct 2012) [alainm@login02 trunk]$
Pleae let me know if it's ok.
Thanks!
comment:12 by , 10 years ago
Hi,
If something more need to be done before it can be commited, please let me know.
Regards
comment:13 by , 10 years ago
Replying to anonymous:
One more question:
Right now, in the attached patch, we have
template<typename T, typename Op> void all_reduce(const communicator & comm, const T & in_value, T & out_value, Op op); template<typename T, typename Op> T all_reduce(const communicator & comm, const T & in_value, Op op); template<typename T, typename Op> void all_reduce(const communicator & comm, const T * in_values, int n, T * out_values, Op op);plus
template<typename T, typename Op> void all_reduce(const communicator & comm, T * inout_values, int n, Op op);but we do not have:
template<typename T, typename Op> void all_reduce(const communicator & comm, T & in_value, Op op);that's based on the assumption that there isn't much usage for it and that it would cause confusing overload issues.
After discussing this I find that actually an in-place version that does not return a copy makes more sense for in-place semantics, and the call might be as
template<typename T, typename Op> void all_reduce(const communicator & comm, mpi::inplace_t<T> in_value, Op op);
where mpi::inplace_t<T> wrapper might be created by an mpi::inplace function. Thus by using mpi::inplace around the one argument we don't return anything. I would not use a two-parameter version where one parameter is replaced by an mpi::inplace placeholder like the C API does. We should go for a nicer C++ API and not try to just copy the C API as the deprecated official MPI C++ API did.
comment:14 by , 10 years ago
Owner: | changed from | to
---|
by , 10 years ago
Attachment: | inplace6704-r82503.patch added |
---|
patch with explicit inplace version of all reduce
comment:15 by , 10 years ago
in inplace6704-r82503.patch, you will find a patch with an explicit in place version of all_reduce. Both pointer and non pointer version requires the inplace wrapper.
A new file has been added, I did put my name in the copyright (I do not know the usage, feel free to modify).
comment:17 by , 9 years ago
Can you please add documentation for this and a regression test? I will then commit it.
by , 9 years ago
Attachment: | mpi_all_red-r85156-6704.patch added |
---|
patch with tests and documentation
comment:18 by , 9 years ago
In the last attachment mpi_all_red-r85156-6704.patch, the doc has been updated (mostly the doxigen documentation) the regression test was already in [source:/trunk/libs/mpi/test/all_reduce_test.cpp] (I followed existing practice, that file now also test the new overload).
A typo has been fixed in the docbook, but it's been otherwise left untouched (since it's more an optimization than a functionnality, I did not think where was anything to add in there (?))
Let me know if it's ok.
Thanks!
comment:19 by , 9 years ago
It seems that the patch has not been applied, is there a pending issue with the fix ?
Best regards
Alain
comment:20 by , 9 years ago
It would be good to explain it in the documentation, so that people know about it. Sorry for not being clear but I've been waiting for that.
comment:21 by , 9 years ago
ticket6704-svn85498.patch.gz contains the same path augmented with new documentation:
- in mpi.qbk, an example has been added to illustrate the usage of all_reduce using the in place and out of place version and the possibility of optimization associated with MPI_IN_PLACE.
- an entry as been added in the C-mapping section.
Let me know if this is what is expected.
As a side note, we might want to add a similar functionality to reduce at some point (?)
Regards
c++ source code