Opened 11 years ago

Closed 9 years ago

#5596 closed Bugs (fixed)

MPI: problem creating communicator

Reported by: irek.szczesniak@… Owned by: Matthias Troyer
Milestone: To Be Determined Component: mpi
Version: Boost 1.42.0 Severity: Problem
Keywords: Cc:

Description

When I create a communicator from a group, the program fully utilizes the CPU and never creates the communicator. I'm attaching a simple example.

Attachments (1)

test.cpp (377 bytes) - added by irek.szczesniak@… 11 years ago.
sample test


Change History (13)

by irek.szczesniak@…, 11 years ago

Attachment: test.cpp added

sample test

comment:1 by anonymous, 11 years ago

Forgot to add that I'm using OpenMPI 1.3 on Debian 6.

comment:2 by dwsel@…, 11 years ago

Hello!

I experience the same issue with OpenMPI 1.4.2 and gcc 4.4.5. I'm still at the very beginning of parallel programming, so I may be talking nonsense from time to time (if so, please correct me). I hope this discussion will attract more experienced users who can give a better answer to this question.

So far I am considering 3 possibilities:

  1. An incompatibility between the iterators returned by v.begin() and v.end() and the input parameters of

template<typename InputIterator>
group include(InputIterator first, InputIterator last);

This is probably not the cause, because for a vector of a given length

std::vector<int> v(x);

g.size() returns a number equal to the vector length, x.

  2. Some copy constructor or pointer issue

This is pretty much a blind guess after reading an article (I had a problem posting the link).

  3. Wrong braces in combination with the process number that performs the group creation

I have to investigate this further by looking inside the specific implementations. I hope to reply again in the next few days.
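
A side note on the first possibility, as a minimal sketch rather than a diagnosis: std::vector<int> v(x) value-initializes its elements, so it holds x zeros, not the ranks 0 to x-1. With x = 1 that is the single rank 0. For larger x the vector would need to be filled first, because the MPI standard requires the ranks passed to MPI_Group_incl to be distinct. The helper name first_ranks below is hypothetical and not part of Boost.MPI.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: returns the distinct ranks 0, 1, ..., count-1.
std::vector<int> first_ranks(int count) {
    std::vector<int> ranks(count);             // value-initialized: all zeros
    std::iota(ranks.begin(), ranks.end(), 0);  // overwrite with 0..count-1
    return ranks;
}
```

The result could then be passed as first_ranks(x).begin()/end() of a named vector to an iterator-based call such as the include template quoted above.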

comment:3 by monika.cienkus@…, 11 years ago

It's strange, but a dynamically created communicator works fine.

#include <vector>
#include <boost/mpi.hpp>
namespace mpi = boost::mpi;
int main(int argc, char argv[])
{
   mpi::environment env(argc, argv);
   mpi::communicator world, c;

   std::vector<int> v(1);
   mpi::group wg = world.group();
   mpi::group g = wg.include(v.begin(), v.end());
   c = new mpi::communicator(world, g);

   if (!world.rank()){
      std::cout << "v.size : " << v.size() << std::endl;
      std::cout << "wg.size : " << wg.size() << std::endl;
      std::cout << "g.size : " << g.size() << std::endl;
      std::cout << "c.size : " << c->size() << std::endl;
}

return 0;
}

comment:4 by monika.cienkus@…, 11 years ago

There should be:

   mpi::communicator world, *c;

in reply to:  3 comment:5 by dwsel <dwsel@…>, 11 years ago

Replying to monika.cienkus@…:

It's strange, but a dynamically created communicator works fine.

#include <vector>
#include <boost/mpi.hpp>
namespace mpi = boost::mpi;
int main(int argc, char argv[])

Hello!

I also noticed that you left out the clause:

if (!world.rank())

That could confirm my 3rd suspicion, but... more about my opinion below

You missed the * in:

int main(int argc, char * argv[])

After that your code compiles well.

Could you show me an example use of the c communicator? I can't get it to work so far. Even something as simple as:

if (c->rank() == 0)

gives me a clone_impl exception in addition to the same pointer exceptions I get by simply dropping the clause:

if (!world.rank())

in the example provided by the author of the ticket. It seems that using a pointer to c only delays the exceptions until the moment it is used!

I think what we are doing here is simply redefining c, but from different threads. That's why I believe dropping the clause will not help at all, because assigning processes to the communicator should be done by a single process.

Please elaborate.

in reply to:  3 comment:6 by irek.szczesniak@…, 11 years ago

I tested your solution and it doesn't resolve the problem. The program still uses 100% of the CPU and doesn't finish. So creating the communicator dynamically doesn't make a difference.

Replying to monika.cienkus@…:

It's strange, but a dynamically created communicator works fine.

#include <vector>
#include <boost/mpi.hpp>
namespace mpi = boost::mpi;
int main(int argc, char argv[])
{
   mpi::environment env(argc, argv);
   mpi::communicator world, c;

   std::vector<int> v(1);
   mpi::group wg = world.group();
   mpi::group g = wg.include(v.begin(), v.end());
   c = new mpi::communicator(world, g);

   if (!world.rank()){
      std::cout << "v.size : " << v.size() << std::endl;
      std::cout << "wg.size : " << wg.size() << std::endl;
      std::cout << "g.size : " << g.size() << std::endl;
      std::cout << "c.size : " << c->size() << std::endl;
}

return 0;
}

comment:7 by tapir2@…, 11 years ago

#include <vector>
#include <boost/mpi.hpp>
namespace mpi = boost::mpi;
int main(int argc, char** argv)
{
	mpi::environment env(argc, argv);
	mpi::communicator world;

	std::vector<int> ranks(1); // {0}
	mpi::group g = world.group(); // getting group from MPI_COMM_WORLD...
	g = g.include(ranks.begin(), ranks.end()); // ...and selecting only one (first) host from it
   
	/* ---------------------------------------------------------------------------------
	// Sample 1: does not work, incorrect use of MPI library calls.
	// MPI_Comm_create (called from an mpi::communicator constructor) is a collective
	// operation, so it must be called on every host of the parent communicator ("world" in this case).
	if (!world.rank()) 
	{
		mpi::communicator myComm(world, g); 
	}
	some_useful_function(); // <- we never reach this point
	*/

	/* ---------------------------------------------------------------------------------
	// Sample 2: still does not work.
	// The condition is removed (but a local variable scope is still used).
	// MPI_Comm_create is now called on every host, but only one host creates a communicator;
	// every other host gets MPI_COMM_NULL.
	{
		mpi::communicator myComm(world, g); 
	} // <- here we get into trouble with MPI_Comm_free(MPI_COMM_NULL),
	  //	because boost::mpi::communicator::comm_free doesn't check for it
	*/


	/* ---------------------------------------------------------------------------------
	// Sample 3: works, but with a restriction:
	// MPI_Finalize must be called manually before the myComm destructor runs.
	//
	*/
	mpi::communicator myComm(world, g); 
	MPI::Finalize();

	return 0;
}

I think the solution to this problem is a small fix to communicator.hpp (I'm using Boost 1.45):

	struct comm_free
	{
		void operator()(MPI_Comm* comm) const
		{
			int finalized;
			BOOST_MPI_CHECK_RESULT(MPI_Finalized, (&finalized));
			if (!finalized && (MPI_Comm)comm != MPI_COMM_NULL) //fix here
				BOOST_MPI_CHECK_RESULT(MPI_Comm_free, (comm));
			delete comm;
		}
	};

P.S. Sorry for my English, it's not my first language :)

in reply to:  7 ; comment:8 by tapir2@…, 11 years ago

Replying to tapir2@…:

a little mistake, sorry... without type casting of course

	if (!finalized && *comm != MPI_COMM_NULL) //fix here
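
The difference between the two versions of the check can be illustrated without MPI. In the sketch below, Handle, kNullHandle, free_handle, HandleFree and make_handle are hypothetical stand-ins (for MPI_Comm, MPI_COMM_NULL, MPI_Comm_free, the comm_free deleter and the shared pointer that owns the handle); none of them is Boost.MPI code. The deleter receives a pointer to the handle, so casting the pointer compares an address with the sentinel, while dereferencing it compares the handle itself.

```cpp
#include <memory>

// Hypothetical stand-ins for MPI_Comm and MPI_COMM_NULL, so the sketch
// compiles without an MPI installation.
using Handle = int;
constexpr Handle kNullHandle = -1;

// Counts how often the stand-in "free" routine runs.
int g_free_calls = 0;
void free_handle(Handle*) { ++g_free_calls; }

// Deleter modelled on the corrected check: dereference the pointer and
// compare the handle itself with the null sentinel before freeing it.
struct HandleFree {
    void operator()(Handle* h) const {
        if (*h != kNullHandle)
            free_handle(h);
        delete h;  // the heap-allocated handle is always released
    }
};

// Wraps a handle value in a shared_ptr that uses the guarded deleter.
std::shared_ptr<Handle> make_handle(Handle value) {
    return std::shared_ptr<Handle>(new Handle(value), HandleFree());
}
```

In this model, a shared pointer holding kNullHandle is destroyed without a call to free_handle, while one holding any other value triggers exactly one call.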

comment:9 by bschaeling, 11 years ago

A new communicator must always be created for all processes. The constructor calls MPI_Comm_create() which must be executed by all processes, even if they don't belong to the new group (see http://www.mpi-forum.org/docs/mpi-11-html/node102.html).

However I can confirm that I also have to call MPI_Finalize() myself (I'm using Boost 1.49.0). This call is only required though for those processes which don't belong to the newly created group. Boost.MPI behaves as if it skips calling MPI_Finalize() in the destructor of boost::mpi::environment for those processes.
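
As a rough model of the rule described above, assuming nothing beyond the MPI_Comm_create semantics: every process of the parent communicator makes the call, processes listed in the group get a communicator in which their rank is their position in the list, and all others get MPI_COMM_NULL. The function names below are hypothetical, and std::nullopt stands in for MPI_COMM_NULL. This is plain C++ (C++17), not an MPI program.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Models the per-process outcome of creating a communicator from a group:
// a rank listed in group_ranks gets its new rank (its position in the list),
// any other rank gets std::nullopt, standing in for MPI_COMM_NULL.
std::optional<int> new_rank_in_group(int world_rank,
                                     const std::vector<int>& group_ranks) {
    auto it = std::find(group_ranks.begin(), group_ranks.end(), world_rank);
    if (it == group_ranks.end())
        return std::nullopt;
    return static_cast<int>(it - group_ranks.begin());
}

// Every rank of the parent communicator makes the call, whether or not it
// belongs to the group; the result records which ranks get a valid communicator.
std::vector<bool> ranks_with_valid_communicator(int world_size,
                                                const std::vector<int>& group_ranks) {
    std::vector<bool> valid;
    for (int rank = 0; rank < world_size; ++rank)
        valid.push_back(new_rank_in_group(rank, group_ranks).has_value());
    return valid;
}
```

With the one-element rank list {0} used in the samples of this ticket and three processes, only rank 0 ends up with a valid communicator in this model; ranks 1 and 2 still have to take part in the call.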

in reply to:  8 comment:10 by irek.szczesniak@…, 10 years ago

Replying to tapir2@…:

Replying to tapir2@…:

a little mistake, sorry... without type casting of course

	if (!finalized && *comm != MPI_COMM_NULL) //fix here

I checked whether this code fixes the problem, and it doesn't.

comment:11 by Matthias Troyer, 10 years ago

Owner: changed from Douglas Gregor to Matthias Troyer

comment:12 by Matthias Troyer, 9 years ago

Resolution: fixed
Status: new → closed

(In [84739]) Fixed #6436 #5596 and added threaded initialization
