Mpirun hangs when mpi send and recieve is put in a loop - c

I was trying to run the given program on a 4 node cluster using mpirun.
Node0 is distributing the data to node 1, 2 and 3.
In the program, computation has to be done for different values of variable 'dir',
ranging from -90 to 90.
So Node0 is distributing the data and collecting the result in a looped fashion(for different values of var 'dir').
When the do {*******}while(dir<=90); loop is given, mpirun hangs, and gets no output.
But when I comment the do {*******}while(dir<=90); loop output is obtained for initialized value of the variable dir,(dir=-90), and that output is correct. The problem occurs when given in loop.
Could anyone please help me solve this issue.
#include "mpi.h"
int main(int argc,char *argv[])
float dir=-90;
int rank,numprocs;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
if(rank==0)
{
do{
/*initializing data*/
for(dest=1;dest<numprocs;dest++)
{
MPI_Send(&offset,1,MPI_INT,dest,FROM_MASTER,MPI_COMM_WORLD);
MPI_Send(&s_psi[offset],count,MPI_FLOAT,dest,FROM_MASTER,MPI_COMM_WORLD);
}
gettimeofday(&start,NULL);
for (dest=1; dest<numprocs; dest++)
{
MPI_Recv(&offset,1,MPI_INT,dest,FROM_WORKER,MPI_COMM_WORLD,&status);
MPI_Recv(&P[offset],count,MPI_FLOAT,dest,FROM_WORKER,MPI_COMM_WORLD,&status);
}
gettimeofday(&end,NULL);
timersub(&end,&start,&total);
printf("time consumed=%ds %dus\n",total.tv_sec,total.tv_usec);
dir++;
}while(dir<=90);
}
if(rank>0)
{
MPI_Recv(&offset,1,MPI_INT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
MPI_Recv(&s_psi[offset],count,MPI_FLOAT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
//Does the computation
}
MPI_Send(&offset,1,MPI_INT,0,FROM_WORKER,MPI_COMM_WORLD);
MPI_Send(&P[offset],count,MPI_FLOAT,0,FROM_WORKER,MPI_COMM_WORLD);
}
MPI_Finalize();
return 0;
}

The part where rank > 0 should be enclosed in a loop.
Each MPI_Send should have its corresponding MPI_Recv.
if(rank>0) {
do {
MPI_Recv(&offset,1,MPI_INT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
MPI_Recv(&s_psi[offset],count,MPI_FLOAT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
// Computation
MPI_Send(&offset,1,MPI_INT,0,FROM_WORKER,MPI_COMM_WORLD);
MPI_Send(&P[offset],count,MPI_FLOAT,0,FROM_WORKER,MPI_COMM_WORLD);
dir++;
} while(dir <= 90);
}
But you probably don't know dir in you worker nodes. Usually, we node0 send a magic packet to end the workers.
at the end of node0 :
for(r = 1; r < numprocs; r++)
MPI_Send(&dummy, 1, MPI_INT, r, STOP, COMM);
for the woker nodes :
if(rank>0) {
while(true) {
MPI_Recv(&offset,1,MPI_INT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
MPI_Recv(&s_psi[offset],count,MPI_FLOAT,0,FROM_MASTER,MPI_COMM_WORLD,&status);
// Computation
MPI_Send(&offset,1,MPI_INT,0,FROM_WORKER,MPI_COMM_WORLD);
MPI_Send(&P[offset],count,MPI_FLOAT,0,FROM_WORKER,MPI_COMM_WORLD);
if(MPI_Iprobe(ANY_SOURCE, STOP, COMM, &flag, &status)) {
MPI_Recv(&dummy, 1, MPI_INT, ANY_SOURCE, STOP, COMM, NO_STATUS);
break;
}
};
}
And you can finally MPI_finalize
By the way, you might want to look at blocking and not bloking Send/Recv.

Related

Limit to number of Broadcasts on a cluster?

I have a problem with a loop containing mpi broadcast line. After 1000th iterations, the broadcast line stops or hangs the execution (waiting for the transmission presumably).
See the following code:
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char* argv[]){
int aa = 0, size = 10000;
int my_rank; /* rank of process */
int smessage = 9, rmessage = 9; /* storage for message */
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if(my_rank !=0) {
for(aa=1;aa<size;aa++) {
MPI_Bcast(&rmessage, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("rec message(i=%d)=%d\n", aa, rmessage);
}
}
else {
for(aa=1;aa<size;aa++) {
smessage=aa;
printf("send message(i=%d)=%d\n", aa, smessage);
MPI_Bcast(&smessage, 1, MPI_INT, 0, MPI_COMM_WORLD);
}
}
MPI_Finalize();
return 0;
}
when I run it with
mpicc -openmp -fopenmp example_code.c -o example_prog
mpirun -n 3 example_prog
I get the expected output (up to some rearrangements)
sent message(i=1)=1
rec message(i=1)=1
rec message(i=1)=1
...
sent message(i=9999)=9999
rec message(i=9999)=9999
rec message(i=9999)=9999
However, when I send it to my university cluster using SLURM's sbatch and a script containing
mpirun -n 3 --loadbalance --cpus-per-proc 24 ./example_prog
(I'm using OpenMPI 1.6.5) I get the program either hanged in the middle of the loop or gets automatically terminated by the job handler. The output is
sent message(i=1)=1
rec message(i=1)=1
rec message(i=1)=1
...
rec message(i=9999)=999
rec message(i=9999)=999
sent message(i=9999)=1000
It clearly stops in the "middle" of the loop. Would you have any ideas as to what causes this error or how to avoid this error?
Thank you so much!

Two MPI_Allreduce() functions not working, gives error of NULL Communicator

I am using an example code from an MPI book [will give the name shortly].
What it does is the following:
a) It creates two communicators world = MPI_COMM_WORLD containing all the processes and worker which excludes the random number generator server (the last rank process).
b) So, the server generates random numbers and serves them to the workers on requests from the workers.
c) What the workers do is they count separately the number of samples falling inside and outside an unit circle inside an unit square.
d) After sufficient level of accuracy, the counts inside and outside are Allreduced to compute the value of PI as their ratio.
**The code compiles well. However, when running with the following command (actually with any value of n) **
>mpiexec -n 2 apple.exe 0.0001
I get the following errors:
Fatal error in MPI_Allreduce: Invalid communicator, error stack:
MPI_Allreduce(855): MPI_Allreduce(sbuf=000000000022EDCC, rbuf=000000000022EDDC,
count=1, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
MPI_Allreduce(780): Null communicator
pi = 0.00000000000000000000
job aborted:
rank: node: exit code[: error message]
0: PC: 1: process 0 exited without calling finalize
1: PC: 123
Edit: ((( Removed: But when I am removing any one of the two MPI_Allreduce() functions, it is running without any runtime errors, albeit with wrong answer.))
Code:
#include <mpi.h>
#include <mpe.h>
#include <stdlib.h>
#define CHUNKSIZE 1000
/* message tags */
#define REQUEST 1
#define REPLY 2
int main(int argc, char *argv[])
{
int iter;
int in, out, i, iters, max, ix, iy, ranks [1], done, temp;
double x, y, Pi, error, epsilon;
int numprocs, myid, server, totalin, totalout, workerid;
int rands[CHUNKSIZE], request;
MPI_Comm world, workers;
MPI_Group world_group, worker_group;
MPI_Status status;
MPI_Init(&argc,&argv);
world = MPI_COMM_WORLD;
MPI_Comm_size(world,&numprocs);
MPI_Comm_rank(world,&myid);
server = numprocs-1; /* last proc is server */
if(myid==0) sscanf(argv[1], "%lf", &epsilon);
MPI_Bcast(&epsilon, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Comm_group(world, &world_group);
ranks[0] = server;
MPI_Group_excl(world_group, 1, ranks, &worker_group);
MPI_Comm_create(world, worker_group, &workers);
MPI_Group_free(&worker_group);
if(myid==server) /* I am the rand server */
{
srand(time(NULL));
do
{
MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, REQUEST, world, &status);
if(request)
{
for(i=0; i<CHUNKSIZE;)
{
rands[i] = rand();
if(rands[i]<=INT_MAX) ++i;
}
MPI_Send(rands, CHUNKSIZE, MPI_INT,status.MPI_SOURCE, REPLY, world);
}
}
while(request>0);
}
else /* I am a worker process */
{
request = 1;
done = in = out = 0;
max = INT_MAX; /* max int, for normalization */
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
MPI_Comm_rank(workers, &workerid);
iter = 0;
while(!done)
{
++iter;
request = 1;
MPI_Recv(rands, CHUNKSIZE, MPI_INT, server, REPLY, world, &status);
for(i=0; i<CHUNKSIZE;)
{
x = (((double) rands[i++])/max)*2-1;
y = (((double) rands[i++])/max)*2-1;
if(x*x+y*y<1.0) ++in;
else ++out;
}
/* ** see error here ** */
MPI_Allreduce(&in, &totalin, 1, MPI_INT, MPI_SUM, workers);
MPI_Allreduce(&out, &totalout, 1, MPI_INT, MPI_SUM, workers);
/* only one of the above two MPI_Allreduce() functions working */
Pi = (4.0*totalin)/(totalin+totalout);
error = fabs( Pi-3.141592653589793238462643);
done = (error<epsilon||(totalin+totalout)>1000000);
request = (done)?0:1;
if(myid==0)
{
printf("\rpi = %23.20f", Pi);
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
}
else
{
if(request)
MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
}
MPI_Comm_free(&workers);
}
}
if(myid==0)
{
printf("\npoints: %d\nin: %d, out: %d, <ret> to exit\n", totalin+totalout, totalin, totalout);
getchar();
}
MPI_Finalize();
}
What is the error here? Am I missing something? Any help or pointer will be highly appreciated.
You are freeing the workers communicator before you are done using it. Move the MPI_Comm_free(&workers) call after the while(!done) { ... } loop.

MPI_Waitany causes segmentation fault

I am using MPI to distribute images to different processes so that:
Process 0 distribute images to different processes.
Processes other
than 0 process the image and then send the result back to process 0.
Process 0 tries to busy a process whenever the latter finishes its job with an image, so that as soon as it is idle, it is assigned another image to process. The code follows:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"
#define MAXPROC 16 /* Max number of processes */
#define TOTAL_FILES 7
int main(int argc, char* argv[]) {
int i, nprocs, tprocs, me, index;
const int tag = 42; /* Tag value for communication */
MPI_Request recv_req[MAXPROC]; /* Request objects for non-blocking receive */
MPI_Request send_req[MAXPROC]; /* Request objects for non-blocking send */
MPI_Status status; /* Status object for non-blocing receive */
char myname[MPI_MAX_PROCESSOR_NAME]; /* Local host name string */
char hostname[MAXPROC][MPI_MAX_PROCESSOR_NAME]; /* Received host names */
int namelen;
MPI_Init(&argc, &argv); /* Initialize MPI */
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* Get nr of processes */
MPI_Comm_rank(MPI_COMM_WORLD, &me); /* Get own identifier */
MPI_Get_processor_name(myname, &namelen); /* Get host name */
myname[namelen++] = (char)0; /* Terminating null byte */
/* First check that we have at least 2 and at most MAXPROC processes */
if (nprocs<2 || nprocs>MAXPROC) {
if (me == 0) {
printf("You have to use at least 2 and at most %d processes\n", MAXPROC);
}
MPI_Finalize(); exit(0);
}
/* if TOTAL_FILES < nprocs then use only TOTAL_FILES + 1 procs */
tprocs = (TOTAL_FILES < nprocs) ? TOTAL_FILES + 1 : nprocs;
int done = -1;
if (me == 0) { /* Process 0 does this */
int send_counter = 0, received_counter;
for (i=1; i<tprocs; i++) {
MPI_Isend(&send_counter, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
++send_counter;
/* Receive a message from all other processes */
MPI_Irecv (hostname[i], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[i]);
}
for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){
/* Wait until at least one message has been received from any process other than 0*/
MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);
if (index == MPI_UNDEFINED) perror("Errorrrrrrr");
printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);
if (send_counter < TOTAL_FILES){ /* si todavia faltan imagenes por procesar */
MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
++send_counter;
MPI_Irecv (hostname[status.MPI_SOURCE], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[status.MPI_SOURCE]);
}
}
for (i=1; i<tprocs; i++) {
MPI_Isend(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
}
} else if (me < tprocs) { /* all other processes do this */
int y;
MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);
while (y != -1) {
printf("Process %d: Received image %d\n", me, y);
sleep(me%3+1); /* Let the processes sleep for 1-3 seconds */
/* Send own identifier back to process 0 */
MPI_Send (myname, namelen, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);
}
}
MPI_Finalize();
exit(0);
}
which is based on this example.
Right now I'm getting a segmentation fault, not sure why. I'm fairly new to MPI but I can't see a mistake in the code above. It only happens with certain numbers of processes. For example, when TOTAL_FILES = 7 and is run with 5, 6 or 7 processes. Works fine with 9 processes or above.
The entire code can be found here. Trying it with 6 processes causes the mentioned error.
To compile and execute :
mpicc -Wall sscce.c -o sscce -lm
mpirun -np 6 sscce
It's not MPI_Waitany that is causing segmentation fault but it is the way you handle the case when all requests in recv_req[] are completed (i.e. index == MPI_UNDEFINED). perror() does not stop the code and it continues further and then segfaults in the printf statement while trying to access hostname[index+1]. The reason for all requests in the array being completed is that due to the use of MPI_ANY_SOURCE in the receive call the rank of the sender is not guaranteed to be equal to the index of the request in recv_req[] - simply compare index and status.MPI_SOURCE after MPI_Waitany returns to see it for yourself. Therefore the subsequent calls to MPI_Irecv with great probability overwrite still not completed requests and thus the number of requests that can get completed by MPI_Waitany is less than the actual number of results expected.
Also note that you never wait for the send requests to complete. You are lucky that Open MPI implementation uses an eager protocol to send small messages and therefore those get sent even though MPI_Wait(any|all) or MPI_Test(any|all) is never called on the started send requests.

Unclear behaviour in simple MPI send/receive program

I've been having a bug in my code for some time and could not figure out yet how to solve it.
What I'm trying to achieve is easy enough: every worker-node (i.e. node with rank!=0) gets a row (represented by 1-dimensional arry) in a square-structure that involves some computation. Once the computation is done, this row gets sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
master sends row number to worker, worker uses the row number to calculate the according values
worker sends the array with the result values back
Now, my issue is this:
all works as expected up to a certain size for the number of elements in a row (size = 1006) and number of workers > 1
if the elements in a row exceed 1006, workers fail to shutdown and the program does not terminate
this only occurs if I try to send the array back to the master. If I simply send back an INT, then everything is OK (see commented out line in doMasterTasks() and doWorkerTasks())
Based on the last bullet point, I assume that there must be some race-condition which only surfaces when the array to be sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(size));
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you take a look at your doWorkerTasks, you see that they send exactly as many data messages as they receive; (and they receive one more to shut them down).
But your master code:
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend= dimension;
int rowsLeftToReceive = dimension;
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
while (rowsLeftToReceive > 0) {
MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
rowsLeftToReceive--;
if (rowsLeftToSend> 0) {
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeftToSend--;
currentRow++;
}
}
Now works.
Why the code doesn't deadlock (note this is deadlock, not a race condition; this is a more common parallel error in distributed computing) for smaller message sizes is a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs rendezvous protocols).
So for the small message case (less than 1006 ints in this case, and 1 int definitely works, too) the worker nodes did their send whether or not the master was receiving them. If the master had called MPI_Recv(), the messages would have been there already and it would have returned immediately. But it didn't, so there were pending messages on the master side; but it didn't matter. The master sent out its kill messages, and everyone exited.
But for larger messages, the remaining send()s have to have the receiver particpating to clear, and since the receiver never does, the remaining workers hang.
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle, that MPI_Send should deadlock, as it is a blocking send and there isn't a matching receive already posted. You could post a non-blocking receive before to avoid this, but that's unnecessary -- rank 0 doesn't need to let itself know to end. So just change the loop to
for (int i = 1; i < np; i++)
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.

How to speed up this problem by MPI

(1). I am wondering how I can speed up the time-consuming computation in the loop of my code below using MPI?
int main(int argc, char ** argv)
{
// some operations
f(size);
// some operations
return 0;
}
void f(int size)
{
// some operations
int i;
double * array = new double [size];
for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
{
array[i] = complicated_computation(); // time comsuming computation
}
// some operations using all elements in array
delete [] array;
}
As shown in the code, I want to do some operations before and after the part to be paralleled with MPI, but I don't know how to specify where the parallel part begins and ends.
(2) My current code is using OpenMP to speed up the comutation.
void f(int size)
{
// some operations
int i;
double * array = new double [size];
omp_set_num_threads(_nb_threads);
#pragma omp parallel shared(array) private(i)
{
#pragma omp for schedule(dynamic) nowait
for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
{
array[i] = complicated_computation(); // time comsuming computation
}
}
// some operations using all elements in array
}
I wonder if I change to use MPI, is it possible to have the code written both for OpenMP and MPI? If it is possible, how to write the code and how to compile and run the code?
(3) Our cluster has three versions of MPI: mvapich-1.0.1, mvapich2-1.0.3, openmpi-1.2.6.
Are their usage same? Especially in my case.
Which one is best for me to use?
Thanks and regards!
UPDATE:
I like to explain a bit more about my question about how to specify the start and end of the parallel part. In the following toy code, I want to limit the parallel part within function f():
#include "mpi.h"
#include <stdio.h>
#include <string.h>
void f();
int main(int argc, char **argv)
{
printf("%s\n", "Start running!");
f();
printf("%s\n", "End running!");
return 0;
}
void f()
{
char idstr[32]; char buff[128];
int numprocs; int myid; int i;
MPI_Status stat;
printf("Entering function f().\n");
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if(myid == 0)
{
printf("WE have %d processors\n", numprocs);
for(i=1;i<numprocs;i++)
{
sprintf(buff, "Hello %d", i);
MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }
for(i=1;i<numprocs;i++)
{
MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
printf("%s\n", buff);
}
}
else
{
MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
sprintf(idstr, " Processor %d ", myid);
strcat(buff, idstr);
strcat(buff, "reporting for duty\n");
MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
}
MPI_Finalize();
printf("Leaving function f().\n");
}
However, the running output is not expected. The printf parts before and after the parallel part have been executed by every process, not just the main process:
$ mpirun -np 3 ex2
Start running!
Entering function f().
Start running!
Entering function f().
Start running!
Entering function f().
WE have 3 processors
Hello 1 Processor 1 reporting for duty
Hello 2 Processor 2 reporting for duty
Leaving function f().
End running!
Leaving function f().
End running!
Leaving function f().
End running!
So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().
Besides this one, I am still hoping someone could answer my other questions. Thanks!
Quick edit (because I either can't figure out how to leave comments, or I'm not allowed to leave comments yet) -- 3lectrologos is incorrect about the parallel part of MPI programs. You cannot do serial work before MPI_Init and after MPI_Finalize and expect it to actually be serial -- it will still be executed by all MPI threads.
I think part of the issue is that the "parallel part" of an MPI program is the entire program. MPI will start executing the same program (your main function) on each node you specify at approximately the same time. The MPI_Init call just sets certain things up for the program so it can use the MPI calls correctly.
The correct "template" (in pseudo-code) for what I think you want to do would be:
int main(int argc, char *argv[]) {
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid == 0) { // Do the serial part on a single MPI thread
printf("Performing serial computation on cpu %d\n", myid);
PreParallelWork();
}
ParallelWork(); // Every MPI thread will run the parallel work
if (myid == 0) { // Do the final serial part on a single MPI thread
printf("Performing the final serial computation on cpu %d\n", myid);
PostParallelWork();
}
MPI_Finalize();
return 0;
}
The MPI_Init (with args of &argc and &argv. It is the requirement of MPI implementations) must be really the first executed statement of MAIN. And Finalize must be the very last executed statement.
main() will be started on every node in MPI environment. Parameters like number of nodes, node_id, and master node address may be passed via argc and argv.
It is framework:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
void f();
int numprocs; int myid;
int main(int argc, char **argv)
{
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if(myid == 0)
{ /* main process. user interaction is ONLY HERE */
printf("%s\n", "Start running!");
MPI_Send ... requests with job
/*may be call f in main too*/
MPU_Reqv ... results..
printf("%s\n", "End running!");
}
else
{
/* Slaves. Do sit here and wait a job from main process */
MPI_Recv(.input..);
/* dispatch input by parsing it
(if there can be different types of work)
or just do the work */
f(..)
MPI_Send(.results..);
}
MPI_Finalize();
return 0;
}
If all the values in the array are independent, then it should be trivially parallelizable. Split the array into chunks of roughly equal size, give each chunk to a node, and then compile the results back together.
The easiest migration to cluster form OpenMP can be "Cluster OpenMP" from intel.
For MPI you need to completely rewrite dispatching of work.

Resources