As a continuation of my previous question, I have modified the code to handle a variable number of processes. However, the way Gatherv is implemented in my code seems to be unreliable. Once every 3-4 runs the end of the sequence in the collecting buffer ends up corrupted, seemingly due to some memory issue. Sample code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main (int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int world_size, *sendarray;
    int rank, *rbuf=NULL, count, total_counts=0;
    int *displs=NULL, i, *rcounts=NULL;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    if(rank==0){
        displs = malloc((world_size+1)*sizeof(int));
        for(int i=1;i<=world_size; i++)displs[i]=0;
        rcounts=malloc(world_size*sizeof(int));
        sendarray=malloc(1*sizeof(int));
        for(int i=0;i<1;i++)sendarray[i]=1111;
        count=1;
    }

    if(rank!=0){
        int size=rank*2;
        sendarray=malloc(size*sizeof(int));
        for(int i=0;i<size;i++)sendarray[i]=rank;
        count=size;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Gather(&count,1,MPI_INT,rcounts,1,MPI_INT,0,MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    if(rank==0){
        displs[0]=0;
        for(int i=1;i<=world_size; i++){
            for(int j=0; j<i; j++)displs[i]+=rcounts[j];
        }
        total_counts=0;
        for(int i=0;i<world_size;i++)total_counts+=rcounts[i];
        rbuf = malloc(10*sizeof(int));
    }

    MPI_Gatherv(sendarray, count, MPI_INT, rbuf, rcounts,
                displs, MPI_INT, 0, MPI_COMM_WORLD);

    if(rank==0){
        int SIZE=total_counts;
        for(int i=0;i<SIZE;i++)printf("(%d) %d ",i, rbuf[i]);
        free(rbuf);
        free(displs);
        free(rcounts);
    }

    if(rank!=0)free(sendarray);

    MPI_Finalize();
}
Why is this happening and is there a way to fix it?
This becomes much worse in my actual project. Each send buffer contains 150 doubles. The receive buffer gets badly corrupted, and sometimes I get a bad-termination error with exit code 6 or 11.
Can anyone at least reproduce my errors?
My guess: I am allocating memory for sendarray in each process separately. If my virtual machine were mapped 1-to-1 to the hardware, then there would probably be no such problem. But I have only 2 cores and run 4 or more processes. Could that be the reason?
Change this line:
rbuf = malloc(10*sizeof(int));
to:
rbuf = malloc(total_counts*sizeof(int));
As a side note: each MPI process lives in its own address space, and processes cannot stomp on each other's data except through erroneous arguments explicitly passed to the MPI_XXX functions, which results in undefined behavior.
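For reference, here is a minimal sketch of the corrected root-side setup (same variable names as in your question): the displacements are a running prefix sum of rcounts, and the receive buffer is sized to hold every rank's contribution.

if (rank == 0) {
    displs[0] = 0;
    for (int i = 1; i < world_size; i++)
        displs[i] = displs[i-1] + rcounts[i-1];              /* prefix sum of the counts */
    total_counts = displs[world_size-1] + rcounts[world_size-1];
    rbuf = malloc(total_counts * sizeof(int));               /* room for all gathered elements */
}
MPI_Gatherv(sendarray, count, MPI_INT, rbuf, rcounts,
            displs, MPI_INT, 0, MPI_COMM_WORLD);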
I have a project where I need to time a deliberately bad implementation of MPI_Bcast built from MPI_Isend and MPI_Irecv, and compare it against MPI_Bcast. Because the measured time of these programs is 0.000000 seconds, I need to use a large array (as I have done). What is not yet in my code below is that the for loop and the MPI_Irecv/MPI_Isend calls should themselves be wrapped in a loop to make the program take a useful amount of time to finish.
Here is my code, and I'll discuss the problem I am having below it:
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int a = 1000000000;
    int i, N;
    int Start_time, End_time, Elapse_Time;
    int proc_rank, partner, world_size;
    MPI_Status stat;
    float mydata[a];
    MPI_Request request;

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &proc_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    Start_time = MPI_Wtime();
    for (i = 0; i < a; i++) {
        mydata[i] = 0.2567*i;
    }

    MPI_Irecv(mydata, a, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &request);
    MPI_Isend(mydata, a, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &request);

    End_time = MPI_Wtime();
    Elapse_Time = End_time - Start_time;
    printf("Time on process %d is %f Seconds.\n", proc_rank, Elapse_Time);

    MPI_Finalize;
    return 0;
}
When I run this using the command mpirun -np 4 ./a.out, I only get the time for one process, but I'm not really sure why. I guess I'm just not understanding how these functions work, or how I should be using them.
Thank you for the help!
There are a few different issues in your code, all likely to cause it to crash and/or behave strangely:
As already mentioned by @Olaf, allocating the array mydata on the stack is a very bad idea. For arrays this large, you should definitely allocate on the heap with an explicit call to malloc(). Even so, you are playing with some serious chunks of memory here, so be careful not to exhaust what's available on your machine. Moreover, some MPI libraries have difficulty dealing with messages larger than 2 GB, which is the case here. So again, be careful with that.
You use mydata for both sending and receiving. However, once you have posted a non-blocking communication, you cannot reuse the corresponding buffer until the communication has finished. So in your case, you'll need two arrays, one for sending and one for receiving.
The data type you pass to your MPI calls, namely MPI_BYTE, isn't consistent with the actual type of the data you transfer, namely float. You should use MPI_FLOAT instead.
You call MPI_Irecv() and MPI_Isend() without ever calling a matching MPI_Wait() or MPI_Test() function. This is wrong, since it means the communications might never actually occur.
MPI_Wtime() returns a double, not an int. This isn't an error per se, but it might lead to unexpected results. Moreover, the format specifier in your call to printf() expects a floating-point value, not an integer, so you have to make them consistent.
(Minor - typo ) You missed the () for MPI_Finalize().
(Minor - I guess) You only communicate with process #0...
So here is a possible version of working code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int a = 1000000000;
    int i, from, to;
    double Start_time, End_time, Elapse_Time;
    int proc_rank, world_size;
    float *mysenddata, *myrecvdata;
    MPI_Request requests[2];

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &proc_rank );
    MPI_Comm_size( MPI_COMM_WORLD, &world_size );

    Start_time = MPI_Wtime();

    mysenddata = (float*) malloc( a * sizeof( float ) );
    myrecvdata = (float*) malloc( a * sizeof( float ) );
    assert( mysenddata != NULL ); /* very crude sanity check */
    assert( myrecvdata != NULL ); /* very crude sanity check */

    for ( i = 0; i < a; i++ ) {
        mysenddata[i] = 0.2567 * i;
    }

    from = ( proc_rank + world_size - 1 ) % world_size;
    to = ( proc_rank + 1 ) % world_size;

    MPI_Irecv( myrecvdata, a, MPI_FLOAT, from, 1, MPI_COMM_WORLD, &requests[0] );
    MPI_Isend( mysenddata, a, MPI_FLOAT, to, 1, MPI_COMM_WORLD, &requests[1] );
    MPI_Waitall( 2, requests, MPI_STATUSES_IGNORE );

    End_time = MPI_Wtime();
    Elapse_Time = End_time - Start_time;
    printf( "Time on process %d is %f Seconds.\n", proc_rank, Elapse_Time );

    free( mysenddata );
    free( myrecvdata );

    MPI_Finalize();
    return 0;
}
NB: for the sake of having code that works in all circumstances, I implemented a communication ring here, where process 0 sends to process 1 and receives from process size-1... However, in the context of your re-implementation of a broadcast, you can just ignore this (i.e. the from and to parameters).
The only explanation I see is that your other processes are crashing before the print. Try commenting out part of your code and re-running it.
Try it this way and see if you notice a difference:
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &proc_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

/*Start_time = MPI_Wtime();
for (i = 0; i < a; i++) {
    mydata[i] = 0.2567*i;
}
MPI_Irecv(mydata, a, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &request);
MPI_Isend(mydata, a, MPI_BYTE, 0, 1, MPI_COMM_WORLD, &request);
End_time = MPI_Wtime();
Elapse_Time = End_time - Start_time;*/

printf("I'm process %d.\n", proc_rank);

MPI_Finalize();
I am using MPI to distribute images to different processes, so that:
Process 0 distributes images to the other processes.
Processes other than 0 process the image and then send the result back to process 0.
Process 0 tries to keep every worker busy: as soon as a process finishes its job with an image, it is assigned another image to process. The code follows:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"

#define MAXPROC 16      /* Max number of processes */
#define TOTAL_FILES 7

int main(int argc, char* argv[]) {
    int i, nprocs, tprocs, me, index;
    const int tag = 42;                 /* Tag value for communication */

    MPI_Request recv_req[MAXPROC];      /* Request objects for non-blocking receive */
    MPI_Request send_req[MAXPROC];      /* Request objects for non-blocking send */
    MPI_Status status;                  /* Status object for non-blocking receive */

    char myname[MPI_MAX_PROCESSOR_NAME];             /* Local host name string */
    char hostname[MAXPROC][MPI_MAX_PROCESSOR_NAME];  /* Received host names */
    int namelen;

    MPI_Init(&argc, &argv);                  /* Initialize MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* Get nr of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &me);      /* Get own identifier */

    MPI_Get_processor_name(myname, &namelen);  /* Get host name */
    myname[namelen++] = (char)0;               /* Terminating null byte */

    /* First check that we have at least 2 and at most MAXPROC processes */
    if (nprocs<2 || nprocs>MAXPROC) {
        if (me == 0) {
            printf("You have to use at least 2 and at most %d processes\n", MAXPROC);
        }
        MPI_Finalize();
        exit(0);
    }

    /* if TOTAL_FILES < nprocs then use only TOTAL_FILES + 1 procs */
    tprocs = (TOTAL_FILES < nprocs) ? TOTAL_FILES + 1 : nprocs;

    int done = -1;

    if (me == 0) {    /* Process 0 does this */
        int send_counter = 0, received_counter;

        for (i=1; i<tprocs; i++) {
            MPI_Isend(&send_counter, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
            ++send_counter;
            /* Receive a message from all other processes */
            MPI_Irecv(hostname[i], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[i]);
        }

        for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){
            /* Wait until at least one message has been received from any process other than 0 */
            MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);
            if (index == MPI_UNDEFINED) perror("Errorrrrrrr");
            printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);

            if (send_counter < TOTAL_FILES){ /* if there are still images left to process */
                MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
                ++send_counter;
                MPI_Irecv(hostname[status.MPI_SOURCE], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[status.MPI_SOURCE]);
            }
        }

        for (i=1; i<tprocs; i++) {
            MPI_Isend(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
        }
    } else if (me < tprocs) {    /* all other processes do this */
        int y;
        MPI_Recv(&y, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        while (y != -1) {
            printf("Process %d: Received image %d\n", me, y);
            sleep(me%3+1);    /* Let the processes sleep for 1-3 seconds */

            /* Send own identifier back to process 0 */
            MPI_Send(myname, namelen, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
            MPI_Recv(&y, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        }
    }

    MPI_Finalize();
    exit(0);
}
which is based on this example.
Right now I'm getting a segmentation fault and I am not sure why. I'm fairly new to MPI, but I can't see a mistake in the code above. It only happens with certain numbers of processes, for example when TOTAL_FILES = 7 and the code is run with 5, 6 or 7 processes. It works fine with 9 processes or more.
The entire code can be found here. Trying it with 6 processes causes the mentioned error.
To compile and execute :
mpicc -Wall sscce.c -o sscce -lm
mpirun -np 6 sscce
It's not MPI_Waitany that causes the segmentation fault; it is the way you handle the case when all requests in recv_req[] are completed (i.e. index == MPI_UNDEFINED). perror() does not stop the code, so it continues and then segfaults in the printf statement while trying to access hostname[index+1]. The reason all requests in the array end up completed is that, due to the use of MPI_ANY_SOURCE in the receive call, the rank of the sender is not guaranteed to be equal to the index of the request in recv_req[] - simply compare index and status.MPI_SOURCE after MPI_Waitany returns to see it for yourself. Therefore the subsequent calls to MPI_Irecv very likely overwrite requests that have not yet completed, and thus the number of requests that MPI_Waitany can complete is less than the actual number of results expected.
Also note that you never wait for the send requests to complete. You are lucky that the Open MPI implementation uses an eager protocol to send small messages, so they get sent even though MPI_Wait(any|all) or MPI_Test(any|all) is never called on the started send requests.
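A possible fix, sketched with the same variables as in your question: re-arm the receive in the slot that has just completed (recv_req[index+1]) instead of recv_req[status.MPI_SOURCE], so that no still-pending request gets overwritten, and keep receiving into the matching hostname slot.

for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){
    /* Wait until at least one message has been received from any process other than 0 */
    MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);
    if (index == MPI_UNDEFINED) break;   /* no active receive requests left */
    printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);

    if (send_counter < TOTAL_FILES){ /* if there are still images left to process */
        MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
        ++send_counter;
        /* re-arm the receive in the slot that has just completed */
        MPI_Irecv(hostname[index+1], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[index+1]);
    }
}
/* ...and, as noted above, the started send requests should eventually be
   completed as well, e.g. with MPI_Wait/MPI_Waitall or MPI_Request_free. */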
It's a master-slave situation. How can I make the master process check, in a non-blocking way, for a message transmitted to it? If at the moment of checking there is no message transmitted to the master, it should continue with its iterations. However, if there is a message transmitted to it, it should process the message and then continue with the iterations. See the comment inside /* */:
int main(int argc, char *argv[])
{
    int numprocs, rank;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if(rank == 0) // the searching process
    {
        for (int i=0; i < 4000000; i++)
        {
            // do some stuff here; does not matter what

            /* see if any message has been transmitted to me at this point
               without blocking the process; if at this time it happens to be
               transmitted, do something and then continue with for iterations;
               or just continue with for iterations and maybe next time there will
               be a message which tells me to do something */
        }
    }
    else
    {
        int flag = 1;
        while(flag)
        {
            // something done that at some point changes flag
        }

        // send a message to process with rank 0 and don't get stuck here
        MPI_Isend(12, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &request);

        // some other stuff done

        // wait for message to be transmitted
        MPI_Wait(&request, &status);
    }

    MPI_Finalize();
    return 0;
}
One solution is to use MPI_Iprobe() to test whether a message is waiting.
On this line, pass a pointer instead of the literal 12:
MPI_Isend(12, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &request);
And add flag=0 somewhere inside this loop so that it terminates:
while(flag!=0)
{
    // something done that at some point changes flag
}
Here is the full code:
#include "mpi.h"
#include "stdio.h"
int main(int argc, char *argv[])
{
int numprocs,
rank;
MPI_Request request;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
if(rank == 0) // the searching process
{
int i;
for (i=0; i < 4000000; i++)
{
// do some stuff here; does not matter what
//printf("I am still running!\n");
int flag;
MPI_Iprobe(MPI_ANY_SOURCE,100,MPI_COMM_WORLD,&flag,&status);
if(flag!=0){
int value;
MPI_Recv(&value, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD, &status);
printf("I (0) received %d \n",value);
}
/* see if any message has been transmitted to me at this point
without blocking the process; if at this time it happens to be
transmitted, do something and than continue with for iternations;
or just continue with for iterations and maybe next time will
have a message which sends me to do something */
}
}
else
{
int i;
for(i=0;i<42;i++){
int flag = 1;
while(flag!=0)
{
// something done that at some point changes flag
flag=0;
}
int bla=1000*rank+i;
// send a message to process with rank 0 and don't get stuck here
MPI_Isend(&bla, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, &request);
// some other stuff done
printf("I (%d) do something\n",rank);
// wait for message to be transmitted
MPI_Wait(&request, &status);
}
}
MPI_Finalize();
return 0;
}
Bye,
Francis
The non-blocking test for available messages is done using the MPI_Iprobe call. In your case it would look like:
int available;
MPI_Status status;

if(rank == 0) // the searching process
{
    for (int i=0; i < 4000000; i++)
    {
        // do some stuff here; does not matter what

        /* see if any message has been transmitted to me at this point
           without blocking the process; if at this time it happens to be
           transmitted, do something and then continue with for iterations;
           or just continue with for iterations and maybe next time there will
           be a message which tells me to do something */

        // Tag value 100 matches the value used in the send operation
        MPI_Iprobe(MPI_ANY_SOURCE, 100, MPI_COMM_WORLD, &available, &status);
        if (available)
        {
            // Message source rank is now available in status.MPI_SOURCE
            // Receive the message
            MPI_Recv(..., status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD, &status);
        }
    }
}
MPI_ANY_SOURCE is used as a wildcard rank, i.e. it instructs MPI_Iprobe to check for messages from any source. If a matching send has been posted, then available will be set to true, otherwise it will be set to false. The actual source of the message is also written to the MPI_SOURCE field of the status object. If the available flag indicates that a matching message is available, one should then post a receive operation in order to receive it. It is important that the rank and the tag are explicitly specified in the receive operation, otherwise a different message could get received instead.
You could also use persistent requests. These behave very much like non-blocking operations, with the important difference that they can be restarted multiple times. The same code with persistent requests would look like this:
if(rank == 0) // the searching process
{
    MPI_Request req;
    MPI_Status status;
    int completed;

    // Prepare the persistent receive request
    MPI_Recv_init(buffer, buf_size, buf_type,
                  MPI_ANY_SOURCE, 100, MPI_COMM_WORLD, &req);
    // Make the request active
    MPI_Start(&req);

    for (int i=0; i < 4000000; i++)
    {
        // do some stuff here; does not matter what

        /* see if any message has been transmitted to me at this point
           without blocking the process; if at this time it happens to be
           transmitted, do something and then continue with for iterations;
           or just continue with for iterations and maybe next time there will
           be a message which tells me to do something */

        // Non-blocking test for request completion
        MPI_Test(&req, &completed, &status);
        if (completed)
        {
            // Message is now in buffer
            // Process the message
            // ...
            // Activate the request again
            MPI_Start(&req);
        }
    }

    // Cancel and free the request
    MPI_Cancel(&req);
    MPI_Request_free(&req);
}
Persistent operations have a slight performance edge over the non-persistent ones shown in the previous code sample. It is important that buffer is not accessed while the request is active, i.e. after the call to MPI_Start and before MPI_Test signals completion. Persistent send/receive operations also match non-persistent receive/send operations, so it is not necessary to change the code of the workers: they can still use MPI_Isend.
I've been having a bug in my code for some time and could not figure out yet how to solve it.
What I'm trying to achieve is easy enough: every worker node (i.e. a node with rank != 0) gets a row (represented by a 1-dimensional array) of a square structure on which it performs some computation. Once the computation is done, the row is sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
the master sends a row number to a worker, and the worker uses the row number to calculate the corresponding values
worker sends the array with the result values back
Now, my issue is this:
everything works as expected up to a certain size for the number of elements in a row (size = 1006) and a number of workers > 1
if the number of elements in a row exceeds 1006, the workers fail to shut down and the program does not terminate
this only occurs if I try to send the array back to the master. If I simply send back an INT, then everything is OK (see the commented-out lines in doMasterTasks() and doWorkerTasks())
Based on the last bullet point, I assume that there must be some race condition which only surfaces when the array to be sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(¤tRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(¤tRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(size));
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you take a look at your doWorkerTasks, you see that the workers send exactly as many data messages as they receive (and they receive one more, to shut them down).
But your master code:
for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}

for (;;) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);

    if (rowsLeft == 0)
        break;

    MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend = dimension;
int rowsLeftToReceive = dimension;

for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeftToSend--;
    currentRow++;
}

while (rowsLeftToReceive > 0) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
    rowsLeftToReceive--;

    if (rowsLeftToSend > 0) {
        if (currentRow > 1004)
            printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
        MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
        rowsLeftToSend--;
        currentRow++;
    }
}
Now works.
Why the code doesn't deadlock (note this is deadlock, not a race condition; this is a more common parallel error in distributed computing) for smaller message sizes is a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs rendezvous protocols).
So for the small message case (less than 1006 ints in this case, and 1 int definitely works, too) the worker nodes did their send whether or not the master was receiving them. If the master had called MPI_Recv(), the messages would have been there already and it would have returned immediately. But it didn't, so there were pending messages on the master side; but it didn't matter. The master sent out its kill messages, and everyone exited.
But for larger messages, the remaining send()s need the receiver to participate in order to complete, and since the receiver never does, the remaining workers hang.
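To see the eager-vs-rendezvous behaviour in isolation, here is a small self-contained sketch (not taken from the code above; the exact threshold is implementation- and configuration-dependent). Both ranks send before they receive, which only completes while the messages are small enough to be delivered eagerly; for large counts both ranks block in MPI_Send and the program hangs.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    int count = (argc > 1) ? atoi(argv[1]) : 1;    /* message size in ints */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);   /* demo needs exactly 2 ranks */

    int *sendbuf = calloc(count, sizeof(int));
    int *recvbuf = calloc(count, sizeof(int));
    int peer = 1 - rank;

    /* Both ranks send first: safe only if the send completes eagerly. */
    MPI_Send(sendbuf, count, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, count, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("Rank %d finished with count = %d\n", rank, count);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Run it with mpirun -np 2 and try a small count (e.g. 10) and then a much larger one (e.g. 10000000) to see the difference.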
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
    printf("All work has been done, shutting down clients now.\n");
    for (int i = 0; i < np; i++) {
        MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
    }
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle, that MPI_Send should deadlock, as it is a blocking send and there isn't a matching receive already posted. You could post a non-blocking receive beforehand to avoid this, but that's unnecessary -- rank 0 doesn't need to let itself know to end. So just change the loop to:
for (int i = 1; i < np; i++)
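Applied to the function from your code, the corrected shutdown looks like this (only the loop's starting index changes):

void shutdownWorkers() {
    printf("All work has been done, shutting down clients now.\n");
    /* start at 1: rank 0 must not send the finish-off message to itself */
    for (int i = 1; i < np; i++) {
        MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
    }
}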
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.