MPI: What to do when number of expected MPI_Recv is unknown - c

I've got many slave nodes which might or might not send messages to the master node, so currently there is no way for the master node to know how many MPI_Recv calls to expect. The slave nodes are made to send the minimum number of messages to the master node for efficiency reasons.
I managed to find a cool trick, which sends an additional "done" message once a sender has nothing more to send. Unfortunately, it doesn't seem to work in my case, where there is a variable number of senders. Any idea on how to go about this? Thanks!
if(rank == 0){ //MASTER NODE
    while (1) {
        MPI_Recv(&buffer, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == DONE) break;
        /* Do stuff */
    }
}else{ //MANY SLAVE NODES
    if(some conditions){
        MPI_Send(&buffer, 64, MPI_INT, root, 1, MPI_COMM_WORLD);
    }
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Send(NULL, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
This is not working; the program still seems to be waiting for an MPI_Recv.

A simpler and more elegant option would be to use the MPI_IBARRIER. Have each worker call all of the sends that it needs to and then call MPI_IBARRIER when it's done. On the master, you can loop on both an MPI_IRECV on MPI_ANY_SOURCE and an MPI_IBARRIER. When the MPI_IBARRIER is done, you know that everyone has finished and you can cancel the MPI_IRECV and move on. The pseudocode would look something like this:
if (master) {
    /* Start the barrier. Each process will join when it's done. */
    MPI_Ibarrier(MPI_COMM_WORLD, &requests[0]);

    do {
        /* Do the work */

        MPI_Irecv(..., MPI_ANY_SOURCE, &requests[1]);

        /* If the index that finished is 1, we received a message.
         * Otherwise, we finished the barrier and we're done. */
        MPI_Waitany(2, requests, &index, MPI_STATUS_IGNORE);
    } while (index == 1);

    /* If we're done, we should cancel the receive request and move on. */
    MPI_Cancel(&requests[1]);
} else {
    /* Keep sending work back to the master until we're done. */
    while( ...work is to be done... ) {
        MPI_Send(...);
    }

    /* When we finish, join the Ibarrier. Note that
     * you can't use an MPI_Barrier here because it
     * has to match with the MPI_Ibarrier above. */
    MPI_Ibarrier(MPI_COMM_WORLD, &request);
    MPI_Wait(&request, MPI_STATUS_IGNORE);
}
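For reference, here is a complete, runnable sketch along these lines (my own illustration, not the original poster's code; it needs an MPI-3 library for MPI_Ibarrier). To be safe about messages that might still be in flight when the barrier completes, this variant has the workers use synchronous MPI_Issend and has the root probe with MPI_Iprobe instead of cancelling a pending receive; the per-worker message count (rank % 3) is just a stand-in for the real "some conditions".
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Root joins the barrier right away; it completes only after every
         * worker has joined, i.e. after all their messages were matched. */
        MPI_Request barrier;
        MPI_Ibarrier(MPI_COMM_WORLD, &barrier);
        int barrier_done = 0;
        while (!barrier_done) {
            int flag;
            MPI_Status status;
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
            if (flag) {
                int value;
                MPI_Recv(&value, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("root received %d from rank %d\n", value, status.MPI_SOURCE);
            } else {
                MPI_Test(&barrier, &barrier_done, MPI_STATUS_IGNORE);
            }
        }
    } else {
        /* Placeholder workload: each worker reports (rank % 3) results. */
        int nmsgs = rank % 3;
        int values[2];
        MPI_Request sends[2];
        for (int i = 0; i < nmsgs; i++) {
            values[i] = 100 * rank + i;
            /* Synchronous send: completion means the root has started the matching receive. */
            MPI_Issend(&values[i], 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &sends[i]);
        }
        MPI_Waitall(nmsgs, sends, MPI_STATUSES_IGNORE);
        /* Only join the barrier once all our messages have been received. */
        MPI_Request barrier;
        MPI_Ibarrier(MPI_COMM_WORLD, &barrier);
        MPI_Wait(&barrier, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}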

1- You called MPI_Barrier in the wrong place; it should be called after the MPI_Send.
2- The root will exit the loop when it has received DONE from all other ranks (size - 1).
The code after some modifications:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(NULL, NULL);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Status status;
    int DONE = 888;
    int buffer = 77;
    int root = 0;

    printf("here is rank %d with size=%d\n", rank, size); fflush(stdout);

    int num_of_DONE = 0;
    if(rank == 0){ //MASTER NODE
        while (1) {
            MPI_Recv(&buffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            printf("root recev %d from %d with tag = %d\n", buffer, status.MPI_SOURCE, status.MPI_TAG); fflush(stdout);
            if (status.MPI_TAG == DONE)
                num_of_DONE++;
            printf("num_of_DONE=%d\n", num_of_DONE); fflush(stdout);
            if (num_of_DONE == size - 1)
                break;
            /* Do stuff */
        }
    }else{ //MANY SLAVE NODES
        if(1){
            buffer = 66;
            MPI_Send(&buffer, 1, MPI_INT, root, 1, MPI_COMM_WORLD);
            printf("rank %d sent data.\n", rank); fflush(stdout);
        }
    }

    if (rank != 0)
    {
        buffer = 55;
        MPI_Send(&buffer, 1, MPI_INT, root, DONE, MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d done.\n", rank); fflush(stdout);

    MPI_Finalize();
    return 0;
}
output:
hosam#hosamPPc:~/Desktop$ mpicc -o aa aa.c
hosam#hosamPPc:~/Desktop$ mpirun -n 3 ./aa
here is rank 2 with size=3
here is rank 0 with size=3
rank 2 sent data.
here is rank 1 with size=3
rank 1 sent data.
root recev 66 from 1 with tag = 1
num_of_DONE=0
root recev 66 from 2 with tag = 1
num_of_DONE=0
root recev 55 from 2 with tag = 888
num_of_DONE=1
root recev 55 from 1 with tag = 888
num_of_DONE=2
rank 0 done.
rank 1 done.
rank 2 done.

Related

MPI - MPI_Probe does not block and receives same message

So, somehow MPI_Probe receives the same message even though it is only sent once.
I execute the program with only 2 processes, of which process 1 sends two messages: one to retrieve a task and another one to send back the result. I send the messages with different tags to differentiate the two.
So what's supposed to happen is the following:
Process 0 waits for a task request
Process 1 sends a task request indicated by tag=0
Process 0 sends the task
Process 1 does the task and sends the results back to process 0 indicated by tag=1.
-- This is the part where the first problem occurs: process 0 still receives tag=0,
   as shown by the printf
Process 0 receives the task and enters the else-if block where tag==1 -- it does not.
Process 1 breaks the while loop - DEBUGGING PURPOSE
Process 0 is supposed to be blocked by MPI_Probe but is not; instead it continues to
run and always shows that the received tag is still 0
The code is still messy and inefficient. I just want a minimal working program to build upon and to optimize. But still any tip is appreciated!
The code:
if(rank == 0) {
struct stack* idle_stack = init_stack(env_size-1);
struct sudoku_stack* sudoku_stack_ptr = init_sudoku_stack(256);
push_sudoku(sudoku_stack_ptr, sudoku);
int it=0;
while(1) {
printf("ITERATION %d\n", it);
int idle_stack_size = stack_size(idle_stack);
int _sudoku_stack_empty = sudoku_stack_empty(sudoku_stack_ptr);
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("TAG: %d\n", status.MPI_TAG);
// So this part is supposed to be entered once
if(status.MPI_TAG == 0) {
if(!sudoku_stack_empty(sudoku_stack_ptr)) {
printf("SENDING TASK\n");
int *next_sudoku = pop_sudoku(sudoku_stack_ptr);
MPI_Send(next_sudoku, v_size, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
} else {
// But since the Tag stays 0, it is called multiple times until
// a stack overflow occurs
printf("PUSHING TO IDLE STACK\n");
push(idle_stack, status.MPI_SOURCE);
}
} else if(status.MPI_TAG == 1) {
// This part should actually be entered by the second received message
printf("RECEIVING SOLUTION\n");
int count;
MPI_Get_count(&status, MPI_INT, &count);
int* recv_sudokus = (int*)malloc(count * sizeof(int));
MPI_Recv(recv_sudokus, count, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
for(int i = 0; i < count; i+=v_size) {
printf("%d ", recv_sudokus[i]);
if((i+1) % m_size == 0){
printf("\n");
}
}
// DEBUG - EXIT PROGRAM
teardown(sudoku_stack_ptr, idle_stack);
break;
push_sudoku(sudoku_stack_ptr, recv_sudokus);
} else if(status.MPI_TAG == 2) {
//int* solved_sudoku = (int*)malloc(v_size * sizeof(int));
//MPI_Recv(solved_sudoku, v_size, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
//TODO
}
it++;
}
} else {
int* sudoku = (int*)malloc(sizeof(int)*v_size);
int* possible_sudokus = (int*)malloc(sizeof(int)*m_size*v_size);
while(1) {
// Send task request
printf("REQUESTING TASK\n");
int i = 0;
MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
// Wait for and receive task
printf("RECEIVING TASK\n");
MPI_Recv(sudoku, v_size, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("CALCULATING\n");
int index = 0;
for(int i = 1; i <= m_size; i++) {
int is_safe_res = is_safe_first_empty_cell(sudoku, i);
if(is_safe_res) {
int* sudoku_cp = (int*)malloc(sizeof(int)*v_size);
memcpy(sudoku_cp, sudoku, sizeof(int)*v_size);
insert_to_first_empty_cell(sudoku_cp, i);
memcpy(&possible_sudokus[index], sudoku_cp, sizeof(int)*v_size);
index+=v_size;
free(sudoku_cp);
}
}
printf("SENDING\n");
MPI_Send(possible_sudokus, index*v_size, MPI_INT, 0, 1, MPI_COMM_WORLD);
break;
}
}
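One thing worth noting about the master loop above: MPI_Probe only inspects a pending message, it does not consume it, so until the tag-0 request is actually received with a matching MPI_Recv, every later probe with MPI_ANY_TAG can match that same message again. A minimal standalone illustration of this behaviour (my own sketch, run with two processes):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int request = 0;
        MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);  /* a single tag-0 message */
    } else if (rank == 0) {
        MPI_Status status;
        int buf;
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("first probe sees tag %d\n", status.MPI_TAG);
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("second probe sees tag %d (same message, still pending)\n", status.MPI_TAG);
        MPI_Recv(&buf, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);            /* now it is consumed */
    }

    MPI_Finalize();
    return 0;
}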

If Statements between MPI Isend/Irecv and MPI wait prevent program from progressing. What could be causing this?

I am creating a program to calculate the potential between two conductors using MPI. I am using non-blocking sends and receives so that calculations can be done while information is sent between processors.
However, the if statements between the Isend/Irecv calls and the Wait calls, in which the calculations are contained, are not being entered. When the if statements and the calculations are removed, the program proceeds to the wait statements.
I have checked that the calculations are correct and not causing issues. I have checked that the conditions for the if statements are correct.
Here is a section of test code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
/*MPI Specific Variables*/
int my_size, my_rank, up, down;
MPI_Request reqU, reqD, sreqU, sreqD;
MPI_Status rUstatus, rDstatus, sUstatus, sDstatus;
/*Physical Dimensions*/
double phi_0 = 1000.0;/*V*/
/*Other Variables*/
int grid_size = 100;
int slice = 50;
int x,y;
double grid_res_y = 0.2;
double grid_res_x = 0.1;
int xboundary = 10;
int yboundary = 25;
int boundary_proc = 2;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &my_size);
/*Determining neighbours*/
if (my_rank != 0) /*if statemets used to stop highest and lowest rank neighbours arent outside 0 - my_size-1 range of ranks*/
{
up = my_rank-1;
}
else
{
up = MPI_PROC_NULL;
}
if(my_rank != my_size-1)
{
down = my_rank+1;
}
else
{
down = MPI_PROC_NULL;
}
/*cross-check: presumed my_size is a factor of gridsize else there are odd sized slices and this is not coded for*/
if (grid_size%my_size != 0)
{
printf("ERROR - number of procs = %d, this is not a factor of grid_size %d\n", my_size, grid_size);
exit(0);
}
/*Set Up Distributed Data Approach*/
double phi[slice+2][grid_size]; /*extra 2 rows to allow for halo data*/
for (y=0; y < slice+2; y++)
{
for (x=0; x < grid_size; x++)
{
phi[y][x] = 0.0;
}
}
if(my_rank == 0) /*Boundary Containing rank does 2 loops. One over part with inner conductor and one over part without inner conductor*/
{
for(y=0; y < slice+1; y++)
{
for(x=xboundary; x < grid_size; x++)
{
phi[y][x] = phi_0;
}
}
}
if (my_rank < my_size-1)
{
/*send top most strip up one node to be recieved as bottom halo*/
MPI_Isend(&phi[1][0], grid_size , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &sreqU);
/*recv top halo from up one node*/
MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &reqU);
}
if (my_rank > 0)
{
/*recv top halo from down one node*/
MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &reqD);
/*send bottom most strip down one node to be recieved as top halo*/
MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &sreqD);
}
printf("send/recv complete");
if (my_rank < boundary_proc)
{
printf("rank %d Entered if", my_rank);
/*Calculations*/
}
else if(my_rank > boundary_proc)
{
printf("rank %d Entered else if", my_rank);
/*calculations*/
}
else
{
printf("rank %d Entered else", my_rank);
/*calculations*/
}
if (my_rank<my_size-1)
{
/*Wait for send to down one rank to complete*/
MPI_Wait(&sreqD, &sDstatus);
/*Wait for recieve from up one rank to complete*/
MPI_Wait(&reqD, &rDstatus);
}
if (my_rank>0)
{
/*Wait for send to up down one rank to complete*/
MPI_Wait(&sreqU, &sUstatus);
/*Wait for recieve from down one rank to complete*/
MPI_Wait(&reqU, &rUstatus);
}
printf("Wait complete");
MPI_Finalize();
return 0;
}
All the print statements should be printed with the respective ranks. Currently it only makes it up to "send/recv complete". I am only testing on 2 processors at the moment.
Mismatching tags
Tags must match for each pair of communication operations, i.e. there must be a send and a receive with the same tag. In your case, both sends use one tag and both receives another, so they can never match. Change it such that sending down and receiving from up have the same tag and vice versa, e.g.
if (my_rank < my_size-1) {
    /*send top most strip up one node to be recieved as bottom halo*/
    MPI_Isend(&phi[1][0], grid_size , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &sreqU);
    /*recv top halo from up one node*/
    MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &reqU);
}
if (my_rank > 0) {
    /*recv top halo from down one node*/
    MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &reqD);
    /*send bottom most strip down one node to be recieved as top halo*/
    MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &sreqD);
}
Mismatching request objects
On the border ranks, you are waiting for the wrong requests; this is fixed simply by swapping the bodies of the two MPI_Wait if blocks.
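Concretely, with the request and status names from the question, the corrected waits would look something like this:
if (my_rank < my_size-1) {
    /* these requests were started in the my_rank < my_size-1 block above */
    MPI_Wait(&sreqU, &sUstatus);
    MPI_Wait(&reqU, &rUstatus);
}
if (my_rank > 0) {
    /* and these in the my_rank > 0 block */
    MPI_Wait(&sreqD, &sDstatus);
    MPI_Wait(&reqD, &rDstatus);
}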
Waiting for multiple non-blocking operations
Contrary to some of the discussion in the now-deleted answers, it is correct to wait for multiple ongoing non-blocking communications with multiple waits [1].
Nevertheless, it is strictly better to use an array of requests and MPI_Waitall. It leads to cleaner code and would have prevented the mistake of mixing up requests in the first place. It also gives the MPI implementation more freedom to optimize. This can look like the following:
MPI_Request requests[MAX_REQUESTS];
int num_requests = 0;
// ...
MPI_Isend(..., &requests[num_requests++]);
// ...
MPI_Waitall(num_requests, requests, statuses);
Or, you could utilize the fact that MPI_Waitall permits elements of the requests array to be MPI_REQUEST_NULL. That allows you to keep each request in a fixed, named slot and is ultimately a matter of style.
typedef enum {
    RECV_UP, RECV_DOWN, SEND_UP, SEND_DOWN, MAX_REQUESTS
} MyRequests;

MPI_Request requests[MAX_REQUESTS];
MPI_Status statuses[MAX_REQUESTS];

if (my_rank < my_size-1) {
    /*send top most strip up one node to be recieved as bottom halo*/
    MPI_Isend(&phi[1][0], grid_size , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &requests[SEND_DOWN]);
    /*recv top halo from up one node*/
    MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &requests[RECV_DOWN]);
} else {
    requests[RECV_DOWN] = requests[SEND_DOWN] = MPI_REQUEST_NULL;
}
if (my_rank > 0) {
    /*recv top halo from down one node*/
    MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &requests[RECV_UP]);
    /*send bottom most strip down one node to be recieved as top halo*/
    MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &requests[SEND_UP]);
} else {
    requests[RECV_UP] = requests[SEND_UP] = MPI_REQUEST_NULL;
}

// ...

MPI_Waitall(MAX_REQUESTS, requests, statuses);
[1]: This is mandated by the non-blocking progress guarantee in the MPI Standard (3.7.4):
Progress: A call to MPI_WAIT that completes a receive will eventually terminate and return if a matching send has been started, unless the send is satisfied by another receive. In particular, if the matching send is nonblocking, then the receive should complete even if no call is executed by the sender to complete the send. Similarly, a call to MPI_WAIT that completes a send will eventually return if a matching receive has been started, unless the receive is satisfied by another send, and even if no call is executed to complete the receive.

MPI_Waitany causes segmentation fault

I am using MPI to distribute images to different processes so that:
Process 0 distributes images to the other processes.
Processes other than 0 process the image and then send the result back to process 0.
Process 0 tries to keep every other process busy: as soon as a process finishes its job with an image and becomes idle, it is assigned another image to process. The code follows:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"
#define MAXPROC 16 /* Max number of processes */
#define TOTAL_FILES 7
int main(int argc, char* argv[]) {
int i, nprocs, tprocs, me, index;
const int tag = 42; /* Tag value for communication */
MPI_Request recv_req[MAXPROC]; /* Request objects for non-blocking receive */
MPI_Request send_req[MAXPROC]; /* Request objects for non-blocking send */
MPI_Status status; /* Status object for non-blocing receive */
char myname[MPI_MAX_PROCESSOR_NAME]; /* Local host name string */
char hostname[MAXPROC][MPI_MAX_PROCESSOR_NAME]; /* Received host names */
int namelen;
MPI_Init(&argc, &argv); /* Initialize MPI */
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* Get nr of processes */
MPI_Comm_rank(MPI_COMM_WORLD, &me); /* Get own identifier */
MPI_Get_processor_name(myname, &namelen); /* Get host name */
myname[namelen++] = (char)0; /* Terminating null byte */
/* First check that we have at least 2 and at most MAXPROC processes */
if (nprocs<2 || nprocs>MAXPROC) {
if (me == 0) {
printf("You have to use at least 2 and at most %d processes\n", MAXPROC);
}
MPI_Finalize(); exit(0);
}
/* if TOTAL_FILES < nprocs then use only TOTAL_FILES + 1 procs */
tprocs = (TOTAL_FILES < nprocs) ? TOTAL_FILES + 1 : nprocs;
int done = -1;
if (me == 0) { /* Process 0 does this */
int send_counter = 0, received_counter;
for (i=1; i<tprocs; i++) {
MPI_Isend(&send_counter, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
++send_counter;
/* Receive a message from all other processes */
MPI_Irecv (hostname[i], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[i]);
}
for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){
/* Wait until at least one message has been received from any process other than 0*/
MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);
if (index == MPI_UNDEFINED) perror("Errorrrrrrr");
printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);
if (send_counter < TOTAL_FILES){ /* if there are still images left to process */
MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
++send_counter;
MPI_Irecv (hostname[status.MPI_SOURCE], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[status.MPI_SOURCE]);
}
}
for (i=1; i<tprocs; i++) {
MPI_Isend(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
}
} else if (me < tprocs) { /* all other processes do this */
int y;
MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);
while (y != -1) {
printf("Process %d: Received image %d\n", me, y);
sleep(me%3+1); /* Let the processes sleep for 1-3 seconds */
/* Send own identifier back to process 0 */
MPI_Send (myname, namelen, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);
}
}
MPI_Finalize();
exit(0);
}
which is based on this example.
Right now I'm getting a segmentation fault, and I'm not sure why. I'm fairly new to MPI but I can't see a mistake in the code above. It only happens with certain numbers of processes, for example when TOTAL_FILES = 7 and it is run with 5, 6 or 7 processes. It works fine with 9 processes or above.
The entire code can be found here. Trying it with 6 processes causes the mentioned error.
To compile and execute :
mpicc -Wall sscce.c -o sscce -lm
mpirun -np 6 sscce
It's not MPI_Waitany that is causing the segmentation fault; it is the way you handle the case when all requests in recv_req[] are completed (i.e. index == MPI_UNDEFINED). perror() does not stop the code, so it continues further and then segfaults in the printf statement while trying to access hostname[index+1]. The reason all requests in the array end up completed is that, due to the use of MPI_ANY_SOURCE in the receive call, the rank of the sender is not guaranteed to be equal to the index of the request in recv_req[] - simply compare index and status.MPI_SOURCE after MPI_Waitany returns to see it for yourself. Therefore the subsequent calls to MPI_Irecv very likely overwrite still-pending requests, and thus the number of requests that MPI_Waitany can complete is less than the number of results actually expected.
Also note that you never wait for the send requests to complete. You are lucky that the Open MPI implementation uses an eager protocol to send small messages, and therefore those get sent even though MPI_Wait(any|all) or MPI_Test(any|all) is never called on the started send requests.
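A possible fix, sketched against the question's variables (not tested against the full program, just to show the idea): always index recv_req[] and hostname[] by the slot MPI_Waitany reports, abort cleanly on MPI_UNDEFINED, and complete the outstanding send requests at the end. A plain MPI_Send is used here for the small task id so that send_counter can be incremented immediately.
/* inside the loop over received results: */
int slot;
MPI_Waitany(tprocs-1, &recv_req[1], &slot, &status);
if (slot == MPI_UNDEFINED) {
    /* perror() alone does not stop the program; bail out explicitly */
    MPI_Abort(MPI_COMM_WORLD, 1);
}
slot += 1;   /* &recv_req[1] was passed, so shift back to full-array indexing */
printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[slot]);
if (send_counter < TOTAL_FILES) {
    /* a blocking send of the small task id keeps the sketch simple */
    MPI_Send(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD);
    ++send_counter;
    /* re-arm the slot that just completed, not recv_req[status.MPI_SOURCE] */
    MPI_Irecv(hostname[slot], namelen, MPI_CHAR, status.MPI_SOURCE, tag,
              MPI_COMM_WORLD, &recv_req[slot]);
}
/* ...and after the loop, complete the initial non-blocking sends: */
MPI_Waitall(tprocs-1, &send_req[1], MPI_STATUSES_IGNORE);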

MPI Debugging, Segmentation fault?

EDIT: My question is similar to C, Open MPI: segmentation fault from call to MPI_Finalize(). The segfault does not always happen, especially with low numbers of processes, so if you answer that one instead that would be great; either way . . .
I was hoping to get some help debugging the following code:
int main(){
long* my_local;
long n, s, f;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if(my_rank == 0){
/* Get size n from user */
printf("Total processes: %d\n", comm_sz);
printf("Number of keys to be sorted? ");
fflush(stdout);
scanf("%ld", &n);
/* Broadcast size n to other processes */
MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
/* Create n/comm_sz keys
NOTE! some processes will have 1 extra key if
n%comm_sz != 0 */
create_Keys(&my_local, my_rank, comm_sz, n, &s, &f);
}
if(my_rank != 0){
/* Receive n from process 0 */
MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
/* Create n/comm_sz keys */
create_Keys(&my_local, my_rank, comm_sz, n, &s, &f);
}
/* The offending function, f is a long set to num elements of my_local*/
Odd_Even_Tsort(&my_local, my_rank, f, comm_sz);
printf("Process %d completed the function", my_rank);
MPI_Finalize();
return 0;
}
void Odd_Even_Tsort(long** my_local, int my_rank, long my_size, int comm_sz)
{
long nochange = 1;
long phase = 0;
long complete = 1;
MPI_Status Stat;
long your_size = 1;
long* recv_buf = malloc(sizeof(long)*(my_size+1));
printf("rank %d has size %ld\n", my_rank, my_size);
while (complete!=0){
if((phase%2)==0){
if( ((my_rank%2)==0) && my_rank < comm_sz-1){
/* Send right */
MPI_Send(&my_size, 1, MPI_LONG, my_rank+1, 0, MPI_COMM_WORLD);
MPI_Send(*my_local, my_size, MPI_LONG, my_rank+1, 0, MPI_COMM_WORLD);
MPI_Recv(&your_size, 1, MPI_LONG, my_rank+1, 0, MPI_COMM_WORLD, &Stat);
MPI_Recv(&recv_buf, your_size, MPI_LONG, my_rank+1, 0, MPI_COMM_WORLD, &Stat);
}
if( ((my_rank%2)==1) && my_rank < comm_sz){
/* Send left */
MPI_Recv(&your_size, 1, MPI_LONG, my_rank-1, 0, MPI_COMM_WORLD, &Stat);
MPI_Recv(&recv_buf, your_size, MPI_LONG, my_rank-1, 0, MPI_COMM_WORLD, &Stat);
MPI_Send(&my_size, 1, MPI_LONG, my_rank-1, 0, MPI_COMM_WORLD);
MPI_Send(*my_local, my_size, MPI_LONG, my_rank-1, 0, MPI_COMM_WORLD);
}
}
phase ++;
complete = 0;
}
printf("Done!\n");
fflush(stdout);
}
And the Error I'm getting is:
[ubuntu:04968] *** Process received signal ***
[ubuntu:04968] Signal: Segmentation fault (11)
[ubuntu:04968] Signal code: Address not mapped (1)
[ubuntu:04968] Failing at address: 0xb
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 4968 on node ubuntu exited on signal 11 (Segmentation fault).
The reason I'm baffled is that the print statements after the function are still displayed, but if I comment out the function, no errors. So, where the heap am I getting a Segmentation fault?? I'm getting the error with mpiexec -n 2 ./a.out and an 'n' size bigger than 9.
If you actually want the entire runnable code, let me know. Really, I was hoping not so much for the precise answer but more for how to use the gdb/valgrind tools to debug this problem and others like it (and how to read their output).
(And yes, I realize the 'sort' function isn't sorting yet).
The problem here is simple, yet difficult to see unless you use a debugger or print out exhaustive debugging information:
Look at the code where MPI_Recv is called. recv_buf is already a pointer to the receive buffer, so it should be supplied as the argument instead of &recv_buf (which passes the address of the pointer variable itself, so the received data is written over the stack). The same applies to both MPI_Recv calls that fill recv_buf:
MPI_Recv( recv_buf , your_size, MPI_LONG, my_rank-1, 0, MPI_COMM_WORLD, &Stat);
The rest seems ok.

Unclear behaviour in simple MPI send/receive program

I've been having a bug in my code for some time and have not yet figured out how to solve it.
What I'm trying to achieve is easy enough: every worker node (i.e. every node with rank != 0) gets a row (represented by a 1-dimensional array) in a square structure that involves some computation. Once the computation is done, this row gets sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
the master sends a row number to a worker, and the worker uses that row number to calculate the corresponding values
the worker sends the array with the result values back
Now, my issue is this:
all works as expected up to a certain size for the number of elements in a row (size = 1006) and number of workers > 1
if the elements in a row exceed 1006, the workers fail to shut down and the program does not terminate
this only occurs if I try to send the array back to the master. If I simply send back an INT, then everything is OK (see commented out line in doMasterTasks() and doWorkerTasks())
Based on the last bullet point, I assume that there must be some race-condition which only surfaces when the array to be sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(size));
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you take a look at your doWorkerTasks, you see that they send exactly as many data messages as they receive (and they receive one more, to shut them down).
But your master code:
for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}

for (;;) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
    if (rowsLeft == 0)
        break;
    MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend = dimension;
int rowsLeftToReceive = dimension;

for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeftToSend--;
    currentRow++;
}

while (rowsLeftToReceive > 0) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
    rowsLeftToReceive--;

    if (rowsLeftToSend > 0) {
        if (currentRow > 1004)
            printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
        MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
        rowsLeftToSend--;
        currentRow++;
    }
}
Now works.
Why the code doesn't deadlock (note this is deadlock, not a race condition; this is a more common parallel error in distributed computing) for smaller message sizes is a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs rendezvous protocols).
So for the small message case (less than 1006 ints in this case, and 1 int definitely works, too) the worker nodes did their send whether or not the master was receiving them. If the master had called MPI_Recv(), the messages would have been there already and it would have returned immediately. But it didn't, so there were pending messages on the master side; but it didn't matter. The master sent out its kill messages, and everyone exited.
But for larger messages, the remaining send()s need the receiver to participate in order to clear, and since the receiver never does, the remaining workers hang.
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
    printf("All work has been done, shutting down clients now.\n");
    for (int i = 0; i < np; i++) {
        MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
    }
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle, that MPI_Send should deadlock, as it is a blocking send and there isn't a matching receive already posted. You could post a non-blocking receive before to avoid this, but that's unnecessary -- rank 0 doesn't need to let itself know to end. So just change the loop to
for (int i = 1; i < np; i++)
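With that one change the routine would become:
void shutdownWorkers() {
    printf("All work has been done, shutting down clients now.\n");
    /* start at 1: rank 0 is the sender and never posts a matching receive */
    for (int i = 1; i < np; i++) {
        MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
    }
}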
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.
