So, somehow MPI_Probe keeps seeing the same message even though it is only sent once.
I execute the program with only 2 processes, of which process 1 sends two messages: one to retrieve a task and another one to send back the result. I send the messages with different tags to differentiate the two.
So what's supposed to happen is the following:
Process 0 waits for a task request
Process 1 sends a task request indicated by tag=0
Process 0 sends the task
Process 1 does the task and sends the results back to process 0 indicated by tag=1.
-- This is where the first problem occurs: process 0 still sees tag=0, as shown by the printf
Process 0 receives the result and enters the else-if block where tag==1 -- it does not.
Process 1 breaks the while loop - DEBUGGING PURPOSE
Process 0 is supposed to block in MPI_Probe, but it does not; instead it continues to run and keeps reporting that the received tag is still 0.
The code is still messy and inefficient; I just want a minimal working program to build upon and optimize. Still, any tip is appreciated!
The code:
if(rank == 0) {
    struct stack* idle_stack = init_stack(env_size-1);
    struct sudoku_stack* sudoku_stack_ptr = init_sudoku_stack(256);
    push_sudoku(sudoku_stack_ptr, sudoku);

    int it=0;
    while(1) {
        printf("ITERATION %d\n", it);
        int idle_stack_size = stack_size(idle_stack);
        int _sudoku_stack_empty = sudoku_stack_empty(sudoku_stack_ptr);

        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("TAG: %d\n", status.MPI_TAG);

        // So this part is supposed to be entered once
        if(status.MPI_TAG == 0) {
            if(!sudoku_stack_empty(sudoku_stack_ptr)) {
                printf("SENDING TASK\n");
                int *next_sudoku = pop_sudoku(sudoku_stack_ptr);
                MPI_Send(next_sudoku, v_size, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
            } else {
                // But since the Tag stays 0, it is called multiple times until
                // a stack overflow occurs
                printf("PUSHING TO IDLE STACK\n");
                push(idle_stack, status.MPI_SOURCE);
            }
        } else if(status.MPI_TAG == 1) {
            // This part should actually be entered by the second received message
            printf("RECEIVING SOLUTION\n");
            int count;
            MPI_Get_count(&status, MPI_INT, &count);
            int* recv_sudokus = (int*)malloc(count * sizeof(int));
            MPI_Recv(recv_sudokus, count, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            for(int i = 0; i < count; i+=v_size) {
                printf("%d ", recv_sudokus[i]);
                if((i+1) % m_size == 0){
                    printf("\n");
                }
            }
            // DEBUG - EXIT PROGRAM
            teardown(sudoku_stack_ptr, idle_stack);
            break;
            push_sudoku(sudoku_stack_ptr, recv_sudokus);
        } else if(status.MPI_TAG == 2) {
            //int* solved_sudoku = (int*)malloc(v_size * sizeof(int));
            //MPI_Recv(solved_sudoku, v_size, MPI_INT, status.MPI_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            //TODO
        }
        it++;
    }
} else {
    int* sudoku = (int*)malloc(sizeof(int)*v_size);
    int* possible_sudokus = (int*)malloc(sizeof(int)*m_size*v_size);

    while(1) {
        // Send task request
        printf("REQUESTING TASK\n");
        int i = 0;
        MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);

        // Wait for and receive task
        printf("RECEIVING TASK\n");
        MPI_Recv(sudoku, v_size, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("CALCULATING\n");
        int index = 0;
        for(int i = 1; i <= m_size; i++) {
            int is_safe_res = is_safe_first_empty_cell(sudoku, i);
            if(is_safe_res) {
                int* sudoku_cp = (int*)malloc(sizeof(int)*v_size);
                memcpy(sudoku_cp, sudoku, sizeof(int)*v_size);
                insert_to_first_empty_cell(sudoku_cp, i);
                memcpy(&possible_sudokus[index], sudoku_cp, sizeof(int)*v_size);
                index+=v_size;
                free(sudoku_cp);
            }
        }

        printf("SENDING\n");
        MPI_Send(possible_sudokus, index*v_size, MPI_INT, 0, 1, MPI_COMM_WORLD);
        break;
    }
}
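For reference (this is not from the original post, just a guess grounded in standard MPI semantics): MPI_Probe only inspects a pending message, it never consumes it. Since the master above never posts a matching MPI_Recv for the tag=0 request, that same request keeps satisfying every subsequent probe. A minimal sketch of a tag==0 branch that consumes the request before answering, assuming the worker's request payload is the single int sent in the code above:

if(status.MPI_TAG == 0) {
    int request_payload;
    // Actually dequeue the probed request; otherwise the next MPI_Probe
    // will match this same message again.
    MPI_Recv(&request_payload, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if(!sudoku_stack_empty(sudoku_stack_ptr)) {
        int *next_sudoku = pop_sudoku(sudoku_stack_ptr);
        MPI_Send(next_sudoku, v_size, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
    } else {
        push(idle_stack, status.MPI_SOURCE);
    }
}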
I am solving a load-balancing problem using MPI: a master process sends tasks to the slave processes and collects the results as they finish their work.
Since I want to improve performance as much as possible, I use non-blocking communication: the master sends several tasks and then waits until one process sends back its response, so that the master can send it additional work, and so on.
I use MPI_Waitany() since I don't know in advance which slave process responds first; then I get the sender from the status and can send the new job to it.
My problem is that sometimes the sender I get is wrong (a rank not in MPI_COMM_WORLD) and the program crashes; other times it works fine.
Here's the code. Thanks!
//master
if (rank == 0) {
    int N_chunks = 10;
    MPI_Request request[N_chunks];
    MPI_Status status[N_chunks];
    int N_computed = 0;
    int dest, index_completed;

    //initialize array of my data structure
    vec send[N_chunks];
    vec recv[N_chunks];

    //send one job to each process in communicator
    for(int i=1;i<size;i++){
        MPI_Send(&send[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD);
        MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD, &request[N_computed]);
        N_computed++;
    }

    // loop
    while (N_computed < N_chunks){
        //get processed messages
        MPI_Waitany(N_computed,request,&index_completed,status);
        //get sender ID dest
        dest = status[index_completed].MPI_SOURCE;
        //send a new job to that process
        MPI_Send(&send[N_computed], 1, mpi_vec_type, dest, tag, MPI_COMM_WORLD);
        MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, dest, tag, MPI_COMM_WORLD, &request[N_computed]);
        N_computed++;
    }

    MPI_Waitall(N_computed,request,status);

    //close all process
    printf("End master\n");
}
You are not using MPI_Waitany() correctly.
It should be
MPI_Status status;
MPI_Waitany(N_computed,request,&index_completed,&status);
dest = status.MPI_SOURCE;
Note:
you need an extra loop to MPI_Wait() the last size - 1 requests
you can revamp your algorithm and use MPI_Request request[size-1]; and hence save some memory
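For illustration, here is a sketch of what that note could look like in code (an assumption-laden sketch, not the original poster's code; it reuses the vec, mpi_vec_type, send, recv, tag, size and N_chunks variables from the question). The request array has one slot per worker, the slot returned by MPI_Waitany is reused for the next MPI_Irecv, and a final MPI_Waitall drains the last size - 1 requests:

MPI_Request request[size - 1];
int N_computed = 0;

// one job per worker to start with
for (int i = 1; i < size; i++) {
    MPI_Send(&send[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD);
    MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD, &request[i - 1]);
    N_computed++;
}

while (N_computed < N_chunks) {
    MPI_Status status;
    int index_completed;
    MPI_Waitany(size - 1, request, &index_completed, &status);
    // hand the next chunk to whoever just answered, reusing the freed request slot
    MPI_Send(&send[N_computed], 1, mpi_vec_type, status.MPI_SOURCE, tag, MPI_COMM_WORLD);
    MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, status.MPI_SOURCE, tag, MPI_COMM_WORLD,
              &request[index_completed]);
    N_computed++;
}

// drain the last size - 1 outstanding receives
MPI_Waitall(size - 1, request, MPI_STATUSES_IGNORE);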
I forgot to add a line in the initial post in which I wait for all the pending requests. By the way, if I initialize a new status every time I do Waitany, the master process crashes; I need to track which processes are still pending in order to wait the proper number of times...
Thanks by the way.
EDIT: now it works, even if I find it not very elegant; would it be possible to initialize an array of MPI_Status at the beginning instead of declaring one each time before a wait?
//master
if (rank == 0) {
    int N_chunks = 10;
    MPI_Request request[size-1];
    int N_computed = 0;
    int dest;
    int index_completed;

    //initialize array of vec
    vec send[N_chunks];
    vec recv[N_chunks];

    //initial case
    for(int i=1;i<size;i++){
        MPI_Send(&send[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD);
        MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, i, tag, MPI_COMM_WORLD, &request[N_computed]);
        N_computed++;
    }

    // loop
    while (N_computed < N_chunks){
        MPI_Status status;
        //get processed messages
        MPI_Waitany(N_computed,request,&index_completed,&status);
        //get sender ID dest
        dest = status.MPI_SOURCE;
        MPI_Send(&send[N_computed], 1, mpi_vec_type, dest, tag, MPI_COMM_WORLD);
        MPI_Irecv(&recv[N_computed], 1, mpi_vec_type, dest, tag, MPI_COMM_WORLD, &request[N_computed]);
        N_computed++;
    }

    //wait other process to send back their load
    for(int i=0;i<size-1;i++){
        MPI_Status status;
        MPI_Waitany(N_computed, request, &index_completed,&status);
    }

    //end
    printf("Ms finish\n");
}
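Regarding the question in the EDIT (again a sketch, not from the original thread, and it assumes the request array is kept at size - 1 entries as the answer suggests): yes, the statuses can be declared once up front. A status object is just plain storage that each wait call overwrites, so a single MPI_Status before the loop can be reused by every MPI_Waitany, and the final loop of waits can be collapsed into one MPI_Waitall:

// declared once, before any of the wait loops
MPI_Status status;                    // reused by every MPI_Waitany call in the while loop
MPI_Status final_statuses[size - 1];  // C99 variable-length array, sized up front

// ...at the end, instead of looping MPI_Waitany size - 1 times:
MPI_Waitall(size - 1, request, final_statuses);
// or, if the final statuses are never inspected:
MPI_Waitall(size - 1, request, MPI_STATUSES_IGNORE);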
I have a piece of code that, unfortunately, I couldn't run, but I was trying to figure out whether it has a logical error or whether something is missing. Here is the code:
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int numtasks, rank, dest, source, rc, count, tag=1;
    char inmsg, outmsg='x';
    MPI_Status Stat;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        dest = 1;
        source = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    }
    else if (rank == 1) {
        dest = 0;
        source = 0;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    }

    rc = MPI_Get_count(&Stat, MPI_CHAR, &count);
    printf("Task %d: Received %d char(s) from task %d with tag %d \n",
           rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
    MPI_Finalize();
}
Also, is it allowed to store the return value of an MPI send or receive in a variable, as is done here with rc?
Your code is wrong. It contains a deadlock, which means that it can hang forever or otherwise misbehave. MPI_Send is a blocking operation - it may block until the respective MPI_Recv is called. So both processes will be stuck at their respective MPI_Send operations before MPI_Recv is called. Use MPI_Sendrecv instead.
Note that due to optimizations, MPI may instead choose to send the data immediately for small messages, so the code may complete even though it is wrong. Do not rely on that!
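To illustrate, here is a minimal sketch (not the poster's code; it assumes exactly the two-rank, one-char exchange from the question) of the same ping-pong written with MPI_Sendrecv, where the library pairs the send and the receive so neither rank can block the other:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, tag = 1;
    char inmsg, outmsg = 'x';
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {
        int partner = 1 - rank;   // rank 0 talks to rank 1 and vice versa
        // send and receive are combined, so there is no send/send deadlock
        MPI_Sendrecv(&outmsg, 1, MPI_CHAR, partner, tag,
                     &inmsg, 1, MPI_CHAR, partner, tag,
                     MPI_COMM_WORLD, &stat);
        printf("Task %d: received '%c' from task %d\n", rank, inmsg, stat.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}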
Normally, you don't have to check MPI return codes, as errors are fatal in MPI by default. In particular, don't assign the return code without checking it for MPI_SUCCESS.
Note that you can easily install MPI on any system, e.g. OpenMPI is available for most Linux distributions. There is no reason not to play around with MPI on a normal desktop system.
Is it possible to do an MPI_Sendrecv exchange where one side does not know the rank of the other? If not, what is the best way to do that (my next guess would just be a pair of sends and recvs)?
For example, in C, if I want to exchange integers between rank 0 and some other rank, would this type of thing work?
MPI_Status stat;
if(rank){
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}else{
    int someotherval = 1;
    MPI_Sendrecv(&someotherval, 1, MPI_INT, MPI_ANY_SOURCE, someotherval, &recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}
EDIT:
Looks like it is not possible. I whipped up the following as a sort of wrapper to add the functionality that I need.
void slave_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    int dest, int sendtag, void *recvbuf, int recvcount,
                    MPI_Datatype recvtype, int source, int recvtag, MPI_Status *status){
    MPI_Send(sendbuf, sendcount, sendtype, dest, sendtag, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, recvcount, recvtype, source, recvtag, MPI_COMM_WORLD, status);
}

void anon_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   int sendtag, void *recvbuf, int recvcount,
                   MPI_Datatype recvtype, int recvtag, MPI_Status *status){
    int anon_rank;
    MPI_Recv(recvbuf, recvcount, recvtype, MPI_ANY_SOURCE, recvtag, MPI_COMM_WORLD, status);
    anon_rank = status->MPI_SOURCE;
    MPI_Send(sendbuf, sendcount, sendtype, anon_rank, sendtag, MPI_COMM_WORLD);
}
EDIT 2: Based on Patrick's answer, it looks like the slave_sendrecv function above is not needed; you can just use a regular MPI_Sendrecv on the end that knows who it is sending to.
Short answer: No.
The standard does not allow the use of MPI_ANY_SOURCE as the destination rank dest in any send procedure. This makes sense, since you cannot send a message without knowing the destination.
The standard does, however, permit you to pair an MPI_Sendrecv with a regular MPI_Send/MPI_Recv:
A message sent by a send-receive operation can be received by a regular receive operation or probed by a probe operation; a send-receive operation can receive a message sent by a regular send operation.
In your case, process 0 will have to first receive, and then answer:
MPI_Status stat;
if(rank){
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}else{
    int someotherval = 1;
    MPI_Recv(&recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
    // answer to process `stat.MPI_SOURCE` using `someotherval` as tag
    MPI_Send(&someotherval, 1, MPI_INT, stat.MPI_SOURCE, someotherval, MPI_COMM_WORLD);
}
I've been having a bug in my code for some time and could not figure out yet how to solve it.
What I'm trying to achieve is easy enough: every worker node (i.e. a node with rank != 0) gets a row (represented by a one-dimensional array) in a square structure that involves some computation. Once the computation is done, the row gets sent back to the master.
For testing purposes, there is no computation involved. All that's happening is:
the master sends a row number to a worker; the worker uses the row number to calculate the corresponding values
the worker sends the array with the result values back
Now, my issue is this:
everything works as expected up to a certain number of elements per row (size = 1006) and number of workers > 1
if the elements in a row exceed 1006, the workers fail to shut down and the program does not terminate
this only occurs if I try to send the array back to the master; if I simply send back an INT, then everything is OK (see the commented-out lines in doMasterTasks() and doWorkerTasks())
Based on the last bullet point, I assume that there must be some race condition that only surfaces when the array to be sent back to the master reaches a certain size.
Do you have any idea what the issue could be?
Compile the following code with: mpicc -O2 -std=c99 -o simple
Run the executable like so: mpirun -np 3 simple <size> (e.g. 1006 or 1007)
Here's the code:
#include "mpi.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MASTER_RANK 0
#define TAG_RESULT 1
#define TAG_ROW 2
#define TAG_FINISHOFF 3
int mpi_call_result, my_rank, dimension, np;
// forward declarations
void doInitWork(int argc, char **argv);
void doMasterTasks(int argc, char **argv);
void doWorkerTasks(void);
void finalize();
void quit(const char *msg, int mpi_call_result);
void shutdownWorkers() {
printf("All work has been done, shutting down clients now.\n");
for (int i = 0; i < np; i++) {
MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
}
}
void doMasterTasks(int argc, char **argv) {
printf("Starting to distribute work...\n");
int size = dimension;
int * dataBuffer = (int *) malloc(sizeof(int) * size);
int currentRow = 0;
int receivedRow = -1;
int rowsLeft = dimension;
MPI_Status status;
for (int i = 1; i < np; i++) {
MPI_Send(¤tRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
for (;;) {
// MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
MPI_Recv(&receivedRow, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (rowsLeft == 0)
break;
if (currentRow > 1004)
printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
MPI_Send(¤tRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
rowsLeft--;
currentRow++;
}
shutdownWorkers();
free(dataBuffer);
}
void doWorkerTasks() {
printf("Worker %d started\n", my_rank);
// send the processed row back as the first element in the colours array.
int size = dimension;
int * data = (int *) malloc(sizeof(int) * size);
memset(data, 0, sizeof(size));
int processingRow = -1;
MPI_Status status;
for (;;) {
MPI_Recv(&processingRow, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
if (status.MPI_TAG == TAG_FINISHOFF) {
printf("Finish-OFF tag received!\n");
break;
} else {
// MPI_Send(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
MPI_Send(&processingRow, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
}
}
printf("Slave %d finished work\n", my_rank);
free(data);
}
int main(int argc, char **argv) {
if (argc == 2) {
sscanf(argv[1], "%d", &dimension);
} else {
dimension = 1000;
}
doInitWork(argc, argv);
if (my_rank == MASTER_RANK) {
doMasterTasks(argc, argv);
} else {
doWorkerTasks();
}
finalize();
}
void quit(const char *msg, int mpi_call_result) {
printf("\n%s\n", msg);
MPI_Abort(MPI_COMM_WORLD, mpi_call_result);
exit(mpi_call_result);
}
void finalize() {
mpi_call_result = MPI_Finalize();
if (mpi_call_result != 0) {
quit("Finalizing the MPI system failed, aborting now...", mpi_call_result);
}
}
void doInitWork(int argc, char **argv) {
mpi_call_result = MPI_Init(&argc, &argv);
if (mpi_call_result != 0) {
quit("Error while initializing the system. Aborting now...\n", mpi_call_result);
}
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
}
Any help is greatly appreciated!
Best,
Chris
If you take a look at your doWorkerTasks, you see that the workers send exactly as many data messages as they receive (and they receive one more, to shut them down).
But your master code:
for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}

for (;;) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);

    if (rowsLeft == 0)
        break;

    MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
    rowsLeft--;
    currentRow++;
}
sends np-2 more data messages than it receives. In particular, it only keeps receiving data until it has no more to send, even though there should be np-2 more data messages outstanding. Changing the code to the following:
int rowsLeftToSend = dimension;
int rowsLeftToReceive = dimension;

for (int i = 1; i < np; i++) {
    MPI_Send(&currentRow, 1, MPI_INT, i, TAG_ROW, MPI_COMM_WORLD);
    rowsLeftToSend--;
    currentRow++;
}

while (rowsLeftToReceive > 0) {
    MPI_Recv(dataBuffer, size, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
    rowsLeftToReceive--;

    if (rowsLeftToSend > 0) {
        if (currentRow > 1004)
            printf("Sending row %d to worker %d\n", currentRow, status.MPI_SOURCE);
        MPI_Send(&currentRow, 1, MPI_INT, status.MPI_SOURCE, TAG_ROW, MPI_COMM_WORLD);
        rowsLeftToSend--;
        currentRow++;
    }
}
Now works.
Why the code doesn't deadlock (note this is deadlock, not a race condition; this is a more common parallel error in distributed computing) for smaller message sizes is a subtle detail of how most MPI implementations work. Generally, MPI implementations just "shove" small messages down the pipe whether or not the receiver is ready for them, but larger messages (since they take more storage resources on the receiving end) need some handshaking between the sender and the receiver. (If you want to find out more, search for eager vs rendezvous protocols).
So for the small message case (less than 1006 ints in this case, and 1 int definitely works, too) the worker nodes did their send whether or not the master was receiving them. If the master had called MPI_Recv(), the messages would have been there already and it would have returned immediately. But it didn't, so there were pending messages on the master side; but it didn't matter. The master sent out its kill messages, and everyone exited.
But for larger messages, the remaining send()s need the receiver participating to complete, and since the receiver never does, the remaining workers hang.
Note that even for the small message case where there was no deadlock, the code didn't work properly - there was missing computed data.
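A common trick for exposing this kind of latent deadlock regardless of message size (not part of the original answer, just a general debugging technique) is to switch the workers' blocking sends to synchronous sends while testing:

// In doWorkerTasks(): MPI_Ssend has the same signature as MPI_Send but always
// waits for the matching receive to be posted, so the eager-protocol shortcut
// for small messages disappears and the missing receives on the master side
// show up immediately.
MPI_Ssend(data, size, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);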
Update: There was a similar problem in your shutdownWorkers:
void shutdownWorkers() {
    printf("All work has been done, shutting down clients now.\n");
    for (int i = 0; i < np; i++) {
        MPI_Send(0, 0, MPI_INT, i, TAG_FINISHOFF, MPI_COMM_WORLD);
    }
}
Here you are sending to all processes, including rank 0, the one doing the sending. In principle, that MPI_Send should deadlock, as it is a blocking send and there isn't a matching receive already posted. You could post a non-blocking receive beforehand to avoid this, but that's unnecessary -- rank 0 doesn't need to let itself know to end. So just change the loop to
for (int i = 1; i < np; i++)
tl;dr - your code deadlocked because the master wasn't receiving enough messages from the workers; it happened to work for small message sizes because of an implementation detail common to most MPI libraries.