MPI Sendrecv with MPI_ANY_SOURCE - c

Is it possible to do an MPI_Sendrecv exchange where one side does not know the rank of the other? If not, what is the best way to do that (my next guess would just be a pair of sends and recvs)?
For example, in C, if I want to exchange integers between rank 0 and some other rank, would this type of thing work?
MPI_Status stat;
if(rank){
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}else{
    int someotherval = 1;
    MPI_Sendrecv(&someotherval, 1, MPI_INT, MPI_ANY_SOURCE, someotherval, &recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}
EDIT:
Looks like it is not possible. I whipped up the following as a sort of wrapper to add the functionality that I need.
void slave_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    int dest, int sendtag, void *recvbuf, int recvcount,
                    MPI_Datatype recvtype, int source, int recvtag, MPI_Status *status){
    MPI_Send(sendbuf, sendcount, sendtype, dest, sendtag, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, recvcount, recvtype, source, recvtag, MPI_COMM_WORLD, status);
}

void anon_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   int sendtag, void *recvbuf, int recvcount,
                   MPI_Datatype recvtype, int recvtag, MPI_Status *status){
    int anon_rank;
    MPI_Recv(recvbuf, recvcount, recvtype, MPI_ANY_SOURCE, recvtag, MPI_COMM_WORLD, status);
    anon_rank = status->MPI_SOURCE;
    MPI_Send(sendbuf, sendcount, sendtype, anon_rank, sendtag, MPI_COMM_WORLD);
}
EDIT 2: Based on Patrick's answer, it looks like the slave_sendrecv function above is not needed: you can just use a regular MPI_Sendrecv on the end that knows who it's sending to.
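A minimal usage sketch of the pattern from EDIT 2 (rank roles, tags and values are illustrative, not from the original post): the rank that knows its partner calls MPI_Sendrecv directly, while rank 0 uses anon_sendrecv to discover its partner from the incoming message.

int sendval = rank, recvval = -1;
MPI_Status st;
if(rank != 0){
    // This end knows its partner (rank 0), so plain MPI_Sendrecv works:
    // send with tag 1, receive the reply with tag 2.
    MPI_Sendrecv(&sendval, 1, MPI_INT, 0, 1, &recvval, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &st);
}else{
    // This end learns its partner from status->MPI_SOURCE inside anon_sendrecv:
    // it receives with tag 1 first, then answers with tag 2.
    anon_sendrecv(&sendval, 1, MPI_INT, 2, &recvval, 1, MPI_INT, 1, &st);
}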

Short answer: No.
The standard does not allow the use of MPI_ANY_SOURCE as the destination rank dest in any send procedure. This makes sense, since you cannot send a message without knowing the destination.
The standard does, however, permit you to pair an MPI_Sendrecv with a regular MPI_Send/MPI_Recv:
A message sent by a send-receive operation can be received by a regular receive operation or probed by a probe operation; a send-receive operation can receive a message sent by a regular send operation.
In your case, process 0 will have to first receive, and then answer:
MPI_Status stat;
if(rank){
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}else{
    int someotherval = 1;
    MPI_Recv(&recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
    // answer to process `stat.MPI_SOURCE` using `someotherval` as tag
    MPI_Send(&someotherval, 1, MPI_INT, stat.MPI_SOURCE, someotherval, MPI_COMM_WORLD);
}

Related

MPI_Isend/MPI_Irecv issue

I am writing an MPI program that has the first instance working as a master, sending and receiving results from its workers.
The receive function does something like this:
struct result *check_for_message(void) {
    ...
    static unsigned int message_size;
    static char *buffer;
    static bool started_reception = false;
    static MPI_Request req;
    if (!started_reception) {
        MPI_Irecv(&message_size, 1, MPI_INT, MPI_ANY_SOURCE, SIZE_TAG,
                  MPI_COMM_WORLD, &req);
        started_reception = true;
    } else {
        int flag = 0;
        MPI_Status status;
        MPI_Test(&req, &flag, &status);
        if (flag == 1) {
            started_reception = false;
            buffer = calloc(message_size + 1, sizeof(char));
            DIE_IF_NULL(buffer); // printf + MPI_Finalize + exit
            MPI_Request content_req;
            MPI_Irecv(buffer, MAX_MSG_SIZE, MPI_CHAR, status.MPI_SOURCE, CONTENT_TAG,
                      MPI_COMM_WORLD, &content_req);
            MPI_Wait(&content_req, MPI_STATUS_IGNORE);
            ret = process_request(buffer);
            free(buffer);
        }
    }
    ...
}
The send function does something like this:
MPI_Request size_req;
MPI_Request content_req;
MPI_Isend(&size, 1, MPI_INT, dest, SIZE_TAG, MPI_COMM_WORLD, &size_req);
MPI_Wait(&size_req, MPI_STATUS_IGNORE);
MPI_Isend(buf, size, MPI_CHAR, dest, CONTENT_TAG, MPI_COMM_WORLD,
          &content_req);
MPI_Wait(&content_req, MPI_STATUS_IGNORE);
I noticed that if I remove the MPI_Wait in the sending function, the execution often blocks, or some signal stops one of the instances (from the output I believe it was a SIGSEGV caused by a free error).
When I add the MPI_Wait it always seems to run perfectly. Could it be related to the order in which the two sends complete? Aren't they supposed to arrive in order?
I run the program locally with -n 16 but have also tested with -n 128. The messages that I send are above 50 chars (90% of the time), some being even > 300 chars.
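For reference, the rule at play here is that a buffer handed to MPI_Isend must not be modified, freed or reused until the corresponding request has completed. A minimal sketch of a safe pattern, reusing the names from the snippet above (illustrative, not the original code):

// Both sends completed together; `size` and `buf` must stay valid and
// unmodified until MPI_Waitall returns.
MPI_Request reqs[2];
MPI_Isend(&size, 1, MPI_INT, dest, SIZE_TAG, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(buf, size, MPI_CHAR, dest, CONTENT_TAG, MPI_COMM_WORLD, &reqs[1]);
// ... other work can overlap here, as long as `size` and `buf` are untouched ...
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);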

Why do you need to find the length with MPI_Probe when you also indicate the message length in the send/receive functions?

The following code is an MPI implementation. A message of increasing length is sent and returned; with each iteration the number of elements in the message increases.
There are 2 statements in the code that I don't understand.
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
At first sight they did not look necessary for passing the message back and forth, but when I deleted them and compiled my code, it gave an error.
I also checked this post: what is the difference between MPI_Probe and MPI_Get_count in mpi
""While MPI_Probe may be used to find the size of a message you have to use MPI_Get_count to get that size. MPI_Probe returns a status which is a data structure providing information about the message, including its source, tag and size. But to get that size you call MPI_Get_count with the status as an argument.""
Why is it important to determine the size and length of the message with MPI_Probe and MPI_Get_count? This confuses me, because you already specify the number of elements to send and receive in the MPI_Send and MPI_Recv calls.
for (message_size = 1; message_size <= MAX_ARRAY_SIZE; message_size <<= 1)
{
    // Use a loop to vary the message size
    if (myRank == 0)
    {
        double startTime, endTime;
        numberOfElementsToSend = message_size;
        printf("Rank %2.1i: Sending %i elements\n", myRank, numberOfElementsToSend);
        // Measure the time spent in MPI communication
        // (use the variables startTime and endTime)
        startTime = MPI_Wtime();
        MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 1, 0,
                 MPI_COMM_WORLD);
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
        MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        endTime = MPI_Wtime();
        printf("Rank %2.1i: Received %i elements\n",
               myRank, numberOfElementsReceived);
        printf("Ping Pong took %f seconds\n", endTime - startTime);
    }
    else if (myRank == 1)
    {
        // Probe message in order to obtain the amount of data
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
        MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank %2.1i: Received %i elements\n",
               myRank, numberOfElementsReceived);
        numberOfElementsToSend = numberOfElementsReceived;
        printf("Rank %2.1i: Sending back %i elements\n",
               myRank, numberOfElementsToSend);
        MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 0, 0,
                 MPI_COMM_WORLD);
    }
}
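To illustrate the quoted explanation: MPI_Probe plus MPI_Get_count are typically used when the receiver does not know the incoming element count ahead of time, for example to size the receive buffer before posting MPI_Recv. A minimal sketch (source, tag and variable names are illustrative and assume <stdlib.h> is included):

MPI_Status st;
int count;
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);  // block until a message is pending
MPI_Get_count(&st, MPI_INT, &count);                          // how many MPI_INTs it carries
int *data = malloc(count * sizeof(int));                      // size the buffer from the probed count
MPI_Recv(data, count, MPI_INT, st.MPI_SOURCE, st.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
free(data);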

BAD TERMINATION in MPI Bsend

I'm trying to send a packed structure with MPI_Bsend(). I'm doing something wrong and I cannot find the solution.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "mpi.h"
#define SIZE 10
struct car {
    int id;
    int vmax;
    char marka[SIZE];
    char model[SIZE];
};

int main(int argc, char **argv) {
    int i;
    int rank, size;
    double t1, t2;
    struct car BMW, BMW2;

    BMW.id = 1;
    strcpy(BMW.marka, "BMW");
    strcpy(BMW.model, "szybki");
    BMW.vmax = 199;

    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rozmiar, packet_size, msg_size, position = 0, tag;
    void *bufor;

    MPI_Pack_size(2, MPI_INT, MPI_COMM_WORLD, &rozmiar);
    packet_size = rozmiar;
    MPI_Pack_size(2 * SIZE, MPI_CHAR, MPI_COMM_WORLD, &rozmiar);
    packet_size += rozmiar;
    msg_size = 2 * packet_size + MPI_BSEND_OVERHEAD;
    bufor = (void *)malloc(msg_size);
    MPI_Buffer_attach(bufor, msg_size);

    t1 = MPI_Wtime();
    if (rank == 0) {
        tag = 0;
        for(i = 1; i < size; i++){
            MPI_Pack(&BMW.id, 1, MPI_INT, bufor, msg_size, &position, MPI_COMM_WORLD);
            MPI_Pack(&BMW.vmax, 1, MPI_INT, bufor, msg_size, &position, MPI_COMM_WORLD);
            MPI_Pack(&BMW.model, SIZE, MPI_CHAR, bufor, msg_size, &position, MPI_COMM_WORLD);
            MPI_Pack(&BMW.marka, SIZE, MPI_CHAR, bufor, msg_size, &position, MPI_COMM_WORLD);
            MPI_Bsend(bufor, position, MPI_PACKED, i, tag, MPI_COMM_WORLD);
        }
    } else {
        MPI_Recv(bufor, msg_size, MPI_PACKED, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        position = 0;
        MPI_Unpack(bufor, msg_size, &position, &BMW2.id, 1, MPI_INT, MPI_COMM_WORLD);
        MPI_Unpack(bufor, msg_size, &position, &BMW2.vmax, 1, MPI_INT, MPI_COMM_WORLD);
        MPI_Unpack(bufor, msg_size, &position, &BMW2.model, SIZE, MPI_CHAR, MPI_COMM_WORLD);
        MPI_Unpack(bufor, msg_size, &position, &BMW2.marka, SIZE, MPI_CHAR, MPI_COMM_WORLD);
        printf("rank = %d | BMW id: %d, marka: %s, model: %s, vmax: %d \n", rank, BMW2.id, BMW2.marka, BMW2.model, BMW2.vmax);
    }
    t2 = MPI_Wtime();

    MPI_Buffer_detach(&bufor, &msg_size);
    MPI_Finalize();

    if (i == size)
        printf("Elapsed time is %.15f\n", t2 - t1);

    return(0);
}
Error:
====================================================================
BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
PID 25637 RUNNING AT debian
EXIT CODE: 11
================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
(signal 11)
You are using the buffered mode of MPI incorrectly. The buffer you give to MPI via MPI_Buffer_attach is supposed to be used by MPI internally, not by your own code. Do not use the buffered MPI interface; it is very rarely useful and very difficult to get right.
Just remove the MPI_Buffer_ calls and use MPI_Send instead of MPI_Bsend and you are on the right track. MPI_Pack can be a bit clumsy, so you may want to look into custom datatypes (MPI_Type_create_struct) instead. If you have a homogeneous system, you can also send the raw bytes of the struct car.
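For illustration, a minimal sketch of the custom-datatype approach mentioned above, describing struct car with MPI_Type_create_struct (one possible layout description, not code from the original answer; it reuses rank, BMW, BMW2 and SIZE from the question):

#include <stddef.h>  // offsetof

// Describe struct car as an MPI datatype so it can be sent directly.
MPI_Datatype car_type;
int          blocklens[4] = {1, 1, SIZE, SIZE};
MPI_Aint     displs[4]    = {offsetof(struct car, id),
                             offsetof(struct car, vmax),
                             offsetof(struct car, marka),
                             offsetof(struct car, model)};
MPI_Datatype types[4]     = {MPI_INT, MPI_INT, MPI_CHAR, MPI_CHAR};

MPI_Type_create_struct(4, blocklens, displs, types, &car_type);
MPI_Type_commit(&car_type);

if (rank == 0)
    MPI_Send(&BMW, 1, car_type, 1, 0, MPI_COMM_WORLD);
else if (rank == 1)
    MPI_Recv(&BMW2, 1, car_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

MPI_Type_free(&car_type);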

Alternative to MPI_Sendrecv_replace

I am trying to write an alternative to the following code
if(rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if(rank % 2 == 1 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);

if (rank % 2 == 1 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if (rank % 2 == 0 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);
using MPI_Send and MPI_Recv, but it seems to deadlock. Is there any easy way of doing the same with MPI_Send and MPI_Recv?
I have tried using
if(rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL){
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
}
This code:
if(rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if(rank % 2 == 1 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);
is equivalent to:
if(rank % 2 == 0)
{
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
}
else if(rank % 2 == 1)
{
    double *temp = malloc(len * sizeof(double));
    MPI_Recv(temp, len, MPI_DOUBLE, leftNeighbour, 1,
             MPI_COMM_WORLD, &status);
    MPI_Send(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
             MPI_COMM_WORLD);
    memcpy(&bufferLeft[0], temp, len * sizeof(double));
    free(temp);
}
Note that order of the send and receive calls is reversed in the odd ranks. Also, the receive uses a temporary buffer in order to implement the semantics of MPI_Sendrecv_replace, which guarantees that the data in the buffer is first sent and only then overwritten with the received one.
Note that the check whether a rank is not MPI_PROC_NULL is pointless since a send to/receive from MPI_PROC_NULL is essentially a no-op and will always succeed. One of the key ideas of the semantics of MPI_PROC_NULL is to facilitate the writing of symmetric code that doesn't contain such if's.
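For illustration, the same exchange written without those guards; exchanges with an MPI_PROC_NULL neighbour simply complete without transferring data (an illustrative rewrite, assuming the boundary ranks have their missing neighbour set to MPI_PROC_NULL):

// Same even/odd phasing as the original, but with no MPI_PROC_NULL checks:
// a send to / receive from MPI_PROC_NULL returns immediately as a no-op.
if (rank % 2 == 0)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);

if (rank % 2 == 1)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);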
It is better to use MPI_Irecv and MPI_Isend rather than the blocking calls (MPI_Recv and MPI_Send). Then, after issuing the communication routines, simply wait for the requests using MPI_Waitall (or two calls to MPI_Wait). However, to do this you cannot use the same buffer (i.e. the replace behaviour): you need two separate buffers, because otherwise their contents could be overwritten by the incoming data before the outgoing data has actually been sent.
Let A be the incoming buffer and B the outgoing buffer; your code should look something like:
if(rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL){
    MPI_Request req[2];
    MPI_Status status[2];
    MPI_Irecv(&A, len, MPI_DOUBLE, rightNeighbour, 1,
              MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&B, len, MPI_DOUBLE, rightNeighbour, 1,
              MPI_COMM_WORLD, &req[1]);
    /* A */
    MPI_Waitall(2, req, status);
}
Note that at /* A */ you can take advantage of the gap to do some computation while the communication is in flight. Also, error checking is omitted in the code; you had better check the return codes of all MPI calls.
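As an aside on that error-checking remark, a minimal sketch of checking an MPI return code (names reuse the snippet above and are illustrative; by default MPI aborts on error, so the error handler must be changed for return codes to be meaningful):

// Ask MPI to return error codes instead of aborting, then check them.
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
int rc = MPI_Isend(&B, len, MPI_DOUBLE, rightNeighbour, 1, MPI_COMM_WORLD, &req[1]);
if (rc != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int msg_len;
    MPI_Error_string(rc, msg, &msg_len);
    fprintf(stderr, "MPI_Isend failed: %s\n", msg);
    MPI_Abort(MPI_COMM_WORLD, rc);
}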
Although probably the best way to proceed is to follow Harald's suggestion on using MPI_Isend and MPI_Irecv, there's one alternative which might not work depending on the case.
For small buffer sizes, some MPI implementations follow what is known as eager mode. In this mode, the message is sent regardless of whether the receiver is already waiting for it or not. The send buffer data is copied to a temporary buffer and, since the user's send buffer is available again after the copy is done, MPI_Send returns before the communication has actually been completed.
This is a bit risky, since for large messages MPI usually works in rendezvous mode, which actually synchronizes both ends, so the following code would produce a deadlock as well:
if(rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL){
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
}
In MPI_Recv, however, the reception buffer is (obviously) not available until the data has already been received.
Additional info on MPI performance tips.

OpenMPI doesn't send data from an array

I am trying to parallelize a grayscale filter for BMP images; my function gets stuck when trying to send data from a pixel array.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "mpi.h"
#define MASTER_TO_SLAVE_TAG 1 //tag for messages sent from master to slaves
#define SLAVE_TO_MASTER_TAG 10 //tag for messages sent from slaves to master
#pragma pack(1)

typedef struct {
    unsigned char R;
    unsigned char G;
    unsigned char B;
} pixel;

struct fileHeader {
    //blablabla...
};

struct imageHeader {
    //blablabla...
};

struct image {
    struct fileHeader fh;
    struct imageHeader ih;
    pixel *array;
};
void grayScale_Parallel(struct image *im, int size, int rank)
{
    int i, j, lum, aux, r;
    pixel tmp;
    int total_pixels = (*im).ih.width * (*im).ih.height;
    int qty = total_pixels/(size-1);
    int rest = total_pixels % (size-1);
    MPI_Status status;
    //printf("\n%d\n", rank);
    if(rank == 0)
    {
        for(i=1; i<size; i++){
            j = i*qty - qty;
            aux = j;
            if(rest != 0 && i==size-1) {qty=qty+rest;} // distribute the whole workload
            printf("\nj: %d qty: %d rest: %d\n", j, qty, rest);
            //it gets stuck here, it doesn't send the data
            MPI_Send(&(*im).array[j], qty*3, MPI_BYTE, i, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD);
            MPI_Send(&aux, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG+1, MPI_COMM_WORLD);
            MPI_Send(&qty, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG+2, MPI_COMM_WORLD);
            printf("\nSending to node=%d, sender node=%d\n", i, rank);
        }
    }
    else
    {
        MPI_Recv(&aux, 1, MPI_INT, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG+1, MPI_COMM_WORLD, &status);
        MPI_Recv(&qty, 1, MPI_INT, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG+2, MPI_COMM_WORLD, &status);
        pixel *arreglo = (pixel *)calloc(qty, sizeof(pixel));
        MPI_Recv(&arreglo[0], qty*3, MPI_BYTE, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
        //PROCESS RECEIVED PIXELS...
        //SEND TO ROOT PROCESS
    }
    if (rank == 0){
        //RECEIVE DATA FROM ALL PROCESSES
    }
}
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Status status;
    int op = 1;
    char filename_toload[50];
    int bright_number = 0;
    struct image image2;
    if (rank == 0)
    {
        printf("File to load: \n");
        scanf("%s", filename_toload);
        loadImage(&image2, filename_toload);
    }
    while(op != 0)
    {
        if (rank == 0)
        {
            printf("Welcome to example program!\n\n");
            printf("\t1.- GrayScale Parallel Function\n");
            printf("\t2.- Call another Function\n");
            printf("\t0.- Exit\n\t");
            printf("\n\n\tEnter option:");
            scanf("%d", &op);
        }
        //Broadcast the user's choice to all other ranks
        MPI_Bcast(&op, 1, MPI_INT, 0, MPI_COMM_WORLD);
        switch(op)
        {
            case 1:
                grayScale_Parallel(&image2, size, rank);
                MPI_Barrier(MPI_COMM_WORLD);
                printf("GrayScale applied successfully!\n\n");
                break;
            case 2:
                function_blabla();
                printf("Function called successfully\n\n");
                break;
        }
    }
    MPI_Finalize();
    return 0;
}
I think the MPI_Send function can't read the array of pixels, but it's strange because I can print the pixels.
Any idea?
To elaborate more on Soravux's answer, you should change the order of your MPI_Send calls (note the changed MASTER_TO_SLAVE_TAGs) as follows to avoid deadlocks:
MPI_Send(&aux, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD);
MPI_Send(&qty, 1, MPI_INT, i, MASTER_TO_SLAVE_TAG+1, MPI_COMM_WORLD);
MPI_Send(&(*im).array[j], qty*3, MPI_BYTE, i, MASTER_TO_SLAVE_TAG+2, MPI_COMM_WORLD);
These calls need to be matched by the following sequence of MPI_Recv calls
MPI_Recv(&aux, 1, MPI_INT, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD,&status);
MPI_Recv(&qty, 1, MPI_INT, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG+1, MPI_COMM_WORLD,&status);
pixel *arreglo = (pixel *)calloc(qty, sizeof(pixel));
MPI_Recv(&arreglo[0], qty*3, MPI_BYTE, MPI_ANY_SOURCE, MASTER_TO_SLAVE_TAG+2, MPI_COMM_WORLD,&status);
Hope this answers your question.
The order in which you call MPI_Send and MPI_Recv is important. You must ensure your calls are matched in the same order on both sides, since these functions are blocking. A call to MPI_Send may not return until the matching MPI_Recv (same tag) has been posted on the destination, depending on message size and the implementation's buffering. Mismatched ordering can therefore cause deadlocks.
