Alternative to MPI_Sendrecv_replace

I am trying to write an alternative to the following code using MPI_Send and MPI_Recv, but my version seems to deadlock:
if (rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if (rank % 2 == 1 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);
if (rank % 2 == 1 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if (rank % 2 == 0 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);
Is there an easy way to do the same with MPI_Send and MPI_Recv?
I have tried using:
if (rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL) {
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
}

This code:
if (rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
                         rightNeighbour, 1, MPI_COMM_WORLD, &status);
else if (rank % 2 == 1 && leftNeighbour != MPI_PROC_NULL)
    MPI_Sendrecv_replace(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
                         leftNeighbour, 1, MPI_COMM_WORLD, &status);
is equivalent to:
if (rank % 2 == 0)
{
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
}
else if (rank % 2 == 1)
{
    int received;
    double *temp = malloc(len * sizeof(double));
    MPI_Recv(temp, len, MPI_DOUBLE, leftNeighbour, 1,
             MPI_COMM_WORLD, &status);
    MPI_Send(&bufferLeft[0], len, MPI_DOUBLE, leftNeighbour, 1,
             MPI_COMM_WORLD);   /* MPI_Send takes no status argument */
    /* Copy back only what arrived: a receive from MPI_PROC_NULL
       delivers 0 elements and must leave bufferLeft untouched. */
    MPI_Get_count(&status, MPI_DOUBLE, &received);
    memcpy(&bufferLeft[0], temp, received * sizeof(double));
    free(temp);
}
Note that the order of the send and receive calls is reversed in the odd ranks. Also, the receive uses a temporary buffer in order to implement the semantics of MPI_Sendrecv_replace, which guarantees that the data in the buffer is sent first and only then overwritten with the received data.
Note that checking whether a neighbour is MPI_PROC_NULL is unnecessary, since a send to or receive from MPI_PROC_NULL is essentially a no-op that always succeeds immediately. One of the key ideas behind MPI_PROC_NULL is to make it easy to write symmetric code that doesn't contain such if statements.

You'd better use MPI_Irecv and MPI_Isend rather than the blocking calls (MPI_Recv and MPI_Send). After issuing the communication calls, simply wait for the requests with MPI_Waitall (or two calls to MPI_Wait). However, you then cannot use a single buffer for both directions (i.e. the "replace" semantics); you need two separate buffers, because otherwise the incoming data could overwrite the outgoing data before it has actually been sent.
With A as the incoming buffer and B as the outgoing buffer, your code should look something like this:
if (rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL) {
    MPI_Request req[2];
    MPI_Status status[2];
    /* pass the buffers themselves, not &A / &B (wrong if A and B are pointers) */
    MPI_Irecv(A, len, MPI_DOUBLE, rightNeighbour, 1,
              MPI_COMM_WORLD, &req[0]);
    MPI_Isend(B, len, MPI_DOUBLE, rightNeighbour, 1,
              MPI_COMM_WORLD, &req[1]);
    /* A */
    MPI_Waitall(2, req, status);
}
Note that at /* A */ you can do some computation while the communication is in flight, as long as it does not touch A or B. Error checking is also omitted from the code; you should check the return codes of all MPI calls.

Although the best way to proceed is probably to follow Harald's suggestion of using MPI_Isend and MPI_Irecv, there is one alternative, though whether it works depends on the case.
For small buffer sizes, some MPI implementations use what is known as eager mode. In this mode, the message is sent regardless of whether the receiver is already waiting for it. The send buffer data is copied to a temporary buffer and, since the user's send buffer can be reused once the copy is done, MPI_Send returns before the communication has actually completed.
Relying on this is risky: for large messages MPI usually switches to rendezvous mode, which synchronizes both ends, so the following code, which may work for small messages thanks to eager mode, would deadlock for large ones:
if (rank % 2 == 0 && rightNeighbour != MPI_PROC_NULL) {
    MPI_Send(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD);
    MPI_Recv(&bufferRight[0], len, MPI_DOUBLE, rightNeighbour, 1,
             MPI_COMM_WORLD, &status);
}
MPI_Recv, however, has no such shortcut: the receive buffer is (obviously) not available until the data has been received.

Related

How can I use MPI_Testany and MPI_Irecv to get only the first Irecv that arrives?

In C with "mpi.h", I think it will be something like this:
MPI_Request mpireq[2];
MPI_Status mpistat;
int temp[2], index, flag;   /* one buffer per receive, passed by address */
MPI_Irecv(&temp[0], 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &mpireq[0]);
MPI_Irecv(&temp[1], 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &mpireq[1]);
MPI_Testany(2, mpireq, &index, &flag, &mpistat);
but I think MPI_Testany is non-blocking, so I don't know how to use flag and index.
I propose something like the following, but I don't know if it is going to work:
if (flag) {
    printf("someone came\n");
    if (index == 0) { /* do something */ }
    else { /* do something else */ }
} else
    printf("No one is here yet\n");
I tried something like this and it seems to work:
MPI_Irecv(...);   /* the two receives shown above */
MPI_Irecv(...);
flag = 0;
while (!flag)
    MPI_Testany(2, mpireq, &index, &flag, &mpistat);
/* flag is now set: mpireq[index] is the receive that arrived first */

MPI_Isend/MPI_Irecv issue

I am writing an MPI program that has the first instance working as a master, sending and receiving results from its workers.
The receive function does something like this:
struct result *check_for_message(void) {
    ...
    static unsigned int message_size;
    static char *buffer;
    static bool started_reception = false;
    static MPI_Request req;
    if (!started_reception) {
        MPI_Irecv(&message_size, 1, MPI_INT, MPI_ANY_SOURCE, SIZE_TAG,
                  MPI_COMM_WORLD, &req);
        started_reception = true;
    } else {
        int flag = 0;
        MPI_Status status;
        MPI_Test(&req, &flag, &status);
        if (flag == 1) {
            started_reception = false;
            buffer = calloc(message_size + 1, sizeof(char));
            DIE_IF_NULL(buffer); // printf + MPI_Finalize + exit
            MPI_Request content_req;
            MPI_Irecv(buffer, MAX_MSG_SIZE, MPI_CHAR, status.MPI_SOURCE, CONTENT_TAG,
                      MPI_COMM_WORLD, &content_req);
            MPI_Wait(&content_req, MPI_STATUS_IGNORE);
            ret = process_request(buffer);
            free(buffer);
        }
    }
    ...
}
The send function does something like this:
MPI_Request size_req;
MPI_Request content_req;
MPI_Isend(&size, 1, MPI_INT, dest, SIZE_TAG, MPI_COMM_WORLD, &size_req);
MPI_Wait(&size_req, MPI_STATUS_IGNORE);
MPI_Isend(buf, size, MPI_CHAR, dest, CONTENT_TAG, MPI_COMM_WORLD,
&content_req);
MPI_Wait(&content_req, MPI_STATUS_IGNORE);
I noticed that if I remove the MPI_Wait calls from the sending function, the execution often blocks, or a signal stops one of the instances (I can check the output, but I think it was a SIGSEGV related to free).
When I add the MPI_Wait calls, it always seems to run perfectly. Could it be related to the order in which the two sends are performed? Aren't they supposed to arrive in order?
I run the program locally with -n 16, but have also tested it with -n 128. The messages I send are longer than 50 characters 90% of the time, and some exceed 300 characters.

Why do you need to find the length with MPI_Probe when you already specify the message length in the send/receive functions?

The following code is an MPI ping-pong program: a message of increasing length is sent and returned. With each iteration the number of elements in the message increases.
There are two statements in the code that I don't understand:
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
At first sight they did not look necessary for passing the message back and forth, but when I deleted them and compiled my code, it gave an error.
I also checked this post: "what is the difference between MPI_Probe and MPI_Get_count in mpi"
"While MPI_Probe may be used to find the size of a message you have to use MPI_Get_count to get that size. MPI_Probe returns a status which is a data structure providing information about the message, including its source, tag and size. But to get that size you call MPI_Get_count with the status as an argument."
Why is it important to know the length of the message through MPI_Probe and MPI_Get_count? This confuses me, because you already specify the number of elements you send and receive in the MPI_Send and MPI_Recv calls.
for (message_size = 1; message_size <= MAX_ARRAY_SIZE; message_size <<= 1)
{
    // Use a loop to vary the message size
    if (myRank == 0)
    {
        double startTime, endTime;
        numberOfElementsToSend = message_size;
        printf("Rank %2.1i: Sending %i elements\n", myRank, numberOfElementsToSend);
        // Measure the time spent in MPI communication
        // (use the variables startTime and endTime)
        startTime = MPI_Wtime();
        MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 1, 0,
                 MPI_COMM_WORLD);
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
        MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        endTime = MPI_Wtime();
        printf("Rank %2.1i: Received %i elements\n",
               myRank, numberOfElementsReceived);
        printf("Ping Pong took %f seconds\n", endTime - startTime);
    }
    else if (myRank == 1)
    {
        // Probe message in order to obtain the amount of data
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &numberOfElementsReceived);
        MPI_Recv(myArray, numberOfElementsReceived, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank %2.1i: Received %i elements\n",
               myRank, numberOfElementsReceived);
        numberOfElementsToSend = numberOfElementsReceived;
        printf("Rank %2.1i: Sending back %i elements\n",
               myRank, numberOfElementsToSend);
        MPI_Send(myArray, numberOfElementsToSend, MPI_INT, 0, 0,
                 MPI_COMM_WORLD);
    }
}

MPI Sendrecv with MPI_ANY_SOURCE

Is it possible to do an MPI_Sendrecv exchange where one side does not know the rank of the other? If not, what is the best way to do that (my next guess would just be a pair of sends and recvs)?
For example, in C, if I want to exchange integers between rank 0 and some other rank, would something like this work?
MPI_Status stat;
if (rank) {
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
} else {
    int someotherval = 1;
    MPI_Sendrecv(&someotherval, 1, MPI_INT, MPI_ANY_SOURCE, someotherval, &recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
}
EDIT:
Looks like it is not possible. I whipped up the following as a sort of wrapper to add the functionality that I need.
void slave_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                    int dest, int sendtag, void *recvbuf, int recvcount,
                    MPI_Datatype recvtype, int source, int recvtag, MPI_Status *status) {
    MPI_Send(sendbuf, sendcount, sendtype, dest, sendtag, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, recvcount, recvtype, source, recvtag, MPI_COMM_WORLD, status);
}
void anon_sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                   int sendtag, void *recvbuf, int recvcount,
                   MPI_Datatype recvtype, int recvtag, MPI_Status *status) {
    int anon_rank;
    MPI_Recv(recvbuf, recvcount, recvtype, MPI_ANY_SOURCE, recvtag, MPI_COMM_WORLD, status);
    anon_rank = status->MPI_SOURCE;
    MPI_Send(sendbuf, sendcount, sendtype, anon_rank, sendtag, MPI_COMM_WORLD);
}
EDIT 2: Based on Patrick's answer, it looks like the slave_sendrecv function above is not needed; you can just use a regular MPI_Sendrecv on the end that knows whom it is sending to.
Short answer: No.
The standard does not allow the use of MPI_ANY_SOURCE as the destination rank dest in any send procedure. This makes sense, since you cannot send a message without knowing the destination.
The standard does, however, permit you to pair an MPI_Sendrecv with a regular MPI_Send/MPI_Recv:
A message sent by a send-receive operation can be received by a regular receive operation
or probed by a probe operation; a send-receive operation can receive a message sent
by a regular send operation.
In your case, process 0 will have to first receive, and then answer:
MPI_Status stat;
if (rank) {
    int someval = 0;
    MPI_Sendrecv(&someval, 1, MPI_INT, 0, 1, &recvbuf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
} else {
    int someotherval = 1;
    MPI_Recv(&recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
    // answer to process `stat.MPI_SOURCE` using `someotherval` as tag
    MPI_Send(&someotherval, 1, MPI_INT, stat.MPI_SOURCE, someotherval, MPI_COMM_WORLD);
}

Count parameter inconsistent, MPI_Bsend/MPI_Recv

In the example found here, why is the count inconsistent in the second message?
if (rank == src) {
    /* These message sizes are chosen to expose any alignment problems */
    MPI_Bsend(msg1, 7, MPI_CHAR, dest, tag, comm);
    MPI_Bsend(msg2, 2, MPI_DOUBLE, dest, tag, comm);
    MPI_Bsend(msg3, 17, MPI_CHAR, dest, tag, comm);
}
if (rank == dest) {
    MPI_Recv(rmsg1, 7, MPI_CHAR, src, tag, comm, MPI_STATUS_IGNORE);
    MPI_Recv(rmsg2, 10, MPI_DOUBLE, src, tag, comm, MPI_STATUS_IGNORE);
    MPI_Recv(rmsg3, 17, MPI_CHAR, src, tag, comm, MPI_STATUS_IGNORE);
    if (strcmp(rmsg1, msg1) != 0) {
        errs++;
        fprintf(stderr, "message 1 (%s) should be %s\n", rmsg1, msg1); fflush(stderr);
    }
    ...
}
Why are the counts of the second send and receive inconsistent?
The count argument of MPI_Recv is only an upper bound on the amount of data to receive; a longer message would be an error. This is convenient if we don't know the size of the payload at compile time. After the second MPI_Recv completes, rmsg2 will contain the two doubles, followed by eight elements that the receive leaves untouched (uninitialized data here).
