Reason for MPI 'rc' variable and MPI_Get_count() in C

In the ping-pong program below, what use is the rc variable? It is constantly updated but never used.
Also, what does MPI_Get_count() do?
#include "mpi.h"
#include <stdio.h>
int main(int argc, char * argv [])
int numtasks, rank, dest, source, rc, count, tag=1;
char inmsg, outmsg;
MPI_Status Stat ;
MPI_Init (&argc,&argv);
MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
if (rank == 0) {
dest = source = 1;outmsg=’x’;
rc = MPI_Send (&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc = MPI_Recv (&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
}
else if (rank == 1) {
dest = source = 0;outmsg=’y’;
rc = MPI_Recv (&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
rc = MPI_Send (&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
rc = MPI_Get_count (&Stat, MPI_CHAR, &count);
printf("Task %d: Received %d char(s) from task %d with tag %d \n", rank, count, Stat.MPI_SOURCE,Stat.MPI_TAG);
MPI_Finalize ();
}

The bit about MPI_Get_count is answered by its documentation: given the MPI_Status filled in by a receive, it returns the number of elements of the specified datatype that were actually received (here, the number of MPI_CHAR elements that arrived).
As for rc, here's the best explanation I can offer with no access to the author of this code or any related notes. All MPI routines in the C bindings return an error code. Some compilers check whether one is dropping return values on the floor, as that might indicate an error in the code, and generate warnings where they see it happen. Thus, to keep those warnings from appearing, this code assigns the return value to the variable rc.
That said, many compilers also warn about setting a variable that is never used, which is the case here. An idiom for telling a compiler "yes, I know I'm ignoring this return value, leave me alone" is (void)function_call(foo, bar, baz); (i.e. cast the return value to void). This is most often seen on calls to functions whose return value really should be checked, like write(). Writing it on every MPI call just to silence an offending warning would be rather ugly.
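If you do want those return codes to be useful, the more meaningful pattern is to compare them against MPI_SUCCESS rather than parking them in a write-only variable. Here is a minimal sketch (the helper name is just an illustration; note that with the default MPI_ERRORS_ARE_FATAL error handler most failures abort before such a check ever runs, so you would typically switch the communicator to MPI_ERRORS_RETURN first):
#include "mpi.h"
#include <stdio.h>

/* Abort with a readable message if an MPI call did not return MPI_SUCCESS. */
static void check_mpi(int rc, const char *what)
{
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "%s failed: %s\n", what, msg);
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
}

/* Usage (after switching to MPI_ERRORS_RETURN so errors actually come back):
 *   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
 *   check_mpi(MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD), "MPI_Send");
 */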

Looking for an MPI routine

So just today I started messing around with the MPI library in C. I've tried it out a bit and have now found myself in a situation where I need the following:
A routine that will send a message to one random process that is sitting in a blocking receive, while leaving all the other blocked processes still blocked.
Does such a routine exist? If not, how can something like this be accomplished?
No, no such routine exists. However, you can easily build one from the routines available in the MPI standard. For example, if you want a routine that sends to a random process other than the current one, you could write something like the following:
// Requires <stdlib.h> for rand(); assumes the communicator has at least two ranks.
int MPI_SendRand(void *data, int size, int tag, MPI_Comm comm)
{
    int comm_size, my_rank, dest;
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &comm_size);
    // pick a random rank in [0, comm_size), excluding my_rank
    do {
        dest = rand() % comm_size;
    } while (dest == my_rank);
    // the payload is treated as 'size' raw bytes
    return MPI_Send(data, size, MPI_BYTE, dest, tag, comm);
}
It can be used as follows:
if (rank == master) {
    MPI_SendRand(some_data, some_size, 0, MPI_COMM_WORLD);
} else {
    // the rest wait in a blocking receive for a message from any source
    MPI_Recv(some_buff, some_size, MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // do work...
}

I don't see what the issue is in my MPI program

I don't know how to fix the problem with this program so far. The purpose of the program is to add up all the numbers in an array, but I can barely manage to send the arrays before errors start to appear. It has to do with the for loop in the my_rank != 0 branch of the if statement.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]){
    int my_rank, p, source, dest, tag, total, n = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    //15 processors(1-15) not including processor 0
    if(my_rank != 0){
        MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
        int arr[n];
        MPI_Recv( arr, n, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
        //printf("%i ", my_rank);
        int i;
        for(i = ((my_rank-1)*(n/15)); i < ((my_rank-1)+(n/15)); i++ ){
            //printf("%i ", arr[0]);
        }
    }
    else{
        printf("Please enter an integer:\n");
        scanf("%i", &n);
        int i;
        int arr[n];
        for(i = 0; i < n; i++){
            arr[i] = i + 1;
        }
        for(dest = 0; dest < p; dest++){
            MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
            MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
}
When I take that for loop out it compiles and runs, but when I put it back in it just stops working. Here is the error it gives me:
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
[compute-0-24.local:1072] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Please enter an integer:
--------------------------------------------------------------------------
mpirun has exited due to process rank 8 with PID 1072 on
node compute-0-24 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-16.local][[31957,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.4.237 failed: Connection refused (111)
[cs-cluster:11677] 14 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cs-cluster:11677] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
There are two problems in the code you posted:
The send loop starts from dest = 0, which means that the process of rank zero will send to itself. However, since there is no receiving part for process zero, this won't work. Just make the loop start from dest = 1 and that should solve it.
The tag you use isn't initialised, so its value can be whatever (which is OK), but it can be a different whatever in each process, which will lead to the various communications never matching each other. Just initialise it, to tag = 0 for example, and that should fix it.
With this, your code snippet should work.
Learn to read the informative error messages that Open MPI gives you and to apply some general debugging strategies.
[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
The library is telling you that the receive operation was called with an invalid rank value. Armed with that knowledge, you take a look at your code:
int my_rank, p, source, dest, tag, total, n = 0;
...
//15 processors(1-15) not including processor 0
if(my_rank != 0){
MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
...
The rank is source. source is an automatic variable declared some lines before but never initialised, therefore its initial value is completely random. You fix it by giving source an initial value of 0, or by simply replacing it with 0, since you've already hard-coded the rank of the sender by singling out its code in the else branch of the if statement.
The presence of the above error eventually hints that you should examine the other variables too. Thus you notice that tag is also used uninitialised, and you either initialise it to e.g. 0 or replace it altogether.
Now your program is almost correct. You notice that it seems to work fine for n up to about 33000 (the default eager limit of the self transport divided by sizeof(int)), but then it hangs for larger values. You either fire up a debugger or simply add a printf statement before and after each send and receive operation, and discover that already the first call to MPI_Send with dest equal to 0 never returns. You then take a closer look at your code and discover this:
for(dest = 0; dest < p; dest++){
dest starts from 0, but this is wrong since rank 0 is only sending data and not receiving. You fix it by setting the initial value to 1.
Your program should now work as intended (or at least for values of n that do not lead to stack overflow in int arr[n];). Congratulations! Now go and learn about MPI_Probe and MPI_Get_count, which will help you do the same without explicitly sending the length of the array first. Then learn about MPI_Scatter and MPI_Reduce, which will enable you to implement the algorithm even more elegantly.
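For reference, here is a minimal sketch (not the asker's code) of the collective version hinted at above, using MPI_Bcast, MPI_Scatter and MPI_Reduce. It assumes, purely for brevity, that the entered n is divisible by the number of processes; the variable names are illustrative:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, p, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int *arr = NULL;
    if (rank == 0) {
        printf("Please enter an integer divisible by %d:\n", p);
        scanf("%i", &n);
        arr = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) arr[i] = i + 1;
    }

    // Everyone needs to know the chunk size before the scatter.
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    int chunk = n / p;
    int *my_part = malloc(chunk * sizeof(int));

    // Distribute equal chunks of the array to all ranks (including rank 0).
    MPI_Scatter(arr, chunk, MPI_INT, my_part, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    // Local partial sum, then combine all partial sums on rank 0.
    int local_sum = 0, total = 0;
    for (int i = 0; i < chunk; i++) local_sum += my_part[i];
    MPI_Reduce(&local_sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Total: %d\n", total);

    free(my_part);
    free(arr);
    MPI_Finalize();
    return 0;
}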

MPI RMA operation: Ordering between MPI_Win_free and local load

I'm trying to do a simple test of MPI's RMA operations using MPI_Win_lock and MPI_Win_unlock. The program just lets process 0 update the integer value in process 1, which then displays it.
The below program runs correctly (at least the result seems correct to me):
#include "mpi.h"
#include "stdio.h"
#define root 0
int main(int argc, char *argv[])
{
int myrank, nprocs;
int send, recv, err;
MPI_Win nwin;
int *st;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Alloc_mem(1*sizeof(int), MPI_INFO_NULL, &st);
st[0] = 0;
if (myrank != root) {
MPI_Win_create(st, 1*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
}
else {
MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
}
if (myrank == root) {
st[0] = 1;
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, nwin);
MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin);
MPI_Win_unlock(1, nwin);
MPI_Win_free(&nwin);
}
else { // rank 1
MPI_Win_free(&nwin);
printf("Rank %d, st = %d\n", myrank, st[0]);
}
MPI_Free_mem(st);
MPI_Finalize();
return 0;
}
The output I got is Rank 1, st = 1. But curiously, if I switch the lines in the else block for rank 1 to
else { // rank 1
    printf("Rank %d, st = %d\n", myrank, st[0]);
    MPI_Win_free(&nwin);
}
The output is Rank 1, st = 0.
I cannot figure out the reason behind this. The reason I want to read the data before MPI_Win_free is that, originally, I need to put all of this inside a while loop and let rank 0 decide when to stop the loop: when the condition is satisfied, rank 0 updates the flag (st) in rank 1. I wanted to put MPI_Win_free outside the while loop so that the window is only freed after the loop. Now it seems I cannot do this and instead have to create and free the window on every iteration of the loop?
I'll be honest, MPI RMA is not my speciality, but I'll give this a shot:
The problem is that you're running into a race condition. When you do the MPI_PUT operation, it sends the data from rank 0 to rank 1 to be put into the buffer at some point in the future. You don't have any control over that from rank 0's perspective.
On rank 1's side, you're not doing anything to complete the operation. I know that RMA (or one-sided) operations sound like they shouldn't require any intervention on the target side, but they do require a bit. When you use one-sided operations, you have to have something on the receiving side that also synchronizes the data. In this case, you're trying to use MPI put/get operations in combination with non-MPI load/store operations. This is erroneous and results in the race condition you're seeing. When you move the MPI_WIN_FREE to come first, you complete all of the outstanding operations, so your data is correct.
You can find out lots more about passive target synchronization (which is what you're doing here) with this question: MPI with C: Passive RMA synchronization.
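To make "something on the target side" concrete, here is a minimal sketch of the same exchange written with active target synchronization (MPI_Win_fence) instead of lock/unlock. It is one possible way to give the target its required synchronization point, not the only one, and it is not the asker's original code; locking the target's own window plus flush/sync calls is an alternative:
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, nprocs;
    int *st;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &st);
    st[0] = 0;

    // Every rank exposes its integer; only rank 1's copy is actually targeted.
    MPI_Win_create(st, sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);            // open the access/exposure epoch
    if (myrank == 0) {
        int one = 1;
        MPI_Put(&one, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);            // close the epoch: the put is now visible on rank 1

    if (myrank == 1)
        printf("Rank %d, st = %d\n", myrank, st[0]);

    MPI_Win_free(&win);
    MPI_Free_mem(st);
    MPI_Finalize();
    return 0;
}
In a while loop, the pair of fences per iteration plays the role that creating and freeing the window played in the original code, so the window can be created once before the loop and freed once after it.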

Is there a method in MPI that acts like a MPI_Bcast but for an individual process, instead of operating on an entire communicator?

Essentially what I am looking for here is a simple MPI_SendRecv() routine that allows me to synchronize the same buffer by specifying a source and a destination processor.
In my mind, the function call for my Ideal_MPI_SendRecv() function would look precisely like MPI_Bcast(), but would take a source and a destination process instead of a communicator.
It might be called as follows:
Ideal_MPI_SendRecv(&somebuffer, bufferlength, datatype, source_proc, destination_proc);
If not, is there any reason? It seems like this would be the perfect method to synchronize a variable's value between two processes.
No, there is no such call in MPI since it is trivial to implement it using point-to-point communication. Of course you could write one, for example (with some rudimentary support for error handling):
// Just a random tag that is unlikely to be used by the rest of the program
#define TAG_IDEAL_SNDRCV 11223

int Ideal_MPI_SendRecv(void *buf, int count, MPI_Datatype datatype,
                       int source, int dest, MPI_Comm comm)
{
    int rank;
    int err;

    if (source == dest)
        return MPI_SUCCESS;

    err = MPI_Comm_rank(comm, &rank);
    if (err != MPI_SUCCESS)
        return err;

    if (rank == source)
        err = MPI_Send(buf, count, datatype, dest, TAG_IDEAL_SNDRCV, comm);
    else if (rank == dest)
        err = MPI_Recv(buf, count, datatype, source, TAG_IDEAL_SNDRCV, comm,
                       MPI_STATUS_IGNORE);

    return err;
}
// Example: transfer 'int buf[10]' from rank 0 to rank 2
Ideal_MPI_SendRecv(buf, 10, MPI_INT, 0, 2, MPI_COMM_WORLD);
You could also add another output argument of type MPI_Status * and store the status of MPI_Recv there. It could be useful if both processes have different buffer sizes.
Another option, if you have to do this many times within a fixed set of ranks, e.g. always from rank 0 to rank 2, would be to simply create a new communicator and broadcast inside it:
int rank;
MPI_Comm buddycomm;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Ranks 0 and 2 end up in buddycomm; every other rank gets MPI_COMM_NULL
// and must not take part in the broadcast.
MPI_Comm_split(MPI_COMM_WORLD, (!rank || rank == 2) ? 0 : MPI_UNDEFINED, rank,
               &buddycomm);

// Transfer 'int buf[10]' from rank 0 to rank 2
MPI_Bcast(buf, 10, MPI_INT, 0, buddycomm);
This, of course, is overkill, since the broadcast is more expensive than the simple combination of MPI_Send and MPI_Recv.
Perhaps you want to call MPI_Send on one process (the source process, with the values you want) and MPI_Recv on another process (the one which doesn't initially have the values you want)?
If not, could you clarify how what you're trying to accomplish differs from a simple point-to-point message?

MPI_Waitall is failing

I wonder if anyone can shed some light on the MPI_Waitall function for me. I have a program passing information using MPI_Isend and MPI_Irecv. After all the sends and receives are complete, one process in the program (in this case, process 0), will print a message. My Isend/Irecv are working, but the message prints out at some random point in the program; so I am trying to use MPI_Waitall to wait until all the requests are done before printing the message. I receive the following error message:
Fatal error in PMPI_Waitall: Invalid MPI_Request, error stack:
PMPI_Waitall(311): MPI_Waitall(count=16, req_array=0x16f70d0, status_array=0x16f7260) failed
PMPI_Waitall(288): The supplied request in array element 1 was invalid (kind=0)
Here is some relevant code:
MPI_Status *status;
MPI_Request *request;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

status = (MPI_Status *) malloc(numtasks * sizeof(MPI_Status));
request = (MPI_Request *) malloc(numtasks * sizeof(MPI_Request));

/* Generate Data to send */

// Isend/Irecvs look like this:
MPI_Isend(&data, count, MPI_INT, dest, tag, MPI_COMM_WORLD, &request[taskid]);
MPI_Irecv(&data, count, MPI_INT, source, tag, MPI_COMM_WORLD, &request[taskid]);
MPI_Wait(&request[taskid], &status[taskid]);

/* Calculations and such */

if (taskid == 0) {
    MPI_Waitall(numtasks, request, status);
    printf("All done!\n");
}

MPI_Finalize();
Without the call to MPI_Waitall, the program runs cleanly, but the "All done" message prints as soon as process 0's Isend/Irecv messages complete, instead of after all Isend/Irecvs complete.
Thank you for any help you can provide.
You are only setting one element of the request array, namely request[taskid] (and, by the way, you overwrite the send request handle with the receive one, irrevocably losing the former). Remember, MPI is used to program distributed-memory machines and each MPI process has its own copy of the request array. Setting one element in rank taskid does not magically propagate the value to the other ranks, and even if it did, requests only have local validity. The proper implementation would be:
MPI_Status status[2];
MPI_Request request[2];

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

/* Generate Data to send */

// Isend/Irecvs look like this:
MPI_Isend(&data, count, MPI_INT, dest, tag, MPI_COMM_WORLD, &request[0]);
//         ^^^^
//          ||
//     data race !!
//          ||
//         vvvv
MPI_Irecv(&data, count, MPI_INT, source, tag, MPI_COMM_WORLD, &request[1]);

// Wait for both operations to complete
MPI_Waitall(2, request, status);

/* Calculations and such */

// Wait for all processes to reach this line in the code
MPI_Barrier(MPI_COMM_WORLD);

if (taskid == 0) {
    printf("All done!\n");
}

MPI_Finalize();
By the way, there is a data race in your code. Both MPI_Isend and MPI_Irecv are using the same data buffer, which is incorrect. If you are simply trying to send the content of data to dest and then receive into it from source, then use MPI_Sendrecv_replace instead and forget about the non-blocking operations:
MPI_Status status;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

/* Generate Data to send */

MPI_Sendrecv_replace(&data, count, MPI_INT, dest, tag, source, tag,
                     MPI_COMM_WORLD, &status);

/* Calculations and such */

// Wait for all processes to reach this line in the code
MPI_Barrier(MPI_COMM_WORLD);

if (taskid == 0) {
    printf("All done!\n");
}

MPI_Finalize();

Resources