Currently I'm trying to create a master-slave program with a "listener" loop where the master waits for a message from slaves to make a decision from there. But, despite using non-blocking MPI routines, I am experiencing an error. Do I need to use some blocking routine?
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char** argv)
{
// Variable Declarations
int rank, size;
MPI_Request *requestList,requestNull;
MPI_Status status;
// Start MPI
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if( rank == 0 )
{
// Process Zero
int dataOut=13, pr;
float dataIn = -1;
requestList =(MPI_Request*)malloc((size-1)*sizeof(MPI_Request));
while(1){
dataIn = -1;
// We do NOT need to wait for the MPI_Isend(s), it is the job of the receiver processes.
for(pr=1;pr<size;pr++)
{
MPI_Irecv(&dataIn,1,MPI_FLOAT,pr,1,MPI_COMM_WORLD,&(requestList[pr-1]));
}
if((dataIn > 1.5)){
printf("From the process: %f\n", dataIn);
break;
}
}
}
else
{
// Receiver Process
float message;
int index;
//MPI_Request request;
MPI_Status status;
while(1){
message = random()/(double)1147483648;
// Send the message back to the process zero
MPI_Isend(&message,1,MPI_FLOAT,0,1,MPI_COMM_WORLD, &requestNull);
if(message > 1.5)
break;
}
}
MPI_Finalize();
return 0;
}
The problem seems to be that you never wait for your MPI calls to complete.
The way non-blocking calls work in MPI is that when you issue a non-blocking call (like MPI_IRECV), the last parameter is an MPI_REQUEST object. When the initiating call (MPI_IRECV) returns, that request object identifies the pending operation. However, the operation itself isn't done yet, and you have no guarantee that it is done until you use a completion call (MPI_WAIT, MPI_TEST, and friends) on the request.
In your case, you are probably using the non-blocking calls unnecessarily, since you rely on the data received by MPI_IRECV in the very next line. You should probably just replace your non-blocking call with a blocking MPI_RECV call and use MPI_ANY_SOURCE so you don't have to post a separate receive for each rank in the communicator. Alternatively, you can use MPI_WAITANY to complete your non-blocking calls, but then you'll need to worry about cleaning up all of the outstanding operations when you finish.
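The blocking variant suggested above could look roughly like the following. This is only a sketch, not the asker's final program: the termination rule (each worker stops after sending its first value above the threshold, and the master stops once every worker has done so), the names TAG_DATA and THRESHOLD, and the use of rand() instead of random() are assumptions made here so that every send is matched by a receive.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_DATA 1        /* hypothetical tag name */
#define THRESHOLD 1.5f    /* same cut-off value as in the question */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Master: one blocking receive serves whichever worker sends next. */
        int finished = 0;
        while (finished < size - 1) {
            float data_in;
            MPI_Status status;
            MPI_Recv(&data_in, 1, MPI_FLOAT, MPI_ANY_SOURCE, TAG_DATA,
                     MPI_COMM_WORLD, &status);
            if (data_in > THRESHOLD) {
                printf("From process %d: %f\n", status.MPI_SOURCE, data_in);
                finished++;   /* assumed rule: this worker sends nothing more */
            }
        }
    } else {
        /* Worker: keep sending values in [0, 2] until one exceeds the threshold. */
        float message;
        srand((unsigned)rank);
        do {
            message = 2.0f * (float)rand() / (float)RAND_MAX;
            MPI_Send(&message, 1, MPI_FLOAT, 0, TAG_DATA, MPI_COMM_WORLD);
        } while (message <= THRESHOLD);
    }

    MPI_Finalize();
    return 0;
}
```

Because the receive is blocking, data_in is valid as soon as MPI_Recv returns, and status.MPI_SOURCE tells the master which rank sent it.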
To ensure that all destructors are properly called if the program is terminated from the keyboard (Ctrl+C), the following signal-based approach is used:
a handler, which sets an exit flag, is installed for SIGINT
if a blocking call (accept(), read(), connect(), etc.) is waiting for completion, it returns -1 and errno is set to EINTR
The problem is that SIGINT can arrive between the check of the exit flag (while (!finish)) and the call to read(). In this case, read() will block until the signal is sent once again.
This is a minimal working example:
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
enum { STDIN, STDOUT, STDERR };
static unsigned char finish=0;
static void handleSignal(int signal) {
finish=1;
}
int main(int argc, char ** e) {
struct sigaction action;
memset(&action, 0, sizeof(action));
action.sa_handler=handleSignal;
action.sa_flags=0;
sigaction(SIGINT, &action, NULL);
char buffer[256];
puts("<<");
while (!finish) {
sleep(2);
ssize_t n=read(STDIN, buffer, sizeof(buffer));
if (n==0) {
// End of stream
finish=1;
}
else if (n<0) {
// Error or interrupt
if (errno!=EINTR)
perror("read");
}
else {
// Convert data to hexadecimal format
for (size_t i=0; i<n; i++)
printf("%02x", buffer[i]);
}
}
puts(">>\n");
return 0;
}
sleep(2) is added for visibility (a real program may perform some preparatory work before reading from the file descriptor).
Is there any way to handle signals reliably without using non-portable things like signalfd()?
The pselect(2) system call was invented to solve this exact problem. It's POSIX, so hopefully cross-platform enough for you.
The purpose of pselect is to atomically unblock some signals, wait for I/O as select() does, and reblock them. So your loop can look something like the following pseudocode:
sigprocmask(SIG_BLOCK, {SIGINT});
while (1) {
if (finish)
graceful_exit();
int ret = pselect(1, {STDIN}, ..., { /* empty signal set */});
if (ret > 0) {
read(STDIN, buf, size); // will not block
// process data
// If you like you can do
sigprocmask(SIG_UNBLOCK, {SIGINT});
// work work work
if (finish)
graceful_exit();
// work work work
sigprocmask(SIG_BLOCK, {SIGINT});
} else {
// handle timeout or other errors
}
}
There is no race here because SIGINT is blocked for the time in between checking the finish flag and the call to pselect, so it cannot be delivered during that window. But the signal is unblocked while pselect is waiting, so if it arrives during that time (or already arrived while it was blocked), pselect will return without further delay. We only call read when pselect has told us it was ready for reading, so it cannot block.
If your program is multithreaded, use pthread_sigmask instead of sigprocmask.
As was noted in comments, you have to make your finish flag volatile, and for best compatibility it should be of type sig_atomic_t.
There is more discussion and another example in the select_tut(2) man page.
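For reference, the pseudocode above might translate into real C along these lines. This is a sketch under assumptions: the helper names setup_sigint and read_until_done are made up for illustration, the loop only counts bytes instead of processing them, and the optional unblock/reblock section in the middle of the loop is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Exit flag set by the handler; volatile sig_atomic_t as recommended above. */
static volatile sig_atomic_t finish = 0;

static void handle_signal(int signo)
{
    (void)signo;
    finish = 1;
}

/* Install the SIGINT handler and block SIGINT. Returns 0 on success, -1 on error. */
static int setup_sigint(void)
{
    struct sigaction action;
    sigset_t block;

    memset(&action, 0, sizeof action);
    action.sa_handler = handle_signal;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGINT, &action, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Read from fd until end of stream or SIGINT.
 * Returns the number of bytes read, or -1 on error. */
static long read_until_done(int fd)
{
    char buffer[256];
    long total = 0;
    sigset_t empty;

    sigemptyset(&empty);
    while (!finish) {
        fd_set readfds;
        int ready;
        ssize_t n;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* SIGINT is unblocked only while pselect is waiting. */
        ready = pselect(fd + 1, &readfds, NULL, NULL, NULL, &empty);
        if (ready < 0) {
            if (errno == EINTR)
                continue;   /* handler ran; the loop condition re-checks finish */
            return -1;
        }

        n = read(fd, buffer, sizeof buffer);   /* fd is readable, so no blocking */
        if (n < 0)
            return -1;
        if (n == 0)
            break;          /* end of stream */
        total += n;         /* real code would process buffer[0..n-1] here */
    }
    return total;
}
```

A main function would call setup_sigint() once and then read_until_done(STDIN_FILENO); a SIGINT that arrives while the signal is blocked stays pending and is delivered as soon as pselect swaps in the empty mask.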
I have a problem with the following codes:
Master:
#include <iostream>
using namespace std;
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define PB1 1
#define PB2 1
int main (int argc, char *argv[])
{
int np[2] = { 2, 1 }, errcodes[2];
MPI_Comm parentcomm, intercomm;
char *cmds[2] = { "./slave", "./slave" };
MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
MPI_Init(NULL, NULL);
#if PB1
for(int i = 0 ; i<2 ; i++)
{
MPI_Info_create(&infos[i]);
char hostname[] = "localhost";
MPI_Info_set(infos[i], "host", hostname);
}
#endif
MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
printf("c Creation of the workers finished\n");
#if PB2
sleep(1);
#endif
MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
printf("c Creation of the workers finished\n");
MPI_Finalize();
return 0;
}
Slave:
#include "mpi.h"
#include <stdio.h>
using namespace std;
int main( int argc, char *argv[])
{
int rank;
MPI_Init(0, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("rank = %d\n", rank);
MPI_Finalize();
return 0;
}
I do not know why, when I run mpirun -np 1 ./master, my program stops with the following message when I set PB1 and PB2 to 1 (it works well when I set one of them to 0):
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application: ./slave Either
request fewer slots for your application, or make more slots available
for use.
For instance, when I set PB2 to 0, the program works well. Thus, I suppose that it is because MPI_Finalize does not finish its job ...
I googled, but I did not find any answer to my problem. I tried various things, such as calling MPI_Comm_disconnect or adding a barrier, but nothing worked.
I work on Ubuntu (15.10) and use the OpenMPI version 1.10.2.
The MPI_Finalize on the first set of slaves will not finish until MPI_Finalize is called on the master. MPI_Finalize is collective over all connected processes. You can work around that by manually disconnecting the first batch of slaves from the intercommunicator before calling MPI_Finalize. This way, the slaves will actually finish completely and exit, freeing the "slots" for the new batch of slaves. Unfortunately I don't see a standardized way to really ensure the slaves are finished in the sense that their slots are freed, because that would be implementation defined. The fact that OpenMPI freezes in MPI_Comm_spawn_multiple instead of returning an error is unfortunate, and one might consider that a bug. Anyway, here is a draft of what you could do:
Within the master, each time it is done with its slaves:
MPI_Barrier(intercomm); // Make sure master and slaves are somewhat synchronized
MPI_Comm_disconnect(&intercomm);
sleep(1); // This is the ugly unreliable way to give the slaves some time to shut down
The slave:
MPI_Comm parent;
MPI_Comm_get_parent(&parent); // you should have that already
MPI_Barrier(parent); // matches the barrier the master calls on the intercommunicator
MPI_Comm_disconnect(&parent);
MPI_Finalize();
However, you still need to make sure OpenMPI knows how many slots should be reserved for the whole application (universe_size). You can do that for example with a hostfile:
localhost slots=4
And then mpirun -np 1 ./master.
Now this is not pretty, and I would argue that your approach of dynamically spawning MPI workers isn't really what MPI is meant for. It may be supported by the standard, but that doesn't help you if implementations are struggling. However, there is not enough information on how you intend to communicate with the external processes to provide a cleaner, more idiomatic solution.
One last remark: Do check the return codes of MPI functions. Especially MPI_Comm_spawn_multiple.
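Checking the return codes could be sketched as follows. Two points here are additions rather than part of the advice above: MPI's default error handler aborts the program, so the sketch first switches MPI_COMM_WORLD to MPI_ERRORS_RETURN, and the errcodes array is sized for one entry per spawned process (2 + 1 = 3), which is larger than the two-element array in the question. The macro names N_CMDS and N_PROCS are hypothetical, and whether a given implementation actually returns an error in the "not enough slots" case, rather than aborting or hanging, is implementation dependent.

```c
#include <mpi.h>
#include <stdio.h>

#define N_CMDS 2
#define N_PROCS 3   /* np[0] + np[1]: one error code per spawned process */

int main(int argc, char *argv[])
{
    int np[N_CMDS] = { 2, 1 };
    int errcodes[N_PROCS];
    char *cmds[N_CMDS] = { "./slave", "./slave" };
    MPI_Info infos[N_CMDS] = { MPI_INFO_NULL, MPI_INFO_NULL };
    MPI_Comm intercomm;
    char msg[MPI_MAX_ERROR_STRING];
    int len, rc, i;

    MPI_Init(&argc, &argv);

    /* Ask MPI to return error codes instead of aborting on failure. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    rc = MPI_Comm_spawn_multiple(N_CMDS, cmds, MPI_ARGVS_NULL, np, infos, 0,
                                 MPI_COMM_WORLD, &intercomm, errcodes);
    if (rc != MPI_SUCCESS) {
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Comm_spawn_multiple failed: %s\n", msg);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Each spawned process also reports its own start-up status. */
    for (i = 0; i < N_PROCS; i++) {
        if (errcodes[i] != MPI_SUCCESS) {
            MPI_Error_string(errcodes[i], msg, &len);
            fprintf(stderr, "process %d failed to start: %s\n", i, msg);
        }
    }

    printf("c Creation of the workers finished\n");
    MPI_Finalize();
    return 0;
}
```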
I'm writing a simple program in C with MPI library.
The intent of this program is the following:
I have a group of processes that perform an iterative loop, at the end of this loop all processes in the communicator must call two collective functions(MPI_Allreduce and MPI_Bcast). The first one sends the id of the processes that have generated the minimum value of the num.val variable, and the second one broadcasts from the source num_min.idx_v to all processes in the communicator MPI_COMM_WORLD.
The problem is that I don't know if the i-th process will be finalized before calling the collective functions. All processes have a probability of 1/10 to terminate. This simulates the behaviour of the real program that I'm implementing. And when the first process terminates, the others cause deadlock.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
typedef struct double_int{
double val;
int idx_v;
}double_int;
int main(int argc, char **argv)
{
int n = 10;
int max_it = 4000;
int proc_id, n_proc;
double *x = (double *)malloc(n*sizeof(double));
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &n_proc);
MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);
srand(proc_id);
double_int num_min;
double_int num;
int k;
for(k = 0; k < max_it; k++){
num.idx_v = proc_id;
num.val = rand()/(double)RAND_MAX;
if((rand() % 10) == 0){
printf("iter %d: proc %d terminated\n", k, proc_id);
MPI_Finalize();
exit(EXIT_SUCCESS);
}
MPI_Allreduce(&num, &num_min, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);
MPI_Bcast(x, n, MPI_DOUBLE, num_min.idx_v, MPI_COMM_WORLD);
}
MPI_Finalize();
exit(EXIT_SUCCESS);
}
Perhaps I should create a new group and new communicator before calling MPI_Finalize function in the if statement? How should I solve this?
If you have control over a process before it terminates, you should send a flag with a non-blocking send to a rank that cannot terminate early (let's call it the root rank). Then, instead of a blocking MPI_Allreduce, every rank could send its value to the root rank.
The root rank could post non-blocking receives for either a flag or a value. Every rank has to send one or the other. Once all ranks are accounted for, the root rank can do the reduction itself, remove the exited ranks from further communication, and broadcast the result to the remaining ranks.
If your ranks exit without notice, I am not sure what options you have.
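A simplified sketch of that protocol is shown below. It makes several assumptions that go beyond the description above: rank 0 plays the root and never exits early, each report is a pair of doubles {value, exit flag}, the tag names TAG_REPORT and TAG_RESULT are made up, the root uses plain blocking receives in rank order rather than non-blocking ones, and the broadcast of the x vector from the question is left out.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { TAG_REPORT = 1, TAG_RESULT = 2 };   /* hypothetical tag names */

int main(int argc, char **argv)
{
    const int max_it = 4000;
    int rank, size, k, r;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srand((unsigned)rank + 1u);

    if (rank == 0) {
        /* Root: assumed never to exit early; tracks which ranks are alive. */
        int *active = malloc((size_t)size * sizeof *active);
        int n_active = size - 1;
        for (r = 0; r < size; r++)
            active[r] = 1;

        for (k = 0; k < max_it && n_active > 0; k++) {
            /* best = {minimum value so far, rank that owns it} */
            double best[2] = { 0.0, 0.0 };
            best[0] = rand() / (double)RAND_MAX;

            for (r = 1; r < size; r++) {
                double report[2];   /* {value, exit flag} */
                if (!active[r])
                    continue;
                MPI_Recv(report, 2, MPI_DOUBLE, r, TAG_REPORT,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (report[1] != 0.0) {
                    active[r] = 0;  /* never wait for this rank again */
                    n_active--;
                    printf("iter %d: proc %d terminated\n", k, r);
                } else if (report[0] < best[0]) {
                    best[0] = report[0];
                    best[1] = (double)r;
                }
            }
            /* "Broadcast" by hand, only to the ranks that are still running. */
            for (r = 1; r < size; r++)
                if (active[r])
                    MPI_Send(best, 2, MPI_DOUBLE, r, TAG_RESULT, MPI_COMM_WORLD);
        }
        free(active);
    } else {
        for (k = 0; k < max_it; k++) {
            double report[2], best[2];
            report[0] = rand() / (double)RAND_MAX;
            report[1] = (rand() % 10 == 0) ? 1.0 : 0.0;

            MPI_Send(report, 2, MPI_DOUBLE, 0, TAG_REPORT, MPI_COMM_WORLD);
            if (report[1] != 0.0)
                break;   /* exit was announced, so the root stops expecting us */
            MPI_Recv(best, 2, MPI_DOUBLE, 0, TAG_RESULT,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Since no collective over MPI_COMM_WORLD is called inside the loop, a rank that leaves early no longer blocks the others; the cost is that the root does the reduction and the distribution by hand.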
I am practicing synchronization through barrier by using Open MPI message communication. I have created an array of struct called containers. Each container is linked to its neighbor on the right, and the two elements at both ends are also linked, forming a circle.
In the main() testing client, I run MPI with multiple processes (mpiexec -n 5 ./a.out), and they are supposed to be synchronized by calling the barrier() function, however, my code is stuck at the last process. I am looking for help with the debugging. Please see my code below:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>
typedef struct container {
int labels;
struct container *linked_to_container;
int sense;
} container;
container *allcontainers; /* an array for all containers */
int size_containers_array;
int get_next_container_id(int current_container_index, int max_index)
{
if (max_index - current_container_index >= 1)
{
return current_container_index + 1;
}
else
return 0; /* elements at two ends are linked */
}
container *get_container(int index)
{
return &allcontainers[index];
}
void container_init(int num_containers)
{
allcontainers = (container *) malloc(num_containers * sizeof(container)); /* is this right to malloc memory on the array of container when the struct size is still unknown?*/
size_containers_array = num_containers;
int i;
for (i = 0; i < num_containers; i++)
{
container *current_container = get_container(i);
current_container->labels = 0;
int next_container_id = get_next_container_id(i, num_containers - 1); /* max index in all_containers[] is num_containers-1 */
current_container->linked_to_container = get_container(next_container_id);
current_container->sense = 0;
}
}
void container_barrier()
{
int current_container_id, my_sense = 1;
int tag = current_container_id;
MPI_Request request[size_containers_array];
MPI_Status status[size_containers_array];
MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
container *current_container = get_container(current_container_id);
int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);
/* send asynchronous message to the next container, wait, then do blocking receive */
MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &request[current_container_id]);
MPI_Wait(&request[current_container_id], &status[current_container_id]);
MPI_Recv(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
void free_containers()
{
free(allcontainers);
}
int main(int argc, char **argv)
{
int my_id, num_processes;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
container_init(num_processes);
printf("Hello world from thread %d of %d \n", my_id, num_processes);
container_barrier();
printf("passed barrier \n");
MPI_Finalize();
free_containers();
return 0;
}
The problem is the series of calls:
MPI_Isend()
MPI_Wait()
MPI_Recv()
This is a common source of confusion. When you use a "nonblocking" call in MPI, you are essentially telling the MPI library that you want to do some operation (send) with some data (my_sense). MPI gives you back an MPI_Request object with the guarantee that the operation will be finished by the time a completion function (such as MPI_Wait) returns for that MPI_Request.
The problem you have here is that you're calling MPI_Isend and immediately calling MPI_Wait before ever calling MPI_Recv on any rank. This means that all of those send calls get queued up but never actually have anywhere to go, because you've never told MPI where to put the data by calling MPI_Recv (which tells MPI that you want to put the data in my_sense).
The reason this works part of the time is that MPI expects that things might not always sync up perfectly. If you send small messages (which you do), MPI reserves some buffer space and lets your send operations complete; the data gets stashed in that temporary space until you call MPI_Recv later to tell MPI where to move it. Eventually, though, this won't work anymore. The buffers will be full and you'll need to actually start receiving your messages. For you, this means that you need to switch the order of your operations. Instead of doing a non-blocking send, you should do a non-blocking receive first, then do your blocking send, then wait for your receive to finish:
MPI_Irecv()
MPI_Send()
MPI_Wait()
The other option is to turn both functions into nonblocking functions and use MPI_Waitall instead:
MPI_Isend()
MPI_Irecv()
MPI_Waitall()
This last option is usually the best. The only thing you need to be careful about is not overwriting your own data. Right now you're using the same buffer for both the send and the receive operation. If both are in flight at the same time, there are no guarantees about their ordering. Normally it doesn't matter whether you send the message first or receive it first. However, in this case it does: if the receive completes first, you'll end up sending out the data you just received instead of the data you had before the receive. You can solve this by using a temporary buffer to stage your data and moving it to the right place when it's safe.
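Put together, the fully non-blocking variant with separate buffers might look like the sketch below. It assumes the intended ring layout is that each rank sends to its right neighbor and receives from its left neighbor, and that all ranks share one fixed tag; in the question's code the receive names the same neighbor as the send, and tag is read before MPI_Comm_rank has filled in current_container_id. The variable names are illustrative, and the sketch shows only the message exchange, not a complete barrier.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Assumed ring layout: send to the right, receive from the left. */
    int right = (rank + 1) % size;
    int left = (rank + size - 1) % size;

    int send_sense = 1;   /* outgoing data */
    int recv_sense = 0;   /* separate buffer for incoming data */
    const int tag = 0;    /* one fixed tag shared by all ranks */
    MPI_Request requests[2];

    /* Post both operations, then complete them together. */
    MPI_Irecv(&recv_sense, 1, MPI_INT, left, tag, MPI_COMM_WORLD, &requests[0]);
    MPI_Isend(&send_sense, 1, MPI_INT, right, tag, MPI_COMM_WORLD, &requests[1]);
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, recv_sense, left);

    MPI_Finalize();
    return 0;
}
```

Because send_sense and recv_sense are distinct, the order in which the two operations complete no longer affects what is sent.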
This is a pretty basic MPI question, but I can't wrap my head around it. I have a main function that calls another function that uses MPI. I want the main function to execute in serial, and the other function to execute in parallel. My code is like this:
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
void parallel_function(int arg1, int arg2)
{
//init MPI and do some stuff in parallel
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
//now do some parallel stuff
//....
//finalize to end MPI??
MPI_Finalize();
}
My code runs fine and gets the expected output, but the issue is that the main function is also being run in separate processes and so the serial code executes more than once. I don't know how it's running multiple times, because I haven't even called MPI_Init yet (if I printf in main before I call parallel_function, I see multiple printf's)
How can I stop my program running in parallel after I'm done?
Thanks for any responses!
Have a look at this answer.
Short story: MPI_Init and MPI_Finalize do not mark the beginning and end of parallel processing. MPI processes run in parallel in their entirety.
@suszterpatt is correct to state that "MPI processes run in parallel in their entirety". When you run a parallel program using, for example, mpirun or mpiexec, this starts the number of processes you requested (with the -n flag), and each process begins execution at the start of main. So in your example code
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
every process will execute the //some serial code goes here and //some more serial code goes here parts (and of course they will all call parallel_function). There isn't one master process which calls parallel_function and then spawns other processes once MPI_Init is called.
Generally it is best to avoid doing what you are doing: MPI_Init should be one of the first function calls in your program (ideally it should be the first). In particular, take note of the following (from here):
The MPI standard does not say what a program can do before an MPI_INIT or after an MPI_FINALIZE. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output.
Not respecting this can lead to some nasty bugs.
It is better practice to rewrite your code to something like the following:
#include <mpi.h>
int main (int argc, char *argv[])
{
int p, my_rank; // number of processes and rank of this process
// Initialise MPI
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// Serial part: executed only by process with rank 0
if (my_rank==0)
{
// Some serial code goes here
}
// Parallel part: executed by all processes.
// Serial part: executed only by process with rank 0
if (my_rank==0)
{
// Some more serial code goes here
}
// Finalize MPI
MPI_Finalize();
return 0;
}
Note: I am not a C programmer, so use the above code with care. Also, shouldn't main always return something, especially when defined as int main()?