Barrier call stuck in Open MPI (C program)

I am practicing barrier synchronization using Open MPI message passing. I have created an array of structs called containers. Each container is linked to its neighbor on the right, and the two containers at the ends are linked to each other, forming a ring.
In the main() testing client, I run MPI with multiple processes (mpiexec -n 5 ./a.out), and they are supposed to be synchronized by calling the container_barrier() function; however, my code gets stuck at the last process. I am looking for help with debugging. Please see my code below:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>
typedef struct container {
    int labels;
    struct container *linked_to_container;
    int sense;
} container;

container *allcontainers;   /* an array for all containers */
int size_containers_array;

int get_next_container_id(int current_container_index, int max_index)
{
    if (max_index - current_container_index >= 1)
    {
        return current_container_index + 1;
    }
    else
        return 0;   /* elements at two ends are linked */
}

container *get_container(int index)
{
    return &allcontainers[index];
}

void container_init(int num_containers)
{
    allcontainers = (container *) malloc(num_containers * sizeof(container)); /* is this right to malloc memory on the array of container when the struct size is still unknown?*/
    size_containers_array = num_containers;

    int i;
    for (i = 0; i < num_containers; i++)
    {
        container *current_container = get_container(i);
        current_container->labels = 0;
        int next_container_id = get_next_container_id(i, num_containers - 1); /* max index in all_containers[] is num_containers-1 */
        current_container->linked_to_container = get_container(next_container_id);
        current_container->sense = 0;
    }
}

void container_barrier()
{
    int current_container_id, my_sense = 1;
    int tag = current_container_id;
    MPI_Request request[size_containers_array];
    MPI_Status status[size_containers_array];

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    container *current_container = get_container(current_container_id);
    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);

    /* send asynchronous message to the next container, wait, then do blocking receive */
    MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &request[current_container_id]);
    MPI_Wait(&request[current_container_id], &status[current_container_id]);
    MPI_Recv(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void free_containers()
{
    free(allcontainers);
}

int main(int argc, char **argv)
{
    int my_id, num_processes;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    container_init(num_processes);

    printf("Hello world from thread %d of %d \n", my_id, num_processes);
    container_barrier();
    printf("passed barrier \n");

    MPI_Finalize();
    free_containers();

    return 0;
}

The problem is the series of calls:
MPI_Isend()
MPI_Wait()
MPI_Recv()
This is a common source of confusion. When you use a "nonblocking" call in MPI, you are essentially telling the MPI library that you want to do some operation (a send) with some data (my_sense). MPI gives you back an MPI_Request object with the guarantee that the operation will be finished by the time a completion function (MPI_Wait, MPI_Test, and friends) completes that MPI_Request.
The problem you have here is that you're calling MPI_Isend and immediately calling MPI_Wait before ever calling MPI_Recv on any rank. This means that all of those send calls get queued up but never actually have anywhere to go, because you've never told MPI where to put the data by calling MPI_Recv (which is what tells MPI that you want the data placed in my_sense).
The reason this works part of the time is that MPI expects that things might not always sync up perfectly. If you send small messages (which you do), MPI reserves some buffer space, lets your send operations complete, and stashes the data in that temporary space until you call MPI_Recv later to tell MPI where to move it. Eventually, though, this won't work anymore. The buffers will be full and you'll need to actually start receiving your messages. For you, this means that you need to switch the order of your operations. Instead of doing a non-blocking send, you should do a non-blocking receive first, then do your blocking send, then wait for your receive to finish:
MPI_Irecv()
MPI_Send()
MPI_Wait()
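Applied to the barrier above, that ordering looks roughly like this. This is only a sketch, not the poster's code: it reuses the globals and helpers from the question, uses a fixed tag shared by all ranks, and receives from the previous container in the ring so that every send has a matching receive.
/* Sketch: post the receive first, then do a blocking send, then wait for the receive. */
void container_barrier()
{
    int my_sense = 1, neighbour_sense = 0;
    int current_container_id;
    MPI_Request recv_request;

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);
    int prev_container_id = (current_container_id + size_containers_array - 1) % size_containers_array;
    int tag = 0;   /* all ranks must use the same tag */

    MPI_Irecv(&neighbour_sense, 1, MPI_INT, prev_container_id, tag, MPI_COMM_WORLD, &recv_request);
    MPI_Send(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD);
    MPI_Wait(&recv_request, MPI_STATUS_IGNORE);
}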
The other option is to make both calls non-blocking and use MPI_Waitall instead:
MPI_Isend()
MPI_Irecv()
MPI_Waitall()
This last option is usually the best. The only thing you'll need to be careful about is that you don't overwrite your own data. Right now you're using the same buffer for both the send and receive operations. If both of these are happening at the same time, there are no guarantees about the ordering. Normally this doesn't make a difference: whether you send the message first or receive it doesn't really matter. However, in this case it does. If you receive the data first, you'll end up sending the same data back out again instead of the data you had before the receive operation. You can solve this by using a temporary buffer to stage your data and move it to the right place when it's safe.
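A sketch of that last option against the same barrier (again an assumption-laden rewrite reusing the question's helpers, not the poster's code): the send keeps using my_sense, the receive lands in a separate their_sense, and MPI_Waitall completes both.
/* Nonblocking send and receive completed together; a separate receive
   buffer keeps the outgoing data intact. */
void container_barrier()
{
    int my_sense = 1, their_sense = 0;
    int current_container_id;
    MPI_Request requests[2];

    MPI_Comm_rank(MPI_COMM_WORLD, &current_container_id);
    int next_container_id = get_next_container_id(current_container_id, size_containers_array - 1);
    int prev_container_id = (current_container_id + size_containers_array - 1) % size_containers_array;
    int tag = 0;

    MPI_Isend(&my_sense, 1, MPI_INT, next_container_id, tag, MPI_COMM_WORLD, &requests[0]);
    MPI_Irecv(&their_sense, 1, MPI_INT, prev_container_id, tag, MPI_COMM_WORLD, &requests[1]);
    MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
}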

Related

C: Accessing or trying to alter a passed variable in a particular segment of a function involving message queues stops the process

I am trying to get processes to communicate and form groups through message queues.
In this case, a process forks into a number of children that communicate.
What bugs me, actually, is that inside a particular function that uses both the message queue and some variables I need to change, accessing the dereferenced values seems to stop the execution of the process (and of the others too, since it never "releases" the semaphore they share) until the parent process stops its children and starts doing something else.
struct msgbuf {
    long mtype;
    int mtext[2];
};
This is the msgbuf I'm using.
int checkMsg(int msgid, struct msgbuf messaggio, int voto, int *partner, int *took, int *rejects, int *pending){
    ...
    while(msgrcv(msgid, &messaggio, sizeof(messaggio), getpid(), IPC_NOWAIT) != -1){
        if(messaggio.mtext[0] == 1){
            printf("%s\n", strerror(errno)); // this is shown in the terminal
            printf("%d\n", *partner);        // this is not shown, blocking the execution
            *partner = messaggio.mtext[1];
            *took = 1;
            *pending = *pending - 1;
            refuseInvites(msgid, messaggio);
            return 1;
        }
        ...
    }
    ...
}
And this is the problematic piece of code.
I tried changing the second printf to output *took, and it does, but then the execution stops again (I presume at the moment *partner is accessed). The value of partner is set to 0 before entering the function, and messaggio.mtext[1] is a process ID.
I expect the function not to stop and to return successfully, maintaining the consistency of the data.
EDIT: The call to checkMsg() is this one:
taken = checkMsg(msgid, messaggio, voto, &partner, &took, &rejects, &pending);
partner, took, rejects and pending are all set to 0 before the forking.
EDIT 2: All the variables are of type int, and the signature of checkMsg is declared in a header included at the top of the program.
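For reference only (this is not the poster's program and does not by itself explain the hang), here is a minimal, self-contained System V message-queue round trip using the same msgbuf layout. Note that the size argument to msgsnd/msgrcv is the size of mtext, not of the whole struct.
/* Minimal System V message-queue round trip (illustrative only). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

struct msgbuf {
    long mtype;
    int mtext[2];
};

int main(void)
{
    struct msgbuf out, in;
    int msgid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (msgid == -1) { perror("msgget"); return 1; }

    out.mtype = getpid();            /* address the message to ourselves */
    out.mtext[0] = 1;
    out.mtext[1] = 42;
    if (msgsnd(msgid, &out, sizeof(out.mtext), 0) == -1) {
        perror("msgsnd");
        return 1;
    }

    /* Non-blocking receive of any message addressed to our pid */
    if (msgrcv(msgid, &in, sizeof(in.mtext), getpid(), IPC_NOWAIT) == -1) {
        perror("msgrcv");
        return 1;
    }
    printf("got mtext = {%d, %d}\n", in.mtext[0], in.mtext[1]);

    msgctl(msgid, IPC_RMID, NULL);   /* remove the queue */
    return 0;
}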

Implement Double Buffer in C

So I have a very high data acquisition rate of 16 MB/s. I am reading 4 MB of data into a buffer from a device file and then processing it. However, this method of writing and then reading was too slow for the project. I would like to implement a double buffer in C.
To simplify my idea of the double buffer, I decided not to include reading from a device file. What I have created is a C program that spawns two separate threads, readThread and writeThread. I made readThread call my swap function, which swaps the pointers to the buffers.
This implementation is terrible because I am using shared memory outside of the mutex. I am actually slightly embarrassed to post it, but it will at least give you an idea of what I am trying to do. However, I cannot seem to come up with a practical way of reading and writing to separate buffers at the same time and then calling a swap once both threads have finished writing and reading.
Can someone please tell me if it's possible to implement double buffering and give me an idea of how to use signals to control when the threads read and write?
Note that readToBuff (dumb name, I know) and writeToBuff aren't actually doing anything at present; their loop bodies are empty.
Here is my code:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

pthread_t writeThread;
pthread_t readThread;
pthread_mutex_t buffer_mutex;

char buff1[4], buff2[4];

struct mutex_shared {
    int stillReading, stillWriting, run_not_over;
    char *writeBuff, *readBuff;
} SHARED;

void *writeToBuff(void *idk) {
    while(!SHARED.run_not_over) {
        SHARED.stillWriting = 1;
        for(int i = 0; i < 4; i++) {
        }
        SHARED.stillWriting = 0;
        while(SHARED.stillReading){};
    }
    printf("hello from write\n");
    return NULL;
}

void *readToBuff(void *idk) {
    while(!SHARED.run_not_over) {
        SHARED.stillReading = 1;
        for(int i = 0; i < 4; i++) {
        }
        while(SHARED.stillWriting){};
        swap(writeThread, readThread);
    }
    printf("hello from read");
    return NULL;
}

void swap(char **a, char **b){
    pthread_mutex_lock(&buffer_mutex);
    printf("in swap\n");
    char *temp = *a;
    *a = *b;
    *b = temp;
    SHARED.stillReading = 0;
    //SHARED.stillWriting = 0;
    pthread_mutex_unlock(&buffer_mutex);
}

int main() {
    SHARED.writeBuff = buff1;
    SHARED.readBuff = buff2;

    printf("buff1 address %p\n", (void*) &buff1);
    printf("buff2 address %p\n", (void*) &buff2);
    printf("writeBuff address its pointing to %p\n", SHARED.writeBuff);
    printf("readBuff address its pointing to %p\n", SHARED.readBuff);

    swap(&SHARED.writeBuff, &SHARED.readBuff);

    printf("writeBuff address its pointing to %p\n", SHARED.writeBuff);
    printf("readBuff address its pointing to %p\n", SHARED.readBuff);

    pthread_mutex_init(&buffer_mutex, NULL);

    printf("Creating Write Thread\n");
    if (pthread_create(&writeThread, NULL, writeToBuff, NULL)) {
        printf("failed to create thread\n");
        return 1;
    }
    printf("Thread created\n");

    printf("Creating Read Thread\n");
    if(pthread_create(&readThread, NULL, readToBuff, NULL)) {
        printf("failed to create thread\n");
        return 1;
    }
    printf("Thread created\n");

    pthread_join(writeThread, NULL);
    pthread_join(readThread, NULL);

    exit(0);
}
Using a pair of semaphores seems like it would be easier. Each thread has its own semaphore to indicate that a buffer is ready to be read into or written from, and each thread has its own index into a circular array of structures, each containing a pointer to a buffer and a buffer size. For double buffering, the circular array only contains two structures.
The initial state sets the read thread's semaphore count to 2, the read index to the first buffer, the write thread's semaphore count to 0, and the write index to the first buffer. The write thread is then created, and it will immediately wait on its semaphore.
The read thread waits for a non-zero semaphore count (sem_wait) on its semaphore, reads into a buffer, sets the buffer size, increments the write thread's semaphore count (sem_post), and "advances" its index into the circular array of structures.
The write thread waits for a non-zero semaphore count (sem_wait) on its semaphore, writes from a buffer (using the size set by the read thread), increments the read thread's semaphore count (sem_post), and "advances" its index into the circular array of structures.
When reading is complete, the read thread sets a structure's buffer size to zero to indicate the end of the read chain, then waits for the write thread to "return" all buffers.
The circular array could include more than just two structures, allowing for more buffering of data.
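A minimal sketch of that scheme using POSIX semaphores (the buffer contents and the fake fill/drain work are placeholders, and all names are made up): free_slots plays the role of the read thread's semaphore, full_slots the write thread's.
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NBUF 2                 /* double buffering */
#define BUFSZ 4096

struct slot {
    char data[BUFSZ];
    size_t size;               /* 0 marks end of stream */
};

static struct slot slots[NBUF];
static sem_t free_slots;       /* counts slots the reader may fill  */
static sem_t full_slots;       /* counts slots the writer may drain */

static void *reader(void *arg)
{
    int idx = 0;
    for (int chunk = 0; chunk < 8; chunk++) {        /* pretend to read 8 chunks */
        sem_wait(&free_slots);                       /* wait for an empty slot   */
        slots[idx].size = snprintf(slots[idx].data, BUFSZ, "chunk %d", chunk);
        sem_post(&full_slots);                       /* hand the slot to writer  */
        idx = (idx + 1) % NBUF;
    }
    sem_wait(&free_slots);
    slots[idx].size = 0;                             /* end-of-stream marker     */
    sem_post(&full_slots);
    return NULL;
}

static void *writer(void *arg)
{
    int idx = 0;
    for (;;) {
        sem_wait(&full_slots);                       /* wait for a filled slot   */
        if (slots[idx].size == 0)
            break;                                   /* reader signalled the end */
        printf("writing %zu bytes: %s\n", slots[idx].size, slots[idx].data);
        sem_post(&free_slots);                       /* give the slot back       */
        idx = (idx + 1) % NBUF;
    }
    return NULL;
}

int main(void)
{
    pthread_t r, w;
    sem_init(&free_slots, 0, NBUF);   /* reader may fill both buffers initially */
    sem_init(&full_slots, 0, 0);      /* writer starts with nothing to do       */
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    sem_destroy(&free_slots);
    sem_destroy(&full_slots);
    return 0;
}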
I've had to use something similar for high speed data capture, but in this case, the input stream was faster than a single hard drive, so two hard drives were used, and the output alternated between two write threads. One write thread operated on the "even" buffers, the other on the "odd" buffers.
In the case of Windows, with its WaitForMultipleObjects() (something that just about every operating system other than POSIX has), each thread can use a mutex and a semaphore, along with its own linked-list-based message queue. The mutex controls queue ownership for queue updates, and the semaphore indicates the number of items pending on a queue. For retrieving a message, a single atomic WaitForMultipleObjects() waits for the mutex and a non-zero semaphore count, and when both have occurred, it decrements the semaphore count and unblocks the thread. A message sender just needs to wait on the mutex to update another thread's message queue, then posts (releases) that thread's semaphore and releases the mutex. This eliminates any priority issues between threads.

MPI merge multiple intercomms into a single intracomm

I am trying to set up a bunch of spawned processes in a single intracomm. I need to spawn the separate processes into unique working directories, since these subprocesses will write out a bunch of files. After all the processes are spawned, I want to merge them into a single intra-communicator. To try this out I set up a simple test program.
int main(int argc, const char * argv[]) {
    int rank, size;
    const int num_children = 5;
    int error_codes;

    MPI_Init(&argc, (char ***)&argv);

    MPI_Comm parentcomm;
    MPI_Comm childcomm;
    MPI_Comm intracomm;

    MPI_Comm_get_parent(&parentcomm);
    if (parentcomm == MPI_COMM_NULL) {
        printf("Current Command %s\n", argv[0]);
        for (size_t i = 0; i < num_children; i++) {
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &childcomm, &error_codes);
            MPI_Intercomm_merge(childcomm, 0, &intracomm);
            MPI_Barrier(childcomm);
        }
    } else {
        MPI_Intercomm_merge(parentcomm, 1, &intracomm);
        MPI_Barrier(parentcomm);
    }

    printf("Test\n");
    MPI_Barrier(intracomm);
    printf("Test2\n");

    MPI_Comm_rank(intracomm, &rank);
    MPI_Comm_size(intracomm, &size);
    printf("Rank %d of %d\n", rank + 1, size);

    MPI_Barrier(intracomm);
    MPI_Finalize();
    return 0;
}
When I run this I get all 6 processes, but my intracomm only spans the parent and the last child spawned. The resulting output is
Test
Test
Test
Test
Test
Test
Test2
Rank 1 of 2
Test2
Rank 2 of 2
Is there a way to merge multiple communicators into a single communicator? Also note that I'm spawning these one at a time since I need each subprocess to execute in a unique working directory.
I realize I'm a year out of date with this answer, but I thought maybe other people might want to see an implementation of this. As the original respondent said, there is no way to merge three (or more) communicators. You have to build up the new intra-comm one at a time. Here is the code I use. This version deletes the original intra-comm; you may or may not want to do that depending on your particular application:
#include <mpi.h>

// The Borg routine: given
// (1) a (quiesced) intra-communicator with one or more members, and
// (2) a (quiesced) inter-communicator with exactly two members, one
//     of which is rank zero of the intra-communicator, and
//     the other of which is an unrelated spawned rank,
// return a new intra-communicator which is the union of both inputs.
//
// This is a collective operation. All ranks of the intra-
// communicator, and the remote rank of the inter-communicator, must
// call this routine. Ranks that are members of the intra-comm must
// supply the proper value for the "intra" argument, and MPI_COMM_NULL
// for the "inter" argument. The remote inter-comm rank must
// supply MPI_COMM_NULL for the "intra" argument, and the proper value
// for the "inter" argument. Rank zero (only) of the intra-comm must
// supply proper values for both arguments.
//
// N.B. It would make a certain amount of sense to split this into
// separate routines for the intra-communicator processes and the
// remote inter-communicator process. The reason we don't do that is
// that, despite the relatively few lines of code, what's going on here
// is really pretty complicated, and requires close coordination of the
// participating processes. Putting all the code for all the processes
// into this one routine makes it easier to be sure everything "lines up"
// properly.
MPI_Comm
assimilateComm(MPI_Comm intra, MPI_Comm inter)
{
    MPI_Comm peer = MPI_COMM_NULL;
    MPI_Comm newInterComm = MPI_COMM_NULL;
    MPI_Comm newIntraComm = MPI_COMM_NULL;

    // The spawned rank will be the "high" rank in the new intra-comm
    int high = (MPI_COMM_NULL == intra) ? 1 : 0;

    // If this is one of the (two) ranks in the inter-comm,
    // create a new intra-comm from the inter-comm
    if (MPI_COMM_NULL != inter) {
        MPI_Intercomm_merge(inter, high, &peer);
    } else {
        peer = MPI_COMM_NULL;
    }

    // Create a new inter-comm between the pre-existing intra-comm
    // (all of it, not only rank zero), and the remote (spawned) rank,
    // using the just-created intra-comm as the peer communicator.
    int tag = 12345;
    if (MPI_COMM_NULL != intra) {
        // This task is a member of the pre-existing intra-comm
        MPI_Intercomm_create(intra, 0, peer, 1, tag, &newInterComm);
    }
    else {
        // This is the remote (spawned) task
        MPI_Intercomm_create(MPI_COMM_SELF, 0, peer, 0, tag, &newInterComm);
    }

    // Now convert this inter-comm into an intra-comm
    MPI_Intercomm_merge(newInterComm, high, &newIntraComm);

    // Clean up the intermediaries
    if (MPI_COMM_NULL != peer) MPI_Comm_free(&peer);
    MPI_Comm_free(&newInterComm);

    // Delete the original intra-comm
    if (MPI_COMM_NULL != intra) MPI_Comm_free(&intra);

    // Return the new intra-comm
    return newIntraComm;
}
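A hypothetical driver for this routine (not part of the original answer) could spawn the children one at a time over MPI_COMM_SELF, so that each resulting inter-comm has exactly the two members assimilateComm expects, while every rank already in the growing intra-comm takes part in each merge:
#include <mpi.h>
#include <stdio.h>

#define NUM_CHILDREN 5

MPI_Comm assimilateComm(MPI_Comm intra, MPI_Comm inter);   /* the routine above */

int main(int argc, char *argv[])
{
    MPI_Comm parent, intra, inter;
    int already_spawned = 0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* The launcher starts as an intra-comm containing only itself. */
        MPI_Comm_dup(MPI_COMM_SELF, &intra);
    } else {
        /* A freshly spawned child first joins the existing intra-comm ... */
        intra = assimilateComm(MPI_COMM_NULL, parent);
        /* ... and works out how many children already exist (itself included). */
        MPI_Comm_size(intra, &already_spawned);
        already_spawned -= 1;
    }

    /* Grow the intra-comm one child at a time. */
    for (int i = already_spawned; i < NUM_CHILDREN; i++) {
        MPI_Comm_rank(intra, &rank);
        if (rank == 0) {
            int err;
            /* Only rank 0 spawns, over MPI_COMM_SELF, so the new inter-comm
               has exactly the two members assimilateComm expects. */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &inter, &err);
            intra = assimilateComm(intra, inter);
        } else {
            /* Everyone else takes part in the merge without an inter-comm. */
            intra = assimilateComm(intra, MPI_COMM_NULL);
        }
    }

    MPI_Comm_rank(intra, &rank);
    MPI_Comm_size(intra, &size);
    printf("Rank %d of %d\n", rank + 1, size);

    MPI_Comm_free(&intra);
    MPI_Finalize();
    return 0;
}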
If you are going to do this by calling MPI_COMM_SPAWN multiple times, then you'll have to do it more carefully. After you call SPAWN the first time, the spawned process also needs to take part in the next call to SPAWN; otherwise it will be left out of the communicator you're merging.
The problem with your current code is that only two processes are participating in each MPI_INTERCOMM_MERGE, and you can't merge three communicators, so you'll never end up with one big communicator that way.
If you instead have each process participate in the merge as it goes, you end up with one big communicator in the end.
Of course, you can just spawn all of your extra processes at once, but it sounds like you might have other reasons for not doing that.
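For completeness, a sketch of that "everyone participates" approach without the helper routine above (hypothetical code, assuming each spawn is done collectively over the current intra-comm so every existing rank is part of it):
#include <mpi.h>
#include <stdio.h>

#define NUM_CHILDREN 5

int main(int argc, char *argv[])
{
    MPI_Comm parent, intracomm, childcomm;
    int spawned_so_far = 0, rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* The original launcher starts alone. */
        MPI_Comm_dup(MPI_COMM_SELF, &intracomm);
    } else {
        /* A new child merges with everyone who already exists. */
        MPI_Intercomm_merge(parent, 1, &intracomm);
        MPI_Comm_size(intracomm, &spawned_so_far);
        spawned_so_far -= 1;
    }

    /* Every rank in the current intra-comm takes part in each spawn and merge. */
    for (int i = spawned_so_far; i < NUM_CHILDREN; i++) {
        int err;
        MPI_Comm next;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, intracomm, &childcomm, &err);
        MPI_Intercomm_merge(childcomm, 0, &next);
        MPI_Comm_free(&childcomm);
        MPI_Comm_free(&intracomm);
        intracomm = next;
    }

    MPI_Comm_rank(intracomm, &rank);
    MPI_Comm_size(intracomm, &size);
    printf("Rank %d of %d\n", rank + 1, size);

    MPI_Comm_free(&intracomm);
    MPI_Finalize();
    return 0;
}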

Master-Slave non-blocking listener

Currently I'm trying to create a master-slave program with a "listener" loop in which the master waits for a message from the slaves and makes a decision from there. But despite using non-blocking MPI routines, I am experiencing an error. Do I need to use some blocking routine?
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char** argv)
{
// Variable Declarations
int rank, size;
MPI_Request *requestList,requestNull;
MPI_Status status;
// Start MPI
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if( rank == 0 )
{
// Process Zero
int dataOut=13, pr;
float dataIn = -1;
requestList =(MPI_Request*)malloc((size-1)*sizeof(MPI_Request));
while(1){
dataIn = -1;
// We do NOT need to wait for the MPI_ Isend(s), it is the job of the receiver processes.
for(pr=1;pr<size;pr++)
{
MPI_Irecv(&dataIn,1,MPI_FLOAT,pr,1,MPI_COMM_WORLD,&(requestList[pr-1]));
}
if((dataIn > 1.5)){
printf("From the process: %f\n", dataIn);
break;
}
}
}
else
{
// Receiver Process
float message;
int index;
//MPI_Request request;
MPI_Status status;
while(1){
message = random()/(double)1147483648;
// Send the message back to the process zero
MPI_Isend(&message,1,MPI_FLOAT,0,1,MPI_COMM_WORLD, &requestNull);
if(message > 1.5)
break;
}
}
MPI_Finalize();
return 0;
}
The problem seems to be that you're never waiting for your MPI calls to complete.
The way non-blocking calls work in MPI is that when you issue a non-blocking call (like MPI_IRECV), the last input parameter is an MPI_REQUEST object. When the initiation call (MPI_IRECV) returns, that request object holds the information about the non-blocking operation. However, the operation itself isn't finished yet, and you have no guarantee that it has finished until you use a completion call (MPI_WAIT/MPI_TEST/and friends) on the request.
In your case, you are probably using the non-blocking calls unnecessarily, since you rely on the data received by the MPI_IRECV call on the very next line. You should probably just replace your non-blocking call with a blocking MPI_RECV call and use MPI_ANY_SOURCE so you don't have to post a separate receive for each rank in the communicator. Alternatively, you can use MPI_WAITANY to complete your non-blocking calls, but then you'll need to worry about cleaning up all of the extra operations when you finish.
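A minimal sketch of that suggestion (not the poster's code; the termination logic is made up so the example shuts down cleanly, with every worker's sends matched by a receive):
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Master: one blocking receive matches any worker, no request bookkeeping.
        float dataIn;
        MPI_Status status;
        int workers_done = 0;
        while (workers_done < size - 1) {
            MPI_Recv(&dataIn, 1, MPI_FLOAT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status);
            printf("From process %d: %f\n", status.MPI_SOURCE, dataIn);
            if (dataIn > 1.5)
                workers_done++;          // that worker has sent its last value
        }
    } else {
        // Worker: blocking sends, so no MPI_Request objects are leaked.
        float message;
        do {
            message = random() / (double)1147483648;
            MPI_Send(&message, 1, MPI_FLOAT, 0, 1, MPI_COMM_WORLD);
        } while (message <= 1.5);
    }

    MPI_Finalize();
    return 0;
}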

Changing value of a variable with MPI

#include <stdio.h>
#include <mpi.h>

int a = 1;
int *p = &a;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    //printf("Address val: %u \n", p);
    *p = *p + 1;
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    printf("Value of a : %d\n", *p);
    return 0;
}
Here, I am trying to execute the program with 3 processes, where each tries to increment the value of a by 1, so the value at the end of execution of all processes should be 4. Then why is the value printed as only 2 at the printf statement after MPI_Finalize()? And isn't it the case that parallel execution stops at MPI_Finalize(), so there should be only one process running after it? Then why do I get the print statement 3 times, once for each process, during execution?
It is a common misunderstanding to think that mpi_init starts up the requested number of processes (or whatever mechanism is used to implement MPI) and that mpi_finalize stops them. It's better to think of mpi_init starting the MPI system on top of a set of operating-system processes. The MPI standard is silent on what MPI actually runs on top of and how the underlying mechanism(s) is/are started. In practice a call to mpiexec (or mpirun) is likely to fire up a requested number of processes, all of which are alive when the program starts. It is also likely that the processes will continue to live after the call to mpi_finalize until the program finishes.
This means that prior to the call to mpi_init, and after the call to mpi_finalize it is likely that there is a number of o/s processes running, each of them executing the same program. This explains why you get the printf statement executed once for each of your processes.
As to why the value of a is set to 2 rather than to 4, well, essentially you are running n copies of the same program (where n is the number of processes) each of which adds 1 to its own version of a. A variable in the memory of one process has no relationship to a variable of the same name in the memory of another process. So each process sets a to 2.
To get any data from one process to another the processes need to engage in message-passing.
EDIT, in response to OP's comment
Just as a variable in the memory of one process has no relationship to a variable of the same name in the memory of another process, a pointer (which is a kind of variable) has no relationship to a pointer of the same name in the memory of another process. Do not be fooled: if the "same" pointer has the "same" address in multiple processes, those addresses are in different address spaces and are not the same; the pointers don't point to the same place.
An analogy: 1 High Street, Toytown is not the same address as 1 High Street, Legotown; there is a coincidence in names across address spaces.
To get any data (pointer or otherwise) from one process to another, the processes need to engage in message-passing. You seem to be clinging to a notion that MPI processes share memory in some way. They don't; let go of that notion.
Since MPI only gives you the option to communicate between separate processes, you have to do message passing. For your purpose there is something like MPI_Allreduce, which can sum data over the separate processes. Note that this adds the values, so in your case you want to sum the increments and add the sum to *p afterwards:
int inc = 1;
MPI_Allreduce(MPI_IN_PLACE, &inc, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
*p += inc;
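Folded into a complete program (a sketch built around the fragment above), that looks like:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int a = 1;
    int inc = 1;

    MPI_Init(&argc, &argv);

    /* Sum the increments of all ranks; MPI_IN_PLACE puts the result back in inc */
    MPI_Allreduce(MPI_IN_PLACE, &inc, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    a += inc;

    printf("Value of a : %d\n", a);   /* prints 4 when run with 3 processes */

    MPI_Finalize();
    return 0;
}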
In your implementation there is no communication between the spawned processes. Each process has its own int a variable, which it increments and prints to the screen. Making the variable global doesn't make it shared between processes, and all the pointer gimmicks show me that you don't know what you are doing. I would suggest learning a little more about C and operating systems before you move on.
Anyway, you have to make the processes communicate. Here's what an example might look like:
#include <stdio.h>
#include <mpi.h>

// this program will count the number of spawned processes in a *very* bad way
int main(int argc, char **argv)
{
    int partial = 1;
    int sum;
    int my_id = 0;

    // let's just assume the process with id 0 is root
    int root_process = 0;

    // spawn processes, etc.
    MPI_Init(&argc, &argv);

    // every process learns its id
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    // all processes add their 'partial' to the 'sum'
    MPI_Reduce(&partial, &sum, 1, MPI_INT, MPI_SUM, root_process, MPI_COMM_WORLD);

    // de-init MPI
    MPI_Finalize();

    // the root process communicates the summation result
    if (my_id == root_process)
    {
        printf("Sum total : %d\n", sum);
    }

    return 0;
}
