OpenMPI MPI_Barrier problems - c

I'm having some synchronization issues with the Open MPI implementation of MPI_Barrier:
int rank;
int nprocs;

int rc = MPI_Init(&argc, &argv);
if (rc != MPI_SUCCESS) {
    fprintf(stderr, "Unable to set up MPI\n");
    MPI_Abort(MPI_COMM_WORLD, rc);
}

MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("P%d\n", rank);
fflush(stdout);

MPI_Barrier(MPI_COMM_WORLD);

printf("P%d again\n", rank);
MPI_Finalize();
For mpirun -n 2 ./a.out, the expected output is:
P0
P1
...
but the actual output is sometimes:
P0
P0 again
P1
P1 again
what's going on?

The order in which your printed lines appear on the terminal is not necessarily the order in which they were printed. All processes share a single resource (stdout), so there can always be an ordering problem. (And fflush doesn't help here; stdout is line-buffered anyway.)
You could try to prefix your output with a timestamp and save all of this to different files, one per MPI process.
Then, to inspect the log, you can merge the files and sort the lines by timestamp.
The apparent ordering problem should then disappear.
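A minimal sketch of that idea (the file name pattern and log format here are made up for illustration, and comparing timestamps across machines only makes sense if their clocks are reasonably synchronized):

#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char **argv) {
    int rank;
    char filename[32];
    struct timeval tv;
    FILE *log;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one log file per process, e.g. log_0.txt, log_1.txt */
    snprintf(filename, sizeof(filename), "log_%d.txt", rank);
    log = fopen(filename, "w");

    gettimeofday(&tv, NULL);
    fprintf(log, "%ld.%06ld P%d before barrier\n", (long)tv.tv_sec, (long)tv.tv_usec, rank);

    MPI_Barrier(MPI_COMM_WORLD);

    gettimeofday(&tv, NULL);
    fprintf(log, "%ld.%06ld P%d after barrier\n", (long)tv.tv_sec, (long)tv.tv_usec, rank);

    fclose(log);
    MPI_Finalize();
    return 0;
}

Merging and numerically sorting the files afterwards (for example with "sort -n log_*.txt") should show every "before barrier" line ahead of every "after barrier" line.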

There is nothing wrong with MPI_Barrier().
As Jens mentioned, the reason you are not seeing the output you expected is that stdout is buffered on each process. There is no guarantee that prints from multiple processes will be displayed on the launching process in order. (If stdout from each process were transferred to the main process for printing in real time, that would lead to a lot of unnecessary communication!)
If you want to convince yourself that the barrier works, you could try writing to a file instead. Having multiple processes write to a single file may lead to extra complications, so you could have each proc write to its own file and then, after the barrier, swap the files they write to. For example:
Proc-0 Proc-1
| |
f0.write(..) f1.write(...)
| |
x ~~ barrier ~~ x
| |
f1.write(..) f0.write(...)
| |
END END
Sample implementation:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
char filename[20];
int rank, size;
FILE *fp;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank < 2) { /* proc 0 and 1 only */
sprintf(filename, "file_%d.out", rank);
fp = fopen(filename, "w");
fprintf(fp, "P%d: before Barrier\n", rank);
fclose(fp);
}
MPI_Barrier(MPI_COMM_WORLD);
if (rank < 2) { /* proc 0 and 1 only */
sprintf(filename, "file_%d.out", (rank==0)?1:0 );
fp = fopen(filename, "a");
fprintf(fp, "P%d: after Barrier\n", rank);
fclose(fp);
}
MPI_Finalize();
return 0;
}
After running the code, you should get the following results:
[me#home]$ cat file_0.out
P0: before Barrier
P1: after Barrier
[me#home]$ cat file_1.out
P1: before Barrier
P0: after Barrier
For all files, the "after Barrier" statements will always appear later.

Output ordering is not guaranteed in MPI programs.
This is not related to MPI_Barrier at all.
Also, I would not spend too much time worrying about output ordering in MPI programs.
The most elegant way to achieve it, if you really want to, is to have the processes send their messages to one rank, say rank 0, and have rank 0 print the output in the order it received them, or ordered by rank.
Again, don't spend too much time trying to order the output of MPI programs. It is not practical and of little use.
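If you really do want rank-ordered output, a minimal sketch of that scheme might look like the following (the message buffer size and tag are arbitrary choices):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    int rank, size;
    char msg[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    snprintf(msg, sizeof(msg), "Hello from rank %d", rank);

    if (rank == 0) {
        /* rank 0 prints its own line, then everyone else's in rank order */
        printf("%s\n", msg);
        for (int src = 1; src < size; src++) {
            MPI_Recv(msg, sizeof(msg), MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    } else {
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Because only rank 0 touches stdout, the lines come out in rank order regardless of how the launcher forwards output.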

Adding to the previous answers here, your MPI_BARRIER works fine.
However, if you just want to see it working, you can pause execution briefly (e.g. sleep(1)) after the barrier to let the output catch up.
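For illustration, a sketch of that idea applied to the snippet from the question (the one-second pause is arbitrary; it only makes the expected ordering more likely on a typical setup, it does not guarantee it):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("P%d\n", rank);
    fflush(stdout);

    MPI_Barrier(MPI_COMM_WORLD);
    sleep(1);   /* give the launcher a moment to forward the pre-barrier lines */

    printf("P%d again\n", rank);

    MPI_Finalize();
    return 0;
}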

Related

Using printf with MPI leads to non-deterministic output

The following code has non-deterministic behaviour on my machine (even with only two processes).
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == size - 1) {
        printf("I am the last rank\n");
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        MPI_Barrier(MPI_COMM_WORLD);
        printf("I am rank %d\n", rank);
    }

    MPI_Finalize();
    return 0;
}
Sometimes the output from the last rank appears first on the terminal, but sometimes it appears later, even though a barrier is used.
I assume the reason for this is that printf does internal buffering and that MPI (or rather mpirun/mpiexec) and printf do not really cooperate with each other. Is there an authoritative source to read up on this topic?
Output redirection is strongly platform-dependent and is only tangentially mentioned in the MPI standard. There is absolutely no guarantee as to what order output from different ranks will be displayed in. There isn't even a guarantee that you can get that output - some implementations only redirect the output from rank 0, some redirect no output, and both cases are compliant with the MPI standard.
If you want strongly ordered output, you should only do output in a single rank.
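For example, a minimal sketch of single-rank output, here collecting one value per rank with MPI_Gather and printing everything from rank 0 (the per-rank value is just a placeholder):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = rank * rank;          /* some per-rank result */
    int *all = NULL;
    if (rank == 0)
        all = malloc(size * sizeof(int));

    /* collect one int per rank on rank 0 */
    MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("rank %d computed %d\n", i, all[i]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}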

MPI_Barrier doesn't seem to work, reordering printf (stdout) messages [duplicate]

This question already has answers here:
Ordering Output in MPI
(4 answers)
Closed 5 years ago.
Below is a very basic MPI program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    int size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Goodbye from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello 2 from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Goodbye 2 from %d\n", rank);

    MPI_Finalize();
    return 0;
}
You would expect the output to be (for 2 processes)
Hello from 0
Hello from 1
Goodbye from 1
Goodbye from 0
Hello 2 from 1
Hello 2 from 0
Goodbye 2 from 0
Goodbye 2 from 1
Or something similar (the hellos and goodbyes should be grouped, but process order is not guaranteed).
Here is my actual output:
Hello from 0
Goodbye from 0
Hello 2 from 0
Goodbye 2 from 0
Hello from 1
Goodbye from 1
Hello 2 from 1
Goodbye 2 from 1
Am I fundamentally misunderstanding what MPI_Barrier is supposed to do? From what I can tell, if I use it only once, then it gives me expected results, but any more than that and it seems to do absolutely nothing.
I realize many similar questions have been asked before, but the askers in the questions I viewed misunderstood the function of MPI_Barrier.
the hellos and goodbyes should be grouped
They should not be. There is additional buffering (asynchronous with respect to MPI) inside the printf function, and another layer of buffering where stdout from multiple MPI processes is gathered onto the single user terminal.
printf just writes into an in-memory buffer of libc (glibc), which is only occasionally flushed to the real file descriptor (stdout; use fflush to flush the buffer explicitly); fprintf(stderr, ...) usually has less buffering than stdout.
Remote tasks are started by mpirun/mpiexec, usually over an ssh remote shell, which forwards stdout/stderr. ssh (and TCP too) will buffer data, and there can be reordering when the data from ssh is displayed on your terminal by mpirun/mpiexec or another entity (several data streams are multiplexed into one).
What you see is that the 4 strings from the first process are buffered and flushed at its exit (stdout usually has a buffer of several kilobytes), and the 4 strings from the second process are likewise buffered until its exit. Each group of 4 strings is sent by ssh or another launch method to your console as a single "packet", and your console simply shows both 4-line packets in some order, either "4_lines_packet_from_id_0; 4_lines_packet_from_id_1;" or "4_lines_packet_from_id_1; 4_lines_packet_from_id_0;".
MPI_Barrier is supposed to do?
MPI_Barrier synchronizes the processes at that point in the code, but it can't disable the buffering in the libc/glibc printing and file I/O functions, nor in ssh or another remote shell.
If all of your processes run on machines with synchronized system clocks (they are when everything runs on a single machine, and they should be when ntpd runs on your cluster), you can add a timestamp field to every printf to check that the real order respects your barrier (gettimeofday reads the current time and adds no extra buffering). You can then sort the timestamped output even if printf and ssh reorder the messages.
#include <mpi.h>
#include <sys/time.h>
#include <stdio.h>

void print_timestamped_message(int mpi_rank, char *s)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    printf("[%ld.%06ld](%d): %s\n", (long)now.tv_sec, (long)now.tv_usec, mpi_rank, s);
}

int main(int argc, char * argv[]) {
    int rank;
    int size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "First Hello");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "First Goodbye");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "Second Hello");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "Second Goodbye");

    MPI_Finalize();
    return 0;
}

Have trouble using GDB to debug MPI code

I am trying to use GDB to see what's going on in the low-level Open MPI calls. However, I couldn't get GDB to step into the low-level library functions of Open MPI.
I first compiled Open MPI with the configure options "--enable-debug --enable-debug-symbols", so the library should be built with "-g". Then I compile my test code with "mpicc /path/to/my/code -g -o /dest/code". The code is shown below:
#include "mpi.h"
#include <stdio.h>
main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
int buf;
const int root=0;
MPI_Status Stat;
// sleep still GDB attached
int i = 0;
char hostname[256];
gethostname(hostname, sizeof(hostname));
printf("PID %d on %s ready for attach\n", getpid(), hostname);
fflush(stdout);
while (0 == i)
sleep(5);
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if(rank == root) {
buf = 777;
}
printf("[%d]: Before Bcast, buf is %d\n", rank, buf);
MPI_Bcast(&buf, 1, MPI_INT, root, MPI_COMM_WORLD);
printf("[%d]: After Bcast, buf is %d\n", rank, buf);
MPI_Finalize();
}
This code first prints the PID of every process and then sleeps, waiting for attach. I then attach GDB to one of the processes, set a breakpoint, and reset the value of i so that it leaves the waiting while loop.
Then I use "s" to step through my code. It is supposed to step into low-level library functions like MPI_Init() and MPI_Bcast(). However, GDB just goes to the next line and never steps into any function, and "bt" doesn't print any useful backtrace information either. The whole program runs without error, though.
Can anyone spot a mistake in this process? Thanks in advance.

MPI_Finalize() does not end any processes

I'm messing around with Open MPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the threads keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();

    printf("This is after finalize\n");
    return 0;
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, the output from this program, if I run it with for example 6 processes, is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in the version 3.0 of the standard (the latest for a few more days) in Chapter 8, section 8.7 (Startup) on page 359 - 361. Here's the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
    resultfile = fopen("outfile", "w");
    dump_results(resultfile);
    fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same problem understanding the MPI_Init and MPI_Finalize calls: I thought the code between them ran in parallel and the code outside ran serially. Finally I saw this answer and figured out how they actually work.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) { // Do the serial part on a single MPI process
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork(); // Every MPI process will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI process
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();
    return 0;
}

How to run this compiled open MPI program (C)?

I am trying to run the example code from the URL below. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but got the following message: "Must specify MP_PROCS= 8. Terminating."
My question is: how do you set MP_PROCS=8?
Group and Communication Routine examples are here: https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#define NPROCS 8
main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec or -np instead of -n. Both are fine most of the time.
If you're just starting out, it would also be a good idea to make sure you're using a recent version of MPI so you don't get old bugs or weird execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 available here while Open MPI has version 1.7.4 available here. You can also usually get either of them via your friendly neighborhood package manager.

Resources