Using printf with MPI leads to non-deterministic output - c

The following code has non-deterministic behaviour on my machine (even when using only two processes).
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == size - 1) {
        printf("I am the last rank\n");
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        MPI_Barrier(MPI_COMM_WORLD);
        printf("I am rank %d\n", rank);
    }
    MPI_Finalize();
    return 0;
}
Sometimes the output from the last rank appears first on the terminal, but sometimes it appears later, even though a barrier is used.
I assume the reason is that printf does internal buffering and that MPI (or rather mpirun/mpiexec) and printf do not really cooperate with each other. Is there an authoritative source to read up on this topic?

Output redirection is strongly platform-dependent and is only tangentially mentioned in the MPI standard. There is absolutely no guarantee as to what order output from different ranks will be displayed in. There isn't even a guarantee that you can get that output - some implementations only redirect the output from rank 0, some redirect no output, and both cases are compliant with the MPI standard.
If you want strongly ordered output, you should only do output in a single rank.
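As a minimal sketch of that single-rank approach (the payload and variable names here are illustrative, not from the question): every rank contributes its value, rank 0 gathers them all, and only rank 0 touches stdout, so the printing order is entirely under its control.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = rank * rank;                 /* illustrative per-rank payload */
    int *all = NULL;
    if (rank == 0)
        all = malloc(size * sizeof(int));    /* recvbuf only matters on the root */

    MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Only rank 0 prints, so the output appears in rank order. */
    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("value from rank %d: %d\n", i, all[i]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}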

Related

Is there any way in MPI_Programs to order the execution of processes?

Say I have 2 processes, P1 and P2 and both P1 and P2 are printing an array of 1000 data points. As we know, we can't guarantee anything about the order of output, it may be P1 prints the data first followed by P2 or vice versa, or it can be that both outputs are getting mixed. Now say I want to output the values of P1 first followed by P2. Is there any way by which I can guarantee that?
I am attaching a minimal reproducible example below in which the output gets mixed.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int myrank, size; // size will take care of the number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (myrank == 0)
    {
        int a[1000];
        for (int i = 0; i < 1000; i++)
        {
            a[i] = i + 1;
        }
        for (int i = 0; i < 1000; i++)
        {
            printf(" %d", a[i]);
        }
    }
    if (myrank == 1)
    {
        int a[1000];
        for (int i = 0; i < 1000; i++)
        {
            a[i] = i + 1;
        }
        for (int i = 0; i < 1000; i++)
        {
            printf(" %d", a[i]);
        }
    }
    MPI_Finalize();
    return 0;
}
The only way I can think of to output the data sequentially is to send the data from, say, P1 to P0 and then print it all from P0. But then we incur the extra cost of sending data from one process to another.
You have some additional options:
pass a token: processes can block waiting for a message, print whatever they need to, then send to the next rank (see the sketch after this list)
let something else deal with ordering: each process prefixes its rank to the output, and you then sort the output by rank
treat the output as a file: each rank can compute where it should write, and then everybody carries out a write to the file at the correct location in parallel (which is what MPI-IO applications do)
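A rough sketch of the token-passing option (the tag and the printed message are made up for illustration): each rank blocks until its predecessor has printed, prints its own output, then signals the next rank.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, token = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every rank except 0 waits for the token from the previous rank. */
    if (rank > 0)
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    printf("output of rank %d\n", rank);
    fflush(stdout);   /* push the line out of the libc buffer before passing the token */

    /* Pass the token on; the last rank has nobody left to signal. */
    if (rank < size - 1)
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Note that this serializes the printf calls themselves, but as the other answers point out, the launcher's stdout forwarding can still reorder lines on the way to your terminal.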
Now say I want to output the values of P1 first followed by P2. Is
there any way by which I can guarantee that?
This is not how MPI, or parallelism in general, is meant to be used, in my opinion. Coordinating the printing of output to the console among processes will greatly degrade the performance of the parallel version, which defeats one of the purposes of parallelism, i.e., reducing the overall execution time.
Most of the time one is better off just making one process responsible for printing the output to the console (typically the master process, i.e., the process with rank = 0).
Citing @Gilles Gouaillardet:
The only safe option is to send all the data to a given rank, and then print the data from that rank.
You could try using MPI_Barrier to coordinate the processes in a way that prints the output as you want; however (citing @Hristo Iliev):
Using barriers like that only works for local launches when (and if)
the processes share the same controlling terminal. Otherwise, it is
entirely to the discretion of the I/O redirection mechanism of the MPI
implementation.
If it is for debugging purposes, you can use a good MPI-aware debugger that lets you inspect the contents of the data on each process. Alternatively, you can limit the output to one process at a time per run, so that you can check that all the processes have the data they should have.
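As a sketch of the "only safe option" quoted above (array length and tag are taken from the question for illustration): every non-zero rank sends its array to rank 0, and rank 0 receives and prints them in rank order.
#include <mpi.h>
#include <stdio.h>

#define N 1000   /* array length, as in the question */

int main(int argc, char *argv[]) {
    int rank, size, a[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N; i++)
        a[i] = i + 1;                        /* each rank fills its own data */

    if (rank == 0) {
        int buf[N];
        /* Print rank 0's own data first, then every other rank's data in order. */
        for (int i = 0; i < N; i++) printf(" %d", a[i]);
        printf("\n");
        for (int src = 1; src < size; src++) {
            MPI_Recv(buf, N, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < N; i++) printf(" %d", buf[i]);
            printf("\n");
        }
    } else {
        MPI_Send(a, N, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}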

MPI_Barrier doesn't seem to work, reordering printf (stdout) messages [duplicate]

This question already has answers here: Ordering Output in MPI (4 answers). Closed 5 years ago.
Below is a very basic MPI program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Goodbye from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello 2 from %d\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Goodbye 2 from %d\n", rank);
    MPI_Finalize();
    return 0;
}
You would expect the output to be (for 2 processes)
Hello from 0
Hello from 1
Goodbye from 1
Goodbye from 0
Hello 2 from 1
Hello 2 from 0
Goodbye 2 from 0
Goodbye 2 from 1
Or something similar (the hellos and goodbyes should be grouped, but process order is not guaranteed).
Here is my actual output:
Hello from 0
Goodbye from 0
Hello 2 from 0
Goodbye 2 from 0
Hello from 1
Goodbye from 1
Hello 2 from 1
Goodbye 2 from 1
Am I fundamentally misunderstanding what MPI_Barrier is supposed to do? From what I can tell, if I use it only once, then it gives me expected results, but any more than that and it seems to do absolutely nothing.
I realize many similar questions have been asked before, but the askers in the questions I viewed misunderstood the function of MPI_Barrier.
the hellos and goodbyes should be grouped
They need not be: there is additional buffering inside the printf function (independent of MPI), and another layer of buffering in the gathering of stdout from multiple MPI processes to a single user terminal.
printf just writes into an in-memory buffer of libc (glibc), which is only sometimes flushed to the real file descriptor (stdout; use fflush to flush the buffer explicitly); fprintf(stderr, ...) usually has less buffering than stdout.
Remote tasks are started by mpirun/mpiexec, usually via an ssh remote shell, which forwards stdout/stderr. ssh (and TCP too) buffers data, and there can be reordering when the data from ssh is shown at your terminal by mpirun/mpiexec or some other entity (several data streams are multiplexed into one).
What you most likely get is this: the 4 strings from the first process are buffered and flushed at its exit (all strings were printed to stdout, which usually has a buffer of several kilobytes), and the 4 strings from the second process are likewise buffered until exit. Each group of 4 strings is sent by ssh, or whatever the launch method is, to your console as a single "packet", and your console simply shows the two packets of 4 lines each in some order, either "4_lines_packet_from_id_0; 4_lines_packet_from_id_1" or "4_lines_packet_from_id_1; 4_lines_packet_from_id_0".
what MPI_Barrier is supposed to do?
MPI_Barrier synchronizes parts of the code, but it can't disable the buffering in the libc/glibc printing and file I/O functions, nor in ssh or any other remote shell.
If all of your processes run on machines with synchronized system clocks (they will be when you have a single machine, and they should be if ntpd is running on your cluster), you can add a timestamp field to every printf to check that the real ordering respects your barrier (gettimeofday reads the current time and has no extra buffering). You can then sort the timestamped output even if printf and ssh reorder the messages.
#include <mpi.h>
#include <sys/time.h>
#include <stdio.h>

void print_timestamped_message(int mpi_rank, const char *s)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    printf("[%ld.%06ld](%d): %s\n", (long)now.tv_sec, (long)now.tv_usec,
           mpi_rank, s);
}

int main(int argc, char *argv[]) {
    int rank;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "First Hello");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "First Goodbye");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "Second Hello");
    MPI_Barrier(MPI_COMM_WORLD);
    print_timestamped_message(rank, "Second Goodbye");
    MPI_Finalize();
    return 0;
}

MPI_Finalize() does not end any processes

I'm messing around with Open MPI, and I have a weird bug.
It seems that, even after MPI_Finalize(), each of the processes keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);
    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);
    // Finalize the MPI environment.
    MPI_Finalize();
    printf("This is after finalize...\n");
    return 0;
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, the output from this program if I, for example, run it with 6 processes is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, Section 8.7 (Startup), pages 359-361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
    resultfile = fopen("outfile", "w");
    dump_results(resultfile);
    fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same problem with the MPI_Init and MPI_Finalize functions. I thought the code between these calls ran in parallel and the code outside ran serially. Finally I saw this answer and figured out how they actually behave.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
    int numprocs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }
    ParallelWork(); // Every MPI thread will run the parallel work
    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }
    MPI_Finalize();
    return 0;
}

How to run this compiled Open MPI program (C)?

I am trying to run the example code at the following URL. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but got the following message: Must specify MP_PROCS= 8. Terminating.
My question is how do you set MP_PROCS=8 ?
Group and communication routine examples are here: https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#define NPROCS 8
main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec or -np instead of -n. Both are fine most of the time.
If you're just starting out, it would also be a good idea to make sure you're using a recent version of MPI so you don't get old bugs or weird execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 available here while Open MPI has version 1.7.4 available here. You can also usually get either of them via your friendly neighborhood package manager.

OpenMPI MPI_Barrier problems

I am having some synchronization issues using the Open MPI implementation of MPI_Barrier:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    int nprocs;
    int rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "Unable to set up MPI");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("P%d\n", rank);
    fflush(stdout);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("P%d again\n", rank);
    MPI_Finalize();
    return 0;
}
for mpirun -n 2 ./a.out
output should be:
P0
P1
...
output is sometimes:
P0
P0 again
P1
P1 again
what's going on?
The order in which your printed lines appear on your terminal is not necessarily the order in which they were printed. You are using a shared resource (stdout) for that, so there is bound to be an ordering problem. (And fflush doesn't help here; stdout is line-buffered anyhow.)
You could try prefixing your output with a timestamp and saving all of it to different files, one per MPI process.
Then, to inspect your log, you can merge the files together and sort them by timestamp.
Your problem should then disappear.
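A minimal sketch of that idea (the file naming and timestamp format are my own choices, not from the answer): each rank writes timestamped lines into its own log file, which you can later merge and sort on the leading timestamp.
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char *argv[]) {
    int rank;
    char fname[64];
    struct timeval tv;
    FILE *log;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One log file per rank, so no two processes share a stream. */
    snprintf(fname, sizeof(fname), "rank_%d.log", rank);
    log = fopen(fname, "w");

    gettimeofday(&tv, NULL);
    fprintf(log, "%ld.%06ld rank %d before barrier\n",
            (long)tv.tv_sec, (long)tv.tv_usec, rank);

    MPI_Barrier(MPI_COMM_WORLD);

    gettimeofday(&tv, NULL);
    fprintf(log, "%ld.%06ld rank %d after barrier\n",
            (long)tv.tv_sec, (long)tv.tv_usec, rank);

    fclose(log);
    MPI_Finalize();
    return 0;
}
Merging the files and sorting numerically on the first field (for example with sort -n rank_*.log) should show every "before barrier" line ahead of every "after barrier" line, assuming the clocks are in sync.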
There is nothing wrong with MPI_Barrier().
As Jens mentioned, the reason why you are not seeing the output you expected is that stdout is buffered on each process. There is no guarantee that prints from multiple processes will be displayed on the calling process in order. (If stdout from each process were transferred to the main process for printing in real time, that would lead to lots of unnecessary communication!)
If you want to convince yourself that the barrier works, you could try writing to a file instead. Having multiple processes write to a single file may lead to extra complications, so you could have each proc write to its own file, and then, after the barrier, swap the files they write to. For example:
Proc-0 Proc-1
| |
f0.write(..) f1.write(...)
| |
x ~~ barrier ~~ x
| |
f1.write(..) f0.write(...)
| |
END END
Sample implementation:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
char filename[20];
int rank, size;
FILE *fp;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank < 2) { /* proc 0 and 1 only */
sprintf(filename, "file_%d.out", rank);
fp = fopen(filename, "w");
fprintf(fp, "P%d: before Barrier\n", rank);
fclose(fp);
}
MPI_Barrier(MPI_COMM_WORLD);
if (rank < 2) { /* proc 0 and 1 only */
sprintf(filename, "file_%d.out", (rank==0)?1:0 );
fp = fopen(filename, "a");
fprintf(fp, "P%d: after Barrier\n", rank);
fclose(fp);
}
MPI_Finalize();
return 0;
}
After running the code, you should get the following results:
[me@home]$ cat file_0.out
P0: before Barrier
P1: after Barrier
[me@home]$ cat file_1.out
P1: before Barrier
P0: after Barrier
For all files, the "after Barrier" statements will always appear later.
Output ordering is not guaranteed in MPI programs.
This is not related to MPI_Barrier at all.
Also, I would not spend too much time on worrying about output ordering with MPI programs.
The most elegant way to achieve this, if you really want to, is to let the processes send their messages to one rank, say, rank 0, and let rank 0 print the output in the order it received them or ordered by ranks.
Again, don't spend too much time trying to order the output from MPI programs. It is not practical and is of little use.
Adding to the previous answers here, your MPI_BARRIER works fine.
Though, if you just want to see it working, you can pause the execution briefly (e.g. with sleep(1)) for a moment to let the output catch up.
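For illustration only, a hedged sketch of that suggestion applied to the program above: flush and pause for a second after each print so the forwarded output has time to reach the terminal before anyone crosses the barrier (this does not guarantee ordering, it just usually wins the race on a single machine).
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("P%d\n", rank);
    fflush(stdout);
    sleep(1);                     /* give the forwarded output time to arrive */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("P%d again\n", rank);
    fflush(stdout);

    MPI_Finalize();
    return 0;
}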
