MPI serial main function - c

This is a pretty basic MPI question, but I can't wrap my head around it. I have a main function that calls another function that uses MPI. I want the main function to execute in serial, and the other function to execute in parallel. My code is like this:
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
void parallel_function(int arg1, int arg2)
{
//init MPI and do some stuff in parallel
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
//now do some parallel stuff
//....
//finalize to end MPI??
MPI_Finalize();
}
My code runs fine and gets the expected output, but the issue is that the main function is also being run in separate processes and so the serial code executes more than once. I don't know how it's running multiple times, because I haven't even called MPI_Init yet (if I printf in main before I call parallel_function, I see multiple printf's)
How can I stop my program running in parallel after I'm done?
Thanks for any responses!

Have a look at this answer.
Short story: MPI_Init and MPI_Finalize do not mark the beginning and end of parallel processing. MPI processes run in parallel in their entirety.

@suszterpatt is correct to state that "MPI processes run in parallel in their entirety". When you run a parallel program using, for example, mpirun or mpiexec, the launcher starts the number of processes you requested (with the -n flag) and each process begins execution at the start of main. So in your example code
int main (int argc, char *argv[])
{
//some serial code goes here
parallel_function(arg1, arg2);
//some more serial code goes here
}
every process will execute the //some serial code goes here and //some more serial code goes here parts (and of course they will all call parallel_function). There isn't one master process which calls parallel_function and then spawns other processes once MPI_Init is called.
Generally it is best to avoid doing what you are doing: MPI_Init should be one of the first function calls in your program (ideally it should be the first). In particular, take note of the following (from here):
The MPI standard does not say what a program can do before an MPI_INIT or after an MPI_FINALIZE. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output.
Not respecting this can lead to some nasty bugs.
It is better practice to rewrite your code to something like the following:
int main (int argc, char *argv[])
{
    int p, my_rank;
    // Initialise MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // Serial part: executed only by process with rank 0
    if (my_rank == 0)
    {
        // Some serial code goes here
    }
    // Parallel part: executed by all processes.
    // Serial part: executed only by process with rank 0
    if (my_rank == 0)
    {
        // Some more serial code goes here
    }
    // Finalize MPI
    MPI_Finalize();
    return 0;
}
Note: I am not a C programmer, so use the above code with care. Also, shouldn't main always return something, especially when defined as int main()?

Related

Is there any way in MPI_Programs to order the execution of processes?

Say I have 2 processes, P1 and P2 and both P1 and P2 are printing an array of 1000 data points. As we know, we can't guarantee anything about the order of output, it may be P1 prints the data first followed by P2 or vice versa, or it can be that both outputs are getting mixed. Now say I want to output the values of P1 first followed by P2. Is there any way by which I can guarantee that?
I am attaching a minimal reproducible example in which the output gets mixed:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main( int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int myrank, size; //size will take care of number of processes
MPI_Comm_rank(MPI_COMM_WORLD, &myrank) ;
MPI_Comm_size(MPI_COMM_WORLD, &size);
if(myrank==0)
{
int a[1000];
for(int i=0;i<1000;i++)
{
a[i]=i+1;
}
for(int i=0;i<1000;i++)
{
printf(" %d",a[i]);
}
}
if(myrank==1)
{
int a[1000];
for(int i=0;i<1000;i++)
{
a[i]=i+1;
}
for(int i=0;i<1000;i++)
{
printf(" %d",a[i]);
}
}
MPI_Finalize();
return 0;
}
The only way I can think of to output the data sequentially is to send the data from, say, P1 to P0 and then print it all from P0. But then we incur the extra cost of sending data from one process to another.
You have some additional options:
Pass a token. Each process blocks waiting for a message from the previous rank, prints its output, then sends a message to the next rank.
Let something else deal with ordering. Each process prefixes its rank to the output, then you can sort the output by rank.
Suppose this was a file. Each rank could compute where it should write, and then every rank can carry out a write to the file at the correct location in parallel (which is what MPI-IO applications do).
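The file-based option depends on each rank being able to work out its own byte offset without talking to the others. Here is a minimal sketch of that arithmetic, assuming every value is written as a fixed-width text field. The width of 8 characters and the helper names rank_offset and format_field are illustrative choices, not part of the answer above. In an MPI program the offset would typically be handed to a call such as MPI_File_write_at; that part is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: every value occupies exactly FIELD_WIDTH bytes in the file. */
#define FIELD_WIDTH 8

/* Byte offset at which `rank` starts writing, if every rank owns
   `values_per_rank` fixed-width fields and ranks are laid out in order. */
long rank_offset(int rank, int values_per_rank)
{
    assert(rank >= 0 && values_per_rank >= 0);
    return (long)rank * values_per_rank * FIELD_WIDTH;
}

/* Format one value, right-aligned, into a FIELD_WIDTH-character field.
   buf must hold FIELD_WIDTH + 1 bytes. Returns what snprintf reports,
   which equals FIELD_WIDTH whenever the value fits in the field. */
int format_field(char *buf, int value)
{
    return snprintf(buf, FIELD_WIDTH + 1, "%*d", FIELD_WIDTH, value);
}
```

With 1000 values per rank, rank 0 would start at byte 0 and rank 1 at byte 8000, so the ranks never overlap and no ordering between them is needed.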
Now say I want to output the values of P1 first followed by P2. Is
there any way by which I can guarantee that?
IMO, this is not how MPI, or parallelism in general, is meant to be used. Coordinating console output among processes greatly degrades the performance of the parallel version, which defeats one of the purposes of parallelism, i.e., reducing the overall execution time.
Most of the time one is better off making a single process responsible for printing the output to the console (typically the master process, i.e., the process with rank 0).
Citing @Gilles Gouaillardet:
The only safe option is to send all the data to a given rank, and then print the data from that rank.
You could try using MPI_Barrier to coordinate the processes so that the output is printed in the order you want; however (citing @Hristo Iliev):
Using barriers like that only works for local launches when (and if)
the processes share the same controlling terminal. Otherwise, it is
entirely to the discretion of the I/O redirection mechanism of the MPI
implementation.
If it is for debugging purposes, you can use a good MPI-aware debugger that lets you inspect the data of each process. Alternatively, you can limit the output to one process per run, so that you can check whether each process has the data it should have.
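One way to limit the output to a single process per run is to pick the printing rank on the command line. The sketch below is only an illustration of that idea: the helper names parse_debug_rank and should_print are hypothetical, and the choice of passing the rank as a program argument is an assumption, not something the answer prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse the rank that should print from a command-line string.
   Returns `fallback` if text is NULL, not a plain number, negative,
   or too large for an int. */
int parse_debug_rank(const char *text, int fallback)
{
    if (text == NULL)
        return fallback;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 0 || value > INT_MAX)
        return fallback;
    return (int)value;
}

/* True only for the one rank selected for this run. */
int should_print(int my_rank, int debug_rank)
{
    assert(my_rank >= 0);
    return my_rank == debug_rank;
}
```

In the program, after MPI_Comm_rank has filled in myrank, the printing loop would be guarded by should_print(myrank, parse_debug_rank(argc > 1 ? argv[1] : NULL, 0)), and the program would be run once per rank of interest.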

MPI_Init must be called by one thread only

The reference for MPI_Init states:
This routine must be called by one thread only. That thread is called the main thread and must be the thread that calls MPI_Finalize.
How to do this? I mean every example I have seen looks like this and in my code, I tried:
MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
bool mpiroot = (mpirank == 0);
if(mpiroot)
MPI_Init(&argc, &argv);
but I got:
Attempting to use an MPI routine before initializing MPICH
However, notice that this will work fine, if I leave it as in the example, I just had to re-check, because of my code's failure here.
I am thinking that because we call mpiexec -n 4 ./test, 4 processes will be spawned, thus all of them will call MPI_Init. I just printed stuff at the very first line of main() and they will be printed as many times as the number of processes.
MPI_Init must be called before almost any other MPI function in your program (the few exceptions include MPI_Initialized). It must be called by each process. Note that a process is not the same as a thread! If you go on to spawn threads from a process, those threads must not call MPI_Init again.
So your program should be something like this:
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int mpirank;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
    // No more calls to MPI_Init in here
    ...
    MPI_Finalize();
}

MPI_Finalize() does not end any processes

I'm messing around with Open MPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the threads keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
printf("Hello world from processor %s, rank %d"
" out of %d processors\n",
processor_name, world_rank, world_size);
// Finalize the MPI environment.
MPI_Finalize();
printf("This is after finalize");
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, the output from this program, if I run it with 6 processors for example, is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, Section 8.7 (Startup), on pages 359-361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
    resultfile = fopen("outfile","w");
    dump_results(resultfile);
    fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same problem with MPI_Init and MPI_Finalize. I thought that the code between these two calls ran in parallel and the code outside them ran serially. Finally I saw this answer and figured out how they actually work.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
    int numprocs, myid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }
    ParallelWork(); // Every MPI thread will run the parallel work
    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }
    MPI_Finalize();
    return 0;
}

Changing value of a variable with MPI

#include<stdio.h>
#include<mpi.h>
int a=1;
int *p=&a;
int main(int argc, char **argv)
{
MPI_Init(&argc,&argv);
int rank,size;
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
//printf("Address val: %u \n",p);
*p=*p+1;
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
printf("Value of a : %d\n",*p);
return 0;
}
Here, I am trying to execute the program with 3 processes, where each tries to increment the value of a by 1, so the value at the end of execution of all processes should be 4. Then why is the value printed as only 2 at the printf statement after MPI_Finalize()? And isn't it the case that parallel execution stops at MPI_Finalize(), so there should be only one process running after it? Then why do I get the print statement 3 times, one for each process, during execution?
It is a common misunderstanding to think that mpi_init starts up the requested number of processes (or whatever mechanism is used to implement MPI) and that mpi_finalize stops them. It's better to think of mpi_init starting the MPI system on top of a set of operating-system processes. The MPI standard is silent on what MPI actually runs on top of and how the underlying mechanism(s) is/are started. In practice a call to mpiexec (or mpirun) is likely to fire up a requested number of processes, all of which are alive when the program starts. It is also likely that the processes will continue to live after the call to mpi_finalize until the program finishes.
This means that prior to the call to mpi_init, and after the call to mpi_finalize it is likely that there is a number of o/s processes running, each of them executing the same program. This explains why you get the printf statement executed once for each of your processes.
As to why the value of a is set to 2 rather than to 4, well, essentially you are running n copies of the same program (where n is the number of processes) each of which adds 1 to its own version of a. A variable in the memory of one process has no relationship to a variable of the same name in the memory of another process. So each process sets a to 2.
To get any data from one process to another the processes need to engage in message-passing.
EDIT, in response to OP's comment
Just as a variable in the memory of one process has no relationship to a variable of the same name in the memory of another process, a pointer (which is a kind of variable) has no relationship to a pointer of the same name in the memory of another process. Do not be fooled: if the "same" pointer has the "same" address in multiple processes, those addresses are in different address spaces and are not the same; the pointers don't point to the same place.
An analogy: 1 High Street, Toytown is not the same address as 1 High Street, Legotown; there is a coincidence in names across address spaces.
To get any data (pointer or otherwise) from one process to another the processes need to engage in message-passing. You seem to be clinging to a notion that MPI processes share memory in some way. They don't, let go of that notion.
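The point about separate address spaces can be seen without MPI at all. The sketch below is an analogy rather than MPI code: it uses POSIX fork() to create a second operating-system process (MPI launchers are not required to work this way), lets that process increment its copy of the global a, and then checks what the original process sees. The function name parent_value_after_child_increment is made up for this illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int a = 1;  /* global, as in the question */

/* Fork one child that increments its own copy of `a`, wait for it,
   and return the parent's value of `a` afterwards. */
int parent_value_after_child_increment(void)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        a = a + 1;              /* changes only the child's copy */
        _exit(a == 2 ? 0 : 1);  /* report through the exit status what the child saw */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    /* the child really did see a == 2 ... */
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return a;                   /* ... but the parent's copy is untouched */
}
```

The child ends up with a equal to 2 while the parent still has a equal to 1, even though both use the same variable name and, typically, the same virtual address. The same holds for each of the processes in an MPI job.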
Since MPI is only giving you the option to communicate between separate processes, you have to do message passing. For your purpose there is something like MPI_Allreduce, which can sum data over the separate processes. Note that this adds the values, so in your case you want to sum the increment, and add the sum later to p:
int inc = 1;
MPI_Allreduce(MPI_IN_PLACE, &inc, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
*p += inc;
In your implementation there is no communication between the processes. Each process has its own int a variable, which it increments and prints to the screen. Making the variable global doesn't make it shared between processes, and all the pointer gimmicks show me that you don't know what you are doing. I would suggest learning a little more C and Operating Systems before you move on.
Anyway, you have to make the processes communicate. Here's what an example might look like:
#include <stdio.h>
#include <mpi.h>
// this program will count the number of processes in a *very* bad way
int main(int argc, char **argv)
{
    int partial = 1;
    int sum = 0;
    int my_id = 0;
    // let's just assume the process with id 0 is root
    int root_process = 0;
    // initialise MPI (the processes themselves are already running)
    MPI_Init(&argc, &argv);
    // every process learns its id
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    // all processes add their 'partial' to the 'sum' on the root process
    MPI_Reduce(&partial, &sum, 1, MPI_INT, MPI_SUM, root_process, MPI_COMM_WORLD);
    // de-init MPI
    MPI_Finalize();
    // the root process reports the summation result
    if (my_id == root_process)
    {
        printf("Sum total : %d\n", sum);
    }
    return 0;
}

MPI 2x printing

I'm studying a bit of MPI and decided to do a test by making a program that is built from separate object files, e.g. main.c -> main program, function.c -> some function.
Only function.c uses MPI. I compile as follows:
gcc -c main.c
to create main.o, and mpicc -c function.c to create function.o. Of course I created the file function.h too.
I link with mpicc -o program main.o function.o
Here is main.c
#include <stdio.h>
#include "function.h"
void main(int argc, char *argv[])
{
printf("Hello\n");
function();
printf("Bye\n");
}
Only function has the MPI code, but when I run the program with mpiexec -np 2 I get
Hello
Hello
----- function job here -----
Bye
Bye
But I wanted it to be
Hello
------ function job -----
Bye
What can I do?
Your whole program is run by both of the processes you requested with -np 2. A common way to prevent duplicate printouts, final results, etc., is to have one process do those things, by checking the process rank first. Like:
int id;
MPI_Comm_rank(MPI_COMM_WORLD, &id);
if (id == 0) {
    printf("only process %d does this one\n", id);
}
printf("hello from process %d\n", id); // all processes do this one
When starting out in MPI I found it helpful to print out those id numbers along with whatever partial results or data each process was dealing with. It helped me make more sense of what was happening.
Basically mpirun -np 2 starts 2 identical processes and you have to use MPI_Comm_rank function to check process rank.
Here is a quick snippet:
int main(int argc, char **argv)
{
    int myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        printf("Hello\n");
        function();
        MPI_Barrier(MPI_COMM_WORLD);
        printf("Done\n");
    } else {
        function();
        MPI_Barrier(MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
I generally prefer this method for printing data.
It involves barriers. So you must be careful while using it.
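// num_procs is the value returned by MPI_Comm_size, my_rank by MPI_Comm_rank
for (int i = 0; i < num_procs; i++) {
    if (i == my_rank) {
        printf("rank %d: ...\n", my_rank); // whatever this rank wants to print
        fflush(stdout);
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
If the set of processes printing the value does not include all processes, run the barrier on a communicator that contains only the relevant processes (for example, one created with MPI_Comm_split).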
Another method is for every process to write its output to a dedicated file. This way:
you don't need a barrier
you do not lose the printfs of any process
your output is explicit, so there is no clutter while debugging programs.
Code :
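char my_op_file_str[64];
snprintf(my_op_file_str, sizeof my_op_file_str, "output%d", myThreadID);
freopen(my_op_file_str, "w", stdout); // stdout now goes to this process's own file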
Now use printf's anywhere you may like.
