How to run this compiled Open MPI program (C)?

I am trying to run the example code at the URL below. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but got the following message: Must specify MP_PROCS= 8. Terminating.
My question is: how do you set MP_PROCS=8?
The group and communicator routine examples are here: https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

#define NPROCS 8

int main(int argc, char *argv[]) {
    int rank, new_rank, sendbuf, recvbuf, numtasks,
        ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
    MPI_Group orig_group, new_group;
    MPI_Comm new_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    if (numtasks != NPROCS) {
        printf("Must specify MP_PROCS= %d. Terminating.\n", NPROCS);
        MPI_Finalize();
        exit(0);
    }

    sendbuf = rank;

    /* Extract the original group handle */
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);

    /* Divide tasks into two distinct groups based upon rank */
    if (rank < NPROCS/2) {
        MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
    }
    else {
        MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
    }

    /* Create the new communicator and then perform collective communications */
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
    MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);

    MPI_Group_rank(new_group, &new_rank);
    printf("rank= %d newrank= %d recvbuf= %d\n", rank, new_rank, recvbuf);

    MPI_Finalize();
    return 0;
}

When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec, or -np instead of -n. Both are fine most of the time.
If you're just starting out, it's also a good idea to make sure you're using a recent version of MPI so you don't hit old bugs or odd execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 (available here), while Open MPI is at version 1.7.4 (available here). You can also usually get either of them via your friendly neighborhood package manager.
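If you go the package-manager route, the incantation might look like this on a Debian-based system (the package names below are examples and vary by distribution):

```shell
# Open MPI (Debian/Ubuntu package names):
sudo apt-get install openmpi-bin libopenmpi-dev
# or MPICH:
sudo apt-get install mpich libmpich-dev
```

Installing the -dev package gives you the mpicc wrapper as well as the headers and libraries.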

Related

Using printf with MPI leads to non-deterministic output

The following code has non-deterministic behaviour on my machine (already when using only two processes).
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == size - 1) {
        printf("I am the last rank\n");
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        MPI_Barrier(MPI_COMM_WORLD);
        printf("I am rank %d\n", rank);
    }

    MPI_Finalize();
    return 0;
}
Sometimes the output from the last rank appears first on the terminal, but sometimes it appears later, even though a barrier is used.
I assume the reason is that printf does internal buffering and that MPI (or rather mpirun/mpiexec) and printf do not really cooperate with each other. Is there an authoritative source to read up on this topic?
Output redirection is strongly platform-dependent and only tangentially mentioned in the MPI standard. There is absolutely no guarantee about the order in which output from different ranks is displayed. There isn't even a guarantee that you get the output at all: some implementations redirect only the output of rank 0, some redirect no output, and both cases are compliant with the MPI standard.
If you want strongly ordered output, you should do output from a single rank only.
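A minimal sketch of that single-rank pattern, assuming fixed-size lines: every rank formats its output locally, the lines are gathered on rank 0, and only rank 0 prints them, in rank order. (This needs an MPI runtime to execute, e.g. mpicc + mpirun.)

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { LINE_LEN = 64 };

int main(int argc, char *argv[]) {
    int rank, size;
    char line[LINE_LEN];
    char *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank formats its line locally instead of printing it. */
    snprintf(line, LINE_LEN, "I am rank %d", rank);

    /* Only the root needs a receive buffer. */
    if (rank == 0)
        all = malloc((size_t)size * LINE_LEN);

    /* Collect every rank's line on rank 0, ordered by rank. */
    MPI_Gather(line, LINE_LEN, MPI_CHAR, all, LINE_LEN, MPI_CHAR,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("%s\n", all + i * LINE_LEN);
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```

Because a single process does all the printing, the order is deterministic regardless of how the implementation redirects output.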

Have trouble using GDB to debug MPI code

I am trying to use GDB to see what's going on in low-level Open MPI calls. However, I couldn't get GDB to step into the library functions in Open MPI.
I first configured Open MPI with the flags "--enable-debug --enable-debug-symbols", so the library should be built with "-g". Then I compiled my test code with "mpicc /path/to/my/code -g -o /dest/code". The code, shown below, is simple:
#include "mpi.h"
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    int numtasks, rank;
    int buf = -1;           /* non-root value before the broadcast */
    const int root = 0;

    // sleep until GDB attaches
    int i = 0;
    char hostname[256];
    gethostname(hostname, sizeof(hostname));
    printf("PID %d on %s ready for attach\n", getpid(), hostname);
    fflush(stdout);
    while (0 == i)
        sleep(5);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == root) {
        buf = 777;
    }

    printf("[%d]: Before Bcast, buf is %d\n", rank, buf);
    MPI_Bcast(&buf, 1, MPI_INT, root, MPI_COMM_WORLD);
    printf("[%d]: After Bcast, buf is %d\n", rank, buf);

    MPI_Finalize();
    return 0;
}
This code first prints the PID of every process and then sleeps, waiting for attachment. I then use the GDB attach method on one of the processes. Setting a breakpoint and resetting the value of i breaks out of the waiting loop.
Then I use "s" to step through my code. It is supposed to step into low-level library functions like MPI_Init() and MPI_Bcast(). However, GDB just moves to the next line and never steps into any function, and "bt" doesn't print any backtrace information either. The whole program runs without error, though.
Can anyone spot a mistake I made somewhere in this process? Thanks in advance.

Comparing CPU utilization during MPI thread deadlock using mvapich2 vs. openmpi

I noticed the following when I have a deadlocked MPI program, e.g. wait.c:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char * argv[])
{
    int taskID = -1;
    int NTasks = -1;
    int a = 11;
    int b = 22;
    MPI_Status Stat;

    /* MPI Initializations */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
    MPI_Comm_size(MPI_COMM_WORLD, &NTasks);

    if(taskID == 0)
        MPI_Send(&a, 1, MPI_INT, 1, 66, MPI_COMM_WORLD);
    else //if(taskID == 1)
        MPI_Recv(&b, 1, MPI_INT, 0, 66, MPI_COMM_WORLD, &Stat);

    printf("Task %i : a: %i b: %i\n", taskID, a, b);
    MPI_Finalize();
    return 0;
}
When I compile wait.c with the mvapich2-2.1 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out) I notice (via top), that all 4 processors are chugging along at 100%.
When I compile wait.c with the openmpi-1.6 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out), I notice (via top), that 2 processors are chugging at 100% and 2 at 0%.
Presumably the 2 at 0% are the ones that completed communication.
QUESTION : Why is there a difference in CPU usage between openmpi and mvapich2? Is this the expected behavior? When the CPU usage is 100%, is that from constantly checking to see if a message is being sent?
Both implementations busy-wait in MPI_Recv() in order to minimize latency. That explains why ranks 2 and 3 are at 100% in both MPI implementations.
Now, clearly ranks 0 and 1 progress to the MPI_Finalize() call, and this is where the two implementations differ: mvapich2 busy-waits there while Open MPI does not.
To answer your question: yes, the processes are at 100% while checking whether a message has been received, and that is expected behaviour.
If you are not on InfiniBand, you can observe this by attaching strace to one of the processes: you should see a number of poll() invocations there.
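For example (a sketch; 12345 stands for the PID of one of the spinning ranks, as reported by top):

```shell
# Attach strace to one busy-waiting MPI process and peek at its syscalls.
# A busy-waiting rank typically shows a tight loop of poll() calls.
strace -p 12345 2>&1 | head -n 20
```

Detach with Ctrl-C; strace leaves the traced process running.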

MPI_Finalize() does not end any processes

I'm messing around with Open MPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the processes keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
    printf("This is after finalize...\n");
    return 0;
}
Notice the last printf()... That should only be printed once, since the parallel part has been finalized, right?!
However, the output of this program when I run it with, for example, 6 processes is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, Section 8.7 (Startup), pages 359-361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
    resultfile = fopen("outfile", "w");
    dump_results(resultfile);
    fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same problem with the MPI_Init and MPI_Finalize methods. I thought the code between these calls ran in parallel and the code outside ran serially. Finally I saw this answer and figured out how it actually works.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork(); // Every MPI thread will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();
    return 0;
}

MPI parallel Programming

I have three user-defined functions in a Monte Carlo simulator program. In main() they are called with the appropriate parameters.
It is a serial program.
How do I convert it into a parallel program?
The steps I have taken so far to turn the serial program into an MPI parallel program are:
#include <conio.h>
#include <stdio.h>
#include "mpi.h"

//Global Varibles Declared
#define a=4;
#define b=2;
#define c=4;
#define d=6;

function1(Parameter4, Parameter))
{
    // body of function
}

function2( parameter 1, parameter2)
{
    //body of function
}

int main(int argc, char *argv[])
{
    // Local Variables defined
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    function1(a, b);
    function2(c, d);

    MPI_Finalize();
}
Now my questions are:
Where do I specify the number of processors (e.g. running it with 2, 4, 6, or 8 processors)?
Where do the Send and Recv methods go?
How do I see graphs of the output for different numbers of processors?
Could anyone help me please, as I am new to this language and don't know a lot about it.
MPI is a communications protocol. We can't help you without knowing what platform/library you are working with. If you know what library you are using, odds are good that there is a sample on the web somewhere showing how to implement Monte Carlo simulations with it.
First of all, your sample code is not valid C code. The #define lines should look like:
#define a 4
The number of processors is typically specified when running the program, which is usually done via
mpiexec -np PROCS MPIPROG
or the like, where PROCS is the number of MPI tasks to start and MPIPROG is the name of the compiled MPI executable. It is also possible to spawn MPI tasks from within MPI, but this doesn't work everywhere, so I would not recommend it. The advantage of specifying the number of tasks at runtime is that you can choose how many tasks to use depending on the platform you are working on.
Send and Recv can be used anywhere in the code, after MPI_Init has been called, and before MPI_Finalize has been called. As an example, to send an integer from task 0 to task 1, you would use something like
int number;
if (rank == 0) {
    /* compute the number on proc 0 */
    number = some_complex_function();
    /* send the number to proc 1 */
    MPI_Send(&number, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* receive the number from proc 0 */
    MPI_Recv(&number, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
// now both procs can do something with the number
// now both procs can do something with the number
Note that in this case task 1 has to wait until the number is received from task 0, so in a real application you might want to give task 1 some work to do while task 0 computes some_complex_function().
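One way to arrange that overlap is with a nonblocking receive. A sketch under the same assumptions as the fragment above (rank and some_complex_function come from that fragment; do_other_work is a hypothetical placeholder for task 1's independent work):

```c
int number;
MPI_Request req;

if (rank == 0) {
    /* compute and send, exactly as before */
    number = some_complex_function();
    MPI_Send(&number, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* post the receive, then keep working while the message is in flight */
    MPI_Irecv(&number, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &req);
    do_other_work();                    /* overlaps with the transfer */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* block only once the value is needed */
}
```

The buffer passed to MPI_Irecv must not be read until MPI_Wait (or a successful MPI_Test) has completed the request.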
