I noticed that when I have a deadlocked MPI program, e.g. wait.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char * argv[])
{
    int taskID = -1;
    int NTasks = -1;
    int a = 11;
    int b = 22;
    MPI_Status Stat;

    /* MPI Initializations */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
    MPI_Comm_size(MPI_COMM_WORLD, &NTasks);

    if(taskID == 0)
        MPI_Send(&a, 1, MPI_INT, 1, 66, MPI_COMM_WORLD);
    else //if(taskID == 1)
        MPI_Recv(&b, 1, MPI_INT, 0, 66, MPI_COMM_WORLD, &Stat);

    printf("Task %i : a: %i b: %i\n", taskID, a, b);
    MPI_Finalize();
    return 0;
}
When I compile wait.c with the mvapich2-2.1 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out) I notice (via top), that all 4 processors are chugging along at 100%.
When I compile wait.c with the openmpi-1.6 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out), I notice (via top), that 2 processors are chugging at 100% and 2 at 0%.
Presumably the 2 at 0% are the ones that completed communication.
QUESTION : Why is there a difference in CPU usage between openmpi and mvapich2? Is this the expected behavior? When the CPU usage is 100%, is that from constantly checking to see if a message is being sent?
Both implementations busy-wait on MPI_Recv() in order to minimize latencies. This explains why ranks 2 and 3 are at 100% with either of the two MPI implementations.
Now, clearly, ranks 0 and 1 progress to the MPI_Finalize() call, and this is where the two implementations differ: mvapich2 busy-waits while openmpi does not.
To answer your question: yes, they are at 100% while checking whether a message has been received and it is expected behaviour.
If you are not on InfiniBand, you can observe this by attaching strace to one of the processes: you should see a number of poll() invocations there.
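For example, on a Linux box (12345 below is just a placeholder for the PID of one of the spinning ranks, which you can get from top or pgrep):
strace -p 12345
# for a busy-waiting rank you should see repeated poll() calls scroll by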
I have a problem with the following code:
Master:
#include <iostream>
using namespace std;
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define PB1 1
#define PB2 1
int main (int argc, char *argv[])
{
    int np[2] = { 2, 1 }, errcodes[2];
    MPI_Comm parentcomm, intercomm;
    char *cmds[2] = { "./slave", "./slave" };
    MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

    MPI_Init(NULL, NULL);
#if PB1
    for(int i = 0; i < 2; i++)
    {
        MPI_Info_create(&infos[i]);
        char hostname[] = "localhost";
        MPI_Info_set(infos[i], "host", hostname);
    }
#endif
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
    printf("c Creation of the workers finished\n");
#if PB2
    sleep(1);
#endif
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes);
    printf("c Creation of the workers finished\n");
    MPI_Finalize();
    return 0;
}
Slave:
#include "mpi.h"
#include <stdio.h>
using namespace std;
int main( int argc, char *argv[])
{
    int rank;
    MPI_Init(0, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank = %d\n", rank);
    MPI_Finalize();
    return 0;
}
I do not know why, when I run mpirun -np 1 ./master, my program stops with the following message when I set PB1 and PB2 to 1 (it works well when I set one of them to 0):
There are not enough slots available in the system to satisfy the 2 slots that were requested by the application:
./slave
Either request fewer slots for your application, or make more slots available for use.
For instance, when I set PB2 to 0, the program works well. Thus, I suppose that it is because MPI_Finalize does not finish its job ...
I googled, but I did not find any answer to my problem. I tried various things, such as calling MPI_Comm_disconnect, adding a barrier, ... but nothing worked.
I work on Ubuntu (15.10) and use OpenMPI version 1.10.2.
The MPI_Finalize on the first set of slaves will not finish until MPI_Finalize is called on the master, because MPI_Finalize is collective over all connected processes. You can work around that by manually disconnecting the first batch of slaves from the intercommunicator before calling MPI_Finalize. That way, the slaves actually finish and exit, freeing the "slots" for the new batch of slaves. Unfortunately, I don't see a standardized way to really ensure the slaves are finished in the sense that their slots are freed, because that would be implementation-defined. The fact that OpenMPI freezes in MPI_Comm_spawn_multiple instead of returning an error is unfortunate, and one might consider that a bug. Anyway, here is a draft of what you could do:
Within the master, each time it is done with its slaves:
MPI_Barrier(intercomm); // Make sure master and slaves are somewhat synchronized
MPI_Comm_disconnect(&intercomm);
sleep(1); // This is the ugly unreliable way to give the slaves some time to shut down
The slave:
MPI_Comm parent;
MPI_Comm_get_parent(&parent); // you should have that already
MPI_Comm_disconnect(&parent);
MPI_Finalize();
However, you still need to make sure OpenMPI knows how many slots should be reserved for the whole application (universe_size). You can do that for example with a hostfile:
localhost slots=4
And then mpirun -np 1 ./master.
Now this is not pretty and I would argue that your approach to dynamically spawning MPI workers isn't really what MPI is meant for. It may be supported by the standard, but that doesn't help you if implementations are struggling. However, there is not enough information on how you intend to communicate with the external processes to provide a cleaner, more idiomatic solution.
One last remark: do check the return codes of MPI functions, especially MPI_Comm_spawn_multiple.
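For instance, a minimal sketch reusing the names from your master program (and assuming errcodes has been sized to hold one entry per spawned process, i.e. 2 + 1 = 3 here) could look like:
int rc, nspawned = 3;                     /* 2 + 1 processes requested via np[] */
int errcodes[3];                          /* one entry per spawned process */
rc = MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, infos, 0,
                             MPI_COMM_WORLD, &intercomm, errcodes);
if (rc != MPI_SUCCESS) {
    fprintf(stderr, "MPI_Comm_spawn_multiple returned %d\n", rc);
    MPI_Abort(MPI_COMM_WORLD, rc);
}
for (int i = 0; i < nspawned; i++) {
    if (errcodes[i] != MPI_SUCCESS)
        fprintf(stderr, "spawned process %d reported error %d\n", i, errcodes[i]);
}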
I'm messing around with Open MPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the threads keeps running.
I have followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();

    printf("This is after finalize");
    return 0;
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, the output from this program if I, for example, run it with 6 processors is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, section 8.7 (Startup), on pages 359-361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
    resultfile = fopen("outfile", "w");
    dump_results(resultfile);
    fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
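A complete, compilable variant of that pattern might look like the sketch below (the file dumping is only a stand-in for whatever rank 0 should do once MPI is shut down):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    /* ... parallel computation ... */
    MPI_Finalize();

    /* Only rank 0 is guaranteed to return from MPI_Finalize,
       so only rank 0 performs the post-MPI work. */
    if (myrank == 0) {
        FILE *resultfile = fopen("outfile", "w");
        if (resultfile != NULL) {
            fprintf(resultfile, "results would go here\n");
            fclose(resultfile);
        }
    }
    return 0;
}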
When I started with MPI, I had the same problem with the MPI_Init and MPI_Finalize methods. I thought the code between these functions would run in parallel and the code outside would run serially. Finally, I saw this answer and figured out how they actually work.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) { // Do the serial part on a single MPI thread
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork(); // Every MPI thread will run the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI thread
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();
    return 0;
}
I am trying to run the example code at the following url. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but got the following message: Must specify MP_PROCS= 8. Terminating.
My question is how do you set MP_PROCS=8 ?
The Group and Communicator Routine examples are here: https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#define NPROCS 8
main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec or -np instead of -n. Both are fine most of the time.
If you're just starting out, it would also be a good idea to make sure you're using a recent version of MPI so you don't get old bugs or weird execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 available here while Open MPI has version 1.7.4 available here. You can also usually get either of them via your friendly neighborhood package manager.
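If you are unsure which implementation and version is currently on your path, something along these lines usually tells you (exact flags differ slightly between the two implementations):
mpiexec --version   # prints the implementation name and version
mpicc -show         # MPICH: shows the underlying compiler command line
mpicc --showme      # Open MPI: same idea, different flag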
I'm trying to do a simple test on MPI's RMA operations using MPI_Win_lock and MPI_Win_unlock. The program just lets process 0 update the integer value in process 1, which then displays it.
The below program runs correctly (at least the result seems correct to me):
#include "mpi.h"
#include "stdio.h"
#define root 0
int main(int argc, char *argv[])
{
    int myrank, nprocs;
    int send, recv, err;
    MPI_Win nwin;
    int *st;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_Alloc_mem(1*sizeof(int), MPI_INFO_NULL, &st);
    st[0] = 0;

    if (myrank != root) {
        MPI_Win_create(st, 1*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
    }
    else {
        MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &nwin);
    }

    if (myrank == root) {
        st[0] = 1;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, nwin);
        MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin);
        MPI_Win_unlock(1, nwin);
        MPI_Win_free(&nwin);
    }
    else { // rank 1
        MPI_Win_free(&nwin);
        printf("Rank %d, st = %d\n", myrank, st[0]);
    }

    MPI_Free_mem(st);
    MPI_Finalize();
    return 0;
}
The output I got is Rank 1, st = 1. But curiously, if I switch the lines in the else block for rank 1 to
    else { // rank 1
        printf("Rank %d, st = %d\n", myrank, st[0]);
        MPI_Win_free(&nwin);
    }
The output is Rank 1, st = 0.
I cannot figure out the reason behind it. The reason I ask why I need to put MPI_Win_free after reading the data is that originally I need to put all of this in a while loop and let rank 0 determine when to stop the loop. When the condition is satisfied, I try to let rank 0 update the flag (st) in rank 1. I tried to put MPI_Win_free outside the while loop so that the window would only be freed after the loop. Now it seems that I cannot do this and need to create and free the window every time in the loop?
I'll be honest, MPI RMA is not my speciality, but I'll give this a shot:
The problem is that you're running into a race condition. When you do the MPI_PUT operation, it sends the data from rank 0 to rank 1 to be put into the buffer at some point in the future. You don't have any control over that from rank 0's perspective.
On rank 1's side, you're not doing anything to complete the operation. I know that RMA (or one-sided) operations sound like they shouldn't require any intervention on the target side, but they do require a bit. When you use one-sided operations, you have to have something on the receiving side that also synchronizes the data. In this case, you're trying to use MPI put/get operations in combination with non-MPI load/store operations. This is erroneous and results in the race condition you're seeing. When you switch the MPI_WIN_FREE to be first, you complete all of the outstanding operations, so your data is correct.
You can find out lots more about passive target synchronization (which is what you're doing here) with this question: MPI with C: Passive RMA synchronization.
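For what it's worth, here is one rough sketch (reusing the variables from your program, and not the only correct pattern) that keeps the window alive: rank 0 completes the put, both ranks synchronize, and rank 1 reads its local window memory inside its own lock epoch before the collective MPI_Win_free:
/* rank 0 (origin): write the flag into rank 1's window */
if (myrank == root) {
    st[0] = 1;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, nwin);
    MPI_Put(st, 1, MPI_INT, 1, 0, 1, MPI_INT, nwin);
    MPI_Win_unlock(1, nwin);            /* the put is complete at the target after this */
}
MPI_Barrier(MPI_COMM_WORLD);            /* make sure the put happened before rank 1 reads */
if (myrank != root) {
    /* lock our own window before reading it locally under passive-target synchronization */
    MPI_Win_lock(MPI_LOCK_SHARED, myrank, 0, nwin);
    printf("Rank %d, st = %d\n", myrank, st[0]);
    MPI_Win_unlock(myrank, nwin);
}
MPI_Win_free(&nwin);                    /* collective: every rank calls it once, after the loop */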
I have three user-defined functions in a Monte Carlo simulator program. In main() they are being called using the appropriate parameters.
It is a serial program.
How do I convert it into the Parallel program?
The steps I have taken so far to turn the serial program into an MPI parallel program are:
#include <conio.h>
#include <stdio.h>
#include "mpi.h"
//Global Varibles Declared
#define a=4;
#define b=2;
#define c=4;
#define d=6;
function1(Parameter4, Parameter))
{
// body of function
}
function2( parameter 1, parameter2)
{
//body of function
}
int main(int argc, char *argv[])
{
// Local Variables defined
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
function1(a, b);
function2(c, d);
MPI_Finalize ();
}
Now my questions are:
1. Where do I specify the number of processors (like running it with 2, 4, 6, or 8 processors)?
2. Where do I put the Send and Recv methods?
3. How do I see graphs of the output for different numbers of processors?
Could anyone try to help me please, as I am new to this and don't know a lot about it.
MPI is a communications protocol. We can't help you without knowing what platform/library you are working with. If you know what library you are working with, odds are good that there is a sample on the web somewhere showing how to implement Monte Carlo simulations with it.
First of all, your sample code is not valid C code. The #define lines should look like:
#define a 4
The number of processors is typically specified when running the program, which is usually done via
mpiexec -np PROCS MPIPROG
or the like, where PROCS is the number of MPI tasks to start and MPIPROG is the name of the compiled MPI executable. There is also a possibility to spawn MPI tasks from within MPI, but this doesn't work everywhere, so I would not recommend it. The advantage of specifying the number of tasks at runtime is that you can choose how many tasks to use depending on the platform you are working on.
Send and Recv can be used anywhere in the code, after MPI_Init has been called, and before MPI_Finalize has been called. As an example, to send an integer from task 0 to task 1, you would use something like
int number;
if (rank == 0) {
    /* compute the number on proc 0 */
    number = some_complex_function();
    /* send the number to proc 1 */
    MPI_Send(&number, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* receive the number from proc 0 */
    MPI_Recv(&number, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
// now both procs can do something with the number
Note that in this case, Task 1 will have to wait until the number is received from task 0, so in a real application you might want to give task 1 some work to do while task 0 computes "some_complex_function".
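If you do want to overlap that waiting time with useful work, one rough sketch uses a non-blocking receive; do_other_work() below is just a hypothetical placeholder for that work:
int number;
MPI_Request req;
if (rank == 1) {
    /* start the receive, but do not block on it yet */
    MPI_Irecv(&number, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &req);
    do_other_work();                      /* work that does not need `number` */
    MPI_Wait(&req, MPI_STATUS_IGNORE);    /* after this, `number` is valid */
}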