I have Manjaro Linux 17.1.10 (kernel 4.17.0-2) on a ThinkPad T420 (Core i5 2450M, upgraded to 8 GB RAM) and I'm trying to run some C programs using OpenMPI (version 3.1.0), but I'm having trouble running them.
I am able to compile them without issues, both from the terminal using mpicc and with Eclipse's "build" option, but when I try to run them, either from the terminal or from Eclipse (configured as a parallel application in the launch options), I get errors no matter which code I run.
I'm trying to run the MPI hello world C project that comes with Eclipse Parallel:
/*
============================================================================
Name : test.c
Author : MGMX
Version : 1
Copyright : GNU 3.0
Description : Hello MPI World in C
============================================================================
*/
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char* argv[]){
int my_rank; /* rank of process */
int p; /* number of processes */
int source; /* rank of sender */
int dest; /* rank of receiver */
int tag=0; /* tag for messages */
char message[100]; /* storage for message */
MPI_Status status ; /* return status for receive */
/* start up MPI */
MPI_Init(&argc, &argv);
/* find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &p);
if (my_rank !=0){
/* create message */
sprintf(message, "Hello MPI World from process %d!", my_rank);
dest = 0;
/* use strlen+1 so that '\0' get transmitted */
MPI_Send(message, strlen(message)+1, MPI_CHAR,
dest, tag, MPI_COMM_WORLD);
}
else{
printf("Hello MPI World From process 0: Num processes: %d\n",p);
for (source = 1; source < p; source++) {
MPI_Recv(message, 100, MPI_CHAR, source, tag,
MPI_COMM_WORLD, &status);
printf("%s\n",message);
}
}
/* shut down MPI */
MPI_Finalize();
return 0;
}
If I run the program with a plain ./test, without parameters, I get this output:
[mgmx#ThinkPad Debug]$ ./test
Hello MPI World From process 0: Num processes: 1
but if I use mpirun I get different results depending on the number of processes (-np #) I select, and all of them throw errors:
[mgmx#ThinkPad Debug]$ mpirun -np 0 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 1 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[49948,1],0]
Exit code: 1
--------------------------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 2 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 3 test
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
test
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 4 test
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
test
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
I am running everything locally.
Here is the output of --version for both mpicc and mpirun:
[mgmx#ThinkPad Debug]$ mpicc --version
gcc (GCC) 8.1.1 20180531
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
[mgmx#ThinkPad Debug]$ mpirun --version
mpirun (Open MPI) 3.1.0
Report bugs to http://www.open-mpi.org/community/help/
And Eclipse's "about" window
Also, I have installed OpenMPI in various ways: from the Manjaro repository using pacman, using pamac (a pacman/yaourt GUI), and the git version from the AUR using both yaourt and pamac.
Related
I am learning the MPI interface for C and just installed MPI on my system (macOS Mojave 10.14.6). I made it with the help of this tutorial.
Everything was fine, and now I wanted to start my first simple code.
I tried to understand the error and searched for a solution, but could not find one on my own. I am using CLion as my IDE.
main.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv)
{
int rank, size;
MPI_Init( &argc, &argv );
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("Hi from process %d of %d\n", rank, size);
MPI_Finalize();
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.15)
project(uebung02 C)
set(CMAKE_C_STANDARD 99)
add_executable(uebung02 main.c)
set(CMAKE_C_COMPILER /opt/openmpi/bin/mpicc)
set(CMAKE_CXX_COMPILER /opt/openmpi/bin/mpic++)
Error output
/Users/admin/Documents/HU/VSuA/uebung02/cmake-build-debug/uebung02
--------------------------------------------------------------------------
PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

  Temp dir: /var/folders/9j/54dxfbf1451dk82nn7y99d8r0000gn/T/openmpi-sessions-501#admins-MacBook-Pro_0/15653

Try setting your TMPDIR environmental variable to point to
something shorter in length
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[admins-MacBook-Pro.local:44625] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
[admins-MacBook-Pro.local:44625] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
[admins-MacBook-Pro.local:44625] *** An error occurred in MPI_Init
[admins-MacBook-Pro.local:44625] *** on a NULL communicator
[admins-MacBook-Pro.local:44625] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[admins-MacBook-Pro.local:44625] ***    and potentially your MPI job)
[admins-MacBook-Pro.local:44625] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Process finished with exit code 1
I noticed that when I have a deadlocked MPI program, e.g. wait.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char * argv[])
{
int taskID = -1;
int NTasks = -1;
int a = 11;
int b = 22;
MPI_Status Stat;
/* MPI Initializations */
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
MPI_Comm_size(MPI_COMM_WORLD, &NTasks);
if(taskID == 0)
MPI_Send(&a, 1, MPI_INT, 1, 66, MPI_COMM_WORLD);
else //if(taskID == 1)
MPI_Recv(&b, 1, MPI_INT, 0, 66, MPI_COMM_WORLD, &Stat);
printf("Task %i : a: %i b: %i\n", taskID, a, b);
MPI_Finalize();
return 0;
}
When I compile wait.c with the mvapich2-2.1 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out), I notice (via top) that all 4 processors are chugging along at 100%.
When I compile wait.c with the openmpi-1.6 library (which itself was compiled using gcc-4.9.2) and run it (e.g. mpirun -np 4 ./a.out), I notice (via top) that 2 processors are chugging along at 100% and 2 at 0%.
Presumably the 2 at 0% are the ones that completed communication.
QUESTION : Why is there a difference in CPU usage between openmpi and mvapich2? Is this the expected behavior? When the CPU usage is 100%, is that from constantly checking to see if a message is being sent?
Both implementations busy-wait on MPI_Recv() in order to minimize latencies. This explains why ranks 2 and 3 are at 100% with either of the two MPI implementations.
Now, clearly ranks 0 and 1 progress to the MPI_Finalize() call, and this is where the two implementations differ: mvapich2 busy-waits while openmpi does not.
To answer your question: yes, they are at 100% while checking whether a message has been received, and that is expected behaviour.
If you are not on InfiniBand, you can observe this by attaching strace to one of the processes: you should see a number of poll() invocations there.
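If the busy-waiting itself is a problem, a common workaround, sketched below using only the standard MPI_Iprobe and POSIX nanosleep calls (the helper name relaxed_recv is just for illustration), is to probe for the message yourself and sleep between checks, trading a little latency for an idle core:
#include <mpi.h>
#include <time.h>

/* Relaxed receive: poll with MPI_Iprobe and sleep between checks instead of
 * letting MPI_Recv() spin at 100% CPU while waiting for a matching message. */
static void relaxed_recv(void *buf, int count, MPI_Datatype type, int source,
                         int tag, MPI_Comm comm, MPI_Status *status)
{
    int flag = 0;
    struct timespec pause = {0, 1000000L};    /* 1 ms between probes */

    while (!flag) {
        MPI_Iprobe(source, tag, comm, &flag, status);
        if (!flag)
            nanosleep(&pause, NULL);          /* give the core back to the OS */
    }
    MPI_Recv(buf, count, type, source, tag, comm, status);
}
Open MPI also has an mpi_yield_when_idle MCA parameter that relaxes the spinning somewhat, but the explicit probe loop is the portable way to keep a waiting rank off the CPU.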
I'm messing around with OpenMPI, and I have a weird bug.
It seems that even after MPI_Finalize(), each of the threads keeps running.
I followed a guide for a simple Hello World program, and it looks like this:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Get the name of the processor
char processor_name[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processor_name, &name_len);
// Print off a hello world message
printf("Hello world from processor %s, rank %d"
" out of %d processors\n",
processor_name, world_rank, world_size);
// Finalize the MPI environment.
MPI_Finalize();
printf("This is after finalize");
}
Notice the last printf()... This should only be printed once, since the parallel part is finalized, right?!
However, if I run this program with, for example, 6 processes, the output is:
mpirun -np 6 ./hello_world
Hello world from processor ubuntu, rank 2 out of 6 processors
Hello world from processor ubuntu, rank 1 out of 6 processors
Hello world from processor ubuntu, rank 3 out of 6 processors
Hello world from processor ubuntu, rank 0 out of 6 processors
Hello world from processor ubuntu, rank 4 out of 6 processors
Hello world from processor ubuntu, rank 5 out of 6 processors
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
This is after finalize...
Am I misunderstanding how MPI works? Should each thread/process not be stopped by the finalize?
This is just undefined behavior.
The number of processes running after this routine is called is
undefined; it is best not to perform much more than a return rc after
calling MPI_Finalize.
http://www.mpich.org/static/docs/v3.1/www3/MPI_Finalize.html
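As a minimal illustration of that advice (this sketch reuses the world_rank name from the question and is not part of the MPICH documentation quoted above): keep post-finalize work small and tie it to a single rank, so the final message is printed exactly once no matter how many processes survive MPI_Finalize().
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    printf("Hello from rank %d (before finalize)\n", world_rank);

    MPI_Finalize();

    /* What other ranks do after MPI_Finalize() is undefined, so do as little
     * as possible here and restrict it to one rank. */
    if (world_rank == 0)
        printf("This is after finalize\n");

    return 0;
}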
The MPI standard only requires that rank 0 return from MPI_FINALIZE. I won't copy the entire text here because it's rather lengthy, but you can find it in version 3.0 of the standard (the latest for a few more days) in Chapter 8, section 8.7 (Startup), on pages 359-361. Here are the most relevant parts:
Although it is not required that all processes return from MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD return, so that users can know that the MPI portion of the computation is over. In addition, in a POSIX environment, users may desire to supply an exit code for each process that returns from MPI_FINALIZE.
There's even an example that's trying to do exactly what you said:
Example 8.10 The following illustrates the use of requiring that at least one process return and that it be known that process 0 is one of the processes that return. One wants code like the following to work no matter how many processes return.
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
...
MPI_Finalize();
if (myrank == 0) {
resultfile = fopen("outfile","w");
dump_results(resultfile);
fclose(resultfile);
}
exit(0);
The MPI standard doesn't say anything else about the behavior of an application after calling MPI_FINALIZE. All this function is required to do is clean up internal MPI state, complete communication operations, etc. While it's certainly possible (and allowed) for MPI to kill the other ranks of the application after a call to MPI_FINALIZE, in practice, that is almost never the way that it is done. There's probably a counter example, but I'm not aware of it.
When I started with MPI, I had the same problem with the MPI_Init and MPI_Finalize methods. I thought the code between these functions runs in parallel and the code outside runs serially. Finally I saw this answer and figured out how they actually work.
J Teller's answer:
https://stackoverflow.com/a/2290951/893863
#include <mpi.h>
#include <stdio.h>

/* PreParallelWork(), ParallelWork() and PostParallelWork() stand in for the
   application's own serial and parallel phases. */
int main(int argc, char *argv[]) {
int numprocs, myid;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);
if (myid == 0) { // Do the serial part on a single MPI thread
printf("Performing serial computation on cpu %d\n", myid);
PreParallelWork();
}
ParallelWork(); // Every MPI thread will run the parallel work
if (myid == 0) { // Do the final serial part on a single MPI thread
printf("Performing the final serial computation on cpu %d\n", myid);
PostParallelWork();
}
MPI_Finalize();
return 0;
}
I am trying to run the example code at the following URL. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but I got the following message: "Must specify MP_PROCS= 8. Terminating."
My question is: how do you set MP_PROCS=8?
The Group and Communication Routine examples are here: https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#define NPROCS 8
main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec or -np instead of -n. Both are fine most of the time.
If you're just starting out, it would also be a good idea to make sure you're using a recent version of MPI so you don't get old bugs or weird execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 available here while Open MPI has version 1.7.4 available here. You can also usually get either of them via your friendly neighborhood package manager.
I'm trying to gracefully exit my program if RdInput() returns an error.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
#define Abort(x) MPI_Abort(MPI_COMM_WORLD, x)
#define Bcast(send_data, count, type) MPI_Bcast(send_data, count, type, MASTER, GROUP) //root --> MASTER
#define Finalize() MPI_Finalize()
int main(int argc, char **argv){
//Code
if( rank == MASTER ) {
time (&start);
printf("Initialized at %s\n", ctime (&start) );
//Read file
error = RdInput();
}
Bcast(&error, 1, INT); Wait();
if( error = 1 ) MPI_Abort(1);
//Code
Finalize();
}
Program output:
mpirun -np 2 code.x
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Initialized at Wed May 30 11:34:46 2012
Error [RdInput]: The file "input.mga" is not available!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7369 on
node einstein exiting improperly. There are two reasons this could occur:
//More error message.
What can I do to gracefully exit an MPI program without printing this huge error message?
If you have this logic in your code:
Bcast(&error, 1, INT);
if( error = 1 ) MPI_Abort(1);
then you're just about done (although you don't need any kind of wait after a broadcast). The trick, as you've discovered, is that MPI_Abort() does not do "graceful"; it basically is there to shut things down in whatever way possible when something's gone horribly wrong.
In this case, since now everyone agrees on the error code after the broadcast, just do a graceful end of your program:
MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
if (error != 0) {
if (rank == 0) {
fprintf(stderr, "Error: Program terminated with error code %d\n", error);
}
MPI_Finalize();
exit(error);
}
It's an error to call MPI_Finalize() and keep on going with more MPI stuff, but that's not what you're doing here, so you're fine.
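For completeness, here is the whole pattern as a self-contained sketch; RdInput() is stubbed out below (the original isn't shown) and always reports a failure so the error path is exercised:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER 0

/* Stub standing in for the asker's RdInput(); nonzero means "file not available". */
static int RdInput(void) { return 1; }

int main(int argc, char **argv)
{
    int rank, error = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == MASTER)
        error = RdInput();                 /* only the master reads the input file */

    /* Every rank learns the master's result... */
    MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);

    /* ...so every rank can shut down cleanly, with no MPI_Abort() noise. */
    if (error != 0) {
        if (rank == MASTER)
            fprintf(stderr, "Error: program terminated with error code %d\n", error);
        MPI_Finalize();
        exit(error);
    }

    /* ... rest of the program ... */
    MPI_Finalize();
    return 0;
}
Run it with something like mpirun -np 2 ./a.out and it should print the error once and exit without the long mpirun abort message.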