MPI cannot create parallel process, fails at MPI_INIT - C

I am learning the MPI interface for C and just installed MPI on my system (macOS Mojave 10.14.6), following this tutorial.
Everything went fine, and now I wanted to run my first simple program.
I tried to understand the error and searched for a solution, but could not find one on my own. I am using CLion as my IDE.
main.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    printf("Hi from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.15)
project(uebung02 C)
set(CMAKE_C_STANDARD 99)
add_executable(uebung02 main.c)
set(CMAKE_C_COMPILER /opt/openmpi/bin/mpicc)
set(CMAKE_CXX_COMPILER /opt/openmpi/bin/mpic++)
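As an aside, set(CMAKE_C_COMPILER ...) placed after project() comes too late to take effect, because the compiler has already been chosen by then. A sketch of the more common setup, assuming CMake's bundled FindMPI module can locate the /opt/openmpi install (you may need CMAKE_PREFIX_PATH or MPI_HOME pointing there), would be:

cmake_minimum_required(VERSION 3.15)
project(uebung02 C)
set(CMAKE_C_STANDARD 99)

# Let CMake locate the MPI headers and libraries instead of
# overriding the compiler with the mpicc wrapper after the fact.
find_package(MPI REQUIRED)

add_executable(uebung02 main.c)
target_link_libraries(uebung02 PRIVATE MPI::MPI_C)

This is not what causes the failure below, though; that comes from the runtime environment.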
Error output
/Users/admin/Documents/HU/VSuA/uebung02/cmake-build-debug/uebung02
--------------------------------------------------------------------------
PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

    Temp dir: /var/folders/9j/54dxfbf1451dk82nn7y99d8r0000gn/T/openmpi-sessions-501#admins-MacBook-Pro_0/15653

Try setting your TMPDIR environmental variable to point to
something shorter in length
--------------------------------------------------------------------------
[admins-MacBook-Pro.local:44625] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
[admins-MacBook-Pro.local:44625] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
[admins-MacBook-Pro.local:44625] *** An error occurred in MPI_Init
[admins-MacBook-Pro.local:44625] *** on a NULL communicator
[admins-MacBook-Pro.local:44625] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[admins-MacBook-Pro.local:44625] ***    and potentially your MPI job)
[admins-MacBook-Pro.local:44625] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
Process finished with exit code 1
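The PMIx message above already names the likely culprit: on recent macOS versions the per-user temporary directory under /var/folders/... yields a Unix domain socket path longer than Open MPI can handle. The usual workaround, as the message itself suggests, is to point TMPDIR at something short before launching (in CLion the same variable can be set in the run configuration's environment):

export TMPDIR=/tmp

The code itself is not at fault here; the failure happens inside MPI_Init before any of your calls run.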

Related

I'm learning MPI with C and don't understand why my code doesn't work

I'm new to MPI and I want my C program, which needs to be launched with two processes, to output this:
Hello
Good bye
Hello
Good bye
... (20 times)
But when I start it with mpirun -n 2 ./main it gives me this error and the program doesn't work:
*** An error occurred in MPI_Send
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[laptop:3786023] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
*** An error occurred in MPI_Send
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[laptop:3786024] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[61908,1],0]
Exit code: 1
--------------------------------------------------------------------------
Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <unistd.h>
#include <string.h>
#define N 32
void send (int rank)
{
    char mess[N];

    if (rank == 0)
        sprintf(mess, "Hello");
    else
        sprintf(mess, "Good bye");
    MPI_Send(mess, strlen(mess), MPI_CHAR, !rank, 0, MPI_COMM_WORLD);
}

void receive (int rank)
{
    char buf[N];
    MPI_Status status;

    MPI_Recv(buf, N, MPI_CHAR, !rank, 0, MPI_COMM_WORLD, &status);
    printf("%s\n", buf);
}
int main (int argc, char **argv)
{
    if (MPI_Init(&argc, &argv)) {
        fprintf(stderr, "MPI_Init error\n");
        exit(1);
    }

    int size, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (size_t i = 0; i < 20; i++) {
        if (rank == 0) {
            send(rank);
            receive(rank);
        } else {
            receive(rank);
            send(rank);
        }
    }

    MPI_Finalize();
    return 0;
}
I don't understand that error and I don't know how to debug it.
Thank you if you can help me solve this (probably silly) problem!
The main error is that your function is called send(), which conflicts with the libc function of the same name; Open MPI's own internal calls to send() end up invoking your function before MPI_Init has completed, hence this bizarre behavior.
To avoid another source of undefined behavior, reading uninitialized data past the message text on the receiving side, you also have to send the NUL terminating character, e.g.
MPI_Send(mess, strlen(mess)+1, MPI_CHAR, !rank, 0, MPI_COMM_WORLD);
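For reference, a minimal corrected sketch, with the helpers renamed (send_msg/recv_msg are arbitrary names) and the terminator included in the count:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define N 32

/* Renamed so it no longer shadows libc's send() */
void send_msg(int rank)
{
    char mess[N];

    strcpy(mess, rank == 0 ? "Hello" : "Good bye");
    /* strlen(mess)+1 so the '\0' is transmitted too */
    MPI_Send(mess, strlen(mess) + 1, MPI_CHAR, !rank, 0, MPI_COMM_WORLD);
}

void recv_msg(int rank)
{
    char buf[N];

    MPI_Recv(buf, N, MPI_CHAR, !rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("%s\n", buf);
}

int main(int argc, char **argv)
{
    int rank;

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init error\n");
        exit(1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 speaks first, rank 1 answers, 20 times */
    for (int i = 0; i < 20; i++) {
        if (rank == 0) { send_msg(rank); recv_msg(rank); }
        else           { recv_msg(rank); send_msg(rank); }
    }

    MPI_Finalize();
    return 0;
}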

(Manjaro Linux) OpenMPI not running any kind of compiled C multiprocess program

I have Manjaro Linux 17.1.10 (kernel 4.17.0-2) on a ThinkPad T420 (Core i5 2450M, upgraded to 8GB RAM) and I'm trying to run some C programs using OpenMPI (version 3.1.0), but I'm having trouble running them.
I am able to compile them without issues, both from the terminal using mpicc and with Eclipse's "build" option, but when I try to run them, either from the terminal or in Eclipse (configured as a parallel application in the launch options), I face errors no matter what code I run.
I'm trying to run the MPI hello world C project that comes with Eclipse Parallel:
/*
============================================================================
Name : test.c
Author : MGMX
Version : 1
Copyright : GNU 3.0
Description : Hello MPI World in C
============================================================================
*/
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char* argv[]){
    int my_rank;          /* rank of process */
    int p;                /* number of processes */
    int source;           /* rank of sender */
    int dest;             /* rank of receiver */
    int tag=0;            /* tag for messages */
    char message[100];    /* storage for message */
    MPI_Status status;    /* return status for receive */

    /* start up MPI */
    MPI_Init(&argc, &argv);

    /* find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0){
        /* create message */
        sprintf(message, "Hello MPI World from process %d!", my_rank);
        dest = 0;
        /* use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    else{
        printf("Hello MPI World From process 0: Num processes: %d\n", p);
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* shut down MPI */
    MPI_Finalize();
    return 0;
}
If I run the program using a simple ./test without parameters I get this output:
[mgmx#ThinkPad Debug]$ ./test
Hello MPI World From process 0: Num processes: 1
but if I use mpirun I get different results depending on the number of processes (-np #) I select, and all of them throw errors:
[mgmx#ThinkPad Debug]$ mpirun -np 0 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 1 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[49948,1],0]
Exit code: 1
--------------------------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 2 test
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 3 test
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
test
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
[mgmx#ThinkPad Debug]$ mpirun -np 4 test
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
test
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
I am running everything locally.
Here is the output of --version for both mpicc and mpirun:
[mgmx#ThinkPad Debug]$ mpicc --version
gcc (GCC) 8.1.1 20180531
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
[mgmx#ThinkPad Debug]$ mpirun --version
mpirun (Open MPI) 3.1.0
Report bugs to http://www.open-mpi.org/community/help/
And Eclipse's "about" window
I also installed OpenMPI in various ways: from the Manjaro repository using pacman, using pamac (a pacman/yaourt GUI), and the git version from the AUR using both yaourt and pamac.
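For what it's worth, the "not enough slots" errors with -np 3 and -np 4 are expected Open MPI behavior rather than a broken install: by default mpirun refuses to start more ranks than the slots it detects, presumably the two physical cores of the i5-2450M here. Assuming a purely local run, one way to lift that limit is Open MPI's oversubscribe flag:

mpirun --oversubscribe -np 4 ./test

The non-zero exits with -np 1 and -np 2 are a separate failure that the posted output alone does not pin down.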

MPI_Init must be called by one thread only

The reference for MPI_Init states:
This routine must be called by one thread only. That thread is called the main thread and must be the thread that calls MPI_Finalize.
How do I do this? I mean, every example I have seen looks like this, and in my code I tried:
MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
bool mpiroot = (mpirank == 0);
if (mpiroot)
    MPI_Init(&argc, &argv);
but I got:
Attempting to use an MPI routine before initializing MPICH
However, notice that this works fine if I leave it as in the example; I just had to re-check because of my code's failure here.
I am thinking that because we call mpiexec -n 4 ./test, 4 processes will be spawned, and thus all of them will call MPI_Init. I printed something at the very first line of main(), and it was printed as many times as there are processes.
MPI_Init must be the first MPI function called by your MPI program. It must be called by each process. Note that a process is not the same as a thread! If you go on to spawn threads from a process, those threads must not call MPI_Init again.
So your program should be something like this:
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int mpirank;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);

    // No more calls to MPI_Init in here
    ...

    MPI_Finalize();
}
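A complete, runnable version of that pattern (the printf body here merely stands in for the elided part) might look like:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Called exactly once per process, by its (single) main thread */
    MPI_Init(&argc, &argv);

    int mpirank, mpisize;
    MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpisize);

    printf("Process %d of %d initialized MPI once\n", mpirank, mpisize);

    /* The same thread that called MPI_Init must call MPI_Finalize */
    MPI_Finalize();
    return 0;
}

If you later spawn threads (e.g. with pthreads or OpenMP) and want them to make MPI calls, initialize with MPI_Init_thread and request an appropriate threading level instead; initialization still happens once per process either way.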

How to run this compiled Open MPI program (C)?

I am trying to run the example code at the following URL. I compiled the program with "mpicc twoGroups.c" and tried to run it as "./a.out", but got the following message: "Must specify MP_PROCS= 8. Terminating."
My question is: how do you set MP_PROCS=8?
Group and Communication Routine examples at here https://computing.llnl.gov/tutorials/mpi/
#include "mpi.h"
#include <stdio.h>
#define NPROCS 8
main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
When you execute an MPI program, you need to use the appropriate wrappers. Most of the time it looks like this:
mpiexec -n <number_of_processes> <executable_name> <executable_args>
So for your simple example:
mpiexec -n 8 ./a.out
You will also see mpirun used instead of mpiexec or -np instead of -n. Both are fine most of the time.
If you're just starting out, it would also be a good idea to make sure you're using a recent version of MPI so you don't get old bugs or weird execution environments. MPICH and Open MPI are the two most popular implementations. MPICH just released version 3.1 available here while Open MPI has version 1.7.4 available here. You can also usually get either of them via your friendly neighborhood package manager.

Gracefully exit with MPI

I'm trying to gracefully exit my program if RdInput returns an error.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
#define Abort(x) MPI_Abort(MPI_COMM_WORLD, x)
#define Bcast(send_data, count, type) MPI_Bcast(send_data, count, type, MASTER, GROUP) //root --> MASTER
#define Finalize() MPI_Finalize()
int main(int argc, char **argv){
    //Code
    if( rank == MASTER ) {
        time (&start);
        printf("Initialized at %s\n", ctime (&start) );

        //Read file
        error = RdInput();
    }
    Bcast(&error, 1, INT); Wait();
    if( error = 1 ) MPI_Abort(1);
    //Code
    Finalize();
}
Program output:
mpirun -np 2 code.x
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Initialized at Wed May 30 11:34:46 2012
Error [RdInput]: The file "input.mga" is not available!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7369 on
node einstein exiting improperly. There are two reasons this could occur:
//More error message.
What can I do to gracefully exit an MPI program without printing this huge error message?
If you have this logic in your code:
Bcast(&error, 1, INT);
if( error = 1 ) MPI_Abort(1);
then you're just about done (although you don't need any kind of wait after a broadcast). The trick, as you've discovered, is that MPI_Abort() does not do "graceful"; it basically is there to shut things down in whatever way possible when something's gone horribly wrong.
In this case, since now everyone agrees on the error code after the broadcast, just do a graceful end of your program:
MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
if (error != 0) {
    if (rank == 0) {
        fprintf(stderr, "Error: Program terminated with error code %d\n", error);
    }
    MPI_Finalize();
    exit(error);
}
It's an error to call MPI_Finalize() and keep on going with more MPI stuff, but that's not what you're doing here, so you're fine.
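Putting the pieces together, a self-contained sketch of the pattern (RdInput() is stubbed here with a hypothetical file check, since the original isn't shown):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER 0

/* Hypothetical stand-in for the asker's RdInput(): returns 0 on success */
static int RdInput(void)
{
    FILE *f = fopen("input.mga", "r");
    if (f == NULL) {
        fprintf(stderr, "Error [RdInput]: The file \"input.mga\" is not available!\n");
        return 1;
    }
    fclose(f);
    return 0;
}

int main(int argc, char **argv)
{
    int rank, error = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == MASTER)
        error = RdInput();

    /* Everyone learns the master's verdict, then everyone exits together */
    MPI_Bcast(&error, 1, MPI_INT, MASTER, MPI_COMM_WORLD);
    if (error != 0) {
        if (rank == MASTER)
            fprintf(stderr, "Error: Program terminated with error code %d\n", error);
        MPI_Finalize();
        exit(error);
    }

    /* ... rest of the program ... */

    MPI_Finalize();
    return 0;
}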
