I installed mpich2 on my Ubuntu 14.04 laptop with the following command:
sudo apt-get install libcr-dev mpich2 mpich2-doc
This is the code I'm trying to execute:
#include <mpi.h>
#include <stdio.h>
int main()
{
    int myrank, size;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world! I am %d of %d\n", myrank, size);
    MPI_Finalize();
    return 0;
}
Compiling it with mpicc helloworld.c gives no errors. But when I execute the program with mpirun -np 5 ./a.out, there is no output; the program just keeps running as if it were in an infinite loop. On pressing Ctrl+C, this is what I get:
$ mpirun -np 5 ./a.out
^C[mpiexec#user] Sending Ctrl-C to processes as requested
[mpiexec#user] Press Ctrl-C again to force abort
[mpiexec#user] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)
[mpiexec#user] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec#user] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec#user] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec#user] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec#user] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
I couldn't find a solution by googling. What is causing this error?
I was getting the same issue with two compute nodes:
$ mpirun -np 10 -ppn 5 --hosts c1,c2 ./a.out
[mpiexec#c1] Press Ctrl-C again to force abort
[mpiexec#c1] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad file descriptor)
[mpiexec#c1] HYD_pmcd_pmiserv_send_signal (pm/pmiserv/pmiserv_cb.c:169): unable to write data to proxy
[mpiexec#c1] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec#c1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec#c1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec#c1] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
It turns out the c1 node couldn't ssh to c2.
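A quick way to check for this (a minimal sketch, assuming the c1/c2 hostnames from the command above) is to verify passwordless ssh from the node where you invoke mpirun to the other one:
ssh c2 hostname    # should print c2's hostname without asking for a password
If that prompts for a password or hangs, fix the ssh keys/known_hosts setup first.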
If you are using only a single machine, you can try using fork as the launcher:
mpirun -launcher fork -np 5 ./a.out
I am trying to catch the SIGTERM signal and print a message in the handler from a Linux daemon:
#include <signal.h>
#include <stdio.h>
#include <stdbool.h>

/* D() is presumably the daemon's own debug-print macro, defined elsewhere in its source. */
void SigStop_Handler(int sig)
{
    D(printf("****************** HANDLED STOP SIGNAL ******************\n"));
    printf("\n");
}

int main(int argc, char *argv[])
{
    signal(SIGTERM, SigStop_Handler);
    while (true)
    {
        // do something
    }
    return 0;
}
The program runs as a daemon, started from a command line:
systemctl start abc
The daemon will be stopped by:
systemctl stop abc
When the daemon is stopping, I expected the message to be printed on the console. However, the message is not printed and the command line doesn't return to the command prompt. It does return after a while (a timeout). The daemon is stopped, but the message is never printed.
What am I doing wrong?
Two items to consider: terminating the daemon when the signal is received, and getting the message.
In the context of systemctl, you want your signal handler to exit the program. Otherwise, it will just 'print' the message and resume processing. Try
void SigStop_Handler(int sig)
{
    D(fprintf(stderr, "****************** HANDLED STOP SIGNAL ******************\n"));
    printf("\n");
    exit(sig + 128);
}
Also, given that the process runs as a daemon, stdout is not going to be connected to your terminal; instead it will be redirected to a log file or the journal. The suggestion is therefore to send messages to stderr instead of stdout (see the code above). In addition, look at the StandardOutput and StandardError configuration entries for your service.
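For example, here is a minimal sketch of a drop-in override (the unit name abc comes from your systemctl commands; the override file name is my own choice) that sends the daemon's output to the journal, where the handler's message can then be read with journalctl -u abc:
sudo mkdir -p /etc/systemd/system/abc.service.d
sudo tee /etc/systemd/system/abc.service.d/override.conf <<'EOF'
[Service]
StandardOutput=journal
StandardError=journal
EOF
sudo systemctl daemon-reload    # make systemd pick up the override
journalctl -u abc               # look for the handler's message here after "systemctl stop abc"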
The delay that you currently observe occurs because systemctl waits a few seconds after sending the TERM signal, giving the daemon a chance to terminate. If the daemon does not terminate voluntarily, a KILL signal is sent, which forces the daemon to terminate immediately and unconditionally.
I am learning MPI for parallel programming in C and I am using a processor with 4 cores. I am trying to do an example from a tutorial in which the output should be:
Hello world! I'm process 0 out of 4 processes
Hello world! I'm process 2 out of 4 processes
Hello world! I'm process 1 out of 4 processes
Hello world! I'm process 3 out of 4 processes
In whatever order.
Here is my code:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv)
{
    int ierr, num_procs, my_id;

    ierr = MPI_Init(&argc, &argv);
    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    printf("Hello world! I'm process %i out of %i processes\n", my_id, num_procs);
    ierr = MPI_Finalize();
}
I compile it using:
mpicc helloworld.c -o helloworld
And I run it using:
mpirun -np 4 helloworld
This is what is outputted:
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
It's outputting it 4 times, which is relatively good news I guess, but the program isn't recognising the number of threads or each thread's ID.
Is it even running in parallel or is it just running 4 times in serial?
How can I get the program to recognise the number of threads and the thread IDs properly?
Thanks in advance!
mpicc helloworld.c -o helloworld
mpirun -np 4 helloworld
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
Hello world! I'm process 0 out of 1 processes
This sequence clearly shows that your MPI runtime was unable to detect a parallel start, probably because of a misconfiguration: your mpicc is from one MPI implementation and your mpirun is from another. For example, both MPICH and OpenMPI provide mpicc scripts for compiling MPI programs, but their mpiexec/mpirun launchers are incompatible. If you compile with MPICH but launch with the OpenMPI starter, the MPICH runtime will not receive the environment variables it needs to detect the parallel run and its parameters.
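A quick way to check which implementation each command actually comes from (a sketch; the exact flags differ between the two implementations):
mpirun --version     # MPICH's Hydra and Open MPI print clearly different version banners
mpicc -show          # MPICH wrapper: prints the underlying compiler command line
mpicc --showme       # Open MPI wrapper: the equivalent option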
You should revisit the list of installed packages (dpkg -l | egrep 'mpich|openmpi') and check which file belongs to which library (dpkg -L mpich, dpkg -L openmpi-bin; dpkg -L libmpich-dev, dpkg -L libopenmpi-dev). Ubuntu/Debian also have a system of "alternatives" that installs the symbolic links mpicc and mpirun pointing to the actual scripts (run ls -l /usr/bin/mpicc /usr/bin/mpirun to see the current state of the links). Check the update-alternatives tool, its man page and docs to learn how to point all mpi-named scripts at one implementation (there is also the galternatives GUI for it).
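For example (a sketch; the alternative group names below are the usual Debian/Ubuntu ones, but list them first to be sure):
update-alternatives --get-selections | grep -i mpi    # which mpi-related groups exist and where they point
sudo update-alternatives --config mpi                  # pick the compiler wrappers
sudo update-alternatives --config mpirun               # pick the matching launcher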
According to the package file lists (http://packages.ubuntu.com/yakkety/amd64/openmpi-bin/filelist, http://packages.ubuntu.com/yakkety/amd64/mpich/filelist), mpich and openmpi ship suffixed variants of mpirun/mpiexec:
/usr/bin/mpiexec.openmpi
/usr/bin/mpirun.openmpi
/usr/bin/mpiexec.hydra
/usr/bin/mpiexec.mpich
/usr/bin/mpirun.mpich
The same applies to the mpicc scripts (http://packages.ubuntu.com/yakkety/amd64/libopenmpi-dev/filelist, http://packages.ubuntu.com/yakkety/amd64/libmpich-dev/filelist):
/usr/bin/mpicc.openmpi
/usr/bin/mpicc.mpich
Always use mpicc and mpirun (or mpiexec) from the same implementation. To be sure, you can also use the suffixed variants: the mpicc.openmpi & mpiexec.openmpi pair, or the mpicc.mpich & mpiexec.mpich pair.
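For example, a matched MPICH pair (reusing the helloworld source from your question) would be:
mpicc.mpich helloworld.c -o helloworld
mpiexec.mpich -np 4 ./helloworld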
And to use a given MPI implementation, you should have it fully installed: the bin, lib, and dev packages.
I have installed OpenMPI and tried to compile/execute one of the examples delivered with the newest version.
When I try to run it with mpiexec, it says that the address is already in use.
Does anyone have a hint as to why this keeps happening?
Kristians-MacBook-Pro:examples kristian$ mpicc -o hello hello_c.c
Kristians-MacBook-Pro:examples kristian$ mpiexec -n 4 ./hello
[Kristians-MacBook-Pro.local:02747] [[56076,0],0] bind() failed on error Address already in use (48)
[Kristians-MacBook-Pro.local:02747] [[56076,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
[Kristians-MacBook-Pro.local:02748] [[56076,1],0] usock_peer_send_blocking: send() to socket 19 failed: Socket is not connected (57)
[Kristians-MacBook-Pro.local:02748] [[56076,1],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[Kristians-MacBook-Pro.local:02748] [[56076,1],0] orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc [[56076,0],0] failed: Unreachable (-12)
[Kristians-MacBook-Pro.local:02749] [[56076,1],1] usock_peer_send_blocking: send() to socket 20 failed: Socket is not connected (57)
[Kristians-MacBook-Pro.local:02749] [[56076,1],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[Kristians-MacBook-Pro.local:02749] [[56076,1],1] orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc [[56076,0],0] failed: Unreachable (-12)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[56076,1],0]
Exit code: 1
--------------------------------------------------------------------------
Thanks in advance.
Okay.
I have now changed the $TMPDIR environment variable with export TMPDIR=/tmp and it works.
Now it seems to me that the OpenMPI Session folder was blocking my communication. But why did it?
Am I missing something here?
When I run the program via:
myshell$] mpirun --hosts localhost,192.168.1.4 ./a.out
the program executes successfully. Now when I try to run:
myshell$] mpirun --hosts localhost,myac#192.168.1.4 ./a.out
openssh prompts for a password. I get:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(433)..............:
MPID_Init(176).....................: channel initialization failed
MPIDI_CH3_Init(70).................:
MPID_nem_init(286).................:
MPID_nem_tcp_init(108).............:
MPID_nem_tcp_get_business_card(354):
MPID_nem_tcp_init(313).............: gethostbyname failed, myac#192.168.1.4 (errno 1)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0#myac] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0#myac] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#myac] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec#myac] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec#myac] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec#myac] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec#myac] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
Why am I getting an error when I provide the username?
You could try specifying a username in your ssh config file (http://www.cyberciti.biz/faq/create-ssh-config-file-on-linux-unix/) instead of on the mpirun command line. That way, perhaps, mpirun would not be confused by the extra username part, which, as far as I can see from the documentation, it does not support. But ssh could, behind the scenes, use the username you specify in your ssh config file. And of course you'll want to set up SSH keys so you don't have to type a password.
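For example, a minimal sketch of such an entry (the address and username are the ones from your mpirun command; adjust as needed):
cat >> ~/.ssh/config <<'EOF'
Host 192.168.1.4
    User myac
EOF
chmod 600 ~/.ssh/config    # ssh is picky about the permissions of this file
Then run mpirun --hosts localhost,192.168.1.4 ./a.out without the username part, and ssh will supply the username behind the scenes.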
I don't believe MPICH supports providing usernames in the --hosts value on the command line. You should try the host file based method described on the wiki. http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Using_Hydra_on_Machines_with_Different_User_Names
For example:
shell$ cat hosts
donner user=foo
foo user=bar
shakey user=bar
terra user=foo
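You would then point Hydra at that file instead of using --hosts, something like the following (a sketch; the host names above are the wiki's example, and the process count is arbitrary):
mpiexec -f hosts -n 4 ./a.out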
I am trying to run a simple MPI program on 4 nodes. I am using OpenMPI 1.4.3 running on Centos 5.5. When I submit the mpirun command with the hostfile/machinefile, I get no output, just a blank screen, so I have to kill the job.
I use the following run command: mpirun --hostfile hostfile -np 4 new46
OUTPUT ON KILLING JOB:
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused
that situation.
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
myocyte46 - daemon did not report back when launched
myocyte47 - daemon did not report back when launched
myocyte49 - daemon did not report back when launched
Here is the MPI program I am trying to execute on 4 nodes
**************************
if (my_rank != 0)
{
    sprintf(message, "Greetings from the process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
else
{
    for (source = 1; source < p; source++)
    {
        MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        printf("%s\n", message);
    }
}
****************************
My hostfile looks like this:
[amohan#myocyte48 ~]$ cat hostfile
myocyte46
myocyte47
myocyte48
myocyte49
*******************************
I ran the above MPI program independently on each of the nodes and it compiled and ran just fine. I have this issue of "Daemon did not report back when launched" when I use the hostfile. I am trying to figure out what could be the issue.
Thanks!
I think these lines
myocyte46 - daemon did not report back when launched
are pretty clear -- you're having trouble either launching the MPI daemons or communicating with them afterwards. So you need to start looking at networking. Can you ssh without a password into these nodes? Can you ssh back? Leaving aside the MPI program, can you
mpirun -np 4 hostname
and get anything?
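For example, a quick sketch of those checks, using the node names from your hostfile:
for h in myocyte46 myocyte47 myocyte49; do
    ssh -o BatchMode=yes "$h" hostname    # fails fast instead of prompting if passwordless ssh is broken
done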