I have a university project in which I have to build a cluster with RPis.
We now have a fully functional system with BLCR/MPICH on it.
BLCR works very well with normal processes linked against the library.
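For reference, the plain-BLCR flow we use looks roughly like this (a sketch from memory; PIDs and paths are illustrative):

cr_run ./myapp              # start the process with the BLCR library preloaded
cr_checkpoint <PID>         # write a context file (context.<PID>) for the running process
cr_restart context.<PID>    # restore the process from the context file, possibly on another node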
The demonstrations we have to show from our management web interface are:
parallel execution of a job
migration of processes across the nodes
fault tolerance with MPI
We are allowed to use the simplest computations.
We got the first one working easily, with MPI too. The second point we currently have working only with normal processes (without MPI). Regarding the third point, I have little idea how to implement a master-slave MPI scheme in which I can restart a slave process. This also affects point two, because we should be able to checkpoint a slave process, kill/stop it, and restart it on another node. I know that I have to handle the MPI errors myself, but how do I restore the process? It would be nice if someone could at least post a link or a paper (with explanations).
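To make the question concrete, here is a minimal sketch of the direction I have in mind (my own illustration, not working recovery code; the master-slave structure and the restart hook are placeholders). By default MPI aborts the whole job on any failure, so the first step is switching the communicator to MPI_ERRORS_RETURN and checking return codes:

/* Minimal sketch: let MPI errors return instead of aborting the job.
 * Only the detection side is shown; actually respawning the failed
 * rank (e.g. from a BLCR checkpoint) is left to the application.
 * Run with at least two ranks. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, rc, data = 42;   /* illustrative payload */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Default is MPI_ERRORS_ARE_FATAL: any failure kills all ranks.
     * With MPI_ERRORS_RETURN the call returns an error code instead. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {           /* master */
        rc = MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS)
            fprintf(stderr, "master: slave unreachable, restart it from its last checkpoint\n");
    } else if (rank == 1) {    /* slave */
        rc = MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rc == MPI_SUCCESS)
            printf("slave got %d\n", data);
    }

    MPI_Finalize();
    return 0;
}

Whether the surviving ranks are still usable after a peer dies depends on the MPI implementation; as the BAD TERMINATION output below shows, MPICH normally tears the whole job down anyway, which is why the checkpoint/restart path matters.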
Thanks in advance
UPDATE:
As written earlier, our BLCR+MPICH setup works, or seems to.
When I start MPI processes, checkpointing seems to work well.
Here is the proof:
... snip ...
Benchmarking: dynamic_5: md5($s.$p.$s) [32/32 128x1 (MD5_Body)]... DONE
Many salts: 767744 c/s real, 767744 c/s virtual
Only one salt: 560896 c/s real, 560896 c/s virtual
Benchmarking: dynamic_5: md5($s.$p.$s) [32/32 128x1 (MD5_Body)]... [proxy:0:0#node2] requesting checkpoint
[proxy:0:0#node2] checkpoint completed
[proxy:0:1#node1] requesting checkpoint
[proxy:0:1#node1] checkpoint completed
[proxy:0:2#node3] requesting checkpoint
[proxy:0:2#node3] checkpoint completed
... snip ...
If I kill one slave process on any node, I get this:
... snip ...
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
... snip ...
That is OK, because we have a checkpoint from which we can restart our application.
But the restart doesn't work:
pi 7380 0.0 0.2 2984 1012 pts/4 S+ 16:38 0:00 mpiexec -ckpointlib blcr -ckpoint-prefix /tmp -ckpoint-num 0 -f /tmp/machinefile -n 3
pi 7381 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.101 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
pi 7382 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.102 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
pi 7383 0.1 0.5 5712 2464 ? Ss 16:38 0:00 /usr/bin/ssh -x 192.168.42.105 "/usr/local/bin/mpich/bin/hydra_pmi_proxy" --control-port masterpi:47698 --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 2
pi 7438 0.0 0.1 3548 868 pts/1 S+ 16:40 0:00 grep --color=auto mpi
I don't know why, but the first time I restart the app, the processes seem to be restarted on every node (I can see them with top or ps aux | grep "john"), yet no output reaches the management console/terminal. It just hangs after showing me:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp -ckpoint-num 0 -f /tmp/machinefile -n 3
Warning: Permanently added '192.168.42.102' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.101' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.105' (ECDSA) to the list of known hosts.
My plan B is to test with our own application whether the BLCR/MPICH combination really works. Maybe there are just some problems with john.
Thanks in advance
UPDATE:
Next problem, with a simple hello world. I'm slowly despairing. Maybe I'm just confusing myself too much.
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 3 -f /tmp/machinefile -n 4 ./hello
Warning: Permanently added '192.168.42.102' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.105' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.42.101' (ECDSA) to the list of known hosts.
[proxy:0:0#node2] requesting checkpoint
[proxy:0:0#node2] checkpoint completed
[proxy:0:1#node1] requesting checkpoint
[proxy:0:1#node1] checkpoint completed
[proxy:0:2#node3] requesting checkpoint
[proxy:0:2#node3] checkpoint completed
[proxy:0:0#node2] requesting checkpoint
[proxy:0:0#node2] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:0#node2] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0#node2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#node2] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1#node1] requesting checkpoint
[proxy:0:1#node1] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:1#node1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:1#node1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1#node1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2#node3] requesting checkpoint
[proxy:0:2#node3] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.[proxy:0:2#node3] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:2#node3] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2#node3] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec#masterpi] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec#masterpi] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec#masterpi] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec#masterpi] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
hello.c
/* C Example */
#include <stdio.h>
#include <unistd.h>   /* gethostname(), getpid() */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, i, j;
    char hostname[1024];

    hostname[1023] = '\0';
    gethostname(hostname, 1023);

    MPI_Init(&argc, &argv);                 /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of processes */

    /* busy loop so the job runs long enough to be checkpointed */
    for (i = 0; i < 400000000; i++) {
        for (j = 0; j < 4000000; j++) {
        }
    }

    printf("%s done...", hostname);
    printf("%s: %d is alive\n", hostname, getpid());
    MPI_Finalize();
    return 0;
}
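For completeness, this is roughly how I build and launch it (assuming the usual MPICH wrappers are on the PATH):

mpicc -o hello hello.c
mpiexec -f /tmp/machinefile -n 4 ./hello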
Related
I have a program in C that runs well when run directly from the command line but fails when run with systemd:
Core was generated by `/usr/local/bin/midnite-modbusd'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x0000000000401308 in main (argc=1, argv=0x7ffeae390268) at src/midnite-modbusd.c:139
139 slen= interval - (millis % interval);
The code in question:
//wait for start of each sample interval
gettimeofday(&tv,NULL);
millis= (long long unsigned)tv.tv_sec*1000 + (tv.tv_usec/1000);
slen= interval - (millis % interval);
i= (millis+slen) % 1000;
usleep (slen*1000);
The full code is available on github.
The systemd unit:
[Unit]
Description=Midnite Classic modbus data polling
After=network.target
[Service]
Type=simple
User=midnite-modbusd
ExecStart=/usr/local/bin/midnite-modbusd
Restart=on-failure
[Install]
WantedBy=multi-user.target
What can be so different when a program runs under systemd?
Edit 1
It seems that my program has major issues that only appear when running under systemd:
it won't read my configuration file, which should produce an error message and exit(1) because of the invalid values
journalctl doesn't get filled in real time: using journalctl -f I have to wait a couple of minutes before a bunch of logs suddenly appears
As a side note, for my command-line tests I run: sudo -H -u midnite-modbusd /usr/local/bin/midnite-modbusd
The interval is initialized from the sample_interval value in the configuration file; check that the file is correct and that sample_interval is present. An uninitialized (zero) interval would cause the divide-by-zero exception.
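As a sketch, a minimal guard (variable names follow the snippet above; in the real program the message would presumably go through log_message):

/* Fail fast instead of dividing by zero when the config was not read. */
if (interval <= 0) {
    fprintf(stderr, "sample_interval missing or invalid in configuration\n");
    exit(EXIT_FAILURE);
}
slen = interval - (millis % interval);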
I found the issue in this code:
if (getppid() == 1) {
    sprintf(str, "Daemon already running");
    log_message(log_file_path, (char *)str);
    return;
}
This code is left over from when the program was intended to fork itself and run as an "old style" daemon.
I didn't realize that, since systemd starts the program, its parent is PID 1, so getppid() returns 1 when running under systemd but not from the command line.
In any case it is badly written: this check should terminate the program, not just return from the function.
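A sketch of a corrected check (the running_under_systemd flag is hypothetical; the cleaner fix is to drop the self-daemonizing logic entirely when the service is managed by systemd):

/* Only meaningful when the program daemonizes itself; under systemd,
 * which makes PID 1 our parent, this test must be skipped. */
if (!running_under_systemd && getppid() == 1) {
    log_message(log_file_path, "Daemon already running");
    exit(EXIT_FAILURE);   /* stop the program, don't just return */
}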
I am running a sample MPI program which prints hello world.
When I run it with 1, 2, ..., 330 processes it works as expected.
But when the number goes beyond 330 it fails with the error below.
Can someone explain the reason for this?
I am running the program on my laptop, which has an i5 processor with 4 cores and 8 GB of RAM.
[proxy:0:0#Abhishek-Machine] HYDU_create_process (./utils/launch/launch.c:25): pipe error (Too many open files)
[proxy:0:0#Abhishek-Machine] launch_procs (./pm/pmiserv/pmip_cb.c:705): create process returned error
[proxy:0:0#Abhishek-Machine] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0#Abhishek-Machine] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#Abhishek-Machine] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec#Abhishek-Machine] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec#Abhishek-Machine] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec#Abhishek-Machine] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec#Abhishek-Machine] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
You are hitting an OS limit on file/socket descriptors or something similar. Oversubscribing your workstation to this degree is not a good idea and is unlikely to work unless you change your system settings (which is also not a good idea for this use case).
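If you want to see the per-process limit you are hitting, here is a small sketch using standard POSIX getrlimit (nothing MPICH-specific):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* RLIMIT_NOFILE caps the number of open file descriptors; the
     * launcher needs pipes/sockets per rank, so a few hundred ranks
     * can exhaust a typical soft limit of 1024. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("soft=%lu hard=%lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
    return 0;
}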
I am afraid I am not sure what I'm doing wrong here.
I have a threaded application that starts 3 threads upon startup:
[root#Embest /]# ps
1111 root 608 S fw634c_d_cdm_sb
1112 root 608 S fw634c_d_cdm_sb
1113 root 608 S fw634c_d_cdm_sb
then waits in standby mode for commands from the serial port.
After it runs and returns to standby mode, I check with ps what's going on; there are zombified instances of the application (and the file name is square-bracketed, too):
1114 root Z [fw634c_d_cdm_sb]
...
...
...
1768 root Z [fw634c_d_cdm_sb]
about 628 of them.
The thing is, the policy I'm following is:
-for detached threads: don't care (they exit and free their resources on their own after completing)
-for joinable threads: I call pthread_join right after pthread_create and wait for the thread function to complete, like this:
if (pthread_create(&tmp_thrd_id, &attr_joinable, run_function, (void *)&aStruct) != 0) {
    DEBUG(printf("thread NOT created\n"));
} else {
    DEBUG(printf("thread created!\n"));
    if (pthread_join(tmp_thrd_id, NULL) != 0) {
        DEBUG(printf("\nERROR in joining\n"));
    } else {
        DEBUG(printf("Thread completed\n"));
    }
}
I only call pthread_exit(NULL) in main, which doesn't do much after startup and just stays around because it must not be killed.
I'm probably forgetting something vital here, but I can't figure out what, even after reading a few basic guides on threads...
Thank you for your help.
A "zombie" thread is a thread that has exited, and is waiting around for someone to call pthread_join to collect its exit status. So somewhere in your program you are creating threads and not eventually calling pthread_join or pthread_detach for those threads.
In a C program on Linux, I call fork() followed by execve() twice to create two processes running two separate programs. How do I make sure that the execution of the two child processes interleaves?
Thanks
I tried to do the above as one of the answers below suggested, but the process seems to hang on sched_setscheduler(). Code included below; replay1 and replay2 are two programs which simply print "Replay1" and "Replay2" respectively.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>

int main(void)
{
    int i, pid[5], pidparent, new = 0;
    char *newargv1[] = {"./replay1", NULL};
    char *newargv2[] = {"./replay2", NULL};
    char *newenviron[] = {NULL};
    struct sched_param mysched;

    mysched.sched_priority = 1;
    sched_setscheduler(0, SCHED_FIFO, &mysched);
    pidparent = getpid();

    for (i = 0; i < 2; i++)
    {
        if (getpid() == pidparent)
        {
            pid[i] = fork();
            if (pid[i] != 0)
                kill(pid[i], SIGSTOP);
            if (i == 0 && pid[i] == 0)
                execve(newargv1[0], newargv1, newenviron);
            if (i == 1 && pid[i] == 0)
                execve(newargv2[0], newargv2, newenviron);
        }
    }

    for (i = 0; i < 10; i++)
    {
        if (new == 0)
            new = 1;
        else
            new = 0;
        kill(pid[new], SIGCONT);
        sleep(100);
        kill(pid[new], SIGSTOP);
    }
    return 0;
}
Since you need random interleaving, here's a horrible hack to do it:
Immediately after forking, send a SIGSTOP to each application.
Set your parent application to have real-time priority with sched_setscheduler. This will allow you to have more fine-grained timers.
Send a SIGCONT to one of the child processes.
Loop: Wait a random, short time. Send a SIGSTOP to the currently-running application, and a SIGCONT to the other. Repeat.
This will help force execution to interleave. It will also make things quite slow. You may also want to try using sched_setaffinity to assign each process to a different CPU (if you have a dual-core or hyperthreaded CPU) - this will cause them to effectively run simultaneously, modulo wait times for I/O. I/O wait times (which could cause them to wait for the hard disk, at which point they're likely to wake up sequentially and thus not interleave) can be avoided by making sure whatever data they're manipulating is on a ramdisk (on linux, use tmpfs).
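Here is a sketch of the sched_setaffinity pinning mentioned above (Linux-specific GNU extensions; error handling kept minimal):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);
    /* pid 0 = the calling process; from the parent you would pass
     * the child's pid returned by fork() instead. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");
    printf("now running on CPU %d\n", sched_getcpu());
    return 0;
}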
If this is too coarse-grained for you, you can use ptrace's PTRACE_SINGLESTEP operation to step one CPU operation at a time, interleaving as you see fit.
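And a condensed sketch of the single-step idea (only one traced child is shown; a real interleaver would keep two and alternate between their pids; ./replay1 is the workload from the question):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();

    if (child == 0) {
        /* Child: ask to be traced, then exec the workload; the exec
         * delivers a SIGTRAP that stops the child until the parent
         * resumes it. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./replay1", "./replay1", (char *)NULL);
        _exit(127);                    /* only reached if exec fails */
    }

    int status;
    waitpid(child, &status, 0);        /* wait for the stop at exec */
    while (WIFSTOPPED(status)) {
        /* Resume for exactly one machine instruction; with two
         * children you would switch to the other pid here. */
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1)
            break;
        waitpid(child, &status, 0);
    }
    return 0;
}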
As this is for testing purposes, you could place sched_yield(); calls after every line of code in the child processes.
Another potential idea is to have a parent process ptrace() the child processes, and use PTRACE_SINGLESTEP to interleave the two process's execution on an instruction-by-instruction basis.
If you need to synchronize them and they are your own processes, use semaphores. If you do not have access to the source, there is no way to synchronize them.
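A minimal sketch with a POSIX named semaphore (the name /interleave_demo and the role argument are illustrative; link with -pthread, and on older glibc also -lrt):

#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>

/* Run "./sync wait" in one process and "./sync post" in the other:
 * the waiter blocks until the poster releases it. */
int main(int argc, char *argv[])
{
    sem_t *sem = sem_open("/interleave_demo", O_CREAT, 0644, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return 1;
    }

    if (argc > 1 && strcmp(argv[1], "post") == 0) {
        printf("posting\n");
        sem_post(sem);            /* let the waiting peer run */
    } else {
        printf("waiting\n");
        sem_wait(sem);            /* blocks until the peer posts */
        printf("released\n");
        sem_unlink("/interleave_demo");   /* clean up the name */
    }
    sem_close(sem);
    return 0;
}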
If your aim is to do concurrency testing, I know of only two techniques:
Test exact scenarios using synchronization. For example, process 1 opens a connection and executes a query, then process 2 comes in and executes a query, then process1 gets active again and gets the results, etc. You do this with synchronization techniques mentioned by others. However, getting good test scenarios is very difficult. I have rarely used this method in the past.
In random you trust: fire up a high number of test processes that execute a long running test suite. I used this method for both multithreading and multiprocess testing (my case was testing device driver access from multiple processes without blue screening out). Usually you want to make the number of processes and number of iterations of the test suite per process configurable so that you can either do a quick pass or do a longer test before a release (running this kind of test with 10 processes for 10-12 hours was not uncommon for us). A usual run for this sort of testing is measured in hours. You just fire up the processes, let them run for a few hours, and hope that they will catch all the timing windows. The interleaving is usually handled by the OS, so you don't really need to worry about it in the test processes.
Job control is much simpler in Bash than in C. Try this:
#! /bin/bash
stop ()
{
echo "$1 stopping"
kill -SIGSTOP $2
}
cont ()
{
echo "$1 continuing"
kill -SIGCONT $2
}
replay1 ()
{
while sleep 1 ; do echo "replay 1 running" ; done
}
replay2 ()
{
while sleep 1 ; do echo "replay 2 running" ; done
}
replay1 &
P1=$!
stop "replay 1" $P1
replay2 &
P2=$!
stop "replay 2" $P2
trap "kill $P1;kill $P2" EXIT
while sleep 1 ; do
cont "replay 1 " $P1
cont "replay 2" $P2
sleep 3
stop "replay 1 " $P1
stop "replay 2" $P2
done
The two processes are running in parallel:
$ ./interleave.sh
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 2 running
replay 1 running
replay 1 running
replay 2 running
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 1 running
replay 2 running
replay 2 running
replay 1 running
replay 2 running
replay 1 running
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 1 running
replay 2 running
replay 1 running
replay 2 running
replay 1 running
replay 2 running
replay 1 stopping
replay 2 stopping
^C
I am trying to run a simple MPI program on 4 nodes. I am using OpenMPI 1.4.3 on CentOS 5.5. When I submit the mpirun command with the hostfile/machinefile, I get no output, just a blank screen, so I have to kill the job.
I use the following run command: mpirun --hostfile hostfile -np 4 new46
OUTPUT ON KILLING JOB:
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that caused
that situation.
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
myocyte46 - daemon did not report back when launched
myocyte47 - daemon did not report back when launched
myocyte49 - daemon did not report back when launched
Here is the MPI program I am trying to execute on 4 nodes
**************************
if (my_rank != 0)
{
    sprintf(message, "Greetings from the process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
else
{
    for (source = 1; source < p; source++)
    {
        MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        printf("%s\n", message);
    }
}
****************************
My hostfile looks like this:
[amohan#myocyte48 ~]$ cat hostfile
myocyte46
myocyte47
myocyte48
myocyte49
*******************************
I ran the above MPI program independently on each of the nodes and it compiled and ran just fine. This "daemon did not report back when launched" issue only appears when I use the hostfile, and I am trying to figure out what the problem could be.
Thanks!
I think these lines
myocyte46 - daemon did not report back when launched
are pretty clear: you're having trouble either launching the MPI daemons or communicating with them afterwards, so you need to start looking at the networking. Can you ssh without a password into these nodes? Can you ssh back? Leaving aside the MPI program, can you run
mpirun -np 4 hostname
and get anything?