I'm running a Java program as a daemon on Linux using Apache commons-daemon's jsvc.
The daemon "randomly" crashes with only message:
jsvc.exec error: Service did not exit cleanly
This is the relevant part of the code in jsvc (in jsvc-unix.c line 1142):
while (waitpid(pid, &status, 0) != pid) {
/* Waith for process */
}
/* The child must have exited cleanly */
if (WIFEXITED(status)) {
status = WEXITSTATUS(status);
// Clean exit code...
}
else {
if (WIFSIGNALED(status)) {
log_error("Service killed by signal %d", WTERMSIG(status));
/* prevent looping */
if (laststart + 60 > time(NULL)) {
log_debug("Waiting 60 s to prevent looping");
sleep(60);
}
continue;
}
log_error("Service did not exit cleanly", status);
return 1;
}
In which case can WIFEXITED and WIFSIGNALED both be false ?
Is it guaranteed that the process was not killed in this case (by a process or Linux OOM killer) ?
WIFSTOPPED exists too, but it's only possible if the parent is ptrace:ing the child process (or with different flags to waitpid).
I think your best bet is to print the status and the look at the bits in sys/wait.h. It's quite hard to get it right though. Much information is being stuffed into that int and it's hard to figure it out. It looks like the code you pasted already tries to do that, but forgot the %d in the format string.
Related
I need to be able to get the returned value from a child process without having to hold the execution of the parent for it.
Notice the a runtime error could happen in the child process.
Here is my program that I'm trying to make:
//In parent process:
do
{
read memory usage from /proc/ID/status
if(max_child_memory_usage > memory_limit)
{
kill(proc, SIGKILL);
puts("Memory limit exceeded");
return -5; // MLE
}
getrusage(RUSAGE_SELF,&r_usage);
check time and memory consumption
if(memory limit exceeded || time limit exceeded)
{
kill(proc, SIGKILL);
return fail;
}
/*
need to catch the returned value from the child somehow with
this loop working.
Notice the a runtime error could happen in the child process.
*/
while(child is alive);
The waitpid function has an option called WNOHANG which causes it to return immediately if the given child has not yet returned:
pid_t rval;
int status;
do {
...
rval = waitpid(proc, &status, WNOHANG);
} while (rval == 0);
if (rval == proc) {
if (WIFEXITED(status)) {
printf("%d exited normal with status %d\n", WEXITSTATUS(status));
} else {
printf("%d exited abnormally\n");
}
}
See the man page for waitpid for more details on checking various abnormal exit conditions.
The solutions using the WNOHANG flag would work only if you need to check just once for the exit status of the child. If however you would like to procure the exit status when the child exits, no matter how late that is, a better solution is to set a signal handler for the SIGCHLD signal.
When the child process terminates whether normally or abnormally, SIGCHLD will be sent to the parent process. Within this signal handler, you can call wait to reap the exit status of the child.
void child_exit_handler(int signo){
int exit_status;
int pid = wait(&exit_status);
// Do things ...
}
// later in the code, before forking and creating the child
signal(SIGCHLD, child_exit_handler);
Depending on other semantics of your program, you may want to use waitpid instead. (SIGCHLD may also be called if the program was stopped, not terminated. The man page for wait(2) describes macros to check for this.)
Recently, when reading a book about linux programming, I got a message that:
The status argument given to _exit() defines the termination status of the process, which is available to the parent of this process when it calls wait(). Although defined as an int, only the bottom 8 bits of status are actually made available to the parent. And only 0 ~ 127 is recommanded to use, because 128 ~ 255 could be confusing in shell due to some reason. Due to that -1 will become 255 in 2's complement.
The above is about the exit status of a child process.
My question is:
Why the parent process only get the 8 bits of the child process's exit status?
What about return value of normal functions? Does it reasonable or befinit to use only 0 ~ 127 ? Because I do use -1 as return value to indicate error sometimes, should I correct that in future.
Update - status get by wait() / waitpid():
I read more chps in the book (TLPI), and found there are more trick in the return status & wait()/waitpid() that worth mention, I should have read more chps before ask the question. Anyhow, I have add an answer by myself to describe about it, in case it might help someone in future.
Why the parent process only get the 8 bits of the child process's exit status?
Because POSIX says so. And POSIX says so because that's how original Unix worked, and many operating system derived from it and modeled after it continue to work.
What about return value of normal functions?
They are unrelated. Return whatever is reasonable. -1 is as good as any other value, and is in fact a standard way to indicate an error in a huge lot of standard C and POSIX APIs.
The answer from #n.m. is good.
But later on, I read more chps in the book (TLPI), and found there are more trick in the return status & wait()/waitpid() that worth mention, and that might be another important or root reason why child process can't use full bits of int when exit.
Wait status
basicly:
child process should exit with value within range of 1 byte, which is set as part of the status parameter of wait() / waitpid(),
and only 2 LSB bytes of the status is used,
byte usage of status:
event byte 1 byte 0
============================================================
* normal termination exit status (0 ~ 255) 0
* killed by signal 0 termination signal (!=0)
* stopped by signal stop signal 0x7F
* continued by signal 0xFFFF
*
dissect returned status:
header 'sys/wait.h', defines a set of macros that help to dissect a wait status,
macros:
* WIFEXITED(status)
return true if child process exit normally,
*
* WIFSIGNALED(status)
return true if child process killed by signal,
* WTERMSIG(status)
return signal number that terminate the process,
* WCOREDUMP(status)
returns ture if child process produced a core dump file,
tip:
this macro is not in SUSv3, might absent on some system,
thus better check whether it exists first, via:
#ifdef WCOREDUMP
// ...
#endif
*
* WIFSTOPPED(status)
return true if child process stopped by signal,
* WSTOPSIG(status)
return signal number that stopp the process,
*
* WIFCONTINUED(status)
return true if child process resumed by signal SIGCONT,
tip:
this macro is part of SUSv3, but some old linux or some unix might didn't impl it,
thus better check whether it exists first, via:
#ifdef WIFCONTINUED
// ...
#endif
*
Sample code
wait_status_test.c
// dissect status returned by wait()/waitpid()
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>
#define SLEEP_SEC 10 // sleep seconds of child process,
int wait_status_test() {
pid_t cpid;
// create child process,
switch(cpid=fork()) {
case -1: // failed
printf("error while fork()\n");
exit(errno);
case 0: // success, child process goes here
sleep(SLEEP_SEC);
printf("child [%d], going to exit\n",(int)getpid());
_exit(EXIT_SUCCESS);
break;
default: // success, parent process goes here
printf("parent [%d], child created [%d]\n", (int)getpid(), (int)cpid);
break;
}
// wait child to terminate
int status;
int wait_flag = WUNTRACED | WCONTINUED;
while(1) {
if((cpid = waitpid(-1, &status, wait_flag)) == -1) {
if(errno == ECHILD) {
printf("no more child\n");
exit(EXIT_SUCCESS);
} else {
printf("error while wait()\n");
exit(-1);
}
}
// disset status
printf("parent [%d], child [%d] ", (int)getpid(), (int)cpid);
if(WIFEXITED(status)) { // exit normal
printf("exit normally with [%d]\n", status);
} else if(WIFSIGNALED(status)) { // killed by signal
char *dumpinfo = "unknow";
#ifdef WCOREDUMP
dumpinfo = WCOREDUMP(status)?"true":"false";
#endif
printf("killed by signal [%d], has dump [%s]\n", WTERMSIG(status), dumpinfo);
} else if(WIFSTOPPED(status)) { // stopped by signal
printf("stopped by signal [%d]\n", WSTOPSIG(status));
#ifdef WIFCONTINUED
} else if(WIFCONTINUED(status)) { // continued by signal
printf("continued by signal SIGCONT\n", WSTOPSIG(status));
#endif
} else { // this should never happen
printf("unknow event\n");
}
}
return 0;
}
int main(int argc, char *argv[]) {
wait_status_test();
return 0;
}
Compile:
gcc -Wall wait_status_test.c
Execute:
./a.out and wait it to terminate normally, child process id is printed after fork(),
./a.out, then kill -9 <child_process_id> before it finish sleep,
./a.out, then kill -STOP <child_process_id> before it finish sleep, then kill -CONT <child_process_id> to resume it,
So, I'm exiting from the child thread back to the parent. I am using the _exit() system call. I was wondering a few things. One was what parameter for the _exit for my child. Here is the code that my child process is executing:
printf("\n****Child process.****\n\nSquence: ");
do{
//Print the integer in the sequence.
printf("%d\t",inputInteger);
if((inputInteger%2) == 0){
//printf("inputInteger = %d\n", inputInteger);
inputInteger = inputInteger / 2;
}else{
inputInteger = 3*inputInteger +1;
//printf("%d\t",inputInteger);
}
}while(inputInteger != 1);
//Makes sure we print off 1!
printf("%d\n\n", inputInteger);
//Properly exit
_exit(status);
I use status because back in my parent thread I use it in the waitpid() system call. Here is the code for parent process that is executed after the child is completed.
waitpid_check = waitpid(processID, &status, 0);
printf("\n****Parent process.****\n");
if(waitpid_check == -1){
printf("Error in waitpid.\n");
exit(EXIT_FAILURE);
}
if(WIFEXITED(status)){
printf("Child process terminated normally!\n");
}
Here I'm using waitpid() system call that ensures that the child was exited, then use status to check if it was exited properly. I was wondering if I was going about this in the right way of creating the child and exiting it.
Then I was also wondering if I was correctly checking the exiting of the child in the parent.
Thanks for your help!
From the waitpid linux manual.
"If status is not NULL, wait() and waitpid() store status information in the int to which
it points."
You don't need the return value of wait paid to check if the child failed. You need to check to value of status. There are a handful of macros to check status.
WIFEXITED(status)
returns true if the child terminated normally, that is, by calling exit(3) or _exit(2), or by returning from main().
WEXITSTATUS(status)
returns the exit status of the child. This consists of the least significant 8 bits of the status argument that the child specified in a call to exit(3) or _exit(2) or as the argument for a return statement in main(). This macro should only be employed if WIFEXITED returned true.
WIFSIGNALED(status)
returns true if the child process was terminated by a signal.
WTERMSIG(status)
returns the number of the signal that caused the child process to terminate. This macro should only be employed if WIFSIGNALED returned true.
WCOREDUMP(status)
returns true if the child produced a core dump. This macro should only be employed if WIFSIGNALED returned true. This macro is not specified in POSIX.1-2001 and is not available on some UNIX implementations (e.g., AIX, SunOS). Only use this enclosed in #ifdef WCOREDUMP ... #endif.
WIFSTOPPED(status)
returns true if the child process was stopped by delivery of a signal; this is only possible if the call was done using WUNTRACED or when the child is being traced (see ptrace(2)).
WSTOPSIG(status)
returns the number of the signal which caused the child to stop. This macro should only be employed if WIFSTOPPED returned true.
WIFCONTINUED(status)
(since Linux 2.6.10) returns true if the child process was resumed by delivery of SIGCONT.
As for whether or not you are exiting the child process right that really depends. You would exit like you would in any other program since when you fork a process you are really just duplicating an address space and the child when run as its own independent program (of course with the same open FD's, already declared values etc as parent). Below is typical implementation for this problem (although NULL is being passed to the wait instead of a status so I think you are doing it right.)
/* fork a child process */
pid = fork();
if (pid < 0) { /* error occurred */
fprintf(stderr, "Fork Failed\n");
return 1;
}
else if (pid == 0) { /* child process */
printf("I am the child %d\n",pid);
execlp("/bin/ls","ls",NULL);
}
else { /* parent process */
/* parent will wait for the child to complete */
printf("I am the parent %d\n",pid);
wait(NULL);
printf("Child Complete\n");
}
return 0;
I'd love to help but I'm really rusty on these calls. If you've read through the documentation on these API calls and you're checking everywhere for error returns, then you should be in good shape.
The idea seems good at a high level.
One thing to keep in mind is you might want to surround the meat of your child method in a try/catch. With threads, you often don't want an exception to mess up your main flow.
You won't have that problem with multiple processes, but think about whether you want _exit to be called in the face of an exception, and how to communicate (to the parent or to the user) that an exception occurred.
I am a little confused about how to handle errors from execvp(). My code so far looks like this:
int pid = fork();
if (pid < 0) {
// handle error.
}
else if (pid == 0) {
int status = execvp(myCommand,myArgumentVector); // status should be -1 because execvp
// only returns when an error occurs
// We only reach this point as a result of failure from execvp
exit(/* What goes here? */);
}
else {
int status;
int waitedForPid = waitpid(pid,&status,0);
//...
}
There are three cases I'm trying to address:
myCommand,myArgumentVector are valid and the command executes correctly.
myCommand,myArgumentVector are valid parameters, but something goes wrong in the execution of myCommand.
myCommand,myArgumentVector are invalid parameters (e.g. myCommand cannot be found) and the execvp() call fails.
My primary concern is that the parent process will have all the information it needs in order to handle the child's error correctly, and I'm not entirely sure how to do that.
In the first case, the program presumably ended with an exit status of 0. This means that if I were to call WIFEXITED(status) in the macro, I should get true. I think this should work fine.
In the second case, the program presumably ended with an exit status other than 0. This means that if I were to call WEXITSTATUS(status) I should get the specific exit status of the child invocation of myCommand (please advise if this is incorrect).
The third case is causing me a lot of confusion. So if execvp() fails then the error is stored in the global variable errno. But this global variable is only accessible from the child process; the parent as an entirely separate process I don't think can see it. Does this mean that I should be calling exit(errno)? Or am I supposed to be doing something else here? Also, if I call exit(errno) how can I get the value of errno back from status in the parent?
My grasp is still a little tenuous so what I'm looking for is either confirmation or correction in my understanding of how to handle these three cases.
Here is a simple code that I've tried.
if(fork() == 0){
//do child stuff here
execvp(cmd,arguments); /*since you want to return errno to parent
do a simple exit call with the errno*/
exit(errno);
}
else{
//parent stuff
int status;
wait(&status); /*you made a exit call in child you
need to wait on exit status of child*/
if(WIFEXITED(status))
printf("child exited with = %d\n",WEXITSTATUS(status));
//you should see the errno here
}
In case 1, the execvp() does not return. The status returned to the parent process will be the exit status of the child — what it supplies to exit() or what it returns from main(), or it may be that the child dies from a signal in which case the exit status is different but detectably so (WIFSIGNALED, etc). Note that this means that the status need not be zero.
It isn't entirely clear (to me) what you are thinking of with case 2. If the command starts but rejects the options it is called with, it is actually case 1, but the chances of the exit status being zero should be small (though it has been known for programs to exit with status 0 on error). Alternatively, the command can't be found, or is found but is not executable, in which case execvp() returns and you have case 3.
In case 3, the execvp() call fails. You know that because it returns; a successful execvp() never returns. There is no point in testing the return value of execvp(); the mere fact that it returns means it failed. You can tell why it failed from the setting of errno. POSIX uses the exit statuses of 126 and 127 — see xargs and system() for example. You can look at the error codes from execvp() to determine when you should return either of those or some other non-zero value.
In the third case, errno IS accessible from the parent as well, so you could just exit(errno).
However, that is not the best thing to do, since the value of errno could change by the time you exit.
To be more sure that you don't lose errno if you have code between your exec() and exit() calls, assign errno to an int:
execvp(<args>);
int errcode=errno;
/* other code */
exit(errcode);
As for your other question, exit status is not directly comparable to the errno, and you shouldn't be trying to retrieve errno from anything but errno (as above) anyway.
This documentation may help:
http://www.gnu.org/software/libc/manual/html_node/Exit-Status.html
Here is my code:
it works fine. Child status is returned in waitpid hence it tells either child process is executed successfully or not.
//declaration of a process id variable
pid_t pid, ret_pid;
//fork a child process is assigned
//to the process id
pid=fork();
DFG_STATUS("Forking and executing process in dfgrunufesimulator %d \n", pid);
//code to show that the fork failed
//if the process id is less than 0
if(pid<0)
{
return DFG_FAILURE;
}
else if(pid==0)
{
//this statement creates a specified child process
exec_ret = execvp(res[0],res); //child process
DFG_STATUS("Process failed ALERT exec_ret = %d\n", exec_ret);
exit(errno);
}
//code that exits only once a child
//process has been completed
else
{
ret_pid = waitpid(pid, &status, 0);
if ( ret_pid == -1)
{
perror("waitpid");
return DFG_FAILURE;
}
DFG_STATUS("ret_pid = %d, pid = %d, child status = %d\n",ret_pid,pid,status);
if (WIFEXITED(status)) {
DFG_STATUS("child exited, status=%d\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
DFG_STATUS("child killed (signal %d)\n", WTERMSIG(status));
} else if (WIFSTOPPED(status)) {
DFG_STATUS("child stopped (signal %d)\n", WSTOPSIG(status));
#ifdef WIFCONTINUED /* Not all implementations support this */
} else if (WIFCONTINUED(status)) {
DFG_STATUS("child continued\n");
#endif
} else { /* Non-standard case -- may never happen */
DFG_STATUS("Unexpected status (0x%x)\n", status);
}
if(status != 0) /* Child process failed to execute */
return DFG_FAILURE;
result = DFG_SUCCESS;
}
From a existing question here, someone gave this example code:
int status;
child_pid = fork();
if (child_pid == 0) {
// in child; do stuff including perhaps exec
} else if (child_pid == -1) {
// failed to fork
} else {
if (waitpid(child_pid, &status, 0) == child_pid) {
// child exited or interrupted; now you can do something with status
} else {
// error etc
}
}
Could anyone explain to me what the second parameter of waitpid() is used for?
from man pages :
If status is not NULL, wait() and waitpid() store status infor-
mation in the int to which it points. This integer can be
inspected with the following macros (which take the integer
itself as an argument, not a pointer to it, as is done in wait()
and waitpid()!):
WIFEXITED(status)
returns true if the child terminated normally, that is,
by calling exit(3) or _exit(2), or by returning from
main().
WEXITSTATUS(status)
returns the exit status of the child. This consists of
the least significant 8 bits of the status argument that
the child specified in a call to exit(3) or _exit(2) or
as the argument for a return statement in main(). This
macro should only be employed if WIFEXITED returned true.
WIFSIGNALED(status)
returns true if the child process was terminated by a
signal.
WTERMSIG(status)
returns the number of the signal that caused the child
process to terminate. This macro should only be employed
if WIFSIGNALED returned true.
WCOREDUMP(status)
returns true if the child produced a core dump. This
macro should only be employed if WIFSIGNALED returned
true. This macro is not specified in POSIX.1-2001 and is
not available on some Unix implementations (e.g., AIX,
SunOS). Only use this enclosed in #ifdef WCOREDUMP ...
#endif.
WIFSTOPPED(status)
returns true if the child process was stopped by delivery
of a signal; this is only possible if the call was done
using WUNTRACED or when the child is being traced (see
ptrace(2)).
WSTOPSIG(status)
returns the number of the signal which caused the child
to stop. This macro should only be employed if WIF-
STOPPED returned true.
WIFCONTINUED(status)
(since Linux 2.6.10) returns true if the child process
was resumed by delivery of SIGCONT.
So it stores status of the "how the child terminated".
You can use the macros to investigate how exactly the child terminated, and you can define some actions depending to the child's termination status.
It is a bit-field for options, the only one available is WNOWAIT, which means to leave the child in a waitable state; a later wait call can be used to again retrieve the child status information.
See: http://linux.die.net/man/2/waitpid
pid = fork();
if(pid < 0)
{
printf("fork failed\n");
return -1;
}
else if(pid == 0)
{
sleep(5);
printf("Child process\n");
return 2;
}
else
{
printf("Parent process\n");
kill(pid, SIGKILL);
waitpid(pid, &ret, 0);
if(WIFEXITED(ret))
printf("Child process returned normally\n");
if(WIFSIGNALED(ret))
printf("Child process terminated by signal\n");
return 1;
}
As you can see that the return value can be used to check how a particular process terminated and take actions on the basis of that.
If you comment the kill line from the code, the child process will terminate properly.