Recently, while reading a book about Linux programming, I came across this passage:
The status argument given to _exit() defines the termination status of the process, which is available to the parent of this process when it calls wait(). Although defined as an int, only the bottom 8 bits of status are actually made available to the parent. Only 0 ~ 127 is recommended, because 128 ~ 255 can be confusing in the shell: when a command is killed by a signal, the shell reports 128 plus the signal number as its status. Also note that -1 becomes 255, since only the low 8 bits of its two's complement representation are kept.
The above is about the exit status of a child process.
My question is:
Why does the parent process get only 8 bits of the child process's exit status?
What about the return values of normal functions? Is it reasonable or beneficial to use only 0 ~ 127? I sometimes use -1 as a return value to indicate an error; should I stop doing that in the future?
Update - status returned by wait() / waitpid():
I read more chapters of the book (TLPI) and found that there is more to the return status and wait()/waitpid() that is worth mentioning; I should have read further before asking. Anyhow, I have added an answer of my own describing it, in case it helps someone in the future.
Why does the parent process get only 8 bits of the child process's exit status?
Because POSIX says so. And POSIX says so because that's how the original Unix worked, and how many operating systems derived from it or modeled after it continue to work.
What about the return values of normal functions?
They are unrelated. Return whatever is reasonable. -1 is as good as any other value, and is in fact a standard way to indicate an error in a great many standard C and POSIX APIs.
The answer from #n.m. is good.
But later on I read more chapters of the book (TLPI) and found that there is more to the return status and wait()/waitpid() that is worth mentioning, and that might be another important reason, or even the root reason, why a child process can't use all the bits of an int when it exits.
Wait status
Basically:
a child process should exit with a value that fits in 1 byte, which becomes part of the status argument filled in by wait() / waitpid(),
and only the 2 least significant bytes of the status are used.
Byte usage of status:
event                  byte 1                   byte 0
============================================================
normal termination     exit status (0 ~ 255)    0
killed by signal       0                        termination signal (!= 0)
stopped by signal      stop signal              0x7F
continued by signal    0xFFFF (both bytes)
Dissecting the returned status:
The header 'sys/wait.h' defines a set of macros that help dissect a wait status.
Macros:
* WIFEXITED(status)
  returns true if the child process exited normally,
* WIFSIGNALED(status)
  returns true if the child process was killed by a signal,
* WTERMSIG(status)
  returns the number of the signal that terminated the process,
* WCOREDUMP(status)
  returns true if the child process produced a core dump file,
  tip:
  this macro is not in SUSv3 and might be absent on some systems,
  so it is better to check whether it exists first, via:
  #ifdef WCOREDUMP
  // ...
  #endif
* WIFSTOPPED(status)
  returns true if the child process was stopped by a signal,
* WSTOPSIG(status)
  returns the number of the signal that stopped the process,
* WIFCONTINUED(status)
  returns true if the child process was resumed by the signal SIGCONT,
  tip:
  this macro is part of SUSv3, but some old Linux versions and some Unix implementations might not provide it,
  so it is better to check whether it exists first, via:
  #ifdef WIFCONTINUED
  // ...
  #endif
Sample code
wait_status_test.c
// dissect status returned by wait()/waitpid()
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/wait.h>

#define SLEEP_SEC 10 // how long the child process sleeps, in seconds

int wait_status_test() {
    pid_t cpid;

    // create child process
    switch(cpid=fork()) {
    case -1: // failed
        perror("error while fork()");
        exit(EXIT_FAILURE);
    case 0: // success, child process goes here
        sleep(SLEEP_SEC);
        printf("child [%d], going to exit\n",(int)getpid());
        fflush(stdout); // _exit() does not flush stdio buffers
        _exit(EXIT_SUCCESS);
        break;
    default: // success, parent process goes here
        printf("parent [%d], child created [%d]\n", (int)getpid(), (int)cpid);
        break;
    }

    // wait for the child to change state
    int status;
    int wait_flag = WUNTRACED | WCONTINUED;
    while(1) {
        if((cpid = waitpid(-1, &status, wait_flag)) == -1) {
            if(errno == ECHILD) {
                printf("no more child\n");
                exit(EXIT_SUCCESS);
            } else {
                perror("error while wait()");
                exit(EXIT_FAILURE);
            }
        }

        // dissect status
        printf("parent [%d], child [%d] ", (int)getpid(), (int)cpid);
        if(WIFEXITED(status)) { // exited normally
            printf("exit normally with [%d]\n", WEXITSTATUS(status));
        } else if(WIFSIGNALED(status)) { // killed by signal
            char *dumpinfo = "unknown";
#ifdef WCOREDUMP
            dumpinfo = WCOREDUMP(status)?"true":"false";
#endif
            printf("killed by signal [%d], has dump [%s]\n", WTERMSIG(status), dumpinfo);
        } else if(WIFSTOPPED(status)) { // stopped by signal
            printf("stopped by signal [%d]\n", WSTOPSIG(status));
#ifdef WIFCONTINUED
        } else if(WIFCONTINUED(status)) { // continued by signal
            printf("continued by signal SIGCONT\n");
#endif
        } else { // this should never happen
            printf("unknown event\n");
        }
    }
    return 0;
}

int main(int argc, char *argv[]) {
    wait_status_test();
    return 0;
}
Compile:
gcc -Wall wait_status_test.c
Execute:
Run ./a.out and wait for it to terminate normally; the child process ID is printed after fork().
Run ./a.out, then kill -9 <child_process_id> before the child finishes sleeping.
Run ./a.out, then kill -STOP <child_process_id> before the child finishes sleeping, then kill -CONT <child_process_id> to resume it.
Related
I came across some C code where we check the return value of wait and, if it's not an error, there's yet another check of WIFEXITED and WEXITSTATUS. Why isn't this redundant? As far as I understand, wait returns -1 if an error occurred, while WIFEXITED returns a non-zero value if the child terminated normally. So if there wasn't any error in the line if ( wait(&status) < 0 ), why would anything go wrong during the WIFEXITED check?
This is the code:
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>

#define CHILDREN_NUM 5

int main () {
    int i, status, pid, p;

    for(i = 0; (( pid = fork() ) < 0) && i < CHILDREN_NUM;i++)
        sleep(5);
    if ( pid == 0 )
    {
        printf(" Child %d : successfully created!\n",i);
        exit( 0 ); /* Son normally exits here! */
    }
    p = CHILDREN_NUM;
    /* The father waits for agents to return successfully */
    while ( p >= 1 )
    {
        if ( wait(&status) < 0 ) {
            perror("Error");
            exit(1);
        }
        if ( ! (WIFEXITED(status) && (WEXITSTATUS(status) == 0)) ) /* kill all running agents */
        {
            fprintf( stderr,"Child failed. Killing all running children.\n");
            //some code to kill children here
            exit(1);
        }
        p--;
    }
    return(0);
}
wait returning >= 0 tells you a child process has terminated (and that calling wait didn't fail), but it does not tell you whether that process terminated successfully or not (or if it was signalled).
But, here, looking at your code, it's fairly obvious the program does care about whether the child process that terminated did so successfully or not:
fprintf( stderr,"Child failed. Killing all running children.\n");
So, the program needs to do further tests on the status value (an int) that was filled in by wait:
WIFEXITED(status): did the process exit normally? (as opposed to being signalled).
WEXITSTATUS(status) == 0: did the process exit with exit code 0 (aka "success"). For more information, see: Meaning of exit status 1 returned by linux command.
wait(&status) waits on the termination of a child process. The termination may be due to a voluntary exit or due to the receipt of an unhandled signal whose default disposition is to terminate the process.
WIFEXITED(status) and WIFSIGNALED(status) allow you to distinguish the two* cases, and you can then use either WEXITSTATUS or WTERMSIG to retrieve the exit status (if WIFEXITED(status) is true) or the termination signal (if WIFSIGNALED(status) is true).
*With waitpid and special flags (WUNTRACED, WCONTINUED), you can also wait on child process stops and resumptions, which you can detect with WIFSTOPPED or WIFCONTINUED (available on Linux since 2.6.10) respectively. See waitpid(2) for more information.
I'm running a Java program as a daemon on Linux using Apache commons-daemon's jsvc.
The daemon "randomly" crashes with only message:
jsvc.exec error: Service did not exit cleanly
This is the relevant part of the code in jsvc (in jsvc-unix.c line 1142):
while (waitpid(pid, &status, 0) != pid) {
    /* Waith for process */
}
/* The child must have exited cleanly */
if (WIFEXITED(status)) {
    status = WEXITSTATUS(status);
    // Clean exit code...
}
else {
    if (WIFSIGNALED(status)) {
        log_error("Service killed by signal %d", WTERMSIG(status));
        /* prevent looping */
        if (laststart + 60 > time(NULL)) {
            log_debug("Waiting 60 s to prevent looping");
            sleep(60);
        }
        continue;
    }
    log_error("Service did not exit cleanly", status);
    return 1;
}
In which case can WIFEXITED and WIFSIGNALED both be false ?
Is it guaranteed that the process was not killed in this case (by a process or Linux OOM killer) ?
WIFSTOPPED exists too, but it can only be true if the parent is ptracing the child process (or passes different flags, such as WUNTRACED, to waitpid).
I think your best bet is to print the status and then look at the bits in sys/wait.h. It's quite hard to get right, though: a lot of information is packed into that int and it's hard to figure out. It looks like the code you pasted already tries to do that, but forgot the %d in the format string.
I have this code:
int status;
t = wait(&status);
When the child process works, the value of status is 0, well.
But why does it return 256 when it doesn't work? And why changing the value of the argument given to exit in the child process when there is an error doesn't change anything (exit(2) instead of exit(1) for example)?
Edit : I'm on linux, and I compiled with GCC.
Given code like this...
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t pid;
    int res;

    pid = fork();
    if (pid == 0) {
        printf("child\n");
        exit(1);
    }
    pid = wait(&res);
    printf("raw res=%d\n", res);
    return 0;
}
...the value of res will be 256. This is because the status value stored by wait encodes both the exit status of the process and the reason the process exited. In general, you should not attempt to interpret a non-zero status value directly; you should use the WIF... macros. For example, to see if a process exited normally:
WIFEXITED(status)
True if the process terminated normally by a call to _exit(2) or
exit(3).
And then to get the exit status:
WEXITSTATUS(status)
If WIFEXITED(status) is true, evaluates to the low-order 8 bits
of the argument passed to _exit(2) or exit(3) by the child.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t pid;
    int res;

    pid = fork();
    if (pid == 0) {
        printf("child\n");
        exit(1);
    }
    pid = wait(&res);
    printf("raw res=%d\n", res);
    if (WIFEXITED(res))
        printf("exit status = %d\n", WEXITSTATUS(res));
    return 0;
}
You can read more details in the wait(2) man page.
The status code contains various information about how the child process exited. Macros are provided to get information from the status code.
From wait(2) on linux:
If status is not NULL, wait() and waitpid() store status information in the int to which it points. This integer can be inspected with the following macros (which take the integer itself as an argument, not a pointer to it, as is done in wait() and waitpid()!):
WIFEXITED(status)
returns true if the child terminated normally, that is, by calling exit(3) or _exit(2), or by returning from main().
WEXITSTATUS(status)
returns the exit status of the child. This consists of the least significant 8 bits of the status argument that the child specified in a call to exit(3) or _exit(2) or as the argument for a return statement in main(). This macro should be employed only if WIFEXITED returned true.
WIFSIGNALED(status)
returns true if the child process was terminated by a signal.
WTERMSIG(status)
returns the number of the signal that caused the child process to terminate. This macro should be employed only if WIFSIGNALED returned true.
WCOREDUMP(status)
returns true if the child produced a core dump. This macro should be employed only if WIFSIGNALED returned true. This macro is not specified in POSIX.1-2001 and is not available on some UNIX implementations (e.g., AIX, SunOS). Only use this enclosed in #ifdef WCOREDUMP ... #endif.
WIFSTOPPED(status)
returns true if the child process was stopped by delivery of a signal; this is possible only if the call was done using WUNTRACED or when the child is being traced (see ptrace(2)).
WSTOPSIG(status)
returns the number of the signal which caused the child to stop. This macro should be employed only if WIFSTOPPED returned true.
WIFCONTINUED(status)
(since Linux 2.6.10) returns true if the child process was resumed by delivery of SIGCONT.
SUGGESTION: try one of the following "Process Completion Status" macros:
http://www.gnu.org/software/libc/manual/html_node/Process-Completion-Status.html
EXAMPLE:
int status = 0;
..
int retval = wait (&status);
if (WIFEXITED(status))
    printf("OK: Child exited with exit status %d.\n", WEXITSTATUS(status));
else
    printf("ERROR: Child has not terminated correctly.\n");
void main ( )
{   int x;
    signal (SIGUSR1, f);
    x= fork ( );
    if (x == -1) exit (1);
    if (x != 0)
    {   kill (x, SIGUSR1) ;
        sleep (2);
        exit (0);
    }
}

void f ( )
{
    printf ("signal received");
    exit (0);
}
I think that the program above asks the system to launch the f function (which displays "signal received") when the SIGUSR1 signal is received by the parent process, but I'm not sure about that. Please feel free to correct me or to give more details. Thanks for the help!
There are some mistakes in your code:
Avoid calling the printf() function in a signal handler. The signal(7) manual page provides a list of async-signal-safe functions that can safely be called inside signal handlers. Read:
Async-signal-safe functions
A signal handler function must be very careful, since processing
elsewhere may be interrupted at some arbitrary point in the execution
of the program. POSIX has the concept of "safe function". If a
signal interrupts the execution of an unsafe function, and handler
calls an unsafe function, then the behavior of the program is
undefined.
Use int as the return type of main(); read "What should main() return in C?"
x should be pid_t (process identifier).
Now let's suppose your program compiles and runs (and is not interrupted by any other signal while the handler is executing):
I am just indenting your code and moving the definition of f() before main because the function declaration is missing; I have also added some comments that you should read:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>

void f( )
{
    printf ("signal received\n");
    exit (0); // if the child receives the signal, it exits from here
}             // ******* this is what mostly happens

int main ( )
{   int x;
    signal (SIGUSR1, f);  // First: handler registered
    x = fork ( );         // Second: child created; it now executes in
    if (x == -1)          // parallel with the parent
        exit (1);
    if (x != 0) {         // parent
        kill(x, SIGUSR1); // Third: sends the signal to the child
        sleep(2);         // Fourth: parent sleeps
    }
    return 0; // the parent always exits here
              // **** the child can also exit from here
              // if the signal sent by the parent is delayed
}
In the main() function, you register f() as the handler for the SIGUSR1 signal and then call fork() to create a new process. At run time, once fork() returns, the child process starts executing in parallel with the parent process.
From your code, I think you understand that the child process is a copy of the parent process, except that the values of variables can differ from the point where fork() returns, and hence x is different in the child and the parent. We can use the return value of fork() to tell whether the program is running in the parent process or in the child. But note that it is not the parent but actually the child process that receives the SIGUSR1 signal. fork() returns 0 in the child (this is not the child's own process ID, which getpid() reports) and the PID of the newly created child in the parent. So when you check x = fork(), x is 0 in the child and x != 0 in the parent. Hence the signal is sent from the parent process to the child process.
Your comments:
I think that the program above asks the system to launch the f( ) function ( which displays "signal received") when the SIGUSR1 signal is received by the parent process.
I have the impression that you don't consider that both processes execute concurrently: it can happen that, soon after fork() creates the child process, the child starts executing and terminates immediately, before the parent process can send it the signal (or before the child can receive it). In that case, function f() never gets a chance to execute and the printf in the signal handler never prints.
But the probability of what I have just described is very low, because fork() takes time to create a new process. Even if you execute the code again and again, most of the time the signal sent from the parent process will run the signal handler.
What is correct way of writing this code?
Code is x.c: The correct way is to set a flag indicating that the signal handler executed, and then call printf outside the signal handler based on the flag's value, as I have described in my answer: How to avoid using printf in a signal handler? The reason behind this is explained by Jonathan Leffler in his answer.
#define _POSIX_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>
#include <signal.h>

volatile sig_atomic_t flag = 0; // initially flag == 0

void f(int signo){
    (void) signo; // unused
    flag = 1;     // flag is one if the handler executes
}

int main(void){
    pid_t x;
    (void) signal (SIGUSR1, f);
    x = fork();
    if(x == -1)
        exit(EXIT_FAILURE);
    if (x != 0){ // parent
        kill(x, SIGUSR1);
        sleep(1);
    }
    if(flag) // print only if the signal was caught and flag == 1
        printf("%s signal received \n", x == 0 ? "Child:" : "Parent:");
    return EXIT_SUCCESS;
}
Now compile it and execute:
#:~$ gcc -Wall -pedantic -std=c99 x.c -o x
#:~$ ./x
Child: signal received
#:~$
Notice that the child process prints because the parent sends the signal to the child (the parent process doesn't print, as no signal is caught in the parent). So the behavior of the code above is still similar to what you were getting with your code. Below I have added one more example, in which I try to demonstrate that concurrent execution of processes gives different results on different runs (read the comments).
// include header files...
volatile sig_atomic_t flag = 0;

void f(int signo){
    (void) signo; // unused
    flag = 1;
}

int main(void){
    pid_t x;
    (void) signal (SIGUSR1, f);
    (void) signal (SIGUSR2, f); // added one more signal
    x= fork ( );
    if(x == -1)
        exit(EXIT_FAILURE);
    if (x != 0){ // parent
        kill(x, SIGUSR1);
        while(!flag); // spin while flag == 0
    }
    if (x == 0){ // child
        kill(getppid(), SIGUSR2); // send signal to parent
        while(!flag); // spin while flag == 0
    } // ^^^^ the loop terminates just after the signal handler sets `flag`
    if(flag)
        printf("%s signal received \n", x == 0 ? "Child:" : "Parent:");
    return EXIT_SUCCESS;
}
In the code above, handlers for two signals are registered in both the parent and the child process. The parent process doesn't sleep but spins in a while loop until a signal sets flag. Similarly, the child process has a loop that ends once flag becomes 1 in the signal handler. Now compile this code and run it repeatedly. I tried it many times and frequently got the following output on my system.
#:~$ gcc -Wall -pedantic -std=c99 x.c -o x
#:~$ ./x
Child: signal received
Parent: signal received
#:~$ ./x
Child: signal received
Parent: signal received
#:~$ ./x
Child: signal received
Parent: signal received
#:~$ ./x
Parent: signal received // <------
#:~$ Child: signal received
./x
Child: signal received
Parent: signal received
#:~$ ./x
Parent: signal received // <------
#:~$ Child: signal received
#:~$
Notice the output. One case is: by the time the child process is created, the parent has sent the signal and entered its while loop; when the child gets a chance to execute (this depends on CPU scheduling), it sends a signal back to the parent, and before the parent gets a chance to run, the child receives its signal and prints its message. But it also sometimes happens that the parent receives its signal and prints its message before the child's printf prints (these are the cases I marked with an arrow).
In the last example I am trying to show that the child process executes in parallel with the parent process, and that the output can differ if you don't apply a concurrency control mechanism.
Some good resources for learning about signals: (1) The GNU C Library: Signal Handling
(2) CERT C Coding Standard 11. Signals (SIG).
One problem is that the child process doesn't do anything, but will return immediately from the main function, maybe before the parent process can send the signal.
You might want to call e.g. pause in the child.
For the sake of the exercise, here is a corrected version of the original code which will compile and run.
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

void f ( );

int main ( )
{   int x;
    signal (SIGUSR1, f);
    x= fork ( );
    if (x == -1) exit (1);
    if (x != 0)
    {   kill (x, SIGUSR1) ;
        sleep (2);
        exit (0);
    }
}

void f ( )
{
    printf ("signal received\n");
    exit (0);
}
This does exactly what the original question suggested the program should do. Try it out and see what happens if you don't believe me.
BTW: I'm not very experienced at C. There are a number of comments asserting that using printf() in the child process is unsafe. Why?? The child process is a duplicate of the parent including virtual address space. So why is printf() unsafe?
I'm playing with waitpid() and signal() and I'm looking for reliable test cases that make WIFSIGNALED(status), WIFSTOPPED(status), and WIFCONTINUED(status) each return true, but I can't find any...
Care to tell me how I can make sure those return true so I can debug my code?
Also, a few hints about which signals I should catch with signal() to test those macros would be helpful...
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NELEMS(x) (sizeof (x) / sizeof (x)[0])

static void testsignaled(void) {
    kill(getpid(), SIGINT);
}

static void teststopped(void) {
    kill(getpid(), SIGSTOP);
}

static void testcontinued(void) {
    kill(getpid(), SIGSTOP);
    /* Busy-work to keep us from exiting before the parent waits.
     * This is a race.
     */
    alarm(1);
    while(1) {}
}

int main(void) {
    void (*test[])(void) = {testsignaled, teststopped, testcontinued};
    pid_t pid[NELEMS(test)];
    int i, status;

    for(i = 0; i < NELEMS(test); ++i) {
        pid[i] = fork();
        if(0 == pid[i]) {
            test[i]();
            return 0;
        }
    }
    /* Pause to let the child processes do their thing.
     * This is a race.
     */
    sleep(1);
    /* Observe the stoppage of the third process and continue it. */
    wait4(pid[2], &status, WUNTRACED, 0);
    kill(pid[2], SIGCONT);
    /* Wait for the child processes. */
    for(i = 0; i < NELEMS(test); ++i) {
        wait4(pid[i], &status, WCONTINUED | WUNTRACED, 0);
        printf("%d%s%s%s\n", i,
               WIFCONTINUED(status) ? " CONTINUED" : "",
               WIFSIGNALED(status) ? " SIGNALED" : "",
               WIFSTOPPED(status) ? " STOPPED" : "");
    }
    return 0;
}
Handling WIFSIGNALED is easy. The child process can kill itself with the kill() system call. You can also check for core dumps: some signals create them (SIGQUIT, IIRC); some signals do not (SIGINT).
Handling WIFSTOPPED may be harder. The simple step to try is for the child to send itself SIGSTOP, again with the kill() system call. Actually, I think that should work. Note that you may want to check SIGTTIN, SIGTTOU, and SIGTSTP as well; I believe they count for WIFSTOPPED. (There's also a chance that SIGSTOP only works sanely when sent by a debugger to a process it is running via the non-POSIX system call ptrace().)
Handling WIFCONTINUED is something that I think the parent has to do; after you detect that a process has been stopped, your calling code should make it continue by sending it a SIGCONT signal (kill() again). The child can't deliver this itself; it has been stopped. Again, I'm not sure whether there are extra wrinkles to worry about; there probably are.
A framework something like the below will allow you check the results of the wait() and waitpid() calls.
pid_t pid = fork();
if (pid == 0) {
    /* child */
    sleep(200);
}
else {
    /* parent */
    kill(pid, SIGSTOP);
    /* do wait(), waitpid() stuff */
}
You do not actually have to catch the signals that are sent (using signal() or a related function). signal() installs a handler that overrides the default behavior for a specific signal, so if you want to check for a signal terminating your process, pick one whose default behavior is to terminate; "man -s7 signal" will give you details on each signal's default behavior.
For the macros you have mentioned, use SIGSTOP for WIFSTOPPED(status), SIGCONT for WIFCONTINUED(status), and SIGINT for WIFSIGNALED(status).
If you want more flexibility for testing, you could use kill (see "man kill") to send signals to your process. kill -l will list all the signals that can be sent.
In your tests, can you fork() and send a specific signal to your child processes? In this scenario, are your child processes the test cases?
EDIT
My answer is about coding a test in C: you fork, get the PID of your child process (the process with the signal handlers installed), and then you can send a signal to it using kill(2).
This way you can test the exit status.