I have an assignment to write a program that forks off a child. That child will then fork off its own child (the grandchild of the original parent). The grandchild should exec() to do a ps -ef (for example). The child should wait for its child (the grandchild) to finish successfully. If it didn't finish successfully (I assume success means the status return code is 0), it should spawn off another grandchild until one succeeds. Once this is complete, it should send the SIGINT signal to its parent.
This is what I was doing: the second time I fork (for the grandchild), I exec as specified. Here, I set up a signal handler too. In the child, I wait (wait(&status)) and loop while (status != 0). That was the idea.
But I still couldn't get the program to work. I guess I have a problem with the signal handling (?). Can you give me a hint?
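System calls that don't succeed return -1 and set errno; they never report failure with a positive value. Note that this is different from a process's exit status, where 0 means success and any non-zero value means failure.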
The exec() functions only return if an error has occurred. The return value is -1, and errno is set to indicate the error.
Basically, what you want to do is wait on your grandchild in your child. If your grandchild calls exec and it fails, exec returns and execution continues in that program (the grandchild). If exec succeeds, you won't get any indication in the grandchild; the child process simply resumes once its wait returns. In your grandchild process, if exec fails, you want to report that up to its parent (the child), for example by sending a signal or by exiting with a non-zero status. When that parent resumes, it can check whether a failure was reported. If not, you know the grandchild completed successfully.
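The retry loop described above can be sketched as follows. This is only a minimal sketch, not the assignment's required solution: the names run_once and run_until_success and the max_tries limit are my own assumptions (the question loops until success without a limit), and a failed exec is reported through exit status 127 rather than through a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a grandchild that execs argv[0]. Returns its exit status,
 * or -1 if fork/waitpid failed or the grandchild was killed by a signal.
 * (run_once is a hypothetical helper name.) */
int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* reached only if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The child's job: spawn grandchildren until one exits with status 0,
 * then send SIGINT to the original parent. max_tries is an assumed
 * safety limit so a permanently failing command cannot loop forever. */
int run_until_success(char *const argv[], int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (run_once(argv) == 0) {
            kill(getppid(), SIGINT);
            return 0;
        }
    }
    return -1;
}
```

The original parent would fork the child, have the child call run_until_success with something like {"ps", "-ef", NULL}, and install its own SIGINT handler before forking so the signal does not simply terminate it.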
Related
I am learning about forks, execl and parent and child processes in my systems programming class. One thing that is confusing me is waitpid() and getpid(). Could someone confirm or correct my understanding of these two functions?
getpid() will return the process ID of whatever process calls it. If the parent calls it, it returns the pid of the parent. Likewise for the child. (It actually returns a value of type pid_t, according to the manpages).
waitpid() seems more complex. I know that if I use it in the parent process, without any flags to prevent it from blocking (using WNOHANG), it will halt the parent process until the child process terminates. I'm a little unsure as to how waitpid() manages all this, however. waitpid() also returns pid_t. What is the value of the pid_t waitpid() returns? How does this change depending on whether or not a parent or child calls it, and whether or not a child process is still running, or has terminated?
Your understanding of getpid is correct: it returns the PID of the calling process.
waitpid is used (as you said) to block the execution of a process (unless
WNOHANG is passed) and resume execution when one (or more) of its children
ends. waitpid returns the pid of the child whose state has changed, or -1 on
failure. It can also return 0 if WNOHANG was specified but no child has
changed state yet. See:
man waitpid
RETURN VALUE
waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG
was specified and one or more child(ren) specified by pid exist, but have not yet changed state,
then 0 is returned. On error, -1 is returned.
Depending on the arguments passed to waitpid, it will behave differently. Here
I'll quote the man page again:
man waitpid
pid_t waitpid(pid_t pid, int *wstatus, int options);
...
The waitpid() system call suspends execution of the calling process until a child specified by pid argument
has changed state. By default, waitpid() waits only for terminated children, but this behavior is modifiable
via the options argument, as described below:
The value of pid can be:
< -1: meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1: meaning wait for any child process.
0: meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0: meaning wait for the child whose process ID is equal to the value of pid.
The value of options is an OR of zero or more of the following constants:
WNOHANG: return immediately if no child has exited.
WUNTRACED: also return if a child has stopped (but not traced via ptrace(2)).
Status for traced children which have stopped is provided even if this option is not specified.
WCONTINUED (since Linux 2.6.10): also return if a stopped child has been resumed by delivery of SIGCONT.
I'm a little unsure as to how waitpid() manages all this
waitpid is a syscall and the OS handles this.
How does this change depending on whether or not a parent or child calls it, and whether or not a child process is still running, or has terminated?
wait should only be called by a process that has executed fork(). So the parent
process should call wait()/waitpid(). If the child process hasn't called
fork(), then it doesn't need to call either one of these functions. If, however,
the child process has called fork(), then it should also call
wait()/waitpid().
The behaviour of these functions is very well explained in the man page; I quoted the important parts of it. You should read the whole man page
to get a better understanding of it.
waitpid "shall only return the status of a child process" (from the POSIX spec). So the pid_t waitpid returns belongs to one of the current or former children of the process calling waitpid. For example, if a child has recently terminated, it returns that child's PID.
waitpid is only useful when called from a parent process. If called from a process that does not have any children, it returns -1 and sets errno to ECHILD.
waitpid can check the status of children that have terminated, or that have recently stopped or continued (e.g., ^Z from a shell). The various pid/option argument combinations in the spec tell you the various types of information you can get back. For example, the WCONTINUED option requests the status of recently-continued children in addition to recently-terminated children.
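To make the three return values concrete, here is a small sketch that observes each of them in turn. The function name waitpid_return_values is made up for illustration, and the pipe is only there to keep the child alive until the parent has made its first call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if all three observations match the man page,
 * otherwise the number of the step that did not (or -1 on setup failure). */
int waitpid_return_values(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        char c;
        close(fds[1]);
        /* Blocks until the parent closes its write end (read returns 0). */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    /* Step 1: child is still running, so WNOHANG gives 0. */
    if (waitpid(pid, NULL, WNOHANG) != 0)
        return 1;

    /* Step 2: let the child exit; a blocking waitpid returns its pid. */
    close(fds[1]);
    if (waitpid(pid, NULL, 0) != pid)
        return 2;

    /* Step 3: the child was already reaped, so we get -1 with ECHILD. */
    errno = 0;
    if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
        return 3;

    return 0;
}
```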
I was reading about the wait() function in a Unix systems book. The book contains a program which has wait(NULL) in it. I don't understand what that means. In another program there was
while(wait(NULL)>0)
...which also made me scratch my head.
Can anybody explain what the function above is doing?
man wait(2)
All of these system calls are used to wait for state changes in
a child of the calling process, and obtain information about the child
whose state has changed. A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal
So wait() allows a process to wait until one of its child processes changes its state, for example by exiting. If waitpid() is called with a process ID, it waits for that specific child process to change its state; if -1 is passed as the pid, it is equivalent to calling wait() and waits for any child process to change its state.
The wait() function returns the child's pid on success, so when it is called in a loop like this:
while(wait(NULL)>0)
It means wait until all child processes exit (or change state) and no more child processes are unwaited-for (or until an error occurs)
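As a rough illustration of that loop, the sketch below forks a few children that exit immediately and then reaps them all. The function name spawn_and_reap is hypothetical, and the sketch assumes the calling process has no other children and that no signal handler interrupts wait() (an EINTR would also end the loop).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit right away, then reap every one of them.
 * Returns the number of children reaped, or -1 if a fork failed. */
int spawn_and_reap(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);            /* child: nothing to do */
    }

    int reaped = 0;
    /* Each successful wait() returns one child's pid (> 0).
     * Once no children remain, wait() returns -1 and the loop ends. */
    while (wait(NULL) > 0)
        reaped++;
    return reaped;
}
```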
a quick google suggests, wait(NULL) waits for any of the child processes to complete
wait(NULL) which should be equivalent to waitpid(-1, NULL, 0)
wait(NULL) blocks until any one child process completes; to wait for all of them, call it in a loop until it returns -1.
I have understood that:
1) waitpid is used to wait for a child's death and then collect the SIGCHLD and the exit status of the child etc.
2) When we have a signal handler for SIGCHLD, we do some more things related to cleanup of the child or other stuff (up to the programmer) and then do a waitpid so that the child will not go zombie, and then return.
Now, do we need to have both 1 and 2 in our programs when we do a fork/exec and the child returns ?
If we have both, the SIGCHLD is obtained first, so the signal handler is called first and thus its waitpid is called successfully and not the waitpid in the parent process code as follows:
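#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void my_signal_handler_for_sigchld(int sig)
{
    /* do something (printf is not async-signal-safe; used here only to show the value) */
    pid_t tmp = waitpid(-1, NULL, 0);
    printf("handler: %d\n", (int)tmp);   /* the correct value of the child pid */
}

int main(void)
{
    struct sigaction sa = { .sa_handler = my_signal_handler_for_sigchld };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);
    pid_t child_pid = fork();
    if (child_pid == 0)
        return 0;                         /* child: do something, return */
    /* parent: the value returned from this waitpid is -1 */
    printf("parent: %d\n", (int)waitpid(child_pid, NULL, 0));
    return 0;
}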
Appreciate if someone helps me understand this.
You really don't need to handle SIGCHLD if your intent is to run a child process, do some stuff, then wait for it to finish. In that case, you just call waitpid when you're ready to synchronize. The only thing SIGCHLD is useful for is asynchronous notification of child termination, for example if you've got an interactive (or long-running daemon) application that's spawning various children and needs to know when they finish. However, SIGCHLD is really bad/ugly for this purpose too, since if you're using library code that creates child processes, you might catch the events for the library's children terminating and interfere with its handling of them. Signal handlers are inherently process-global and deal with global state, which is usually A Bad Thing(tm).
Here are two better approaches for when you have child processes that will be terminating asynchronously:
Approach 1 (select/poll event-based): Make sure you have a pipe to/from each child process you create. It can be either their stdin/stdout/stderr or just an extra dummy fd. When the child process terminates, its end of the pipe will be closed, and your main event loop will detect the activity on that file descriptor. From the fact that it closed, you recognize that the child process died, and call waitpid to reap the zombie.
Approach 2 (thread based): For each child process you create, also create a thread that will immediately call waitpid on the child process's pid. When waitpid returns successfully, use your favorite thread synchronization primitives to let the rest of the program know that the child terminated, or simply take care of everything you need to do in this waiter thread before it terminates.
Both of these approaches are modular and library-friendly (they avoid interfering with any other parts of your code or library code which might be making use of child processes).
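Approach 1 could look roughly like this with poll(). It is a sketch under assumptions: spawn_with_pipe and wait_via_pipe are made-up names, the child just exits with a given code instead of exec'ing a real program, and a real event loop would poll many descriptors with a timeout rather than block on one.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that holds the write end of a pipe and exits with exit_code.
 * Stores the read end in *rfd and returns the child's pid, or -1 on error. */
pid_t spawn_with_pipe(int *rfd, int exit_code)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        _exit(exit_code);        /* the write end is closed on exit */
    }
    close(fds[1]);               /* parent must not keep a write end open */
    *rfd = fds[0];
    return pid;
}

/* Block in poll() until the pipe reports EOF/hangup, then reap the child.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int wait_via_pipe(pid_t pid, int rfd)
{
    struct pollfd pfd = { .fd = rfd, .events = POLLIN };
    int status;

    if (poll(&pfd, 1, -1) == -1)
        return -1;
    close(rfd);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the parent closes its copy of the write end right after fork; otherwise the read end never sees the hangup when the child dies.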
You need to call one of the waiting syscalls, such as waitpid or its friends (e.g. wait4); otherwise you could end up with zombie processes.
You could handle SIGCHLD to be notified that some child ended (or stopped, etc...) but you'll need to wait for it later.
Signal handlers are restricted to calling a small set of async-signal-safe functions (see signal(7) for more). Good advice is to just set a volatile sig_atomic_t flag inside the handler, and test it later at safer places.
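A minimal sketch of that flag pattern is shown below. The names got_sigchld, install_sigchld_handler and reap_if_flagged are my own; the idea is that the handler does nothing but set the flag, and the main loop calls reap_if_flagged() whenever it is convenient.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

/* The handler only records that SIGCHLD arrived. */
static void on_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;
}

/* Install the handler; call this before fork(). Returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;    /* restart interrupted system calls */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Call from the main loop. If the flag is set, reap every child that
 * has already exited (several exits can be merged into one signal).
 * Returns the number of children reaped. */
int reap_if_flagged(void)
{
    int n = 0;
    if (!got_sigchld)
        return 0;
    got_sigchld = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        n++;
    return n;
}
```

The WNOHANG loop matters because pending signals of the same type are not queued: two children exiting close together may produce only one handler invocation.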
Here's a breakdown of my code.
I have a program that forks a child (and registers the child's pid in a file) and then does its own thing. The child becomes whatever program the programmer has specified in argv. When the child is finished executing, it sends a signal (using SIGUSR1) back to the parent process so the parent knows to remove the child from the file. The parent should stop for a second, acknowledge the deleted entry by updating its table, and continue where it left off.
pid = fork();
switch(pid){
    case -1:{
        exit(1);
    }
    case 0 :{
        (*table[numP-1]).pid = getpid(); //Global that stores pids
        add(); //saves table into a text file
        freeT(table); //Frees table
        execv(argv[3], &argv[4]); //Executes new program with argv
        printf("finished execution\n");
        del(getpid()); //Erases pid from file
        refreshReq(); //Sends SIGUSR1 to parent
        return 0;
    }
    default:{
        ... //Does its own thing
    }
}
The problem is that after execv successfully starts and finishes (a printf statement before the return 0 lets me know), I do not see the rest of the commands in the switch statement being executed. I am wondering if execv has something like a ^C command in it which kills the child when it finishes, so that the rest of the commands never run. I looked into the man pages but did not find anything useful on the subject.
Thanks!
execv replaces the currently executing program with a different one. It doesn't restore the old program once that new program is done, hence it's documented "on success, execv does not return".
So, you should see your message "finished execution" if and only if execv fails.
execv replaces the current process image with a new one. In order to spawn a new process, you can use e.g. system(), popen(), or a combination of fork() and exec().
Other people have already explained what execv and similar functions do, and why the next line of code is never executed. The logical next question is, so how should the parent detect that the child is done?
In the simple cases where the parent should do absolutely nothing while the child is running, just use system instead of fork and exec.
Or if the parent will do something else before the child exits, these are the key points:
When the child exits, the parent will get SIGCHLD. The default action for SIGCHLD is to ignore it. If you want to catch that signal, install a handler before calling fork.
After a child has exited, the parent should call waitpid to clean up the child and find out what its exit status was.
The parent can also call wait or waitpid in a blocking mode to wait until a child exits.
The parent can also call waitpid in a non-blocking mode to find out whether the child has exited yet.
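For the case in the question, where the parent keeps doing its own thing, the non-blocking variant might be sketched like this. The helper names spawn and child_finished are hypothetical; the point is that the cleanup (removing the pid from the file) belongs in the parent, after child_finished() reports that the child is gone, not after execv in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv[0] in the child. Returns the child's pid to the
 * parent, or -1 if fork failed. */
pid_t spawn(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* reached only if execvp failed */
    }
    return pid;
}

/* Non-blocking check on one child.
 * Returns 1 and stores the exit status if the child has finished
 * (-1 is stored if it was killed by a signal),
 * 0 if it is still running, -1 on error. */
int child_finished(pid_t pid, int *exit_status)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0)
        return 0;
    if (r == -1)
        return -1;
    *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}
```

The parent would call child_finished() once per iteration of its own work loop and run its equivalent of del(pid) when the function returns 1.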
What did you expect to happen? This is what execv does. Please read the documentation which says:
The exec() family of functions replaces the current process image with a new process image.
Perhaps you were after system or something, to ask the environment to spawn a new process in addition to the current one. Or.. isn't that what you already achieved through fork? It's hard to see what you want to accomplish here.
In my simple custom shell I'm reading commands from the standard input and execute them with execvp(). Before this, I create a fork of the current process and I call the execvp() in that child process, right after that, I call exit(0).
Something like this:
pid = fork();
if(pid == -1) {
    perror("fork");
    exit(1);
}
if(pid == 0) {
    // CHILD PROCESS CODE GOES HERE...
    execvp(pArgs[0], pArgs);
    exit(0);
} else {
    // PARENT PROCESS CODE GOES HERE...
}
Now, the commands run with execvp() can return errors right? I want to handle that properly and right now, I'm always calling exit(0), which will mean the child process will always have an "OK" state.
How can I return the proper status from the execvp() call and put it in the exit() call? Should I just get the int value that execvp() returns and pass it as an exit() argument instead of 0. Is that enough and correct?
You need to use waitpid(2) or wait(2) in the parent code to wait for the child to exit and get its exit status.
The syntax is:
pid_t waitpid(pid_t pid, int *status, int options);
or
pid_t wait(int *status);
status receives the child's termination status. Look at the man pages to see how to parse it.
Note that you can't do this from the child process. Once you call execvp the child process dies (for all practical purposes) and is replaced by the exec'd process. The only way you can reach exit(0) there is if execvp itself fails, but then the failure isn't because the new program ended. It's because it never ran to begin with.
Edit: the child process doesn't really die. The PID and environment remain unchanged, but the entire code and data are replaced with the exec'd process. You can count on not returning to the original child process, unless exec fails.
From your question it's a little hard to figure out what you're asking. So I'll try to cover a couple of the related issues:
execvp() either does not return (on success), or it returns -1 with errno set. This means your child code only needs to handle error conditions. Since execvp() only ever returns -1, passing its result to exit() yields the same status (255) for every failure; exiting with a fixed non-zero code, or one chosen from errno, is more informative (shells conventionally use 127 when the command is not found). Your child code should never exit with 0, since success means that execvp worked and the new program supplies the exit status (0 or not).
The parent can obtain info about the child's exit status from waitpid(). There are several macros defined to pull info from the returned status parameter. Notable for your purpose are WIFEXITED to tell you if the child exited "normally", and WEXITSTATUS to get the child's status as passed to exit(). See the waitpid man page for other macros.
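Putting those pieces together for a simple shell, one possible sketch is the following. The function name run_command is made up, and the 126/127/128+signal values follow the usual shell convention, which is an assumption about what you want rather than anything execvp itself dictates.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv in a child process and return a shell-style status:
 *   the program's own exit code if it exited normally,
 *   127 if it could not be found, 126 if exec failed for another reason,
 *   128 + signal number if it was killed by a signal,
 *   -1 if fork or waitpid failed. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        int err = errno;         /* save errno before perror can change it */
        perror(argv[0]);
        _exit(err == ENOENT ? 127 : 126);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

The child uses _exit() rather than exit() after a failed exec so that stdio buffers inherited from the parent are not flushed a second time.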
Use wait() or waitpid() in the parent process. An example here: Return code when OS kills your process.
Also, when a child dies the SIGCHLD signal is sent to the parent process.