Developing a correct understanding of waitpid() and getpid() - c

I am learning about forks, execl and parent and child processes in my systems programming class. One thing that is confusing me is waitpid() and getpid(). Could someone confirm or correct my understanding of these two functions?
getpid() will return the process ID of whatever process calls it. If the parent calls it, it returns the pid of the parent. Likewise for the child. (It actually returns a value of type pid_t, according to the manpages).
waitpid() seems more complex. I know that if I use it in the parent process, without any flags to prevent it from blocking (using WNOHANG), it will halt the parent process until the child process terminates. I'm a little unsure as to how waitpid() manages all this, however. waitpid() also returns pid_t. What is the value of the pid_t waitpid() returns? How does this change depending on whether or not a parent or child calls it, and whether or not a child process is still running, or has terminated?

Your understanding of getpid is correct: it returns the PID of the calling process.
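To make the getpid() point concrete, here is a minimal sketch. The helper name child_pid_matches_fork is made up for this illustration. The child sends its own getpid() through a pipe, and the parent compares it with the value fork() returned.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the question).
 * Returns 1 if getpid() in the child equals the value fork() gave the
 * parent, 0 if it does not, -1 on error. */
int child_pid_matches_fork(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: getpid() here is the child's own PID. */
        close(fds[0]);
        pid_t me = getpid();
        ssize_t n = write(fds[1], &me, sizeof me);
        _exit(n == (ssize_t)sizeof me ? 0 : 1);
    }

    /* Parent: read what the child reported, then reap the child. */
    close(fds[1]);
    pid_t reported = 0;
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    if (waitpid(child, NULL, 0) != child)
        return -1;
    if (n != (ssize_t)sizeof reported)
        return -1;
    return reported == child;
}
```

Calling child_pid_matches_fork() from main() should give 1, since the PID fork() hands to the parent is the one getpid() reports inside the child.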
waitpid is used (as you said) to block the execution of a process (unless
WNOHANG is passed) and resume execution when one (or more) of its children
ends. waitpid returns the pid of the child whose state has changed, or -1 on
failure. It can also return 0 if WNOHANG was specified but the child has not
yet changed state. See:
man waitpid
RETURN VALUE
waitpid(): on success, returns the process ID of the child whose state has changed; if WNOHANG
was specified and one or more child(ren) specified by pid exist, but have not yet changed state,
then 0 is returned. On error, -1 is returned.
Depending on the arguments passed to waitpid, it will behave differently. Here
I'll quote the man page again:
man waitpid
pid_t waitpid(pid_t pid, int *wstatus, int options);
...
The waitpid() system call suspends execution of the calling process until a child specified by pid argument
has changed state. By default, waitpid() waits only for terminated children, but this behavior is modifiable
via the options argument, as described below:
The value of pid can be:
< -1: meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1: meaning wait for any child process.
0: meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0: meaning wait for the child whose process ID is equal to the value of pid.
The value of options is an OR of zero or more of the following constants:
WNOHANG: return immediately if no child has exited.
WUNTRACED also return if a child has stopped (but not traced via ptrace(2)).
Status for traced children which have stopped is provided even if this option is not specified.
WCONTINUED (since Linux 2.6.10) also return if a stopped child has been resumed by delivery of SIGCONT.
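The difference between a blocking and a WNOHANG call can be sketched like this. The helper name and the pipe used to keep the child alive are assumptions for the illustration, not anything from the man page.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
 * Returns 1 if waitpid(..., WNOHANG) gave 0 while the child was still
 * running and a later blocking waitpid() gave the child's PID,
 * 0 if not, -1 on error. */
int wnohang_returns_zero_while_running(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: block until the parent closes its write end
         * (read() then returns 0 for end of file). */
        char c;
        close(fds[1]);
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[0]);
    int status = 0;

    /* The child is still blocked, so WNOHANG returns 0 immediately. */
    pid_t first = waitpid(child, &status, WNOHANG);

    /* Let the child finish, then wait for it without WNOHANG. */
    close(fds[1]);
    pid_t second = waitpid(child, &status, 0);

    return first == 0 && second == child && WIFEXITED(status);
}
```

The first call shows the "0 is returned" case from the RETURN VALUE section; the second shows the normal case where the child's PID comes back.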
I'm a little unsure as to how waitpid() manages all this
waitpid is a syscall and the OS handles this.
How does this change depending on whether or not a parent or child calls it, and whether or not a child process is still running, or has terminated?
wait should only be called by a process that has executed fork(). So the parent
process should call wait()/waitpid(). If the child process hasn't called
fork(), then it doesn't need to call either one of these functions. If however
the child process has called fork(), then it also should call
wait()/waitpid().
The behaviour of these functions is very well explained in the man page, and I quoted the important parts of it. You should read the whole man page
to get a better understanding of it.
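As a rough sketch of "whoever calls fork() also calls waitpid()": below, the parent waits for its child, and the child, which forks too, waits for its own child. The name wait_chain and the "+ 1" on the exit code are invented here, just to make the flow visible.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
 * parent -> child -> grandchild. The grandchild exits with
 * grandchild_code, the child waits for it and exits with that code + 1,
 * and the parent waits for the child and returns the child's exit code.
 * Returns -1 on error. */
int wait_chain(int grandchild_code)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* The child called fork() itself, so it waits for its own child. */
        pid_t grandchild = fork();
        if (grandchild == -1)
            _exit(255);
        if (grandchild == 0)
            _exit(grandchild_code);  /* never forks, so never waits */

        int gstatus;
        if (waitpid(grandchild, &gstatus, 0) != grandchild || !WIFEXITED(gstatus))
            _exit(255);
        _exit(WEXITSTATUS(gstatus) + 1);
    }

    /* The parent only waits for its direct child. */
    int status;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, wait_chain(3) should return 4. WIFEXITED and WEXITSTATUS are the standard macros for decoding the status word filled in by waitpid().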

waitpid "shall only return the status of a child process" (from the POSIX spec). So the pid_t waitpid returns belongs to one of the current or former children of the process calling waitpid. For example, if a child has recently terminated, it returns that child's PID.
waitpid is only useful when called from a parent process. If called from a process that does not have any children, it returns -1 and sets errno to ECHILD.
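The no-children case can be checked with a small sketch like the following. The helper name is made up; the call is made inside a freshly forked child because that process is guaranteed to have no children of its own.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
 * Returns 1 if waitpid() in a process without children failed with
 * ECHILD, 0 if it behaved differently, -1 on error. */
int waitpid_without_children_gives_echild(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* A freshly forked process has no children of its own. */
        errno = 0;
        pid_t r = waitpid(-1, NULL, 0);
        _exit(r == -1 && errno == ECHILD ? 0 : 1);
    }

    /* The parent does have a child, so this waitpid() succeeds. */
    int status;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```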
waitpid can check the status of children that have terminated, or that have recently stopped or continued (e.g., ^Z from a shell). The various pid/option argument combinations in the spec tell you which kinds of status you can get back. For example, the WCONTINUED option requests the status of recently-continued children in addition to recently-terminated ones.

Related

Return type of vfork()

From the GNU manual:
The vfork() function has the same effect as fork(2), except that the behavior is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(),
What does it mean? Does it mean the return value of vfork() cannot be assigned to a non-pid_t type variable?
The manual is quite confusing on this. Actually, both processes (the child and the father) share the same address space, even the stack!
vfork() returns twice:
In the child process, returning 0
When the child has either finished or executed another program, the second return happens in the father process, with the child's process identifier. Meanwhile, the father process is suspended.
The return code of fork()/vfork() is typically stored in a variable (of type pid_t to follow the synopsis of the system calls):
pid_t pid = vfork();
As the address spaces are shared between the father and the child when we are running vfork(), the same variable is modified in both the father and the child! But it is set sequentially to 0 in the child process and after the latter either exits or executes a program, the variable is set a second time but with the child's pid in the father process.
NB: The manual says:
vfork() differs from fork(2) in that the calling thread is suspended
until the child terminates (either normally, by calling _exit(2), or
abnormally, after delivery of a fatal signal), or it makes a call to
execve(2).
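A minimal sketch of the usual vfork() pattern, assuming Linux with glibc (where strict -std= modes may need _DEFAULT_SOURCE defined for vfork() to be declared). The helper name and the exit code 42 are arbitrary choices for the illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
 * Returns the exit code of a vfork()ed child, or -1 on error. */
int vfork_child_exit_code(void)
{
    /* The only variable the child may touch is this pid_t. */
    pid_t pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: it borrows the parent's memory, so it does nothing
         * except call _exit() (or one of the exec functions). */
        _exit(42);
    }

    /* Parent: resumes only after the child has exited or exec'd.
     * pid now holds the child's PID (the "second return"). */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Here vfork_child_exit_code() should return 42. Note the child calls _exit(), not exit() or return, so it never runs the parent's cleanup handlers or unwinds the shared stack.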

How to check if all child processes ended?

I am trying to create an assignment where I want to check if all child processes created by students have exited. As I am not calling fork myself, I don't have access to the process IDs. Is there a way to check whether the current process has any children without knowing the IDs of the child processes created?
I checked many questions but every solution consists of the use of return value from the fork call. Any help is appreciated.
Thank you.
You can call
pid_t st = waitpid(-1, NULL, WNOHANG);
The first argument tells waitpid() to wait for any child process to exit, not for a specific pid.
The third argument is a flag, that makes waitpid() return immediately instead of blocking.
Now there are three possible outcomes:
return value is -1 and errno is ECHILD: there is no child process at all.
return value is >0: a child had already exited, but its exit status had not yet been collected (a so-called zombie process). Now repeat the call (call waitpid() again).
return value is 0: there are child processes that are still running.
This should cover all cases you ask for.

waitpid returns pid=0 and WIFEXITED=1 how to get pid?

Steps:
Fork and start process in a different program group
Stop process with SIGTSTP
Restart process with SIGCONT
Process ends
Problem:
The SIGCHLD handler has:
waitpid(-1, &status, WNOHANG | WUNTRACED);
upon return pid=0 and WIFEXITED=1
so, the process exited, but I can't get the pid?
I need the pid.
From the man page: "if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned". But it seems the status has changed to exited.
The status is meaningless if the pid returned was 0. Think about it. A return of 0 means you have one or more children that have yet to change state. What would the state of a child that has yet to change state be? If there were multiple children, which child is the status code referencing?
This is analogous to checking errno on a successful call. Anything from a previous call can be in errno but it has nothing to do with the most recent successful call because errno is usually not set on success.
The return value of waitpid is the PID of the child that was waited for.

What does wait() do on Unix?

I was reading about the wait() function in a Unix systems book. The book contains a program which has wait(NULL) in it. I don't understand what that means. In another program there was
while(wait(NULL)>0)
...which also made me scratch my head.
Can anybody explain what the function above is doing?
man wait(2)
All of these system calls are used to wait for state changes in
a child of the calling process, and obtain information about the child
whose state has changed. A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal
So wait() allows a process to wait until one of its child processes changes its state, for example by exiting. If waitpid() is called with a process id, it waits for that specific child process to change its state; if a pid is not specified, it is equivalent to calling wait() and it waits for any child process to change its state.
The wait() function returns the child's pid on success, so when it is called in a loop like this:
while(wait(NULL)>0)
It means wait until all child processes exit (or change state) and no more child processes are unwaited-for (or until an error occurs)
a quick google suggests, wait(NULL) waits for any of the child processes to complete
wait(NULL) which should be equivalent to waitpid(-1, NULL, 0)
wait(NULL) waits for one child process to complete; called in a loop, as above, it waits for all of them
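The loop can be tried out with a sketch like this one. The helper name spawn_and_wait_all is made up, and it assumes the calling process has no other children.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper.
 * Forks n children that exit immediately, then reaps them with the
 * while (wait(NULL) > 0) idiom. Returns how many children were reaped,
 * or -1 if a fork() failed. */
int spawn_and_wait_all(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);  /* child: nothing to do */
    }

    int reaped = 0;
    /* Each successful wait() reaps exactly one child and returns its PID.
     * When no children remain, wait() returns -1 (errno == ECHILD)
     * and the loop ends. */
    while (wait(NULL) > 0)
        reaped++;
    return reaped;
}
```

spawn_and_wait_all(3) should return 3: a single wait(NULL) handles one child, and the loop is what covers all of them.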

Ensure PID refers to the correct process

I fork() a parent process to create a child, and the PID returned by fork() is stored in the parent's memory. Time passes and the child terminates. How can I determine whether the PID value stored in the parent's memory still refers to the same forked child, and how can I ensure that this PID doesn't refer to a different process with the same PID, which may have been created after the child terminated?
The operating system cannot reuse the child's PID until the parent has acknowledged that it knows the child has stopped executing.
The parent makes the acknowledgment using the wait and waitpid calls. The children that terminate are kept in a "zombie" state while the parent doesn't call these functions. After these calls return the parent will know that if there's a process running with the same PID that the child had, it's not the child.
For extra safety you might be interested in checking the parent PID of the child process.
You can:
call wait() in the parent (see man 2 wait) to get notified when the child dies;
invent your own polling protocol between parent and child. If the child is still the same, it must respond to the parent's poll with the same value as it did right after the spawn. You can use some POSIX IPC mechanism for this. This can be useful when your parent has only one execution thread and you can't use threads in the parent.

Resources