Fork implementation - c

How is the code for the fork system call written? I want to know in some detail how a function can return two different values, and do so in two different processes. In short, I want to know how the fork system call is implemented.

Carl's answer was great. I'd like to add that in many operating systems return values are passed in one of the registers. On x86 this register might be eax, on ARM it might be R0, etc.
Each process also has a Process Control Block (PCB), which stores the values of the registers at the time some interrupt, syscall, or exception happened and control was passed to the OS. The next time the process is scheduled, the values of the registers are restored from the PCB.
Now, when fork() happens, OS can do:
child_process->PCB[return_value_register] = 0;
parent_process->PCB[return_value_register] = child_pid;
So, when the processes are rescheduled, each of them sees a different return value.
As an example, you can see xv6's implementation of fork. In there, the parent process is still in running state, so it returns parent's return value using simple return statement. But it sets value of EAX register for child process to 0, so when child process is scheduled it sees 0 as return value:
// Clear %eax so that fork returns 0 in the child.
np->tf->eax = 0;
Note that return 0 will also compile to something like "mov eax, 0".
Update: I just implemented fork() for a hobby OS I am doing. You can see the source code here.

You've pretty much explained it by saying that it's a system call. It's the operating system's job to do all that work, and the operating system can pretty much do whatever it wants outside of the context of your program or the rules of whatever language you're implementing it in. Here's a simple example of how it might happen:
Program calls fork() system call
Kernel fork system call duplicates the process running the program
The kernel sets the return value for the system call for the original program and for the duplicate (PID of the duplicate and 0, respectively)
The kernel puts both processes in the scheduler queue
As each process is scheduled, the kernel 'returns' to each of the two programs.

There is a comment in the Unix V6 source code booklet for universities which was annotated by Ken Thompson and Dennis Ritchie themselves describing how the double return actually works. The comment ends with following sentence:
You are not expected to understand this.

Put simply, the process is cloned inside the fork() function, and the instruction pointer (IP/EIP/RIP) of the clone is moved forward to skip an instruction in a function that might look like this:
return pid;
return 0;
The first process executes the first instruction and pops the function from the stack; the second process starts at the second instruction and returns 0.

Related

System call: does Read function change process?

I learned that when a system call function is called, the process changes. But what is process B if I call the read function without calling fork()? Isn't there only one process?
On x86-64, there is one specific instruction for making system calls: syscall (https://www.felixcloutier.com/x86/syscall.html). When you call read() in C, the call ends up placing the proper syscall number in a register, along with the arguments you provide, and executing one syscall instruction. When syscall is executed, it jumps to the address stored in the IA32_LSTAR register. After that, the CPU is in kernel mode, executing the kernel's syscall handler.
At that point, it is still in the context of process A. Within its handler, the kernel realizes that you want to read from disk. It will thus start a DMA operation by writing some registers of the hard-disk controller. From there, process A is waiting for IO. There is no point in leaving the core idle so the kernel calls the scheduler and it will probably decide to switch the context of the core to another process B.
When the DMA IO operation is done, the hard-disk controller triggers an interrupt. The kernel thus puts process A back into the ready queue and calls the scheduler which will probably have the effect of switching the context of the core back to process A.
The image you provide isn't very clear so I can understand the confusion. Overall, on most architectures it will work similarly to what is stated above.
The image is somewhat misleading. What actually happens is, the read system call needs to wait for IO. There is nothing else that can be done in the context of process (or thread) A.
So the kernel needs to find something else for the CPU to do. Usually there is some other process, or several, with work to do (that is, not waiting for a system call to return). It could also be another thread of process A that is given time to execute (from the kernel's point of view, threads and processes aren't really much different). Several processes may also get to execute while process A waits for its system call to complete.
And if there is nothing for any other process or thread to do, the kernel will simply idle and let the CPU sleep for a bit, which saves power (especially important on a laptop).
So the image in the question shows just one possible situation.

How does fork() know when to return 0?

Take the following example:
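#include <sys/types.h>
#include <unistd.h>

void ChildProcess(void);   /* defined elsewhere */
void ParentProcess(void);  /* defined elsewhere */

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid == 0)
        ChildProcess();
    else
        ParentProcess();
}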
So correct me if I am wrong, once fork() executes a child process is created. Now going by this answer fork() returns twice. That is once for the parent process and once for the child process.
Which means that two separate processes come into existence DURING the fork call and not after it ending.
Now I don't understand how it knows to return 0 for the child process and the correct PID for the parent process.
This is where it gets really confusing. This answer states that fork() works by copying the context information of the process and manually setting the return value to 0.
First, am I right in saying that the return value of any function is placed in a single register? After all, in a single-processor environment a process can call only one subroutine at a time, and that subroutine returns only one value (correct me if I am wrong here).
Let's say I call a function foo() inside a routine and that function returns a value, that value will be stored in a register say BAR. Each time a function wants to return a value it will use a particular processor register. So if I am able to manually change the return value in the process block I am able to change the value returned to the function right?
So am I correct in thinking that is how fork() works?
How it works is largely irrelevant - as a developer working at a certain level (ie, coding to the UNIX APIs), you really only need to know that it works.
Having said that however, and recognising that curiosity or a need to understand at some depth is generally a good trait to have, there are any number of ways that this could be done.
First off, your contention that a function can only return one value is correct as far as it goes but you need to remember that, after the process split, there are actually two instances of the function running, one in each process. They're mostly independent of each other and can follow different code paths. The following diagram may help in understanding this:
Process 314159 | Process 271828
-------------- | --------------
runs for a bit |
calls fork     |
               | comes into existence
returns 271828 | returns 0
You can hopefully see there that a single instance of fork can only return one value (as per any other C function) but there are actually multiple instances running, which is why it's said to return multiple values in the documentation.
Here's one possibility on how it could work.
When the fork() function starts running, it stores the current process ID (PID).
Then, when it comes time to return, if the PID is the same as that stored, it's the parent. Otherwise it's the child. Pseudo-code follows:
def fork():
    saved_pid = getpid()

    # Magic here, returns PID of other process or -1 on failure.
    other_pid = split_proc_into_two()

    if other_pid == -1:        # fork failed -> return -1
        return -1
    if saved_pid == getpid():  # pid same, parent -> return child PID
        return other_pid
    return 0                   # pid changed, child, return zero
Note that there's a lot of magic in the split_proc_into_two() call and it almost certainly won't work that way at all under the covers(a). It's just to illustrate the concepts around it, which is basically:
get the original PID before the split, which will remain identical for both processes after they split.
do the split.
get the current PID after the split, which will be different in the two processes.
You may also want to take a look at this answer, it explains the fork/exec philosophy.
(a) It's almost certainly more complex than I've explained. For example, in MINIX, the call to fork ends up running in the kernel, which has access to the entire process tree.
It simply copies the parent process structure into a free slot for the child, along the lines of:
sptr = (char *) proc_addr (k1); // parent pointer
chld = (char *) proc_addr (k2); // child pointer
dptr = chld;
bytes = sizeof (struct proc); // bytes to copy
while (bytes--) // copy the structure
    *dptr++ = *sptr++;
Then it makes slight modifications to the child structure to ensure it will be suitable, including the line:
chld->p_reg[RET_REG] = 0; // make sure child receives zero
So, basically identical to the scheme I posited, but using data modifications rather than code path selection to decide what to return to the caller - in other words, you'd see something like:
return rpc->p_reg[RET_REG];
at the end of fork() so that the correct value gets returned depending on whether it's the parent or child process.
In Linux fork() happens in kernel; the actual place is the _do_fork here. Simplified, the fork() system call could be something like
pid_t sys_fork() {
    pid_t child = create_child_copy();
    wait_for_child_to_start();
    return child;
}
So in the kernel, fork() really returns once, into the parent process. However the kernel also creates the child process as a copy of the parent process; but instead of returning from an ordinary function, it would synthetically create a new kernel stack for the newly created thread of the child process; and then context-switch to that thread (and process); as the newly created process returns from the context switching function, it would make the child process' thread end up returning to user mode with 0 as the return value from fork().
Basically, fork() in userland is just a thin wrapper that returns the value the kernel put onto its stack or into the return register. The kernel sets up the new child process so that it returns 0 via this mechanism from its only thread; and the child pid is returned in the parent's system call like any other return value from any system call, such as read(2).
You first need to know how multitasking works. It is not necessary to understand all the details, but every process runs in a kind of virtual machine controlled by the kernel: a process has its own memory, processor, registers, etc. These virtual objects are mapped onto the real ones (the magic is in the kernel), and there is some machinery that swaps virtual contexts (processes) onto the physical machine as time passes.
Then, when the kernel forks a process (fork() is an entry to the kernel), and creates a copy of almost everything in the parent process to the child process, it is able to modify everything needed. One of these is the modification of the corresponding structures to return 0 for the child and the pid of the child in the parent from current call to fork.
Note: never say "fork returns twice"; a function call returns only once.
Just think about a cloning machine: you enter alone, but two persons exit, one is you and the other is your clone (very slightly different); while cloning the machine is able to set a name different than yours to the clone.
The fork system call creates a new process and copies a lot of state from the parent process. Things like the file descriptor table, the memory mappings and their contents, etc. get copied. That state is inside the kernel.
One of the things the kernel keeps track of for every process is the set of register values the process needs to have restored on return from a system call, trap, interrupt, or context switch (most context switches happen on system calls or interrupts). Those registers are saved on a syscall/trap/interrupt and then restored when returning to userland. System calls return values by writing into that saved state, and that is what fork does: the parent gets one value, the child process a different one.
Since the forked process is different from the parent process, the kernel could do anything to it. Give it any values in registers, give it any memory mappings. To actually make sure that almost everything except the return value is the same as in the parent process requires more effort.
For each running process, the kernel has a table of registers to load back when a context switch is made. fork() is a system call: a special call that, when made, switches the CPU into kernel mode, where kernel code carries out the request on behalf of the calling process.
The value returned by a system call is placed in a special register (EAX on x86) that your application reads after the call. When the fork() call is made, the kernel makes a copy of the process and writes the appropriate value into the register table of each process descriptor: 0 for the child, and the child's PID for the parent.

How do I explain 'main()'?

I'm creating a presentation on how to program in C, and since I'm fairly new to C, I want to check whether my assumptions are correct, and what am I missing.
Every C program has to have an entry point for the OS to know where to begin execution. This is defined by the main() function. This function always has a return value, whether it be user defined or an implicit return 0;.
Since this function is returning something, we must define the type of the thing it returns.
This is where my understanding starts to get hazy...
1. Why does the entry point need to have a return value?
2. Why does it have to be an int?
3. What does the OS do with the address of int main() after the program executes?
4. What happens in that address when, say, a segfault or some other error halts the program without reaching a return statement?
Every program terminates with an exit code. This exit code is determined by the return of main().
Programs typically return 0 for success or 1 for failure, but you can choose to use exit codes for other purposes.
1 and 2 are because the language says so.
For 3: Most operating systems have some sort of process management, and a process exits by invoking a suitable operating system service to do so, which takes a status value as an argument. For example, both DOS and Linux have "exit" system calls which accept one numeric argument.
For 4: Following from the above, operating systems typically also allow processes to die in response to receiving a signal which is not ignored or handled. In a decent OS you should be able to distinguish whether a process has exited normally (and retrieve its exit status) or been killed because of a signal (and retrieve the signal number). For instance, in Linux the wait system call provides this service.
Exit statuses and signals provide a simple mechanism for processes to communicate with one another in a generic way without the need for a custom communications infrastructure. It would be significantly more tedious and cumbersome to use an OS which didn't have such facilities or something equivalent.

Fork and returning twice

I am working on a project that requires implementing fork() in Unix. I read the FreeBSD and OpenBSD source code, but it is really hard to understand. Can someone please explain the returning-twice concept? I understand that one return value is the pid of the child, which is returned to the parent, and the other is zero, which is returned to the child process. But I cannot wrap my head around how to implement this notion of returning twice... how can I return twice? Thanks everyone in advance.
When you call fork, it returns "twice" in the sense that after the fork there are two processes, each of which returns.
So, if you're implementing fork, you have to create a second process without ending the first. Then the return-twice behavior will happen naturally: each of the two distinct processes will continue execution, only differing in the value they return (the child giving zero, and the parent giving the child's PID).
When you think of a function returning, you have your usual code flow in mind, which starts at the entry point (usually main) and then executes line by line, in a strictly deterministic and linear fashion.
However, in a real-world system, it is possible to have multiple execution contexts which each have their own control flow (and the new C++ standard actually includes that notion). Each separate process is an execution context that starts at main, but you can also create a new execution context from within an existing one (in fact, all operating systems must be able to do that!). fork is one way to create a new execution context, and the entry point of the new context is the point where fork returns. However, the original context also continues running, and it continues as usual after the fork call. The new context is a separate process, and thus fork returns (once) in both contexts.
There are other ways of creating new execution contexts; one is to create a new thread (within the same process) by instantiating a std::thread object or by using a platform-specific function; another is Linux's clone() function, which underlies both the POSIX thread implementation and fork in Linux (by creating a new execution path for the kernel's scheduler, and either copying all virtual memory (new process) or not (new thread)).
In the following, I will try to explain how to return twice from a function.
I'm warning you from the start that this is all a hack.
But there are plenty of places that use these sort of hacks.
First let's say we have the following C program.
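#include <inttypes.h>   /* uint64_t and the PRIX64 format macro */
#include <stdio.h>

/* Both functions are implemented in assembly. */
int saveesp(void);
void jmpback(void);

uint64_t saved_ret;

int main(int argc, char *argv[])
{
    if (saveesp()) {
        printf("here! esp = %" PRIX64 "\n", saved_ret);
        jmpback();
    } else {
        printf("there! esp = %" PRIX64 "\n", saved_ret);
    }
    return 0;
}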
Now we want saveesp() to return twice so that we can reach both printf calls.
So here's how saveesp() is implemented:
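#define _ENTRY(x) \
    .text; .globl x; .type x,@function; x:
#define NENTRY(y) _ENTRY(y)

NENTRY(saveesp)
    movq (%rsp), %rax       /* load our caller's return address */
    movq %rax, saved_ret    /* remember it for jmpback() */
    movl $1, %eax           /* first return: non-zero */
    ret

NENTRY(jmpback)
    xorq %rax, %rax         /* second return: zero */
    pushq saved_ret         /* fake a return address on the stack */
    ret                     /* "return" from saveesp() again */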
This is in no way portable code. But you can write similar assembly stubs for all the platforms you want to support.
What saveesp() does is take the return address stored on the stack and save it to a global variable (saved_ret). Afterwards it returns 1, a non-zero value, which takes us to the first printf.
After the printf() we call jmpback(), which is the actual hack. This function makes it appear that saveesp() returns a second time.
It does this by pushing the saved return address onto the stack and doing a ret. The ret pops the address from the stack and jumps to it. The return value is set to zero this time around. So when we 'reach' our C routine again, it appears we've just come back from saveesp() with a zero return value. Thus the second printf is reached.
If you're interested in this sort of hacks you should read a bit more about setjmp and longjmp from the C standard that are used to implement exception handling.
Also, we actually use this inside the OpenBSD kernel on the suspend/resume codepath.
Have a look here at lines 231 and 250; it's pretty much the same C code as above. Then have a look at the assembly code here: at line 542 is the savecpu function, which returns the first time on suspend, and at line 375 is where we return the second time around, when we come back on resume.

Changing the Fork() system call

Hi, I am trying to create a system call that will count the number of forks that were called. I was going to change the fork system call so that it has a counter that keeps track of the number of times fork() was invoked. I was planning on adding a static variable to fork.h and then incrementing it every time fork.c is called. I just don't understand what is going on in fork.c at all. Is this even the right approach?
The Linux kernel already maintains a count of the total number of forks in the system as a whole.
One of the tasks performed by copy_process(), which does a lot of the work involved in forking, is to increment the total_forks counter.
This counter is exposed to userland as the processes line in /proc/stat (by the code here).
The source code for fork can be found at <linux kernel source tree>/kernel/fork.c file. The function is do_fork. You can add your code right before the else statement which returns errors. Remember that you would have to compile and reboot with this new kernel.

Resources