I am working on a project that requires implementing fork() in Unix. I read the FreeBSD and OpenBSD source code, but it is really hard to understand. Can someone please explain the returning-twice concept? I understand that one return value is the PID of the child, which is returned to the parent, and the other is zero, which is returned to the child process. But I cannot wrap my head around how to implement this notion of returning twice... how can I return twice? Thanks everyone in advance.
When you call fork, it returns "twice" in that the call ends with two processes (the original and a new copy of it), and each of them returns from the call once.
So, if you're implementing fork, you have to create a second process without ending the first. Then the return-twice behavior will happen naturally: each of the two distinct processes will continue execution, only differing in the value they return (the child giving zero, and the parent giving the child's PID).
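Seen from the caller's side, the two returns look like this. This is only a minimal sketch, assuming a POSIX system; the helper name fork_and_report and the exit status 7 are made up for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once, lets the child exit with status 7, and
 * returns that status as observed by the parent (or -1 on error). */
int fork_and_report(void)
{
    pid_t pid = fork();          /* one call ... */

    if (pid < 0)
        return -1;               /* fork failed, no child was created */

    if (pid == 0) {
        /* ... returns here in the child, with 0 ... */
        _exit(7);
    }

    /* ... and here in the parent, with the child's PID. */
    assert(pid > 0);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Neither process "returns twice" on its own: each executes the code after fork() exactly once and simply takes a different branch.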
When you think of a function returning, you have your usual code flow in mind, which starts at the entry point (usually main) and then executes line by line, in a strictly deterministic and linear fashion.
However, in a real-world system, it is possible to have multiple execution contexts which each have their own control flow (and the new C++ standard actually includes that notion). Each separate process is an execution context that starts at main, but you can also create a new execution context from within an existing one (in fact, all operating systems must be able to do that!). fork is one way to create a new execution context, and the entry point of the new context is the point where fork returns. However, the original context also continues running, and it continues as usual after the fork call. The new context is a separate process, and thus fork returns (once) in both contexts.
There are other ways of creating new execution contexts. One is to create a new thread (within the same process) by instantiating a std::thread object or by using a platform-specific function. Another is Linux's clone() function, which underlies both the POSIX thread implementation and fork on Linux: it creates a new execution path for the kernel's scheduler, and either copies all virtual memory (new process) or does not (new thread).
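To make the "copies all virtual memory" part concrete, here is a small sketch, assuming a POSIX system. The global counter and the helper name are invented for this example. The child writes to a variable and the parent checks that its own copy did not change.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 1;   /* lives in ordinary process memory */

/* Hypothetical helper: lets a forked child overwrite `counter`, then returns
 * the value the parent still sees afterwards (or -1 on error). */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        counter = 42;                      /* changes only the child's copy */
        _exit(counter == 42 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* The child did see its own write ... */
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    /* ... but the parent's copy is untouched. */
    return counter;
}
```

With a thread instead of a forked process, the write would have been visible to the creator, because threads share one address space.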
In the following I will try to explain how to return twice from a function.
I'm warning you from the start that this is all a hack.
But there are plenty of places that use this sort of hack.
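First, let's say we have the following C program.
#include <stdint.h>
#include <stdio.h>
/* Both routines are implemented in assembly. */
int saveesp(void);
void jmpback(void);
uint64_t saved_ret;
int main(int argc, char *argv[])
{
    if (saveesp()) {
        /* First return from saveesp(): non-zero. */
        printf("here! esp = %llX\n", (unsigned long long)saved_ret);
        jmpback();
    } else {
        /* "Second return" from saveesp(), faked by jmpback(): zero. */
        printf("there! esp = %llX\n", (unsigned long long)saved_ret);
    }
    return 0;
}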
Now we want saveesp() to return twice so that we can reach both printfs.
So here's how saveesp() is implemented:
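#define _ENTRY(x) \
    .text; .globl x; .type x,@function; x:
#define NENTRY(y) _ENTRY(y)
NENTRY(saveesp)
    movq (%rsp), %rax       /* return address of our caller */
    movq %rax, saved_ret    /* remember it in the global */
    movl $1, %eax           /* first return: non-zero */
    ret
NENTRY(jmpback)
    xorq %rax, %rax         /* second return: zero */
    pushq saved_ret         /* fake a return address on the stack */
    ret                     /* "return" to just after the saveesp() call */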
This is in no way portable code. But you can write similar assembly stubs for all the platforms you want to support.
What saveesp() does is take the return address stored on the stack and save it to the global variable saved_ret. Afterwards it returns 1, a non-zero value, which takes us to the first printf.
After the printf() we call jmpback(), which is the actual hack: this function makes it appear that saveesp() returns a second time.
It does this by pushing the saved return address onto the stack and executing a ret. The ret pops the address from the stack and jumps to it. The return value is set to zero this time around. So when we arrive back in our C routine, it appears we've just come back from saveesp() with a zero return value. Thus the second printf is reached.
If you're interested in this sort of hack, you should read a bit more about setjmp and longjmp from the C standard, which are used to implement exception handling.
Also, we actually use this inside the OpenBSD kernel on the suspend/resume codepath.
Have a look here at lines 231 and 250; it's pretty much the same C code as above. Then have a look at the assembly code here: at line 542 is the savecpu function, which returns the first time on suspend, and at line 375 is where we return the second time around, when we come back on resume.
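The setjmp and longjmp functions mentioned above give you the same "one call site, two returns" effect in portable C, without any assembly. A minimal sketch follows. The function and variable names are my own, and this swaps the hand-written stubs for the standard library mechanism rather than reproducing them.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_point;   /* illustrative name */

/* Returns how many times setjmp() "returned" inside this function. */
int count_setjmp_returns(void)
{
    /* volatile: the value is modified between setjmp() and longjmp() */
    volatile int returns = 0;

    if (setjmp(resume_point) == 0) {
        /* First return: the direct call to setjmp() yields 0. */
        returns++;
        longjmp(resume_point, 1);   /* jump back into the setjmp() call */
    } else {
        /* Second return: same call site, now yielding longjmp()'s value. */
        assert(returns == 1);
        returns++;
    }
    return returns;
}
```

Unlike fork, both "returns" happen in the same process, one after the other; nothing runs concurrently.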
Related
I have seen many posts regarding this question. Many say that exit(EXIT_SUCCESS) should be called for successful termination, and exit(EXIT_FAILURE) for unsuccessful termination.
1. What if we do not call the exit() function and instead write return 0 or return -1? What difference does it make?
2. What happens if successful termination does not happen? What are its effects?
3. I've been told that calling the exit() functions makes the program portable. "Portable" in what sense? How can one function make the entire code portable?
4. I've been told that execution returns to the parent. What happens if execution does not return to the parent?
My questions may seem silly, but I need answers to all of them to clear up my confusion between return and exit.
1. return returns a value from a function; exit terminates your program. When called within the main function, they are essentially the same. In fact, most standard libc _start functions call main and then call exit on the result of main anyway.
2. Nothing, directly. The caller (usually your shell) will get the return value and can tell from it whether your program succeeded or not.
3. I'm not sure what you mean here. exit is a standard function, and you may use it if you wish, or you can decide not to. Personally, I prefer to return from main only, and only return error status from other functions, to let the caller decide what to do with it. It's usually bad form to write an API that kills the program on anything but an unrecoverable error that the program can't manage.
4. If execution didn't return to the parent (which is either a shell or some other program), it would just hang forever. Your OS makes sure that this doesn't happen in most ordinary cases.
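To see that returning from main and calling exit end up in the same place, here is a rough sketch assuming a POSIX system. The two stand-in "main" functions and the helper status_of_child are hypothetical names. The child mimics what the C runtime does, namely exit(main(...)), and the parent reads the status.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two stand-ins for main(): one returns its status, one calls exit(). */
int main_that_returns(int code) { return code; }
int main_that_exits(int code)   { exit(code); }

/* Runs child_main(code) in a forked child and returns the exit status the
 * parent observes (or -1 on error). */
int status_of_child(int (*child_main)(int), int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Mimic the C runtime's startup code: exit(main(...)). */
        exit(child_main(code));
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Usage, e.g. inside a real main():
 *     assert(status_of_child(main_that_returns, 3) == 3);
 *     assert(status_of_child(main_that_exits, 3) == 3);
 */
```

One detail worth knowing for the return -1 case: on POSIX systems the parent only sees the low 8 bits of the status, so -1 arrives as 255.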
I was always a bit hazy on this little bit of C magic. When you call execv, you're "replacing the process image." What exactly does that mean? Just the DATA segment? Everything allocated to the process? The stack? The heap?
My question is about what happens to the storage used by the parameters that you pass to execv? If they were local variables to the function that called execv, then they're on the stack. But if you replace the process image, and call the new process's main() function, bad things would happen when main() returned, because the stack information that points to the return location from the main call was replaced by the new process image.
Same thing for variables, yes? And what if those variables were allocated on the heap?
Inquiring minds are inquiring to anybody who knows.
The exec family of functions replaces the process wholesale: data, stack, text, heap, everything. Some file descriptors can stay open (those opened by the original process without FD_CLOEXEC set). But apart from that, you pretty much get a whole new process; see the link for all the details.
What happens to the parameters you passed in is the OS's problem - it has to make sure they're passed to the new process's main function in a way that complies with the standard, but I don't think POSIX dictates exactly how it does that.
For Linux, you can look at the fs/exec.c file to see the implementation. Jump near the end (line 1484 as I post this) to look at the do_execveat_common function which is the main part of the implementation. You'll see the arguments are copied into the new address space (calls to copy_strings near the end of the function).
Just the DATA segment?
No, all memory mappings are erased and re-created for the new executable.
Everything allocated to the process? The stack? The heap?
Yes, all memory. Some kernel resources, documented here, are inherited from the parent process though, such as file descriptors. These resources are managed by the kernel and are not part of the process memory. All of this is quite operating system specific, though: an OS can accomplish it through various means as long as it complies with the mentioned exec() documentation.
what happens to the storage used by the parameters that you pass to execv?
Typically the kernel makes a copy of those arguments, and injects them into the memory of the new executable.
But if you replace the process image, and call the new process's main() function, bad things would happen when main() returned,
No, when main() returns, that process ends. The code and memory of the original process that called exec() don't exist any more; there's nothing to return to.
I'm creating a presentation on how to program in C, and since I'm fairly new to C, I want to check whether my assumptions are correct and what I am missing.
Every C program has to have an entry point for the OS to know where to begin execution. This is defined by the main() function. This function always has a return value, whether it be user defined or an implicit return 0;.
Since this function is returning something, we must define the type of the thing it returns.
This is where my understanding starts to get hazy...
1. Why does the entry point need to have a return value?
2. Why does it have to be an int?
3. What does the OS do with the address of int main() after the program executes?
4. What happens in that address when, say, a segfault or some other error halts the program without reaching a return statement?
Every program terminates with an exit code. This exit code is determined by the return of main().
Programs typically return 0 for success or 1 for failure, but you can choose to use exit codes for other purposes.
1 and 2 are because the language says so.
For 3: Most operating systems have some sort of process management, and a process exits by invoking a suitable operating system service to do so, which takes a status value as an argument. For example, both DOS and Linux have "exit" system calls which accept one numeric argument.
For 4: Following from the above, operating systems typically also allow processes to die in response to receiving a signal which is not ignored or handled. In a decent OS you should be able to distinguish whether a process has exited normally (and retrieve its exit status) or been killed because of a signal (and retrieve the signal number). For instance, in Linux the wait system call provides this service.
Exit statuses and signals provide a simple mechanism for processes to communicate with one another in a generic way without the need for a custom communications infrastructure. It would be significantly more tedious and cumbersome to use an OS which didn't have such facilities or something equivalent.
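Here is how a parent can tell the two cases apart with waitpid on a POSIX system. This is a sketch only: the enum, the helper name and the way the child is made to die are assumptions chosen for the example.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum ending { ENDED_BY_EXIT, ENDED_BY_SIGNAL, ENDED_UNKNOWN };

/* Hypothetical helper: forks a child that either exits with exit_code
 * (fatal_signal == 0) or kills itself with fatal_signal. Stores the exit
 * status or the signal number in *detail. */
enum ending how_child_ended(int fatal_signal, int exit_code, int *detail)
{
    pid_t pid = fork();
    if (pid < 0)
        return ENDED_UNKNOWN;

    if (pid == 0) {
        if (fatal_signal != 0)
            raise(fatal_signal);   /* dies here, never reaching a return */
        _exit(exit_code);          /* normal termination with a status */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return ENDED_UNKNOWN;

    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return ENDED_BY_EXIT;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return ENDED_BY_SIGNAL;
    }
    return ENDED_UNKNOWN;
}

/* Usage, e.g. inside main():
 *     int detail;
 *     assert(how_child_ended(SIGKILL, 0, &detail) == ENDED_BY_SIGNAL);
 */
```

A segfault is reported the same way, with SIGSEGV as the signal number; SIGKILL is used here only because it cannot be caught or ignored.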
How is the fork system call code written? I want to know in some detail how a function can return two different values, and to two different processes at that. In short, I want to know how the fork system call is implemented.
Carl's answer was great. I'd like to add that in many operating systems return values are passed in one of the registers. On x86 this register might be eax; on ARM it might be R0, etc.
Each process also has a Process Control Block (PCB), which stores the values of the registers at the time an interrupt, syscall, or exception happened and control passed to the OS. The next time the process is scheduled, the register values are restored from the PCB.
Now, when fork() happens, the OS can do:
child_process->PCB[return_value_register] = 0;
parent_process->PCB[return_value_register] = child_pid;
So, when the processes are rescheduled, each of them sees a different return value.
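The same idea as a toy user-space simulation. Nothing here is real kernel code: the struct layout, the register count and all names are invented purely to illustrate how patching one saved register gives each process its own return value.

```c
#include <assert.h>
#include <string.h>

enum { REG_RETVAL = 0, NUM_REGS = 4 };   /* toy register file */

struct pcb {
    int  pid;
    long regs[NUM_REGS];   /* saved registers, restored on scheduling */
};

/* Toy fork: clone the parent's saved state into *child, then patch the
 * return-value register in each PCB. */
void toy_fork(struct pcb *parent, struct pcb *child, int child_pid)
{
    memcpy(child, parent, sizeof *child);    /* identical saved context ... */
    child->pid = child_pid;
    child->regs[REG_RETVAL]  = 0;            /* ... except the return value */
    parent->regs[REG_RETVAL] = child_pid;
}

/* What a process sees as the syscall result once its registers are restored. */
long resume(const struct pcb *p)
{
    return p->regs[REG_RETVAL];
}

/* Usage, e.g. inside main():
 *     struct pcb parent = { .pid = 1, .regs = { 99, 10, 20, 30 } }, child;
 *     toy_fork(&parent, &child, 2);
 *     assert(resume(&parent) == 2 && resume(&child) == 0);
 */
```

All other saved registers, including the program counter in a real kernel, are identical in both PCBs, which is why both processes continue from the same instruction.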
As an example, you can see xv6's implementation of fork. There, the parent process is still in the running state, so it produces the parent's return value with a plain return statement. But it sets the saved EAX register of the child process to 0, so when the child process is scheduled it sees 0 as the return value:
// Clear %eax so that fork returns 0 in the child.
np->tf->eax = 0;
Note that return 0 will also compile to something like "mov eax, 0".
Update: I just implemented fork() for a hobby OS I am doing. You can see the source code here.
You've pretty much explained it by saying that it's a system call. It's the operating system's job to do all that work, and the operating system can pretty much do whatever it wants outside of the context of your program or the rules of whatever language you're implementing it in. Here's a simple example of how it might happen:
1. The program calls the fork() system call.
2. The kernel's fork system call duplicates the process running the program.
3. The kernel sets the return value of the system call for the original program and for the duplicate (PID of the duplicate and 0, respectively).
4. The kernel puts both processes in the scheduler queue.
5. As each process is scheduled, the kernel 'returns' to each of the two programs.
There is a comment in the Unix V6 kernel source, in the context-switching code, that describes how this kind of return actually works (the source was reproduced and annotated by John Lions in his source code booklet for universities). The comment ends with the following sentence:
You are not expected to understand this.
Put simply: the process is cloned inside the fork() function, and the instruction pointer (IP/EIP/RIP) of the clone is moved so that it skips an instruction. Conceptually, the end of the function can look like:
return pid;
return 0;
The first process executes the first return and pops the function off the stack; the second process starts at the second return and returns 0.
I am writing a user space thread library. I have a struct that manages each thread. My threads are very simple: they take a function pointer and its arguments, and just run that function one time.
Each thread has a jmp_buf, and I use setjmp and longjmp to switch between threads. One thing I can't figure out is how to tell when this function is finished.
For each thread I modify the jmp_buf in 2 ways:
1. I edit the PC and set it to the function pointer, so the program counter goes there next.
2. I also give each thread its own stack and edit SP so it points to that stack.
So using my thread control struct I can switch between threads and have each one maintain state, but I do not know how to tell when that function is finished. When it is finished I want to call a special exit() function I have.
You could modify the return address on the stack to point to your exit() function, or wrap the function call in another function that calls exit() after it.
Instead of modifying your PC to the user function, you should actually be calling some special function (let's call it run_thread()) that branches to that thread's entry function. When that entry function returns (that is, the thread has exited), run_thread() should do whatever work is required to indicate that this thread is done (probably by removing that thread control block from the scheduling list and adding it to the join() cleanup list). It can then yield and when the parent calls join() on its ID, it will be cleaned up.
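A stripped-down sketch of that trampoline, leaving out the context switching entirely. The struct fields, thread_exit and the sample entry function bump are hypothetical, and a real library would switch to another thread in thread_exit instead of returning.

```c
#include <assert.h>
#include <stddef.h>

struct thread {
    void (*entry)(void *);   /* the user's function */
    void  *arg;
    int    finished;         /* set by the trampoline, checked by join() */
};

/* Stand-in for the library's exit routine. A real one would also unlink the
 * thread from the run queue and switch to another thread. */
void thread_exit(struct thread *t)
{
    t->finished = 1;
}

/* Every new thread's PC is pointed at this function, not at the user's. */
void run_thread(struct thread *t)
{
    t->entry(t->arg);    /* runs the user function to completion */
    thread_exit(t);      /* reached exactly when the user function returns */
}

/* Sample entry function for trying it out. */
void bump(void *p)
{
    ++*(int *)p;
}

/* Usage, e.g. inside main():
 *     int n = 0;
 *     struct thread t = { bump, &n, 0 };
 *     run_thread(&t);
 *     assert(n == 1 && t.finished == 1);
 */
```

In the real library, run_thread must never return, because it sits at the bottom of a freshly made stack with no valid return address beneath it.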
It'll try to return to wherever it was called from originally - presumably your create_thread function.