Consider this simple code:
int myvar = 0;
int main() {
if (fork()>0) {
myvar++;
} else {
// father do nothing
}
}
When child increments myvar, is the value shared with the father (like pthread)?
No and yes.
No, they are not shared in any way which is visible to the programmer; the processes can modify their own copies of the variables independently and they will change without any noticable effect on the other process(es) which are fork() parents, siblings or descendents.
But yes, the OS actually does share the pages initially, because fork implements copy-on-write which means that provided none of the processes modifies the pages, they are shared. This is, however, an optimisation which can be ignored.
If you wanted to have shared variables, put them in an anonymous shared mapping (see mmap()) in which case they really will get shared, with all the caveats which come with that.
fork()ing creates an exact copy of the parent process at the time of forking. However, after the fork() is completed, the child has a completely different existence, and will not report back to the parent.
In other words, no, the parent's global variables will not be altered by changes in the child.
After fork(), the entire process, including all global variables, is duplicated. The child is an exact replica of the parent, except that it has a different PID(Process Id), a different parent, and fork() returned 0. Global variables are still global within its own process. So the answer is no, global variables are not shared between processes after a call to fork().
No, since global variables are not shared between processes unless some IPC mechanism is implemented. The memory space will be copied. As a consequence, the global variable in both processes will have the same value inmediately after fork, but if one changes it, the other wont see it changed.
Threads on the other hand do share global variables.
Related
I have a global boolean variable that both parent and child process can write to. The child process is just forked (no exec() called).
In C/Linux, how do I go about synchronizing access to this global? In C/C++ threading world, I could have used a mutex.
After calling fork, each process has its own copy of any variables in use. The global variable in the parent and the global in the child are completely distinct from each other, so they can't be used as a common variable.
If you want the two processes to share data, you would need to either create a pipe using the pipe function to pass data between the two, or you would need to create a shared memory segment that both processes would have access to.
I've tried to look this up, but I'm struggling a bit to understand the relation between the Parent Process and the Child Process immediately after I call fork().
Are they completely separate processes, only associated by the id/parent id? Or do they share memory? For example the 'code' section of each process - is that duplicated so that each process has it's own identical copy, or is that 'shared' in some way so that only one exists?
I hope that makes sense.
In the name of full disclosure this is 'homework related'; while not a direct question from the book, I have a feeling it's mostly academic and, in practice, I probably don't need to know.
As it appears to the process, the entire memory is duplicated.
In reality, it uses "copy on write" system. The first time either process changes its memory after fork(), a separate copy is made of the modified page (usually 4kB).
Usually the code segment of a process is not modified, in which case it remains shared.
Logically, a fork creates an identical copy of the original process that is largely independent of the original. For performance reasons, memory is shared with copy-on-write semantics, which means that unmodified memory (such as code) remains shared.
File descriptors are duplicated, so that the forked process could, in principle, take over a database connection on behalf of the parent (or they could even jointly communicate with the database if the programmer is a bit twisted). More commonly, this is used to set up pipes between processes so you can write find -name '*.c' | xargs grep fork.
A bunch of other stuff is shared. See here for details.
One important omission is threads — the child process only inherits the thread that called fork(). This causes no end of trouble in multithreaded programs, since the status of mutexes, etc., that were locked in the parent is implementation-specific (and don't forget that malloc() and printf() use locks internally). The only safe thing to do in the child after fork() returns is to call execve() as soon as possible, and even then you have to be cautious with file descriptors. See here for the full horror story.
They are separate processes i.e. the Child and the Parent will have separate PIDs
The child will inherit all of the open descriptors from the Parent
Internally the pages i.e. the stack/heap regions which can be modified unlike the .text region, will be shared b/w the Parent and the Child until one of them tries to modify the contents. In such cases a new page is created and data specific to the page being modified is copied to this freshly allocated page and mapped to the region corresponding to the one who caused the change - could be either the Parent or Child. This is called COW (mentioned by other members in this forum above in their answers).
The Child can finish execution and until reclaimed by the parent using the wait() or waitpid() calls will be in ZOMBIE state. This will help clear the child's process entry from the process table and allow the child pid to be reused. Usually when a child dies, the SIGCHLD signal is sent out to the parent which would ideally result in a handler being called subsequent to which the wait() call is executed in that handler.
In case the Parent exits without cleaning up the already running or zombie child (via the wait() waitpid calls), the init() process (PID 1) becomes the parent to these now orphan children. This init() process executes wait() or waitpid() calls at regular intervals.
EDIT: typos
HTH
Yes, they are separate processes, but with some special "properties". One of them is the child-parent relation.
But more important is the sharing of memory pages in a copy-on-write (COW) manner: until the one of them performs a write (to a global variable or whatever) on a page, the memory pages are shared. When a write is performed, a copy of that page is created by the kernel and mapped at the right address.
The COW magic is done by in the kernel by marking the pages as read-only and using the fault mechanism.
Consider this simple code:
int myvar = 0;
int main() {
if (fork()>0) {
myvar++;
} else {
// father do nothing
}
}
When child increments myvar, is the value shared with the father (like pthread)?
No and yes.
No, they are not shared in any way which is visible to the programmer; the processes can modify their own copies of the variables independently and they will change without any noticable effect on the other process(es) which are fork() parents, siblings or descendents.
But yes, the OS actually does share the pages initially, because fork implements copy-on-write which means that provided none of the processes modifies the pages, they are shared. This is, however, an optimisation which can be ignored.
If you wanted to have shared variables, put them in an anonymous shared mapping (see mmap()) in which case they really will get shared, with all the caveats which come with that.
fork()ing creates an exact copy of the parent process at the time of forking. However, after the fork() is completed, the child has a completely different existence, and will not report back to the parent.
In other words, no, the parent's global variables will not be altered by changes in the child.
After fork(), the entire process, including all global variables, is duplicated. The child is an exact replica of the parent, except that it has a different PID(Process Id), a different parent, and fork() returned 0. Global variables are still global within its own process. So the answer is no, global variables are not shared between processes after a call to fork().
No, since global variables are not shared between processes unless some IPC mechanism is implemented. The memory space will be copied. As a consequence, the global variable in both processes will have the same value inmediately after fork, but if one changes it, the other wont see it changed.
Threads on the other hand do share global variables.
I have a main() function and prior to declaring main(), I declare global variables.
Then inside main() 2 processes start: 1 child and 1 parent via fork(). Why can't the parent and child processes share the global variables I declared? What is a good way to handle this? Thank you.
When you fork() you're spawning a new process. Everything at the time of the fork is copied, but after that ... nothing is shared.
You have two choices at that point:
Keep a pipe open between your two processes and communicate changes
Re-write your code to be multi-threaded, where you can access the same data (using locks)
With fork() you create a new process with separate memory space. To communicate between processes you can use signals (using kill())
If you want to share variables, consider using threads (e.g. pthread.h). Then you can use events or mutexes for thread synchronization.
my question is somewhat conceptual, how is parent process' data shared with child process created by a fork() call or with a thread created by pthread_create()
for example, are global variables directly passed into child process and if so, does modification on that variable made by child process effect value of it in parent process?
i appreciate partial and complete answers in advance, if i'm missing any existing resource, i'm sorry, i've done some search on google but couldn't find good results
thanks again for your time and answers
The semantics of fork() and pthread_create() are a little different.
fork() will create a new process, where the global variables will be separate between the parent and children. Most OS implementations will use copy-on-write semantics, meaning that both the parent and child process will use the same physical memory pages for all global variables until one of the processes attempts to edit the physical memory, at which point a copy of that page is made, so that now each process gets its own copy and does not see the other process's, so that the processes are isolated.
pthread_create() on the other hand, creates a new thread within the same process. The new thread will have a separate stack space from the other running threads of the same process, however the global variables and heap space are shared between all threads of the same process. This is why you often need a mutex to coordinate access to a shared piece of memory between multiple threads of the same process.
TL;DR version: with fork(), you don't see the other guy's changes; with pthread_create() you do.
A fork creates an almost exact copy of the calling process, including memory and file descriptors. Global variables are copied along with everything else, but they are not in any way linked to the parent process. Since file descriptors are also copied, parent and child can interact via these (as long as they're setup properly, usually via pipe or socketpair).
There's a big difference between processes created by fork and between threads created with pthread_create. Processes don't share global variables and should communicate through pipes, sockets, or other tools provided by the OS. A good solution is MPI - which is a message-passing library for inter-process communication.
Threads are quite different. A thread created with pthread_create shares all the global variables with its caller. Moreover, the caller can pass an arbitrary structure into the thread, and this structure will also be shared. This means that one should be extremely careful when programming with threads - such amounts of sharing are dangerous and error prone. The pthread API provides mutexes and conditions for robust synchronization between threads (although it still requires practice and expertise to implement correctly).