Maybe my question goes a bit too deep into the matter and I am thinking of a problem that does not exist. I hope you can help. ;-)
The problem is the following:
I start a process in Linux at startup (rc.d) which creates a shared memory segment and then forks two daemon processes. The daemons, once detached from the parent process, have inherited the parent's shared memory, but they also have their own session and are no longer connected to the parent.
Do they have their own link to the shared memory, and does the kernel count the references? I ask because I would like to safely detach the shared memory in the parent process before it bails out. In my implementation it works: the parent process detaches the shared memory and the daemons can still use it. But is that safe, or does it only work by coincidence?
Thanks for your thoughts in advance!
Martell
In Linux each process has a file descriptor table: file descriptors index into this per-process table maintained by the kernel, which in turn points into a system-wide table of files opened by all processes, called the file table.
Now, at fork, each child process gets its own copy of the file descriptor table, with every entry pointing to the same objects as in the parent process. So if the parent process closes its shared-memory entry in its own table, this won't affect the other processes, because the OS won't release the shared memory while it is still in use by other processes. (In other words, the child processes stay attached to the shared memory, keep their own link to it, and the kernel counts those references.)
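To illustrate, here is a minimal sketch using System V shared memory (shmget/shmat), which matches the attach/detach wording of the question; the segment size and the sleep() are arbitrary and only there to order the demo's output. The parent marks the segment for removal and detaches, yet the child, which inherited the attachment across fork, can still read it, because the kernel tracks the attach count and only destroys the segment when that count drops to zero.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Create an anonymous System V shared memory segment. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); exit(1); }

    char *mem = shmat(shmid, NULL, 0);      /* attach in the parent */
    if (mem == (void *)-1) { perror("shmat"); exit(1); }

    /* Mark the segment for removal: it is only destroyed once the last
       attached process detaches, so the child can keep using it. */
    shmctl(shmid, IPC_RMID, NULL);

    pid_t pid = fork();
    if (pid == 0) {                         /* child: inherits the attachment */
        sleep(1);                           /* crude ordering for the demo */
        printf("child reads: %s\n", mem);
        shmdt(mem);
        _exit(0);
    }

    strcpy(mem, "hello from the parent");
    shmdt(mem);                             /* parent detaches; child is still attached */
    waitpid(pid, NULL, 0);
    return 0;
}
```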
Related
I am comparatively new to Linux programming. I wonder whether the exec() function, called after fork(), can cause data loss in the parent process.
After a successful call to fork, a new process is created which is a duplicate of the calling process. One thing that gets duplicated are the file descriptors, so it's possible for the new process to read/write the same file descriptors as the original process. These may be files, sockets, pipes, etc.
The exec function replaces the currently running program in the current process with a new program, overwriting the memory of the old program in that process. So any data stored in the memory of the old program is lost. This does not however affect the parent process that forked this process.
When a new program is executed via exec, any open file descriptors that do not have the FD_CLOEXEC (close-on-exec) flag set (see the fcntl man page) are preserved as well. So now you have two processes, each possibly running a different program, which may both write to the same file descriptor. If that happens, and the processes don't properly synchronize, data written to the file by one process may be overwritten by the other process.
So data loss can occur with regard to writing to file descriptors that the child process inherited from the parent process.
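A minimal sketch of the close-on-exec flag mentioned above; the file name log.txt and the ls command are just placeholders. With FD_CLOEXEC set, the descriptor is closed automatically in the child when it calls exec, so the new program cannot write to the same open file behind the parent's back:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) { perror("open"); exit(1); }

    /* Set close-on-exec: the descriptor is closed automatically in any
       child that calls exec, so the new program never sees it. */
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        perror("fcntl");
        exit(1);
    }

    if (fork() == 0) {
        execlp("ls", "ls", "-l", (char *)NULL);  /* fd is closed at this point */
        _exit(127);                              /* only reached if exec fails */
    }

    const char *msg = "parent still owns the log\n";
    write(fd, msg, strlen(msg));
    wait(NULL);
    close(fd);
    return 0;
}
```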
fork() continues executing the same program and copies the parent's variables as they were at the moment of the fork. How does the OS keep both processes in memory while ensuring that each process can only access its own variables?
When the kernel creates a new process, it also creates a new memory mapping. Initially all pages in the new mapping are shared with the parent process, but once a page in the mapping is modified by either process, that page is copied so each process gets its own private copy.
Useful terms to search for: Virtual memory, on demand paging, memory mapping, shared memory, copy on write.
The OS copies virtual memory space of the forking process (with possible optimizations like copy-on-write).
Fork is a technique that in general creates a separate address space for the child. The child has the same memory as the parent, but they have different PIDs, so you can distinguish them: specifically, fork() returns 0 in the child process and a nonzero value (the child's PID) in the parent process.
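A small sketch tying these answers together: the variable counter below is just an illustration; each process ends up with its own copy (thanks to copy-on-write), and the return value of fork() tells the two apart:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int counter = 100;              /* duplicated (copy-on-write) at fork time */

    pid_t pid = fork();
    if (pid == 0) {                 /* fork() returned 0: this is the child */
        counter += 1;               /* modifies only the child's private copy */
        printf("child  (pid %d): counter = %d\n", getpid(), counter);
        _exit(0);
    }

    /* fork() returned the child's PID, so this is the parent. */
    waitpid(pid, NULL, 0);
    printf("parent (pid %d): counter = %d\n", getpid(), counter);  /* still 100 */
    return 0;
}
```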
I have an application which uses mmap for IPC. Can I run this application multiple times? Will it have any side effects?
My application scenario:
My application forks off a child process whose job is to kill the parent process at random, but in a controlled manner: for example, a variable in the parent process (this is where the mmap comes in) tells the child process when it may kill the parent. The parent process has a signal handler from which it can resume the application; then the child process kills the parent again, and so it continues...
Can anyone help me? Thanks in advance.
Whether running your application multiple times will have side effects or not depends on how you implement it. Please have a look at this answer. It contains a lot of helpful information. For example:
mmap is great if you have multiple processes accessing data in a read only fashion from the same file [...]
This means: if you want to use the same shared memory for multiple parent/child pairs, then you need to synchronize access to that shared memory. Please have a look at this Q&A on how to do that. Of course, you have to make sure that each parent/child pair uses its own variables in the shared memory.
Another option is to use a separate shared memory segment for each parent/child pair. You could do this, for example, by making the process ID of the parent process part of the shared memory file name. Then, when you fork the child process, you pass the process ID (or the shared memory file name) to the child, so that parent and child know which shared memory to use in order to communicate with each other.
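Here is a rough sketch of that second option, using POSIX shared memory (shm_open/mmap); the name /ipc_demo_<pid> and the single int flag are made up for illustration, and on older glibc you may need to link with -lrt:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* One segment per parent/child pair: embed the parent's PID in the
       name ("/ipc_demo_<pid>" is an arbitrary choice for this sketch). */
    char name[64];
    snprintf(name, sizeof(name), "/ipc_demo_%d", (int)getpid());

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); exit(1); }
    if (ftruncate(fd, sizeof(int)) == -1) { perror("ftruncate"); exit(1); }

    int *flag = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (flag == MAP_FAILED) { perror("mmap"); exit(1); }
    *flag = 0;

    pid_t pid = fork();
    if (pid == 0) {                 /* child: set the flag the parent checks */
        *flag = 1;
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("flag set by the child: %d\n", *flag);

    munmap(flag, sizeof(int));
    close(fd);
    shm_unlink(name);               /* each pair cleans up its own segment */
    return 0;
}
```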
I've tried to look this up, but I'm struggling a bit to understand the relation between the Parent Process and the Child Process immediately after I call fork().
Are they completely separate processes, only associated by the id/parent id? Or do they share memory? For example, the 'code' section of each process - is that duplicated so that each process has its own identical copy, or is it 'shared' in some way so that only one copy exists?
I hope that makes sense.
In the name of full disclosure this is 'homework related'; while not a direct question from the book, I have a feeling it's mostly academic and, in practice, I probably don't need to know.
As it appears to the process, the entire memory is duplicated.
In reality, it uses a "copy on write" system. The first time either process changes its memory after fork(), a separate copy is made of the modified page (usually 4 kB).
Usually the code segment of a process is not modified, in which case it remains shared.
Logically, a fork creates an identical copy of the original process that is largely independent of the original. For performance reasons, memory is shared with copy-on-write semantics, which means that unmodified memory (such as code) remains shared.
File descriptors are duplicated, so that the forked process could, in principle, take over a database connection on behalf of the parent (or they could even jointly communicate with the database if the programmer is a bit twisted). More commonly, this is used to set up pipes between processes so you can write find -name '*.c' | xargs grep fork.
A bunch of other stuff is shared. See here for details.
One important omission is threads — the child process only inherits the thread that called fork(). This causes no end of trouble in multithreaded programs, since the status of mutexes, etc., that were locked in the parent is implementation-specific (and don't forget that malloc() and printf() use locks internally). The only safe thing to do in the child after fork() returns is to call execve() as soon as possible, and even then you have to be cautious with file descriptors. See here for the full horror story.
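As a sketch of that advice, the helper below (spawn is a made-up name, not a standard API) does nothing between fork() and exec except async-signal-safe calls, which is the pattern that stays safe even in a multithreaded parent; posix_spawn() packages the same idea if you prefer a library routine:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn an external command: the child does nothing between fork() and
   exec except async-signal-safe calls, which sidesteps the locked-mutex
   problem described above. */
static pid_t spawn(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);          /* replaces the child's program image */
        _exit(127);                 /* exec failed; _exit is async-signal-safe */
    }
    return pid;                     /* parent: child's PID, or -1 on error */
}

int main(void) {
    char *argv[] = { "echo", "spawned safely", NULL };
    pid_t pid = spawn("/bin/echo", argv);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return 0;
}
```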
They are separate processes i.e. the Child and the Parent will have separate PIDs
The child will inherit all of the open descriptors from the Parent
Internally, the pages that can be modified (the stack and heap regions, unlike the .text region) are shared between the parent and the child until one of them tries to modify the contents. At that point a new page is allocated, the data from the page being modified is copied into it, and the fresh page is mapped into the address space of whichever process (parent or child) caused the change. This is called COW (copy-on-write, mentioned by other members in their answers above).
The child can finish execution and, until it is reclaimed by the parent using the wait() or waitpid() calls, will be in the ZOMBIE state. Reaping it clears the child's entry from the process table and allows its PID to be reused. Usually, when a child dies, the SIGCHLD signal is sent to the parent, which would ideally install a handler and call wait() or waitpid() from it.
In case the parent exits without cleaning up its still running or zombie children (via the wait()/waitpid() calls), the init process (PID 1) becomes the parent of these now orphaned children and reaps them with wait()/waitpid() calls of its own.
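A small sketch of that reaping, with a SIGCHLD handler that calls waitpid() in a loop so no zombies accumulate (WNOHANG keeps the handler from blocking):

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every terminated child so no zombies accumulate; WNOHANG makes
   the loop stop as soon as there is nothing left to collect. */
static void on_sigchld(int sig) {
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    sigaction(SIGCHLD, &sa, NULL);

    if (fork() == 0)
        _exit(0);                   /* child exits, parent receives SIGCHLD */

    sleep(1);                       /* by now the handler has reaped the child */
    printf("no zombie left behind\n");
    return 0;
}
```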
HTH
Yes, they are separate processes, but with some special "properties". One of them is the child-parent relation.
But more important is the sharing of memory pages in a copy-on-write (COW) manner: until one of them performs a write (to a global variable or whatever) on a page, the memory pages are shared. When a write is performed, a copy of that page is created by the kernel and mapped at the right address.
The COW magic is done in the kernel by marking the pages as read-only and using the page-fault mechanism.
So I have an application which uses threads. Now when the program first starts up, I want it to go through setting up database connections and whatnot before it backgrounds itself so that whatever/whoever starts the program can know if there was an error starting up.
I did some looking around and have found some resources that say 'do not mix fork and threads', while others say that forking in Linux will only duplicate the main thread and leave the others alone.
In the case of the latter (where it just duplicates the main thread), how then do the threads access file level (global) variables? Will the threads not be able to access the variables that are now in the forked process's address space?
Ultimately the goal is to have the application background itself after threads have been created. If this is not possible, I can put the fork before the thread creation, just would like to do it as late as possible.
Note: at the time of the fork, the threads will be doing a sleep() loop until the main thread puts data into a shared variable for them to process. So if the sleep gets interrupted, they won't be harmed.
There is no way to duplicate threads as part of the fork, and the parent's threads will all terminate when the parent exits, so even if they could access the child's memory, it wouldn't help you. You need to either create your threads after forking, or use pthread_atfork to register handlers that will recreate them in the child process. I would recommend just waiting until after forking to create your threads since it's a lot simpler and more efficient.
Why is it that you want to delay forking as long as possible? If you want to maintain connection to a terminal or something until initialization is finished, you can just have the parent process wait to terminate until the child process (with its threads) is done initializing and ready to be "in the background". Various synchronization tools could be used to accomplish this. One simple one would be opening a pipe through which the child sends its output back to the parent to display; the parent could simply exit when it receives EOF on this pipe.
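A rough sketch of that pipe-based handshake; the init_ok flag and the error message are stand-ins for your real database setup. The child closes the write end once initialization succeeds, the parent sees EOF and exits cleanly, and any error text written before that makes the parent report the failure instead:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }

    if (pid > 0) {                          /* parent: wait for the child's verdict */
        close(fds[1]);
        char buf[128];
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) {                        /* child reported an init error */
            buf[n] = '\0';
            fprintf(stderr, "startup failed: %s\n", buf);
            exit(1);
        }
        exit(0);                            /* EOF: initialization succeeded */
    }

    /* child: becomes the background process */
    close(fds[0]);
    setsid();                               /* detach from the controlling terminal */

    /* ... open database connections, create worker threads, etc. ... */
    int init_ok = 1;                        /* stand-in for real initialization */

    if (!init_ok) {
        const char *msg = "could not connect to the database";
        write(fds[1], msg, strlen(msg));
        _exit(1);
    }
    close(fds[1]);                          /* EOF tells the parent all is well */

    /* the main service loop would run here */
    return 0;
}
```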
Forking a process creates two different processes and threads in one process will not be able to access memory in the second process. If you want different processes to access the same memory, you want something called shared memory.
When a thread in a process calls fork(), a new process is created by copying, among other things, (1) the full address space of the process and (2) the (one) thread that called fork. If there are other threads in the process, they don't get copied. This will almost certainly lead to bugs in your program. Hence the advice not to mix threads and forks.
If you want to create a background process with many threads, you must fork it before spawning any other thread. Then, the two processes behave normally, like any two isolated processes: threads within one process share the same memory, but your background threads and your foreground process won't share any memory (by default).
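And a minimal sketch of that ordering (fork first, threads afterwards); the worker function and the thread count are arbitrary, and you would compile this with -pthread:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative worker: all threads share the child's memory as usual. */
static void *worker(void *arg) {
    printf("worker %ld running in pid %d\n", (long)arg, getpid());
    return NULL;
}

int main(void) {
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }
    if (pid > 0)
        exit(0);                    /* parent exits: the child is now in the background */

    setsid();                       /* child: new session, no controlling terminal */

    /* Only now, after the fork, create the threads. */
    pthread_t tids[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```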