I have the following question:
I created a shared memory segment (in my main.c), containing multiple structures, a few variables etc. Right after that, I am
- creating a pipe, and
- fork()-ing.
I am making both the child and the parent process communicate through the pipe, whose file descriptors are both stored in a global structure saved in the shared memory segment.
Now I read that for elements contained in a shared memory segment, after forking, both processes can manipulate the shared variables and structures, and that the other process sharing the memory would thereby have access to the same, manipulated data. So far, so good!
My question is not a source code issue; it is more of a theoretical point I seem to be missing. My code is working exactly the way it should, but I don't understand why it works:
After forking, I make each process close its irrelevant (for my purposes) side of the pipe (e.g. the parent closes the reading side of the pipe, the child the writing side). However, pipe_fd[2] is stored in the global struct in the SHM segment, and each process closes its end through that struct:
close(nameOfSHMStruct->pipe_fd[0]);
and
close(nameOfSHMStruct->pipe_fd[1]);
So how come, if one end is closed by one process and the other end by the other process, yet both access them through the same shared struct, they are still able to communicate with each other? Am I missing something about the pipe() call, or is it something with the SHM, or with the fork(), or some combination of all three? As I said already, the code actually works this way; I'm printing the data exchanged between the processes as debug messages, but I just don't really get the core theoretical aspect behind the way it functions...
They are able to communicate because each process only closes its own descriptors for the pipe. I will explain in more detail:
FATHER PROCESS ----> fork() ---->>> FATHER PROCESS
pipe() -> pipe_fd[2]        |       pipe_fd[2]  (father's copies of the fds)
                            |
                            ---->>> CHILD PROCESS
                                    pipe_fd[2]  (child's copies of the fds)
A fork clones the father process, including the file descriptors: the child owns a copy of the file descriptors of its father. So after the fork, each process has its own pair of pipe descriptors.
So, considering this, you should not store the pipe file descriptors in a shared memory structure, because the same values refer to conceptually different descriptors in the father and in the child.
There is more info here and here.
It would be helpful to see more of the code, but I'll take a guess.
The pipe_fd array created with the call to pipe() is copied to the child process upon fork(). The values stored in it are just small integers that index each process's own file descriptor table, so calling close() on a value taken from the shm struct only closes that entry in the calling process's table, not in the other process's.
An easier way of looking at it is: all you've placed in that shm object are descriptor numbers. Both processes see the same numbers, but each number refers to that process's own copy of the descriptor, so one process closing its end does not close the other process's copy.
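Below is a minimal, self-contained sketch of the situation (not the asker's code: the struct and names are made up and error handling is omitted). It shows that the struct in shared memory only holds descriptor numbers, and that close() in one process only affects that process's own descriptor table.

    /* Hedged sketch: the struct in shared memory only holds integer
     * descriptor numbers; close() acts on the calling process's own
     * descriptor table, so each side can drop its unused end without
     * affecting the other. Assumes Linux/POSIX; no error handling. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct shared {            /* hypothetical stand-in for the asker's struct */
        int pipe_fd[2];
    };

    int main(void)
    {
        struct shared *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pipe(shm->pipe_fd);

        if (fork() == 0) {                     /* child: reader */
            close(shm->pipe_fd[1]);            /* closes only the child's copy */
            char buf[32] = {0};
            read(shm->pipe_fd[0], buf, sizeof buf - 1);
            printf("child read: %s\n", buf);
            _exit(0);
        }

        close(shm->pipe_fd[0]);                /* closes only the parent's copy */
        write(shm->pipe_fd[1], "hello", 5);
        close(shm->pipe_fd[1]);
        wait(NULL);
        return 0;
    }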
Related
I am writing a rudimentary shell program in C which uses a parent process to handle shell events and fork() to create child processes that call execv on another executable (also C).
I am trying to keep a process counter on the parent process. And as such I thought of the possibility of creating a pointer to a variable that keeps track of how many processes are running.
However, that seems to be impossible since the arguments execv (and the program executed by it) takes are of type char * const argv[].
I have tried to keep track of the amount of processes using mmap for shared memory between processes, but couldn't get that to work since after the execv call the process simply dies and doesn't let me update the process counter.
In summary, my question is: Is there a way for me to pass a pointer to an integer on an execv call to another program?
Thank you in advance.
You cannot meaningfully pass a pointer from one process to another because the pointer is meaningless in the other process. Each process has its own memory, and the address is relative to that memory space. In other words, the virtual memory manager lets every process pretend it has the entire machine's memory; other processes are simply invisible.
However, you do have a few options for setting up communications between related processes. The most obvious one is a pipe, which you've presumably already encountered. That's more work, though, because you need to make sure that some process is always listening for pipe communications.
Another simple possibility is to just leave a file descriptor open when you fork and exec (see the close-on-exec flag for how to control which descriptors survive exec); although mmap mappings are not preserved by exec, you can remap the memory from the open fd in the child process. If you don't want to pass the fd, you can mmap the memory to a temporary file and use an environment variable to record the name of the temporary file.
Another possibility is POSIX shared memory. Again, you might want to communicate the shm name through an environment variable rather than hard-coding it into the application.
Note that neither shared mmaps nor shared memory are atomic. If you're incrementing a counter, you'll need to use some locking mechanism to avoid race conditions.
For possibly a lot more information than you really wanted, you can read ESR's overview of interprocess communication techniques in Chapter 7 of The Art of Unix Programming.
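Not a definitive implementation, just a hedged sketch of the env-var plus POSIX-shm approach: the segment name (made up here) is published through an environment variable so it survives execv, and the counter is incremented with a C11 atomic to avoid the race mentioned above. To stay self-contained the program re-execs itself to play the role of the separate executable; assumes Linux/POSIX, and on older glibc you may need to link with -lrt.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SHM_NAME "/proc_counter_demo"      /* hypothetical segment name */

    int main(int argc, char *argv[])
    {
        if (argc > 1 && strcmp(argv[1], "child") == 0) {
            /* "Child" after execv: find the segment through the environment. */
            const char *name = getenv("COUNTER_SHM");
            int fd = shm_open(name, O_RDWR, 0600);
            atomic_int *counter = mmap(NULL, sizeof *counter,
                                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            atomic_fetch_add(counter, 1);      /* race-free if atomic_int is lock-free */
            return 0;
        }

        /* Parent: create and size the segment, then publish its name. */
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(atomic_int));
        atomic_int *counter = mmap(NULL, sizeof *counter,
                                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        atomic_store(counter, 0);
        setenv("COUNTER_SHM", SHM_NAME, 1);

        if (fork() == 0) {
            /* exec keeps the environment; assumes the program was started
             * with a usable path, e.g. ./counter */
            execl(argv[0], argv[0], "child", (char *)NULL);
            _exit(1);
        }
        wait(NULL);
        printf("counter after child ran: %d\n", atomic_load(counter));

        shm_unlink(SHM_NAME);                  /* clean up the segment name */
        return 0;
    }

In a real shell, the child would be a separate executable that reads the same environment variable and opens the same segment name.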
Suppose I have a main process running and in its execution it has initialized some pointers and created some instances of a predefined structure.
Now if I fork this main process, is separate memory allocated for the pointers? And are duplicate instances of the previously existing variables and data structures created for this new process?
As an example of my requirement consider -
struct CKT
{
    /* ... */
};

int main()
{
    /* ...some computations with the structure and other pointers... */
    pid_t pid = fork();
    if (pid == 0) // child
    {
        // ...some more computations with the structure... but I need a
        // separate instance of it, with all the pointers in it as well
    }
    else if (pid > 0) // parent
    {
        // ...working with the original instance of the structure...
    }
    // merging the child process with the parent...
    // after reading the child process's structure data...
    // and considering a few cases...
}
Can anyone explain how I achieve this?
Yes, in theory the fork system call will duplicate, among other things, the stack of the parent. In practice, however, a common technique named copy-on-write is used in that case.
It consists of copying a given parent's memory page only when the child process tries to modify that page. This reduces the cost of the fork system call.
The one thing that is not copied is the return value of fork: 0 in the child, and the PID of the child in the father.
Yes. It might not copy the memory space of the old process immediately, though. The OS will use copy-on-write where possible, copying each memory page the first time it is modified in either process.
COW is what makes one common use of fork (shortly followed by an exec in the child) efficient. The child process never actually uses most of the memory space inherited from the parent.
The copies in the new process will have exactly the same numeric addresses as they did in the old process, so all the pointers from the old process remain valid in the new process and point to the new process's objects. That's part of the point of virtual memory: it allows different processes to refer to different physical memory using the same pointer value.
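As a tiny illustration of this point (a simplified sketch only, reusing the question's CKT structure with a made-up field):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    struct CKT { int nodes; };

    int main(void)
    {
        struct CKT ckt = { .nodes = 1 };
        struct CKT *p = &ckt;              /* same numeric address in both processes */

        pid_t pid = fork();
        if (pid == 0) {                    /* child works on its private copy */
            p->nodes = 42;
            printf("child  sees nodes = %d\n", p->nodes);   /* 42 */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("parent sees nodes = %d\n", p->nodes);        /* still 1 */
        return 0;
    }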
Both the pointers and the memory they point to will be duplicated for the forked child.
All kinds of data (pointers, memory, variables) are duplicated into separate memory for the child process created with fork(), and you cannot change the parent's pointers or memory contents from the child directly.
But you can change a variable of the parent process from the child process by using shared memory.
Refer to this link to see how to do it: How to share memory between process fork()?
Yes, your forked process receives copies of all privately mapped memory (the default for memory obtained via malloc and calloc, stack frames, and global variables).
Your child receives shared copies of all open file descriptors. This means those file descriptors will remain valid and open until both parent and child close them, and seek positions on those descriptors are also shared. If you wish to make a file descriptor child-private, you will have to reopen it. Otherwise it is strongly recommended to close all file descriptors you don't need in the child immediately after forking.
Your child will receive the same MAP_SHARED memory mappings. Those will continue to access the same physical memory, shared between parent and child. This applies to all shared memory acquired through the shm* family of calls and to mmap with MAP_SHARED.
Your child will not receive any mappings marked with the MADV_DONTFORK flag via madvise. Those will become invalid in the child. This is not default behavior and you do not have to worry about it unless it is explicitly used.
You might get the result you are looking for by using a shared memory segment. Use the mmap system call to create a shared memory segment, and put all your shared structures in that segment. Since you cannot use malloc inside this segment (the syscall returns a pointer to the whole segment), you must copy the structures into it manually and do the shared memory usage tracking yourself.
Perhaps you can allocate your data locally first, evaluate how much memory it uses, and then do the shared memory allocation with the correct size. It is also possible to reallocate the shared segment to a bigger size, in which case you will have to signal the realloc somehow from one end to the other (maybe by using the first integer pointed to by the shared mapping to store that value?). A rough sketch of this idea follows the man page links below.
man pages:
mmap
munmap
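Here is that sketch: one anonymous shared mapping created before fork(), with a deliberately tiny hand-rolled bump allocator so several structures can be placed inside it. shm_alloc() is a made-up helper, not a library call, and error handling is omitted; Linux/POSIX assumed.

    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define SEG_SIZE 4096

    static unsigned char *seg;   /* the whole shared segment                   */
    static size_t *used;         /* first bytes track how much is handed out   */

    static void *shm_alloc(size_t n)
    {
        void *p = seg + *used;
        *used += (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        return p;
    }

    struct point { int x, y; };

    int main(void)
    {
        seg  = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        used = (size_t *)seg;
        *used = 16;                        /* reserve the first 16 bytes for the counter */

        struct point *a = shm_alloc(sizeof *a);   /* both live in shared memory */
        struct point *b = shm_alloc(sizeof *b);

        if (fork() == 0) {                 /* the child's writes are seen by the parent */
            a->x = 1; a->y = 2;
            b->x = 3; b->y = 4;
            _exit(0);
        }
        wait(NULL);
        printf("a=(%d,%d) b=(%d,%d)\n", a->x, a->y, b->x, b->y);  /* 1,2 3,4 */
        munmap(seg, SEG_SIZE);
        return 0;
    }

Because the mapping is created before fork(), the pointers a and b have the same numeric value in both processes, so they can be stored inside the segment itself if needed.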
I've tried to look this up, but I'm struggling a bit to understand the relation between the Parent Process and the Child Process immediately after I call fork().
Are they completely separate processes, only associated by the id/parent id? Or do they share memory? For example, is the 'code' section of each process duplicated so that each process has its own identical copy, or is it 'shared' in some way so that only one copy exists?
I hope that makes sense.
In the name of full disclosure this is 'homework related'; while not a direct question from the book, I have a feeling it's mostly academic and, in practice, I probably don't need to know.
As it appears to the process, the entire memory is duplicated.
In reality, it uses a "copy on write" system. The first time either process changes its memory after fork(), a separate copy is made of the modified page (usually 4 kB).
Usually the code segment of a process is not modified, in which case it remains shared.
Logically, a fork creates an identical copy of the original process that is largely independent of the original. For performance reasons, memory is shared with copy-on-write semantics, which means that unmodified memory (such as code) remains shared.
File descriptors are duplicated, so that the forked process could, in principle, take over a database connection on behalf of the parent (or they could even jointly communicate with the database if the programmer is a bit twisted). More commonly, this is used to set up pipes between processes so you can write find -name '*.c' | xargs grep fork.
A bunch of other stuff is shared. See here for details.
One important omission is threads — the child process only inherits the thread that called fork(). This causes no end of trouble in multithreaded programs, since the status of mutexes, etc., that were locked in the parent is implementation-specific (and don't forget that malloc() and printf() use locks internally). The only safe thing to do in the child after fork() returns is to call execve() as soon as possible, and even then you have to be cautious with file descriptors. See here for the full horror story.
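For the pipeline example mentioned above, this is roughly what a shell does with fork(), dup2() and exec (a simplified sketch using ls | wc -l rather than the find/xargs pair, with error handling omitted):

    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pipe(fd);

        if (fork() == 0) {                /* left side: ls */
            dup2(fd[1], STDOUT_FILENO);   /* stdout now feeds the pipe */
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            _exit(127);
        }
        if (fork() == 0) {                /* right side: wc -l */
            dup2(fd[0], STDIN_FILENO);    /* stdin now reads from the pipe */
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            _exit(127);
        }

        close(fd[0]); close(fd[1]);       /* the parent must close both ends too */
        wait(NULL); wait(NULL);
        return 0;
    }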
They are separate processes, i.e. the Child and the Parent will have separate PIDs.
The child will inherit all of the open descriptors from the Parent.
Internally, the pages that can be modified (the stack and heap regions, unlike the .text region) are shared between the Parent and the Child until one of them tries to modify the contents. When that happens, a new page is created, the data of the page being modified is copied to this freshly allocated page, and the page is mapped into the address space of whichever process caused the change, either the Parent or the Child. This is called COW (mentioned by other members in this forum in their answers above).
The Child can finish execution and, until reclaimed by the parent using the wait() or waitpid() calls, will be in the ZOMBIE state. Reaping it clears the child's entry from the process table and allows the child's PID to be reused. Usually when a child dies, the SIGCHLD signal is sent to the parent, which would ideally result in a handler being called, and the wait() call is then executed in that handler (see the sketch after this answer).
In case the Parent exits without cleaning up the already running or zombie children (via the wait()/waitpid() calls), the init process (PID 1) becomes the parent of these now-orphaned children. The init process executes wait()/waitpid() calls at regular intervals.
HTH
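A minimal sketch of the SIGCHLD reaping pattern described above (simplified; only async-signal-safe calls such as waitpid() belong in the handler):

    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void reap(int sig)
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                                  /* reap every exited child */
    }

    int main(void)
    {
        struct sigaction sa;
        sa.sa_handler = reap;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
        sigaction(SIGCHLD, &sa, NULL);

        if (fork() == 0)
            _exit(0);                          /* child exits immediately */

        sleep(1);                              /* parent keeps doing other work; */
        puts("child was reaped by the handler"); /* the handler cleaned it up    */
        return 0;
    }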
Yes, they are separate processes, but with some special "properties". One of them is the child-parent relation.
But more important is the sharing of memory pages in a copy-on-write (COW) manner: until one of them performs a write (to a global variable or whatever) on a page, the memory pages are shared. When a write is performed, a copy of that page is created by the kernel and mapped at the right address.
The COW magic is done in the kernel by marking the pages as read-only and using the page-fault mechanism.
I have used fork() to create 2 different processes operating on 2 different address spaces.
Now, in the parent process I need the value of a variable from the child's address space, or alternatively I need the child process to be able to modify a variable in the parent's address space.
Is this possible?
No, once you've forked, each process gets its own address space and you'll have to look into either:
some form of IPC between the processes to access each other's data (such as shared memory or message queues).
some more lightweight variant of fork that allows sharing of data (possibly including threading).
Once you have two processes, sharing data needs interprocess communication: file, pipe or shared memory.
If you mean exchanging data between these two processes directly, you cannot. You can do it through system APIs such as shared memory, message passing, pipes, sockets, ...
As you have created two processes using fork(), both processes will be in different address spaces, so they can only communicate via IPC: message passing, pipes, shared memory, etc. Otherwise one process can't access the other process's data, since each has process-specific data.
Similarly, threads have thread-specific data.
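For completeness, here is a minimal sketch of the simplest of those options, a pipe, where the child hands one computed value back to the parent (error handling omitted):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pipe(fd);

        if (fork() == 0) {                 /* child */
            close(fd[0]);
            int result = 6 * 7;            /* stand-in for the child's variable */
            write(fd[1], &result, sizeof result);
            close(fd[1]);
            _exit(0);
        }

        close(fd[1]);                      /* parent */
        int value = 0;
        read(fd[0], &value, sizeof value);
        close(fd[0]);
        wait(NULL);
        printf("parent received %d from the child\n", value);
        return 0;
    }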
I know about "mmap", but as far as I know, if I want to share memory allocated by a parent process and have it accessed by a child process, I need to create a temporary file.
But this file will continue to exist even after the processes die.
I was educated to never leave garbage behind. Both in real life and in programming.
The solution should work on Linux, FreeBSD and Solaris.
This article is a very good starting point for shared memory.
However, I'd generally recommend that you use a pipe instead, so as to avoid race conditions and all the mental overhead of concurrency. If you spawn the child process with its stdin and stdout redirected to pipes, those are file descriptors that you can write to and read from in the parent process.
mmap usually supports an "anonymous" mode which does not create any garbage. It works on Linux; according to the man page it also works on Solaris; I am not sure about FreeBSD - look at the man page and search for MAP_ANON or MAP_ANONYMOUS.
Named pipes should do for you, as mentioned above. You can use two pipes (if possible):
parentpid_childpid -- Parent writes and child reads
childpid_parentpid -- Child writes and parent reads
This works for me. Please mention if you have any special scenarios to consider.
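A rough sketch of the two-FIFO scheme (with fixed, made-up paths instead of PID-derived names, and the FIFOs unlinked at the end so nothing is left behind; no error handling):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define P2C "/tmp/demo_parent_to_child"   /* hypothetical paths */
    #define C2P "/tmp/demo_child_to_parent"

    int main(void)
    {
        mkfifo(P2C, 0600);
        mkfifo(C2P, 0600);

        if (fork() == 0) {                    /* child */
            char buf[16] = {0};
            int in  = open(P2C, O_RDONLY);    /* blocks until the parent opens its end */
            int out = open(C2P, O_WRONLY);
            read(in, buf, sizeof buf - 1);
            write(out, "pong", 4);
            close(in); close(out);
            _exit(0);
        }

        int out = open(P2C, O_WRONLY);        /* parent */
        int in  = open(C2P, O_RDONLY);
        write(out, "ping", 4);
        char buf[16] = {0};
        read(in, buf, sizeof buf - 1);
        printf("parent got: %s\n", buf);
        close(in); close(out);
        wait(NULL);
        unlink(P2C);                          /* clean up the FIFO entries */
        unlink(C2P);
        return 0;
    }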
Allocate Memory in Parent.
Use this memory both in parent and Child.
Give the responsibility of freeing the memory to the parent.
Make the parent wait (wait system call) on the child.
Parent frees the memory before exiting.
Alternatively, to be on the safer side, before the child exits check whether the parent is still alive.
If not, free the memory in the child itself. But this won't work if there can be multiple children.
You can use the first few bytes of the memory to track the number of processes using it. Whenever a process starts using the memory it increments this count, and before exiting it decrements the count. If the count becomes 0, free the memory.
There is yet another method. Write a function ownMalloc which sits on top of the system malloc (or whichever function you use). It keeps track of all allocated memory and of the processes which use it, periodically goes through the different allocated chunks, and frees the chunks not in use.
Use POSIX shm_open(3) and related functions to map memory not backed by a file. The file descriptor returned by shm_open() is closed when parent and child exit, but note that the shared memory object itself persists until it is shm_unlink()ed (or the system reboots), so unlink it as soon as it has been mapped if you don't want to leave anything behind.
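A small sketch of that approach, calling shm_unlink() immediately after shm_open() so the name never outlives the processes (the segment name is made up; link with -lrt on older glibc; no error handling):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo_no_garbage", O_CREAT | O_RDWR, 0600);
        shm_unlink("/demo_no_garbage");        /* name gone; the fd still works */
        ftruncate(fd, sizeof(int));

        int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
        close(fd);                             /* the mapping outlives the fd */
        *shared = 0;

        if (fork() == 0) {                     /* the child sees and updates it */
            *shared = 123;
            _exit(0);
        }
        wait(NULL);
        printf("parent sees %d\n", *shared);   /* prints 123 */
        return 0;
    }

Because the name is unlinked right away, nothing is left behind even if both processes crash; the kernel frees the segment once the last mapping goes away.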
Allocate the memory in the parent process and wait until the child process finishes (or have the parent do other tasks in the meantime). Once the child process is done, control returns to the parent process; deallocate the memory there.
If the parent process needs to be stopped before the child (the child is said to be orphaned in this case), then use exec(), which will run the child as a new program.