Say I have a pointer to some structure in a thread, and I want to pass it to the parent process via a pipe.
Example:
MyType * someType;
I then want to cast someType to void * and put it on the pipe. How can it be done?
While you can physically pass a pointer to a parent process, the value would be meaningless to the process and your best case scenario would be an immediate crash. Pointers indicate an address of an object in memory. This address will only be valid in the context of the child process and will point to a completely different object in the parent process.
You'll need to do one of the following to make this scenario work:
Pass the entire object across the pipe in some serialized form.
Pass a pointer relative to the shared memory base between the processes and do the appropriate fixup in the parent process.
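For the first option, here is a minimal sketch, assuming the object is a flat struct with no pointers inside it. The type struct my_type and the function receive_from_child are made up for illustration; a real MyType with pointer members would need proper serialization instead of a raw byte copy.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical payload type. It holds no pointers, so its bytes stay
   meaningful in any process built from the same program. */
struct my_type {
    int  id;
    char name[16];
};

/* Forks a child that fills in a struct my_type and writes its bytes to a
   pipe; the parent reads them into *out. Returns 0 on success, -1 on error. */
int receive_from_child(struct my_type *out)
{
    assert(out != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                       /* child: the sender */
        struct my_type obj;
        memset(&obj, 0, sizeof obj);      /* also zeroes any padding bytes */
        obj.id = 42;
        strcpy(obj.name, "from child");
        close(fds[0]);
        ssize_t n = write(fds[1], &obj, sizeof obj);
        _exit(n == (ssize_t)sizeof obj ? 0 : 1);
    }

    close(fds[1]);                        /* parent: the receiver */
    size_t got = 0;
    while (got < sizeof *out) {           /* read() may return short counts */
        ssize_t n = read(fds[0], (char *)out + got, sizeof *out - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return got == sizeof *out ? 0 : -1;
}
```

The parent ends up with its own copy of the object at its own address, which is the point: the bytes travel, the pointer does not.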
EDIT
Note, my answer was written when the question was asking about how to pass a pointer between a child and parent process. It was later updated to threads.
Another option would be to store the object in shared memory, then pass the segment ID to the parent process.
The parent can then attach to the memory and access/modify the object.
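A rough sketch of that idea, assuming System V shared memory (shmget/shmat) and a plain int as the "object". The function name read_via_segment_id is invented for this example, and error handling is kept to the bare minimum.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child creates a segment, stores an int in it, and sends only the
   segment ID through a pipe. The parent attaches with that ID and reads
   the int into *value_out. Returns 0 on success, -1 on failure. */
int read_via_segment_id(int *value_out)
{
    assert(value_out != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                                   /* child */
        close(fds[0]);
        int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
        if (id == -1)
            _exit(1);
        int *obj = shmat(id, NULL, 0);
        if (obj == (void *)-1)
            _exit(1);
        *obj = 1234;                                  /* the object lives in the segment */
        shmdt(obj);
        ssize_t n = write(fds[1], &id, sizeof id);    /* only the ID crosses the pipe */
        _exit(n == (ssize_t)sizeof id ? 0 : 1);
    }

    close(fds[1]);                                    /* parent */
    int id = -1;
    ssize_t n = read(fds[0], &id, sizeof id);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof id)
        return -1;

    int *obj = shmat(id, NULL, 0);                    /* may map at a different address */
    if (obj == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *value_out = *obj;
    shmdt(obj);
    shmctl(id, IPC_RMID, NULL);                       /* segments persist until removed */
    return 0;
}
```

Note that System V segments outlive the processes that created them, so whoever is last to use the segment should remove it.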
This gives some background:
http://fscked.org/writings/SHM/shm.html
Reading between the lines in this and the other question you cite, you are using "parent process" to refer to the main process thread, and "thread" to refer to a new thread created within that same process. This is causing confusion both for your thinking about the problem and for others trying to answer the questions.
In this scenario, there is exactly one process, and it has two threads. The first thread was created for you when the OS started the process. The second was created deliberately by the first. You have decided to use pipes to communicate between these threads.
First, I'd agree with a lot of the answers on the other question that pipes are a bit of a heavy-weight solution to inter-thread communication since they were designed to handle inter-process communication. That said, they will work.
Second, be aware that you won't be able to meaningfully move a pointer between processes. Pointers are only valid within a single process. Even pointers to shared memory have issues since the shared memory regions might be mapped to different virtual addresses in each process. Since it looks like both ends of the pipe are in the same process, this isn't an issue, but if that weren't true, it would be a big issue.
With all of that in mind, you then just need to agree with yourself on a representation for the pointer. The simplest answer is to just write sizeof(void *) bytes to the pipe. When read out, you put those bytes back into a pointer variable, and cast back to the real type. Your surrounding protocol must know what that type is, of course.
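As a sketch of that representation, assuming both ends of the pipe are in the same process: the helpers send_ptr and recv_ptr below are hypothetical names, not part of any library. One thread would call send_ptr on the write end and the other recv_ptr on the read end.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes the pointer value itself (not the object it points to) to fd.
   Returns 0 on success, -1 on error or short write. */
int send_ptr(int fd, void *p)
{
    assert(fd >= 0);
    return write(fd, &p, sizeof p) == (ssize_t)sizeof p ? 0 : -1;
}

/* Reads sizeof(void *) bytes back into a pointer variable.
   Returns NULL on error or short read. */
void *recv_ptr(int fd)
{
    assert(fd >= 0);
    void *p = NULL;
    if (read(fd, &p, sizeof p) != (ssize_t)sizeof p)
        return NULL;
    return p;
}
```

The receiver would then write something like MyType *someType = recv_ptr(fd); and use it directly. The pointed-to object has to stay alive until the receiver is done with it, so it should not be a local variable of a thread that may already have returned.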
If you become tempted to let the two threads exist in separate processes, or to reuse this code to persist (checkpoint) your work in progress in a file, then you have a more complicated problem. Searching for discussions of data and state persistence, pickling, and marshaling will lead to things to think about.
Put it in shared memory, and pass a pointer relative to the shared memory base.
Pointer casting depends on how your compiler aligns objects. And remember that when computing relative pointer positions, pointer arithmetic is scaled by the sizeof the thing pointed to.
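A sketch of the relative-pointer fixup, under some assumptions: the helpers to_offset and from_offset are invented names, and offset_demo maps one temporary file twice in a single process as a stand-in for two processes that see the same shared memory at different addresses. Casting to char * first makes the arithmetic count bytes rather than elements.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Converts a pointer inside a mapping into a byte offset from its base. */
size_t to_offset(const void *base, const void *p)
{
    assert((const char *)p >= (const char *)base);
    return (size_t)((const char *)p - (const char *)base);
}

/* Converts a byte offset back into a pointer relative to this view's base. */
void *from_offset(void *base, size_t off)
{
    return (char *)base + off;
}

/* Stores 99 through the first view and returns what the second view reads
   at the same offset, or -1 on error. */
int offset_demo(void)
{
    const size_t len = 4096;
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) == -1) {
        fclose(f);
        return -1;
    }

    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (b == MAP_FAILED) {
        munmap(a, len);
        fclose(f);
        return -1;
    }

    int *in_a = (int *)(a + 64);          /* an object somewhere in the region */
    *in_a = 99;
    size_t off = to_offset(a, in_a);      /* this is what you would send */
    int *in_b = from_offset(b, off);      /* fixup against the other base */
    int result = (a != b) ? *in_b : -1;   /* the two views differ in address */

    munmap(a, len);
    munmap(b, len);
    fclose(f);
    return result;
}
```

The offset is an ordinary integer, so it can go over a pipe or sit inside the shared region itself without caring where each side mapped the memory.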
Related
I am writing a rudimentary shell program in C which uses a parent process to handle shell events and fork() to create child processes that call execv on another executable (also C).
I am trying to keep a process counter on the parent process. And as such I thought of the possibility of creating a pointer to a variable that keeps track of how many processes are running.
However, that seems to be impossible since the arguments execv (and the program executed by it) takes are of type char * const argv[].
I have tried to keep track of the amount of processes using mmap for shared memory between processes, but couldn't get that to work since after the execv call the process simply dies and doesn't let me update the process counter.
In summary, my question is: Is there a way for me to pass a pointer to an integer on an execv call to another program?
Thank you in advance.
You cannot meaningfully pass a pointer from one process to another because the pointer is meaningless in the other process. Each process has its own memory, and the address is relative to that memory space. In other words, the virtual memory manager lets every process pretend it has the entire machine's memory; other processes are simply invisible.
However, you do have a few options for setting up communications between related processes. The most obvious one is a pipe, which you've presumably already encountered. That's more work, though, because you need to make sure that some process is always listening for pipe communications.
Another simple possibility is to just leave a file descriptor open when you fork and exec (see the close-on-exec flag to see how to accomplish the latter); although mmap is not preserved by exec, you can remap the memory to the open fd in the child process. If you don't want to pass the fd, you can mmap the memory to a temporary file, and use an environment variable to record the name of the temporary file.
Another possibility is Posix shared memory. Again, you might want to communicate the shm name through an environment variable, rather than hard-coding it in to the application.
Note that neither shared mmaps nor shared memory make updates atomic. If you're incrementing a counter, you'll need to use some locking mechanism to avoid race conditions.
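Here is a sketch of the temporary-file variant with locking. The environment variable name COUNTER_FILE and the function names are made up, and fcntl() record locks are just one possible locking mechanism. To keep the example self-contained the children do not call execv; bump_counter only uses the environment and the file, both of which survive an exec, so an exec'ed program could run the same steps.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Blocks until the whole file is locked (F_WRLCK), or releases it (F_UNLCK). */
static int set_lock(int fd, short type)
{
    struct flock fl = {0};    /* l_start = 0 and l_len = 0: the entire file */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLKW, &fl);
}

/* What a separately exec'ed program would do: find the file through the
   environment, map it, and increment the counter while holding the lock. */
static int bump_counter(void)
{
    const char *path = getenv("COUNTER_FILE");   /* hypothetical variable name */
    if (path == NULL)
        return -1;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    int *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int rc = set_lock(fd, F_WRLCK);
    if (rc == 0) {
        *counter += 1;
        set_lock(fd, F_UNLCK);
    }
    munmap(counter, sizeof *counter);
    close(fd);
    return rc;
}

/* Forks nchildren children that each bump the counter once.
   Returns the final count, or -1 on setup failure. */
int count_children(int nchildren)
{
    assert(nchildren >= 0);
    char path[] = "/tmp/counterXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    int total = -1;
    int *counter = MAP_FAILED;
    if (ftruncate(fd, (off_t)sizeof *counter) == 0 &&   /* zero-filled, so the count starts at 0 */
        setenv("COUNTER_FILE", path, 1) == 0)
        counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (counter != MAP_FAILED) {
        for (int i = 0; i < nchildren; i++)
            if (fork() == 0)
                _exit(bump_counter() == 0 ? 0 : 1);
        while (wait(NULL) > 0)
            ;                                           /* reap every child */
        total = *counter;
        munmap(counter, sizeof *counter);
    }
    close(fd);
    unlink(path);
    return total;
}
```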
For possibly a lot more information than you really wanted, you can read ESR's overview of interprocess communication techniques in Chapter 7 of The Art of Unix Programming.
This is sort of a technical question, maybe you can help me if you know about C and UNIX (or maybe it is a really newbie question!)
A question came up today while analyzing some code in our Operating Systems course. We are learning what it means to "fork" a process in UNIX. We already know it creates a copy of the current process that runs in parallel with it, and that the two have separate data sections.
But then I thought that maybe, if one creates a variable and a pointer pointing at it before doing fork(), because the pointer stores the memory address of the variable, one could try to modify the value of that variable from the child process by using that pointer.
We tried a code similar to this in class:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>     /* fork() */

int main(void)
{
    int value = 0;
    int *pointer = &value;
    int status;
    pid_t pid;

    printf("Parent: Initial value is %d\n", value);
    pid = fork();
    switch (pid) {
    case -1: /* fork() failed */
        perror("fork");
        exit(EXIT_FAILURE);
    case 0: /* Child process */
        printf("\tChild: I'll try to change the value\n\tChild: The pointer value is %p\n", (void *)pointer);
        *pointer = 1;
        printf("\tChild: I've set the value to %d\n", *pointer);
        exit(EXIT_SUCCESS);
    }
    while (pid != wait(&status))
        ; /* Wait for the child process */
    printf("Parent: the pointer value is %p\nParent: The value is %d\n", (void *)pointer, value);
    return 0;
}
If you run it, you'll get something like this:
Parent: Initial value is 0
Child: I'll try to change the value
Child: The pointer value is 0x7fff733b0c6c
Child: I've set the value to 1
Parent: the pointer value is 0x7fff733b0c6c
Parent: The value is 0
It's obvious that the child process didn't affect the parent process at all. Frankly, I was expecting a "segmentation fault" error for accessing a memory address that isn't permitted. But what really happened?
Remember, I'm not looking for a way to communicate processes, that's not the point. What I want to know is what did the code do. Inside the child process, the change is visible, so it DID something.
My main hypothesis is that pointers are not absolute to memory, they are relative to the process' stack. But I haven't been able to find an answer (no one in class knew, and googling I just found some questions about process communication) so I'd like to know from you, hopefully someone will know.
Thanks for taking your time reading!
The key here is the concept of a virtual address space.
Modern processors (say, anything newer than an 80386) have a memory management unit which maps from a per-process virtual address space to physical memory pages under control of the kernel.
When the kernel sets up a process it creates a set of page table entries for that process that define the physical memory pages to virtual address space mapping, and it is in this virtual address space that the program executes.
Conceptually, when you fork, the kernel copies the existing process's pages to a new set of physical pages and sets up the new process's page tables so that, as far as the new process is concerned, it appears to be running in the same virtual memory layout as the original one had, while actually addressing entirely different physical memory.
The detail is more subtle as nobody wants to waste time copying hundreds of MB of data unless such is necessary.
When the process calls fork(), the kernel sets up a second set of page table entries (for the new process) but points them at the same physical pages as the original process. It then sets a flag in both sets of page table entries to make the MMU treat the pages as read-only.
As soon as either process writes to a page, the memory management unit generates a page fault (because the page table entry has the read-only flag set). The page fault handler then allocates a new page from physical memory, copies the data over, updates the page table entry and sets the page back to read/write.
In this way, pages are only actually copied the first time either process tries to change a copy-on-write page, and the sleight of hand goes completely unnoticed by either process.
Regards, Dan.
Logically, the fork()ed process gets its own, independent copy of more or less the whole state of the parent process. That couldn't work if pointers in the child referred to memory belonging to the parent.
The details of how a particular UNIX-like kernel makes that work can vary. Linux implements the child process's memory via copy-on-write pages, which makes fork()ing comparatively cheap relative to other possible implementations. In that case, the child's pointers really do refer to the same physical pages as the parent's, up until such time that either child or parent tries to modify that memory, at which time a copy is made so that each has its own. That all relies on the underlying virtual memory system. Other UNIX and UNIX-like systems can and have done it differently.
The child wrote through a pointer that is perfectly legal in its address space, because that address space is a copy of its parent's. There was no effect on the parent because the memory is not logically shared. Each process gets to go its separate way after the fork.
UNIX has a number of ways of creating shared memory (where one process can modify memory and have that modification seen by another process), but fork is not one of them. And it's a good thing because otherwise, synchronization between the parent and child would be almost impossible.
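To make the contrast concrete, here is a small sketch. It assumes MAP_ANONYMOUS is available (it is on Linux, the BSDs and macOS, though it is not in older POSIX editions), and fork_visibility is an invented name. The child writes to an ordinary variable and to an int in a shared mapping; only the second write is seen by the parent.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child sets both an ordinary variable and an int in a shared mapping
   to 1. On return, *private_seen and *shared_seen hold what the parent
   observes afterwards. Returns 0 on success, -1 on failure. */
int fork_visibility(int *private_seen, int *shared_seen)
{
    assert(private_seen != NULL && shared_seen != NULL);

    int ordinary = 0;
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        ordinary = 1;    /* lands in the child's private copy */
        *shared = 1;     /* lands in memory that both processes map */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    *private_seen = ordinary;
    *shared_seen = *shared;
    munmap(shared, sizeof *shared);
    return 0;
}
```

Both variables keep the same virtual address in parent and child; what differs is whether the kernel backs that address with the same physical page after a write.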
Let's say I have 2 processes and I have a variable I want to pass from the first one to the second one. I know I can declare a global variable and pass it by reference among different functions, but I don't know if it is possible to pass a variable among different processes.
I heard that each process is assigned its own portion of virtual memory and that one process cannot access another process' memory space. Is that true? Or is it actually possible for two processes to share a variable and therefore mutex mechanisms are needed?
I don't know if it is possible to pass a variable among different processes.
No, it is not possible, at least not in the classical sense of passing a variable. You have many options, though: inter-process communication can be done through shared memory (sometimes implemented through memory-mapped files), named pipes, etc.
Or is it actually possible for two processes to share a variable and therefore mutex mechanisms are needed?
Yes, you should look up shared memory, or more generally, IPC / interprocess communication.
If the second process is started from the first one, you could pass it as command line parameter.
Otherwise you should rely on some inter-process communication method (like Socket or FIFO, also known as named pipe).
You could have a look at this other post:
Interprocess Communication via file
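For the command-line route, a minimal sketch: the value is formatted as text, since argv only carries strings, and the started program parses it back. Here /bin/sh stands in for the second program and simply exits with the number it was given, so the value has to fit in an exit status (0 to 255). The function name pass_by_argv is made up.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts `sh -c 'exit "$1"' sh <value>` and returns the exit status the
   shell reports, which equals value if the argument arrived intact.
   Returns -1 on failure. */
int pass_by_argv(int value)
{
    assert(value >= 0 && value <= 255);   /* must fit in an exit status */

    char text[16];
    snprintf(text, sizeof text, "%d", value);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* In the script, $0 is "sh" and $1 is our number as text. */
        char *argv[] = { "sh", "-c", "exit \"$1\"", "sh", text, NULL };
        execv("/bin/sh", argv);
        _exit(127);                       /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

This passes a copy of the value at startup. It does not give the second process any way to change the variable in the first one.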
I am new to threads and processes.
I have code that works fine right now with forking the code into multiple processes. However each process needs to add to a global variable, but from what I read, each time the process forks, it takes a copy of the global, and adds them independently. Is there a way to join them, like you can with threads?
Different processes can communicate and exchange data via shared memory.
On Linux, see:
man shm_overview
for attaching a memory segment on several processes
and
man sem_overview
for the semaphore library for controlling parallel access.
You should define a struct with two fields, one for your global and one for a semaphore. Then, before any forking occurs, create some shared memory in the parent process big enough to hold this struct and initialize one there. In the children, map in the shared memory so they can access the global. All processes, parent and children, should obey the rules of the semaphore when accessing the global.
To avoid unnecessary blocking which can hurt performance, try not to hold the semaphore too long. When reading the global, make a quick copy of it in a process and use that, rather than holding the semaphore for the entire time you are using its value. Likewise, when changing the global, prepare your changes ahead of time (before you grab the semaphore) and, once you have the semaphore, copy them in all at once. Sometimes your work depends on reading and writing the global without it changing in between being read and written. In this case, some blocking may be inevitable.
It is not clear what platform you are on, but all major PC and server platforms (Windows, Linux/Unix/Mac OS) have support for shared memory and semaphores. The APIs may be different, but the functionality you need is there.
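A sketch of the struct-plus-semaphore layout, assuming Linux-style POSIX unnamed semaphores (sem_init with pshared set to 1) and an anonymous shared mapping. With an anonymous mapping created before fork(), the children inherit it and do not need to map anything themselves. The names shared_state and sum_in_children are invented, and older glibc versions may need -pthread at link time.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One field for the semaphore, one for the "global". */
struct shared_state {
    sem_t lock;    /* guards total */
    long  total;   /* the value every process adds to */
};

/* Forks nchildren children that each add 1 to the shared total
   adds_per_child times. Returns the final total, or -1 on setup failure. */
long sum_in_children(int nchildren, int adds_per_child)
{
    assert(nchildren >= 0 && adds_per_child >= 0);

    struct shared_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    st->total = 0;
    if (sem_init(&st->lock, 1, 1) == -1) {   /* pshared = 1, initial value 1 */
        munmap(st, sizeof *st);
        return -1;
    }

    for (int i = 0; i < nchildren; i++) {
        if (fork() == 0) {
            for (int j = 0; j < adds_per_child; j++) {
                sem_wait(&st->lock);         /* hold it only for the update */
                st->total += 1;
                sem_post(&st->lock);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                                    /* reap every child */

    long result = st->total;
    sem_destroy(&st->lock);
    munmap(st, sizeof *st);
    return result;
}
```

Without the sem_wait/sem_post pair the increments from different children can interleave and some of them get lost.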
My question is somewhat conceptual: how is a parent process's data shared with a child process created by a fork() call, or with a thread created by pthread_create()?
For example, are global variables passed directly into the child process, and if so, does a modification made to such a variable by the child process affect its value in the parent process?
I appreciate partial and complete answers in advance. If I'm missing any existing resource, I'm sorry; I've done some searching on Google but couldn't find good results.
thanks again for your time and answers
The semantics of fork() and pthread_create() are a little different.
fork() will create a new process, in which the global variables are separate between the parent and the child. Most OS implementations use copy-on-write semantics: the parent and child share the same physical memory pages for all global variables until one of them attempts to write to a page. At that point a copy of that page is made, so each process gets its own copy and does not see the other's changes. The processes stay isolated.
pthread_create() on the other hand, creates a new thread within the same process. The new thread will have a separate stack space from the other running threads of the same process, however the global variables and heap space are shared between all threads of the same process. This is why you often need a mutex to coordinate access to a shared piece of memory between multiple threads of the same process.
TL;DR version: with fork(), you don't see the other guy's changes; with pthread_create() you do.
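A quick way to see the difference, as a sketch: the function names after_fork and after_thread are invented, and on older glibc versions the program needs -pthread when linking.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   /* the global under test */

/* Thread body: increments the global. */
static void *bump(void *arg)
{
    assert(arg == NULL);  /* this sketch passes no argument */
    (void)arg;
    counter += 1;
    return NULL;
}

/* Returns the global as the caller sees it after a forked child
   incremented its own copy, or -1 if fork() failed. */
int after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter += 1;     /* changes only the child's copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}

/* Returns the global as the caller sees it after a thread incremented it,
   or -1 if the thread could not be created. */
int after_thread(void)
{
    counter = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, bump, NULL) != 0)
        return -1;
    pthread_join(t, NULL);   /* join makes the thread's write visible here */
    return counter;
}
```

Here the join is the only synchronization needed because the two threads never touch the variable at the same time. With concurrent access you would protect it with a mutex.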
A fork creates an almost exact copy of the calling process, including memory and file descriptors. Global variables are copied along with everything else, but they are not in any way linked to the parent process. Since file descriptors are also copied, parent and child can interact via these (as long as they're setup properly, usually via pipe or socketpair).
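As an illustration of interacting through inherited descriptors, here is a sketch using socketpair(), which gives a two-way channel. The function name ask_child_to_double is hypothetical: the parent sends a number, the child doubles it and sends it back.

```c
#define _DEFAULT_SOURCE   /* glibc/musl: keep POSIX declarations visible under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends x to a forked child over a socket pair and returns the child's
   reply (2 * x), or -1 on failure. */
int ask_child_to_double(int x)
{
    assert(x >= 0 && x <= INT_MAX / 2);   /* keeps -1 free as the error value */

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                       /* child keeps sv[1] */
        close(sv[0]);
        int n;
        if (read(sv[1], &n, sizeof n) != (ssize_t)sizeof n)
            _exit(1);
        n *= 2;
        if (write(sv[1], &n, sizeof n) != (ssize_t)sizeof n)
            _exit(1);
        _exit(0);
    }

    close(sv[1]);                         /* parent keeps sv[0] */
    int reply = -1;
    int ok = write(sv[0], &x, sizeof x) == (ssize_t)sizeof x
          && read(sv[0], &reply, sizeof reply) == (ssize_t)sizeof reply;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? reply : -1;
}
```

Each side closes the end it does not use, so the other side sees end-of-file if its peer exits early.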
There's a big difference between processes created by fork and between threads created with pthread_create. Processes don't share global variables and should communicate through pipes, sockets, or other tools provided by the OS. A good solution is MPI - which is a message-passing library for inter-process communication.
Threads are quite different. A thread created with pthread_create shares all the global variables with its caller. Moreover, the caller can pass an arbitrary structure into the thread, and this structure will also be shared. This means that one should be extremely careful when programming with threads - such amounts of sharing are dangerous and error prone. The pthread API provides mutexes and conditions for robust synchronization between threads (although it still requires practice and expertise to implement correctly).