I am writing a rudimentary shell program in C which uses a parent process to handle shell events and fork() to create child processes that call execv on another executable (also C).
I am trying to keep a process counter in the parent process, and as such I thought of creating a pointer to a variable that keeps track of how many processes are running.
However, that seems to be impossible, since the arguments that execv takes (and that the program it runs receives) are of type char *const argv[].
I have tried to keep track of the number of processes using mmap for shared memory between processes, but couldn't get that to work, since after the execv call the process simply dies and doesn't let me update the process counter.
In summary, my question is: Is there a way for me to pass a pointer to an integer on an execv call to another program?
Thank you in advance.
You cannot meaningfully pass a pointer from one process to another because the pointer is meaningless in the other process. Each process has its own memory, and the address is relative to that memory space. In other words, the virtual memory manager lets every process pretend it has the entire machine's memory; other processes are simply invisible.
However, you do have a few options for setting up communications between related processes. The most obvious one is a pipe, which you've presumably already encountered. That's more work, though, because you need to make sure that some process is always listening for pipe communications.
Another simple possibility is to leave a file descriptor open across fork and exec (look at the close-on-exec flag, FD_CLOEXEC, to see how to keep it open across exec). Although a mapping made with mmap is not preserved by exec, the child process can map the same open fd again. If you don't want to pass the fd, you can mmap a temporary file and use an environment variable to record the name of the temporary file.
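A minimal sketch of the file-descriptor approach, assuming a temporary file under /tmp (a hypothetical location) that is unlinked right away, so nothing is left behind once the last descriptor is closed. make_counter_fd and map_counter are illustrative names, not an existing API. The parent would map the counter, fork, and pass the descriptor number to the exec'd program, for example as a decimal string in argv; the child then maps the same descriptor again.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a descriptor for an int-sized shared file.
 * The file is unlinked immediately, so it disappears with the last open
 * descriptor; FD_CLOEXEC is cleared so the descriptor survives execv(). */
int make_counter_fd(void)
{
    char path[] = "/tmp/counterXXXXXX";   /* assumed writable directory */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);
    int flags = fcntl(fd, F_GETFD);
    if (ftruncate(fd, sizeof(int)) == -1 || flags == -1 ||
        fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;                            /* file contents start as zero */
}

/* Hypothetical helper used on both sides of the exec: map the counter.
 * The parent calls it before fork(); the exec'd program calls it again
 * with the descriptor number it was given (e.g. in argv[1]). */
int *map_counter(int fd)
{
    void *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (int *)p;
}
```

In the exec'd program, something like int *counter = map_counter(atoi(argv[1])); would recover the shared counter. Plain increments from several processes are still racy and need some form of locking.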
Another possibility is POSIX shared memory (shm_open). Again, you might want to communicate the shm name through an environment variable, rather than hard-coding it into the application.
Note that neither shared mmaps nor POSIX shared memory make updates atomic. If you're incrementing a counter, you'll need to use some locking mechanism to avoid race conditions.
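A sketch of the POSIX shared memory option, with a C11 atomic integer standing in for the locking mechanism. This assumes atomic_int is lock-free on the platform (ATOMIC_INT_LOCK_FREE == 2), which is what makes it usable across processes; the name passed in (for instance read from an environment variable) is up to you, and open_shared_counter is a hypothetical helper. On glibc older than 2.34, shm_open needs -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) a shared counter stored
 * in the POSIX shared memory object `name`. A newly created object is
 * zero-filled, which this sketch relies on as the initial value 0.
 * Cross-process use assumes atomic_int is lock-free on this platform. */
atomic_int *open_shared_counter(const char *name)
{
    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof(atomic_int)) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(atomic_int), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : (atomic_int *)p;
}
```

Each process (including one started with execv, after opening the same name) can then call atomic_fetch_add(counter, 1) when a child starts and atomic_fetch_sub(counter, 1) when it ends, without a separate lock. The parent should call shm_unlink(name) once the counter is no longer needed.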
For possibly a lot more information than you really wanted, you can read ESR's overview of interprocess communication techniques in Chapter 7 of The Art of Unix Programming.
When the parent process uses a huge amount of memory, fork may fail with errno set to ENOMEM under some kernel overcommit policies, even though the child process may only exec a program with a small memory footprint, like ls.
To clarify the problem: when /proc/sys/vm/overcommit_memory is set to 2, the total allocation of (virtual) memory is limited to SWAP + RAM * overcommit_ratio / 100 (the ratio defaults to 50).
When a process forks, its memory is not copied, thanks to copy-on-write (COW), but the kernel still needs to commit virtual memory for the child's copy of the address space. As an analogy, fork is like malloc(size of the virtual address space): it does not allocate physical memory, and only writing to a shared page causes that page to be copied and physical memory to be allocated. When overcommit_memory is set to 2, fork may therefore fail because this virtual memory cannot be committed.
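A worked example of that limit, with made-up numbers: 8 GiB of RAM, 2 GiB of swap and the default ratio of 50 give a commit limit of 2 + 8 × 50 / 100 = 6 GiB. If a parent with 4 GiB of private writable memory forks, the kernel has to commit roughly another 4 GiB for the child, so the total of about 8 GiB exceeds the limit and fork fails, even though the child would exec ls right away. Linux exposes the current values in /proc/meminfo as CommitLimit and Committed_AS; the small helper below (meminfo_kb is a hypothetical name) reads them, which can help when checking how close a process is to the limit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of a /proc/meminfo field
 * such as "CommitLimit" or "Committed_AS", or -1 if it cannot be read. */
long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    char line[256];
    long value = -1;
    size_t n = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        /* match "Field:" exactly, not just a prefix of a longer name */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Under mode 2, a fork is roughly expected to fit only if Committed_AS plus the parent's private writable memory stays below CommitLimit; treat this as an estimate, not an exact prediction.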
Is it possible to fork a process without inheriting the virtual memory space of the parent process under the following conditions?
(1) if the child process calls exec after fork;
(2) if the child process doesn't call exec and will not use any global or static variables from the parent process. For example, the child process just does some logging and then quits.
As Basile Starynkevitch answered, it's not possible.
There is, however, a very simple and common solution for this that does not rely on Linux-specific behaviour or memory overcommit control: use an early-forked slave process to do the fork and exec.
Have the large parent process create a Unix domain socket and fork a slave process as early as possible, closing all other descriptors in the slave (and reopening STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO to /dev/null). I prefer a datagram socket for its simplicity and guarantees, although a stream socket will also work.
In some rare cases it is useful to have the slave process execute a separate, dedicated small helper program. In most instances this is not necessary, but it does make security design much easier. (In Linux, you can include SCM_CREDENTIALS ancillary messages when passing data over a Unix domain socket, and use the process ID therein to verify, via the /proc/PID/exe pseudo-file, which executable the peer is running.)
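A minimal sketch of starting the slave early, under a few assumptions. It uses socketpair() with SOCK_SEQPACKET rather than SOCK_DGRAM: sequenced packets keep the message boundaries the scheme relies on, and recv() reliably returns 0 once the other end is closed, which gives the slave its exit-on-close behaviour. start_exec_slave, stop_exec_slave and idle_slave_loop are hypothetical names; the placeholder loop only discards messages where a real slave would run commands.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder slave loop: block in recv() until the parent closes its
 * end (recv() returns 0). A real slave would parse and run each command. */
static void idle_slave_loop(int sock)
{
    char buf[4096];
    while (recv(sock, buf, sizeof buf, 0) > 0)
        ;                               /* messages ignored in this sketch */
}

/* Fork the slave while the parent is still small. Returns the parent's
 * end of the socket pair (and the slave's PID), or -1 on error. */
int start_exec_slave(void (*loop)(int), pid_t *slave_pid)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Slave: standard streams to /dev/null, every other descriptor
         * except our own socket end closed. */
        int null_fd = open("/dev/null", O_RDWR);
        if (null_fd != -1) {
            dup2(null_fd, STDIN_FILENO);
            dup2(null_fd, STDOUT_FILENO);
            dup2(null_fd, STDERR_FILENO);
        }
        long max_fd = sysconf(_SC_OPEN_MAX);
        if (max_fd < 0)
            max_fd = 1024;
        for (int fd = STDERR_FILENO + 1; fd < max_fd; fd++)
            if (fd != sv[1])
                close(fd);
        loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    *slave_pid = pid;
    return sv[0];
}

/* Closing the parent's end makes the slave's recv() return 0. */
int stop_exec_slave(int sock, pid_t pid)
{
    int status;
    close(sock);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```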
In any case, the slave process will block in reading from the socket. When the other end closes the socket, the read/receive will return 0, and the slave process will exit.
Each datagram the slave process receives describes a command to execute. (Using datagrams allows using C strings delimited with NUL characters, without any escaping; using a Unix stream socket typically requires you to delimit the "command" somehow, which in turn means escaping the delimiters in the command component strings.)
The slave process creates one or more pipes and forks a child process. This child process closes the original Unix socket, replaces the standard streams with the respective pipe ends (closing the other ends), and executes the desired command. I personally prefer to use an extra close-on-exec socket in Linux to detect successful execution: in an error case, the errno code is written to the socket, so that the slave-parent can reliably detect the failure and its exact reason. On success, the slave-parent closes the unnecessary pipe ends and replies to the original process, passing the other pipe ends as SCM_RIGHTS ancillary data. After sending the message, it closes the rest of the pipe ends and waits for a new message.
On the original process side, the above procedure is sequential: only one thread may start executing an external process at a time. (You simply serialize the access with a mutex.) Several external processes can run at the same time; only the request to and response from the slave helper is serialized.
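For illustration, splitting such a NUL-delimited message into an argv array could look like the sketch below. split_command is a hypothetical helper; it requires the message to end with a NUL byte so that every string is terminated, and it points into the receive buffer instead of copying.

```c
#include <stddef.h>

/* Hypothetical helper: split a message such as "ls\0-l\0/tmp\0" into an
 * argv array for execvp(). Pointers refer into msg; nothing is copied.
 * Returns the number of strings, or -1 if the message is empty, does not
 * end in NUL, or holds more than max_args - 1 strings. */
int split_command(char *msg, size_t len, char *argv[], int max_args)
{
    if (max_args < 1 || len == 0 || msg[len - 1] != '\0')
        return -1;
    int argc = 0;
    size_t i = 0;
    while (i < len) {
        if (argc >= max_args - 1)
            return -1;                 /* leave room for the final NULL */
        argv[argc++] = msg + i;
        while (msg[i] != '\0')         /* stops at the last byte at worst */
            i++;
        i++;                           /* skip the terminator */
    }
    argv[argc] = NULL;
    return argc;
}
```

With a datagram or sequenced-packet socket, len is simply the value returned by recv(), so no escaping or framing is needed.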
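The exec-failure detection can be sketched as follows. This variant swaps in a pipe created with pipe2() and O_CLOEXEC (available on Linux) for the close-on-exec socket the answer mentions; the principle is the same: a successful exec closes the write end, so the reader sees end-of-file, while a failed exec writes errno first. spawn_checked is a hypothetical name, and the pipes for the standard streams are left out to keep it short.

```c
#define _GNU_SOURCE                    /* pipe2() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and execvp(argv[0], argv), reporting whether
 * the exec itself succeeded. Returns the child's PID on success, or -1
 * with errno set (to the child's exec error if the exec failed). */
pid_t spawn_checked(char *const argv[])
{
    int status_pipe[2];
    if (pipe2(status_pipe, O_CLOEXEC) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(status_pipe[0]);
        close(status_pipe[1]);
        errno = saved;
        return -1;
    }
    if (pid == 0) {
        close(status_pipe[0]);
        execvp(argv[0], argv);         /* on success, exec closes the write end */
        int err = errno;
        if (write(status_pipe[1], &err, sizeof err) < 0) {
            /* nothing more we can do */
        }
        _exit(127);
    }
    close(status_pipe[1]);
    int child_err;
    ssize_t n;
    do
        n = read(status_pipe[0], &child_err, sizeof child_err);
    while (n == -1 && errno == EINTR);
    close(status_pipe[0]);
    if (n == 0)
        return pid;                    /* EOF: the exec succeeded */
    waitpid(pid, NULL, 0);             /* exec failed: reap the child */
    errno = (n == (ssize_t)sizeof child_err) ? child_err : EIO;
    return -1;
}
```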
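On the original-process side, the serialized request/response could look like the sketch below, which also shows the slave's reply carrying one descriptor as SCM_RIGHTS ancillary data. The message layout (an int status as the payload, at most one descriptor) and the names request_exec and send_reply are assumptions for illustration; a real protocol would pass several pipe ends. Older toolchains need -pthread.

```c
#define _DEFAULT_SOURCE                /* CMSG_SPACE/CMSG_LEN on glibc */
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Serializes request/response pairs; the commands themselves run concurrently. */
static pthread_mutex_t slave_lock = PTHREAD_MUTEX_INITIALIZER;

/* Slave side (hypothetical): reply with an int status and, if fd >= 0,
 * one descriptor (e.g. a pipe end) as SCM_RIGHTS ancillary data. */
int send_reply(int sock, int status, int fd)
{
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct iovec iov = { .iov_base = &status, .iov_len = sizeof status };
    struct msghdr mh = { 0 };
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    if (fd >= 0) {
        memset(&ctl, 0, sizeof ctl);
        mh.msg_control = ctl.buf;
        mh.msg_controllen = sizeof ctl.buf;
        struct cmsghdr *c = CMSG_FIRSTHDR(&mh);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &fd, sizeof fd);
    }
    return sendmsg(sock, &mh, 0) == (ssize_t)sizeof status ? 0 : -1;
}

/* Original-process side (hypothetical): send one command message and
 * wait for the reply. *fd_out is -1 if no descriptor came back. */
int request_exec(int sock, const void *msg, size_t len, int *status, int *fd_out)
{
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct iovec iov = { .iov_base = status, .iov_len = sizeof *status };
    struct msghdr mh = { 0 };
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl.buf;
    mh.msg_controllen = sizeof ctl.buf;
    *fd_out = -1;

    pthread_mutex_lock(&slave_lock);   /* one request in flight at a time */
    ssize_t n = -1;
    if (send(sock, msg, len, 0) == (ssize_t)len)
        n = recvmsg(sock, &mh, 0);
    pthread_mutex_unlock(&slave_lock);
    if (n != (ssize_t)sizeof *status)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&mh); c != NULL; c = CMSG_NXTHDR(&mh, c))
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
            memcpy(fd_out, CMSG_DATA(c), sizeof(int));
    return 0;
}
```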
If that is an issue -- it should not be in typical cases -- you can for example multiplex the connections, by prefixing each message with an ID number (assigned by the parent process, monotonically increasing). In that case, you'll probably use a dedicated thread on the parent end to manage the communications with the slave, as you certainly cannot have multiple threads reading from the same socket at the same time, and expect deterministic results.
Further improvements to the scheme include things like using a dedicated process group for the executed processes, setting limits to them (by setting limits to the slave process), and executing the commands as dedicated users and groups by using a privileged slave.
The privileged slave case is where it is most useful to have the slave execute a separate helper program. In Linux, both sides can use SCM_CREDENTIALS ancillary messages via Unix domain sockets to verify the identity of the peer (the PID and, through it, the executable), making it rather straightforward to implement robust security. (But note that /proc/PID/exe has to be checked more than once. Otherwise a nefarious program could send a message and then quickly execute the expected program with command-line arguments that make it exit soon, occasionally making it look as if the correct executable made the request, while a copy of the descriptor, and thus the entire communications channel, was controlled by a nefarious user.)
In summary, the original problem can be solved, although the answer to the posed question is No. If the executions are security-sensitive, for example change privileges (user accounts) or capabilities (in Linux), then the design has to be carefully considered, but in normal cases the implementation is quite straightforward.
I'd be happy to elaborate if necessary.
No, it is not possible. You might be interested in vfork(2), which I don't recommend. Look also into mmap(2) and its MAP_NORESERVE flag (though Linux ignores MAP_NORESERVE when overcommit_memory is set to 2). But the kernel uses copy-on-write techniques, so you practically won't double the RAM consumption.
My suggestion is to have enough swap space not to be concerned by such an issue: set up your computer to have more available swap space than the largest running process. You can always create a temporary swap file (e.g. with dd if=/dev/zero of=/var/tmp/swapfile bs=1M count=32768, then mkswap /var/tmp/swapfile), add it as a temporary swap area (swapon /var/tmp/swapfile), and remove it (swapoff /var/tmp/swapfile and rm /var/tmp/swapfile) when you don't need it anymore.
You probably don't want to put the swap file on a tmpfs file system, as /tmp/ often is, since tmpfs file systems are themselves backed by swap space!
I dislike memory overcommitment and I disable it (through /proc/sys/vm/overcommit_memory; see proc(5)). YMMV.
I'm not aware of any way to do (2), but for (1) you could try to use vfork which will fork a new process without copying the page tables of the parent process. But this generally isn't recommended for a number of reasons, including because it causes the parent to block until the child performs an execve or terminates.
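For case (1), a vfork-based sketch looks like this. run_with_vfork is a hypothetical name; the important constraint is that the child does nothing but call execv() or _exit(), since it borrows the parent's memory while the parent is suspended.

```c
#define _DEFAULT_SOURCE                /* vfork() is not in current POSIX */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program with vfork() + execv() and wait for
 * it. Returns the program's exit status, or -1 on error. */
int run_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);       /* never return or call exit() in a vfork child */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```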
This is possible on Linux. Use the clone syscall without the flag CLONE_THREAD and with the flag CLONE_VM. The parent and child processes will use the same mappings, much like a thread would; there is no COW or page table copying.
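A Linux-only sketch of that approach using the glibc clone() wrapper. Beyond what the answer states, it also passes CLONE_VFORK, so that the parent is suspended until the child has exec'd (which is what makes it safe to free the child's stack right after clone() returns), and SIGCHLD, so the child can be reaped with waitpid(). spawn_clone_vm and the 64 KiB stack size are assumptions.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct exec_args {
    const char *path;
    char *const *argv;
};

/* Runs in the child, on the separate stack, sharing the parent's memory. */
static int exec_child(void *p)
{
    struct exec_args *a = p;
    execv(a->path, a->argv);
    _exit(127);                       /* exec failed */
}

/* Hypothetical helper: start path/argv without copying the parent's
 * page tables. Returns the child's PID, or -1 on error. */
pid_t spawn_clone_vm(const char *path, char *const argv[])
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    struct exec_args a = { path, argv };
    /* The stack grows down on common architectures: pass its top. */
    pid_t pid = clone(exec_child, stack + stack_size,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
    free(stack);                      /* safe: CLONE_VFORK waited for exec/exit */
    return pid;
}
```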
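On Linux, you can call madvise(addr, size, MADV_DONTFORK) on large memory ranges before forking, so that they are not inherited by the child.
Alternatively, you can call munmap() after fork() to remove the virtual addresses inherited from the parent process (this releases the child's copy of the mapping, but it cannot prevent fork() itself from failing with ENOMEM).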
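A sketch of that on Linux, with hypothetical helper names: the large region is allocated with mmap() and marked MADV_DONTFORK, and a small demo checks from a forked child whether the range is still mapped there (msync() fails with ENOMEM on an unmapped range). The child must not touch such a region at all.

```c
#define _DEFAULT_SOURCE                /* MAP_ANONYMOUS, madvise() on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: allocate a large private region that fork() will
 * not copy into children (Linux-specific MADV_DONTFORK). */
void *alloc_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (madvise(p, len, MADV_DONTFORK) == -1) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Demo: 1 if a forked child still has [p, p+len) mapped, 0 if not,
 * -1 on error. */
int mapped_in_child(void *p, size_t len)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(msync(p, len, MS_ASYNC) == 0 ? 1 : 0);
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```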
In *nix systems, processes are created using the fork() system call. Consider, for example, the init process creating another process. First it forks itself and creates a process that has the same context as init. Only on calling exec() does this child process turn into a new process. So why is the intermediate step (of creating a child with the same context as the parent) needed? Isn't that a waste of time and resources, because we are creating a context (which consumes time and wastes memory) and then overwriting it?
Why is this not implemented as allocating a vacant memory area and then calling exec()? This would save time and resources, right?
The intermediate step enables you to set up shared resources in the child process without the external program being aware of it. The canonical example is constructing a pipe:
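#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    // read output of "ls"
    // (error checking omitted for brevity)
    int pipe_fd[2];
    pipe(pipe_fd);                        // pipe() takes the array itself
    pid_t pid = fork();
    if (pid == 0) {                       // child:
        close(pipe_fd[0]);                // we don't want to read from the pipe
        dup2(pipe_fd[1], 1);              // redirect stdout to the write end of the pipe
        close(pipe_fd[1]);                // stdout now refers to the pipe
        execlp("ls", "ls", (char *) NULL);
        _exit(127);                       // in case exec fails
    }
    // parent:
    close(pipe_fd[1]);                    // otherwise we would never see EOF
    FILE *fp = fdopen(pipe_fd[0], "r");
    char line[256];
    while (fgets(line, sizeof line, fp) != NULL) {   // not while (!feof(fp))
        /* ... */
    }
    fclose(fp);
    waitpid(pid, NULL, 0);                // reap the child
    return 0;
}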
Note how the redirection of standard output to the pipe is done in the child, between fork and exec. Of course, for this simple case, there could be a spawning API that would simply do this automatically, given the proper parameters. But the fork() design enables arbitrary manipulation of per-process resources in the child — one can close unwanted file descriptors, modify per-process limits, drop privileges, manipulate signal masks, and so on. Without fork(), the API for spawning processes would end up either extremely fat or not very useful. And indeed, the process spawning calls of competing operating systems typically fall somewhere in between.
As for the waste of memory, it is avoided with the copy on write technique. fork() doesn't allocate new memory for the child process, but points the child to the parent's memory, with the instructions to make a copy of a page only if the page is ever written to. This makes fork() not only memory-efficient, but also fast, because it only needs to copy a "table of contents".
This is an old complaint. Many people have asked "Why fork() first?", and typically they suggest an operation that will both create a new process from scratch and run a program in it. This operation is called something like spawn().
And they always say, "Won't that be faster?"
And in fact, every system other than the Unix family does go the "spawn" way. Only Unix is based on fork() and exec().
But it's funny, Unix has always been much faster than other full-featured systems. It has always handled way more users and load.
And Unix has been made even faster over the years. fork() no longer really duplicates the address space; it just shares it using a technique called copy-on-write. (A very old fork optimization called vfork() is also still around.)
Drink the Kool-Aid.
I don't know exactly how the init process works on a kernel in terms of forking, but to answer your question: you need to call fork and then exec simply because once you exec, there is no turning back.
As the documentation explains, exec replaces the calling process image, so a new process has to be spawned first (the fork call) for the parent process to keep control and either wait for the child to finish or keep running, as a daemon would.
Only on calling exec(), this child process turns out to be a new process.
Not really. After a fork, you already have a new process, even if it is not much different from its parent. There are some cases where no exec needs to follow a fork.
So why is the intermediate step ( of creating a child with same context as parent ) needed?
One reason would be because it is an efficient way to create the whole shebang. Cloning is usually less complex than creating from scratch.
Isn't that a waste of time and resource, because we are creating a context ( consumes time and wastes memory ) and then over writing it?
It is not a waste of time and resources, as most of these resources are virtual, due to the copy-on-write mechanism used. Moreover, it is incorrect to state that the created context is overwritten. Nothing is rewritten, given that nothing was actually written in the first place; that's the whole point of COW. "Only" the process address space (code, heap and stack) is substituted, not overwritten. A lot of the process context is partially or totally preserved, including the environment, file descriptors, priority, ignored signals, current and root directories, limits, various masks, processor bindings, privileges and several other things foreign to the process address space.
In Linux and other modern OSes, each process's memory is protected, so that a wild write in one process does not crash any other process. Now assume we have memory shared between process A and process B, and that, due to a soft error, process A unintentionally writes something to that memory area. Is there any way to protect against this, given that both process A and process B have full write access to that memory?
When you call shm_open in the process that should only read, you can pass the O_RDONLY flag in the oflag argument (and then map the object with PROT_READ only).
Alternatively, you can use mprotect to mark specific pages as (e.g.) read-only. You'll need cooperation and trust between the two processes to do this: there is no way for B to prevent A from writing to the memory using mprotect.
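A sketch of the reader side, assuming the producer has already created and sized the object (the name and size are up to the application; map_shared_readonly is a hypothetical helper). Because the object is opened O_RDONLY, a later attempt in this process to make the mapping writable with mprotect() fails with EACCES.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical consumer-side helper: map an existing shm object
 * read-only. The descriptor is opened O_RDONLY, so this process cannot
 * later make the mapping writable with mprotect() (EACCES). */
const void *map_shared_readonly(const char *name, size_t len)
{
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid */
    return p == MAP_FAILED ? NULL : p;
}
```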
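The cooperative mprotect approach can be sketched like this: a process keeps its own view of the shared pages read-only and makes them writable only for the duration of a deliberate update, so that a stray write elsewhere in that same process faults instead of corrupting the data. It cannot stop the other process; guarded_write is a hypothetical helper, and base/len are assumed to describe a page-aligned mapping.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy n bytes to base + off, with the mapping made
 * writable only during the copy and read-only again afterwards. */
int guarded_write(void *base, size_t len, size_t off, const void *src, size_t n)
{
    if (off > len || n > len - off)
        return -1;                        /* would write past the mapping */
    if (mprotect(base, len, PROT_READ | PROT_WRITE) == -1)
        return -1;
    memcpy((char *)base + off, src, n);
    return mprotect(base, len, PROT_READ);
}
```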
If you really want to be sure that the other process can't interfere then communicating via pipes or sockets of some description might be a sensible idea.
You could also use mmap to map a file (e.g. in /dev/shm) whose permissions make it impossible for one of the two processes to write to it, if they're running as separate UIDs. For example, if you have /dev/shm/myprocess owned by user producer and group consumer, and set the file permissions to 0640 before mapping it, then a process running as producer can map it read-write, while a second process running only with group consumer can map it read-only and is prevented from writing to it.
You may compute a simple checksum on each write. Then, when a process detects a wrong checksum upon a read operation, it's a sign that the other process has failed.
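A minimal sketch of the checksum idea, with FNV-1a swapped in as the "simple checksum" (any cheap hash would do) and a hypothetical fixed-size layout. A reader that runs concurrently with a writer will also see a mismatch, so this still needs the usual synchronization; it only detects corruption after the fact, it does not prevent it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the shared region: payload plus checksum. */
struct shared_block {
    unsigned char data[256];
    uint32_t checksum;
};

/* 32-bit FNV-1a over a byte range. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Writer: store the payload (zero-padded), then update the checksum. */
void block_write(struct shared_block *b, const void *src, size_t n)
{
    if (n > sizeof b->data)
        n = sizeof b->data;
    memcpy(b->data, src, n);
    memset(b->data + n, 0, sizeof b->data - n);
    b->checksum = fnv1a(b->data, sizeof b->data);
}

/* Reader: 1 if the payload matches its checksum, 0 if it was corrupted. */
int block_verify(const struct shared_block *b)
{
    return fnv1a(b->data, sizeof b->data) == b->checksum;
}
```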
On an embedded platform (with no swap partition), I have an application whose main process occupies most of the available physical memory. The problem is that I want to launch an external shell script from my application, but using fork() requires that there be enough memory for 2x my original process before the child process (which will ultimately execl itself to something much smaller) can be created.
So is there any way to invoke a shell script from a C program without incurring the memory overhead of a fork()?
I've considered workarounds such as having a secondary smaller process which is responsible for creating shells, or having a "watcher" script which I signal by touching a file or somesuch, but I'd much rather have something simpler.
Some UNIX implementations give you vfork (historically part of the Single UNIX Specification, though removed in POSIX.1-2008), which is like fork except that the child shares the parent's memory, and the parent is suspended until the child calls exec or exits.
With vfork, there is a very limited number of things you can do in the child before calling exec to replace the address space with another program; that's basically what vfork was built for: a minimal-copy version of fork for the fork/exec sequence.
If your system has an MMU, then usually fork() is implemented using copy-on-write, which doesn't actually allocate more memory at the time fork() is called. Additional memory would only be allocated if you write to any of the pages shared with the parent process. An exec() would then discard those pages.
If you know you don't have an MMU, then perhaps fork() is indeed implemented using an actual copy. Another approach might be to have a helper process that is responsible for spawning subshells, which you communicate with using a pipe.
I see you've already accepted an answer, but you may want to read about posix_spawn and use it if it's available on your target:
http://www.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
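A minimal posix_spawn sketch for running a shell script, assuming /bin/sh exists on the target; run_script is a hypothetical name. Whether this actually avoids the memory cost of fork() depends on the C library: glibc on Linux implements posix_spawn() with clone(CLONE_VM | CLONE_VFORK), but a given embedded libc may still fork internally, so it is worth checking on the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run "/bin/sh script" and wait for it. Returns the
 * script's exit status, or -1 if it could not be started or waited for. */
int run_script(const char *script)
{
    /* posix_spawn() does not modify argv; the cast only drops const. */
    char *argv[] = { "sh", (char *)script, NULL };
    pid_t pid;
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```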
It sounds as if the prudent move in this case is to port your shell script (if possible) to C and execute it within the process, so you don't have to fork at all.
Then again, I don't know what you are actually trying to do.
Instead of forking your process to spawn a shell, run a shell from your process (in the foreground), and then fork the background task from within that shell:
system("/bin/ash /scripts/bgtask");
with /scripts/bgtask being:
/bin/ash /scripts/propertask &
This way you double only the memory used by the shell, not by the main program. Your main program is blocked for the duration of spawning the two shells (the original one that runs bgtask and the background clone launched by it); after that, the memory allocated by the first shell is free again.
I know about mmap, but as far as I know, if I want to share memory allocated by a parent process and access it from a client process, I need to create a temporary file.
But this file will continue to exist if the processes die.
I was educated to never leave garbage behind. Both in real life and in programming.
The solution should work on Linux, FreeBSD and Solaris.
This article is a very good starting point for shared memory.
However, I'd recommend that you use a pipe instead generally, so as to avoid race conditions and all the mental overhead of concurrency. When you open a child process, its stdin and stdout are file descriptors that you can read and write to from the parent process.
mmap usually supports an "anonymous" mode, which does not create any garbage. It works on Linux and, according to the man page, on Solaris; I am not sure about FreeBSD, so look at its man page and search for MAP_ANON or MAP_ANONYMOUS.
Named pipes should work for you, as mentioned above. You can use two pipes (if possible):
parentpid_childpid -- the parent writes and the child reads.
childpid_parentpid -- the child writes and the parent reads.
This works for me. Please mention if you have any special scenarios to consider.
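A minimal sketch of the anonymous approach, with hypothetical names: the region is created with MAP_SHARED | MAP_ANONYMOUS before fork(), so parent and child see the same memory, and nothing exists in the file system; the memory is released when the last process unmaps it or exits. The #ifndef falls back to the older BSD spelling MAP_ANON.

```c
#define _DEFAULT_SOURCE                /* MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON         /* older BSD spelling */
#endif

/* Hypothetical helper: shared memory with no file behind it. Create it
 * before fork() so that parent and child share the same pages. */
void *alloc_shared(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Demo: the child stores a value that the parent reads after waiting. */
int demo_shared_anon(void)
{
    int *shared = alloc_shared(sizeof *shared);
    if (shared == NULL)
        return -1;
    *shared = 0;
    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = 42;                  /* visible to the parent: MAP_SHARED */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int value = *shared;
    munmap(shared, sizeof *shared);    /* no file to clean up afterwards */
    return value;
}
```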
Allocate Memory in Parent.
Use this memory both in parent and Child.
Give the responsibility of freeing the memory to the parent.
Make the parent wait (wait system call) on the child.
Parent frees the memory before exiting.
Alternatively, to be on the safer side, before the child exits, check whether the parent is alive or not.
If not, free the memory in the child itself. But this won't work if there can be multiple children.
You can use the first few bytes of the memory to track the number of processes using it. Whenever a process starts using the memory, it increments this count, and before exiting it decrements the count. If the count becomes 0, free the memory.
There is yet another method: write a function ownMalloc that sits on top of the system malloc (or whichever allocation function you use). It keeps track of all allocated memory and of the processes that use it, periodically goes through the different chunks allocated, and frees the chunks not in use.
Use POSIX shm_open(3) and related functions to map memory that is not backed by a regular file. The file descriptors returned by shm_open() are closed when the parent and child exit, but the shared memory object itself persists until shm_unlink() is called, so unlink it once every process has opened or mapped it.
Allocate the memory in the parent process, and wait there until the child process finishes (or let the parent do some other tasks in the meantime). Once the child has exited, control is back with the parent process; deallocate the memory there.
If the parent process needs to stop before the child (the child is then said to be orphaned), use exec(), which will run the child as a new program, independent of the memory the parent allocated.
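A sketch of the unlink-early pattern, with hypothetical names: the object is created with O_EXCL, its name is removed immediately, and the mapping stays usable (and is inherited by children created with fork()). Nothing is left in the shared memory namespace even if the processes crash. This only suits related processes that inherit the mapping; unrelated processes would need to open the name before it is unlinked.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a POSIX shm object, remove its name right
 * away, and return a shared mapping of len bytes (or NULL on error). */
void *shm_alloc_unlinked(const char *name, size_t len)
{
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return NULL;
    shm_unlink(name);                 /* the name is gone; the object is not */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping keeps the object alive */
    return p == MAP_FAILED ? NULL : p;
}
```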