I am looking for C code to use on a Linux based system to start another process asynchronously. The second process should continue, even if the first ends. I've looked through the "fork" and "system" and "exec" options, but don't see anything that will spawn a peer process that's not communicating with or a child of the original process.
Can this be done?
Certainly you can. In the parent fork() a child, and in that child first call daemon() (which is an easy way to avoid setsid etc.), then call something from the exec family.
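A minimal sketch of that sequence, assuming glibc's daemon(3) is available. The name spawn_detached is made up for illustration, and error handling is kept to the bare minimum:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes daemon() on glibc */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `file` so that it outlives the caller.
 * Returns 0 once the helper child has detached, -1 on error. */
int spawn_detached(const char *file, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: daemon() forks again, lets this process exit with
         * status 0, and carries on in a new session in the grandchild. */
        if (daemon(0, 0) < 0)
            _exit(1);
        execvp(file, argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: reap the short-lived intermediate child. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A call could look like spawn_detached("sleep", (char *[]){"sleep", "60", NULL}). Note that the caller only learns that the detach step worked, not whether the exec in the grandchild succeeded.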
In Linux (and Unix), every process is created by an existing process. You can create a process using fork and then kill the parent process. The child then becomes an orphan, but it is still adopted by init. If you want a process that is not the child of any other process, I am afraid that may not be possible.
You do a fork (man 2 fork) followed by an execl (man 3 execl).
fork creates a new process with the same image as the calling process (so a perfect twin), whereas execl replaces the image of one of the twins with a new one.
If you search Google for "fork execl" you will find many textbook examples, including how to use fork() and exec() correctly.
With the most common fork-execl pattern, the new process is still associated with the terminal. To create a true background process you need to create what is called a daemon process; a template for that can be found in the answer to "Creating a daemon in Linux".
Well, I'm learning about processes using the C language, and I have seen that when you call the exit function a process is terminated, and if nobody waits for it, it becomes a zombie process. My question is: if the first process created when executing the program is a process itself, is there an OS routine that waits for it after an exit() call, so that it does not become a zombie process? I'm curious about it.
For Unix systems at least (and I expect Windows is similar), when the system boots, it creates one special first process. Every process after that is created by some existing process.
When you log into a windowed desktop interface, there is some desktop manager process (that has been created by the first process or one of its descendants) managing windows. When you start a program by clicking on it, that desktop manager or one of its children (maybe some file manager software) creates a process to run the program. When you start a program by executing a command in a terminal window, there is a command line shell process that is interpreting the things you type, and it creates a process to run the program.
So, in all cases, your user program has a parent process, either a command-line shell or some desktop software.
If a child process creates another child (even as the first instruction) then the parent also has to wait for it or it becomes a zombie.
Basically, a terminated process stays a zombie until it is removed from the process table by a wait(). The OS (via the init process) adopts orphans, i.e. processes whose parent has exited, and wait()s for them, so normally such zombies do not stay around for very long.
On Linux, the topmost (parent) process is init. This is the only process that has no parent. Every other process, without exception, has a parent and hence is a child of another process.
See:
init
Section NOTES on wait
A child that terminates, but has not been waited for becomes a
"zombie". The kernel maintains a minimal set of information
about the zombie process (PID, termination status, resource usage
information) in order to allow the parent to later perform a wait
to obtain information about the child. As long as a zombie is
not removed from the system via a wait, it will consume a slot in
the kernel process table, and if this table fills, it will not be
possible to create further processes. If a parent process
terminates, then its "zombie" children (if any) are adopted by
init(1), ... init(1) automatically performs a wait to remove the
zombies.
This may seem to be a dumb question but I don't really have a good understanding of fork() other than knowing that this is about multi-threading. Child process is like a thread. If a task needs to be processed via fork(), how to correctly assign tasks to parent process and child process?
Check the return value of fork. The child process will receive the value of 0. The parent will receive the value of the process id of the child.
Read Advanced Linux Programming which has an entire chapter dedicated to processes (because fork is difficult to explain);
then read documentation of fork(2); fork is not about multi-threading, but about creating processes. Threads are generally created with pthread_create(3) (which is implemented above clone(2), a Linux specific syscall). Read some pthreads tutorial to learn more about threads.
PS. fork is difficult to understand (you'll need hours of reading, some experimentation, perhaps using strace(1), till you reach the "AhAh" insight moment when you have understood it) since it returns twice on success. You need to keep its result, and you need to test the result for the three cases : <0 (failure), ==0 (child), >0 (parent). Don't forget to later call waitpid(2) (or something similar) in the parent, to avoid having zombie processes.
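As a rough illustration of the three cases and the later waitpid(2), here is a sketch. The helper name fork_and_wait is made up, and the child's "task" is just exiting with a given code:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it,
 * and return the exit status seen by the parent (-1 on error). */
int fork_and_wait(int code)
{
    pid_t pid = fork();

    if (pid < 0)                /* failure: no child was created */
        return -1;

    if (pid == 0) {             /* child: fork() returned 0 */
        /* the child's share of the work would go here */
        _exit(code);
    }

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's share of the work would go between the fork() and the waitpid(); both processes run concurrently until then.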
Short question:
In the parent, I want to wait until the child has been replaced by some exec call, not until it terminates.
How can I do it?
(C language, Linux platform)
Basile's answer is incorrect.
While it is true that there's no real way to wait for an exec after a call to fork(2), this is not the only way to create a child process. What you can do instead is use the vfork(2) call. This will block in the parent until the child calls either _exit or one of the exec functions.
Note that part of the reason this works the way it does is that the child process from vfork(2) does not, in fact, clone the entirety of the parent's address space. This means it is undefined behaviour to modify data in the child process before exec. If you need to do anything weird, you may be better off, for example, using pause(2) and installing a signal handler for SIGUSR1 or some other signal of your choice, then sending that signal immediately before the exec, or using some other IPC mechanism as mentioned above.
If you don't need to do anything special at all, and only want to call fork/exec right after one another, but want to be sure that execution of the child process has started, you can instead use posix_spawn(3), which should also start an external program immediately, effectively blocking the parent until after the exec.
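For the posix_spawn(3) route, a minimal sketch could look like the following. The wrapper name spawn_and_wait is hypothetical, and it uses posix_spawnp so that the program is looked up in PATH:

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical wrapper: start argv[0] with posix_spawnp(), wait for it,
 * and return its exit status (-1 on error). */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;

    /* No file actions and no spawn attributes: plain "fork + exec". */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;              /* err holds an errno-style code */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent should keep working while the child runs, the waitpid() would simply be postponed. The child still has to be waited for eventually.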
You can't wait in a parent for the child to do some exec, except by having some convention about IPC, e.g. deciding to send something (in the child) on a pipe(7) just before the exec. You'll set up the pipe(2) before the fork(2). You might also use the Linux specific eventfd(2) for such IPC.
After the fork(2) and before any exec you are running (in the child process) the same code as the parent. So it is up to you to implement such conventional communications.
BTW, generally, the child process does not do a lot of things after the fork and before the exec, so waiting for the exec to happen is useless.... In the unlikely case an error happens -including failure of exec- you just _exit (usually with an exit code like 127).
You might consider ptrace(2) (with PTRACE_SYSCALL ...) but I would not do it that way.
Read Advanced Linux Programming and study the source code of some free software shells (sash or bash). Use also strace to understand what is happening in a shell.
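One way to implement the pipe convention is sketched below. It is a variation on "send something just before the exec": the write end is marked close-on-exec, so the parent sees end-of-file when the exec succeeds, and the child sends errno only if the exec fails. The function name spawn_wait_exec is made up for this example:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 0 once the child has exec'ed (its PID is
 * stored in *out_pid), the errno value if exec failed, -1 on setup error. */
int spawn_wait_exec(const char *file, char *const argv[], pid_t *out_pid)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    /* The kernel closes the write end automatically on a successful exec. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        close(fds[0]);
        execvp(file, argv);
        int err = errno;                  /* exec failed: report why */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);                        /* parent keeps the read end only */
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == 0) {                         /* EOF: the exec has happened */
        *out_pid = pid;
        return 0;
    }
    waitpid(pid, NULL, 0);                /* exec failed: reap the child */
    return n == (ssize_t)sizeof err ? err : -1;
}
```

On success the caller still owns the child and should waitpid() for *out_pid later.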
My question is more philosophical than technical.
The objective is to write a multiprocess (not multithreaded) program with one "master" process and N "worker" processes. The program is a Linux-only, asynchronous, event-based web server, like nginx. So the main problem is how to spawn the "worker" processes.
In the Linux world there are two ways:
1). fork()
2). fork() + exec*() family
A short description of each way, and what confuses me about each.
The first way, plain fork(), is dirty, because the forked process has a copy (copy-on-write, I know) of the parent's memory: signal handlers, variables, file and socket descriptors, the environment and everything else, e.g. the stack and heap. So after fork() I need to, hmm, "clear memory": for example, reset signal handlers, close socket connections and undo other horrible things inherited from the parent, because the child holds a lot of data that was never meant for it. That breaks encapsulation and makes many side effects possible.
The usual approach in this case is to run an infinite loop in the forked process to handle data, and to do some magic with a socket pair, pipes or shared memory to create a communication channel between parent and child before and after fork(), because the descriptors are duplicated in the child and refer to the same sockets as in the parent.
This is also the nginx way: it has a single executable binary that uses fork() to spawn its child processes.
The second way is similar to the first, except that the child calls one of the exec*() functions after fork() to run an external binary. The important point is that exec*() loads the binary into the current (forked) process, automatically discarding the stack and heap and doing all the other nasty work, so the child looks like a clean new instance of a program, with no copy of the parent's memory or other trash.
This way has a different problem, namely establishing communication between parent and child: because the forked process drops all data inherited from the parent after exec*(), I need to create a socket pair between parent and child some other way. For example, the parent could open an additional listening socket (a Unix domain socket or another port) and wait for connections, and each child would connect to the parent after initialization.
The first way is simple, but it bothers me that the child is not a clean process, just a copy of the parent's memory with many possible side effects and trash, and that I must keep in mind how much the forked process depends on the parent's code. The second way takes more effort, since two binaries have to be maintained, and it is less elegant than a single-file solution. Maybe the best way would be to use fork() to create the process and then something to clear its memory without an exec*() call, but I can't find any solution for the second step.
In conclusion, I need help deciding which way to use: build a single executable like nginx and use fork(), or build two separate files, a "server" and a "worker", and call fork() + exec*(worker) N times from the "server". I would like to know the pros and cons of each way; maybe I missed something.
For a multiprocess solution the two options, fork and fork+exec, are almost equivalent; the choice depends on the context of the child and parent processes. If the child executes the parent's text (binary) and needs all or part of the parent's stuff (descriptors, signals, etc.), that is a sign to use fork. If the child should execute a new binary and needs nothing from the parent, fork+exec seems much more suitable.
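For the fork-only case, where the worker relies on descriptors inherited from the master, a toy sketch might look like this. The names ask_worker and the "double the number" task are invented for illustration; a real worker would loop over requests instead of handling a single one:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one worker, send it `x` over a socket pair,
 * and return the worker's reply (2 * x), or -1 on error. */
int ask_worker(int x)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                 /* worker: inherited both ends */
        close(sv[0]);               /* drop the master's end */
        int n;
        if (read(sv[1], &n, sizeof n) != (ssize_t)sizeof n)
            _exit(1);
        n *= 2;                     /* the "work" */
        if (write(sv[1], &n, sizeof n) != (ssize_t)sizeof n)
            _exit(1);
        _exit(0);
    }

    close(sv[1]);                   /* master: drop the worker's end */
    int reply = -1;
    int ok = write(sv[0], &x, sizeof x) == (ssize_t)sizeof x
          && read(sv[0], &reply, sizeof reply) == (ssize_t)sizeof reply;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? reply : -1;
}
```

The channel is created before fork(), so no extra listening socket or connection step is needed. Each side closes the end it does not use.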
There is also a useful function in the pthread library: pthread_atfork().
It lets you register handlers that are called before and after fork.
These handlers may perform all the necessary work (closing file descriptors, for example).
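A small sketch of how the three handlers fire around one fork(). The counters and the name atfork_demo are made up; in a real server the child handler might close inherited descriptors instead of counting:

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; }  /* before fork, in parent */
static void on_parent(void)  { parent_calls++; }   /* after fork, in parent  */
static void on_child(void)   { child_calls++; }    /* after fork, in child   */

/* Hypothetical demo, meant to be called once: returns 1 if each handler
 * ran exactly once around a single fork(), 0 if not, -1 on error. */
int atfork_demo(void)
{
    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)                   /* child reports its own counter */
        _exit(child_calls == 1 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return prepare_calls == 1 && parent_calls == 1 && child_ok;
}
```

Handlers stay registered for the lifetime of the process, so registering them again before every fork() would make them run several times.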
As a Linux programmer, you have a rich set of multithreading facilities. Look at pthread and friends.
If you need a process per request, then fork and friends have been the most widely used since time immemorial.
In C, is it possible to keep a fork()ed process alive indefinitely, even after the parent exits?
The idea is that the parent process forks a child and then exits, and the child keeps running in the background until another process sends it a kill signal.
Yes, it is definitely possible to keep the child alive. The other responders are also correct; this is how a "daemon" or background process runs in a Linux environment.
Some call this the "fork off and die" approach. Here's a link describing how to do it:
http://wiki.linuxquestions.org/wiki/Fork_off_and_die
Note that more than just fork()-ing is done. File descriptors are closed to keep the background process from tying up system resources, etc.
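The extra steps can be sketched roughly as follows. This is a simplified, single-fork version under my own assumptions, not the code from the linked page. The name daemonize is hypothetical, and a production daemon would typically also close any other inherited descriptors:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: the calling process exits, and the function
 * returns 0 in a detached background copy (-1 there on error). */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                   /* the original process "dies" */

    if (setsid() < 0)               /* new session, no controlling tty */
        return -1;
    if (chdir("/") < 0)             /* do not pin the old working dir */
        return -1;

    /* Point stdin, stdout and stderr at /dev/null. */
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    dup2(fd, STDIN_FILENO);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

After a successful call, the surviving process would enter its main loop and run until some other process sends it a signal.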
Kerrek is right, this is exactly how every daemon is implemented. So, your idea is perfect.
There is a daemon library function which is very easy to use for that.
The daemon() function call is not without limitations if you want to
write a well-behaved daemon. See On Starting Daemons
for an explanation.
Briefly: A good daemon should only background when it is ready to field requests, but do its setup under its own PID and print startup errors