I am confused about how processes are created in Linux. Until now I thought a process could be created by calling any of the following system calls:
system()
exec() family of system calls
fork() system call
But:
system(): system() runs the given executable through a shell, and the shell creates a child process to execute it. Since it is the shell that creates the child, we can say that fork is what actually creates the process here.
exec family of system calls: This family overwrites the currently running process with a new program. So it also seems to create a new process, but in the same address space. I think it also calls fork to create the process.
I am confused about whether all of the above are possible ways of creating a new process, or whether only the fork system call is.
The exec family of system calls does not call fork, nor does it create a new process.
It only overwrites the existing process with the new binary.
In Linux user programs, fork is the only function that creates a new process, although fork internally calls clone and other system calls.
On the other hand, system is only a wrapper around fork and exec. The actual task of creating a process inside system is done by fork, so system is not a separate way to create a new process.
fork() creates a copy of your process. This is where you actually create a process in a POSIX environment like Linux. To precisely answer your question title, fork() is the only way to create a process.
What exec() does for you is then to replace the program running in a process (for example the process you just created with fork()) with another program, so exec() doesn't itself create a process but is often accompanied by fork(), since you usually want to create another process that is different from your current one.
Underneath the system() call, there's just a fork() followed by an exec(), so it's not a new way of creating a process.
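To make the fork-then-exec sequence concrete, here is a minimal sketch of what a system()-like function could look like. The name my_system is hypothetical, and the real system(3) does more (it also adjusts signal handling while waiting), so treat this as an illustration under those assumptions rather than the actual libc implementation.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a simplified stand-in for system(3).
 * Returns the wait status of the shell, or -1 on failure. */
int my_system(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();          /* the only step that creates a process */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace the copied program image with /bin/sh. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);              /* reached only if execl() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts the wait. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

The returned value is a wait status, so the caller would decode it with WIFEXITED() and WEXITSTATUS(), as with system() itself.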
In a POSIX environment, you can create a process through the fork system call, without exception. fork creates a process.
The exec family of functions just loads the binary of another program into the address space of the current process (the one that calls exec()).
system() internally uses fork() followed by an exec() system call.
There are only two ways to create a new process: the system calls fork and clone.
The other functions mentioned fall into two categories:
exec() family: These replace the contents of a process with some other program. Usually exec() is used right after a call to fork or clone to turn one of the resulting processes into a process of the desired application. When bash executes a gcc command, for instance, it first forks itself, then it makes one of the two resulting bash processes into a gcc process using the exec() family.
system() family: These encapsulate a fork/clone system call and a corresponding exec() call, possibly doing fancy stuff like connecting stdin and stdout, etc.
Note that all of these functions fork(), clone(), exec(), system(), etc. are system call wrappers defined by the standard C library (which is always present), not the system calls themselves. As such, counterintuitively, fork() is a wrapper for the clone system call on current systems. Not that it matters much. However, the C library functions are standardized, the system calls are not.
Historically, fork is the older system call. While it is very easy to define and use its semantics, it always suffered from its performance implications: The entire process environment needs to be (at least logically) copied, however, most of this work is for nothing, as one of the resulting processes is usually completely overwritten by an exec call. Also, the fork semantics do not allow for thread creation. Due to these shortcomings, the clone call was introduced, which allows fine grained control on what is copied, and what is shared between the two processes, allowing pthreads to be implemented in terms of clone.
In addition to all the other answers, and to be picky, processes are created by the fork(2) (or the obsolete vfork(2)...) and clone(2) syscalls (and no, the execve(2) syscall doesn't create a process, but overwrites its address space and state by starting a new program in the same process), but some processes are "magically" created by the kernel, notably:
/sbin/init is started by the kernel at startup (if not found, some other programs are tried, even /bin/sh ....); this is the process with pid 1, and it is started quite early...
Some kernel processes (or kernel threads) are started by the kernel, like kswapd, kworker (see this question), etc... I have more than 50 kernel processes or tasks
The Linux kernel is also sometimes starting user processes from kernel land, notably hotplug(8), modprobe, etc... See also udev etc...
Almost all processes are started by fork (or clone ...) and are descendants of /sbin/init (or the process of pid 1). (But modprobe or hotplug could be started by the kernel, and they usually fork other processes).
Process creation (through fork etc....) is quite efficient. A shell forks for almost every command (except the builtin ones, like cd or ulimit...); clone is necessary for multi-threading (but can be used as a replacement for fork...)
Notice that system(3), popen(3) are library functions (not system calls, which are listed in syscalls(2) ...) calling both fork and execve (on /bin/sh ...) and that daemon(3) is a library function calling fork (twice) etc...
Use strace(1) (to find out which syscalls a program is executing) and read Advanced Linux Programming
These days, recent libc implementations use clone more than fork (and some no longer call the fork syscall at all, only clone); you can have several libc implementations, e.g. musl libc in addition to (or as a replacement for) GNU libc
Related
Is there a way to start a child process without fork(), using execvp() exclusively?
The pedantic answer to your question is no. The only system call that creates a new process is fork. The system call underlying execvp (called execve) loads a new program into an existing process, which is a different thing.
Some species of Unix have additional system calls besides fork (e.g. vfork, rfork, clone) that create a new process, but they are only small variations on fork itself, and none of them are part of the POSIX standard that specifies the functionality you can count on in anything that calls itself a Unix.
The slightly more helpful answer is that you might be looking for posix_spawn, which is a library routine wrapping fork and exec into a single operation, but I find it more troublesome to use that correctly than to write my own fork+exec subroutine. YMMV.
posix_spawn is the only POSIX-compliant way to create a child process without calling fork directly. I say 'directly' because historically posix_spawn would itself just call fork, or vfork. However, that is no longer the case in GNU/Linux. posix_spawn itself may be more efficient than fork, in addition to perhaps being a stronger fit conceptually when code is attempting to run a different executable.
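As a rough illustration of the posix_spawn route, the sketch below starts a program and waits for it without calling fork() in user code. The helper name spawn_and_wait is an assumption made for this example; it uses posix_spawnp() so that argv[0] is looked up in PATH, and it passes no file actions or attributes.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (searched in PATH) and return its exit
 * code, or -1 if it could not be spawned or did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* NULL file actions and attributes: the child inherits our defaults. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;             /* posix_spawnp returns the error number */
        return -1;
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

File actions (posix_spawn_file_actions_t) and attributes (posix_spawnattr_t) are where redirections and signal settings would go if the child needs them.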
If you aren't worried about portability, you can abandon POSIX and couple yourself directly to the kernel you are targeting. On Linux the system call to create a child process is clone. At the time of this answer the manual page provides documentation for three variants, including the relatively new clone3.
I believe you can take the example from the manual page and add an execvp call to childFunc. I have not tried it yet, though!
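A sketch of that idea, assuming glibc's clone() wrapper on Linux, might look like the following. The names child_func, clone_and_exec and CHILD_STACK_SIZE are made up for this example. Only SIGCHLD is passed as the flag, so the child gets its own copy of the address space (much like fork()) and the parent can wait for it in the usual way.

```c
#define _GNU_SOURCE              /* needed for clone() in <sched.h> */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)   /* arbitrary size for this sketch */

/* Runs in the new process: replace it with the requested program. */
static int child_func(void *arg)
{
    char **argv = arg;
    execvp(argv[0], argv);
    _exit(127);                  /* reached only if execvp() failed */
}

/* Hypothetical helper: returns the child's exit code, or -1 on error. */
int clone_and_exec(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_func, stack + CHILD_STACK_SIZE, SIGCHLD,
                      (void *)argv);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    int failed = 0;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            failed = 1;
            break;
        }
    }
    free(stack);
    if (failed)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Adding flags such as CLONE_VM or CLONE_FILES changes what is shared with the parent, which is where clone() starts to differ from fork().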
Unlike Windows systems, where creating a new process and executing a new process image happen in a single step, Linux and other UNIX-like systems do them as two distinct steps.
The fork function makes an exact duplicate of the calling process and actually returns twice, once to the parent process and once to the child process. The execvp function (and other functions in the exec family) executes a new process image in the same process, overwriting the existing process image.
You can call execvp without calling fork first. If so, that just means the currently running program goes away and is replaced with the given program. However, fork is the way to create a new process.
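The "returns twice" behaviour, and the fact that the child works on its own copy of the parent's memory, can be seen in a small sketch like this one. The function name fork_copies_memory is hypothetical, and the child reports its value through its exit status only to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child changes its copy of `counter`; the parent's
 * copy is unaffected. Returns the parent's value, stores the child's value
 * in *child_value, and returns -1 on error. */
int fork_copies_memory(int *child_value)
{
    assert(child_value != NULL);

    int counter = 1;
    pid_t pid = fork();          /* returns once in each process */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child branch: fork() returned 0 here. */
        counter += 41;           /* modifies the child's copy only */
        _exit(counter);          /* report 42 through the exit status */
    }

    /* Parent branch: fork() returned the child's pid here. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    *child_value = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return counter;              /* the parent's copy was never touched */
}
```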
As user zwol has already explained, execve() does not fork a new process. Rather, it replaces the address space and CPU state of the current process, loads the new address space from the executable filename, and starts it from main() with argument list argv and environment variable list envp. It keeps the pid and open files (except those marked close-on-exec).
int execve(const char *filename, char *const argv[], char *const envp[]);
filename: name of executable file to run
argv: Command line arguments
envp: environment variable settings (e.g., $PATH, $HOME, etc.)
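To show how the three parameters fit together, here is a small sketch that forks and then calls execve() with an explicit argument list and environment. The helper name run_with_env is an assumption for this example; the fork() is there only so that the caller survives the exec.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `filename` with exactly the given argv and envp.
 * Returns the program's exit code, or -1 on error. */
int run_with_env(const char *filename, char *const argv[], char *const envp[])
{
    assert(filename != NULL && argv != NULL && envp != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* No PATH search: filename must be a path to the executable.
         * The new program sees only the variables listed in envp. */
        execve(filename, argv, envp);
        _exit(127);              /* reached only if execve() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, passing "/bin/sh" with the arguments {"sh", "-c", "echo $GREETING", NULL} and the environment {"GREETING=hello", NULL} would run a shell that sees only that one variable.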
One option is posix_spawn. However, it ignores failures of execvp(), potentially because implementing this was regarded as too complicated.
In Windows, you can execute a process using CreateProcess().
Now I want to know how to execute a process in Linux. So far I have only found that you can do that by calling fork() and then calling exec().
Is this the only way to execute a process in Linux?
Linux provides both high-level and low-level interfaces for starting new processes.
Low-level interfaces
The primary low-level interface is the one you have already discovered: fork(). This is often followed quickly by one of the several exec() functions, but an exec() is unneeded if it is satisfactory for the new process to execute the same code as the original. That's more common than it may sound to someone more used to the Windows API.
POSIX also defines the posix_spawn() family, which serves as a special-purpose alternative to fork() + exec() for certain circumstances to which the latter is not well suited. posix_spawn() is implemented on Linux via the clone() library function (see next).
Although they are not in POSIX, Linux also provides clone() and vfork() as alternatives to fork(). On modern Linux these use the same system call that fork() does, but not the fork() library function itself. There's not much reason to use vfork() any longer -- it is perhaps indicative that POSIX used to have it, but removed it nearly a decade ago. Linux-specific clone(), on the other hand, has some interesting behaviors that are occasionally useful. It provides a more general interface to the fork system call than any of the other functions discussed here do, but it is not portable. Generally speaking, prefer fork() unless you need something it cannot provide.
High(er)-level interfaces
Linux provides some higher-level interfaces as well. The primary two are system(), which executes a shell command and waits for it to complete, and popen(), which launches a shell command with either its standard input or standard output connected to a pipe, by which the concurrently-running parent process can communicate with it. Both of these are specified by POSIX, and on POSIX systems they are specified to operate via fork() + exec(). Of course Windows has system(), too, and _popen(), but not fork() or any direct analog.
Summary
Overall, then, a userspace Linux process can start a new process only by forking, but that is to be distinguished from calling the fork() library function, even indirectly. There are at least six different C library functions in GNU / Linux that can serve as an interface for starting a new process. Some of those interfaces furthermore permit the new process to execute the same code (a copy of the process image of the original) indefinitely, which is sometimes a useful thing to do. It is the fork() part of fork() + exec() that starts a new process; the exec() part merely changes what code the new process runs.
I am looking for C code to use on a Linux based system to start another process asynchronously. The second process should continue, even if the first ends. I've looked through the "fork" and "system" and "exec" options, but don't see anything that will spawn a peer process that's not communicating with or a child of the original process.
Can this be done?
Certainly you can. In the parent, fork() a child, and in that child first call daemon() (which is an easy way to avoid setsid etc.), then call something from the exec family.
In Linux (and Unix), every process is created by an existing process. You may be able to create a process using fork and then, kill the parent process. This way, the child will be an orphan but still, it gets adopted by init. If you want to create a process that is not inherited by another, I am afraid that may not be possible.
You do a fork (man 2 fork) followed by an execl (man 3 execl)
fork creates a new process with the same image as the calling process (so a perfect twin), whereas execl replaces one of the twins with a new image.
If you search Google for "fork execl" you will find many textbook examples, including how to use fork() and exec() correctly
With the most common fork-execl pattern, the new process is still associated with the terminal. To create a true background process you need to create what is called a daemon process. A template for that can be found in this answer: Creating a daemon in Linux
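One way to sketch the detaching step is the classic double fork with setsid(), used here instead of daemon(). The helper name spawn_detached is hypothetical, and unlike a full daemon template this sketch does not change the working directory or redirect the standard streams.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] so that it is not a child of the caller
 * and does not belong to the caller's session. Returns 0 if the intermediate
 * process managed to fork the final one, -1 otherwise. */
int spawn_detached(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Intermediate child: start a new session, which drops the
         * controlling terminal. */
        if (setsid() == -1)
            _exit(1);

        pid_t grandchild = fork();
        if (grandchild == -1)
            _exit(1);
        if (grandchild == 0) {
            execvp(argv[0], argv);
            _exit(127);          /* reached only if execvp() failed */
        }
        /* Exit at once so the grandchild is orphaned and adopted by
         * init (or a subreaper). */
        _exit(0);
    }

    /* Reap the short-lived intermediate child so it does not stay a zombie. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The caller never learns the exit status of the detached program, which is the price of not being its parent.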
I would like to know if there is any good way to execute an external command in Linux environment using C language without using system(), popen(), fork(), exec()?
The reason I cannot use these functions is that my main application has used up most of the system resources (i.e. memory) on my embedded board. If I do a fork, the board won't be able to create a duplicate of my main application. From what I read in a book, both system() and popen() actually use fork() underneath, so I cannot use them either.
The only idea I currently have is to create a process before I run my main application and use IPC (pipe or socket) to let the new process know what external commands it needs to run with system() or popen(), and have it return the results to my application when it is done.
You cannot do this. Linux creates a new process by calling fork() followed by exec(). No other way of creating a process exists.
But fork() itself is quite efficient. It uses copy-on-write for the child process, so fork() does not copy memory until it is really needed. So if you call exec() right after fork(), your system won't use much extra memory.
Update: I was wrong about process creation. In fact, there is a clone() call, which fork() uses internally. This call provides more control over process creation, but it can be complicated to use. Read man 2 fork and man 2 clone for more information.
On an embedded platform (with no swap partition), I have an application whose main process occupies most of the available physical memory. The problem is that I want to launch an external shell script from my application, but using fork() requires that there be enough memory for 2x my original process before the child process (which will ultimately execl itself to something much smaller) can be created.
So is there any way to invoke a shell script from a C program without incurring the memory overhead of a fork()?
I've considered workarounds such as having a secondary smaller process which is responsible for creating shells, or having a "watcher" script which I signal by touching a file or somesuch, but I'd much rather have something simpler.
Some UNIX implementations will give you a vfork (part of the Single UNIX spec) which is exactly like fork except that it shares all the stuff with the parent.
With vfork, there are a very limited number of things you can do in the child before calling exec to overwrite the address space with another process - that's basically what vfork was built for, a minimal copy version of fork for the fork/exec sequence.
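A minimal sketch of the vfork/exec sequence, assuming the target libc still provides vfork(), could look like this. The helper name vfork_and_exec is made up for the example. The child does nothing except call execvp() or _exit(), since it runs in the parent's address space until one of those happens.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] via vfork() + execvp() and return its
 * exit code, or -1 on error. */
int vfork_and_exec(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = vfork();         /* parent is suspended until exec or _exit */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: shares the parent's memory, so only exec or _exit here. */
        execvp(argv[0], argv);
        _exit(127);              /* never call exit() or return from here */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```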
If your system has an MMU, then usually fork() is implemented using copy-on-write, which doesn't actually allocate more memory at the time fork() is called. Additional memory would only be allocated if you write to any of the pages shared with the parent process. An exec() would then discard those pages.
If you know you don't have an MMU, then perhaps fork() is indeed implemented using an actual copy. Another approach might be to have a helper process that is responsible for spawning subshells, which you communicate with using a pipe.
I see you've already accepted an answer, but you may want to read about posix_spawn and use it if it's available on your target:
http://www.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
It sounds as if the prudent move in this case is to port your shell script (if possible) to C and execute it within the process, so you don't have to fork at all.
Then again; I don't know what you are actually trying to do.
Instead of forking your process to spawn a shell, launch a shell within your process (in foreground) then fork it within the shell.
system("/bin/ash /scripts/bgtask");
with /scripts/bgtask being:
/bin/ash /scripts/propertask &
This way you double only the memory used by the shell, not by the main program. Your main program is busy for the duration of spawning the two shells (the original one that starts bgtask, and the background clone launched by it); then the memory allocated by the first shell is free again.