I am developing C code on Linux and would like to execute a binary, say /usr/sbin/binary_program -p xxx. Is there a way other than the system() call to execute a binary?
Yes, and in general, system should never be used, for at least these reasons:
It suffers from all the dangers of shell quoting issues, so anything but a hard-coded command line is potentially dangerous.
It is not thread-safe.
It interferes with signal handling in the calling program.
It provides no way to get output from the executed program except for the exit status, unless the command explicitly saves output to a file.
For executing external programs, you should use posix_spawn, or fork followed by one of the exec-family functions. However, if possible you should avoid dependency on external programs/commands, especially when it would be easier and less error-prone to do the work directly in your program. For example I've seen ridiculous usages like system("sleep 1"); instead of sleep(1);.
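As a rough illustration of the posix_spawn route, here is a minimal sketch. The helper name spawn_and_wait is made up for this example, and it assumes the caller supplies a full path and a NULL-terminated argument vector; no shell is involved.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not a library function): start `path` with the
 * argument vector `argv`, wait for it, and return its exit status.
 * Returns -1 if it could not be started or did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn returns 0 on success and an error number on failure. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the command in the question, the argument vector would be { "binary_program", "-p", "xxx", NULL } with the path /usr/sbin/binary_program. Because the arguments are passed as separate strings, none of the shell quoting issues apply.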
Yes, you can use the exec* family of functions.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/execv.html
If you need to simulate the behavior of system, you can fork and then call an exec function.
The POSIX page of system says:
The system() function shall behave as if a child process were created using fork(), and the child process invoked the sh utility using execl() as follows:
execl(<shell path>, "sh", "-c", command, (char *)0);
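The "as if" description above can be sketched roughly as follows. This is a simplified stand-in, not the real implementation: the name simple_system is invented, /bin/sh is assumed as the shell path, and the signal handling that the real system() performs (ignoring SIGINT and SIGQUIT, blocking SIGCHLD) is left out.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified, hypothetical stand-in for system().
 * Returns the wait status of the shell, or -1 on error.
 * Assumption: the shell lives at /bin/sh. No signal handling is done. */
int simple_system(const char *command)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: become the shell and let it run the command. */
        execl("/bin/sh", "sh", "-c", command, (char *)0);
        _exit(127);                 /* only reached if execl failed */
    }

    /* Parent: wait for the shell to finish. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

The returned value is a wait status, so it should be inspected with WIFEXITED and WEXITSTATUS rather than compared to an exit code directly.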
It is important to realize that you can have several programs running simultaneously and communicating through pipes (or other forms of inter-process communication). This is made possible by a combination of several syscalls.
I strongly suggest reading Advanced Linux Programming, or some other good books explaining a lot more (than we can do in a few minutes) about various syscalls(2) involved, notably fork(2), pipe(2), dup2(2), execve(2), waitpid(2) and several others (perhaps poll(2) for multiplexing, e.g. to avoid deadlocks in circular pipes). The system(3) function is built above these syscalls (and /bin/sh)
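To show how these syscalls fit together, here is a minimal sketch that runs a program without a shell and reads its standard output through a pipe. The helper name capture_stdout and its buffer-based interface are assumptions made for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run path/argv without a shell and copy up to
 * size - 1 bytes of its standard output into buf (NUL-terminated).
 * Returns the child's exit status, or -1 on error. If the output does
 * not fit, the child may be killed by SIGPIPE and -1 is returned. */
int capture_stdout(const char *path, char *const argv[], char *buf, size_t size)
{
    int fds[2];
    int status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    if (size == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the pipe's write end its stdout, then exec. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        execv(path, argv);
        _exit(127);                 /* only reached if execv failed */
    }

    /* Parent: close the write end so read() sees EOF when the child exits. */
    close(fds[1]);
    while (used < size - 1 &&
           (n = read(fds[0], buf + used, size - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With only one pipe and one direction there is no deadlock risk; once the parent both writes to and reads from the child, something like poll(2) becomes necessary.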
That Advanced Linux Programming book has an entire chapter devoted to processes.
I also suggest understanding how a Unix command shell works, either by studying the source code of some simple free shell (like sash) or at least by strace-ing it.
Practically speaking, popen(3) is more useful than system(3): you can read the output of the companion command.
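A small sketch of reading a command's output with popen(3) might look like this. The helper name first_line_of is invented for the example; note that popen still goes through the shell, so the quoting caveats of system() apply to the command string.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and store the first line of
 * its output in `line` (which must hold at least one byte).
 * Returns the command's exit status, or -1 on error. */
int first_line_of(const char *command, char *line, int size)
{
    FILE *fp = popen(command, "r");
    int status;

    if (fp == NULL)
        return -1;

    if (fgets(line, size, fp) == NULL)
        line[0] = '\0';             /* the command printed nothing */

    /* Drain the rest so the command is not killed by SIGPIPE. */
    while (fgetc(fp) != EOF)
        ;

    status = pclose(fp);            /* pclose returns a wait status */
    if (status == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```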
Some libraries (Poco, Qt, Glib/GTK) also have powerful process management functions.
A new process is created with fork which is tricky to understand. A new program is started in the same process with execve.
All processes are created by fork (or perhaps vfork) except a few started magically by the kernel (/sbin/init, /sbin/modprobe, ...).
In Windows, you can execute a process using CreateProcess().
Now I want to know how to execute a process in Linux, so far I only found that you can do that by calling fork() and then calling exec().
Is this the only way to execute a process in Linux?
Linux provides both high-level and low-level interfaces for starting new processes.
Low-level interfaces
The primary low-level interface is the one you have already discovered: fork(). This is often followed quickly by one of the several exec() functions, but an exec() is unneeded if it is satisfactory for the new process to execute the same code as the original. That's more common than it may sound to someone more used to the Windows API.
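As a sketch of fork() without exec(), the child below keeps running code from the same program and hands a small result back through its exit status. The names run_in_child and example_work are made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example job for the child; purely illustrative. */
int example_work(void)
{
    return 42;
}

/* Hypothetical helper: run work() in a forked child (no exec) and return
 * its result, truncated to 0..255, or -1 on error. */
int run_in_child(int (*work)(void))
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(work() & 0xff);       /* child: same program, same code */

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child works on a copy of the parent's memory, so anything it changes is invisible to the parent; the exit status (or a pipe) is needed to get results back.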
POSIX also defines the posix_spawn() family, which serves as a special-purpose alternative to fork() + exec() for certain circumstances to which the latter are not well suited. posix_spawn() is implemented on Linux via the clone() library function (see next).
Although they are not in POSIX, Linux also provides clone() and vfork() as alternatives to fork(). On modern Linux these use the same system call that fork() does, but not the fork() library function itself. There's not much reason to use vfork() any longer -- it is perhaps indicative that POSIX used to have it, but removed it nearly a decade ago. Linux-specific clone(), on the other hand, has some interesting behaviors that are occasionally useful. It provides a more general interface to the fork system call than any of the other functions discussed here do, but it is not portable. Generally speaking, prefer fork() unless you need something it cannot provide.
High(er)-level interfaces
Linux provides some higher-level interfaces as well. The primary two are system(), which executes a shell command and waits for it to complete, and popen(), which launches a shell command with either its standard input or standard output connected to a pipe, by which the concurrently-running parent process can communicate with it. Both of these are specified by POSIX, and on POSIX systems they are specified to operate via fork() + exec(). Of course Windows has system(), too, and _popen(), but not fork() or any direct analog.
Summary
Overall, then, a userspace Linux process can start a new process only by forking, but that is to be distinguished from calling the fork() library function, even indirectly. There are at least six different C library functions in GNU / Linux that can serve as an interface for starting a new process. Some of those interfaces furthermore permit the new process to execute the same code (a copy of the process image of the original) indefinitely, which is sometimes a useful thing to do. It is the fork() part of fork() + exec() that starts a new process; the exec() part merely changes what code the new process runs.
I've been looking at creating Unix dæmons, and there seem to be two methods. The long-winded one, which seems to come up when searching is to call fork(), setsid(), fork() again, chdir() to somewhere safe, set umask() and, finally, close() stdin, stdout and stderr.
Running man daemon, however, brings up information on a daemon() function, which seems to do all the same stuff as above. Are there any differences between the two approaches or is daemon() just a convenience function that does the same thing as the long-winded method? Is either one better, especially for a novice C programmer?
The daemon function is not defined in POSIX, so its implementation (if any) could behave differently on different platforms.
On Linux with glibc, daemon only does one fork, optionally chdirs (but only to /, you can't specify a path), does not touch umask, and does not close the std* descriptors (it optionally reopens them to /dev/null though). (source)
So it depends on the platform, and at least one implementation does less than what you do. If you need all of what you're doing, stick with that (or stick to a platform where the daemon function does exactly that).
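For reference, the long-winded sequence from the question can be sketched roughly as below. The function name daemonize is just a label for this example, the choices of / as the working directory and 0 as the umask are assumptions, and many real daemons reopen the standard descriptors on /dev/null instead of merely closing them.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of the manual method: fork, setsid, fork again, chdir, umask,
 * close the standard descriptors. Returns 0 in the daemonized process
 * and -1 on error; the original process and the intermediate child exit. */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid > 0)
        _exit(0);                   /* original process exits */

    if (setsid() == -1)             /* new session, no controlling terminal */
        return -1;

    pid = fork();                   /* second fork: no longer session leader */
    if (pid == -1)
        return -1;
    if (pid > 0)
        _exit(0);                   /* intermediate child exits */

    if (chdir("/") == -1)           /* assumption: "/" is the safe directory */
        return -1;
    umask(0);                       /* assumption: daemon sets modes explicitly */

    close(STDIN_FILENO);
    close(STDOUT_FILENO);
    close(STDERR_FILENO);
    return 0;
}
```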
Note that daemon is not conforming to any standard. Better use standard conforming functions (like POSIX-defined fork and setsid).
The daemon call summarizes the long-winded fork procedure, and I don't recall any implementation that does anything more.
Since daemon() is a high-level concept, it's definitely to be preferred for novice and experienced programmers.
I am confused about creating a process in Linux. Up to now I thought that we can create a process by calling the following system calls:
system()
exec() family of system calls
fork() system call
but:
system(): The system() call runs the given executable in a shell, and the shell creates a child process to execute it. Since the shell creates that child process, we can say that fork is what creates the process here.
exec family of system calls: This family overwrites the currently running process with a new one, so it also creates a new process, but in the same address space. I think it also calls fork to create the process.
I am confused about whether all of the above are possible ways of creating a new process, or whether only the fork system call is.
The exec family of system calls does not call fork, nor does it create a new process.
It only overwrites the existing process with the new binary.
In Linux user programs, fork is the only function that creates a new process, though fork internally calls clone and other system calls.
On the other hand, system is only a wrapper around fork and exec. The actual task of creating a process in system is done by fork, so system is not a separate way to create a new process.
fork() creates a copy of your process. This is where you actually create a process in a POSIX environment like Linux. To precisely answer your question title, fork() is the only way to create a process.
What exec() does for you is replace the program running in a process (for example the process you just created with fork()) with another program. So exec() doesn't itself create a process, but it is often paired with fork(), since you usually want the new process to run something different from your current one.
Underneath the system() call, there's just a fork() followed by an exec(), so it's not a new way of creating a process.
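The division of labor described here, fork() creating the process and exec() replacing its program, can be seen in a minimal sketch like the following. The helper name run_program is hypothetical, and the exit code 127 for a failed exec is borrowed from shell convention.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run path/argv directly (no shell) and return its
 * exit status, 127 if the exec failed, or -1 on other errors. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();             /* the new process is created here */

    if (pid == -1)
        return -1;

    if (pid == 0) {
        execv(path, argv);          /* replaces the child's program */
        _exit(127);                 /* execv returns only on failure */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```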
In a POSIX environment, you create a process through the fork system call, without exception. fork creates the process.
The exec family of functions just loads the binary of another program into the address space of the current process (the one that calls exec()).
system() internally uses fork() followed by an exec() call.
There are only two ways to create a new process: the system calls fork and clone.
The other functions mentioned fall into two categories:
exec() family: These replace the contents of a process with some other program. Usually exec() is used right after a call to fork or clone to turn one of the resulting processes into a process of the desired application. When a bash executes a gcc command, for instance, it first forks itself, then it makes one of the two resulting bash processes into a gcc process using the exec() family.
system() family: These encapsulate a fork/clone system call and a corresponding exec() call, possibly doing fancy stuff like connecting stdin and stdout, etc.
Note that all of these functions (fork(), clone(), exec(), system(), etc.) are system call wrappers defined by the standard C library (which is always present), not the system calls themselves. As such, counterintuitively, fork() is a wrapper for the clone system call on current systems. Not that it matters much. However, the C library functions are standardized; the system calls are not.
Historically, fork is the older system call. While its semantics are very easy to define and use, it has always suffered from its performance implications: the entire process environment needs to be (at least logically) copied, yet most of this work is for nothing, as one of the resulting processes is usually completely overwritten by an exec call. Also, the fork semantics do not allow for thread creation. Due to these shortcomings, the clone call was introduced, which allows fine-grained control over what is copied and what is shared between the two processes, allowing pthreads to be implemented in terms of clone.
In addition to all the other answers, and to be picky, processes are created by the fork(2) (or the obsolete vfork(2)...) and clone(2) syscalls (and no, the execve(2) syscall doesn't create a process, but overwrites its address space and state by starting a new program in the same process), but some processes are "magically" created by the kernel, notably:
/sbin/init is started by the kernel at startup (if not found, some other programs are tried, even /bin/sh ....); this is the process of pid 1, and it is started quite early...
Some kernel processes (or kernel threads) are started by the kernel, like kswapd, kworker (see this question), etc... I have more than 50 kernel processes or tasks
The Linux kernel is also sometimes starting user processes from kernel land, notably hotplug(8), modprobe, etc... See also udev etc...
Almost all processes are started by fork (or clone ...) and are descendants of /sbin/init (or the process of pid 1). (But modprobe or hotplug could be started by the kernel, and they usually fork other processes).
Process creation (thru fork etc....) is quite efficient. A shell is forking almost every command (except the builtin ones, like cd or ulimit...); clone is necessary for multi-threading (but can be used as a replacement of fork...)
Notice that system(3) and popen(3) are library functions (not system calls, which are listed in syscalls(2) ...) calling both fork and execve (on /bin/sh ...), and that daemon(3) is a library function calling fork, setsid, etc...
Use strace(1) (to find out which syscalls a program is executing) and read Advanced Linux Programming.
These days, recent C libraries use clone more than fork (and some no longer call the fork syscall at all, only clone); you can also have several C libraries installed, e.g. musl libc in addition to (or as a replacement for) GNU libc.
I'm trying to run an application in C, but the only way I could find that is reasonably easy to use works like this:
system("command here");
It works, of course, but it's really slow (especially when repeating this a lot). I'm just wondering if there is a way of running a program without having to interact with a shell, something like Python's subprocess module.
I have heard of execl, and I would use that (forking it first, of course), but I'm wondering if there is a simpler way that wouldn't require forking first.
EDIT: I also want to be able to know the return code of the program
As I'm sure you already know, system already employs the fork/exec strategy. I understand you want to circumvent the shell and are looking for a simple approach, I'm just saying you could just as easily write a function to wrap the fork/exec pattern as is done in system. Indeed it would probably be most straightforward to just do that. An alternative as Gabe mentioned in the comments is posix_spawn.
A faster alternative is vfork() followed by exec, but this is generally discouraged, and vfork() is obsolete in the latest POSIX standards:
4.3BSD; POSIX.1-2001 (but marked OBSOLETE). POSIX.1-2008 removes the specification of vfork().
It's meant to be immediately followed by an exec or _exit. Otherwise all kinds of weird bugs can arise since the virtual memory pages and page tables aren't duplicated (child uses same data/heap/stack segments). The parent/calling process blocks until the child execs or _exits. Regular fork's modern implementations have copy-on-write semantics which approach the speed of vfork, without the potential bugs incurred by vfork's memory sharing semantics.
If you want even further control over memory-sharing semantics and process inheritance, and the consequent potential speed-up (and are on Linux), look into clone() (wrapper for system-call sys_clone()) which is what some process-creating system calls delegate their work to. Be sure to carefully comb over all of the various flags.
You can use waitpid to get the exit status of the process.
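The value that waitpid stores is a wait status rather than the plain return code, so it has to be decoded with the W* macros. A minimal sketch, assuming a hypothetical helper named exit_code_from_status and borrowing the shell convention of reporting 128 plus the signal number for a child killed by a signal:

```c
#include <sys/wait.h>

/* Hypothetical helper: turn the status filled in by
 * waitpid(pid, &status, 0) into a single number.
 * Normal exit: the program's return code.
 * Killed by a signal: 128 + signal number (shell convention, an
 * assumption of this sketch). Anything else: -1. */
int exit_code_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Typical use is to call waitpid(pid, &status, 0) on the pid returned by fork() and pass status to this function once waitpid has returned successfully.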
If neither system() nor popen() provides the mechanism you need, then the easy way to do it is with fork() and execv() (or, perhaps, execl(), but the argument list must be fixed at compile time, not variable, to use it). Really! It is not hard to do fork() and exec(), and any alternative will encapsulate that processing.
The Python subprocess module is simply hiding fork() and exec() for you behind a convenient interface. That's probably appropriate for a high-level language like Python. C is a lower-level language and doesn't really need the complexity.
The hard way to do it is with posix_spawn(). You have to create arguments to describe all the actions you want done in the child between the fork() and the exec(), which is far harder to set up than it is to simply do the fork(), make the changes, and then use exec() after all. This (posix_spawn()) is what you get when you design the code to spawn a child process without visibly using fork() and exec() and ensure that it can handle almost any reasonable circumstance.
You'll need to consider whether you need to use wait() or waitpid() or a variant to determine when the child is complete. You may need to consider whether to handle the SIGCHLD signal (which will notify you when a child dies).
If the fork and exec pattern is used just to run a program without freezing the current program, what's the advantage, for example, over using this single line:
system("program &"); // run in background, don't freeze
The system function creates a new shell instance for running the program, which is why you can run it in the background. The main difference from fork/exec is that using system like this actually creates two processes, the shell and the program, and that you can't communicate directly with the new program via anonymous pipes.
fork+exec is much more lightweight than system(). The latter will create a process for the shell, and the shell will parse the command line given and invoke the required executables. This means more memory, more execution time, etc. Obviously, if the program will run in the background, these extra resources will be consumed only temporarily, but depending on how frequently you use it, the difference will be quite noticeable.
The man page for system clearly says that system executes the command by "calling /bin/sh -c command", which means system creates at least two processes: /bin/sh and then the program (the shell startup files may spawn many more than one process).
This can cause a few problems:
portability (what if a system doesn't have access to /bin/sh, or does not use & to run a process in the background?)
error handling (you can't know if the process exited with an error)
talking with the process (you can't send anything to the process, or get anything out of it)
performance, etc
The proper way to do this is fork+exec, which creates exactly one process. It gives you better control over the performance and resource consumption, and it's much easier to modify to do simple, important things (like error handling).
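A rough sketch of that approach: start the program in the background with fork+exec, keep its pid, and check on it later without blocking. The helper names start_program and poll_program are invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start path/argv without a shell and return
 * immediately. Returns the child's pid, or -1 if fork failed. */
pid_t start_program(const char *path, char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execv(path, argv);
        _exit(127);                 /* only reached if execv failed */
    }
    return pid;
}

/* Hypothetical helper: check on a child without blocking.
 * Returns 0 if it is still running, 1 if it has finished (its exit
 * status, or -1 for abnormal termination, is stored in *exit_code),
 * and -1 on error. */
int poll_program(pid_t pid, int *exit_code)
{
    int status;
    pid_t done = waitpid(pid, &status, WNOHANG);

    if (done == 0)
        return 0;
    if (done == -1)
        return -1;
    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}
```

Unlike system("program &"), the parent here knows the pid of the program itself, can tell whether it failed, and could be extended with pipes to talk to it.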