setuid() before calling execv() in vfork() / clone()

I need to fork and exec a program from a server. Since my server's memory footprint is large, I intend to use vfork() / Linux clone(). I also need to open pipes for stdin / stdout / stderr. Is this allowed with clone() / vfork()?

From the standard:
[..] the behaviour is undefined if the process created by vfork() either modifies any data other than a variable of type pid_t used to store the return value from vfork(), or returns from the function in which vfork() was called, or calls any other function before successfully calling _exit() or one of the exec family of functions.
The problem with calling functions like setuid or pipe is that they could affect memory in the address space shared between the parent and child processes. If you need to do anything before exec, the best way is to write a small shim process that does whatever you need it to and then execs to the eventual child process (perhaps arguments supplied through argv).
shim.c
======
#include <unistd.h>

enum {
    /* initial arguments */
    ARGV_FILE = 5, ARGV_ARGS
};

int main(int argc, char *argv[]) {
    /* consume instructions from argv */
    /* setuid, pipe() etc. */
    /* execvp() only returns if it fails */
    return execvp(argv[ARGV_FILE], argv + ARGV_ARGS);
}
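When nothing has to happen between vfork() and exec, the restriction quoted above still leaves one usable pattern: the child calls an exec function right away and falls back to _exit() if that fails. Here is a minimal sketch of that pattern. The helper name vfork_exec and the exit code 127 are my own choices for illustration, not something the standard prescribes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a vfork() child.
 * Returns the child's exit status, or -1 if the call itself failed. */
int vfork_exec(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork failed, no child exists */
    if (pid == 0) {
        /* Child: only exec or _exit, nothing else. */
        execv(path, argv);
        _exit(127);             /* reached only if execv failed */
    }
    /* Parent: resumes once the child has exec'ed or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything beyond this in the child (setuid, pipe, dup2) is exactly what the quoted text leaves undefined, which is why the shim is suggested for those cases.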

I'd use clone() instead, using CLONE_VFORK|CLONE_VM flags; see man 2 clone for details.
Because CLONE_FILES is not set, the child process has its own file descriptors, and can close and open standard descriptors without affecting the parent at all.
Because the cloned process is a separate process, it has its own user and group ids, so setting them via setresgid() and setresuid() (perhaps calling setgroups() or initgroups() first to set the additional groups -- see man 2 setresuid, man 2 setgroups, and man 3 initgroups for details) will not affect the parent at all.
The CLONE_VFORK|CLONE_VM flags mean this clone() should behave like vfork(), with the child process running in the same memory space as the parent process up till the execve() call.
This approach avoids the latency of an intermediate executable, which is pretty significant, but it is completely Linux-specific.

Related

Creating a child process WITHOUT fork()

Is there a way to start a child process without fork(), using execvp() exclusively?
The pedantic answer to your question is no. The only system call that creates a new process is fork. The system call underlying execvp (called execve) loads a new program into an existing process, which is a different thing.
Some species of Unix have additional system calls besides fork (e.g. vfork, rfork, clone) that create a new process, but they are only small variations on fork itself, and none of them are part of the POSIX standard, which specifies the functionality you can count on from anything that calls itself a Unix.
The slightly more helpful answer is that you might be looking for posix_spawn, which is a library routine wrapping fork and exec into a single operation, but I find it more troublesome to use that correctly than to write my own fork+exec subroutine. YMMV.
posix_spawn is the only POSIX-compliant way to create a child process without calling fork directly. I say 'directly' because historically posix_spawn would itself just call fork or vfork. However, that is no longer the case on GNU/Linux. posix_spawn itself may be more efficient than fork, in addition to perhaps being a stronger fit conceptually when code is attempting to run a different executable.
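As a rough sketch of what using posix_spawn looks like, the helper below starts a program found via PATH, connects its stdout to a pipe with file actions, and collects the output. The name spawn_capture and its buffer handling are assumptions made for this example; only the posix_spawn* calls themselves come from POSIX.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] with stdout redirected into a pipe,
 * copy up to len-1 bytes of its output into buf, and return the exit
 * status (-1 on failure). */
int spawn_capture(char *const argv[], char *buf, size_t len)
{
    int fds[2], status, rc;
    pid_t pid;
    posix_spawn_file_actions_t fa;
    size_t used = 0;
    ssize_t n;

    if (len == 0 || pipe(fds) < 0)
        return -1;

    /* Describe what the child should do with its descriptors. */
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    posix_spawn_file_actions_addclose(&fa, fds[1]);

    rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);              /* parent keeps only the read end */
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }

    while (used < len - 1 &&
           (n = read(fds[0], buf + used, len - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the output does not fit in buf, this sketch simply stops reading, so the child may be killed by SIGPIPE and the function then returns -1.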
If you aren't worried about portability, you can abandon POSIX and couple yourself directly to the kernel you are targeting. On Linux the system call to create a child process is clone. At the time of this answer the manual page provides documentation for three variants, including the relatively new clone3.
I believe you can take the example from the manual page and add an execvp call to childFunc. I have not tried it yet, though!
Unlike Windows systems, where creating a new process and executing a new process image happen in a single step, Linux and other UNIX-like systems do them as two distinct steps.
The fork function makes an exact duplicate of the calling process and actually returns twice, once to the parent process and once to the child process. The execvp function (and other functions in the exec family) executes a new process image in the same process, overwriting the existing process image.
You can call execvp without calling fork first. If so, that just means the currently running program goes away and is replaced with the given program. However, fork is the way to create a new process.
As user zwol has already explained, execve() does not fork a new process. Rather, it replaces the address space and CPU state of the current process, loads the new address space from the executable file, and starts it from main() with argument list argv and environment variable list envp. It keeps the pid and the open files (except those marked close-on-exec).
int execve(const char *filename,char *const argv [],char *const envp[]);
filename: name of executable file to run
argv: Command line arguments
envp: environment variable settings (e.g., $PATH, $HOME, etc.)
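Use posix_spawn. Be aware, though, that it may not report a failure of the underlying exec to the caller: POSIX allows the child to simply exit with status 127 in that case, potentially because reporting the error was regarded as too complicated to implement.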

Executing the program after the "fork part"

In my program, the main function uses fork to create two processes. The child process does some work, and the parent process forks again and that second child calls another function. Both functions write to one file, and all of that works fine.
What I need is to write something to the end of the file after both functions and all processes (both functions create processes) have finished.
I tried putting the fprintf call in various places in main, and it always writes somewhere in the middle of the file, so I think main probably runs in parallel with the two functions.
I tried to use a semaphore
s = sem_open(s1, O_CREAT, 0666, 0);
in this way: at the end of each function I called sem_post(s), and in main I put sem_wait(s); sem_wait(s); followed by the fprintf call, but that didn't work either.
Is there some way to solve this?
Thanks
I think you're looking for the wait function. See this Stack Overflow question: wait(NULL) will wait for a child process to finish (thanks Jonathan Leffler). Call wait in a loop to wait for all child processes to finish. Just use that function right before you write to the file in your parent process.
You can also read about the waitpid function if you want to wait for a specific process instead of for whichever child finishes next.
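The wait loop could look roughly like this. Both function names are made up for the example, and spawn_workers only stands in for whatever your two functions do when they fork.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the real work: fork n children that exit
 * immediately. Returns how many children were started. */
int spawn_workers(int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);           /* child: real work would go here */
        started++;
    }
    return started;
}

/* Block until every child of the calling process has terminated.
 * Returns the number of children that were reaped. */
int wait_for_all_children(void)
{
    int reaped = 0;
    for (;;) {
        pid_t pid = wait(NULL);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* ECHILD: no children are left */
    }
    return reaped;
}
```

The final fprintf then goes after the call to wait_for_all_children() in the parent. Note that wait only covers direct children, so if the children fork processes of their own, each of them needs the same loop before it exits.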
Edit:
Alternatively, you can actually use semaphores across processes, but it takes a little more work. See this stack overflow answer. The basic idea is to use the function sem_open with the O_CREAT constant. sem_open has 2 function signatures:
sem_t *sem_open(const char *name, int oflag);
sem_t *sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
From the sem_open man page:
If O_CREAT is specified in oflag, then two additional arguments must
be supplied. The mode argument specifies the permissions to be
placed on the new semaphore, as for open(2). (Symbolic definitions
for the permissions bits can be obtained by including <sys/stat.h>.)
The permissions settings are masked against the process umask. Both
read and write permission should be granted to each class of user
that will access the semaphore. The value argument specifies the
initial value for the new semaphore. If O_CREAT is specified, and a
semaphore with the given name already exists, then mode and value are
ignored.
In your parent process, call sem_open with the mode and value parameters, giving it the permissions you need. In the child process(es), call sem_open("YOUR_SEMAPHORE_NAME", 0) to open that semaphore for use.
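A minimal sketch of that idea follows, assuming a semaphore name you pick yourself (it must start with a slash) and a made-up function name wait_for_posts. On older glibc versions you may need to link with -pthread.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork n children that each post a named semaphore
 * when done, and wait in the parent until all of them have posted.
 * Returns the number of posts received, or -1 on failure. */
int wait_for_posts(const char *name, int n)
{
    int started = 0, posted = 0;
    sem_t *s;

    sem_unlink(name);                       /* drop a stale semaphore */
    s = sem_open(name, O_CREAT, 0600, 0);   /* parent: mode and value */
    if (s == SEM_FAILED)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            sem_t *cs = sem_open(name, 0);  /* child: open by name */
            if (cs == SEM_FAILED)
                cs = s;                     /* inherited handle also works */
            /* ... the child's real work would go here ... */
            sem_post(cs);
            _exit(0);
        }
        started++;
    }

    for (int i = 0; i < started; i++) {
        int rc;
        do {
            rc = sem_wait(s);
        } while (rc < 0 && errno == EINTR);
        if (rc == 0)
            posted++;
    }

    while (wait(NULL) > 0)                  /* reap the children */
        ;
    sem_close(s);
    sem_unlink(name);
    return posted;
}
```

After wait_for_posts returns, every child has reached its sem_post, so that is the point where the parent can write the final line.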

Is it possible to use fork without exec if both processes are executing the same program?

Here is a code sample where the fork library call is used to create a child process which shares the parent's address space. The child process executes its code without using the exec system call. My question is: is the exec system call not required in the case that both the parent and child processes are executing the same program?
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t count;

    count = fork();
    if (count == 0)
        printf("\nHi I'm child process and count =%d\n", (int)count);
    else
        printf("\nHi I'm parent process and count =%d\n", (int)count);
    return 0;
}
The answer to this question may be different depending on the operating system. The man page for fork on OS X contains this ominous warning (bold portion is a paraphrase of the original):
There are limits to what you can do in the child process. To be
totally safe you should restrict yourself to only executing
async-signal safe operations until such time as one of the exec
functions is called. All APIs, including global data symbols, in any
framework or library should be assumed to be unsafe after a fork()
unless explicitly documented to be safe or async-signal safe. If you
need to use these frameworks in the child process, you must exec. In
this situation it's reasonable to exec another copy of the same executable.
The list of async-signal safe functions can be found in the man page for sigaction(2).
Is it possible to use fork without exec
Yes, it is possible.
is the exec system call not required in the case that both the parent and child processes are executing the same program
Yes, it is not required in that case.
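To illustrate, here is a small sketch in which the child runs a function from the same program and never calls exec. The names square and run_in_child are invented for the example, and passing the result back through the exit status is just one simple option (it only carries values from 0 to 255).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical work function that is part of the same program. */
int square(int x)
{
    return x * x;
}

/* Run fn(arg) in a forked child without exec and hand the result
 * back to the parent through the child's exit status. */
int run_in_child(int (*fn)(int), int arg)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) & 0xff);  /* child: same program, no exec */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_in_child(square, 5) computes the square in the child and the parent reads it from the exit status. Keep the warning quoted above in mind if the program is multithreaded or uses libraries that are not fork-safe.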

What to do if exec() fails?

Let's suppose we have a code doing something like this:
int pipes[2];
pipe(pipes);
pid_t p = fork();
if (0 == p)
{
    dup2(pipes[1], STDOUT_FILENO);
    execv("/path/to/my/program", NULL);
    ...
}
else
{
    //... parent process stuff
}
As you can see, it's creating a pipe, forking and using the pipe to read the child's output (I can't use popen here, because I also need the PID of the child process for other purposes).
Question is, what should happen if in the above code, execv fails? Should I call exit() or abort()? As far as I know, those functions close the open file descriptors. Since fork-ed process inherits the parent's file descriptors, does it mean that the file descriptors used by the parent process will become unusable?
UPD
I want to emphasize that the question is not about the executable loaded by exec() failing, but exec itself, e.g. in case the file referred by the first argument is not found or is not executable.
You should use exit(int), since the low byte of the argument can be read by the parent process using waitpid(). This lets you handle the error appropriately in the parent process. Depending on what your program does, you may want to use _exit instead of exit. The difference is that _exit will not run functions registered with atexit, nor will it flush stdio streams.
There are about a dozen reasons execv() can fail and you might want to handle each differently.
The child failing is not going to affect the parent's file descriptors. They are, in effect, reference counted.
You should call _exit(). It terminates the process the way exit() does, but it does not invoke registered atexit() functions or flush the stdio buffers that the child inherited from the parent. Calling _exit() still means that the parent will be able to get your failed child's exit status and take any necessary steps.
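If the parent needs to know why exec failed, an exit status alone is ambiguous, because the program itself could also exit with that value. A common technique, which is not from the answers above, is to send errno back over a second pipe marked close-on-exec. Here is a sketch under that assumption; spawn_checked is a made-up name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and execv `path`. Returns 0 and stores the
 * child's pid in *out if exec succeeded; otherwise returns the errno
 * of the call that failed. */
int spawn_checked(const char *path, char *const argv[], pid_t *out)
{
    int fds[2], err = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0)
        return errno;
    /* The write end closes by itself when exec succeeds. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork()) < 0) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);
        err = errno;            /* exec failed: tell the parent why */
        if (write(fds[1], &err, sizeof err) < 0) {
            /* nothing more the child can do */
        }
        _exit(127);
    }
    close(fds[1]);
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    if (n == (ssize_t)sizeof err) {
        waitpid(pid, NULL, 0);  /* reap the failed child */
        return err;
    }
    *out = pid;                 /* EOF: exec succeeded */
    return 0;
}
```

The parent's own descriptors are untouched either way, and on success the caller still has to waitpid() on the returned pid.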

What is the purpose of fork()?

In many programs and man pages of Linux, I have seen code using fork(). Why do we need to use fork() and what is its purpose?
fork() is how you create new processes in Unix. When you call fork, you're creating a copy of your own process that has its own address space. This allows multiple tasks to run independently of one another as though they each had the full memory of the machine to themselves.
Here are some example usages of fork:
Your shell uses fork to run the programs you invoke from the command line.
Web servers like apache use fork to create multiple server processes, each of which handles requests in its own address space. If one dies or leaks memory, others are unaffected, so it functions as a mechanism for fault tolerance.
Google Chrome uses fork to handle each page within a separate process. This will prevent client-side code on one page from bringing your whole browser down.
fork is used to spawn processes in some parallel programs (like those written using MPI). Note this is different from using threads, which don't have their own address space and exist within a process.
Scripting languages use fork indirectly to start child processes. For example, every time you use a command like subprocess.Popen in Python, you fork a child process and read its output. This enables programs to work together.
Typical usage of fork in a shell might look something like this:
// needs <sys/types.h>, <sys/wait.h> and <unistd.h>
pid_t child_process_id = fork();
if (child_process_id > 0) {
    // Fork returns the child's pid in the parent process. Parent executes this.
    // wait for the child process to complete
    int status;
    waitpid(child_process_id, &status, 0);
    // child process finished!
} else if (child_process_id == 0) {
    // Fork returns 0 in the child process. Child executes this.
    // new argv array for the child process
    char *const argv[] = {"arg1", "arg2", "arg3", NULL};
    // now start executing some other program
    execv("/path/to/a/program", argv);
    // execv only returns if it failed
    _exit(127);
} else {
    // Fork returns -1 on failure; no child was created.
}
The shell spawns a child process using fork, runs the requested program in it with exec, and waits for it to complete, then continues with its own execution. Note that you don't have to use fork this way. You can always spawn off lots of child processes, as a parallel program might do, and each might run a program concurrently. Basically, any time you're creating new processes in a Unix system, you're using fork(). For the Windows equivalent, take a look at CreateProcess.
If you want more examples and a longer explanation, Wikipedia has a decent summary. And here are some slides on how processes, threads, and concurrency work in modern operating systems.
fork() is how Unix creates new processes. At the point you call fork(), your process is cloned, and two different processes continue the execution from there. One of them, the child, will have fork() return 0. The other, the parent, will have fork() return the PID (process ID) of the child.
For example, if you type the following in a shell, the shell program will call fork(), and then execute the command you passed (telnetd, in this case) in the child, while the parent will display the prompt again, as well as a message indicating the PID of the background process.
$ telnetd &
As for the reason you create new processes, that's how your operating system can do many things at the same time. It's why you can run a program and, while it is running, switch to another window and do something else.
fork() is used to create child process. When a fork() function is called, a new process will be spawned and the fork() function call will return a different value for the child and the parent.
If the return value is 0, you know you're the child process, and if the return value is a positive number (which happens to be the child's process ID), you know you're the parent. If it's a negative number, the fork failed and no child process was created.
http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
fork() is basically used to create a child process of the process that calls it. In the child, fork() returns zero.
pid = fork();
if (pid == 0) {
    // this is the child process
} else if (pid > 0) {
    // this is the parent process
}
In this way you can provide different actions for the parent and the child and make use of multiprocessing.
fork() will create a new child process identical to the parent. So everything you run in the code after that will be run by both processes — very useful if you have for instance a server, and you want to handle multiple requests.
System call fork() is used to create processes. It takes no arguments and returns a process ID. The purpose of fork() is to create a new process, which becomes the child process of the caller. After a new child process is created, both processes will execute the next instruction following the fork() system call. Therefore, we have to distinguish the parent from the child. This can be done by testing the returned value of fork():
If fork() returns a negative value, the creation of a child process was unsuccessful.
fork() returns a zero to the newly created child process.
fork() returns a positive value, the process ID of the child process, to the parent. The returned process ID is of type pid_t defined in sys/types.h. Normally, the process ID is an integer. Moreover, a process can use function getpid() to retrieve the process ID assigned to this process.
Therefore, after the system call to fork(), a simple test can tell which process is the child. Please note that Unix will make an exact copy of the parent's address space and give it to the child. Therefore, the parent and child processes have separate address spaces.
Let us understand it with an example to make the above points clear. This example does not distinguish parent and the child processes.
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_COUNT 200
#define BUF_SIZE 100

int main(void)
{
    pid_t pid;
    int i;
    char buf[BUF_SIZE];

    fork();
    pid = getpid();
    for (i = 1; i <= MAX_COUNT; i++) {
        sprintf(buf, "This line is from pid %d, value = %d\n", (int)pid, i);
        write(1, buf, strlen(buf));
    }
    return 0;
}
Suppose the above program executes up to the point of the call to fork().
If the call to fork() is executed successfully, Unix will make two identical copies of address spaces, one for the parent and the other for the child.
Both processes will start their execution at the next statement following the fork() call. In this case, both processes will start their execution at the assignment
pid = .....;
Both processes start their execution right after the system call fork(). Since both processes have identical but separate address spaces, those variables initialized before the fork() call have the same values in both address spaces. Since every process has its own address space, any modifications will be independent of the others. In other words, if the parent changes the value of its variable, the modification will only affect the variable in the parent process's address space. Other address spaces created by fork() calls will not be affected even though they have identical variable names.
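The independence of the two address spaces can be checked with a small experiment. This is only a sketch, and the function name fork_private_copy is invented for it: the child overwrites a variable and reports its own view through its exit status, while the parent looks at its copy afterwards.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: the child changes `value`, the parent does
 * not. Stores what each side saw; returns 0 on success, -1 on error. */
int fork_private_copy(int *parent_value, int *child_value)
{
    int value = 1, status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 42;             /* changes only the child's copy */
        _exit(value);           /* report the child's view */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_value = WEXITSTATUS(status);
    *parent_value = value;      /* the parent's copy was never touched */
    return 0;
}
```

After the call, the child's value is 42 and the parent's value is still 1, even though both came from the same variable name.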
What is the reason for using write rather than printf? It is because printf() is "buffered," meaning printf() will group the output of a process together. While the output for the parent process is being buffered, the child may also use printf to print out some information, which will also be buffered. As a result, since the output will not be sent to the screen immediately, you may not get the expected result in the right order. Worse, the output from the two processes may be mixed in strange ways. To overcome this problem, you may consider using the "unbuffered" write.
If you run this program, you might see the following on the screen:
................
This line is from pid 3456, value = 13
This line is from pid 3456, value = 14
................
This line is from pid 3456, value = 20
This line is from pid 4617, value = 100
This line is from pid 4617, value = 101
................
This line is from pid 3456, value = 21
This line is from pid 3456, value = 22
................
Process ID 3456 may be the one assigned to the parent or the child. Because these processes run concurrently, their output lines are intermixed in a rather unpredictable way. Moreover, the order of these lines is determined by the CPU scheduler. Hence, if you run this program again, you may get a totally different result.
You probably don't need to use fork in day-to-day programming if you are writing applications.
Even if you do want your program to start another program to do some task, there are other simpler interfaces which use fork behind the scenes, such as "system" in C and perl.
For example, if you wanted your application to launch another program such as bc to do some calculation for you, you might use 'system' to run it. System does a 'fork' to create a new process, then an 'exec' to turn that process into bc. Once bc completes, system returns control to your program.
You can also run other programs asynchronously, but I can't remember how.
If you are writing servers, shells, viruses or operating systems, you are more likely to want to use fork.
Multiprocessing is central to computing. For example, your IE or Firefox can create a process to download a file for you while you are still browsing the internet. Or, while you are printing out a document in a word processor, you can still look at different pages and still do some editing with it.
Fork creates new processes. Without fork you would have a unix system that could only run init.
fork() is used to create new processes, as everybody has written.
Here is my code that creates processes in the form of a binary tree. It asks for the number of levels up to which you want to create processes in the binary tree.
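#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t t1 = 0, t2 = 0, p;
    int i, n, ab;

    p = getpid();
    printf("enter the number of levels\n"); fflush(stdout);
    if (scanf("%d", &n) != 1)
        return EXIT_FAILURE;
    printf("root %d\n", (int)p); fflush(stdout);
    for (i = 1; i < n; i++)
    {
        t1 = fork();
        if (t1 != 0)
            t2 = fork();
        if (t1 != 0 && t2 != 0)
            break;  /* this process now has its two children */
        printf("child pid %d parent pid %d\n", (int)getpid(), (int)getppid()); fflush(stdout);
    }
    /* only a process that left the loop via break has children to wait for */
    if (t1 > 0 && t2 > 0) {
        waitpid(t1, &ab, 0);
        waitpid(t2, &ab, 0);
    }
    return 0;
}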
OUTPUT
enter the number of levels
3
root 20665
child pid 20670 parent pid 20665
child pid 20669 parent pid 20665
child pid 20672 parent pid 20670
child pid 20671 parent pid 20670
child pid 20674 parent pid 20669
child pid 20673 parent pid 20669
First, one needs to understand what the fork() system call is. Let me explain.
The fork() system call creates an exact duplicate of the parent process. It duplicates the parent's stack, heap, initialized data, and uninitialized data, and shares the code with the parent process in read-only mode.
fork() copies memory on a copy-on-write basis, meaning that a virtual memory page is actually copied only when one of the processes writes to it.
Now the purpose of fork():
fork() can be used wherever work is divided. For example, a server has to handle multiple clients, so the parent keeps accepting connections and forks a child for each client to perform the reads and writes.
fork() is used to spawn a child process. Typically it's used in similar sorts of situations as threading, but there are differences. Unlike threads, fork() creates whole separate processes, which means that the child and the parent, while they are direct copies of each other at the point that fork() is called, are completely separate. Neither can access the other's memory space (without going to the normal troubles you go to in order to access another program's memory).
fork() is still used by some server applications, mostly ones that run as root on a *NIX machine that drop permissions before processing user requests. There are some other usecases still, but mostly people have moved to multithreading now.
The rationale behind fork() versus just having an exec() function to initiate a new process is explained in an answer to a similar question on the unix stack exchange.
Essentially, since fork copies the current process, all of the various possible options for a process are established by default, so the programmer does not have to supply them.
In the Windows operating system, by contrast, programmers have to use the CreateProcess function which is MUCH more complicated and requires populating a multifarious structure to define the parameters of the new process.
So, to sum up, the reason for forking (versus exec'ing) is simplicity in creating new processes.
The fork() system call is used to create a child process that is an exact duplicate of the parent process. fork() copies the stack section, heap section, data section, environment variables, and command-line arguments from the parent.
refer: http://man7.org/linux/man-pages/man2/fork.2.html
Fork() was created as a way to create another process with a copy of the parent's memory state. It works the way it does because it was the most minimal change possible to get good threading capabilities in time-slicing mainframe systems that previously lacked this capability. Additionally, programs needed remarkably little modification to become multi-process: fork() could simply be added in the appropriate locations, which is rather elegant. Basically, fork() was the path of least resistance.
Originally it actually had to copy the entire parent process' memory space. With the advent of virtual memory, it has been hacked and changed to be more efficient, with copy-on-write mechanisms avoiding the need to actually copy any memory up front.
However, modern systems now allow the creation of actual threads, which simply share the parent process' actual heap. With modern multi-threading programming paradigms and more advanced languages, it's questionable whether fork() provides any real benefit, since fork() actually prevents processes from communicating through memory directly, and forces them to use slower message passing mechanisms.
