This would be easy with fork(), but I've got no MMU. I've heard that vfork() blocks the parent process until the child exits or executes exec(). How would I accomplish something like this?:
pid_t pid = vfork();
if (pid == -1) {
// fail
exit(-1);
}
if (pid == 0) {
// child
while(1) {
// Do my daemon stuff
}
// Let's pretend it exits sometime
exit();
}
// Continue execution in parent without blocking.....
It seems there is no way to do this exactly as you have it here. exec or _exit have to get called for the parent to continue execution. Either put the daemon code into another executable and exec it, or use the child to spawn the original task. The second approach is the sneaky way, and is described here.
daemon() function for uClinux systems without MMU and fork(), by Jamie Lokier, in patch format
You can't do daemon() with vfork(). To create something similar to a daemon on !MMU using vfork(), the parent process doesn't die (so there are extra processes), and you should call your daemon on the background (i.e. by appending & to the command line on the shell).
On the other hand, Linux provides clone(). Armed with that, knowledge and care, it's possible to implement daemon() for !MMU. Jamie Lokier has a function to do just that on ARM and i386, get it from here.
Edit: made the link to Jamie Lokier's daemon() for !MMU Linux more prominent.
I would have thought that this would be the type of problem that many others had run into before, but I've had a hard time finding anyone talking about the "kill the parent" problems.
I initially thought that you should be able to do this with a (not quite so, but sort of) simple call to clone, like this:
pid_t new_vfork(void) {
return clone(child_func, /* child function */
child_stack, /* child stack */
SIGCHLD | CLONE_VM, /* flags */
NULL, /* argument to child */
NULL, /* pid of the child */
NULL, /* thread local storage for child */
NULL); /* thread id of child in child's mem */
}
Except that determining the child_stack and the child_func to work the way that it does with vfork is pretty difficult since child_func would need to be the return address from the clone call and the child_stack would need to be the top of the stack at the point that the actual system call (sys_clone) is made.
You could probably try to call sys_clone directly with
pid_t new_vfork(void) {
return sys_clone( SIGCHLD | CLONE_VM, NULL);
}
Which I think might get what you want. Passing NULL as the second argument, which is the child_stack pointer, causes the kernel to do the same thing as it does in vfork and fork, which is to use the same stack as the parent.
I've never used sys_clone directly and haven't tested this, but I think it should work. I believe that:
sys_clone( SIGCHLD | CLONE_VM | CLONE_VFORK, NULL);
is equivalent to vfork.
If this doesn't work (and you can't figure out how to do something similar) then you may be able to use the regular clone call along with setjump and longjmp calls to emulate it, or you may be able to get around the need for the "return's twice" semantics of fork and vfork.
Related
So i've been struggling with this exercise. I must get al of the System Calls made by any given Linux command of my choice (I.E. ls or cd), list them in a .txt file, and have their unique IDs listed beside them.
So far here's what i got:
strace -o filename.txt ls
This when executed in the Linux shell gives me a "filename.txt" file containing all the system calls of the ls command. Now in my C script:
#include <stdio.h>
#include <stdlib.h>
int main(){
system("strace -o filename.txt ls");
return 0;
}
This should do the same as the previous code, but it's not returning me anything, although the code succesfully compiles. How would i go about fixing this, and then get the IDs? I'm using the "stdlib" library because in my research i found that it has some relation to system call IDs, but haven't found any indication on how to get them. Basically i must read that file i created and have it give each system call its ID.
The exercise is obviously designed to be solved by using the ptrace() facility, because the strace utility does not have an option to print the syscall number (as far as I know).
Technically, you can use something like
printf '#include <sys/syscall.h>\n' | gcc -dD -E - | awk '$1 == "#define" { m[$2] = $3 } END { for (name in m) if (name ~ /^SYS_/) { v = name; while (v in m) v = m[v]; sub(/^SYS_/, "", name); printf "%s %s\n", v, name } }'
to generate a number of syscall-number syscall-name lines, to be used for mapping syscall names back to syscall numbers, but this would be silly and error-prone. Silly, because being able to use ptrace() gives you much more control than using the strace utility, and using a "clever hack" like above just means you avoid learning how to do that, which in my opinion is by definition self-defeating and therefore utterly silly; and error-prone, because there is absolutely no guarantee that the installed headers match the running architecture. This is especially problematic on multiarch architectures, where you can use -m32 and -m64 compiler options to switch between 32-bit and 64-bit architectures. They typically have completely different syscall numbers.
Essentially, your program should:
fork() a child process.
In the child process:
Enable ptracing by calling prctl(PR_SET_DUMPABLE, 1L)
Make parent process the tracer by calling ptrace(PTRACE_TRACEME, (pid_t)0, (void *)0, (void *)0)
Optionally, set tracing options. For example, call ptrace(PTRACE_SETOPTIONS, getpid(), PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXEC | PTRACE_O_TRACEEXIT | PTRACE_O_TRACEFORK) so that you catch at least clone(), fork(), and exec() family of syscalls.
If you do not set the PTRACE_O_TRACEEXEC option, you should stop the child process at this point using e.g. raise(SIGSTOP);, so that the parent process can start tracing this child.
Execute the command to be traced using e.g. execv(). In particular, if the first command line parameter is the command to run, optionally followed by its options, you can use execvp(argv[1], argv + 1);.
If you set the PTRACE_O_TRACEEXEC option above, then the kernel will auto-pause the child process just before executing the new binary.
If the exec fails, the child process should exit. I like to use exit(127);, to return exit status 127.
In the parent process, use waitpid(childpid, &status, WUNTRACED | WCONTINUED in a loop, to catch events in the child process.
The very first event should be the initial pause, i.e. WIFSTOPPED(status) being true. (If not, something else went wrong.)
There are three three different reasons why waitpid(childpid, &status, WUNTRACED | WCONTINUED) may return:
When the child exits (WIFEXITED(status) will be true).
This should obviously end the tracing, and have the parent tracer process exit, too.
When the child resumes execution (WIFCONTINUED(status) will be true).
You cannot assume that a PTRACE_SYSCALL, PTRACE_SYSEMU, PTRACE_CONT etc. commands have actually caused the child process to continue, until the parent gets this signal. In other words, you cannot just fire ptrace() commands to the child process, and expect them to take place in an orderly fashion! The ptrace() facility is asynchronous, and the call will return immediately; you need to waitpid() for the WIFCONTINUED(status) type of event to know that the child process heeded the command.
When the kernel stopped the child (with SIGTRAP) because the child process is about to execute a syscall. (In the parent, WIFSTOPPED(status) will be true.)
Whenever the child process gets stopped because it is about to execute a syscall, you need to use ptrace(PTRACE_GETREGS, childpid, (void *)0, ®s) to obtain the CPU register state in the child process at the point of syscall execution.
regs is of type struct user, defined in <sys/user.h>. For Intel/AMD architectures, regs.regs.eax (for 32-bit) or regs.regs.rax (for 64-bit) contains the syscall number (SYS_foo as defined in <sys/syscall.h>.
You then need to call ptrace(PTRACE_SYSCALL, childpid, (void *)0, (void *)0) to tell the kernel to execute that syscall, and waitpid() again to wait for the WIFCONTINUED(status) event notifying that it did.
The next WIFSTOPPED(status) type event from waitpid() will occur when the syscall is completed. If you want, you can use PTRACE_GETREGS again to examine regs.regs.eax or regs.regs.rax, which contains the syscall return value; on Intel/AMD, if an error occurred, it will be a negative errno value (i.e. -EACCES, -EINVAL, or similar.)
You need to call ptrace(PTRACE_SYSCALL, childpid, (void *)0, (void *)0) to tell the kernel to continue running the child, until the next syscall.
There are quite a few examples on-line showing some of the details above, although most that I have personally seen are pretty lax on error checking, and occasionally omit checking the WIFCONTINUED(status) waitpid() events. I've even written an answer detailing how to stop and continue individual threads on StackOverflow. Since the technique can be used as a very powerful custom debugging tool, I do recommend you try to learn the facility so you can leverage it in your work, rather than just copy-paste some existing code to get a passing grade on the exercise.
I want to run the child process earlier than the parent process. I just want to use execv call from the child process, So i am using vfork instead of fork.
But suppose execv fails and returns, i want to return non-zero value from the function in which i am calling vfork.
something like this,
int start_test()
{
int err = 0;
pid_t pid;
pid = vfork();
if(pid == 0) {
execv(APP, ARGS);
err = -1;
_exit(1);
} else if(pid > 0) {
//Do something else
}
return err;
}
Is above code is proper, Or i should use some other mechanism to run child earlier than parent?
On some links, I have read that, We should not modify any parent resource in child process created through vfork(I am modifying err variable).
Is above code is proper?
No. The err variable is already in the child's memory, thus parent will not see its modification.
We should not modify any parent resource in child process created through vfork (I am modifying err variable).
The resource is outdated. Some *nix systems in the past tried to play tricks with the fork()/exec() pair, but generally those optimizations have largely backfired, resulting in unexpected, hard to reproduce problems. Which is why the vfork() was removed from the recent versions of POSIX.
Generally you can make this assumptions about vfork() on modern systems which support it: If the child does something unexpected by the OS, the child would be upgraded from vfork()ed to normal fork()ed one.
For example on Linux the only difference between fork() and vfork() is that in later case, more of the child's data are made into lazy COW data. The effect is that vfork() is slightly faster than fork(), but child is likely to sustain some extra performance penalty if it tries to access the yet-not-copied data, since they are yet to be duplicated from the parent. (Notice the subtle problem: parent can also modify the data. That would also trigger the COW and duplication of the data between parent and child processes.)
I just want to use execv call from the child process, So i am using vfork instead of fork.
The handling should be equivalent regardless of the vfork() vs fork(): child should return with a special exit code, and parent should do the normal waitpid() and check the exit status.
Otherwise, if you want to write a portable application, do not use the vfork(): it is not part of the POSIX standard.
P.S. This article might be also of interest. If you still want to use the vfork(), then read this scary official manpage.
Yes. You should not modify the variable err. Use waitpid in the parent process to check the exit code of the child process. Also check the return value from execv and the errno variable(see man execv), to determine why your execv fails.
I am curious to see if it would be possible to implement posix_spawn in Linux using a combination of vfork+exec. In a very simplified way (leaving out most optional arguments) this could look more or less like this:
int my_posix_spawn(pid_t *ppid, char **argv, char **env)
{
pid_t pid;
pid = vfork();
if (pid == -1)
return errno;
if (pid == 0)
{
/* Child */
execve(argv[0], argv, env);
/* If we got here, execve failed. How to communicate this to
* the parent? */
_exit(-1);
}
/* Parent */
if (ppid != NULL)
*ppid = pid;
return 0;
}
However I am wondering how to cope with the case where vfork succeeds (so the child process is created) but the exec call fails. There seems to be no way to communicate this to the parent, which would only see that it could apparently create a child process successfully (as it would get a valid pid back)
Any ideas?
As others have noted in the comments, posix_spawn is permitted to create a child process that immediately dies to due to exec failure or other post-fork failures; the calling application needs to be prepared for this. But of course it's preferable not to do so.
The general procedure for communicating exec failure to the parent is described in an answer I wrote on this question: What can cause exec to fail? What happens next?.
Unfortunately, however, some of the operations you need to perform are not legal after vfork due to its nasty returns-twice semantics. I've covered this topic in the past in an article on ewontfix.com. The solution for making a posix_spawn that avoids duplicating the VM seems to be using clone with CLONE_VM (and possibly CLONE_VFORK) to get a new process that shares memory but doesn't run on the same stack. However, this still requires a lot of care to avoid making any calls to libc functions that might modify memory used by the parent. My current implementation is here:
http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c?id=v1.1.4
and as you can see it's rather complicated. Reading the git history may be informative regarding some of the design decisions that were made.
I don't think there's any good way to do this with the current set of system calls. You've correctly identified the biggest problem -- the absence of any reliable way to report failure after the vfork. Other problems include race conditions in setting child state, and Linux's lack of interest in picking up closefrom.
Several years ago I sketched a new system-level API that would solve this problem: the key addition is a system call, which I called egg(), that creates a process without giving it an address space, and inheriting no state from the parent. Obviously, an egg process can't execute code; but you can (with a whole bunch more new system calls) set all of its kernelside state, and then (with yet another system call, hatch()) load an executable into it and set it going. Crucially, all of the new system calls report failure in the parent. For instance, there's a dup_into(pid, to_fd, from_fd) call that copies parent file descriptor from_fd to egg-state process pid's file descriptor to_fd; if it fails, the parent gets the failure code.
I never had time to flesh all of that out into a coherent API specification and code it up (and I'm not a kernel hacker, anyway) but I still think the concept has legs and I would be happy to work with someone to get it done.
Let's suppose we have a code doing something like this:
int pipes[2];
pipe(pipes);
pid_t p = fork();
if(0 == p)
{
dup2(pipes[1], STDOUT_FILENO);
execv("/path/to/my/program", NULL);
...
}
else
{
//... parent process stuff
}
As you can see, it's creating a pipe, forking and using the pipe to read the child's output (I can't use popen here, because I also need the PID of the child process for other purposes).
Question is, what should happen if in the above code, execv fails? Should I call exit() or abort()? As far as I know, those functions close the open file descriptors. Since fork-ed process inherits the parent's file descriptors, does it mean that the file descriptors used by the parent process will become unusable?
UPD
I want to emphasize that the question is not about the executable loaded by exec() failing, but exec itself, e.g. in case the file referred by the first argument is not found or is not executable.
You should use exit(int) since the (low byte) of the argument can be read by the parent process using waitpid(). This lets you handle the error appropriately in the parent process. Depending on what your program does you may want to use _exit instead of exit. The difference is that _exit will not run functions registered with atexit nor will it flush stdio streams.
There are about a dozen reasons execv() can fail and you might want to handle each differently.
The child failing is not going to affect the parent's file descriptors. They are, in effect, reference counted.
You should call _exit(). It does everything exit() does, but it avoids invoking any registered atexit() functions. Calling _exit() means that the parent will be able to get your failed child's exit status, and take any necessary steps.
I'd like to use the namespacing features of the clone function. Reading the manpage, it seems like clone has lots of intricate details I need to worry about.
Is there an equivalent clone invocation to good ol' fork()?
I'm already familiar with fork, and believe that if I have a starting point in clone, I can add flags and options from there.
I think that this will work, but I'm not entirely certain about some of the pointer arguments.
pid_t child = clone( child_f, child_stack,
/* int flags */ SIGCHLD,
/* argument to child_f */ NULL,
/* pid_t *pid */ NULL,
/* struct usr_desc * tls */ NULL,
/* pid_t *ctid */ NULL );
In the flags parameter the lower byte of it is used to specify which signal to send to notify the parent of the thread doing things like dying or stopping. I believe that all of the actual flags turn on switches which are different from fork. Looking at the kernel code suggests this is the case.
If you really want to get something close to fork you may want to call sys_clone which does not take function pointer and instead returns twice like fork.
You could fork a normal child process using fork(), then use unshare() to create a new namespace.
Namespaces are a bit weird, I can't see a lot of use-cases for them.
clone() is used to create a thread. The big difference between clone() and fork() is that clone() is meant to execute starting at a separate entry point - a function, whereas fork() just continues on down from the same point in the code from where was invoked. int (*fn)(void *) in the manpage definition is the function, which on exit returns an int, the exit status.
The closest call to clone is pthread_create() which is essentially a wrapper for clone().
This does not get you a way to get fork() behavior.