How to programmatically backtrace a crash of a forked child using C

Is there a possibility to backtrace a location where child process crashed in Linux using C/C++ code?
What I want to do is the following:
fork a new child process and retrieve its PID
wait for the forked child process to crash ... probably using a signal handler for SIGCHLD, or using waitpid()/waitid()
retrieve the stack trace of the child at the location where it crashed
This would make the parent process act similarly to a debugger when the attached process crashes.
You can assume that the child process is compiled with debug symbols and that the parent process has root permissions.
What is the simplest way to achieve such functionality?

It is much simpler in Linux to use the libSegFault library provided as part of the GNU C library. On my system, it is installed in /lib/x86_64-linux-gnu/libSegFault.so.
All you need to do is set the SEGFAULT_SIGNALS environment variable to all (so you catch all crash causes the library supports), optionally set SEGFAULT_OUTPUT_NAME to the file the stack trace should be written to (the default is standard error), and set LD_PRELOAD to point to the segfault library. As long as the process does not modify these environment variables, they apply to all child processes as well.
For example, if ./yourprog is the program that forks a child that crashes, and you want the stack trace written to ./yourprog.stacktrace, run
SEGFAULT_SIGNALS=all \
SEGFAULT_OUTPUT_NAME=./yourprog.stacktrace \
LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so \
./yourprog
or all in one line without the backslashes (\).
The only downside is that each crash overwrites the existing file, so you'll only see the latest one. If you have /proc mounted, then the crash dump includes both a backtrace and the memory map of the process at the crash moment.
If you insist on doing it in your own C program, I recommend you first take a look at the libSegFault sources.
The point is that the stack trace must be dumped by the process itself; it is not accessible to the parent. To do that, you inject code into the child process using e.g. the LD_PRELOAD environment variable (one of the dynamic linker control variables on Linux). (Note that the stack tracing etc. is done in a signal handler context, so only async-signal-safe functions should be used.)
For example, the parent process can create a pipe, and move its write end to a specific descriptor in the child process before executing the target process, with your helper preload library path in LD_PRELOAD.
The helper preload library interposes signal(), sigaction(), and possibly sigprocmask(), sigwait(), sigwaitinfo(), and pthread_sigmask(), to ensure the helper library's crash-dump signal handlers are executed when such a signal is delivered (SIGSEGV, SIGBUS, SIGILL, possibly SIGTRAP). The signal handler does the stack dump (and prints /proc/PID/maps), then sets the signal disposition back to default and re-raises the signal (using raise()).
Essentially, it boils down to doing the same as libSegFault above, except with your own C code.
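For illustration, a minimal sketch of such a preload library might look like this (file and function names are hypothetical; note that backtrace() and backtrace_symbols_fd() from <execinfo.h> are not formally async-signal-safe, which is part of why libSegFault walks the stack frames itself):

/* crashdump.c - hypothetical minimal crash-dump preload library */
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void crash_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);

    static const char msg[] = "*** crash, backtrace follows ***\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  /* writes to the fd, no malloc */

    /* Restore the default disposition and re-raise, so the parent still
       sees the original termination signal via waitpid(). */
    signal(sig, SIG_DFL);
    raise(sig);
}

__attribute__((constructor))
static void install_handlers(void)
{
    void *dummy[1];
    backtrace(dummy, 1);     /* force libgcc to load now, not inside the handler */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS,  &sa, NULL);
    sigaction(SIGILL,  &sa, NULL);
}

Build it with something like gcc -shared -fPIC -o libcrashdump.so crashdump.c and run the target as LD_PRELOAD=./libcrashdump.so ./yourprog; a fuller version would also interpose signal()/sigaction() as described above so the target cannot silently replace the handlers.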
If you don't want to inject code into the child process, or managing the signal handlers is too complicated, you can use ptrace instead.
When the tracee is killed by a signal (other than SIGKILL), the thread receiving the signal is stopped first ("signal-delivery-stop"), so the tracer can examine its stack (and memory map of the tracee), before letting the child process continue/die.
In practice, ptracing is more invasive, as there are many events that cause the tracee's threads to stop. It is also much more complicated for multithreaded processes than the LD_PRELOAD approach, because ptrace controls individual threads in the tracee; there are many more details to get right.
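As a rough sketch (assuming a single-threaded tracee and x86-64 registers; a real tracer has to handle every kind of stop), the ptrace flow might look like this:

#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) { fprintf(stderr, "usage: %s prog [args]\n", argv[0]); return 2; }

    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(argv[1], &argv[1]);
        _exit(127);                              /* exec failed */
    }

    int status;
    waitpid(pid, &status, 0);                    /* initial stop after execv */
    if (!WIFSTOPPED(status))
        return 1;
    ptrace(PTRACE_CONT, pid, NULL, NULL);

    for (;;) {
        waitpid(pid, &status, 0);
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                               /* tracee is gone */

        int sig = WSTOPSIG(status);
        if (sig == SIGSEGV || sig == SIGBUS || sig == SIGILL) {
            /* signal-delivery-stop: the tracee is frozen at the crash point */
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);
#ifdef __x86_64__
            fprintf(stderr, "crash at rip=0x%llx rsp=0x%llx\n",
                    (unsigned long long)regs.rip,
                    (unsigned long long)regs.rsp);
#endif
            /* walk the stack here via PTRACE_PEEKDATA and /proc/PID/maps */
        }
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);  /* deliver the signal and let it die */
    }
    return 0;
}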

Related

Can I kill another process from SIGSEGV handler?

Background: I'm fuzzing a long-lived process with afl-fuzz by passing to it the filename to process from a stub that afl-fuzz runs for each sample.
When the long-lived process crashes via SIGSEGV, I want the stub to also generate a SIGSEGV, so that afl-fuzz will mark the sample as interesting.
Will calling kill(stub_pid, SIGSEGV) from the long-lived process's SIGSEGV handler work?
If a process ends up in a SIGSEGV handler, something very bad has happened, possibly including a completely corrupted stack and/or memory management.
It is not a good idea to rely on anything at this point beyond the process going down.
Trying to invoke any functionality beyond this point is likely to fail, i.e. it is unreliable.
A much safer approach is to have the calling process monitor its child and, if the child terminates unexpectedly (typically via SIGSEGV), start the appropriate actions.
Have a look at signal handling inside shell scripts (search key: "trap"), as such a script might be the parent of the process you want to monitor.
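In C, the monitoring side could look roughly like the following sketch (it assumes the monitored long-lived process really is a child of the stub; run_and_mirror_crash is just an illustrative name):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int run_and_mirror_crash(pid_t child)
{
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV) {
        /* Mirror the crash in this (healthy) process: its own state is
           intact, so raising the signal here is reliable. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}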
It is not recommended to do this through SIGSEGV, but you can do it if you have the proper permissions.
Instead of wondering how to cause a segmentation fault in your program so that AFL would notice something odd, just call abort(). SIGABRT is caught by AFL as well and is much easier to trigger.

Wait for child exec

Short question:
I want to wait in the parent for the child to be replaced by some exec call, not wait for it to terminate.
How can I do it?
(C language, Linux platform)
Basile's answer is incorrect.
While it is true that there's no real way to wait for an exec after a call to fork(2), this is not the only way to create a child process. What you can do instead is use the vfork(2) call. This will block in the parent until the child calls either _exit or one of the exec functions.
Note that part of the reason this works the way it does is that the child process from vfork(2) does not, in fact, clone the entirety of the parent's address space. This means it is undefined behaviour to modify data in the child process before the exec. If you need to do anything unusual, you may be better off using, for example, pause(2) with a signal handler installed for SIGUSR1 or some other signal of your choice, raising that signal immediately before the exec, or using some other IPC mechanism as mentioned in the other answer.
If you don't need to do anything special at all, and only want to call fork/exec right after one another, but want to be sure that execution of the child process has started, you can instead use posix_spawn(3), which should also start an external program immediately, effectively blocking the parent until after the exec.
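A hedged usage sketch of that last suggestion (glibc typically implements posix_spawn(3) with vfork-like semantics; the spawned program here is just an example):

#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid;
    char *argv[] = { "echo", "child is running", NULL };

    int err = posix_spawn(&pid, "/bin/echo", NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn: %s\n", strerror(err));
        return EXIT_FAILURE;
    }

    int status;
    waitpid(pid, &status, 0);   /* the child still has to be reaped as usual */
    return EXIT_SUCCESS;
}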
You can't wait in a parent for the child to do some exec, except by having some convention about IPC, e.g. deciding to send something (in the child) on a pipe(7) just before the exec. You'll set up the pipe(2) before the fork(2). You might also use the Linux specific eventfd(2) for such IPC.
After the fork(2) and before any exec you are running (in the child process) the same code as the parent. So it is up to you to implement such conventional communications.
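A minimal sketch of that pipe convention might look like this (the child writes one byte just before the exec, and the parent blocks on read() until it arrives):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return EXIT_FAILURE; }

    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        close(fds[0]);
        /* ... any setup the child needs before the exec ... */
        char byte = 'x';
        write(fds[1], &byte, 1);          /* tell the parent: about to exec */
        close(fds[1]);
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                       /* exec failed */
    }

    close(fds[1]);
    char byte;
    read(fds[0], &byte, 1);               /* returns once the child has signalled */
    close(fds[0]);
    printf("child %d reached its exec point\n", (int)pid);
    return EXIT_SUCCESS;
}

Note that this only tells the parent the child has reached the point just before the exec; the common alternative is to mark the write end close-on-exec (O_CLOEXEC) and have the parent wait for EOF, which indicates the exec actually happened (data on the pipe can then report the errno of a failed exec).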
BTW, generally, the child process does not do a lot of things after the fork and before the exec, so waiting for the exec to happen is usually pointless. In the unlikely case an error happens (including failure of the exec), you just _exit (usually with an exit code like 127).
You might consider ptrace(2) (with PTRACE_SYSCALL ...) but I would not do it that way.
Read Advanced Linux Programming and study the source code of some free software shells (sash or bash). Use also strace to understand what is happening in a shell.

vfork vs. fork in MPI with multiple threads

My program is very, very large, so I can't list it here. It uses Open MPI and multiple threads.
The problem has been solved (by using vfork() instead of fork()), but I don't know why that works. Could anyone give me an explanation?
The crashes occur at free().
There are several code segments like the following in my program, all of them in threads created by pthread_create:
{
    void *p = malloc(size);   /* allocate */
    fun(p);                   /* use the buffer */
    free(p);                  /* the crash happens here */
}
All the errors are at free(); it reports a segmentation fault. I ran the program more than 100 times and found that there is always a fork() call shortly before each corruption at free().
The logic of the fork segment (also in a thread) is like:
{
    MPI_program_code...
    if (!fork())
    {
        execv(exe_file, arg);
        _exit(127);           /* only reached if execv fails */
    }
    MPI_program_code...
}
(Note that no MPI function is used in exe_file.)
When I use vfork() instead of fork(), there is no problem at all. But I don't know why it works.
So, could anyone explain why it works?
You might find the Open MPI FAQ topic on forking child processes very useful. Also, an explanation of why using fork() with InfiniBand is dangerous can be found here.
vfork(2) differs from fork(2) in that it is specifically designed to be as lightweight as possible and is only meant to be used together with an immediately following execve(2) (or any of its wrappers from the C library) or _exit(2) call. The reason for that is that vfork(2) creates a child process that shares all memory with the parent instead of having it copy-on-write mapped, i.e. the new child is more like a thread than like a full-blown process. Since the child also uses the stack of the original thread, the parent is blocked until the child has either execve'd another executable or exited.
Open MPI registers a fork() handler using pthread_atfork(). The handler is not called when vfork() is used on modern Linux systems, therefore no actions are taken by the parent process upon forking.
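A hedged sketch of that pattern (reusing the question's exe_file and arg names; the surrounding MPI code is omitted):

#include <unistd.h>

static void spawn_helper(const char *exe_file, char *const arg[])
{
    pid_t pid = vfork();
    if (pid == 0) {
        /* The vfork child shares the parent's memory, so it must do
           nothing but exec or _exit; no pthread_atfork() handlers run. */
        execv(exe_file, arg);
        _exit(127);            /* only _exit() is safe after a failed exec */
    }
    /* The parent resumes here only after the child has exec'd or exited;
       reap it later with waitpid(pid, ...) to avoid a zombie. */
}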

Building a Linux Debugger C

Trying to build a debugger in C for fuzzing.
Basically, on Linux, I just want to start a process via fork() and then execve(), then monitor this process to see if it crashes after 1 second.
On Linux, is this done by creating the process and then monitoring the signals it generates for anything that looks like a crash? Or is it about monitoring the application in some other way? I'm not sure.
Use the ptrace(2) system call:
While being traced, the child will stop each time a signal is
delivered, even if the signal is being ignored. (The exception is
SIGKILL, which has its usual effect.) The parent will be notified at
its next wait(2) and may inspect and modify the child process while it
is stopped. The parent then causes the child to continue, optionally
ignoring the delivered signal (or even delivering a different signal
instead).
The signals you should be interested in, with regard to the process having crashed, are SIGSEGV (invalid memory access), SIGBUS (bus error, e.g. unaligned data access), SIGILL (illegal instruction), SIGFPE (floating-point exception), etc.
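Before wiring in ptrace, a minimal sketch of just the "run it and see whether it has crashed after one second" part could look like this (plain fork/execve plus waitpid status checks; the fixed one-second sleep is an assumption, and a real fuzzing harness would use ptrace or a SIGCHLD handler instead):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

int main(int argc, char *argv[])
{
    if (argc < 2) { fprintf(stderr, "usage: %s prog [args]\n", argv[0]); return 2; }

    pid_t pid = fork();
    if (pid == 0) {
        execve(argv[1], &argv[1], environ);
        _exit(127);                          /* exec failed */
    }

    sleep(1);                                /* give the target one second */

    int status;
    pid_t done = waitpid(pid, &status, WNOHANG);
    if (done == pid && WIFSIGNALED(status)) {
        printf("crashed with signal %d (%s)\n",
               WTERMSIG(status), strsignal(WTERMSIG(status)));
    } else if (done == 0) {
        printf("still alive after 1 second\n");
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);            /* reap it */
    } else {
        printf("exited without crashing\n");
    }
    return 0;
}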

Ctrl + C: does it kill threads too along with main process?

While running a threaded program and repeatedly killing the main program using Ctrl + C, I see unexpected results in the second run. However, if I let the program run and exit voluntarily, there are no issues.
So my question is: does Ctrl + C kill the threads too, along with the main process?
Thanks in advance.
In multithreaded programming, signals are delivered to a single thread (usually chosen unpredictably among the threads that don't have that particular signal blocked). However, this does not mean that a signal whose default action is to kill the process only terminates one thread. In fact, there is no way to kill a single thread without killing the whole process.
As long as you leave SIGINT with its default action of terminating the process, it will do so as long as at least one thread leaves SIGINT unblocked. It doesn't matter which thread has it unblocked as long as at least one does, so library code creating threads behind the application's back should always block all signals before calling pthread_create and restore the signal mask in the calling thread afterwards.
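A hedged sketch of that recommendation for library code (the wrapper name is illustrative): block all signals before pthread_create() so the new thread inherits a fully blocked mask, then restore the caller's mask.

#include <pthread.h>
#include <signal.h>

int create_thread_with_signals_blocked(pthread_t *tid,
                                       void *(*fn)(void *), void *arg)
{
    sigset_t all, old;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &old);   /* the new thread inherits this mask */

    int err = pthread_create(tid, NULL, fn, arg);

    pthread_sigmask(SIG_SETMASK, &old, NULL); /* restore the caller's mask */
    return err;
}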
Well, the only thing Ctrl + C does is send SIGINT to one thread in the process that is not masking the signal. Signals can be handled or ignored.
If the program does handle Ctrl + C, the usual behavior is self-termination, but, once again, the handler could be used for anything else.
In your case, SIGINT is being received by one thread, which probably does kill itself, but does not kill the others.
Under Linux 2.6 using NPTL threads, and assuming the process uses the default signal handler or calls exit() in its own handler: yes, it does. The C library exit() call maps to the exit_group system call, which exits all the threads immediately; the default signal handler does this or something similar.
Under Linux 2.4 using Linuxthreads (or using 2.6 if your app still uses Linuxthreads for some weird reason): Not necessarily.
The Linuxthreads library implements threads using clone(), creating a new process which happens to share its address space with the parent. This new process does not necessarily die when the parent dies. To work around this, there is a "master thread" that pthreads creates. This master thread does various things, one of which is to try to ensure that all the threads get killed when the process exits (for whatever reason).
It does not necessarily succeed.
If it does succeed, it is not necessarily immediate, particularly if there are a large number of threads.
So if you're using Linuxthreads, possibly not.
The other threads might not exit immediately, or indeed at all.
However, no matter what thread library you use, forked child processes will continue (they might receive the signal if they are still in the same process group, but they can freely ignore it).
