Debugging Linux process hangs, which code is it running?

Debugging Linux process hangs, which code is it running? - c

I have a process running on a very weak Linux embedded device, which could not run gdb / gdb server on itself. I let it provoking a function X from a shared library repeatedly (there are also some others process calling it at the same time with much less frequency), it usually hangs somewhere inside the shared library after 1 day or a half-day. How do I debug:
In case it blocked somewhere: which is the last line of code it ran?
In case it stuck in an infinite loop: which lines of code it running?
What I tried:
I dig into the shared library and put a lot of syslog inside to check. However, with a very high amount of syslog being called constantly, my process now hangs every 2-5 minutes. I think syslog is blocked by UNIX socket?

gdb comes with a program called gcore, which will generate a core file from the running process.
Many systems nowadays disable core files by default (ulimit -c in a shell will show 0). Use the ulimit -c unlimited shell command, then run your process in the same shell (these limits are inherited from the parent process. If you start your process some other way than directly from the shell, you will need to find out how to set them there. e.g., LimitCORE= in a systemd unit file).
Once your process gets into the bad state, run gcore on its process ID. You can then copy it to your workstation and load it into gdb (gdb <executable> <core-file>). You can then view the stack trace and other state as of the moment the core dump was taken.

Related

Detect if debugger is in use during runtime

I'm looking to develop a "secure" application and as a security mitigation, I'd like to be able to discover if a debugger (GDB, LLDB...) is in use on the currently running application; aborting if detected.
How can I detect monitoring of a statically-linked C application?
Walk the /proc tree
...

Just a crazy idea - load BPF program (assuming your binary has a capability to do it) to intercept ptrace syscall from process parent, and check if pid of process being traced match your process' pid, then you can either fail the syscall, preventing the debug, and send and event to userspace to stop your process.
Although it won't work for attached process, so you'd need to intercept ptrace from all processes, I'm not sure BPF allows it, don't remember.

Another crazy idea - tracer expects SIGTRAPs from tracee on each breakpoint/step, so you can catch this signal from your process, again using BPF, and do something about it. But again it is based on the assumption that tracer doesn't know about it.

You can't. Software can not detect if it runs in a perfect emulation or in the real world. And a emulator can be stopped, the software can be analyzed, variables can be changed, basically everything can be done what can be done in a debugger.
Lets say you want to detect if the parent process is a debugger. So you make a system call to get the parent PID? The debugger can intercept the system call and return any PID which does not have to be the real PID. You want to intercept every SIGTRAP so the debugger can't use it anymore? Well the debugger can just stop in this case and send the SIGTRAP also to your process.
You want to measure the time when you send SIGTRAP to know if the the process stops for a short time by the debugger for sending SIGTRAP so you know when there is a debugger? The debugger can replace your calls to get the time and return a fake time. Lets say you run on a Processor that has a instruction that returns the time, so no function call is needed to get the time. Now you can know that the time you are getting is real? No, the debugger can replace this instruction with a SIGTRAP instruction and return any time he wants or in case such a instruction does not exist, run the Software in a emulator that can be programmed in any way. Everything you can come up with to detect a debugger or emulator can be faked by the environment and you have 0 change to detect it.
The only way to stop debugging is by not giving the software to the customers but keep it in your hands. Make a cloud service and run the software on your server. In this case the customer can not debug your program since he does not run it and has no control over it. Except the customer can access the server or the data somehow, but that is a different story.

Shutdown Linux from C program meant to be ran as init process

I'm writing a C program to do various stuff on a Linux system, then shut it down. This program will be started with the command line option init=/path/to/program with PID 1, and because of that, I can't use execl("/sbin/poweroff", "poweroff", NULL);, because the poweroff command doesn't shutdown the system itself, it asks the process with PID 1 to do it. So what code does init use to shutdown the system?

Why doesn't poweroff work?
A number of programs assume the kernel has booted with init as PID 1. On many systems init is a symbolic link to the systemd program; similarly on these systems, poweroff is often a symbolic link to the systemctl program.
In your setup, systemd is never started since you set your custom init=/path/to/program kernel parameter line. This is why the poweroff command doesn't work: systemctl is trying to contact a systemd instance which was never created.
How to power off without systemd.
The reboot function is described in the Linux Programmer's Manual. Under glibc, you can pass the RB_POWER_OFF macro constant to perform the reboot.
Note that if reboot is not preceded by a call to sync, data may be lost.
Using glibc in Linux:
#include <unistd.h>
#include <sys/reboot.h>
sync();
reboot(RB_POWER_OFF);
See also
How to restart Linux from inside a C++ program?

Any possible solution to capture process entry/exit?

I Would like to capture the process entry, exit and maintain a log for the entire system (probably a daemon process).
One approach was to read /proc file system periodically and maintain the list, as I do not see the possibility to register inotify for /proc. Also, for desktop applications, I could get the help of dbus, and whenever client registers to desktop, I can capture.
But for non-desktop applications, I don't know how to go ahead apart from reading /proc periodically.
Kindly provide suggestions.

You mentioned /proc, so I'm going to assume you've got a linux system there.
Install the acct package. The lastcomm command shows all processes executed and their run duration, which is what you're asking for. Have your program "tail" /var/log/account/pacct (you'll find its structure described in acct(5)) and voila. It's just notification on termination, though. To detect start-ups, you'll need to dig through the system process table periodically, if that's what you really need.

Maybe the safer way to move is to create a SuperProcess that acts as a parent and forks children. Everytime a child process stops the father can find it. That is just a thought in case that architecture fits your needs.
Of course, if the parent process is not doable then you must go to the kernel.

If you want to log really all process entry and exits, you'll need to hook into kernel. Which means modifying the kernel or at least writing a kernel module. The "linux security modules" will certainly allow hooking into entry, but I am not sure whether it's possible to hook into exit.
If you can live with occasional exit slipping past (if the binary is linked statically or somehow avoids your environment setting), there is a simple option by preloading a library.
Linux dynamic linker has a feature, that if environment variable LD_PRELOAD (see this question) names a shared library, it will force-load that library into the starting process. So you can create a library, that will in it's static initialization tell the daemon that a process has started and do it so that the process will find out when the process exits.
Static initialization is easiest done by creating a global object with constructor in C++. The dynamic linker will ensure the static constructor will run when the library is loaded.
It will also try to make the corresponding destructor to run when the process exits, so you could simply log the process in the constructor and destructor. But it won't work if the process dies of signal 9 (KILL) and I am not sure what other signals will do.
So instead you should have a daemon and in the constructor tell the daemon about process start and make sure it will notice when the process exits on it's own. One option that comes to mind is opening a unix-domain socket to the daemon and leave it open. Kernel will close it when the process dies and the daemon will notice. You should take some precautions to use high descriptor number for the socket, since some processes may assume the low descriptor numbers (3, 4, 5) are free and dup2 to them. And don't forget to allow more filedescriptors for the daemon and for the system in general.
Note that just polling the /proc filesystem you would probably miss the great number of processes that only live for split second. There are really many of them on unix.

Here is an outline of the solution that we came up with.
We created a program that read a configuration file of all possible applications that the system is able to monitor. This program read the configuration file and through a command line interface you was able to start or stop programs. The program itself stored a table in shared memory that marked applications as running or not. A interface that anybody could access could get the status of these programs. This program also had an alarm system that could either email/page or set off an alarm.
This solution does not require any changes to the kernel and is therefore a less painful solution.
Hope this helps.

Linux automatically restarting application on crash - Daemons

I have an system running embedded linux and it is critical that it runs continuously. Basically it is a process for communicating to sensors and relaying that data to database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling(eg sockets & uart communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy to use watchdog that is threading friendly?

You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but a yet easier solution would be to use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.

The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look a the timestamp for the file, and if it is stale, then it can assume that the appliation is dead or deadlocked.
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan can help detect potential deadlock scenarios and other threading issues with static analysis.

You could create a CRON job to check if the process is running with start-stop-daemon from time to time.

use this script for running your application
#!/bin/bash
while ! /path/to/program #This will wait for the program to exit successfully.
do
echo “restarting” # Else it will restart.
done
you can also put this script on your /etc/init.d/ in other to start as daemon

kernel programming

Is there any other way to execute a program using kernel, other than shell and system calls?

It always used to be the case that there was really only one way to execute a program on Unix and its derivatives, and that was via the exec() system calls. The very first (kernel) process was created by the boot loader; all subsequent processes were created by fork() and exec(). Of course, fork() only created a copy of the original program; it was the exec() system call - in one of a number of forms in the C source code, but eventually equivalent to execve() - that did the donkey work of replacing the current process with a new image.
These days, there are mechanisms like posix_spawn() which might, or might not, use a separate system call to achieve roughly the same mechanism.

A lot of kernels has support for adding kernel modules or drivers at run time. If you want to execute some code from kernel space (probably because you need higher privileges), you can write a kernel module/driver of your own and load it to execute your code. However, inserting a driver only doesn't ensure that your code will be executed. Based on your driver implementation, you will have to have some triggering mechanism for executing your code in kernel space.

yeah, you can compile your kernel with your program sourced in it, but it won't be the smartest thing to do.

Every program is internally executed by Kernel. If you are looking for running kernel module, you have to use the system calls to reach that module and perform some work for you in Kernel mode. Kernel is event driven and only system calls trigger execution of its modules (apart from some system events like network packet received)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight