How to debug a multithreaded hung process in Linux? - c

A multithreaded application hangs and does not respond to any commands. I have tried the following without luck:
Attaching gdb to the process, which fails with:
(gdb) attach 6026
Attaching to process 6026
ptrace: Operation not permitted.
Running gstack, which itself just hangs.
Is there any good way to debug this process?

Thanks for all your responses. The problem is at the kernel level. We used echo t > /proc/sysrq-trigger, which logs the stack of every running process to /var/log/messages. This stack trace helped us analyze the problem.
According to the stack trace, the file system had posted a wait event on behalf of the application process to another process (which was in the defunct state) and was waiting indefinitely for the response, which resulted in the hang.

Most likely somebody else is already tracing this process. To find out who is doing it, look at the proc file system. A nonzero TracerPid is the PID of the tracing process:
cat /proc/6026/status | grep TracerPid
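The same check can be done from C, for example from a small diagnostic tool. This is only a minimal sketch: the function names parse_tracer_pid and tracer_pid_of are invented for illustration, and it assumes the Linux /proc/<pid>/status layout, where TracerPid appears on its own line near the top of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract the TracerPid value from the text of /proc/<pid>/status.
 * Returns the tracer's PID, 0 if nobody is tracing the process,
 * or -1 if the field is not present in the text. */
long parse_tracer_pid(const char *status_text)
{
    assert(status_text != NULL);
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        /* Each iteration starts at the beginning of a line. */
        if (strncmp(line, "TracerPid:", 10) == 0)
            return strtol(line + 10, NULL, 10); /* strtol skips the tab */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/<pid>/status and return its TracerPid, or -1 on error.
 * Assumption: the field lies within the first 4095 bytes of the file. */
long tracer_pid_of(long pid)
{
    char path[64];
    char buf[4096];
    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf);
}
```

Calling tracer_pid_of(6026) would return the PID of whatever is tracing process 6026, which you can then look up with ps.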

Related

Detect if debugger is in use during runtime

I'm looking to develop a "secure" application and as a security mitigation, I'd like to be able to discover if a debugger (GDB, LLDB...) is in use on the currently running application; aborting if detected.
How can I detect monitoring of a statically-linked C application?
Walk the /proc tree
...
Just a crazy idea: load a BPF program (assuming your binary has the capability to do so) to intercept the ptrace syscall from your process's parent and check whether the PID being traced matches your own PID. You can then either fail the syscall, preventing the debugging, or send an event to userspace to stop your process.
This won't work against a debugger that attaches to an already running process, though, so you'd need to intercept ptrace from all processes. I'm not sure BPF allows that; I don't remember.
Another crazy idea: the tracer expects SIGTRAPs from the tracee on each breakpoint or step, so you could catch this signal from your process, again using BPF, and do something about it. But again, this relies on the assumption that the tracer doesn't know about it.
You can't. Software cannot detect whether it runs in a perfect emulation or in the real world. And an emulator can be stopped, the software can be analyzed, and variables can be changed; basically everything that can be done in a debugger can be done there too.
Let's say you want to detect whether the parent process is a debugger, so you make a system call to get the parent PID. The debugger can intercept the system call and return any PID, which does not have to be the real one. You want to intercept every SIGTRAP so that the debugger can't use it anymore? The debugger can simply stop in that case and also send the SIGTRAP to your process.
You want to measure the time around a SIGTRAP to see whether the debugger stops the process briefly, and so learn that a debugger is present? The debugger can replace your calls that get the time and return a fake time. Let's say you run on a processor that has an instruction that returns the time, so no function call is needed. Can you now know that the time you are getting is real? No: the debugger can replace this instruction with one that raises SIGTRAP and return any time it wants or, if no such instruction exists, run the software in an emulator that can be programmed in any way. Everything you can come up with to detect a debugger or emulator can be faked by the environment, and you have zero chance of detecting it.
The only way to prevent debugging is not to give the software to customers at all but to keep it in your own hands. Make it a cloud service and run the software on your server. In that case the customer cannot debug your program, since they do not run it and have no control over it, unless they can somehow access the server or the data, but that is a different story.

debugging a running daemon with GDB

I want to debug a running daemon with GDB. I have the process ID of the daemon. I ran gdb, attached to that process ID, and typed info threads.
I get the list of threads. The one marked * is the currently running thread (correct me if I am wrong).
Then, in another terminal, I run:
systemctl kill daemonname
Now I want to see which thread runs after this command.
My daemon gets stuck and is not killed properly. I tried the service-name status command, but it gets stuck as well and prints nothing, whereas it should report that the service is not running (or command not found) if the daemon had been killed properly. Since the daemon is still alive, I want to trace the last thread, the one that is getting stuck.
Please help me, I am new to this.
I tried with the service-name status command.
You may be holding it wrong.
After you execute systemctl kill daemonname, you want to attach GDB to the process and see where it is stuck (use thread apply all where).
You will likely see that your threads are deadlocked (e.g. thread T1 is waiting for mutex A, which is held by thread T2; thread T2 is waiting for mutex B, held by thread T1).
I want to trace the last thread that is getting stuck
In general, tracing multithreaded processes is a fool's errand, because the act of tracing changes the execution environment, and the traced run often no longer matches an untraced one.
Instead you should think about invariants, and make sure they are not violated.
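One invariant that rules out the T1/T2 deadlock described above is a fixed global lock order: no thread ever takes mutex B and then waits for mutex A. The following is only a sketch of that idea, not something taken from the daemon in question; lock_pair and unlock_pair are hypothetical helper names, and ordering by address is just one possible convention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock two distinct mutexes in a fixed global order (lowest address
 * first), whatever order the caller names them in. If every thread
 * takes its locks through this helper, the "T1 holds A and waits for B,
 * T2 holds B and waits for A" cycle cannot form. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != NULL && b != NULL && a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release both mutexes; the unlock order does not affect deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != NULL && b != NULL && a != b);
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this in place, one thread calling lock_pair(&A, &B) and another calling lock_pair(&B, &A) both acquire the mutexes in the same order.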

Can I call a function to self reset?

Let's say I get myself into a situation from which I do not know how to recover. What would be the best way for the process to restart itself? What I'm looking for is something like closing itself and launching itself again. On some Arduinos I can call NVIC_SystemReset, but I'd like a function for Windows, Mac and Linux.
I was thinking that perhaps the only way is to launch a detached process and let myself shut down, with ShellExecute on Windows and execl on Linux?
As mentioned by @Lundin, this is for a processor, not a microcontroller.
This answer might not be correct in your case.
1) Create a proxy process that delegates to the main process.
2) The proxy process starts the main process and waits for it.
3) If the main process fails or returns abnormally for any reason, the proxy restarts it. If the main program exits normally, the proxy process ends as well.
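On Linux and macOS, the three steps above can be sketched with fork and waitpid. This is a minimal sketch under stated assumptions, not a complete supervisor: run_in_child, supervise and demo_worker are names invented here, the "main process" is modeled as a function that receives its attempt number, and a real proxy would probably call one of the exec functions in the child and add a delay between restarts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run work(attempt) in a child process and wait for it to finish.
 * Returns the child's exit code, or -1 if it was killed by a signal
 * or if fork/waitpid failed. */
int run_in_child(int (*work)(int), int attempt)
{
    assert(work != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(work(attempt));   /* child: this is the "main process" */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)     /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Proxy loop: relaunch the main process until it exits with code 0 or
 * max_launches is reached. Returns the number of launches performed. */
int supervise(int (*work)(int), int max_launches)
{
    int launches = 0;
    while (launches < max_launches) {
        int code = run_in_child(work, launches);
        launches++;
        if (code == 0)
            break;              /* normal exit: the proxy stops too */
    }
    return launches;
}

/* Hypothetical worker for illustration: fails on its first two
 * attempts and exits cleanly on the third. */
int demo_worker(int attempt)
{
    return attempt < 2 ? 1 : 0;
}
```

Here supervise(demo_worker, 10) launches the worker three times and then stops, because the third run exits with code 0. On Windows the same structure would need CreateProcess and WaitForSingleObject instead.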

Debugging Linux process hangs, which code is it running?

I have a process running on a very weak embedded Linux device, which cannot run gdb or gdbserver itself. I have it invoke a function X from a shared library repeatedly (some other processes call it at the same time, but much less frequently), and it usually hangs somewhere inside the shared library after half a day to a day. How do I debug:
If it is blocked somewhere: what was the last line of code it ran?
If it is stuck in an infinite loop: which lines of code is it running?
What I tried:
I dug into the shared library and added a lot of syslog calls to check. However, with syslog being called constantly at a very high rate, my process now hangs every 2-5 minutes. I think syslog is being blocked by its UNIX socket?
gdb comes with a program called gcore, which will generate a core file from the running process.
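Many systems nowadays disable core files by default (ulimit -c in a shell will show 0). Use the ulimit -c unlimited shell command, then run your process in the same shell. These limits are inherited from the parent process, so if you start your process some other way than directly from the shell, you will need to find out how to set them there (e.g., LimitCORE= in a systemd unit file). Strictly speaking, this limit governs the core files the kernel writes when a process crashes; gcore writes its file from gdb, so it should work either way, but raising the limit also gets you a core if the process crashes later.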
Once your process gets into the bad state, run gcore on its process ID. You can then copy it to your workstation and load it into gdb (gdb <executable> <core-file>). You can then view the stack trace and other state as of the moment the core dump was taken.
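If the process is not started from a shell where you can run ulimit, another option is to have the program raise its own core limit at startup. A minimal sketch, assuming the hard limit inherited from the parent is large enough (an unprivileged process cannot exceed it); enable_core_dumps is a name made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit, which is the
 * most an unprivileged process is allowed to request.
 * Returns 0 on success, -1 on failure (errno is set). */
int enable_core_dumps(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    int rc = setrlimit(RLIMIT_CORE, &lim);
    assert(rc == 0 || rc == -1);    /* setrlimit returns only these */
    return rc;
}
```

Call it once early in main. If the hard limit itself is 0, this changes nothing, and the limit has to be raised by whatever launches the process.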

Linux automatically restarting application on crash - Daemons

I have a system running embedded Linux, and it is critical that it runs continuously. Basically, it is a process that communicates with sensors and relays that data to a database and a web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling (e.g. sockets and UART communications). How do I ensure that none of the threads hang or exit unexpectedly? Is there an easy-to-use watchdog that is thread-friendly?
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
That leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but an even easier solution is to call the alarm syscall repeatedly and have the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is making progress), it will keep running. Once you stop, the signal will fire.
That way, no extra programs are needed, and only portable POSIX facilities are used.
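The alarm approach can be sketched as follows. The names watchdog_start, watchdog_kick and watchdog_stop are invented for this example, and the timeout is whatever interval you consider "hung"; the sketch relies on the default action for SIGALRM, which is to terminate the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Arm the watchdog: make sure SIGALRM has its default action
 * (terminate the process) and schedule it timeout_s seconds from now.
 * Returns 0 on success, -1 if sigaction fails. */
int watchdog_start(unsigned timeout_s)
{
    assert(timeout_s > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(timeout_s);
    return 0;
}

/* Kick the watchdog: replace the pending alarm with a fresh one.
 * Returns the seconds that were left on the previous alarm. */
unsigned watchdog_kick(unsigned timeout_s)
{
    assert(timeout_s > 0);
    return alarm(timeout_s);
}

/* Disarm the watchdog, e.g. before a deliberately long operation.
 * Returns the seconds that were left, or 0 if none was pending. */
unsigned watchdog_stop(void)
{
    return alarm(0);
}
```

A main loop would call watchdog_start(30) once and watchdog_kick(30) on every iteration; if an iteration takes longer than 30 seconds, the process is killed and the restarting parent takes over. Note that the alarm is per process, so with several polling threads the kick only proves that the kicking thread is alive. One option is to have each thread record a timestamp and kick only when all of them are recent.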
The gist of it is:
1) You need to detect if the program is still running and not hung.
2) You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
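The file-touching variant could look roughly like this. It is a sketch only: heartbeat_touch and heartbeat_is_stale are hypothetical names, the path and the maximum age are up to you, and the monitor would normally live in a separate process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Application side: create the heartbeat file if needed and set its
 * timestamps to the current time. Returns 0 on success, -1 on error. */
int heartbeat_touch(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int rc = futimens(fd, NULL);    /* NULL means "set both times to now" */
    close(fd);
    return rc;
}

/* Monitor side: returns 1 if the file is missing or was last touched
 * more than max_age_s seconds ago, 0 if the heartbeat is fresh. */
int heartbeat_is_stale(const char *path, time_t max_age_s)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0)
        return 1;                   /* no file: treat as dead */
    return (time(NULL) - st.st_mtime) > max_age_s ? 1 : 0;
}
```

The application calls heartbeat_touch from its main loop, and the external monitor restarts it whenever heartbeat_is_stale reports 1. Choose max_age_s comfortably larger than the touch interval to avoid false alarms.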
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan (ThreadSanitizer) can help detect potential deadlock scenarios and other threading issues by instrumenting the program at run time (it is a dynamic analysis tool, not a static one).
You could create a cron job that checks from time to time whether the process is running, using start-stop-daemon.
Use this script to run your application:
#!/bin/bash
# Re-run the program until it exits successfully (status 0).
while ! /path/to/program
do
    echo "restarting"   # Non-zero exit status: run it again.
done
You can also put this script in /etc/init.d/ in order to start it as a daemon.

Resources