Debugging a running daemon with GDB - C

I want to debug a running daemon with GDB. I have the process ID of the daemon. I typed: gdb attach <pid>, and then info threads.
I get the list of threads. The one marked with * is the currently running thread (correct me if I am wrong).
Next, from another terminal, I run:
systemctl kill daemonname
Now I want to check which thread runs after this command.
My daemon gets stuck: it is not being killed properly. I tried with the service-name status command, but that command also hangs and prints nothing, whereas it should report that the service is not running (or "command not found") if the daemon had been killed properly. Since the daemon is still not killed, I want to trace the last thread that is getting stuck.
Please help me, I am new to this.

I tried with the service-name status command.
You may be holding it wrong.
After you execute systemctl kill daemonname, you want to attach GDB to the process and see where it is stuck (use thread apply all where).
You will likely see that your threads are deadlocked (e.g. thread T1 is waiting for mutex A, which is held by thread T2; thread T2 is waiting for mutex B, held by thread T1).
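A deadlock like this comes from the two threads taking the mutexes in opposite orders. One common remedy, shown here as a minimal sketch rather than a fix for the daemon in the question (lock_a, lock_b, worker and run_two_workers are hypothetical names), is a single global lock order: every thread acquires lock_a before lock_b, so the circular wait cannot form.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state protected by two mutexes. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

/* Every thread takes lock_a before lock_b. Because no thread ever holds
 * lock_b while waiting for lock_a, the cycle "T1 holds one mutex and waits
 * for the other, T2 does the reverse" cannot occur. */
static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        shared_counter++;
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
    }
    return NULL;
}

/* Runs two workers concurrently and returns the final counter value. */
long run_two_workers(long iterations)
{
    pthread_t t1, t2;
    shared_counter = 0;
    int rc = pthread_create(&t1, NULL, worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &iterations);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

If the backtraces show a thread blocked in pthread_mutex_lock while it already holds another mutex, comparing the order in which the threads take those two mutexes is a good first check.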
I want to trace the last thread that is getting stuck
In general, tracing multithreaded processes is a fool's errand: tracing changes the execution environment (especially the timing), so the traced run often no longer behaves like the untraced one.
Instead you should think about invariants, and make sure they are not violated.
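As a minimal sketch of checking invariants instead of tracing, here is a hypothetical ring buffer (ring, ring_push and ring_pop are illustrative names, not code from the daemon in the question) whose operations assert their invariants on entry and on exit. A violated assert aborts at the point where the invariant broke, which is usually far more informative than a trace. In a real multithreaded daemon, each operation would also have to run under a mutex; locking is omitted here to keep the checks in focus.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 8

/* Hypothetical ring buffer; in a real daemon it would be guarded by a mutex. */
struct ring {
    int items[RING_CAPACITY];
    size_t head;   /* index of the oldest item */
    size_t count;  /* number of stored items */
};

/* The invariants every operation must preserve. */
static void ring_check(const struct ring *r)
{
    assert(r->count <= RING_CAPACITY);
    assert(r->head < RING_CAPACITY);
}

/* Returns 0 on success, -1 if the buffer is full. */
int ring_push(struct ring *r, int value)
{
    ring_check(r);
    if (r->count == RING_CAPACITY)
        return -1; /* full: refuse instead of overwriting */
    r->items[(r->head + r->count) % RING_CAPACITY] = value;
    r->count++;
    ring_check(r);
    return 0;
}

/* Returns 0 and stores the oldest item in *out, or -1 if empty. */
int ring_pop(struct ring *r, int *out)
{
    ring_check(r);
    if (r->count == 0)
        return -1;
    *out = r->items[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count--;
    ring_check(r);
    return 0;
}
```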

Related

ptrace failing when process is in select syscall

I have a library that I inject into running processes using ptrace. I used this library many times in different processes without problems.
Now I want to inject it into a running process that is executing a select syscall (waiting on a set of file descriptors). While debugging, I noticed that if I inject the library before the process reaches the select call, it works as expected. However, once the process is blocked in select, it is impossible to inject the library.
My code injects the library and then triggers a SIGTRAP, which ptrace reports back to my tracer, so that I know whether the library was loaded. This works in all cases except when the process is in select, where I receive a SIGSEGV instead.
A SIGSEGV should indicate an access to an invalid part of memory, but I really doubt that is the problem, as it only fails when the process is sleeping in select.
Is there any known issue with ptrace when process is in select?
Another interesting fact: after receiving the SIGSEGV, I restore the state the process had when I attached and resume it, and it continues sleeping in select without any problem. I have spent several days debugging the SIGSEGV and looking at how select works, but I cannot find the solution. Any ideas or help will be appreciated.

What is Apache Flink's detached mode?

I saw this line in Flink documentation but can't figure out what 'detached mode' means. Please help. Thanks.
Run example program in detached mode:
./bin/flink run -d ./examples/batch/WordCount.jar
The Flink CLI runs jobs in either blocking or detached mode. In blocking mode, the CliFrontend (client) process keeps running, blocked, waiting for the job to complete, after which it prints out some information. In the example below I ran a streaming job, which I cancelled from the WebUI after a few seconds:
$ flink run target/oscon-1.0-SNAPSHOT.jar
Starting execution of program
Program execution finished
Job with JobID b02da01c30585bfbc86a23446559987f has finished.
Job Runtime: 8673 ms
If you run in blocking mode, you can kill the CliFrontend (e.g., with ctrl-C) if you like, and the job will be unaffected, so long as it has run far enough to have submitted the job to the cluster.
In detached mode, the CliFrontend submits the job to the cluster and then exits straight away.
That means the application is not attached (or bound) to your shell session, so if you close your terminal, the application keeps running until it finishes its work. For a batch example that might not be a big problem: it processes the given batch of data and ends afterwards. As soon as you switch to a streaming approach, however, the operations run on an "infinite stream of data" with no defined end, so the job keeps running until it is cancelled.
Hope that helps.

How to tell when my Windows app is being terminated?

Is there any way my Windows program (C/C++) can receive a notification when it is being killed from Taskmgr.exe? It does not appear to receive any special Windows Messages - it just terminates.
I don't want to stop it from terminating, I just want to write a notification of some kind that it was manually terminated.
Thanks.
If it's a full Windows app, you should get WM_QUIT in your message pump right before the application quits.
As MSDN states: http://msdn.microsoft.com/en-us/library/windows/desktop/ms632641(v=vs.85).aspx
WM_QUIT is not associated with any window, so it never reaches a window procedure; you can only retrieve it in your main message loop (via GetMessage or PeekMessage).
This is only when it quits cleanly. If the process is killed, this never happens.
One way to detect on the next launch that the process was killed is to create a marker file at startup and delete it on a clean shutdown. If the file still exists at the next startup, you know the process did not shut down cleanly, but not whether it died because of an error or was killed at a user's request.
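A portable sketch of the marker-file idea, using only standard C I/O so it builds with MSVC as well as with POSIX compilers. The function names and passing the marker path as a parameter are assumptions for illustration; choose a path your application can always write to.

```c
#include <assert.h>
#include <stdio.h>

/* Call once at startup. Returns 1 if the marker left by the previous run
 * still exists (that run did not shut down cleanly), 0 otherwise,
 * and -1 if the marker could not be (re)created. */
int marker_startup(const char *path)
{
    int previous_run_unclean = 0;
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        previous_run_unclean = 1;
        fclose(f);
    }
    f = fopen(path, "w"); /* (re)create the marker for this run */
    if (f == NULL)
        return -1;
    fclose(f);
    return previous_run_unclean;
}

/* Call only on a clean shutdown. A killed process never gets here,
 * so the marker survives until the next startup. Returns 0 on success. */
int marker_clean_shutdown(const char *path)
{
    return remove(path);
}
```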
If you need to know immediately when your process is killed the only way I know of is to use another process as a watchdog. If you use OpenProcess() to get a handle to the process in question, you can wait on that handle (via WaitForSingleObject or similar), and the handle will be signalled when the process terminates. You'll need to do some coordination with the target process in order to track whether the shutdown was clean or forcible.

How to debug a multithreaded hung process in Linux?

A multithreaded application hangs and does not respond to any commands. I have tried the following without luck:
Attaching gdb to the process, which fails with:
(gdb) attach 6026
Attaching to process 6026
ptrace: Operation not permitted.
Running gstack, which just hangs.
Is there any good way to debug this process?
Thanks for all your responses. The problem turned out to be at the kernel level. We used echo t > /proc/sysrq-trigger, which dumps the stack traces of all tasks to the kernel log (in our case, /var/log/messages). These stack traces helped us analyze the problem.
They showed that the file system had posted an event, on behalf of the application process, to another process (which was in a defunct state) and was waiting indefinitely for its response, which left the application hung.
Most likely something else is already tracing this process (only one tracer can be attached at a time). To find out which process, look at the proc file system:
cat /proc/6026/status|grep TracerPid
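The same check can be done from C, for example in a monitoring tool. This is a sketch that assumes the Linux /proc layout; the helper name tracer_pid is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the PID of the process tracing `pid_str` (0 if it is untraced),
 * or -1 if /proc/<pid>/status cannot be read. `pid_str` is a PID or "self". */
long tracer_pid(const char *pid_str)
{
    char path[64];
    char line[256];
    long tracer = -1;

    snprintf(path, sizeof path, "/proc/%s/status", pid_str);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The relevant line looks like "TracerPid:\t1234". */
        if (sscanf(line, "TracerPid: %ld", &tracer) == 1)
            break;
    }
    fclose(f);
    return tracer;
}
```

A non-zero result names the process (often a forgotten gdb or strace session) that already holds the ptrace attachment, which is why a second attach fails with "Operation not permitted".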

Linux automatically restarting application on crash - Daemons

I have a system running embedded Linux, and it is critical that it runs continuously. Basically, it is a process that communicates with sensors and relays their data to a database and a web client.
If a crash occurs, how do I restart the application automatically?
Also, several threads do polling (e.g. socket and UART communication). How do I ensure that none of the threads hang or exit unexpectedly? Is there an easy-to-use watchdog that is thread-friendly?
You can seamlessly restart your process when it dies using fork and waitpid, as described in this answer. This does not cost significant resources, since the OS shares the memory pages between parent and child (copy-on-write).
That leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but an even easier solution is to call the alarm syscall repeatedly and let the resulting SIGALRM terminate the process (use sigaction to make sure SIGALRM has its default, terminating action). As long as you keep re-arming alarm before it expires (i.e. as long as your program is making progress), the process keeps running. Once you stop, the signal fires and kills the process, and the fork/waitpid parent restarts it.
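A minimal sketch of such a fork/waitpid restart loop, assuming the daemon's real work can be started by calling a single function (supervise and work are hypothetical names; the linked answer is not reproduced here). The max_restarts limit exists only so the example terminates; a real supervisor would loop indefinitely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `work` in a child process and restarts it whenever it dies
 * abnormally (killed by a signal or non-zero exit status).
 * Returns the number of restarts performed, or -1 on fork/wait errors. */
int supervise(void (*work)(void), int max_restarts)
{
    int restarts = 0;
    for (;;) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                 /* fork failed */
        if (pid == 0) {
            work();                    /* child: run the real program */
            _exit(EXIT_SUCCESS);
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;           /* clean exit: do not restart */
        if (restarts == max_restarts)
            return restarts;
        restarts++;                    /* crashed, or hung and was killed */
        sleep(1);                      /* avoid a tight crash loop */
    }
}
```

The parent does almost nothing besides waiting, so it is unlikely to crash itself, and the child starts from the parent's pages without copying them up front.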
That way, no extra programs needed, and only portable POSIX stuff used.
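The alarm-based hang detection could look like the sketch below. WATCHDOG_SECONDS, watchdog_init and watchdog_kick are hypothetical names, and the sketch assumes SIGALRM is not blocked in every thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define WATCHDOG_SECONDS 5  /* hypothetical timeout; tune it for your loop */

/* Make sure SIGALRM has its default action (terminate the process),
 * even if it was ignored or some library installed a handler. */
int watchdog_init(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Call this from the main loop on every iteration that made progress.
 * Each call re-arms the timer; if the loop stops calling it for
 * WATCHDOG_SECONDS, SIGALRM terminates the process. */
void watchdog_kick(void)
{
    alarm(WATCHDOG_SECONDS);
}
```

alarm is process-wide, so kicking it from one loop only proves that this loop is alive. With several polling threads, one option is to have each thread record its progress and to kick the alarm only when all of them have advanced.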
The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
With respect to #2, killing the previous process (by its PID) and using fork+exec to launch a new one is typical. You might also consider turning your "continuously" running application into one that runs once, and then use cron or some other scheduler to rerun that single-run application continuously.
Unfortunately, watchdog timers and recovering from deadlock are non-trivial issues. I don't know of any generic way to do it, and the few approaches I've seen are pretty ugly and not 100% bug-free. However, ThreadSanitizer (tsan) can help detect potential deadlocks and other threading issues; note that it does so by instrumenting the program and checking it at run time, not by static analysis.
You could create a cron job that checks from time to time whether the process is running, using start-stop-daemon.
Use this script to run your application:
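The first option above, a UNIX domain socket for status requests, could be sketched as follows. This assumes a deliberately simple protocol in which the daemon answers every connection with "ok"; the protocol, the socket path and the names status_listen, status_answer_one and status_query are all hypothetical. The daemon would call status_answer_one from the same loop that does its real work, so a deadlocked loop stops answering and the watchdog's read times out via SO_RCVTIMEO.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for `path`; returns -1 if the path is too long. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    if (strlen(path) >= sizeof addr->sun_path)
        return -1;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Daemon side: create the listening status socket at `path`. */
int status_listen(const char *path)
{
    struct sockaddr_un addr;
    if (make_addr(&addr, path) < 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    unlink(path); /* remove a stale socket left by a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 4) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Daemon side: answer one status request. A deadlocked daemon never
 * reaches this call, so its clients time out. */
int status_answer_one(int listen_fd)
{
    int c = accept(listen_fd, NULL, NULL);
    if (c < 0)
        return -1;
    ssize_t n = write(c, "ok\n", 3);
    close(c);
    return n == 3 ? 0 : -1;
}

/* Watchdog side: returns 1 if the daemon answered "ok" within
 * `timeout_sec` seconds, 0 if it is dead or presumed deadlocked. */
int status_query(const char *path, int timeout_sec)
{
    struct sockaddr_un addr;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    char buf[8] = {0};
    if (make_addr(&addr, path) < 0)
        return 0;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;
    int ok = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0 &&
        connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        read(fd, buf, sizeof buf - 1) == 3 && strcmp(buf, "ok\n") == 0)
        ok = 1;
    close(fd);
    return ok;
}
```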
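The second option, a heartbeat file, could be sketched like this, assuming the daemon can call heartbeat_touch from its main loop (the function names and the staleness threshold are hypothetical). futimens with a NULL argument sets the file's timestamps to the current time, which is what touch(1) does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Daemon side: set the heartbeat file's timestamps to "now",
 * creating the file if needed. Returns 0 on success. */
int heartbeat_touch(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int rc = futimens(fd, NULL); /* NULL = both timestamps to current time */
    close(fd);
    return rc;
}

/* Watchdog side: returns 1 if the heartbeat is missing or older than
 * `max_age` seconds (daemon presumed dead or deadlocked), 0 otherwise. */
int heartbeat_is_stale(const char *path, time_t max_age)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return 1;
    return time(NULL) - st.st_mtime > max_age;
}
```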
#!/bin/bash
# Rerun the program until it exits successfully (exit status 0).
while ! /path/to/program
do
    echo "restarting"  # It crashed or failed, so start it again.
    sleep 1            # Avoid a tight restart loop if it fails immediately.
done
You can also put this script in /etc/init.d/ in order to start it as a daemon.
