I have a library that I inject into running processes using ptrace. I used this library many times in different processes without problems.
Now I want to inject into a running process that is executing a select syscall (waiting for a set of fd). After debugging I noticed that if I inject the library before the process arrives to the select, it works as expected. However if the process arrives to the select, it is impossible to inject the library.
My code injects the library and sends a SIGTRAP to ptrace in order to know if it was loaded. In all cases it works, but when process is in select I receive a SIGSEGV.
According to SIGSEV, it should be a problem accessing a wrong part of the memory, but I really doubt this is the problem as it only fails when process sleeps in select.
Is there any known issue with ptrace when process is in select?
Another interesting fact is, after receiving the SIGNALSEGV, I resume the process with the initial state when it got attached and it continues sleeping in select without any problem. I spent several days debugging the SIGSEGV and looking how select works but I can not find the solution. Any ideas or help will be appreciated.
Related
I'm looking to develop a "secure" application and as a security mitigation, I'd like to be able to discover if a debugger (GDB, LLDB...) is in use on the currently running application; aborting if detected.
How can I detect monitoring of a statically-linked C application?
Walk the /proc tree
...
Just a crazy idea - load BPF program (assuming your binary has a capability to do it) to intercept ptrace syscall from process parent, and check if pid of process being traced match your process' pid, then you can either fail the syscall, preventing the debug, and send and event to userspace to stop your process.
Although it won't work for attached process, so you'd need to intercept ptrace from all processes, I'm not sure BPF allows it, don't remember.
Another crazy idea - tracer expects SIGTRAPs from tracee on each breakpoint/step, so you can catch this signal from your process, again using BPF, and do something about it. But again it is based on the assumption that tracer doesn't know about it.
You can't. Software can not detect if it runs in a perfect emulation or in the real world. And a emulator can be stopped, the software can be analyzed, variables can be changed, basically everything can be done what can be done in a debugger.
Lets say you want to detect if the parent process is a debugger. So you make a system call to get the parent PID? The debugger can intercept the system call and return any PID which does not have to be the real PID. You want to intercept every SIGTRAP so the debugger can't use it anymore? Well the debugger can just stop in this case and send the SIGTRAP also to your process.
You want to measure the time when you send SIGTRAP to know if the the process stops for a short time by the debugger for sending SIGTRAP so you know when there is a debugger? The debugger can replace your calls to get the time and return a fake time. Lets say you run on a Processor that has a instruction that returns the time, so no function call is needed to get the time. Now you can know that the time you are getting is real? No, the debugger can replace this instruction with a SIGTRAP instruction and return any time he wants or in case such a instruction does not exist, run the Software in a emulator that can be programmed in any way. Everything you can come up with to detect a debugger or emulator can be faked by the environment and you have 0 change to detect it.
The only way to stop debugging is by not giving the software to the customers but keep it in your hands. Make a cloud service and run the software on your server. In this case the customer can not debug your program since he does not run it and has no control over it. Except the customer can access the server or the data somehow, but that is a different story.
I am writing a server application with one connection at a time, I receive a TCP request which has symbol names of function and name of shared libraray.
My server needs to load the shared library using the dlsym system call and call the function using symbol name received.
Right now loading the shared lib and executing the function I am doing in separate thread. My doubt is when thread crashed due to segmentation fault or any signals will my process gets affected ?
Which one is better whether to run in separate thread or process.
Please ask me question If my question is not clear.
A crash in a thread takes down the whole process. And you probably wouldn't want it any other way since a crash signal (like SIGSEGV, SIGBUS, SIGABRT) means that you lost control over the behavior of the process and anything could have happened to its memory.
So if you want to isolate things, spawning separate processes is definitely better. Of course, if someone can make your process crash it's pretty close to them owning your computer anyway. I sure hope that you don't intend to expose this to untrusted users.
Some background:
I have an application that relies on third party hardware and a closed source driver. The driver currently has a bug in it that causes the device to stop responding after a random period of time. This is caused by an apparent deadlock within the driver and interrupts proper functioning of my application, which is in an always-on 24/7 highly visible environment.
What I have found is that attaching GDB to the process, and immediately detaching GDB from the process results in the device resuming functionality. This was my first indication that there was a thread locking issue within the driver itself. There is some kind of race condition that leads to a deadlock. Attaching GDB was obviously causing some reshuffling of threads and probably pushing them out of their wait state, causing them to re-evaluate their conditions and thus breaking the deadlock.
The question:
My question is simply this: is there a clean wait for an application to trigger all threads within the program to interrupt their wait state? One thing that definitely works (at least on my implementation) is to send a SIGSTOP followed immediately by a SIGCONT from another process (i.e. from bash):
kill -19 `cat /var/run/mypidfile` ; kill -18 `cat /var/run/mypidfile`
This triggers a spurious wake-up within the process and everything comes back to life.
I'm hoping there is an intelligent method to trigger a spurious wake-up of all threads within my process. Think pthread_cond_broadcast(...) but without having access to the actual condition variable being waited on.
Is this possible, or is relying on a program like kill my only approach?
The way you're doing it right now is probably the most correct and simplest. There is no "wake all waiting futexes in a given process" operation in the kernel, which is what you would need to achieve this more directly.
Note that if the failure-to-wake "deadlock" is in pthread_cond_wait but interrupting it with a signal breaks out of the deadlock, the bug cannot be in the application; it must actually be in the implementation of pthread condition variables. glibc has known unfixed bugs in its condition variable implementation; see http://sourceware.org/bugzilla/show_bug.cgi?id=13165 and related bug reports. However, you might have found a new one, since I don't think the existing known ones can be fixed by breaking out of the futex wait with a signal. If you can report this bug to the glibc bug tracker, it would be very helpful.
I have an system running embedded linux and it is critical that it runs continuously. Basically it is a process for communicating to sensors and relaying that data to database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling(eg sockets & uart communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy to use watchdog that is threading friendly?
You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but a yet easier solution would be to use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.
The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look a the timestamp for the file, and if it is stale, then it can assume that the appliation is dead or deadlocked.
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan can help detect potential deadlock scenarios and other threading issues with static analysis.
You could create a CRON job to check if the process is running with start-stop-daemon from time to time.
use this script for running your application
#!/bin/bash
while ! /path/to/program #This will wait for the program to exit successfully.
do
echo “restarting” # Else it will restart.
done
you can also put this script on your /etc/init.d/ in other to start as daemon
I have a TCP Svr process written in C and running on CentOS 5.5. It acts as a TCP Server for external clients and also does some IPC communication with other processes in the system using Unix Domain Sockets it has establised. It's not a multi threaded process. It does one task at a time. There's an epoll_wait() I use to listen for requests on either the TCP socket or any of the IPC sockets it has established with internal processes. When the epoll_wait() function breaks,I process the request for whoever it is and then go back into epoll_wait()
I have a TCP Client that connects to this Process from outside (not IPC). It connects sucessfully, sends a request msg, gets a response back. I've put this in an infinite loop
just to test out its robustness etc.
After a while, the TCP Server stops responding to requests coming from TCP Client. The TCP client connects successfully, sends a request message, but it doesnt get any response msg back from the TCP server.
So I reckon the TCP server is stuck somewhere else, trying to do something and has not returned to the epoll_wait() to process
other requests coming in. I've tried to figure it out using logs, but thats not helping me understand where exactly the process is stuck.
So I wanted to use any debugger that can give me some information (function name would be great), as to what the process is doing. Putting breakpoints, is overwhelming cause the TCP Server process has tons of files and functions....
I'm trying to use DDD on CentOS 5.5, to figureout whats going on. I attach to the process successfully. Then I click on "Step" or "Stepi" or "Next" button....
but nothing happens....
btw when I use Eclipse for debugging, and attach to this process (or any process), I always get "__kernel_vsyscall()"....Does this mean, the program breaks by default at
whatever its doing? If thats the case, how do I come out of the __kernel_vsyscall() call, to continue within my program? If I press f8, it comes out, but then I dont know where it was, since I loose the stack trace....Like I said earlier. Since I cant figure where it was, I dont know where to put breakpoint....
In summary, I want to figureout where my process is stuck or what its doing and try to debug from that point on....
How do I go about this?
Thanks
Amit
1) Attaching to a C process can often cause problems in itself, is there any way for you to start the process in the debugger?
2) Using the step functions of DDD need to be done after you've set a breakpoint and the program is stopped on a command. From reading your question, I'm not sure you've done that. You may not want to set many breakpoints, but is setting one or two in critical sections of code possible?
In summary, What I wanted to accomplish was to be able to find where my program is stuck, when it hangs. I figured it out - It was so simple. Create a configuration in Eclipse ...."Debug Configurations->C/C++ attach to application"...
Let the process run normally from shell (preferably with a terminal attached). When it hangs, open eclipse, click on the debug icon and run the configured process. It'll ask you to attach to a process. Look for your process name and attach to it.
Now, just look at the entire stack trace....you'll see some of your own function calls mixed with kernel function calls. That tells you where the program is stuck.