I have two different ways to check whether a process is still up and running:
1) using GetExitCodeProcess()
2) walking the list of processes using CreateToolhelp32Snapshot() and checking PIDs
Now, in both cases a process that I terminated with TerminateProcess is still reported as alive, even though it is not.
Is there a way to positively know whether a process is still alive or dead passing the PID?
thanks!
Don't use the PID for something like this. PIDs are reused and have a very narrow range, so the probability of a collision is high. In other words, you may find a running process with that PID, but it will be a different process.
A call to GetExitCodeProcess should return STILL_ACTIVE for active processes. After a call to TerminateProcess, the process will be dead, and a different value will be returned.
Another way to check if a process is alive is WaitForSingleObject. If you call this on the process handle with a timeout of 0, it will immediately return WAIT_TIMEOUT if the process is still running, and WAIT_OBJECT_0 once it has exited.
You cannot assume that a low-level API call behaves the way its name or high-level description suggests. Many calls are just requests to the kernel, and the kernel may have a multitude of things to do (depending on the implementation) before it actually releases the PID. In this case, after you issue the call you may assume the process is dead, but the kernel still has to clean up.
From MSDN :
The TerminateProcess function is used
to unconditionally cause a process to
exit. The state of global data
maintained by dynamic-link libraries
(DLLs) may be compromised if
TerminateProcess is used rather than
ExitProcess.
TerminateProcess initiates termination
and returns immediately. This stops
execution of all threads within the
process and requests cancellation of
all pending I/O. The terminated
process cannot exit until all pending
I/O has been completed or canceled.
A process cannot prevent itself from
being terminated.
Could you make use of the Process Status API? There are functions for enumerating all running processes on a system - this could help you.
The waiting works fine with pidfd_open and poll.
The problem I'm facing: after the process quits, the poll() API apparently removes the information about the now dead process, so waitid with the P_PIDFD argument fails at once with code 22, "Invalid argument".
I don't think I can afford launching a thread for every child process to sleep in a blocking waitpid. I have multiple processes, plus other handles which aren't processes, and I need to poll all of them efficiently.
Any workarounds?
If it matters, I only need to support Linux 5.13.12 and newer running on ARM64 and ARMv7 CPUs.
The approximate sequence of kernel calls is as follows:
fork
In the child: setresuid, setresgid, execvpe
In the new child: printf, sleep, _exit
Meanwhile in the parent: pidfd_open, poll, once completed waitid with P_PIDFD first argument.
Expected result: waitid should give me the exit code of the child.
Actual result: it does nothing and sets errno to EINVAL
There is one crucial bit. From man waitid:
Applications shall specify at least one of the flags WEXITED, WSTOPPED, or WCONTINUED to be OR'ed in with the options argument.
I was passing only WNOHANG.
And you want to pass WNOHANG | WEXITED ;)
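For what it's worth, here is a minimal sketch of the whole sequence with the corrected flags. The function name wait_child_via_pidfd and its return convention are made up for illustration. It assumes Linux 5.4 or newer, calls pidfd_open through syscall() in case the libc has no wrapper, and spells out the P_PIDFD value (3) in case the headers predate it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434      /* same number on x86-64, ARM64 and ARMv7 */
#endif

/* P_PIDFD is 3 in the Linux UAPI; written out for older libc headers. */
#define ID_PIDFD ((idtype_t)3)

/* Hypothetical helper: waits up to timeout_ms for child `pid` to terminate.
 * Returns 0 and stores the exit code if it exited normally, 1 and stores the
 * signal number if it was killed, -1 on error or timeout. */
int wait_child_via_pidfd(pid_t pid, int timeout_ms, int *code)
{
    int fd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (fd < 0)
        return -1;

    /* The pidfd becomes readable when the process terminates. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0) {
        close(fd);
        return -1;
    }

    /* WNOHANG alone yields EINVAL; WEXITED says which state change we want. */
    siginfo_t info;
    memset(&info, 0, sizeof info);
    rc = waitid(ID_PIDFD, (id_t)fd, &info, WEXITED | WNOHANG);
    close(fd);
    if (rc < 0 || info.si_pid == 0)
        return -1;

    *code = info.si_status;
    return info.si_code == CLD_EXITED ? 0 : 1;
}
```

In a real event loop the pidfd would sit in the same poll() set as the other handles; the single-descriptor poll here just keeps the sketch short.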
You can use a single reaper thread, looping on waitpid(-1, &status, 0). Whenever it reaps a child process, it looks it up in the set of current child processes, handles possible notifications (semaphore or callback), and stores the exit status.
There is one notable situation that needs special consideration: the child process may exit before fork() returns in the parent process. This means it is possible for the reaper to see a child process exiting before the code that did the fork() manages to register the child process ID in any data structure. Thus, both the reaper and the fork() registering functions must be ready to look up or create the record in the data store keeping track of child processes; including calling the callback or posting the semaphore. It is not complicated at all, but unless you are used to thinking in asynchronous terms, it is easy to miss these corner cases.
Because wait(...)/waitpid(-1,...) returns immediately when there are no child processes to wait for (with -1 and errno set to ECHILD), the reaper thread should probably wait on a condition variable when there are no child processes to wait for, with the code that registers the child process ID signaling on that condition variable to minimize resource use in the no-child-processes case. (Also, do remember to minimize the reaper thread stack size, as it is unreasonably large (order of 8 MiB) by default, and wastes resources. I often use 2*PTHREAD_STACK_MIN, myself.)
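A rough sketch of that reaper design follows, under stated assumptions: the names (reaper_start, reaper_register, reaper_wait), the fixed-size table and the single shared condition variable are all invented for illustration, and records for children that are reaped but never registered are simply left in the table. The reaper sleeps on the condition variable while no registered child is outstanding, and both the reaper and the registering code go through the same look-up-or-create helper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <limits.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64             /* hypothetical capacity of the sketch */

struct child_rec { pid_t pid; int status; int exited; };

static struct child_rec table[MAX_CHILDREN];   /* pid == 0 marks a free slot */
static int outstanding;             /* registered children not yet reaped */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

/* Look up the record for pid or claim a free slot. Caller holds the lock. */
static struct child_rec *find_or_create(pid_t pid)
{
    struct child_rec *free_slot = NULL;
    for (int i = 0; i < MAX_CHILDREN; i++) {
        if (table[i].pid == pid)
            return &table[i];
        if (table[i].pid == 0 && free_slot == NULL)
            free_slot = &table[i];
    }
    if (free_slot != NULL) {
        free_slot->pid = pid;
        free_slot->exited = 0;
    }
    return free_slot;
}

static void *reaper_main(void *arg)
{
    (void)arg;
    for (;;) {
        /* Sleep instead of spinning on ECHILD when nothing is registered. */
        pthread_mutex_lock(&lock);
        while (outstanding == 0)
            pthread_cond_wait(&changed, &lock);
        pthread_mutex_unlock(&lock);

        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            return NULL;    /* e.g. ECHILD: someone else reaped; give up */
        }

        pthread_mutex_lock(&lock);
        /* The record may not exist yet if this child beat its registration. */
        struct child_rec *c = find_or_create(pid);
        if (c != NULL) {
            c->status = status;
            c->exited = 1;
        }
        outstanding--;
        pthread_cond_broadcast(&changed);
        pthread_mutex_unlock(&lock);
    }
}

/* Starts the detached reaper thread with a small stack. Returns 0 on success. */
int reaper_start(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, reaper_main, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Call in the parent right after fork() returns the child's PID. */
void reaper_register(pid_t pid)
{
    pthread_mutex_lock(&lock);
    find_or_create(pid);    /* may already exist, already marked exited */
    outstanding++;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
}

/* Blocks until a registered child has been reaped; returns its wait status. */
int reaper_wait(pid_t pid)
{
    pthread_mutex_lock(&lock);
    struct child_rec *c = find_or_create(pid);
    while (c != NULL && !c->exited)
        pthread_cond_wait(&changed, &lock);
    int status = (c != NULL) ? c->status : -1;
    if (c != NULL)
        c->pid = 0;         /* release the slot */
    pthread_mutex_unlock(&lock);
    return status;
}
```

Where reaper_wait blocks, a callback or semaphore post from the reaper would work just as well; the blocking call only keeps the example compact.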
I know similar questions have been asked, but I think my situation is a little bit different. I need to check whether a child thread is alive, and if it's not, print an error message. The child thread is supposed to run all the time. So basically I just need a non-blocking pthread_join, and in my case there are no race conditions. The child thread can be killed, so I can't set some kind of shared variable from the child thread when it completes, because it will not be set in this case.
Killing of child thread can be done like this:
kill -9 child_pid
EDIT: alright, this example is wrong but still I'm sure there exists way to kill a specific thread in some way.
EDIT: my motivation for this is to implement another layer of security in my application which requires this check. Even though this check can be bypassed but that is another story.
EDIT: let's say my application is intended as a demo for reverse engineering students, and their task is to hack my application. I placed some anti-hacking/anti-debugging obstacles in a child thread, and I wanted to be sure that this child thread is kept alive. As mentioned in some comments, it's probably not that easy to kill the child without messing up the parent, so maybe this check is not necessary. Security checks are present in the main thread also, but this time I needed to add them in another thread to keep the main thread responsive.
killed by what and why that thing can't indicate the thread is dead? but even then this sounds fishy
it's almost universally a design error if you need to check if a thread/process is alive - the logic in the code should implicitly handle this.
In your edit it seems you want to do something about a possibility of a thread getting killed by something completely external.
Well, good news. There is no way to do that without bringing the whole process down. All ways of non-voluntary death of a thread kill all threads in the process, apart from cancellation but that can only be triggered by something else in the same process.
The kill(1) command does not send signals to some thread, but to an entire process. Read carefully signal(7) and pthreads(7).
Signals and threads don't mix well together. As a rule of thumb, you don't want to use both.
BTW, using kill -KILL or kill -9 is a mistake. The receiving process doesn't have the opportunity to handle the SIGKILL signal. You should use SIGTERM ...
If you want to handle SIGTERM in a multi-threaded application, read signal-safety(7) and consider setting some pipe(7) to self (and use poll(2) in some event loop) which the signal handler would write(2). That well-known trick is well explained in Qt documentation. You could also consider the signalfd(2) Linux specific syscall.
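A minimal sketch of that pipe-to-self trick, assuming SIGTERM is the only signal of interest. The function names install_sigterm_pipe and wait_for_signal are invented for this example. The handler only calls write(2), which is async-signal-safe, and the read end is meant to be added to whatever poll(2) set the event loop already uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };

/* Signal handler: forwards the signal number as one byte into the pipe. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(sig_pipe[1], &byte, 1);
    (void)n;                        /* a full pipe just drops the byte */
    errno = saved_errno;
}

/* Creates the pipe and installs the SIGTERM handler.
 * Returns the read end to poll on, or -1 on error. */
int install_sigterm_pipe(void)
{
    if (pipe(sig_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(sig_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(sig_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;
    return sig_pipe[0];
}

/* Returns the signal number read from the pipe, 0 on timeout, -1 on error. */
int wait_for_signal(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    unsigned char byte;
    return read(fd, &byte, 1) == 1 ? (int)byte : -1;
}
```

Both pipe ends are non-blocking so that neither the handler nor the event loop can get stuck on the pipe.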
If you think of using pthread_kill(3), you probably should not in your case (however, using it with a 0 signal is a valid but crude way to check that the thread exists). Read some Pthread tutorial. Don't forget to pthread_join(3) or pthread_detach(3).
Child thread is supposed to run all the time.
This is the wrong approach. You should know when and how a child thread terminates because you are coding the function passed to pthread_create(3) and you should handle all error cases there and add relevant cleanup code (and perhaps synchronization). So the child thread should run as long as you want it to run and should do appropriate cleanup actions when ending.
Consider also some other inter-process communication mechanism (like socket(7), fifo(7) ...); they are generally more suitable than signals, notably for multi-threaded applications. For example you might design your application as some specialized web or HTTP server (using libonion or some other HTTP server library). You'll then use your web browser, or some HTTP client command (like curl) or HTTP client library like libcurl to drive your multi-threaded application. Or add some RPC ability into your application, perhaps using JSONRPC.
(your putative usage of signals smells very bad and is likely to be some XY problem; consider strongly using something better)
my motivation for this is to implement another layer of security in my application
I don't understand that at all. How can signal and threads add security? I'm guessing you are decreasing the security of your software.
I wanted to be sure that this child thread is kept alive.
You can't be sure, other than by coding well and avoiding bugs (but be aware of Rice's theorem and the Halting Problem: there cannot be any reliable and sound static source code program analysis to check that). If something else (e.g. some other thread, or even bad code in your own one) is e.g. arbitrarily modifying the call stack of your thread, you've got undefined behavior and you can just be very scared.
In practice tools like the gdb debugger, address and thread sanitizers, other compiler instrumentation options, valgrind, can help to find most such bugs, but there is No Silver Bullet.
Maybe you want to take advantage of process isolation, but then you should give up your multi-threading approach, and consider some multi-processing approach. By definition, threads share a lot of resources (notably their virtual address space) with other threads of the same process. So the security checks mentioned in your question don't make much sense. I guess that they are adding more code, but just decrease security (since you'll have more bugs).
Reading a textbook like Operating Systems: Three Easy Pieces should be worthwhile.
You can use pthread_kill() to check if a thread exists.
SYNOPSIS
#include <signal.h>
int pthread_kill(pthread_t thread, int sig);
DESCRIPTION
The pthread_kill() function shall request that a signal be delivered
to the specified thread.
As in kill(), if sig is zero, error checking shall be performed
but no signal shall actually be sent.
Something like
int rc = pthread_kill( thread_id, 0 );
if ( rc != 0 )
{
// thread no longer exists...
}
It's not very useful, though, as stated by others elsewhere, and it's really weak as any type of security measure. Anything with permissions to kill a thread will be able to stop it from running without killing it, or make it run arbitrary code so that it doesn't do what you want.
When I call kill() on a process, it returns immediately, because it just sends a signal. I have code that checks some processes (foreign, neither written nor modifiable by me) in an infinite loop, and if they exceed some limits (too much RAM eaten etc.) it kills them (and writes to syslog etc.).
The problem is that when processes are heavily swapped, it takes many seconds to kill them. Because of that, my process runs the same check against the same processes multiple times, sends the signal to the same process many times, and writes this to syslog each time. (This is not done on purpose; it's just a side effect which I am trying to fix.)
I don't care how many times it sends a signal to a process, but I do care how many times it writes to syslog. I could keep a list of PIDs that were already sent the kill signal, but in theory, even if the probability is low, another process could be spawned with the same PID as a previously killed one. That process might also need to be killed, and in this case the log entry would be missing.
I don't know if there is a unique identifier for any process, but I doubt it. How could I kill a process either synchronously, or keep track of processes that got the signal and don't need to be logged again?
Even if you could do a "synchronous kill", you still have the race condition where you could kill the wrong process. It can happen whenever the process you want to kill exits by its own volition, or by third-party action, after you see it but before you kill it. During this interval, the PID could be assigned to a new process. There is basically no solution to this problem. PIDs are inherently a local resource that belongs to the parent of the identified process; use of the PID by any other process is a race condition.
If you have more control over the system (for example, controlling the parent of the processes you want to kill) then there may be special-case solutions. There might also be (Linux-specific) solutions based on using some mechanisms in /proc to avoid the race, though I'm not aware of any.
One other workaround may be to use ptrace on the target process as if you're going to debug it. This allows you to partially "steal" the parent role, avoiding invalidation of the PID while you're still using it and allowing you to get notification when the process terminates. You'd do something like:
Check the process info (e.g. from /proc) to determine that you want to kill it.
ptrace it, temporarily stopping it.
Re-check the process info to make sure you got the process you wanted to kill.
Resume the traced process.
kill it.
Wait (via waitpid) for notification that the process exited.
This will make the script wait for process termination.
kill $PID
while kill -0 "$PID" 2>/dev/null
do
sleep 1
done
kill -0 [pid] tests the existence of a process
The following solution works for most processes that aren't debuggers or processes being debugged in a debugger.
Use ptrace with argument PTRACE_ATTACH to attach to the process. This stops the process you want to kill. At this point, you should probably verify that you've attached to the right process.
Kill the target with SIGKILL. It's now gone.
I can't remember whether the process is now a zombie that you need to reap or whether you need to PTRACE_CONT it first. In either case, you'll eventually have to call waitpid to reap it, at which point you know it's dead.
If you are writing this in C, you are sending the signal with the kill system call. Rather than repeatedly sending the terminating signal, send it once and then loop (or otherwise periodically check) with kill(pid, 0). A signal value of zero just tells you whether the process still exists, and you can act appropriately. Once the process is gone, kill will return -1 with errno set to ESRCH.
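Something along these lines, as a sketch only. The helper names process_exists and terminate_and_wait, the 10 ms polling step and the choice of SIGTERM are my own assumptions. Note that a zombie still counts as existing for kill(pid, 0), so the loop also tries a non-blocking waitpid in case the target happens to be a child of the caller. The PID reuse race mentioned in the other answers applies here as well.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if some process with this PID exists, 0 if not.
 * EPERM means it exists but we may not signal it. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Sends SIGTERM once, then polls with signal 0 until the PID disappears.
 * Returns 0 once the process is gone, -1 if it is still there after
 * roughly timeout_ms milliseconds. */
int terminate_and_wait(pid_t pid, int timeout_ms)
{
    if (kill(pid, SIGTERM) < 0 && errno == ESRCH)
        return 0;                           /* already gone */

    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int waited = 0; ; waited += 10) {
        /* Reaps the target if it is our own child; otherwise this fails
         * with ECHILD, which is ignored. */
        (void)waitpid(pid, NULL, WNOHANG);
        if (!process_exists(pid))
            return 0;
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
    }
}
```

The syslog entry would then be written once, next to the single kill(pid, SIGTERM) call, rather than on every pass of the monitoring loop.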
when you spawn these processes, the classical waitpid(2) family can be used
if cgroups are not used for anything else, you can move the processes that are going to be killed into a cgroup of their own; there can be notifiers on these cgroups which get triggered when a process exits.
to find out whether a process has been killed, you can chdir(2) into /proc/<pid> or open(2) this directory. After process termination, the status files there cannot be accessed anymore. This method is racy (between your check and the action, the process can terminate and a new one with the same PID can be spawned).
I know there is one for multi processes
waitpid(-1, NULL, WNOHANG)
that is non-blocking function call to check if there is any child process currently working on
But is there any similar lib function to check for multithread?
All i want to do is check if there is any thread currently on, if not reset some global values.
that is non-blocking function call to check if there is any child process currently working on
Wrong. That is a call to check whether there is any child process that has not yet terminated. And it not only checks but also reaps a terminated child, if any. The children might otherwise be in any possible state, such as hanging in a deadlock (which in my book is far from working).
All i want to do is check if there is any thread currently on, if not reset some global values.
You should probably post a question here about why you want to do it. It sounds like you are doing something terribly wrong.
If you do not already call pthread_join() for your threads, that means your threads already call pthread_detach(). If you had no problem adding pthread_detach() to your threads, I think there would be no problem adding some extra code to the threads to signal that they have (almost) terminated (e.g. sem_post()), so that main() can notice that a thread has terminated (e.g. by calling sem_trywait()).
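Roughly like this, as a sketch: the names tracker_init, start_detached_worker and collect_finished are invented, and the worker body is a placeholder. Each detached worker posts the semaphore as its last action, and the main thread drains it with sem_trywait() without ever blocking.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Counts workers that have (almost) terminated and not yet been noticed. */
static sem_t finished_sem;

/* Must be called once before any worker is started. Returns 0 on success. */
int tracker_init(void)
{
    return sem_init(&finished_sem, 0, 0);
}

static void *worker_main(void *arg)
{
    (void)arg;
    /* ... the thread's real work would go here ... */
    sem_post(&finished_sem);        /* last action before returning */
    return NULL;
}

/* Starts one detached worker. Returns 0 on success, an error number otherwise. */
int start_detached_worker(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker_main, NULL);
    if (rc == 0)
        pthread_detach(tid);
    return rc;
}

/* Non-blocking: returns how many workers have finished since the last call. */
int collect_finished(void)
{
    int n = 0;
    while (sem_trywait(&finished_sem) == 0)
        n++;
    return n;
}
```

If main() also counts how many workers it started, subtracting the collected total tells it when no thread is left and the globals can be reset.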
If portability isn't a requirement, you can also periodically query the OS for the number of threads in the process.
Still, IMO it is wrong to have threads in a program with an undefined life cycle and no proper synchronization with the main thread.
You could just save the handle of a thread and have a function to check if it is still running. I'm not sure if there's a function for that, but this should work.
pthread_kill(pid, 0) where pid is the thread id that pthread_create has returned can tell you if a thread is still alive. (That is how I understand your question)
It returns 0 if the thread is still alive and an error code otherwise.
I asked myself something quite similar:
POSIX API call to list all the pthreads running in a process
In your case I would just wrap ps -eLF.
I'm working on a program which uses shared memory. Multiple instances of said program will either connect to an existing one or create it anew, and give it back to OS when there are no other processes or just detach it and terminate. I thought of using a simple counter to keep track of how many processes use it.
I'm using atexit() function to do the cleanup, however, afaik, upon receiving SIGKILL signal, processes won't do any cleanup, so if any of those processes don't terminate normally, I might never be able to clean the memory.
Is there a way to specify what to do even after a SIGKILL signal? I'm probably going to write some mechanism similar to a timer to check if processes are still alive, but I'd really like to avoid it if there is another way.
No, SIGKILL cannot be caught in any way by your application - if it could, the application could ignore it, which would defeat its purpose.
You can't catch SIGKILL.
However, you can still do cleanup, provided that the cleanup is done by another process. There are lots of strategies you can use here to let a housekeeping process see your other processes appear and disappear.
For example: you could have a Unix domain socket in a known location, which the housekeeper listens to; each slave process opens the socket to indicate it's using the shared memory segment. When a slave exits, for whatever reason, the socket will get closed. The housekeeper can see this happen and can do the cleanup.
Combined with shared memory, robust mutexes located in the shared memory segment would be a great tool. If a process dies while holding a lock on a robust mutex, the next process to attempt locking it will get EOWNERDEAD and can perform the cleanup the original owner should have performed.
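A small sketch of the robust mutex idea, with assumptions called out: the helper names are invented, and an anonymous shared mapping stands in for the real shared memory segment so the example is self-contained. In the actual program the mutex would live inside the existing segment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a process-shared robust mutex in memory that is inherited across
 * fork(). Returns NULL on failure. */
pthread_mutex_t *create_shared_robust_mutex(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}

/* Locks the mutex. Returns 0 for a normal lock, 1 if the previous owner died
 * while holding it (the caller then owns the lock and should do the cleanup
 * the dead owner skipped), -1 on any other error. */
int lock_with_recovery(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc != EOWNERDEAD)
        return -1;
    /* Repair the shared state here, e.g. fix the usage counter, then mark
     * the mutex usable again. */
    if (pthread_mutex_consistent(m) != 0)
        return -1;
    return 1;
}
```

This covers a process that dies while holding the lock, SIGKILL included. A process killed while not holding the lock still needs something like the housekeeper above to have its reference released.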