I have a program developed in C. This program contains 2 sub threads. Sometimes, when I try to stop my application with kill -15 <pid of main thread>, the application does not exit. I can see only the PID of the main thread in the ps aux output (the PIDs of the subthreads are not displayed). Sending kill -15 <pid> to the remaining PID again does not terminate the process. Only kill -9 <pid> terminates it.
This behaviour happens 3 times in 1000 tries.
The OS is OpenWRT Linux
The kernel version is 2.6.30
Libs: libuClibc-0.9.30.1.so and libpthread-0.9.30.1.so
Please do not consider this topic a duplicate of this one, because my program does not install a sigaction handler.
It's not a duplicate, but the answer is the same. Attach strace or gdb and see what it's doing when it's hung. However, there are only two explanations: either you (or some library code you're using) blocked SIGTERM with sigprocmask, or the process is stuck in uninterruptible sleep in the kernel, which is usually a result of attempting to access a failing storage device like a dying hard drive or scratched optical disc.
Could you elaborate on what OS, kernel version, libraries, etc. you're using?
I'm writing a Linux shell-like program in C.
Among others, I'm implementing two built-in commands: jobs, history.
In jobs, I print the list of currently working commands (in the background).
In history I print the list of all commands history until now, specifying for each command if it's RUNNING or DONE.
To implement the two, my idea was to have a list of commands, mapping the command name to their PID. Once the jobs/history command is called, I run through them, check which ones are running or done, and print accordingly.
I read online that waitpid(pid, &status, WNOHANG) can tell from a PID whether a process is still running or done, without blocking and without stopping the process.
It works well, except for this:
While a program is alive, the call reports it as alive.
When a program is done, the first call reports it as done, and from then on, every call with the same PID returns -1 (ERROR).
For example, it would look like this: (the & symbolizes background command)
$ sleep 3 &
$ jobs
sleep ALIVE
$ jobs (within the 3 seconds)
sleep ALIVE
$ jobs (after 3 seconds)
sleep DONE
$ jobs
sleep ERROR
$ jobs
sleep ERROR
....
Also, these results are not influenced by other commands I run before or after; the behavior described above seems independent of other commands.
I read online about various reasons why waitpid might return -1, but I wasn't able to identify the reason in my case. I also tried to find out how to tell which kind of waitpid error it is, but again without success.
My questions are:
Why do you think this behavior is happening?
Do you have a solution? (Ideally, it would keep returning DONE.)
If you have a better idea of how to implement the jobs/history commands, it is welcome.
One solution is that as soon as I get "DONE", I mark the command as DONE and don't call waitpid on it again before printing it. This would solve the issue, but I would remain in the dark as to WHY this is happening.
You should familiarize yourself with how child processes are handled on Unix environments. In particular read about Zombie processes.
When a process dies, it enters a 'zombie' state, so that its PID is still reserved and uniquely identifies the now-dead process. A successful wait on a zombie process frees up the process descriptor and its PID. Consequently, subsequent calls to wait on the same PID will fail because there is no longer a child process with that PID (unless a new process is allocated the same PID, in which case waiting on it would be a logical error).
You should restructure your program so that if a wait is successful and reports that a process is DONE, you record that information in your own data structure and never call wait on that PID again.
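The restructuring described above can be sketched as follows. This is an illustrative sketch only: struct job, enum job_state and job_poll are hypothetical names, not part of the asker's shell. The point is that the DONE result is cached, so waitpid is never called again for a child that has already been reaped.

```c
#include <sys/types.h>
#include <sys/wait.h>

enum job_state { JOB_RUNNING, JOB_DONE, JOB_ERROR };

/* Hypothetical job table entry: the shell's own record of a child. */
struct job {
    pid_t pid;
    enum job_state state;
};

/* Polls the child once without blocking and caches the result.
 * Once the child has been reaped, waitpid is not called for it again. */
enum job_state job_poll(struct job *j)
{
    int status;
    pid_t r;

    if (j->state == JOB_DONE)
        return JOB_DONE;            /* already reaped: the PID is gone */

    r = waitpid(j->pid, &status, WNOHANG);
    if (r == 0)
        j->state = JOB_RUNNING;     /* child exists, no state change yet */
    else if (r == j->pid && (WIFEXITED(status) || WIFSIGNALED(status)))
        j->state = JOB_DONE;        /* zombie reaped by this very call */
    else if (r == -1)
        j->state = JOB_ERROR;       /* e.g. ECHILD: no such child of ours */
    return j->state;
}
```

With this in place, jobs and history can call job_poll on every entry each time they run, and a finished command keeps reporting DONE instead of ERROR.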
For comparison, once a process is done, the Bourne shell reports it one last time and then removes it from the list of jobs:
$ sleep 10 &
$ jobs
[1] + Running sleep 10
$ jobs
[1] + Running sleep 10
$ jobs
[1] Done sleep 10
$ jobs
$
I've recently had a problem with signals. I'd like to write a program in C that prints something when a signal is sent to the process. For example, if I send SIGTERM to my process (which is simply a running program), I want the program to print, say, "killing the process denied" instead of being killed. How can I do that? How can I make the process catch such a signal and change its meaning? I also wonder whether it is possible to kill the init process (I know it's kind of a stupid question, but I was wondering how Linux deals with such a signal, and what would technically happen if I typed: sudo kill -9 1).
Don't use the signal handler to print. You can set a variable of type volatile sig_atomic_t instead, and have your main thread check this (see this example).
When your main thread has nothing else to do (which should be most of the time), let it block on a blocking function call (e.g. sleep()) that will wake up immediately when the signal is received (and set errno to EINTR).
C++ gotcha: Unlike the C sleep() function, std::this_thread::sleep_for() (in recent versions of libstdc++) does not return early when a signal is received.
Regarding if it's possible to kill pid 1, see this question. The answer seems to be no, but I remember that Linux got very grumpy once I booted with init=/bin/bash and later exited this shell – had to hard reboot.
If you're looking for trouble, better kill pid -1.
Given a single processor virtual machine running lubuntu, I was wondering if it is possible to tie up the processor so that no other program can run any instructions.
For example, if program A and program B were to be run at nearly the same time, is it possible to set the priority of program A (in its source using the setpriority() function) to run before program B and then tie up the processor so that program B cannot execute?
You could call kill with SIGSTOP and a pid value of -1 to stop every process that you can (i.e., have permission to) stop other than init and the calling process, which, if you are root, should stop every process other than init and the process that calls kill.
You'd want to use a scripting language rather than the kill binary: the kill binary would exit after sending the signal, and the shell you ran it from would have been stopped, preventing you from launching your app.
E.g., in ruby, you could do,
#Broadcast the STOP signal
Process.kill(:STOP, -1)
#Run your process with the playground having been cleared
system('the_high_priority_app')
#Resume the stopped processes
Process.kill(:CONT, -1)
The above is a bit of a hack, though, and not very safe if you have many processes that do some IPC by sending the SIGSTOP and SIGCONT signals among themselves: you could be sending SIGCONT to processes that had been stopped by other processes. You could get a list of processes that were already stopped at the time of broadcasting the SIGSTOP signal and skip those when you broadcast the SIGCONT signal, but the set of stopped processes could theoretically change between your scanning for them and your broadcasting of the SIGSTOP signal.
With the right privileges it is possible to call 'sched_setscheduler' to give a process real time priority. Such a process will not be interrupted by ordinary processes or other real time processes with lower priority. Such real time processes will only lose CPU when they give it up by doing some call like sleep or waiting for IO. They will also be given the CPU back as soon as they are able to work again and the CPU is not needed by any real time process with higher priority.
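As a rough sketch of that call, the helper below (make_realtime is a hypothetical name) tries to move the calling process to the SCHED_FIFO real time policy. The choice of SCHED_FIFO is an assumption made for illustration; SCHED_RR is the other standard real time policy. Without sufficient privileges the call is expected to fail, typically with EPERM.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: tries to switch the calling process to SCHED_FIFO
 * at the given priority, clamped to the range the system allows.
 * Returns 0 on success, or the errno value (typically EPERM) on failure. */
int make_realtime(int priority)
{
    struct sched_param sp;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (lo == -1 || hi == -1)
        return errno;
    if (priority < lo)
        priority = lo;
    if (priority > hi)
        priority = hi;

    sp.sched_priority = priority;
    /* A pid of 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

A SCHED_FIFO process that never sleeps or blocks can starve everything else on a single processor, including the shell needed to stop it, so this is best tried with a bounded workload.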
Is there any way to make a program that cannot be interrupted (an uninterruptible program)? By that, I mean a process that can't be terminated by any signal, kill command, or key combination on any system: Linux, Windows, etc.
First, I am interested to know whether it's possible or not. And if yes, to what extent is it possible?
I mostly write code in C, C++, and Python, but I don't know of any such command available in these programming languages.
Is it possible with assembly language, and how? Or in a high-level language like C with embedded assembly code (inline assembly)?
I know some signals are catchable and some, like SIGKILL and SIGSTOP, are not.
I remember that when I used to work on Windows XP, some viruses couldn't be terminated even from Task Manager. So I guess some solution is possible in low-level languages, maybe by overriding the Interrupt Vector Table.
Can we write an uninterruptible program using TSRs (hooking)? A TSR can only be removed when the computer is rebooted or if the TSR is explicitly removed from memory. Am I correct?
I couldn't find anything on Google.
Well, one can write a program which doesn't respond to most signals, like SIGQUIT, SIGHUP, etc. Each kind of "kill" is actually a kind of signal sent to the program by the kernel; some signals tell the kernel that the program is stuck and should be killed.
Actually, the only unkillable program is the kernel itself; even init (PID 1) can be "killed" with HUP (which means reload).
Learn more about signal handling, starting with the kill -l (list signals) command.
Regarding Windows (based on the "antivirus" tag), which actually applies to Linux too: if you just need to run some antivirus that the user is unable to skip or close, it's a permission problem. A program started by the system cannot be closed by a non-administrative user who has no permission to kill it. I guess lameusers on Windows all over the world would start "solving" any problems they have by trying to close the antivirus first, if only it were possible :)
On Linux, it is possible to avoid being killed in one of two ways:
1. Become init (PID 1). init ignores all signals that it does not catch, even normally unblockable ones like SIGSTOP and SIGKILL.
2. Trigger a kernel bug, and get your program stuck in D (uninterruptible wait) state.
For 2., one common way to end up in D state is to attempt to access some hardware that is not responding. Particularly on older versions of Linux, the process would become stuck in kernel mode, and not respond to any signals until the kernel gave up on the hardware (which can take quite some time!). Of course, your program can't do anything else while it's stuck like this, so it's more annoying than useful, and newer versions of Linux are starting to rectify this problem by dividing D state into a killable state (where SIGKILL works) and an unkillable state (where all signals are blocked).
Or, of course, you could simply load your code as a kernel module. Kernel modules can't be 'killed', only unloaded - and only if they allow themselves to be unloaded.
You can catch pretty much any signal or input and stay alive through it, the main exception being SIGKILL. It is possible to prevent that from killing you, but you'd have to replace init (and reboot to become the new init). PID 1 is special on most Unixes, in that it's the only thing that can't be KILL'd.
Not long ago, I wondered: why are all processes killed when you close a terminal on Linux, rather than passed to the "init" process (with PID 1)?
After all, child processes are adopted by the "init" process when their parent terminates.
Please help me understand the difference and the errors in my reasoning.
And also:
If it's possible, can we use a system call to stop this from happening? I think programs need to use setsid() for this, but in practice that does not seem to be correct.
As explained by cnicutar, it's due to the SIGHUP sent to all processes in the process group associated with the controlling terminal. You may either install a handler for this signal or ignore it completely. For arbitrary programs, you can start them with the nohup utility designed for this purpose.
You can also place the process in a new process group without a controlling terminal.
"Why, when a terminal is closed on Linux, are all of its processes terminated rather than passed to the "init" process (with PID 1)?"
The processes are losing their controlling terminal so the kernel sends them a SIGHUP. The default action of SIGHUP is to terminate the process.
I think this will help you understand:
http://www.digipedia.pl/usenet/thread/18802/10189/