Is there a way to kill a zombie process? I've tried calling exit to kill the process and even sending SIGINT signal to the process, but it seems that nothing can kill it. I'm programming for Linux.
Zombie processes are already dead, so they cannot be killed; they can only be reaped, which has to be done by their parent process via wait*(). The usual way to do this is the child reaper idiom, run in the signal handler for SIGCHLD:
int status;
while (waitpid(-1, &status, WNOHANG) > 0) {
    /* a child was reaped; inspect status here if needed */
}
Here is a script I created to kill ALL zombie processes. It uses the GDB debugger to attach to the parent process and call waitpid to reap the zombie process. This leaves the parent alive and only slays the zombie.
The GDB debugger needs to be installed, and you need to be logged in with permission to attach to a process. This has been tested on CentOS 6.3.
#!/bin/bash
##################################################################
# Script: Zombie Slayer
# Author: Mitch Milner
# Date: 03/13/2013 ---> A good day to slay zombies
#
# Requirements: yum install gdb
# permissions to attach to the parent process
#
# This script works by using a debugger to
# attach to the parent process and then issuing
# a waitpid to the dead zombie. This will not kill
# the living parent process.
##################################################################
clear
# Wait for user input to proceed, give user a chance to cancel script
echo "***********************************************************"
echo -e "This script will terminate all zombie processes."
echo -e "Press [ENTER] to continue or [CTRL] + C to cancel:"
echo "***********************************************************"
read cmd_string
echo -e "\n"
# initialize variables
intcount=0
lastparentid=0
# remove old gdb command file
rm -f /tmp/zombie_slayer.txt
# create the gdb command file
echo "***********************************************************"
echo "Creating command file..."
echo "***********************************************************"
# Feed the loop through process substitution rather than a pipeline, so it
# runs in the current shell and $lastparentid is still set after "done".
while read -r parentid zombieid verifyzombie _; do
    intcount=$((intcount+1))
    # make sure this is a zombie process and we are not getting a Z from
    # the command field; the stat field may carry suffixes such as "Z+"
    if [[ "$verifyzombie" == Z* ]]
    then
        if [ "$parentid" != "$lastparentid" ]
        then
            if [ "$lastparentid" != "0" ]
            then
                echo "detach" >> /tmp/zombie_slayer.txt
            fi
            echo "attach $parentid" >> /tmp/zombie_slayer.txt
        fi
        echo "call waitpid ($zombieid,0,0)" >> /tmp/zombie_slayer.txt
        echo "Logging: Parent: $parentid Zombie: $zombieid"
        lastparentid=$parentid
    fi
done < <(ps -e -o ppid,pid,stat,command | grep Z | sort)
if [ "$lastparentid" != "0" ]
then
    echo "detach" >> /tmp/zombie_slayer.txt
fi
# Slay the zombies with gdb and the created command file
echo -e "\n\n"
echo "***********************************************************"
echo "Slaying zombie processes..."
echo "***********************************************************"
gdb -batch -x /tmp/zombie_slayer.txt
echo -e "\n\n"
echo "***********************************************************"
echo "Script complete."
echo "***********************************************************"
Enjoy.
A zombie process is a process ID (and its associated termination status and resource usage information) that has not yet been waited for by its parent process. The only way to eliminate it is to get its parent to wait for it. Sometimes this can be achieved by sending SIGCHLD to the parent manually, if the parent was just buggy and had a race condition where it missed the chance to wait. Usually, though, you're out of luck unless you forcibly terminate the parent.
Edit: Another way, if you're desperate and don't want to kill the parent, is to attach to the parent with gdb and forcibly call waitpid on the zombie child.
kill -17 PARENT_PID
OR
kill -SIGCHLD PARENT_PID
would possibly work (17 is SIGCHLD on x86 and ARM Linux), but like everyone else said, the zombie is waiting for its parent to call wait(). So unless the parent got stuck for some reason and never reaps it, you might not want to kill the parent.
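Since the signal has to reach the zombie's parent, it helps to look up the parent PID first. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names parent_of and nudge_parent are made up for illustration, and the nudge only helps if the parent actually has a SIGCHLD handler that calls wait().

```shell
# Print the parent PID of the given PID, read from /proc/<pid>/status.
# Assumes Linux; parent_of is a hypothetical helper name.
parent_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# Send SIGCHLD to the parent of the given zombie PID (hypothetical helper).
# Returns non-zero if the parent cannot be determined or signalled.
nudge_parent() {
    ppid=$(parent_of "$1") || return 1
    [ -n "$ppid" ] && kill -s CHLD "$ppid"
}
```

It would be called as nudge_parent followed by the zombie's PID.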
If I recall correctly, killing the parent of a zombie process will allow the zombie process to die: the orphaned zombie is adopted by init (PID 1) or another subreaper, which then reaps it.
Use ps faux to get a nice hierarchical tree of your running processes showing parent/child relationships.
See unix-faqs "How do I get rid of zombie processes that persevere?"
You cannot kill zombies, as they are already dead. But if you have too many zombies, kill the parent process or restart the service.
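To decide which parent or service to restart, it helps to list each zombie next to its parent. A small sketch under the assumption that ps output is laid out as "PPID PID STAT COMMAND"; list_zombies is a hypothetical helper name, and it matches only the first letter of STAT because the state can carry modifiers such as "Z+".

```shell
# Filter "PPID PID STAT COMMAND" lines down to zombies and print
# "parent zombie" pairs. list_zombies is a hypothetical helper name.
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2 }'
}

# Typical use (the "=" suffixes suppress the ps header line):
#   ps -e -o ppid= -o pid= -o stat= -o comm= | list_zombies
```

Parents that show up many times in the first column are the ones worth investigating.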
You can try to kill a zombie process using its PID:
kill -9 pid
Note, however, that kill -9 is not guaranteed to remove a zombie process: the process is already dead, so the signal has no effect until the parent reaps it.
Related
How do you kill a running process in a bash script without killing the script?
Goal
So... I need to kill a running process to continue with the execution of my script.
Here's a stripped down version of my script:
# Setup
echo "Use ctrl+C to stop gathering data..."
./logger
#------------------
echo "Rest..."
I need to be able to kill ./logger without killing my script
Expected vs Actual Results
I was expecting a result like this:
Use ctrl+C to stop gathering data...
^C
Rest...
but the script stops when I kill the program:
Use ctrl+C to stop gathering data...
^C
The ./logger command is a C program that already handles the SIGINT signal (Ctrl + C) but it won't stop until it receives it.
Is there a way to stop the program without killing my script?
The C Program is not important but it looks something like this
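#include <stdio.h>
#include <signal.h>
#include <unistd.h>   /* usleep */
/* written from the signal handler, so it should be volatile sig_atomic_t */
volatile sig_atomic_t flag = 1;
void handler(int i)
{
    (void)i;   /* signal number is unused */
    flag = 0;
}
int main(int argc, char** argv)
{
    signal(SIGINT, handler);
    while(flag)
    {
        usleep(10000);
    }
    return 0;
}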
If you type control-C at the terminal where the script is running, the signal goes to all the processes — the logger program and the script. And the script terminates because you've not told it to ignore signals while the logger is run. You can do that using the trap command but you'll need to undo the trapping before running the C program. See the Bash manual on Signals too.
For example, I have a program dribbler that generates messages, one every second by default, but it is configurable. I'm using it as a surrogate for your ./logger program. The -m option specifies the message, the -n option requests that the messages are numbered, and the -t option specifies output to standard output (instead of the file it writes to by default).
#!/bin/bash
#
# SO 6418-0134
program="dribbler -t -n -m Hello"
echo "About to run '$program' command"
trap "" 2
(trap 2; $program)
echo "Continuing after '$program' exits"
sleep 1
echo "But not for long"
The sub-shell is necessary. When I run that script (trapper.sh), I get, for example:
$ bash trapper.sh
About to run 'dribbler -t -n -m Hello' command
0: Hello
1: Hello
2: Hello
3: Hello
^CContinuing after 'dribbler -t -n -m Hello' exits
But not for long
$
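An alternative that avoids the sub-shell is to install a do-nothing handler instead of ignoring the signal. Ignored signals stay ignored in child processes, whereas caught signals are reset to their default action in the child, so the program still reacts to Ctrl-C while the script survives. A minimal sketch; run_sigint_safe is a hypothetical helper name and sleep stands in for ./logger.

```shell
# Run a command so that Ctrl-C stops the command but not this script.
# run_sigint_safe is a hypothetical helper name.
run_sigint_safe() {
    # ':' is a no-op handler: the shell catches SIGINT and carries on,
    # while the child does not inherit the handler.
    trap ':' INT
    "$@"
    status=$?
    trap - INT   # restore the default action for the rest of the script
    return "$status"
}

echo "Use ctrl+C to stop gathering data..."
run_sigint_safe sleep 1   # stand-in for ./logger
echo "Rest..."
```

Note that the final trap - INT discards any SIGINT trap the script had installed earlier, so a script with its own handler would need to reinstall it.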
Incidentally, POSIX defines a command logger that writes messages using the syslog() function(s) to the syslog daemon.
It might also be worth reviewing the Q&A about Sending SIGINT to forked exec process which runs a script does not kill it, especially the accepted answer and its links to Q&A on other Stack Exchange sites, as it matters that you and I both executed a C program, not a shell script.
I have the following script.
#!/bin/bash
if [ "$EUID" -ne 0 ]
then
echo ''
echo -e "\e[1;31m Please run the script as root \e[0m"
echo ''
exit
fi
for run in {1..11}
do
echo -e '\e[1;32m Initializing AP in background... \e[0m'
sudo screen -dmS hotspot
sleep 5
# start the AP in background
echo -e '\e[1;32m Starting AP in background... \e[0m'
sudo screen -S hotspot -X exec ./start_hostapd.sh
sleep 20
# save PIDs for dmS
ps -ef | grep "dmS" | awk '{print $2}' > dms.log
sleep 1
# save PIDs for hostapd
ps -ef | grep "hostapd" | awk '{print $2}' > process.log
sleep 1
echo -e '\e[1;33m Running data... \e[0m'
for run in {1..10}
do # send 10 times
sudo /home/ubuntu/Desktop/send_data/run_data
sleep 1
done
echo -e "\e[1;31m Stopping sending... \e[0m"
sleep 2
echo -e "\e[1;31m Quitting hotspot... \e[0m"
sudo /home/ubuntu/Desktop/kill_dms/kill_dms
sleep 5
echo -e "\e[1;31m Stopping AP... \e[0m"
sudo /home/ubuntu/Desktop/kill_hostapd/kill_hostapd
sleep 5
echo -e '\e[1;31m Wiping dead screens... \e[0m'
echo
sudo screen -wipe
sudo screen -X -S hotspot quit
sleep 5
done
I use a bash script that starts the AP (hostapd) and then executes some other commands. Unfortunately, once the AP is started, the following lines are no longer executed. To avoid this problem, the script starts the AP with the screen command, which runs the AP in the background and lets the following lines execute.
For each iteration of the for loop, the AP must be restarted. For this purpose I write out the PIDs of screen and hostapd and then call my C programs, which kill these processes. Finally, I use screen commands again to ensure that the AP in the background has been stopped and can be started again.
This implementation works well. However, when the script reaches the end and all processes have already been killed, the AP disappears on other devices and reappears after a few minutes, and this happens several times. Only a system reboot stops the AP completely.
I used htop to look for the processes that run the AP, but I cannot find them. htop shows none of the processes that I created with the script above. That is expected, because the script kills the processes once it is finished.
So I suppose that there are hidden processes for my AP that I do not see. Is there a way to find those hidden processes and kill them to stop the AP?
When I just start the AP in another terminal and then stop it with CTRL+C, the AP stops and my devices no longer see it.
That's why I suppose that screen starts a hidden process that cannot be found by htop or similar programs.
If you don't need any hostap process at all, I'd rather use pkill than trust the bookkeeping of PIDs. The easiest usage looks like:
pkill -f hostap
pkill -f screen
If you want to use another signal, such as 9 (SIGKILL), use:
pkill -9 -f hostap
pkill -9 -f screen
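Rather than jumping straight to signal 9, one option is to ask politely first and escalate only if the process is still there. A minimal sketch; stop_pid and its two arguments (the PID and a grace period in seconds) are hypothetical names, not part of pkill.

```shell
# Ask a process to exit with SIGTERM, then fall back to SIGKILL if it
# is still around after a grace period. stop_pid is a hypothetical name.
# Note: kill -0 also succeeds for a zombie that has not been reaped yet.
stop_pid() {
    pid=$1
    grace=${2:-5}
    kill -s TERM "$pid" 2>/dev/null || return 0   # already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -s KILL "$pid" 2>/dev/null
}
```

It could be combined with pgrep, for example by looping over the output of pgrep -x hostapd and calling stop_pid on each PID.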
https://linux.die.net/man/1/pkill
I am running my erlang process with this script
#!/bin/sh
stty -f /dev/tty icanon raw
erl -pa ./ -run thing start -run init -noshell
stty echo echok icanon -raw
my Erlang process:
-module(thing).
-compile(export_all).
process(<<27>>) ->
io:fwrite("Ch: ~w", [<<27>>]),
exit(normal);
process(Ch) ->
io:fwrite("Ch: ~w", [Ch]),
get_char().
get_char() ->
Ch = io:get_chars("p: ", 1),
process(Ch).
start() ->
io:setopts([{binary, true}]),
get_char().
When I run ./invoke.sh, I press keys and see the characters print as expected. When I hit escape, the shell window stops responding (I have to close the window from the terminal). Why does this happen?
When you call exit/1, it only terminates the Erlang process; it doesn't stop the Erlang runtime system (beam). Since you're running without a shell, the window stops responding. If you kill the beam process from your task manager or with pkill, you'll get your command line back.
An easy fix would be to replace
exit(normal)
with
halt() (see the documentation for erlang:halt/0)
I'm presently executing the following Linux command in one of my C programs to display processes that are running. Is there any way I can modify it to show stopped processes as well as running ones?
char *const parmList[] = {"ps","-o","pid,ppid,time","-g","-r",groupProcessID,NULL};
execvp("/bin/ps", parmList);
jobs -s lists processes stopped by SIGTSTP (20), not by SIGSTOP (19). The main difference is that SIGSTOP cannot be ignored. For more information, see help jobs.
You can send SIGTSTP to a process with ^Z, or from another shell with kill -TSTP PROC_PID (or with pkill, see below), and then list the stopped processes with jobs.
But what about listing PIDs that have received SIGSTOP? One way to get them is:
ps -A -o stat,command,pid | grep '^T '
From man ps:
-A Select all processes. Identical to -e.
T stopped by job control signal
I find these two commands very useful for pausing and resuming a process for a while (usually the browser):
kill -STOP $(pgrep procName)
kill -CONT $(pgrep procName)
Or with pkill or killall:
pkill -STOP procName
pkill -CONT procName
Credit to @pablo-bianchi, he gave me the oompff (starting point) to find SIGSTOP'd and SIGTSTP'd processes; however, his answer is not completely correct.
Pablo's command should use T rather than S
$ ps -e -o stat,command,pid | grep '^T '
T /bin/rm -r 2021-07-23_22-00 1277441
T pyt 999 1290977
$ ps -e -o stat,command,pid | grep '^S ' | wc -l
153
$
From man ps:
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT"
or "S") will display to describe the state of a process:
D uninterruptible sleep (usually IO)
I Idle kernel thread
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped by job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
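Only the first letter of the STAT column is the state; it is often followed by modifier characters (for example "T+" or "Ss"), so a pattern like '^T ' can miss entries. A small sketch that keeps both running and stopped processes, which is what the original question asks for; running_or_stopped is a hypothetical helper name and the input is assumed to be laid out as "STAT PID COMMAND".

```shell
# Keep only running (R) and stopped (T, or t when traced) processes from
# "STAT PID COMMAND" lines, matching on the first letter of STAT only.
# running_or_stopped is a hypothetical helper name.
running_or_stopped() {
    awk '$1 ~ /^[RTt]/'
}

# Typical use:
#   ps -e -o stat= -o pid= -o comm= | running_or_stopped
```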
WRT pgrep: it is a real grep. The argument is NOT a program name; rather, it is a regular expression applied to the first item in /proc/<pid>/cmdline (usually the name from the executing command line, or from execve()).
Therefore if you are trying to kill pyt, you would accidentally also kill all the python programs that are running:
$ pgrep -a pyt
7228 python3 /home/wwalker/bin/i3-alt-tab-ww --debug
1290977 pyt 999
You need to "anchor" the regular expression:
$ pgrep -a '^pyt$'
1290977 pyt 999
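The effect of anchoring can be tried out safely with plain grep before pointing pgrep or pkill at real processes. A minimal sketch; the names below are sample data made up for illustration, not real process listings.

```shell
# Simulate how an unanchored and an anchored pattern behave against a
# list of process names. The names are sample data, not real output.
names='python3
pyt
pytest'

# Unanchored: matches every name that contains "pyt".
printf '%s\n' "$names" | grep -E 'pyt'

# Anchored: matches only the name that is exactly "pyt".
printf '%s\n' "$names" | grep -E '^pyt$'
```

pgrep and pkill also accept -x, which requires the whole name to match and has the same effect as anchoring the pattern by hand.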
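ps -e lists all processes.
jobs lists the jobs of the current shell that are stopped or running in the background.
So, you can try running the jobs command using execvp. Be aware that jobs is normally a shell builtin, so this only works where a standalone jobs executable is installed, and a freshly exec'd program has no jobs of its own to list:
char *arg[] = {"jobs", NULL};
execvp(arg[0], arg);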
I want to run a command with different arguments in a multi-threaded fashion.
What I tried is:
#!/bin/bash
ARG1=$1
ARG2=$2
ARG3=$3
for ... #counter is i
do
main command with ARG1 ARG2 ARG3 & a[i]=$!
done
wait `echo ${a[@]}`
I used & a[i]=$! in the for loop and wait $(echo ${a[@]}) after the for loop. I want bash to wait until all threads finish and then echo their PIDs for me...
But when I run my script, after some time it just waits.
Thank you
I think you want this:
#!/bin/bash
for i in 0 1 2
do
sleep 3 & a[$i]=$!
done
wait
echo "${a[@]}"
You are missing the $ on the array index $i in your script. Also, you don't need to say which PIDs you are waiting for if you are waiting for all of them. And you also said you wanted to see the list of PIDs at the end.
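If the exit status of each job matters, wait can also be called once per PID instead of once for everything. A minimal sketch, written with a plain space-separated list so that it also runs in a POSIX sh; the sub-shell with sleep and exit is only a stand-in for the real command.

```shell
# Start three background jobs, remember their PIDs, then wait for each
# one in turn so its exit status can be reported individually.
pids=""
for n in 1 2 3; do
    ( sleep 1; exit "$n" ) &      # stand-in for the real command
    pids="$pids $!"
done

report=""
for pid in $pids; do
    rc=0
    wait "$pid" || rc=$?
    echo "PID $pid exited with status $rc"
    report="$report $rc"
done
```

The jobs still run in parallel; only the collection of results is sequential, so the total run time is roughly that of the slowest job.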