How to kill hidden process? - c

I have the following script.
#!/bin/bash
if [ "$EUID" -ne 0 ]
then
echo ''
echo -e "\e[1;31m Please run the script as root \e[0m"
echo ''
exit
fi
for run in {1..11}
do
echo -e '\e[1;32m Initializing AP in background... \e[0m'
sudo screen -dmS hotspot
sleep 5
# start the AP in background
echo -e '\e[1;32m Starting AP in background... \e[0m'
sudo screen -S hotspot -X exec ./start_hostapd.sh
sleep 20
# save PIDs for dmS
ps -ef | grep "dmS" | awk '{print $2}' > dms.log
sleep 1
# save PIDs for hostapd
ps -ef | grep "hostapd" | awk '{print $2}' > process.log
sleep 1
echo -e '\e[1;33m Running data... \e[0m'
for run in {1..10}
do # send 10 times
sudo /home/ubuntu/Desktop/send_data/run_data
sleep 1
done
echo -e "\e[1;31m Stopping sending... \e[0m"
sleep 2
echo -e "\e[1;31m Quitting hotspot... \e[0m"
sudo /home/ubuntu/Desktop/kill_dms/kill_dms
sleep 5
echo -e "\e[1;31m Stopping AP... \e[0m"
sudo /home/ubuntu/Desktop/kill_hostapd/kill_hostapd
sleep 5
echo -e '\e[1;31m Wiping dead screens... \e[0m'
echo
sudo screen -wipe
sudo screen -X -S hotspot quit
sleep 5
done
I use a bash script that starts the AP (hostapd) and then executes some other commands. Unfortunately, once the AP is started, the following lines are no longer executed. To work around this, the script starts the AP with the screen command, which runs the AP in the background and lets the following lines execute.
In each iteration of the for loop, the AP must be restarted. For this purpose I write out the PIDs of screen and hostapd and then call my C programs, which kill these processes. Finally, I use screen commands again to make sure that the AP in the background has been stopped and can be started again.
This implementation works well. However, when the script reaches the end and all processes have already been killed, the AP disappears on other devices, reappears after a few minutes, and this happens several times. Only a system reboot stops the AP completely.
I used htop to look for the processes that run the AP, but I cannot find them. htop shows none of the processes that the script above created. That is expected, because the script kills the processes before it finishes.
So I suppose that there are hidden processes for my AP that I do not see. Is there a way to find those hidden processes and kill them to stop the AP?
When I just start the AP in another terminal and then stop it with CTRL+C, the AP stops and my devices do not see it anymore.
That's why I suppose that screen starts a hidden process, which cannot be found by htop or similar programs.

If you don't need any hostapd process at all, I'd rather use pkill than rely on tracking PIDs. The simplest usage looks like:
pkill -f hostap
pkill -f screen
If you want to send another signal, such as 9 (SIGKILL), use:
pkill -9 -f hostap
pkill -9 -f screen
https://linux.die.net/man/1/pkill
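Because pkill -f matches against the full command line, a broad pattern such as screen can hit unrelated processes. A minimal sketch of a more cautious variant follows: it previews the matches with pgrep, sends SIGTERM first, and only falls back to SIGKILL after a grace period. The helper name stop_matching and the 5-second default are assumptions made for illustration, not part of pkill.

```shell
#!/usr/bin/env bash
# stop_matching PATTERN [GRACE_SECONDS]
# Hypothetical helper: stop every process whose command line matches PATTERN,
# politely first (SIGTERM), forcefully only if needed (SIGKILL).
# Caution: do not pass PATTERN on the calling script's own command line,
# or the script will match (and signal) itself.
stop_matching() {
    local pattern=$1 grace=${2:-5} i
    # Preview: -a prints PID and command line of every match.
    pgrep -a -f -- "$pattern" || return 0    # nothing matched, nothing to do
    pkill -TERM -f -- "$pattern"
    for ((i = 0; i < grace; i++)); do
        pgrep -f -- "$pattern" >/dev/null || return 0   # all gone
        sleep 1
    done
    pkill -KILL -f -- "$pattern"             # last resort
}

# Possible use at the end of each loop iteration (run as root):
#   stop_matching hostapd
#   stop_matching 'SCREEN -dmS hotspot'
```

Giving hostapd a chance to handle SIGTERM lets it shut the interface down itself, which a plain -9 does not.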

Related

Save one line of `top`, `htop` or `intel_gpu_top` outputs into a Bash array

I want to save 1 line from the output of top into a Bash array to later access its components:
$ timeout 1 top -d 2 | awk 'NR==8'
2436 USER 20 0 1040580 155268 91100 S 6.2 1.0 56:38.94 Xorg
Terminated
I tried:
$ gpu=($(timeout 1s top -d 2 | awk 'NR==8'))
$ mapfile -t gpu < <($(timeout 1s top -d 2 | awk 'NR==8'))
and, departing from the array requisite, even:
$ read -r gpu < <(timeout 1s top -d 2 | awk 'NR==8')
All of these returned a blank for either ${gpu[@]} (first two) or $gpu (last).
Edit:
As pointed out by @Cyrus and others, gpu=($(top -n 1 -d 2 | awk 'NR==8')) is the obvious solution. However I want to build the cmd dynamically so top -d 2 may be replaced by other cmds such as htop -d 20 or intel_gpu_top -s 1. Only top can limit its maximum number of iterations, so that is not an option in general, and for that reason I resort to timeout 1s to kill the process in all the attempts shown...
End edit
Using a shell other than Bash is not an option. Why did the above attempts fail and how can I achieve that ?
Why did the above attempts fail
Because a pipe does not have terminal capabilities, the top process receives a SIGTTOU signal when it tries to write to the terminal and take the terminal "back" from the shell. The signal causes top to terminate.
how can I achieve that ?
Use top -n 1. More generally, use each tool's own options to stop it from using terminal features.
However I want to build the cmd dynamically so top -d 2 may be replaced by other cmds such as htop -d 20 or intel_gpu_top -s 1
Write your own terminal emulation and extract the first line from the buffer holding the first screen the command displays. See the GNU screen and tmux source code for inspiration.
I don't think you need timeout there if it is only intended to quit top. You can use the -n and -b flags instead, but feel free to add timeout if you need it.
#!/bin/bash
arr=()
arr[0]=$(top -n 1 -b -d 2 | awk 'NR==8')
arr[1]=random-value
arr[2]=$(top -n 1 -b -d 2 |awk 'NR==8')
echo ${arr[0]}
echo ${arr[1]}
echo ${arr[2]}
output:
1 root 20 0 99868 10412 7980 S 0.0 0.5 0:00.99 systemd
random-value
1 root 20 0 99868 10412 7980 S 0.0 0.5 0:00.99 systemd
from top man page:
-b :Batch-mode operation
Starts top in Batch mode, which could be useful for sending output from top to other programs or to a
file. In this mode, top will not accept input and runs until the iterations limit you've set with the
`-n' command-line option or until killed.
-n :Number-of-iterations limit as: -n number
Specifies the maximum number of iterations, or frames, top should produce before ending.
-d :Delay-time interval as: -d ss.t (secs.tenths)
Specifies the delay between screen updates, and overrides the corresponding value in one's personal
configuration file or the startup default. Later this can be changed with the `d' or `s' interactive
commands.
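Once a single line has been captured, it still has to be split into array elements. A minimal sketch using read -ra is shown below; the sample line is the one from the question, and the array name fields is just an illustrative choice. Unlike arr=($(...)), read -a does not expand glob characters that might appear in the line.

```shell
#!/usr/bin/env bash
# Sample line copied from the question. In real use it would come from, e.g.:
#   read -ra fields < <(top -b -n 1 | awk 'NR==8')
line='2436 USER 20 0 1040580 155268 91100 S 6.2 1.0 56:38.94 Xorg'

# read -a splits on whitespace ($IFS) into an indexed array;
# -r keeps backslashes literal.
read -ra fields <<< "$line"

echo "PID:     ${fields[0]}"
echo "CPU %:   ${fields[8]}"
echo "Command: ${fields[11]}"
echo "Fields:  ${#fields[@]}"
```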

qsub array job submission

I am currently trying to run an array job on the "big-computer" at my Uni.
I'm new to Unix and bash and I've been having a hard time getting this to work.
The folder setup is as follows:
model1
- model1.inp
- model1.num
model2
- model2.inp
- model2.num
startup.sh
runAModel.sh
modelArray.sh
Due to restrictions on how long I can run a single job, I was asked to break up my simulations. So I need to run each model 5 times over; each time the model reads the input file .inp and outputs another input file for the subsequent run.
The code below worked until a week or so ago, but it no longer seems to function. I wonder whether I messed something up in there.
I suspected the problem might be in the line qcmd="qsub -N $modelName -t 1:5 ../../modelArray.sh" of runAModel.sh and that I should replace 1:5 with 1-5, but that didn't seem to work.
I use qstat to check my job, and where I would expect to see a list of 5 queued jobs, I see only one.
I was given three files to run:
startup.sh :
find . -mindepth 2 -type d -exec ./runAModel.sh {} \;
runAModel.sh :
#!/bin/bash
echo starting model in $1
cd $1 # go into the model directory
modelName=$(basename $PWD)
for f in *
do
dos2unix $f
done
qcmd="qsub -N $modelName -t 1:5 ../../modelArray.sh"
qq=`$qcmd` # runs a qsub command
# extract the job number
qt=`echo $qq | awk '{print $3}'`
jobid=${qt%%.*}
qrls $jobid.1
and modelArray.sh :
#!/bin/bash
# run program, invoke in model directory with input files.
# we want to run in the current working directory
#$ -cwd
# we want to run mpi with 4 cores on the same node:
#$ -pe sharedmem 4
# make a generous guess at the time we need
#$ -l h_rt=30:00:00
# force reservation
#$ -R y
# use 4G per process
#$ -l h_vmem=4G
# hold the array
#$ -h
echo I am task $SGE_TASK_ID in $JOB_ID with $SGE_TASK_LAST tasks in total
echo on $HOSTNAME
date
# run our model - set modules, then get the model name
echo "set modules"
. /etc/profile.d/modules.sh
PROGRAMBUILD=/exports/programlocation
. $PROGRAMBUILD/loadModules.sh
modelName=$(basename $PWD)
echo mpirun -np 4 $PROGRAMBUILD/bin/program $modelName
mpirun -np 4 $PROGRAMBUILD/bin/program $modelName
if [ $SGE_TASK_ID == $SGE_TASK_LAST ]
then
echo I am last task
else
# release the next task....
# next task in this array:
next=$((SGE_TASK_ID+1))
echo insert a test that this task in the array job was successful
echo if so, release next task
echo releasing $next
ssh login01.***.uk qrls $JOB_ID.$next
if [[ "$?" -ne 0 ]]; then
echo failed to qrls $JOB_ID.$next
fi
fi
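When debugging a chain like this, it can help to check what runAModel.sh actually extracts as the job number, since qrls depends on it. The sketch below replays the two extraction steps on a sample reply. The reply text and the job number 4321 are assumptions; the real wording depends on the scheduler version, and an error message from qsub would produce a meaningless $jobid.

```shell
#!/usr/bin/env bash
# ASSUMPTION: sample qsub reply for an array job; wording and job number
# are made up for illustration and may differ on a real cluster.
qq='Your job-array 4321.1-5:1 ("model1") has been submitted'

qt=$(echo "$qq" | awk '{print $3}')   # third word of the reply: 4321.1-5:1
jobid=${qt%%.*}                       # drop everything from the first dot: 4321

echo "task spec: $qt"
echo "job id:    $jobid"
```

Echoing $qq, $qt and $jobid inside runAModel.sh before the qrls call would show whether the submission itself or the release step is failing.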

Linux Command to Show Stopped and Running processes?

I'm presently executing the following Linux command in one of my C programs to display processes that are running. Is there any way I can modify it to show stopped processes as well as running ones?
char *const parmList[] = {"ps","-o","pid,ppid,time","-g","-r",groupProcessID,NULL};
execvp("/bin/ps", parmList);
jobs -s lists processes stopped by SIGTSTP (20), not SIGSTOP (19). The main difference is that SIGSTOP cannot be ignored. For more information, see help jobs.
You can send SIGTSTP to a process with ^Z, or from another shell with kill -TSTP PROC_PID (or with pkill, see below), and then list the stopped processes with jobs.
But what about listing PIDs that have received SIGSTOP? One way to get this is
ps -A -o stat,command,pid | grep '^T '
From man ps:
-A Select all processes. Identical to -e.
T stopped by job control signal
I find these two commands very useful for stopping and later continuing a process (usually the browser):
kill -STOP $(pgrep procName)
kill -CONT $(pgrep procName)
Or with pkill or killall:
pkill -STOP procName
pkill -CONT procName
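The stop/continue pair can be wrapped into a small helper that pauses a process for a fixed time. This is only a sketch; pause_for is a hypothetical name, and it takes a PID rather than a process name to avoid matching more than intended.

```shell
#!/usr/bin/env bash
# pause_for PID SECONDS
# Hypothetical helper: suspend a process with SIGSTOP, wait, then resume it
# with SIGCONT. Returns non-zero if the process cannot be signalled.
pause_for() {
    local pid=$1 seconds=$2
    kill -STOP "$pid" || return 1
    sleep "$seconds"
    kill -CONT "$pid"
}

# Example: pause the process with PID 12345 for ten seconds
#   pause_for 12345 10
```

If the helper is interrupted during the sleep, the target stays stopped until it receives SIGCONT from somewhere else.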
Credit to @pablo-bianchi, who gave me the starting point for finding SIGSTOP'd and SIGTSTP'd processes; however, his answers are not completely correct.
Pablo's command should use T rather than S
$ ps -e -o stat,command,pid | grep '^T '
T /bin/rm -r 2021-07-23_22-00 1277441
T pyt 999 1290977
$ ps -e -o stat,command,pid | grep '^S ' | wc -l
153
$
From man ps:
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT"
or "S") will display to describe the state of a process:
D uninterruptible sleep (usually IO)
I Idle kernel thread
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped by job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
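Based on the state codes above, a filter for stopped processes can test the STAT column directly instead of grepping the start of the line. The sketch below assumes input lines of the form "PID STAT COMMAND"; list_stopped is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_stopped: read "PID STAT COMMAND" lines on stdin and print PID and
# command for entries whose state starts with T (job-control stop) or
# t (stopped by a debugger). Extra flags such as "T+" still match.
list_stopped() {
    awk '$2 ~ /^[Tt]/ { print $1, $3 }'
}

# Real use (separate -o options avoid header-parsing surprises):
#   ps -e -o pid= -o stat= -o comm= | list_stopped

# Demo with canned input modelled on the ps output quoted above:
printf '%s\n' \
    '1277441 T rm' \
    '1290977 T pyt' \
    '7228 Ss python3' | list_stopped
```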
WRT pgrep: it is a real grep. The argument is NOT a program name; rather, it is a regular expression applied to the first item in /proc/<pid>/cmdline (usually the name from the executing command line, or from execve()).
Therefore if you are trying to kill pyt, you would accidentally also kill all the python programs that are running:
$ pgrep -a pyt
7228 python3 /home/wwalker/bin/i3-alt-tab-ww --debug
1290977 pyt 999
You need to "anchor" the regular expression:
$ pgrep -a '^pyt$'
1290977 pyt 999
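The effect of anchoring can be reproduced without touching any process, using bash's own regular-expression operator on the two names from the output above. This is only an illustration of the matching rule; name_matches is a hypothetical helper. pgrep also has an -x option that requires an exact match and so has the same effect as anchoring.

```shell
#!/usr/bin/env bash
# name_matches REGEX NAME: succeed if NAME matches the extended regex REGEX.
name_matches() {
    [[ $2 =~ $1 ]]
}

for name in python3 pyt; do
    if name_matches 'pyt' "$name"; then
        echo "unanchored pyt matches $name"
    fi
    if name_matches '^pyt$' "$name"; then
        echo "anchored ^pyt\$ matches $name"
    fi
done
```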
ps -e lists all processes.
jobs lists all processes currently stopped or in the background.
So, you can run the jobs command using execvp:
char *arg[] = {"jobs", NULL};
execvp(arg[0], arg);

Find tmux session that a PID belongs to

I am using htop to see what processes are taking up a lot of memory so I can kill them. I have a lot of tmux sessions and lots of similar processes. How can I check which tmux pane a PID is in so I can be sure I am killing what I want to kill?
Given that PID in the below line is the target pid number:
$ tmux list-panes -a -F "#{pane_pid} #{pane_id}" | grep ^PID
The above will identify the pane where the PID is running. The output will be two strings. The first number should be the same as PID and the second one (with a percent sign) is "tmux pane id". Example output:
2345 %30
Now, you can use "tmux pane id" to kill the pane without "manually" searching for it:
$ tmux kill-pane -t %30
To answer your question completely, in order to find *tmux session* that a PID belongs to, this command can be used:
$ tmux list-panes -a -F "#{pane_pid} #{session_name}" | grep ^PID
# example output: 2345 development
Here's another possibly useful "line":
$ tmux list-panes -a -F "#{pane_pid} #{session_name}:#{window_index}:#{pane_index}" | grep ^PID
# example output: 2345 development:2:0
The descriptions for all of the interpolation strings (example #{pane_pid}) can be looked up in tmux man page in the FORMATS section.
The answers above give you the PIDs of the shells running in the panes; you'll be out of luck if you want to find something running inside those shells.
Try:
https://gist.github.com/nkh/0dfa8bf165a53832a4b5b17ee0d7ab12
This script gives you all the PIDs as well as the files the processes have opened. I never know in which session, window, or pane (attached or not) I have a file open, so this helps.
I haven't tried it on another machine; tell me if you encounter any problems.
lsof needs to be installed.
If you just want PIDs, pstree is useful; you can modify the script to use it (it's already there, commented out).
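For a PID that is not a pane's shell but something started inside it, one possible approach is to walk up the parent chain until a PID appears in the tmux pane list. The sketch below is a minimal version under two assumptions: it reads the parent PID from /proc/PID/status, so it is Linux-only, and the helper names parent_of and pane_of_pid are made up for illustration.

```shell
#!/usr/bin/env bash
# parent_of PID: print the parent PID, read from /proc (Linux only).
parent_of() {
    local key value
    while read -r key value; do
        if [[ $key == PPid: ]]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# pane_of_pid PID: print "pane_pid session:window.pane" for the tmux pane
# whose shell is PID itself or one of its ancestors.
pane_of_pid() {
    local pid=$1 panes
    panes=$(tmux list-panes -a -F '#{pane_pid} #{session_name}:#{window_index}.#{pane_index}') || return 1
    while [[ -n $pid && $pid -gt 1 ]]; do
        if grep -E "^$pid " <<< "$panes"; then
            return 0
        fi
        pid=$(parent_of "$pid") || return 1
    done
    return 1    # reached the top without finding a pane
}

# Example:  pane_of_pid 10775
```

The second column of the output can be passed to tmux as a target, for example with select-pane -t.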
The following script displays the tree of processes in each window (or pane). It takes a list of PIDs as one parameter (one PID per line). The specified processes are underlined. It automatically pipes to less unless it is part of some other pipe. Example:
$ ./tmux-processes.sh "$(pgrep ruby)"
-- session-name-1 window-index-1 window-name-1
7184 7170 bash bash --rcfile /dev/fd/63 -i
7204 7184 vim vim ...
-- session-name-2 window-index-2 window-name-2
7186 7170 bash bash --rcfile /dev/fd/63 -i
10771 7186 bash bash ./manage.sh runserver
10775 10771 django-admi /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
5761 10775 python /srv/www/s1/env/bin/python /srv/www/s1/env/bin/...
...
tmux-processes.sh:
#!/usr/bin/env bash
set -eu
pids=$1
my_pid=$$
subtree_pids() {
local pid=$1 level=${2:-0}
if [ "$pid" = "$my_pid" ]; then
return
fi
echo "$pid"
ps --ppid "$pid" -o pid= | while read -r pid; do
subtree_pids "$pid" $((level + 1))
done
}
# server_pid=$(tmux display-message -p '#{pid}')
underline=$(tput smul)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=screen-256color (under tmux)
reset=$(echo -e '\033[m')
re=$(echo "$pids" | paste -sd'|')
tmux list-panes -aF '#{session_name} #{window_index} #{window_name} #{pane_pid}' \
| while read -r session_name window_index window_name pane_pid; do
echo "-- $session_name $window_index $window_name"
ps -p "$(subtree_pids "$pane_pid" | paste -sd,)" -Ho pid=,ppid=,comm=,args= \
| sed -E 's/^/ /' \
| awk \
-v re="$re" -v underline="$underline" -v reset="$reset" '
$1 ~ re {print underline $0 reset}
$1 !~ re {print $0}
'
done | {
[ -t 1 ] && less -S || cat
}
Details regarding listing tmux processes you can find here.
To underline lines I use ANSI escape sequences. To show the idea separately, here's a script that displays a list of processes and underlines some of them (those whose PIDs are passed as an argument):
#!/usr/bin/env bash
set -eu
pids=$1
bold=$(tput bold)
# reset=$(tput sgr0) # produces extra symbols in less (^O), TERM=xterm-256color
reset=$(echo -e '\033[m')
underline=$(tput smul)
re=$(echo "$pids" | paste -sd'|')
ps -eHo pid,ppid,comm,args | awk \
-v re="$re" -v bold="$bold" -v reset="$reset" -v underline="$underline" '
$1 ~ re {print underline $0 reset}
$1 !~ re {print $0}
'
Usage:
$ ./ps.sh "$(pgrep ruby)"
Details regarding less and $(tput sgr0) can be found here.

Executing shell script with system() returns 256. What does that mean?

I've written a shell script to soft-restart HAProxy (reverse proxy). Executing the script from the shell works. But I want a daemon to execute the script. That doesn't work. system() returns 256. I have no clue what that might mean.
#!/bin/sh
# save previous state
mv /home/haproxy/haproxy.cfg /home/haproxy/haproxy.cfg.old
mv /var/run/haproxy.pid /var/run/haproxy.pid.old
cp /tmp/haproxy.cfg.new /home/haproxy/haproxy.cfg
kill -TTOU $(cat /var/run/haproxy.pid.old)
if haproxy -p /var/run/haproxy.pid -f /home/haproxy/haproxy.cfg; then
kill -USR1 $(cat /var/run/haproxy.pid.old)
rm -f /var/run/haproxy.pid.old
exit 1
else
kill -TTIN $(cat /var/run/haproxy.pid.old)
rm -f /var/run/haproxy.pid
mv /var/run/haproxy.pid.old /var/run/haproxy.pid
mv /home/haproxy/haproxy.cfg /home/haproxy/haproxy.cfg.err
mv /home/haproxy/haproxy.cfg.old /home/haproxy/haproxy.cfg
exit 0
fi
HAProxy is executed with user haproxy. My daemon has its own user too. Both run with sudo.
Any hints?
According to this and that, Perl's system() returns exit values multiplied by 256. So it's actually exiting with 1. It seems this happens in C too.
Unless system returns -1 its return value is of the same format as the status value from the wait family of system calls (man 2 wait). There are macros to help you interpret this status:
man 3 wait
Lists these macros and what they tell you.
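To see why 256 corresponds to an exit code of 1, the status value can be decoded by hand. The sketch below uses shell arithmetic and assumes the conventional Linux layout of a wait status (exit code in bits 8 to 15, terminating signal in the low 7 bits); portable C code should use the WIFEXITED and WEXITSTATUS macros instead. decode_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# decode_status STATUS: interpret a raw wait-style status value.
# ASSUMPTION: conventional Linux encoding, not guaranteed by the C standard.
decode_status() {
    local status=$1
    local signal=$(( status & 0x7f ))        # low 7 bits: terminating signal
    local code=$(( (status >> 8) & 0xff ))   # next 8 bits: exit code
    if (( signal == 0 )); then
        echo "exited with code $code"
    else
        echo "killed by signal $signal"
    fi
}

decode_status 256    # value from the question: exited with code 1
decode_status 9      # a child killed by SIGKILL: killed by signal 9
```

An exit code of 1 matches the exit 1 in the success branch of the script in the question.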
A code of 256 probably means that the system command cannot locate the binary to run it. Remember that it may not be calling bash and that it may not have paths setup. Try again with full paths to the binaries!
I had the same problem when calling a script that contains a `kill' command from a daemon.
The daemon must have closed stdout, stderr, ...
Using something like system("script.sh > /dev/null") should work.
