Is it possible to watch an external process, identified by its PID, for read/write events? In particular, I want to write a program that counts the bytes an external process has written to stdout, stderr, or a FILE*. The target platform is Linux. Note: I cannot change the source code of the target processes.
To see each write as it occurs:
strace -ewrite -p$PID
To see a sum of bytes written when the process ends:
strace -ewrite -p$PID 2>&1 | { while read; do set $REPLY; n=$_; [ $n -gt 0 ] && let count+=$n; done; echo $count; }
The above assumes that process $PID is already running; note that only writes made after strace attaches are counted. To count from the start of $PROCESS, start it with strace -ewrite $PROCESS instead.
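Since the question asks for a program rather than a one-liner, here is a minimal C sketch of the same counting idea. It assumes strace's default line format (write(fd, "...", n) = result) without the -f pid prefix, and the function names bytes_from_strace_line and sum_written_bytes are made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the byte count from one strace line such as
 *   write(1, "foo\n", 4) = 4
 * or 0 if the line is not a successful write.
 * Assumes the default strace format without a "[pid N]" prefix. */
long bytes_from_strace_line(const char *line)
{
    /* Only consider lines that describe a write call. */
    if (strncmp(line, "write(", 6) != 0)
        return 0;

    /* The return value follows the LAST " = " on the line;
     * earlier occurrences may be part of the written data. */
    const char *eq = NULL;
    for (const char *p = strstr(line, " = "); p != NULL; p = strstr(p + 1, " = "))
        eq = p;
    if (eq == NULL)
        return 0;

    long n = strtol(eq + 3, NULL, 10);
    return n > 0 ? n : 0;   /* failed writes report -1 */
}

/* Sum the byte counts of all write lines read from a stream. */
long sum_written_bytes(FILE *in)
{
    char line[4096];
    long total = 0;

    while (fgets(line, sizeof line, in) != NULL)
        total += bytes_from_strace_line(line);
    return total;
}
```

A small main that prints sum_written_bytes(stdin) could then be fed with strace -ewrite -p$PID 2>&1, just like the shell loop above.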
How do you kill a running process in a bash script without killing the script?
Goal
So... I need to kill a running process to continue with the execution of my script.
Here's a stripped-down version of my script:
# Setup
echo "Use ctrl+C to stop gathering data..."
./logger
#------------------
echo "Rest..."
I need to be able to kill ./logger without killing my script
Expected vs Actual Results
I was expecting a result like this:
Use ctrl+C to stop gathering data...
^C
Rest...
but the script stops when I kill the program:
Use ctrl+C to stop gathering data...
^C
The ./logger command is a C program that already handles the SIGINT signal (Ctrl + C) but it won't stop until it receives it.
Is there a way to stop the program without killing my script?
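The C program is not important, but it looks something like this:
#include <stdio.h>
#include <signal.h>
#include <unistd.h>   /* usleep() */
/* Written from the signal handler, hence volatile sig_atomic_t. */
volatile sig_atomic_t flag = 1;
void handler(int i)
{
    (void)i;          /* signal number is not used */
    flag = 0;
}
int main(int argc, char** argv)
{
    signal(SIGINT, handler);
    while (flag)
    {
        usleep(10000);
    }
    return 0;
}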
If you type Control-C at the terminal where the script is running, the signal goes to all the processes in the foreground process group: the logger program and the script. The script terminates because you've not told it to ignore signals while the logger is running. You can do that using the trap command, but you'll need to undo the trapping before running the C program. See the Bash manual on Signals too.
For example, I have a program dribbler that generates messages, one every second by default, but it is configurable. I'm using it as a surrogate for your ./logger program. The -m option specifies the message, the -n option requests that the messages are numbered, and the -t option specifies output to standard output (instead of a file, which it writes to by default).
#!/bin/bash
#
# SO 6418-0134
program="dribbler -t -n -m Hello"
echo "About to run '$program' command"
trap "" 2
(trap 2; $program)
echo "Continuing after '$program' exits"
sleep 1
echo "But not for long"
The sub-shell is necessary. When I run that script (trapper.sh), I get, for example:
$ bash trapper.sh
About to run 'dribbler -t -n -m Hello' command
0: Hello
1: Hello
2: Hello
3: Hello
^CContinuing after 'dribbler -t -n -m Hello' exits
But not for long
$
Incidentally, POSIX defines a command logger that writes messages using the syslog() function(s) to the syslog daemon.
It might also be worth reviewing the Q&A about Sending SIGINT to forked exec process which runs a script does not kill it, especially the accepted answer and its links to Q&A on other Stack Exchange sites, as it matters that you and I both executed a C program, not a shell script.
I'm presently executing the following Linux command in one of my C programs to display processes that are running. Is there any way I can modify it to show stopped processes as well as running ones?
char *const parmList[] = {"ps","-o","pid,ppid,time","-g","-r",groupProcessID,NULL};
execvp("/bin/ps", parmList);
jobs -s lists processes stopped by SIGTSTP (20), not SIGSTOP (19). The main difference is that SIGSTOP cannot be ignored. More info with help jobs.
You can send SIGTSTP to a process with ^Z, or from another shell with kill -TSTP PROC_PID (or with pkill, see below), and then list the stopped processes with jobs.
But what about listing the PIDs of processes that have received SIGSTOP? One way to get this is:
ps -A -o stat,command,pid | grep '^T '
From man ps:
-A Select all processes. Identical to -e.
T stopped by job control signal
I find these two commands very useful for pausing and resuming a process for a while (usually the browser):
kill -STOP $(pgrep procName)
kill -CONT $(pgrep procName)
Or with pkill or killall:
pkill -STOP procName
pkill -CONT procName
Credit to #pablo-bianchi, who gave me the oomph (a starting point) to find SIGSTOP'd and SIGTSTP'd processes; however, his answer is not completely correct.
Pablo's command should use T rather than S:
$ ps -e -o stat,command,pid | grep '^T '
T /bin/rm -r 2021-07-23_22-00 1277441
T pyt 999 1290977
$ ps -e -o stat,command,pid | grep '^S ' | wc -l
153
$
From man ps:
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT"
or "S") will display to describe the state of a process:
D uninterruptible sleep (usually IO)
I Idle kernel thread
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped by job control signal
t stopped by debugger during the tracing
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
WRT pgrep: it is a real grep. The argument is NOT a program name; rather, it is a regular expression applied to the first item in /proc/<pid>/cmdline (usually the name from the executing command line, or from execve()).
Therefore if you are trying to kill pyt, you would accidentally also kill all the python programs that are running:
$ pgrep -a pyt
7228 python3 /home/wwalker/bin/i3-alt-tab-ww --debug
1290977 pyt 999
You need to "anchor" the regular expression:
$ pgrep -a '^pyt$'
1290977 pyt 999
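The effect of anchoring can be reproduced with the POSIX regex API. This is a sketch that assumes pgrep's matching behaves like an unanchored extended-regex search; the helper name pattern_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches anywhere in
 * `name`, 0 if it does not, and -1 if the pattern fails to compile. */
int pattern_matches(const char *pattern, const char *name)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, name, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, the pattern "pyt" matches both "pyt" and "python3", while "^pyt$" matches only "pyt".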
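ps -e lists all processes.
jobs lists the jobs of the current shell that are stopped or running in the background.
Note that jobs is a shell builtin that only knows about its own shell's jobs, so running it from a C program via execvp is unlikely to report anything useful. For completeness, the call with the argument array declared correctly would be:
char *const arg[] = {"jobs", NULL};
execvp(arg[0], arg);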
I am writing an application on Linux and need to access the serial port.
For debugging purposes I need to sniff what comes in and/or goes out through the serial port.
I looked around and found out I can use strace to do that. So I tried the following:
- I print the file descriptor of the serial device that I use. (After restarting my application a few times, I reassured myself that the file descriptor number my application gets from the kernel is "4".)
- If I start my application as strace -e write=4 ./myapp, I would expect to get messages in the terminal from file descriptor "4" only. Instead I get lots of output:
read(5, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\300Q\254C4\0\0\0"..., 52
fstat64(5, {st_mode=S_IFREG|0644, st_size=1448930, ...}) = 0
mmap2(0x43ab8000, 153816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0)0
mprotect(0x43ad6000, 28672, PROT_NONE) = 0
mmap2(0x43add000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRI0
close(5) = 0
munmap(0x2ab4c000, 38409) = 0
exit(0)
from several different file descriptors.
If I run my app with strace -e trace=write -e write=4 ./myapp
I get far fewer messages, even though there are still many, and they are for file descriptor "1".
write(1, "GPIO data bank:0x8, data: 0x80 a"..., 52GPIO data bank:0x8, data: 0x81) = 52
write(1, "\n", 1) = 1
write(1, "--> Version: 0677 <--\n", 22--> Version: 0677 <-- ) = 22
serial fd = 4
What you see above are some printf statements.
The extremely weird part is that the line serial fd = 4 is also a printf statement, but for some reason it is not wrapped around write(fd, ....) statement in strace output.
Can someone explain that, too?
thank you for your help.
Try it out with something simple.
strace -e write=1 echo foo
This will print all syscalls and, in addition to these, a dump of the data written to fd 1.
strace -e trace=none -e write=1 echo foo
This will generate no output except for the output from the program itself. It seems you have to trace write if you want to see its data.
strace -e trace=write -e write=1 echo foo
This will print all write syscalls, for any file descriptor. In addition to that, it will print a dump of the data sent to descriptor 1. The output will look like this:
write(1, "foo\n", 4foo
) = 4
| 00000 66 6f 6f 0a foo. |
+++ exited with 0 +++
The syscall starts in the first line. After the list of arguments, the syscall is actually executed, and prints foo followed by a newline. Then the syscall return value is printed by strace. After that, we have the data dump.
I'd suggest using -e trace=write -e write=4 -o write4.txt followed by grep '^ |' write4.txt or something like that. If you want to see data in real time, you can use a bash redirection like this:
strace -e trace=write -e write=4 -o >(grep '^ |') ./myapp
This will send output from strace to grep, where you can strip the write syscalls and concentrate on the data dumps.
The extremely weird part is that the line serial fd = 4 is also a printf statement, but for some reason it is not wrapped around write(fd, ....) statement in strace output. Can someone explain that, too?
I'd say that line is output not from strace, but from some application. That's the reason it is not wrapped. The fact that no wrapped version of this appears in addition to that unwrapped one (like in my foo example output above) suggests that the output might originate in a child process launched by myapp. Perhaps you want to add -f so you follow child process creation?
Notice that a child might decide to rename its file descriptors, e.g. redirect its standard output to that serial port opened by the parent. If that happens, write=4 won't be appropriate any more. To be on the safe side, I'd write the whole -f -e trace=write output to a file, and look at that to see where the data actually gets written. Then adjust things to home in on that data.
I've written a shell script to soft-restart HAProxy (reverse proxy). Executing the script from the shell works. But I want a daemon to execute the script. That doesn't work. system() returns 256. I have no clue what that might mean.
#!/bin/sh
# save previous state
mv /home/haproxy/haproxy.cfg /home/haproxy/haproxy.cfg.old
mv /var/run/haproxy.pid /var/run/haproxy.pid.old
cp /tmp/haproxy.cfg.new /home/haproxy/haproxy.cfg
kill -TTOU $(cat /var/run/haproxy.pid.old)
if haproxy -p /var/run/haproxy.pid -f /home/haproxy/haproxy.cfg; then
kill -USR1 $(cat /var/run/haproxy.pid.old)
rm -f /var/run/haproxy.pid.old
exit 1
else
kill -TTIN $(cat /var/run/haproxy.pid.old)
rm -f /var/run/haproxy.pid
mv /var/run/haproxy.pid.old /var/run/haproxy.pid
mv /home/haproxy/haproxy.cfg /home/haproxy/haproxy.cfg.err
mv /home/haproxy/haproxy.cfg.old /home/haproxy/haproxy.cfg
exit 0
fi
HAProxy runs as user haproxy. My daemon has its own user too. Both run with sudo.
Any hints?
According to this and that, Perl's system() returns exit values multiplied by 256. So it's actually exiting with 1. It seems this happens in C too.
Unless system returns -1 its return value is of the same format as the status value from the wait family of system calls (man 2 wait). There are macros to help you interpret this status:
man 3 wait
Lists these macros and what they tell you.
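As an illustration of those macros, here is a minimal sketch that turns the value returned by system() into a plain exit code. The helper name exit_code_of and the 128 + signal convention for signal deaths are choices made for this example, not part of system() itself.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code (0-255),
 * 128 + signal number if it was killed by a signal (a shell-like
 * convention chosen for this sketch), or -1 if system() failed. */
int exit_code_of(const char *command)
{
    int status = system(command);

    if (status == -1)
        return -1;                      /* no child, or status unavailable */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* e.g. a status of 256 becomes 1 */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Applied to the script in the question, a status of 256 decodes to exit code 1.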
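A code of 256 means the command exited with status 1. If the shell had been unable to locate a binary, the exit status would normally be 127, which system() reports as 32512. Still, remember that system() may not be calling bash and that the daemon's environment may not have paths set up, so try again with full paths to the binaries!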
I had the same problem when calling a script that contains a `kill' command from a daemon.
The daemon must have closed stdout, stderr, and so on.
Using something like system("script.sh > /dev/null") should work.
I need to send an ARP request for an IP address to get the MAC address of the machine on which it is configured. I am arping this IP from a C program with system("arping -c 3 -i eth0 "), but the call hangs there.
But if I run the same command from bash, arping -c 3 -i eth0 , it executes successfully.
I cannot understand why the system() call hangs in this case while the command completes successfully when run from bash.
Thanks,
Since you said it was hanging you can try:
strace -o my_prog.strace -f ./my_prog
and then kill it after it hangs. Then you can view the strace output file my_prog.strace and try to figure out what went wrong.
You may want to look at the strace man page to see other options that you might like to use. Of particular use to me are the ones that make it show more of the data in buffer (and string) input/output.
If it's not really hanging, you should check the return value from your call to system() and then inspect errno.
edit
Something that I just thought of that could cause a hang would be if arping was actually a link to a setuid root program that did sudo on the real arping and it is waiting on a password to be typed in, but the terminal for that program isn't set correctly.
Try system("arping -c 3 -I eth0 ip-addr");
Something like:
#include <stdlib.h>   /* system() */
int main(void)
{
    system("arping -c 3 -I eth0 192.168.10.1");
    return 0;
}
Are you using a child process to execute the above?
From the definition of system():
The system() function shall ignore the SIGINT and SIGQUIT signals, and shall block the SIGCHLD signal, while waiting for the command to terminate. The system() function shall not return until the child process has terminated.
Recommendations:
1. Check the return value of system() and act on it.
E.g.: If system(NULL) returns zero, no command processor is available. If a child process cannot be created, or if the termination status for the command language interpreter cannot be obtained, system() shall return -1 and set errno to indicate the error.
2. Pass complete shell commands to system().
E.g.: system("arping -c 3 -I eth0 10.203.198.10");