I want to check the exit code of a foreground process using C code running on linux. As I understand, wait() and waitpid() are useful for child processes, but in my case, it is a foreign process. I am able to read information from /proc/<pid>/stat for that process while it is active, but as the process closes, reading from /proc/<pid>/ becomes problematic and I didn't find any information relating to exit code.
Other things I've tried:
I tried running some bash commands through popen(). echo $? always returned 0, even when the process of interest exited with an error code; I am not sure it targeted the process of interest. I also tried wait <pid>, but that command returned immediately while the process was still running.
If you have access to the foreground process's code, you can have it send a message via a message queue or even a socket (e.g. UDP multicast). That also makes the solution more general, since your C program can then run on a different machine.
Another option is to use a logging service (syslog or something like that). It has some useful interfaces that enable processes to log their exit codes.
I'm trying to interface with a really crappy, completely opaque API that creates two subprocesses within a POSIX-like environment (OS X/Linux) in C. Basically, it starts an external program and provides rudimentary support for passing messages back and forth. The process tree looks something like this:
+ My_program
 \
  + an initiation shell script (csh -f -c external_program_startup_script)
   \
    - the external program instance
When I press control-c in the terminal while My_program is running, the controlling terminal sends SIGINT to all processes in its process group — all three above processes. I want SIGINT to get to the program instance, but if it also hits the shell script then that middle process is terminated and the communication link is severed.
Within My_program, I can set up a signal handler to ignore SIGINTs. But I have absolutely no control over the two child processes (the API doesn't even expose their PIDs), so existing solutions such as changing their process group or attaching handlers won't work. Is there a way to prevent the controlling terminal from sending SIGINT to all processes in the foreground process group?
(The API in question is MATLAB's libeng, which allows an external C program to process commands within MATLAB. But it has absolutely no functionality for sending interrupts beyond that which the OS provides.)
You need to run the sub-processes in a separate process group. You can achieve that with the setpgid() function: after the fork() call, call setpgid() in the child process, then exec the program you want to run in the separate group.
When I call kill() on a process, it returns immediately, because it just sends a signal. I have code that checks some processes (foreign, neither written nor modifiable by me) in an infinite loop, and if they exceed some limits (too much RAM eaten, etc.) it kills them (and writes to syslog, etc.).
The problem is that when processes are heavily swapped, it takes many seconds to kill them. Because of that, my process runs the same check against the same processes multiple times, attempts to send the signal to the same process many times, and writes this to syslog each time. (This is not done on purpose; it's just a side effect which I am trying to fix.)
I don't care how many times it sends a signal to a process, but I do care how many times it writes to syslog. I could keep a list of PIDs that were already sent the kill signal, but in theory, even if the probability is low, another process could be spawned with the same PID as a previously killed one. That process might also need to be killed, and in this case the log entry would be missing.
I don't know if there is a unique identifier for any process, but I doubt it. How could I kill a process either synchronously, or keep track of processes that got the signal and don't need to be logged again?
Even if you could do a "synchronous kill", you still have the race condition where you could kill the wrong process. It can happen whenever the process you want to kill exits by its own volition, or by third-party action, after you see it but before you kill it. During this interval, the PID could be assigned to a new process. There is basically no solution to this problem. PIDs are inherently a local resource that belongs to the parent of the identified process; use of the PID by any other process is a race condition.
If you have more control over the system (for example, controlling the parent of the processes you want to kill) then there may be special-case solutions. There might also be (Linux-specific) solutions based on using some mechanisms in /proc to avoid the race, though I'm not aware of any.
One other workaround may be to use ptrace on the target process as if you're going to debug it. This allows you to partially "steal" the parent role, avoiding invalidation of the PID while you're still using it and allowing you to get notification when the process terminates. You'd do something like:
Check the process info (e.g. from /proc) to determine that you want to kill it.
ptrace it, temporarily stopping it.
Re-check the process info to make sure you got the process you wanted to kill.
Resume the traced process.
kill it.
Wait (via waitpid) for notification that the process exited.
This will make the script wait for process termination.
kill $PID
while kill -0 "$PID" 2>/dev/null
do
sleep 1
done
kill -0 [pid] tests the existence of a process
The following solution works for most processes that aren't debuggers or processes being debugged in a debugger.
Use ptrace with argument PTRACE_ATTACH to attach to the process. This stops the process you want to kill. At this point, you should probably verify that you've attached to the right process.
Kill the target with SIGKILL. It's now gone.
I can't remember whether the process is now a zombie that you need to reap or whether you need to PTRACE_CONT it first. In either case, you'll eventually have to call waitpid to reap it, at which point you know it's dead.
If you are writing this in C, you are sending the signal with the kill system call. Rather than repeatedly sending the terminating signal, send it once and then loop (or somehow periodically check) with kill(pid, 0). A signal value of zero sends nothing; it just tells you whether the process is still alive, and you can act appropriately. Once the process is gone, kill returns -1 and sets errno to ESRCH.
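A small sketch of that send-once-then-poll idea. The name kill_and_poll, the 50 ms polling step and the timeout parameter are my own choices, not anything standard. Note that it inherits the PID-reuse race discussed in the other answers, and that an unreaped child of the caller remains visible to kill(pid, 0) as a zombie, so this is meant for foreign processes.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Send sig to pid once, then poll with signal 0 until the PID disappears.
 * Returns 0 when the process is gone, 1 if it is still there after
 * timeout_ms milliseconds, and -1 if the initial kill() failed.
 * Hypothetical helper for illustration only. */
int kill_and_poll(pid_t pid, int sig, int timeout_ms)
{
    const struct timespec step = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    int waited_ms = 0;

    if (kill(pid, sig) == -1)
        return errno == ESRCH ? 0 : -1;   /* already gone counts as done */

    for (;;) {
        /* Signal 0 performs only the existence and permission checks. */
        if (kill(pid, 0) == -1 && errno == ESRCH)
            return 0;
        if (waited_ms >= timeout_ms)
            return 1;
        nanosleep(&step, NULL);
        waited_ms += 50;
    }
}
```

You would write the syslog entry once, right after the function returns 0, instead of on every pass of the monitoring loop.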
When you spawn these processes yourself, the classical waitpid(2) family can be used.
When cgroups are not used for anything else, you can move the processes that are going to be killed into a cgroup of their own; there can be notifiers on these cgroups which get triggered when a process exits.
To find out whether a process has been killed, you can chdir(2) into /proc/<pid> or open(2) this directory. After process termination, the status files there cannot be accessed anymore. This method is racy (between your check and the action, the process can terminate and a new one with the same PID can be spawned).
In bash when I run a command like wc & or cat & that wants standard in right away, it returns immediately with
[1]+ Stopped cat
How is this accomplished? How do I stop a program that I started with exec, and how do I know to stop these programs in the first place? Is there some way to tell that these programs want stdin?
Thanks!
PS also, what is the + about? I've always wondered, but that's really hard to google...
If you want spawned programs to behave similarly to how the shell works, call setpgrp() after forking your child. This will cause the background program to run in its own process group, which is not the terminal's foreground process group. When it tries to read from the terminal, it will receive SIGTTIN; when it tries to write, it will receive SIGTTOU if the terminal's TOSTOP flag is set. The default action for both SIGTTIN and SIGTTOU is to stop the process, just like SIGSTOP.
As the parent, you can find out whether you have stopped child processes using waitpid() and WUNTRACED.
[Edited -- see other answers for the answer to the main question]
The + sign simply refers to the current job. Each pipeline of commands (such as foo | bar | baz) is a job, which can be referred to using a jobspec beginning with the % character. %1 is job number 1, %+ is the current job, and %- is the previous job.
For more information about jobs, see the Job Control section of the Bash manual.
The setpgid() manual page explains how this works:
A session can have a controlling terminal. At any time, one (and only
one) of the process groups in the session can be the foreground
process group for the terminal; the remaining process groups are in
the background. If a signal is generated from the terminal (e.g.,
typing the interrupt key to generate SIGINT), that signal is sent to
the foreground process group. (See termios(3) for a description of
the characters that generate signals.) Only the foreground process
group may read(2) from the terminal; if a background process group
tries to read(2) from the terminal, then the group is sent a
SIGTTIN signal, which suspends it. The tcgetpgrp(3) and
tcsetpgrp(3) functions are used to get/set the foreground process
group of the controlling terminal.
So what you want to do is this:
When you create a new pipeline, call setpgid() to put all the members of the pipeline in a new process group (with the PID of the first process in the pipeline as the PGID).
Use tcsetpgrp() to manage which process group is in the foreground - if you put a pipeline in the background with &, you should make the shell's own process group the foreground process group again.
Call waitpid() with the WNOHANG and WUNTRACED flags to check on the status of child processes - this will inform you when they are stopped by SIGTTIN (or any other stop signal), which will allow you to print a message like bash does.
You can use system("command &") in the forked child process and then manually make it exit, if there are no time and priority constraints.
Is there an easy way to determine if a certain process is running?
I need to know if an instance of my program is running in the background, and if not fork and create the background process.
Normally the race-free way of doing this is:
Open a lock file / pid file for writing (but do not truncate it)
Attempt to take an exclusive lock on it (using fcntl or flock) without blocking
If that fails with EAGAIN (EWOULDBLOCK for flock; fcntl may report EACCES instead), then the other process is already running.
The file descriptor should now be inherited by the daemon and left open for its lifetime
The advantage of doing this over simply storing a PID, is that if somebody reuses the PID, you won't get a false positive.
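A minimal sketch of the lock-file steps using flock(). The function name acquire_instance_lock and the file mode are assumptions for illustration; the path is whatever your program chooses.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to become the single running instance.
 * Returns an open, locked descriptor on success; keep it open for the
 * lifetime of the daemon. Returns -1 on failure, with errno set to
 * EWOULDBLOCK when the lock is already held elsewhere.
 * Hypothetical helper for illustration only. */
int acquire_instance_lock(const char *path)
{
    /* O_CREAT without O_TRUNC: never wipe a file another instance owns. */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* LOCK_NB makes the call fail at once instead of blocking. */
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The kernel drops the lock when the last descriptor referring to it is closed, including when the daemon crashes, so there is no stale state to clean up.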
The biggest problem with storing the pid in the file is that a low-numbered pid used by a system start up daemon can get reused on a subsequent reboot by a different daemon. I have seen this happen.
This is usually done using pidfiles: a file in /var/run/[name].pid containing only the process ID returned by fork().
if pidfile exists:
    exit()
else:
    create pidfile
    pid = start_background()
    pidfile.write(pid)
On shutdown: remove pidfile
Linux software, by and large, does not care about the exclusivity of programs, only the resources they use. "Caring" is most often provided by the implementation (e.g. the infrastructure of the distro).
For instance, suppose you want to run a program, but an existing copy has locked up or turned into a zombie and you have no way to kill it, or it is running as a different user performing some other function. Why should the program care whether another copy of itself is running? Having it do so only seems like an unnecessary restriction.
If it's a process that opens a socket (like a TCP port), have the program fail if it can't open the socket. If it needs exclusive access to a file, have it fail if it can't get it. Support a PID file, but don't make it mandatory.
You'll see this methodology all over GNU software, which is part of what makes it so versatile.
In my current scenario, I launch a process that forks, and after a while it calls abort().
The thing is that both the forked child and the original process print to the shell, but after the original one dies, the shell "returns" to the prompt.
I'd like to avoid the shell returning to the prompt and carry on as if the process hadn't died, with the child handling the situation from there.
I'm trying to figure out how to do it but nothing yet, my first guess goes somewhere around tty handling, but not sure how that works.
I forgot to mention, the shell takeover for the child could be done on fork-time, if that makes it easier, via fd replication or some redirection.
I think you'll probably have to go with a third process that handles user interaction, communicating with the "parent" and "child" through pipes.
You can even make it a fairly lightweight wrapper, just passing data back and forth to the parent and terminal until the parent dies, and then switching to passing to/from the child.
To add a little further, as well, I think the fundamental problem you're going to run into is that the execution of a command by the shell just doesn't work that way. The shell is doing the equivalent of calling system() -- it's going to wait for the process it just spawned to die, and once it does, it's going to present the user with a prompt again. It's not really a tty issue, it's how the shell works.
bash (and, I believe, other shells) has the wait command:
wait: wait [n]
Wait for the specified process and report its termination status. If
N is not given, all currently active child processes are waited for,
and the return code is zero. N may be a process ID or a job
specification; if a job spec is given, all processes in the job's
pipeline are waited for.
Have you considered inverting the parent child relationship?
If the order in which the new processes will die is predictable, run the code that will abort in the "child" and the code that will continue in the parent.