The system() function launches a new process, whether it is called from C or from a Perl script.
What exactly are the differences between processes called by system() in C and from Perl scripts, in terms of representation of error codes?
A little research brings up:
The return value is the exit status of
the program as returned by the wait
call. To get the actual exit value,
shift right by eight (see below). See
also "exec". This is not what you want
to use to capture the output from a
command, for that you should use
merely backticks or qx//, as described
in "STRING" in perlop. Return value
of -1 indicates a failure to start the
program or an error of the wait(2)
system call (inspect $! for the
reason).
And the docs of wait say:
Behaves like the wait(2) system call
on your system: it waits for a child
process to terminate and returns the
pid of the deceased process, or -1 if
there are no child processes. The
status is returned in $? and
${^CHILD_ERROR_NATIVE} . Note that a
return value of -1 could mean that
child processes are being
automatically reaped, as described in
perlipc.
Sources: This was taken from perldoc. Here's a tutorial on system in Perl.
What is the usage of execl command?
execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
_exit(127);
Can anyone explain each statement used in execl command?
And why only _exit(127) and not exit(0).
What is 127 number indicating?
execl is one of several functions (the exec* family) that replace the program your process is running with the one in the executable file named by the first argument. The whole address space of the process is replaced by a fresh one. The remaining arguments become the arguments of the new program (argv[0], argv[1], ...), terminated by a null pointer; here the shell /bin/sh is started as sh -c cmdstring, so the shell runs the command string. A process is a system structure that controls the execution of some code; exec lets you swap that code.
If exec succeeds, the old code is gone and you can never get back into it. It is not an ordinary function call: it returns only on failure.
If exec fails, execution continues, and in your case _exit is called. _exit is a system call that terminates the current process immediately. exit is the C library function that does the same at the C level: roughly, it flushes and closes the C streams and then calls _exit. In a forked child, _exit is preferred so that stdio buffers inherited from the parent are not flushed a second time.
The value provided (127) tells the process that launched this one how it terminated (processes are launched with fork, a call that clones a process). The value can be any eight-bit value; 0 denotes a process that terminated correctly, and any non-zero value denotes some abnormal condition. 127 is the value conventionally reported when the command could not be executed.
You can type
man execl
in the terminal for more information on the execl command.
This is regarding an application that runs in a POSIX (Linux) environment. Most signals (e.g. Ctrl+C, which is signal 2, SIGINT) and a few others are handled. When that happens, exit() is called from the handler with a suitable exit code.
However, some signals, like signal 9 and signal 15, can't be handled.
Unfortunately, the parent process (an external script) which launches the given application needs to know, and clean up some stuff, if signal 9 or 15 was the reason for termination.
Is there a predefined exit code that can be received by parent process to know the above?
The script that launches the app is a bash_script. The application itself is in C.
The return status from wait() or waitpid() encodes the information you need.
The POSIX macros are:
WIFEXITED(status) returns true if the child exited via exit() or one of its relatives.
WEXITSTATUS(status) tells you what that exit status was (0..255).
WIFSIGNALED(status) returns true if the child exited because of a signal (any signal).
WTERMSIG(status) returns the signal number that killed the child.
The non-standard but common macro WCOREDUMP(status) tells you if the process dumped core. You can also tell whether status reflects that the process was stopped or continued (and what the stop signal was).
Note that signal 15 is usually SIGTERM and SIGTERM can be trapped by an application. The signals that cannot be trapped are SIGKILL (9) and SIGSTOP (17 on Mac OS X; may not be the same everywhere).
The question then is if bash provides this info for a script.
The answer is yes, but only indirectly and not 100% unambiguously. The status value reported by bash will be 128 + <signum> for processes that terminate due to signal <signum>, but you can't distinguish between a process that exits with status 130, say, and a process that was interrupted by SIGINT, aka signal 2.
15 (SIGTERM) could be caught and handled by the application, if it so chose to do so, but perhaps it does not at the moment
9 (SIGKILL) obviously cannot be caught by any application.
However, typically the operating system sets the exit status in such a way that the signal which terminated the process can be identified. Normally only the lower 8 bits of the status parameter to the exit(3) function [and thus the _exit(2) system call] are copied into the status value returned by wait(2) to the parent process (the shell running the external script in your example). That leaves sizeof(int)-1 bytes of space in the status value for the OS to fill in other information about the terminated process. Typically the wait(2) manual page describes how to interpret the wait status and thus split apart any additional information about the process termination from the status the process passed to _exit(2), if and only if the process exited.
Unfortunately whether or not this extra information is made available to a script depends on how the shell executing the script might handle it.
First check your shell's manual page for details on how to interpret $?.
If the shell makes the whole status int value available verbatim to the script (in the $? variable), then it will be possible to parse apart the value and determine how and why the program exited. Most shells don't seem to do this completely (for various reasons, not the least of which might be standards compliance), but they do at least go far enough to make it possible to solve your query (and must, to be POSIX compatible).
Here for example I'm running the AT&T version of KSH on Mac OS X. My ksh(1) manual page says that the exit status is 0-255 if the program just run terminated normally (where the value is presumably what was passed to _exit(2)) and 256+signum if the process was terminated by a signal (numbered "signum"). I don't know about Linux, but on OS X bash gives a different exit status than ksh does (bash uses the eighth bit to represent a signal and thus allows only 0-127 as unambiguous exit values). (There is a discrepancy in the POSIX standard between wait(2)'s claim that the 8 low-order bits of the _exit(2) value are available and the shell's conversion of the wait status to $?, which preserves only 7 bits. Go figure! Ksh's behaviour is in violation of POSIX, but it is safer, since a strictly compatible shell may not be able to distinguish between a process passing a value of 128-255 to _exit(2) and one terminated by a signal.)
So, anyway, I start a cat process, then I send it a SIGQUIT from the terminal (by pressing ^\, i.e. Ctrl+\) (I use SIGQUIT because there's no easy way to send SIGTERM from the terminal keyboard):
22:01 [2389] $ cat
^\Quit(coredump)
ksh: exit code: 259
(I have a shell EXIT trap defined to print $? if it is not zero, so you see it above too)
22:01 [2390] $ echo $?
259
(259 is an integer value representing the status returned by wait(2) to the shell)
22:02 [2391] $ bc
obase=16
259
103
^D22:03 [2392] $
(see that 259 has the hex value 0x0103, note that 0x0100 is 256 decimal)
22:03 [2392] $ signo SIGQUIT
#define SIGQUIT 3 /* quit */
(I have a shell alias called signo that searches headers to find the number representing a symbolic signal name. See here that 0x03 from the status value is the same number as SIGQUIT.)
Further exploration of the wait(2) system call, and the related macros from <sys/wait.h> will allow us to understand a bit more of what's going on.
In C the basic logic for decoding a wait status makes use of the macros from <sys/wait.h>:
if (!WIFEXITED(status)) {
    if (WIFSIGNALED(status)) {
        termsig = WTERMSIG(status);
    } else if (WIFSTOPPED(status)) {
        stopsig = WSTOPSIG(status);
    }
} else {
    exit_value = WEXITSTATUS(status);
}
I hope that helps!
It is not possible for a parent process to detect the SIGKILL or Signal 9 - given the SIGNAL occurs outside of the user space.
A suggestion would be to have your parent process detect whether your child process has gone away and deal with it accordingly. A great example is seen in mysqld-safe etc.
I understand that exit(1) indicates an error, for example:
if (something went wrong)
exit(EXIT_FAILURE);
But what's the purpose of using exit(EXIT_SUCCESS)?
When dealing with processes, maybe? E.g. with fork()?
thanks
This gives the part of the system that invokes the program (usually the command shell) a way to check if the program terminated normally or not.
Edit - start -
By the way, it is possible to query the exit code of an interactive command as well through the use of the $? shell variable. For instance this failed ls command yields an exit code of value 2.
$ ls -3
ls: invalid option -- '3'
Try `ls --help' for more information.
$ echo $?
2
Edit - end -
Imagine a batch file (or shell script) that invokes a series of programs and depending on the outcome of each run may choose some action or the other. This action may consist of a simple message to the user, or the invocation of some other program or set of programs.
This is a way for a program to return a status of its run.
Also, note that zero denotes no problem, any non-zero value indicates a problem.
Programs will often use different non-zero values to pass more information back (other than just non-normal termination). So the non-zero exit value then serves as a more specific error code that can identify a particular problem. This of course depends on the meanings of the code being available (usually/hopefully in the documentation)
For instance, the ls man page has this bit of information at the bottom:
Exit status is 0 if OK, 1 if minor problems, 2 if serious trouble.
For Unix/Linux man pages, look for the section titled EXIT STATUS to get this information.
You can exit your program with return only from the main function. To exit the program from anywhere else, you can call exit(EXIT_SUCCESS), for example when the user clicks an exit button.
exit is a C library function (it finishes by invoking the _exit system call). There's always good information on such functions if you check the man pages:
http://linux.die.net/man/3/exit
On a Linux box, you can simply type man exit into a terminal and this information will come up.
There are two ways of 'normally' exiting a program: returning from main(), or calling exit(). Normally exit() is used, and thought of, for signalling a failure. However, if you are not in main(), you must still exit somehow. exit(0) is usually used to terminate the process when not in main().
main() is actually not a special function to the operating system, only to the runtime environment. The 'function' that actually gets loaded is normally defined as _start() (this is handled by the linker, and beyond the scope of this answer), written in assembly, which simply prepares the environment and calls main(). Upon return from main(), it also calls exit() with the return value from main().
Given the pid of a Linux process, I want to check, from a C program, if the process is still running.
Issue a kill(2) system call with 0 as the signal. If the call succeeds, it means that a process with this pid exists.
If the call fails and errno is set to ESRCH, a process with such a pid does not exist.
Quoting the POSIX standard:
If sig is 0 (the null signal), error checking is performed but no
signal is actually sent. The null signal can be used to check the
validity of pid.
Note that you are not safe from race conditions: it is possible that the target process has exited and another process with the same pid has been started in the meantime. Or the process may exit very quickly after you check it, and you could make a decision based on outdated information.
Only if the given pid is of a child process (fork'ed from the current one), you can use waitpid(2) with the WNOHANG option, or try to catch SIGCHLD signals. These are safe from race conditions, but are only relevant to child processes.
Use procfs.
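#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
[...]
char path[32];
struct stat sts;
// pid is the process ID being checked
snprintf(path, sizeof path, "/proc/%ld", (long)pid);
if (stat(path, &sts) == -1 && errno == ENOENT) {
    // process doesn't exist
}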
Easily portable to
Solaris
IRIX
Tru64 UNIX
BSD
Linux
IBM AIX
QNX
Plan 9 from Bell Labs
kill(pid, 0) is the typical approach, as @blagovest-buyukliev said. But if the process you are checking might be owned by a different user, and you don't want to take the extra steps to check whether errno == ESRCH, it turns out that
(getpgid(pid) >= 0)
is an effective one-step method for determining if any process has the given PID (since you are allowed to inspect the process group ID even for processes that don't belong to you).
You can issue a kill(2) system call with 0 as the signal.
There's nothing unsafe about kill -0. The program
must be aware that the result can become obsolete at any time
(including that the pid can get reused before kill is called),
that's all. And using procfs instead does use the pid too,
and doing so in a more cumbersome and nonstandard way.
As an addendum to the /proc filesystem method, you can check the /proc/<pid>/cmdline (assuming it was started from the command line) to see if it is the process you want.
ps -p $PID > /dev/null 2>&1; echo $?
This command returns 0 if a process with $PID is still running. Otherwise it returns 1.
One can use this command in OSX terminal too.
I have a small C program calling a shell script myScript.sh. I am getting the value of ret as 256. Please help me in knowing what went wrong with the system call?
int main()
{
    int ret;
    ret = system("myScript.sh");
    ret >>= ret;
    if (ret != 0)
    {
        printf("ret is [%d]",ret);
    }
}
Working on 64 bit UNIX operating system and using ksh shell
On my system, man system says:
The system() function returns the exit status of the shell as returned by
waitpid(2), or -1 if an error occurred when invoking fork(2) or
waitpid(2). A return value of 127 means the execution of the shell
failed.
The waitpid man page describes a set of macros such as WEXITSTATUS() that extract the actual exit code from the return value.
I'm not quite sure what you're intending to do with ret >>= ret, but that can't be right.
The way the system function usually works on *nix is that it calls fork; the child then calls one of the exec functions to run /bin/sh -c followed by the string you passed to system, which turns the child process into an instance of /bin/sh running your command. The parent calls one of the wait functions and waits for /bin/sh to exit, which it does with the same exit status as the shell script; system then returns that wait status.
If you look at the man pages for the wait system call(s):
man 3 wait
You should get some information about what gets returned and some macros that help you make sense of it.
The WIFEXITED(stat_val) macro can be used to test if the program exited normally as opposed to with a signal. Normal exits involve calling exit (or returning from main). If this macro returns a non-zero value then you can use the WEXITSTATUS(stat_val) macro to get the value that the program actually returned.
The WIFSIGNALED(stat_val) macro can be used to test if the program was terminated with a signal, and if so the WTERMSIG(stat_val) macro will return the signal number that caused the termination.
There are some other macros that can tell you if the process were stopped or continued, rather than terminated, but I don't think that they are overly helpful to you for this purpose, but you may want to look into them.
As for what is actually happening in this case, it can be difficult to tell. If the fork call fails then system will return -1 and set errno to reflect the error. If the fork did not fail then the error may have happened in the child and be more difficult to locate. It is possible that on your platform system does some tests before forking to ensure that you have permission to execute the appropriate files and sets errno to reflect that, but maybe not.
You should look into the perror function to print out error messages in the case that errno is set.
If the failure happens after fork and within the child then you either need to get the shell to tell you more about what is happening, or get the shell script to. This may be by including echo statements in the script similarly to using print statements in your C programs.
You should also look into the access function to test if you have permission to read and/or execute files.
If you are using Linux then you should be able to do:
strace -o my_program.strace -f ./my_program
or
ltrace -o my_program.ltrace -f -S ./my_program
and then examine the trace files (after the -o) to look at what the programs and kernel say to each other. ltrace looks at how the program talks to library function, while strace looks at system calls, but the -S tells ltrace to also look at system calls. The -f argument tells them both to trace children of the program as they are created.
I just noticed that you said that you were using ksh
As I mentioned, system on a POSIX system should use /bin/sh or a compatible shell. This doesn't mean that /bin/sh won't run /bin/ksh to run your script (or that the kernel won't use the #! line at the beginning of the script file to do this), but it could be a problem. There are ways to run shell scripts so that this line is not used to decide which shell is used. The most notable is:
. myshell.sh
The period and space essentially dump the text of the file into the current shell session rather than running it in another process (this is useful for setting up an environment). If you were to do:
int x = system(". myshell.sh");
Then that could be a problem.
The exit status of the command is encoded as two bytes:
The high-order byte contains the exit status.
The low-order byte contains the signal that killed it (if any).
Since 0x0100 is 256 decimal, your shell script exited with status 1. Review your shell script and ensure it exits with status 0 when it is successful.
From the Standard (emphasis is mine):
6.5.7/3
If the value of the right operand [of the >> operator] is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
So, when you do
ret >>= ret;
and ret < 0 or ret >= CHAR_BIT * sizeof (int) ... anything goes
The return value of the system function can be -1 on error.
If your call returned such a negative value, the next operation ret >>= ret; (same as ret = -1 >> -1;) results in something that has no meaning: you cannot right shift by a negative number of bits.
When you try to do things with no meaning, C is allowed to do anything ... anything at all (that includes doing nothing, doing what you expect, reformatting your hard disk, transferring your bank account to mine, making demons fly out your nose, ..., ..., ...)
Make sure your script is executable and in the path, or use the full path instead.
Nothing went wrong. Did you read the documentation? See:
RETURN VALUE
The value returned is -1 on error (e.g. fork(2) failed), and the
return status of the command otherwise. This latter return status is in the format specified in wait(2).
Thus, the exit code of the command will be WEXITSTATUS(status). In case /bin/sh could not be executed,
the exit status will be that of a command that does exit(127).
Since 256 is not -1, the call did not fail.
Why do you shift the result? Just remove the line ret >>= ret, and it will work.
I am working on Linux, and it helped to call the script with system("sh script.sh").