What are the reasons that an exec (execl,execlp, etc.) can fail? If you make a call to exec and it returns, are there any best practices other than just panicking and calling exit?
The problem with handling exec failure is that exec is usually performed in a child process, while you want to do the error handling in the parent process. But you can't just exit(errno), because (1) you don't know whether error codes fit in an exit code, and (2) you can't distinguish exec failure from failure exit codes of the new program you exec.
The best solution I know of is using a pipe to communicate the success or failure of exec (a minimal sketch follows the list):
Before forking, open a pipe in the parent process.
After forking, the parent closes the writing end of the pipe and reads from the reading end.
The child closes the reading end and sets the close-on-exec flag for the writing end.
The child calls exec.
If exec fails, the child writes the error code back to the parent using the pipe, then exits.
The parent reads EOF (a zero-length read) if the child successfully performed exec, since close-on-exec made the successful exec close the writing end of the pipe. Or, if exec failed, the parent reads the error code and can proceed accordingly. Either way, the parent's read blocks until exec has either succeeded or failed.
The parent closes the reading end of the pipe.
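A minimal sketch of this technique in C, assuming a placeholder command name ("some-command") and omitting some error handling for brevity:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int pipefd[2];

    if (pipe(pipefd) == -1) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {                                /* child */
        close(pipefd[0]);                          /* close the reading end */
        fcntl(pipefd[1], F_SETFD, FD_CLOEXEC);     /* writing end closes on successful exec */
        execlp("some-command", "some-command", (char *) NULL);
        int err = errno;                           /* only reached if exec failed */
        write(pipefd[1], &err, sizeof err);        /* report the error code to the parent */
        _exit(127);
    }

    close(pipefd[1]);                              /* parent: close the writing end */
    int err;
    ssize_t n = read(pipefd[0], &err, sizeof err); /* blocks until exec succeeds or fails */
    close(pipefd[0]);

    if (n == 0)                                    /* EOF: close-on-exec fired, exec succeeded */
        printf("exec succeeded in the child\n");
    else
        printf("exec failed in the child: %s\n", strerror(err));
    return 0;
}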
From the exec(3) man page:
The execl(), execle(), execlp(), execvp(), and execvP() functions may fail and set errno for any of the errors specified for the library functions execve(2) and malloc(3).
The execv() function may fail and set errno for any of the errors specified for the library function execve(2).
And then from the execve(2) man page:
ERRORS
Execve() will fail and return to the calling process if:
[E2BIG] - The number of bytes in the new process's argument list is larger than the system-imposed limit. This limit is specified by the sysctl(3) MIB variable KERN_ARGMAX.
[EACCES] - Search permission is denied for a component of the path prefix.
[EACCES] - The new process file is not an ordinary file.
[EACCES] - The new process file mode denies execute permission.
[EACCES] - The new process file is on a filesystem mounted with execution disabled (MNT_NOEXEC in <sys/mount.h>).
[EFAULT] - The new process file is not as long as indicated by the size values in its header.
[EFAULT] - Path, argv, or envp point to an illegal address.
[EIO] - An I/O error occurred while reading from the file system.
[ELOOP] - Too many symbolic links were encountered in translating the pathname. This is taken to be indicative of a looping symbolic link.
[ENAMETOOLONG] - A component of a pathname exceeded {NAME_MAX} characters, or an entire path name exceeded {PATH_MAX} characters.
[ENOENT] - The new process file does not exist.
[ENOEXEC] - The new process file has the appropriate access permission, but has an unrecognized format (e.g., an invalid magic number in its header).
[ENOMEM] - The new process requires more virtual memory than is allowed by the imposed maximum (getrlimit(2)).
[ENOTDIR] - A component of the path prefix is not a directory.
[ETXTBSY] - The new process file is a pure procedure (shared text) file that is currently open for writing or reading by some process.
malloc() is a lot less complicated, and uses only ENOMEM. From the malloc(3) man page:
If successful, calloc(), malloc(), realloc(), reallocf(), and valloc() functions return a pointer to allocated memory. If there is an error, they return a NULL pointer and set errno to ENOMEM.
What you do after the exec() call returns depends on the context - what the program is supposed to do, what the error is, and what you might be able to do to work around the problem.
One source of trouble could be that you specified a simple program name instead of a pathname; maybe you could retry with execvp(), or convert the command into an invocation of sh -c 'what you originally specified'. Whether any of these is reasonable depends on the application. If there are major security issues involved, probably you don't try again.
If you specified a pathname and there is a problem with that (ENOTDIR, ENOENT, EPERM), then you may not have any sensible fallback, but you can report the error meaningfully.
In the old days (10+ years ago), some systems did not support the '#!' shebang notation, and if you were not sure whether you were executing an executable or a shell script, you tried it as an executable and then retried it as a shell script. That might or might not work if you were running a Perl script, but in those days, you wrote your Perl scripts to detect that they were being run by a shell and to re-exec themselves with Perl. Fortunately, those days are mostly over.
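A sketch of that old fallback, under the hypothetical name run_with_fallback (not a standard API): try the file as a native executable and, on ENOEXEC, retry it as a shell script.

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static void run_with_fallback(const char *path, char *const argv[])
{
    execv(path, argv);                 /* try it as a native executable first */
    if (errno == ENOEXEC)              /* unrecognized format: retry as a shell script */
        execl("/bin/sh", "sh", path, (char *) NULL);
    perror(path);                      /* both attempts failed */
    _exit(127);
}

int main(void)
{
    char *const argv[] = { "echo", "hello", NULL };
    run_with_fallback("/bin/echo", argv);          /* never returns on success */
    return 1;
}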
To the extent possible, it is important to ensure that the process reports the problem so that it can be traced: write the message to a log file, to stderr, or even to syslog(), so that those who have to work out what went wrong have more to go on than the hapless end user's report of "I tried X and it didn't work". It is crucial that if nothing works, the exit status is not 0, since 0 indicates success. Even that might be ignored - but you did what you could.
Other than just panicking, you could take a decision based on errno's value.
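For example, here is a hedged sketch that maps two common errno values from execvp() to the conventional shell-style exit codes 126 and 127; the exact mapping is illustrative, not prescriptive:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 2;
    }
    execvp(argv[1], &argv[1]);
    switch (errno) {
    case ENOENT:                       /* nothing by that name on PATH */
        fprintf(stderr, "%s: command not found\n", argv[1]);
        return 127;
    case EACCES:                       /* found, but execute permission denied */
        fprintf(stderr, "%s: permission denied\n", argv[1]);
        return 126;
    default:                           /* anything else: report errno and fail */
        perror(argv[1]);
        return 1;
    }
}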
Exec should always succeed
(except for shells, e.g. if the user entered a bogus command).
If exec does fail, it indicates:
a "fault" with the program (missing or bad component, wrong pathname, bad memory, ...), or
a serious system error (out of memory, too many processes, disk fault, ...)
For any serious error, the normal approach is to write the error message on stderr, then exit with a failure code. Almost all of the standard tools do this. For exec:
execl("bork", "bork", NULL);
perror("failed: exec");
exit(127);
The shell does that, too (more or less).
Normally if a child process fails, the parent has failed too and should exit. It does not matter whether the child failed in exec, or while running the program. If exec failed, it does not matter why exec failed. If the child process failed for any reason, the calling process is in trouble and needs to stop.
Don't waste lots of time trying to anticipate all possible error conditions. Don't write code that tries to handle each error code in the best possible way. You'll just bloat the code, and introduce many new bugs. If your program is broken, or it's being abused, it should simply fail. If you force it to continue, worse trouble will come of that.
For example, if the system is out of memory and thrashing swap, we don't want to cycle over and over trying to run a process; it would just make the situation worse. If we get a filesystem error, we don't want to continue running on that filesystem; it might make the corruption worse. If the program was installed wrongly, or has a bug, or has memory corruption, we want to stop as soon as possible, before that broken program does some real damage (such as sending a corrupted report to a client, trashing a database, ...).
One possible alternative: a failing process might call for help, pause itself (SIGSTOP), then retry the operation if told to continue. This could help when the system is out of memory, or disks are full, or perhaps even if there is a fault in the program. Few operations are so expensive and important that this would be worthwhile.
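A speculative sketch of that idea; try_operation() is a hypothetical stand-in that fails a couple of times before succeeding:

#include <signal.h>
#include <stdio.h>

static int try_operation(void)        /* hypothetical expensive operation */
{
    static int attempts = 0;
    return ++attempts < 3 ? -1 : 0;   /* fail twice, then succeed */
}

int main(void)
{
    while (try_operation() != 0) {
        fprintf(stderr, "operation failed; stopping (send SIGCONT to retry)\n");
        raise(SIGSTOP);               /* suspend until an operator resumes us */
    }
    return 0;
}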
If you're making an interactive GUI program, try to do it as a thin wrapper over reusable command-line tools (which exit if something goes wrong). Every function in your program should be accessible through the GUI, through the command line, and as a function call. Write your functions. Write a few tools to make command-line and GUI wrappers for any function. Use sub-processes too.
If you are making a truly critical system, such as a controller for a nuclear power station, or a program to predict tsunamis, then what are you doing reading my dumb advice? Critical systems should not depend entirely on computers or software. There needs to be a 'manual override', with someone to drive it. Especially, do not attempt to build a critical system on MS Windows; that is like building sandcastles underwater.
Related
The man page says that "The exec() family of functions replaces the current process image with a new process image", but I do not quite understand what "replaces the current process image with a new process image" means. For example, if exec succeeds, perror would not be reached:
execl("/bin/ls", /* Remaining items sent to ls*/ "/bin/ls", ".", (char *) NULL);
perror("exec failed");
Correct. If the exec works, the perror will not be called, simply because the call to perror no longer exists.
I find it's sometimes easier when educating newcomers to these concepts, to think of the UNIX execution model as being comprised of processes, programs and program instances.
Programs are executable files such as /bin/ls or /sbin/fdisk (note that this doesn't include things like bash or Python scripts since, in that case, the actual executable would be the bash or python interpreter, not the script).
Program instances are programs that have been loaded into memory and are basically running. While there is only one program like /bin/ls, there may be multiple instances of it running at any given time if, for example, both you and I run it concurrently.
That "loaded into memory" phrase is where processes come into the picture. Processes are just "containers" in which instances of programs can run.
So, when you fork a process, you end up with two distinct processes but they're still each running distinct instances of the same program. The fork call is often referred to as one which one process calls but two processes return from.
Likewise, exec will not have an effect on the process itself but it will discard the current program instance in that process and start a new instance of the requested program.
This discard in a successful exec call is what dictates that the code following it (perror in this case) will not be called.
It means your current process becomes the new process instead of what it was. You stop doing what you're doing and start doing, really being, something else instead, never to become again what this process once was.
Instead of starting a whole new process, however, your current PID and environment become the new process. That lets you set things up the way the new program will need them before doing the exec.
You are correct. perror will not be called unless the execl fails. The exec functions are the means for starting new programs in a POSIX-compliant OS (typically combined with a fork call). Maybe an example will help. Suppose your program, call it programX, is running. It then calls one of the exec functions like the one you have above. programX will no longer exist as a running process. Instead, ls will be running. It will have the exact same PID as programX, but otherwise be pretty much a whole new process.
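Tying these answers together, a minimal fork/exec/wait sketch: fork() returns in both processes, and on success exec replaces the child's program image, so the child's perror line is never reached (/bin/ls is just an illustrative target):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();               /* called once, returns in two processes */

    if (pid == 0) {                   /* child: becomes ls, keeping its PID */
        execl("/bin/ls", "ls", ".", (char *) NULL);
        perror("exec failed");        /* reached only if execl itself failed */
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);         /* parent: wait for the child to finish */
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}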
I am writing a C program for an embedded Linux (debian-arm) device. In some cases, e.g. if a fatal error occurs in the system/program, I want the program to reboot the system with system("reboot"); after logging the error(s) via syslog(). My program includes multiple threads, UDP sockets, several fwrite()/fopen() and malloc() calls, ...
I would like to ask a few questions about what the program should do just before rebooting the system, apart from the syslog call. I would appreciate knowing how experienced programmers handle these things.
Is it necessary to close the open (UDP) sockets and threads just before rebooting? If so, is there a function/system call that closes all open sockets and threads? If the threads need to be ended and there is no such global call, am I supposed to execute pthread_exit(NULL); in each specific thread? Do I need to use something like goto to end each thread?
How should the program close the files that fopen and fwrite use? Is there a global call to close the files in use, or do I need to find each file in use and fclose it? I have seen examples on forums where fflush(), flush(), sync(), ... are used; which one(s) would you recommend? In a generic case, would it cause any problem if all of these were used (even where unnecessary)?
It is not necessary to free the memory that malloc allocated, is it?
Do you suggest any other tasks to be performed?
The system automatically issues SIGTERM signals to all processes as one of the steps in rebooting. As long as you correctly handle SIGTERM, you need not do anything special after invoking the reboot command. The normal idiom for "correctly handling SIGTERM" is (a sketch follows the list):
Create a pipe to yourself.
The signal handler for SIGTERM writes one byte (any value will do) to that pipe.
Your main select loop includes the read end of that pipe in the set of file descriptors of interest. If that pipe ever becomes readable, it's time to exit.
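A minimal sketch of that idiom, with illustrative names (selfpipe, on_sigterm); a real program's select loop would also watch its own sockets:

#include <signal.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

static int selfpipe[2];

static void on_sigterm(int sig)
{
    (void)sig;
    char c = 1;
    write(selfpipe[1], &c, 1);        /* async-signal-safe: just one write() */
}

int main(void)
{
    pipe(selfpipe);
    signal(SIGTERM, on_sigterm);

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(selfpipe[0], &rfds);
        /* a real program would FD_SET its sockets here too */
        if (select(selfpipe[0] + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;                 /* e.g. EINTR from the signal itself */
        if (FD_ISSET(selfpipe[0], &rfds))
            exit(0);                  /* time to exit: a clean exit flushes stdio */
    }
}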
Furthermore, when a process exits, the kernel automatically closes all its open file descriptors, terminates all of its threads, and deallocates all of its memory. And if you exit cleanly, i.e. by returning from main or calling exit, all stdio FILEs that are still open are automatically flushed and closed. Therefore, you probably don't have to do very much cleanup on the way out -- the most important thing is to make sure you finish generating any output files and remove any temporary files.
You may find the concept of crash-only software useful in figuring out what does and does not need cleaning up.
The only cleanup you need to do is anything your program needs to start up in a consistent state. For example, if you collect some data internally then write it to a file, you will need to ensure this is done before exiting. Other than that, you do not need to close sockets, close files, or free all memory. The operating system is designed to release these resources on process exit.
What happens to a process if the filesystem is full? Does the kernel send us a signal to shut down, and if so, which signal is it? Obviously, a program will probably crash if it writes to the filesystem, but I'm curious how this occurs (in gory kernel/operating-system detail).
What happens to a process if the filesystem fills up?
Operations that would require additional disk space on the full partition (like creating or appending to a file) fail with an errno of ENOSPC.
No signal is sent, as a full filesystem is not a critical condition which makes a signal necessary. It's a routine, easily handled error.
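For illustration, a hedged sketch that treats ENOSPC as a routine, checkable error; append_line is a hypothetical helper:

#include <errno.h>
#include <stdio.h>
#include <string.h>

static int append_line(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;                            /* errno set by fopen */

    int saved = 0;
    if (fprintf(fp, "%s\n", line) < 0)
        saved = errno;
    if (fclose(fp) == EOF && saved == 0)      /* the final flush can hit ENOSPC too */
        saved = errno;

    if (saved == ENOSPC)
        fprintf(stderr, "%s: no space left on device\n", path);
    else if (saved != 0)
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
    return saved == 0 ? 0 : -1;
}

int main(void)
{
    return append_line("/tmp/example.log", "hello") == 0 ? 0 : 1;
}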
There is no reason a program should crash when the filesystem is full. Obviously file writes will fail, but a well-written program should be able to cope with that - in C, this would mean that fopen returns NULL or ferror returns a non-zero value, etc. I have encountered this many times, and some nasty things can happen, such as a file being overwritten with a blank version, but never a program crash. If it does happen, it is presumably because the author of the program tried to use a NULL file pointer or something similar, in which case the program would receive a SIGSEGV as usual.