I learned how x86 assembly works in general from the book "Programming from the Ground Up".
In this book, every program ends with an interruption call to exit.
However, I found that compiled C programs end with a ret. This implies that there is a return address to be popped, and that jumping to it leads to the end of the program.
So my question is:
What is this address? (And what is the code there?)
When you start your program, the OS passes control to its entry point, the start or _start function, by jumping to that label in your code. In a C program the start function comes from the C library and, as others have already said, does some platform-specific environment initialization. The start function then calls your main, and control is yours. After you return from main, control passes back to the C library, which terminates the program properly and makes the platform-specific system call that returns control to the OS.
So the address that main's ret pops is a return address inside the C library's startup code (in glibc, inside __libc_start_main or a helper it calls). That code takes main's return value and calls exit, which is declared in stdlib.h (cstdlib in C++) and does the cleanup.
In C++, exit destroys the objects with static storage duration (and, since C++11, the calling thread's thread_local objects). In C, it calls the functions registered with atexit in reverse order of registration, flushes and closes all open streams, and then makes the system call that ends the process.
I hope this is the answer you seek.
It is implementation specific.
On Linux, main is called from crt0 (with glibc, crt1.o together with __libc_start_main). Its _start entry point parses the initial stack that the kernel set up while handling the execve(2) system call for your program: argc, argv, the environment and the auxiliary vector. When main returns, its return value is passed to exit(3), which runs the atexit(3)-registered functions and flushes stdio.
FWIW, crt0 is provided by your C standard library (glibc ships crt1.o), with some companion pieces such as crtbegin.o and crtend.o coming from GCC. All of this, like the Linux kernel, is free software in a Linux distribution.
every program ends with an interruption call to exit.
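Not really. It is a system call (see syscalls(2) for the list), not an interrupt. int $0x80 is just one mechanism for entering the kernel to make that call, and x86-64 code normally uses the syscall instruction instead.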
I've come across at_quick_exit and quick_exit while going over stdlib.h and looking for functions that I haven't implemented.
I don't understand the point of having these two functions. Do they have any practical usage?
Basically, it exists in C because of C++. The relevant document is a paper from WG14, the C standards committee.
The document was adapted from the paper accepted into the C++ standard. The idea behind quick_exit is to exit the program without canceling all threads and without executing destructors of static objects. C has no language support for things like "destructors" at all, and the C thread support library was, at the time of writing, almost nowhere implemented, so the at_quick_exit and quick_exit functions have little to no meaning in C.
In C there is a function _Exit that causes normal program termination and returns control to the host environment. Unlike exit(), it does not call atexit handlers, and whether it flushes streams with unwritten buffered data, closes open streams, or removes temporary files is implementation-defined. Basically, the at_quick_exit and quick_exit functions are facilities built to run custom user handlers and then call _Exit, while atexit is a facility to run custom handlers when exit() is called.
They essentially have no practical usage. The intent seems to be that a function that may have significant nontrivial atexit handlers could use quick_exit to exit with just a minimal subset of such handlers (that it defines by calling at_quick_exit) being called, under conditions where calling all the atexit handlers may not be safe. It may also be called from a signal handler, but it doesn't seem like there'd be anything meaningful you could do from the at_quick_exit handlers in that case.
I wrote a simple C program that just calls exit(), but strace shows the binary actually calling exit_group. Is exit() an exit_group() wrapper? Are these two functions equivalent? If so, why would the compiler choose exit_group() over exit()?
The Linux and glibc man pages document all of this (See especially the "C library/kernel differences" in the NOTES section).
_exit(2): In glibc 2.3 and later, this wrapper function actually uses the Linux SYS_exit_group system call to exit all threads. Before glibc 2.3, it was a wrapper for SYS_exit to exit just the current thread.
exit_group(2): glibc wrapper for SYS_exit_group, which exits all threads.
exit(3): The ISO C89 function which flushes buffers and then exits the whole process. (It always uses exit_group() because there's no benefit to checking if the process was single-threaded and deciding to use SYS_exit vs. SYS_exit_group.) As @Matteo points out, recent ISO C / POSIX standards are thread-aware and one or both probably require this behaviour.
But apparently exit(3) itself is not thread-safe (in the C library cleanup parts), so I guess don't call it from multiple threads at once.
syscall / int 0x80 with SYS_exit: terminates just the current thread, leaving others running. AFAIK, modern glibc has no thin wrapper function for this Linux system call, but I think pthread_exit() uses it if this isn't the last thread. (Otherwise exit(3) -> exit_group(2).)
Only exit(), not _exit() or exit_group(), flushes stdout. This leads to "printf doesn't print anything" problems in newbie asm programs when writing to a pipe (which makes stdout fully buffered instead of line-buffered), or when you forget the \n in the format string. For example, see "How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?". If you use any buffered I/O functions, atexit, or anything like that, it's usually a good idea to call the libc exit(3) function instead of making the system call directly. But of course you can call fflush before SYS_exit_group.
(Also related: On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program? - ret from main is equivalent to calling exit(3))
Of course, it's not the compiler that chose anything; it's libc. When you include headers and write read(fd, buf, 123) or exit(1), the C compiler just sees an ordinary function call.
Some C libraries (e.g. musl, but not glibc) may use inline asm to inline a syscall instruction into your binary, but still the headers are part of the C library, not the compiler.
I was studying shared libraries, and I read that an implicit dlclose() is performed upon process termination.
I want to know who is responsible for this call. For example, if I wrote:
#include <stdio.h>
int main(void) {
    printf("Hello World\n");
    return 0;
}
and then run ldd ./a.out, I get this list of libraries:
linux-vdso.so.1 => (0x00007ffd6675c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2569866000)
/lib64/ld-linux-x86-64.so.2 (0x0000562b69162000)
The dynamic linker is responsible for loading these, right? So who is responsible for the implicit dlclose() of these libraries when this ./a.out executable terminates?
I do not have Kerrisk's book, but if you have accurately characterized its contents then they seem to be a bit simplified. It is not strictly correct to say that whenever a process terminates, the function dlclose() is called for each of its open shared libraries, but it is reasonable to say that whenever a process terminates, all its handles on open shared libraries are closed. As a result, the operating system recognizes that one fewer process references each of those shared libraries, and if that brings any shared libraries' reference counts to zero then the OS may choose to unload them from memory.
dlclose() does more work than that. In particular, it causes any destructor functions in the library to run. Those functions will also run when the process exits normally by returning from main() or by calling exit(), but not if the process terminates by other means, such as calling _exit() or in response to receiving a signal. In the normal-exit case, the net effect may be the same as if dlclose() were called for each open shared library, but even then, that's not necessarily achieved by actually calling dlclose().
Finally, do be aware that although the dl*() functions are defined by POSIX, substantially all details of dynamic / shared libraries are left at the discretion of implementations. Since you've asked about a Linux book, I've referenced a few Linux-specific details.
I suspect the book is only talking about normal process termination, when calling exit() or returning from main(). In that case the termination functions of the loaded dynamic libraries are run by an exit-time handler; in glibc this is the dynamic linker's _dl_fini, which is registered with the atexit machinery at program startup.
It's not feasible for libraries to execute any code when a process is terminated abnormally. If the process is terminated by the OS instead of by exiting normally, the OS just releases any file handles, but it won't execute code in the context of the process.
I'd like to know whether I can call an arbitrary function when _exit(2) is called. _exit(2) bypasses the other hooking mechanisms, so this doesn't seem easy to me.
If this is an ordinary exit(3) call or return statement, it's obviously possible with atexit(3), on_exit(3), or the GCC extension __attribute__((destructor)).
It is possible by overriding _exit(2) with LD_PRELOAD, which I wish to avoid.
Is there a way to do it without LD_PRELOAD, say, overriding _exit(2)?
Edit: The problem I'm facing is fork(2)ed Perl programs with copy-on-write (CoW). The program's child processes run destructors when they call exit(3); these touch many memory locations and cause a lot of memory to be copied, even though the processes are about to exit.
It's hard to bypass destructors with an ordinary exit call in Perl, so one idea is to call POSIX::_exit instead.
However, there is a library loaded with LD_PRELOAD, and I want to call a function in it when the process exits.
AFAIU, it is simply not possible without LD_PRELOAD tricks, or ptrace(2) with PTRACE_SYSCALL from another process (e.g. the parent process running gdb). At the lowest level, _exit(2) is a system call, so it is an "atomic" operation performed by a machine instruction such as SYSCALL on x86-64 or SYSENTER on 32-bit x86 (e.g. through the vDSO, see vdso(7)).
Notice that a C program could use some asm to invoke the _exit syscall directly (or use the indirect syscall(2) function).
Assuming an executable dynamically linked to GNU libc or musl-libc, your only way is to catch the exit(3) library function (not the _exit(2) syscall!) using atexit(3).
You could redefine _exit and hope that the dynamic linker would call your _exit, not the one in libc. I wouldn't play such tricks.
Alternatively, write a small wrapper C program that forks, runs the original program with execve, and waits for it with waitpid.
Could you make use of the atexit() function?
Near the beginning of main() call atexit() with a parameter of the function that you want executed when the program exits.
You can call atexit() multiple times, thereby stacking several functions to be executed when the application exits.
How do system calls work?
What operations happen during a system call?
There are various system calls, like open, read, write, socket, etc. I would like to know how they work in general.
In short, here's how a system call works:
First, the user application program sets up the arguments for the system call.
After the arguments are all set up, the program executes the "system call" instruction.
This instruction causes an exception: an event that causes the processor to jump to a new address and start executing the code there.
The instructions at the new address save your user program's state, figure out which system call you want, call the kernel function that implements that system call, restore your user program's state, and return control to the user program.
Figure: a user application invoking the open() system call.
Note that the system call interface, which serves as the link to the system calls made available by the operating system, invokes the intended system call in the OS kernel and returns the status of the system call and any return values. The caller need not know anything about how the system call is implemented or what it does during execution.
Another example (figure): a C program calling the printf() library function, which in turn calls the write() system call.
For a more detailed explanation, read Section 1.5.1 in Chapter 1 and Section 2.3 in Chapter 2 of Operating System Concepts.