Syscall implementation of exit()

I wrote a simple C program which just calls the exit() function; however, strace says that the binary is actually calling exit_group. Is exit() an exit_group() wrapper? Are these two functions equivalent? If so, why would the compiler choose exit_group() over exit()?

The Linux and glibc man pages document all of this (see especially the "C library/kernel differences" section in NOTES).
_exit(2): In glibc 2.3 and later, this wrapper function actually uses the Linux SYS_exit_group system call to exit all threads. Before glibc 2.3, it was a wrapper for SYS_exit to exit just the current thread.
exit_group(2): glibc wrapper for SYS_exit_group, which exits all threads.
exit(3): The ISO C89 function which flushes buffers and then exits the whole process. (It always uses exit_group() because there's no benefit to checking whether the process was single-threaded and then deciding between SYS_exit and SYS_exit_group.) As @Matteo points out, recent ISO C / POSIX standards are thread-aware, and one or both probably require this behaviour.
But apparently exit(3) itself is not thread-safe (in the C library cleanup parts), so avoid calling it from multiple threads at once.
syscall / int 0x80 with SYS_exit: terminates just the current thread, leaving others running. AFAIK, modern glibc has no thin wrapper function for this Linux system call, but I think pthread_exit() uses it if this isn't the last thread. (Otherwise exit(3) -> exit_group(2).)
Only exit(), not _exit() or exit_group(), flushes stdout. This leads to "printf doesn't print anything" problems in newbie asm programs when writing to a pipe (which makes stdout full-buffered instead of line-buffered), or when you forgot the \n in the format string. For example: "How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?" If you use any buffered I/O functions, atexit, or anything like that, it's usually a good idea to call the libc exit(3) function instead of making the system call directly. But of course you can call fflush before SYS_exit_group.
(Also related: On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program? - ret from main is equivalent to calling exit(3))
Of course, it's not the compiler that chose anything; it's libc. When you include headers and write read(fd, buf, 123) or exit(1), the C compiler just sees an ordinary function call.
Some C libraries (e.g. musl, but not glibc) may use inline asm to inline a syscall instruction into your binary, but still the headers are part of the C library, not the compiler.

Related

How to distinguish calling a c library function from making a system call?

There is the C library function pipe(3) and the kernel (system call) pipe(2).
Both have the same signature and should be used like this (same include header):
#include <unistd.h>
int fds[2];
pipe(fds);
Will this code call pipe(3) or pipe(2)?
How can I decide whether I want to use libc or a system call?
If pipe(3) and pipe(2) are the same, how do I know that?
Will this code call pipe(3) or pipe(2)?
It will call pipe(3).
There is no way to call the system call directly from C; you either have to:
- call the libc wrapper for that system call (if one is provided), or
- use syscall(2) to "stuff" the right arguments into the right registers before executing the architecture-appropriate system call instruction, or
- provide your own assembly wrapper which does the same, or
- use inline __asm__ to do the same.
I think you're making a distinction where there isn't one. Your code will call the pipe library function, which is just a wrapper around the pipe system call. It's not an either/or. The section 3 manual page is from the POSIX programmer's manual, and the section 2 manual page is Linux-specific.

What does the function _exit() in glibc look like?

I am studying C and operating systems, during which I came across the exit() system call in *nix systems.
I know there is a wrapper function in C for each system call in *nix systems. In this case, exit() is the function provided by the standard C library; let us consider glibc for the discussion.
I downloaded the glibc source and checked the code inside exit(); it eventually calls _exit().
Where can I find the function body of _exit()?
In glibc, would the function _exit() make the exit system call on UNIX and call ExitProcess() on Windows, depending on the operating system?

Where does the ret instruction of main go?

I learned how x86 assembly works in general from the book "Programming from the Ground Up".
In this book, every program ends with an interruption call to exit.
However, in C compiled programs, I found out that programs end with a ret. This supposes that there is an address to be popped and that would lead to the end of the program.
So my question is :
What is this address? (And what is the code there?)
When you start your program, the OS passes control to its entry point, the start or _start function. In a C program the start function comes from the C library and (as others have already said) does some platform-specific environment initialization. Then the start function calls your main, and control is yours. When you return from main, control passes back to the C library, which terminates the program properly and makes the platform-specific system call that returns control to the OS.
So the address main pops points into the C library's startup code (in glibc, __libc_start_main, which _start calls). It is not in stdlib.h, which only declares exit(); but if you look at that startup code, you will see it calling exit, which does the cleanup.
Its function is to destroy the static objects (C++ of course) at program termination or thread termination (C++11). In the C case it just calls the atexit functions, flushes and closes the streams, and makes the system call.
I hope this is the answer you seek.
It is implementation specific.
On Linux, main is called by crt0: its _start entry point examines the initial call stack that the kernel set up while handling the execve(2) system call for your executable. On return from main, the epilogue part of crt0 deals with atexit(3)-registered functions and flushing stdio.
FWIW, these crt* startup files come partly from your GCC compiler (e.g. crtbegin.o) and partly from your C standard library (e.g. crt1.o in glibc). All of this (along with the Linux kernel) is free software on a Linux distribution.
every program ends with an interruption call to exit.
Not really. It is a system call (see syscalls(2) for the list), not an interrupt.

System calls - functions used by kernel

I was asked about system calls: what they are, which mode they are used in, and whether read(), getchar() and sqrt() use system calls.
For the first part I answered that system calls provide an interface between a process and the OS, and that they are executed in kernel mode.
What bothers me is that, as far as I can tell, read() is the only one of those three that uses system calls.
Am I right, or do getchar() and sqrt() also use system calls?
(NOTE: read() from unistd.h getchar() from stdio.h and sqrt() from math.h)
The difference between a system call and a regular call is that a system call has to trap into the operating system, whereas a regular call just calls another user-level subroutine. You're right in saying that the difference is in what mode the calls are executed in.
sqrt() is not a system call; all it does is perform a calculation in user space. read() is a thin libc wrapper around the read system call. getchar() is a stdio library function: it returns characters from a user-space buffer and only makes a read system call when that buffer has to be refilled, so it uses system calls indirectly.

Hook arbitrary function on to _exit(2)

I'm interested in whether I can call an arbitrary function when _exit(2) is called. Since _exit bypasses the other hooking mechanisms, it doesn't seem easy to me.
For an ordinary exit(3) call or a return from main, it's obviously possible with atexit(3), on_exit(3), or the GCC extension __attribute__((destructor)).
It is possible by overriding _exit(2) with LD_PRELOAD, which I wish to avoid.
Is there a way to do it without LD_PRELOAD, say, overriding _exit(2)?
Edit: The problem I'm facing is fork(2)ed Perl programs with copy-on-write (CoW). The child processes run destructors when they call exit(3); these touch many memory locations and cause a lot of memory to be copied, even though the processes are about to exit anyway.
It's hard to bypass destructors with an ordinary exit call in Perl, so one idea is to call POSIX::_exit instead.
However, there is a library loaded with LD_PRELOAD, and I want to call a function in it on process exit.
AFAIU, it is simply not possible without LD_PRELOAD tricks, or ptrace(2) with PTRACE_SYSCALL from another process (e.g. the parent process running gdb). At the lowest level, _exit(2) is a system call, so it is an "atomic" operation performed by a single machine instruction (e.g. SYSCALL on x86-64, or SYSENTER through the vdso(7) on 32-bit x86).
Notice that a C program could use some asm to invoke the _exit syscall (or use the indirect syscall(2)).
Assuming an executable dynamically linked against GNU libc or musl libc, your only option is to catch the exit(3) library function (not the _exit(2) syscall!) using atexit(3).
You could redefine _exit and hope that the dynamic linker would call your _exit, not the one in libc. I wouldn't play such tricks.
Alternatively, write a small wrapper C program which uses fork, execve and waitpid to run the original program.
Could you make use of the atexit() function?
Near the beginning of main(), call atexit() with a pointer to the function that you want executed when the program exits.
You can call atexit() numerous times, thereby stacking several things to be executed when the application exits.

Resources