What does the function _exit() in glibc look like? - c

I am studying C and operating systems, and I came across the exit() system call on *nix systems.
I know that the C library provides a wrapper procedure for each system call on *nix systems. In this case, exit() is the procedure provided by the standard C library; let us consider glibc for this discussion.
I downloaded the glibc source and checked the code of exit(); it eventually calls _exit().
Where can I find the function body of _exit()?
In glibc, does _exit() make the exit() system call on UNIX and call ExitProcess() on Windows, depending on the operating system?

Related

How to distinguish calling a c library function from making a system call?

There is the C library function pipe(3) and the kernel (system call) pipe(2).
Both have the same signature and should be used like this (same include header):
#include <unistd.h>
int fds[2];
pipe(fds);
Will this code call pipe(3) or pipe(2)?
How can I decide whether I want to use libc or a system call?
If pipe(3) and pipe(2) are the same, how do I know that?
Will this code call pipe(3) or pipe(2)?
It will call pipe(3).
There is no way to invoke a system call directly from C. You have to either:
call the libc wrapper for that system call (if one is provided), or
use syscall(2) to "stuff" the right arguments into the right registers before executing the architecture-appropriate system call instruction, or
provide your own assembly wrapper that does the same, or
use inline __asm__ to do the same.
I think you're making a distinction where there isn't one. Your code will call the pipe library function, which is just a wrapper around the pipe system call. It's not an either/or. The section 3 manual page is from the POSIX programmer's manual, and the section 2 manual page is Linux-specific.
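To make the two routes concrete, here is a minimal sketch, assuming Linux with glibc. It creates a pipe once through the ordinary pipe() wrapper and once through syscall(2). The helper name pipe_roundtrip is made up for illustration, and SYS_pipe2 with a flags argument of 0 is used for the raw route because some architectures (aarch64, for example) do not define a plain SYS_pipe.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe either through the libc wrapper or
 * through the generic syscall(2) entry point, push one byte through it,
 * and return 1 if the byte came back intact, 0 otherwise. */
int pipe_roundtrip(int use_raw_syscall)
{
    int fds[2];
    int rc;

    if (use_raw_syscall)
        rc = (int)syscall(SYS_pipe2, fds, 0); /* pipe2 with no flags acts like pipe */
    else
        rc = pipe(fds);                       /* ordinary libc wrapper */
    if (rc != 0)
        return 0;

    char out = 'x', in = 0;
    int ok = write(fds[1], &out, 1) == 1 &&
             read(fds[0], &in, 1) == 1 &&
             in == out;

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Both pipe_roundtrip(0) and pipe_roundtrip(1) should end up in the same kernel code; the wrapper is simply the portable, readable way to get there.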

Where does the ret instruction of main go?

I learned the basics of how x86 assembly works from the book "Programming from the Ground Up".
In this book, every program ends with an interruption call to exit.
However, in compiled C programs, I found that main ends with a ret. This implies that there is a return address to be popped, and that jumping to it leads to the end of the program.
So my question is:
What is this address? (And what is the code there?)
You start your program by asking the OS to pass control to the start or _start function of your program by jumping to that label in your code. In a C program the start function comes from the C library and (as others already said before) does some platform specific environment initialization. Then the start function calls your main and the control is yours. After you return from the main, it passes control back to the C library that terminates the program properly and does the platform specific system call to return control back to the OS.
So the address main pops is a return address inside the C library's startup code. That code passes main's return value to exit (declared in stdlib.h, or cstdlib in C++), which does the cleanup.
Its function is to destroy the static objects (C++ of course) at program termination or thread termination (C++11). In the C case it just closes the streams, flushes their buffers, calls atexit functions and does the system call.
I hope this is the answer you seek.
It is implementation specific.
On Linux, main is called by crt0; the _start entry point there analyzes the initial call stack that the kernel set up while handling the execve(2) system call for your executable. On return from main, the epilogue part of crt0 deals with atexit(3)-registered functions and flushes stdio.
FWIW, crt0 is provided by your GCC compiler, and perhaps by your C standard library. All of this (along with the Linux kernel) is free software on a Linux distribution.
every program ends with an interruption call to exit.
Not really. It is a system call (see syscalls(2) for their list), not an interrupt.
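The "startup code calls main, then exit" idea can be modelled in a few lines. This is a deliberately simplified sketch, not glibc's actual startup code (which does far more setup before main runs). The names sample_main and run_like_startup are hypothetical, and the model runs in a child process so that the caller can read back the exit status.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a program's main(). */
int sample_main(void)
{
    return 42;
}

/* Simplified model of the tail end of startup code: call the program's
 * entry function and hand its return value to exit(). Returns the exit
 * status observed by the parent, or -1 on error. */
int run_like_startup(int (*entry)(void))
{
    fflush(NULL);               /* keep pending stdio output out of the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        exit(entry());          /* conceptually, where main's "ret" lands */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_like_startup(sample_main) reports the value that sample_main returned, which is the same thing a shell sees in $? after a real program returns from main.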

Syscall implementation of exit()

I wrote a simple C program that just calls the exit() function, but strace shows that the binary actually calls exit_group. Is exit() a wrapper for exit_group()? Are these two functions equivalent? If so, why would the compiler choose exit_group() over exit()?
The Linux and glibc man pages document all of this (See especially the "C library/kernel differences" in the NOTES section).
_exit(2): In glibc 2.3 and later, this wrapper function actually uses the Linux SYS_exit_group system call to exit all threads. Before glibc 2.3, it was a wrapper for SYS_exit to exit just the current thread.
exit_group(2): glibc wrapper for SYS_exit_group, which exits all threads.
exit(3): The ISO C89 function which flushes buffers and then exits the whole process. (It always uses exit_group() because there's no benefit to checking if the process was single-threaded and deciding to use SYS_exit vs. SYS_exit_group). As @Matteo points out, recent ISO C / POSIX standards are thread-aware and one or both probably require this behaviour.
But apparently exit(3) itself is not thread-safe (in the C library cleanup parts), so I guess don't call it from multiple threads at once.
syscall / int 0x80 with SYS_exit: terminates just the current thread, leaving others running. AFAIK, modern glibc has no thin wrapper function for this Linux system call, but I think pthread_exit() uses it if this isn't the last thread. (Otherwise exit(3) -> exit_group(2).)
Only exit(), not _exit() or exit_group(), flushes stdout, leading to "printf doesn't print anything" problems in newbie asm programs if writing to a pipe (which makes stdout full-buffered instead of line-buffered), or if you forgot the \n in the format string. For example, "How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?". If you use any buffered I/O functions, or atexit, or anything like that, it's usually a good idea to call the libc exit(3) function instead of the system call directly. But of course you can call fflush before SYS_exit_group.
(Also related: On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program? - ret from main is equivalent to calling exit(3))
Of course, it's not the compiler that chose anything; it's libc. When you include headers and write read(fd, buf, 123) or exit(1), the C compiler just sees an ordinary function call.
Some C libraries (e.g. musl, but not glibc) may use inline asm to inline a syscall instruction into your binary, but still the headers are part of the C library, not the compiler.
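The flushing difference between exit() and _exit() is easy to observe. In this sketch (the function name bytes_seen_by_parent is hypothetical), a child process has its stdout redirected to a pipe, prints "hello" without a newline, and then leaves through either exit() or _exit(). The parent counts the bytes that actually arrive. It assumes stdout is buffered, which is the default.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of bytes the parent received from the child's
 * stdout, or -1 on error. */
int bytes_seen_by_parent(int use_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);                     /* keep pending output out of the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);  /* the child's stdout is now the pipe */
        close(fds[1]);
        printf("hello");              /* no newline: stays in the stdio buffer */
        if (use_exit)
            exit(0);                  /* flushes stdio, then terminates */
        _exit(0);                     /* terminates without flushing */
    }

    close(fds[1]);
    char buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += (int)n;              /* loop ends at EOF, when the child is gone */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With use_exit set, the five bytes of "hello" reach the parent; with _exit() the buffer is discarded along with the process and nothing arrives.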

fork+exec without atfork handlers

I have a library which registers an atfork handler (via pthread_atfork()) which does not support multiple threads when fork() is called. In my case, I don't need the forked environment to be usable because all I want is to call exec() right after the fork(). So, I want the fork() but without any atfork handlers. Is that possible? Am I missing any important edge cases?
For background info, the library is OpenBLAS, the issue is described here and here.
You could use vfork() (the NPTL implementation doesn't call fork handlers). Although POSIX has removed vfork from the standard, it's likely available on your implementation.
Fork handlers established using pthread_atfork(3) are not called when
a multithreaded program employing the NPTL threading library calls
vfork(). Fork handlers are called in this case in a program using
the LinuxThreads threading library. (See pthreads(7) for a
description of Linux threading libraries.)
Or, posix_spawn(). This is similar to vfork. The man page says:
According to POSIX, it is unspecified whether fork handlers established with pthread_atfork(3) are called when posix_spawn() is invoked. On glibc, fork handlers are called only if the child is created using fork(2).
Or, syscall and directly use SYS_clone. SYS_clone is the system call number used to create threads and processes on Linux. So syscall(SYS_clone, SIGCHLD, 0); should work, provided you would exec immediately.
syscall(SYS_fork); (as answered by Shachar) would likely work too. But note that SYS_fork is not available on some platforms (e.g., aarch64, ia64). SYS_fork is considered obsolete in Linux and is only there for backward compatibility; the Linux kernel uses SYS_clone for creating all "types" of processes.
(Note: These options are mostly limited to glibc/Linux).
Yes. The following should work on Linux (and, I think, all glibc based platforms):
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
...
syscall(SYS_fork);
This bypasses the library and directly calls the system call for fork. You might run into trouble if your platform does not implement fork as a single system call. For Linux, that simply means that you should use clone instead.
With that in mind, I'm not sure I'd recommend doing that. Since you're a library, you have no idea why someone registered an atfork. Assuming it's irrelevant is bad programming practice.
So you lose portability in order to do something that may or may not break stuff, all in the name of, what? Saving a few function calls? Personally, I'd just use fork.
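For comparison, here is a small sketch of the posix_spawn() route next to plain fork() + exec, with a counting atfork handler to show which one triggers it. It assumes Linux with a reasonably recent glibc, where (to my understanding, and per newer versions of the posix_spawn(3) man page) posix_spawn() does not run fork handlers; older glibc versions may behave differently. All function names here are hypothetical, and /bin/sh is assumed to exist.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

static int prepare_calls;   /* how often the atfork "prepare" hook has run */

static void count_prepare(void)
{
    prepare_calls++;
}

/* Register the counting handler; call once. Returns 0 on success. */
int install_counting_handler(void)
{
    return pthread_atfork(count_prepare, NULL, NULL);
}

int prepare_call_count(void)
{
    return prepare_calls;
}

/* Wait for a child and return its exit status, or -1 on error. */
static int wait_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Run "sh -c 'exit 0'" via posix_spawn(). */
int spawn_shell(void)
{
    char *argv[] = { "sh", "-c", "exit 0", NULL };
    pid_t pid;
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_status(pid);
}

/* Do the same job with plain fork() + exec. */
int fork_shell(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", "exit 0", (char *)NULL);
        _exit(127);         /* only reached if exec failed */
    }
    return wait_status(pid);
}
```

After install_counting_handler(), a call to spawn_shell() should leave prepare_call_count() unchanged, while fork_shell() should increase it by one. If a library's handler is the problem, posix_spawn() avoids it without reaching for raw system calls.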

System calls - functions used by kernel

I was asked about system calls: what they are, which mode they are used in, and whether read(), getchar() and sqrt() use system calls.
For the first part, I answered that system calls provide an interface between a process and the OS, and that they are executed in kernel mode.
What bothers me is that, as far as I can tell, the only one of those three functions that uses a system call is read().
Am I right? Or do getchar() and sqrt() also use system calls?
(NOTE: read() is from unistd.h, getchar() from stdio.h, and sqrt() from math.h)
The difference between a system call and a regular call is that a system call has to issue a trap to the operating system, whereas a regular call just calls another user-level subroutine. You're right in saying that the difference is in what mode the calls are executed in.
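sqrt() is not a system call; all it does is perform a calculation in user space. read() is a thin wrapper around a system call. getchar() is a stdio library function rather than a system call itself, but it ends up calling read() whenever its buffer is empty, because the operating system is the one that handles input/output operations.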
