Can a system call happen in a C program?
Consider this:
#include <fcntl.h>
#include <unistd.h>
int main()
{
    int f = open("/tmp/test.txt", O_CREAT | O_RDWR, 0666);
    write(f, "hello world", 11);
    close(f);
    return 0;
}
In this sample code, open, write, and close are library functions. From my searching, I concluded that they are functions, not system calls, and that each of them (open, write, and close) makes a system call.
Questions
Are my conclusions above all correct?
Can system calls happen in C programs?
If system calls can happen in C programs, when do they happen? Please give an example.
Can the use of a library function versus directly making a system call be controlled by compile options? For example, can we compile the above program with some options so that the write and read system calls are made directly, and with different options so that it calls library functions instead?
System call background
A system call, according to Wikipedia, is a "programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed".
Another way of understanding a system call is as a user space program making a request to the operating system kernel to perform some task on behalf of the user space program. The full set of system calls provided by the kernel is analogous (in some ways) to an API provided by the kernel to user space.
As system calls are a low level interface to the kernel, correctly providing their arguments can be error prone or even dangerous. For these reasons, C library authors provide simpler and safer wrapper functions for a significant portion of a kernel's set of system calls.
These wrapper functions take a simplified argument set and then derive the appropriate values to pass on to the kernel so the system call can be executed.
Example
Note: This example is based on compiling and running a C program with gcc on Linux. The system calls, library functions, and output may differ on other POSIX or non-POSIX operating systems.
I will attempt to show how to see when system calls are being made with a simple example.
#include <unistd.h>
int main() {
    write(1, "Hello world!\n", 13);
}
Above we have a very simple C program that writes the string Hello world!\n to stdout. If we compile and then execute this program with strace, we see the following (note the output may look different on other computers):
$ strace ./hello > /dev/null
execve("./hello", ["./hello"], 0x7fff083a0630 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13) = 13
exit_group(0) = ?
+++ exited with 0 +++
strace is a Linux program that intercepts and displays all system calls made by a program, as well as the arguments provided to the system calls and their return values.
We can see here that, as expected, the write system call was made with the expected arguments. Nothing strange yet.
Another Linux tracing program is ltrace, which intercepts dynamic library calls made by a program, and displays their arguments and return values.
If we run the same program with ltrace, we see this:
$ ltrace ./hello > /dev/null
write(1, "Hello world!\n", 13) = 13
+++ exited (status 0) +++
This tells us that the write library function was executed. This means that the C code first called the write library function, which then in turn called the write system call.
Suppose now that we want to explicitly make a write system call without calling the write library function. (This is inadvisable in normal use, but useful for illustration.)
Here is the new code:
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
int main() {
    syscall(SYS_write, 1, "Hello world!\n", 13);
}
Here we directly call the syscall library function, telling it we want to execute the write system call.
After recompiling, here is the output of strace:
$ strace ./hello > /dev/null
execve("./hello", ["./hello"], 0x7ffe3790a660 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13) = 13
exit_group(0) = ?
+++ exited with 0 +++
We can see that the write system call is made as before, as expected.
If we run ltrace we see the following:
$ ltrace ./hello > /dev/null
syscall(1, 1, 0x560b30e4d704, 13) = 13
+++ exited (status 0) +++
So the write library function is no longer being called, but we are still making a library function call. Now we are making a call to the syscall library function instead of the write library function.
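It is possible to make a system call directly from a user space C program without calling any library function, for example with compiler-specific inline assembly that executes the architecture's system call instruction. This is non-portable, since the instruction and register conventions differ per architecture, and it is rarely worthwhile in normal code.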
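As a minimal sketch of what a library-free system call can look like, assuming GCC or Clang on x86-64 Linux only: the helper name raw_write is hypothetical, and the register assignments follow the x86-64 Linux system call convention (number in rax, arguments in rdi, rsi, rdx). Note that the raw call returns a negative errno value on failure instead of setting errno, which is one of the things the libc wrapper normally handles for you.

```c
#include <sys/syscall.h>   /* SYS_write: only a constant, no library call */

/* Hypothetical helper, x86-64 Linux only: issue write(2) with the
 * `syscall` instruction directly, bypassing the libc wrappers. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                 /* rax: return value (or -errno)   */
        : "a"((long)SYS_write),     /* rax: system call number         */
          "D"((long)fd),            /* rdi: first argument             */
          "S"(buf),                 /* rsi: second argument            */
          "d"(len)                  /* rdx: third argument             */
        : "rcx", "r11", "memory"    /* clobbered by the instruction    */
    );
    return ret;
}
```

Calling raw_write(1, "Hello world!\n", 13) from main should still appear as a write system call under strace, since strace observes the kernel boundary rather than the library.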
Detecting when a C program makes system calls
In general, nearly every non-trivial C program makes at least one system call. This is because user space does not have direct access to kernel memory or to the computer's hardware. User space programs have indirect access to kernel memory and the hardware through system calls.
To identify if a compiled C program (or any other program on Linux) makes a system call, and to identify which system calls it makes, simply use strace.
Are there compiler options to prevent calling the library wrapper functions for system calls?
You can compile your C program (assuming you are using gcc) with the -nostdlib option. This will prevent linking the C standard library as part of producing your executable. However, then you would need to write your own code to make system calls.
"In computing, a system call is a programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system." - Wikipedia
The close system call is a system call used to close a file descriptor by the kernel. For most file systems, a program terminates access to a file in a filesystem using the close system call.
int close(int fd);
close() closes a file descriptor, so that it no longer refers to any file and may be reused.
So, as you can see, a system call is used when you need to access something in kernel space, and this is possible only through system calls.
Related
(Question edited, thanks to @fuz)
What is the Linux 64 Assembly Equivalent for C's system call?
I want to write assembly that essentially has the same function as calling the CLI in C, such as system("ls -l").
The code I want to reproduce has essentially the same function as the following C:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    system("ls -l");
    exit(0);
}
The system libc function is not a kernel system call. That's why its man page is system(3) not system(2).
It's implemented on top of fork(2) + execve(2), and the waitpid(2) system calls. In fact that's the first thing the system(3) man page says! Go read it, just like you should read the Linux man page for any actual system call or library function you want to know about.
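The fork + exec + wait sequence described above can be sketched in C as follows. This is a simplified illustration, not glibc's implementation: the name run_shell_command is hypothetical, it returns the decoded exit code rather than the raw wait status that system(3) returns, and it omits the signal handling that the real function performs. An assembly version would issue the same system calls in the same order.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` via /bin/sh and return its exit code,
 * or -1 if the child could not be created or did not exit normally. */
int run_shell_command(const char *command)
{
    pid_t pid = fork();                     /* create the child process */
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the shell. execl() is a
         * libc wrapper that ends up in the execve(2) system call. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                         /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)     /* wait for the child to finish */
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this helper, run_shell_command("ls -l") would behave roughly like the system("ls -l") call in the question.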
Use strace on a program that calls it, or single-step into it with GDB, or read the glibc source code.
I learned how x86 assembly works in general from the book "Programming from the Ground Up".
In this book, every program ends with an interruption call to exit.
However, in C compiled programs, I found out that programs end with a ret. This supposes that there is an address to be popped and that would lead to the end of the program.
So my question is:
What is this address? (And what is the code there?)
You start your program by asking the OS to pass control to the start or _start function of your program by jumping to that label in your code. In a C program the start function comes from the C library and (as others already said before) does some platform specific environment initialization. Then the start function calls your main and the control is yours. After you return from the main, it passes control back to the C library that terminates the program properly and does the platform specific system call to return control back to the OS.
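So the address main returns to is inside the C library's startup code, not in your program. Note that stdlib.h (cstdlib) only declares exit; the code that calls main lives in the library itself. In glibc, for example, it is __libc_start_main, which in effect calls exit with the return value of main, and exit does the cleanup.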
Its function is to destroy the static objects (C++ of course) at program termination or thread termination (C++11). In the C case it just closes the streams, flushes their buffers, calls atexit functions and does the system call.
I hope this is the answer you seek.
It is implementation specific.
On Linux, main is called by crt0, and the _start entry point there is analyzing the initial call stack set up by the kernel interpreting the execve(2) system call of your executable program. On return from main the epilogue part of crt0 is dealing with atexit(3) registered functions and flushing stdio.
FWIW, crt0 is provided by your GCC compiler, and perhaps your C standard library. All of this (along with the Linux kernel) is free software on Linux distributions.
every program ends with an interruption call to exit.
Not really. It is a system call (see syscalls(2) for their list), not an interrupt. See also this.
I wrote a simple C program which just calls the exit() function; however, strace says that the binary is actually calling exit_group. Is exit() an exit_group() wrapper? Are these two functions equivalent? If so, why would the compiler choose exit_group() over exit()?
The Linux and glibc man pages document all of this (See especially the "C library/kernel differences" in the NOTES section).
_exit(2): In glibc 2.3 and later, this wrapper function actually uses the Linux SYS_exit_group system call to exit all threads. Before glibc 2.3, it was a wrapper for SYS_exit to exit just the current thread.
exit_group(2): glibc wrapper for SYS_exit_group, which exits all threads.
exit(3): The ISO C89 function which flushes buffers and then exits the whole process. (It always uses exit_group() because there's no benefit to checking if the process was single-threaded and deciding to use SYS_exit vs. SYS_exit_group). As @Matteo points out, recent ISO C / POSIX standards are thread-aware and one or both probably require this behaviour.
But apparently exit(3) itself is not thread-safe (in the C library cleanup parts), so I guess don't call it from multiple threads at once.
syscall / int 0x80 with SYS_exit: terminates just the current thread, leaving others running. AFAIK, modern glibc has no thin wrapper function for this Linux system call, but I think pthread_exit() uses it if this isn't the last thread. (Otherwise exit(3) -> exit_group(2).)
Only exit(), not _exit() or exit_group(), flushes stdout, leading to "printf doesn't print anything" problems in newbie asm programs if writing to a pipe (which makes stdout full-buffered instead of line-buffered), or if you forgot the \n in the format string. For example, How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?. If you use any buffered I/O functions, or atexit, or anything like that, it's usually a good idea to call the libc exit(3) function instead of the system call directly. But of course you can call fflush before SYS_exit_group.
(Also related: On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program? - ret from main is equivalent to calling exit(3))
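To make the flushing difference concrete, here is a small sketch, assuming Linux with glibc-style stdio buffering. The helper name bytes_seen_after_exit is hypothetical. It forks a child whose stdout is a pipe, has the child printf a string without a newline, and then terminates it with either exit(3) or _exit(2); the parent counts how many bytes actually arrived.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes the child managed to
 * deliver to the pipe before terminating, or -1 on error. */
long bytes_seen_after_exit(int use_libc_exit)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    fflush(NULL);   /* keep unflushed parent output out of the child */

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);    /* child's stdout is now the pipe */
        close(fds[0]);
        close(fds[1]);
        printf("hi");                   /* stays in the stdio buffer */
        if (use_libc_exit)
            exit(0);                    /* flushes stdio, then exits */
        _exit(0);                       /* exits at once; buffer is lost */
    }

    close(fds[1]);                      /* so read() can see end-of-file */
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit(3) the two bytes of "hi" should reach the pipe; with _exit(2) the buffered data is discarded and nothing arrives.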
It's not of course the compiler that chose anything, it's libc. When you include headers and write read(fd, buf, 123) or exit(1), the C compiler just sees an ordinary function call.
Some C libraries (e.g. musl, but not glibc) may use inline asm to inline a syscall instruction into your binary, but still the headers are part of the C library, not the compiler.
So I was studying about shared libraries and I read that an implicit dlclose() is performed upon process termination.
I want to know who is responsible for this call. For example, if I wrote:
#include <stdio.h>
int main() {
    printf("Hello World\n");
    return 0;
}
And then if I did ldd ./a.out then I get a list of these libraries:
linux-vdso.so.1 => (0x00007ffd6675c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2569866000)
/lib64/ld-linux-x86-64.so.2 (0x0000562b69162000)
Linker is responsible for loading these right, so who is responsible upon termination of this ./a.out executable for implicit dlclose() of these libraries?
I do not have Kerrisk's book, but if you have accurately characterized its contents then they seem to be a bit simplified. It is not strictly correct to say that whenever a process terminates, the function dlclose() is called for each of its open shared libraries, but it is reasonable to say that whenever a process terminates, all its handles on open shared libraries are closed. As a result, the operating system recognizes that one fewer process references each of those shared libraries, and if that brings any shared libraries' reference counts to zero then the OS may choose to unload them from memory.
dlclose() does more work than that. In particular, it causes any destructor functions in the library to run. Those functions will also run when the process exits normally by returning from main() or by calling exit(), but not if the process terminates by other means, such as calling _exit() or in response to receiving a signal. In the normal-exit case, the net effect may be the same as if dlclose() were called for each open shared library, but even then, that's not necessarily achieved by actually calling dlclose().
Finally, do be aware that although the dl*() functions are defined by POSIX, substantially all details of dynamic / shared libraries are left at the discretion of implementations. Since you've asked about a Linux book, I've referenced a few Linux-specific details.
I suspect the book is only talking about normal process termination when calling exit() or returning from main(). dlopen() presumably registers an atexit() handler that executes all the termination functions of the dynamic libraries.
It's not feasible for libraries to execute any code when a process is terminated abnormally. If the process is terminated by the OS instead of by exiting normally, the OS just releases any file handles, but it won't execute code in the context of the process.
Can anyone tell me how to block some specific system calls within a program, please? I am building a system which takes a piece of C source code, compiles it with gcc and runs it. For security reasons, I need to prevent the compiled program from calling some system calls. Is there any way to do it, from the source code level (e.g. stripping the header files of gcc, detecting malicious external calls, ...) to the executable level?
Edited #1: Add details about malicious calls.
Edited #2: My system is a GNU/Linux one.
Edited #3:
I have tried some methods within a few days and here are the conclusions I've got so far:
Scanning the source code does not solve the main problem, since one can always obfuscate a C source file quite well.
"Overriding C symbols" works well for libraries, but for system calls I have not achieved what I wanted. This idea is not dead; however, doing this would definitely cost me a lot of time hacking (gcc and/or ld).
Permission de-escalation works like a charm. I could use fakeroot or a "guest" user to do it. This method is also the easiest to implement.
The other one is Native Client, which I have not tried yet but definitely will in the near future because of what the project has in common with my work.
As others have noted, it's impossible for a program to avoid making system calls; they permeate the C library all over the place.
However you might be able to make some headway with careful use of the LD_PRELOAD mechanism, if your platform supports it (e.g. Linux): you write a shared library with the same symbol names as those in the C library, which are called instead of the intended libc functions. (For example, Electric Fence is built as a shared library on Debian-based systems and intercepts calls to malloc, free et al.)
I suspect you could use this mechanism to trap or argument-check calls to any libc functions you don't like, and perhaps to note those which you consider unconditionally safe. It might then be reasonable to scan the compiled executable for the code corresponding to INT 0x80 to trap out any attempts to make raw syscalls (0xcd 0x80 - though beware of false positives). However, I have only given this a few moments of thought; I could easily have missed something, or this might turn out to be impractical...
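A minimal sketch of such an interposer, assuming glibc on Linux. The choice of unlink as the blocked function is purely illustrative, and the file names in the build commands below are hypothetical. The library defines a function with the same name and signature as the libc one, so dynamically linked callers resolve to it first.

```c
#include <errno.h>
#include <unistd.h>

/* Same name and signature as libc's unlink(). When this library is
 * preloaded, the dynamic linker resolves calls to this definition
 * before the one in libc. */
int unlink(const char *path)
{
    (void)path;         /* this sketch refuses unconditionally */
    errno = EPERM;      /* report "operation not permitted"    */
    return -1;
}
```

It could be built with something like gcc -shared -fPIC -o libblock.so block.c and applied with LD_PRELOAD=./libblock.so ./program. This only catches calls that go through the dynamic symbol: statically linked programs, raw system calls, and related functions such as unlinkat are not affected, so it is not a security boundary on its own.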
You could run the compiled program by forking it from a wrapper and use the Linux ptrace(2) facility to intercept and inspect all system calls invoked by the program.
The following example code shows a wrapper that runs the /usr/bin/w command, prints each system call invoked by the command, and terminates the command if it tries to invoke the write(2) system call.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/reg.h>
#define BAD_SYSCALL __NR_write
/* Offset of the saved system call number in the child's user area. */
#ifdef __x86_64__
#define SYSCALL_NR_OFFSET (8 * ORIG_RAX)
#else
#define SYSCALL_NR_OFFSET (4 * ORIG_EAX) /* 32-bit x86 */
#endif
int main(int argc, char *argv[])
{
    pid_t child;
    int status;
    long syscall_nr;
    child = fork();
    if (child == 0) {
        /* In child. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/usr/bin/w", "w", (char *)NULL);
        /* Only reached if execl() fails. */
        _exit(EXIT_FAILURE);
    }
    /* In parent. */
    while (1) {
        /* Stop if wait() fails, e.g. because fork() failed. */
        if (wait(&status) == -1)
            break;
        /* Abort loop if child has exited. */
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        /* Obtain syscall number from the child's process context.
           Each system call stops the child twice (entry and exit). */
        syscall_nr = ptrace(PTRACE_PEEKUSER, child, SYSCALL_NR_OFFSET, NULL);
        printf("Child wants to execute system call %ld: ", syscall_nr);
        if (syscall_nr != BAD_SYSCALL) {
            /* Allow system call. */
            printf("allowed.\n");
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        } else {
            /* Terminate child. */
            printf("not allowed. Terminating child.\n");
            ptrace(PTRACE_KILL, child, NULL, NULL);
        }
    }
    exit(EXIT_SUCCESS);
}
You can do much more powerful things using ptrace, such as inspect and change a process' address space (e.g., to obtain and modify the parameters passed to a system call).
A good introduction can be found in this Linux Journal Article and its follow-up.
You can't.
Even this program:
#include <stdio.h>
int main()
{
    printf("Hello, World\n");
    return 0;
}
makes at least one system call (to send the string "Hello, World\n" to standard out). System calls are the only way for a program to interact with the outside world. Use the operating system's security model for security.
Edited for this comment:
I meant not all system calls but malicious system calls, e.g. execv() could be used to execute a BASH script which wipes out my data on the disk.
Your operating system already includes mechanisms to stop that sort of thing happening. For instance, in order for a bash script to wipe out your data, the process must already have write access to that data. That means it must have been started by you or root. Your only real option is not to install untrustworthy software.
By the way, depending on your platform, execv is not necessarily a system call. On Linux, it's a C library wrapper for the real system call (execve).
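As a rough illustration of what "wrapper for execve" means here, such a function can be little more than a call that supplies the current environment. This is a sketch under that assumption, not glibc's actual source, and the name my_execv is hypothetical.

```c
#include <unistd.h>

extern char **environ;  /* the calling process's environment */

/* Hypothetical stand-in for execv(): forward to execve(2), passing the
 * current environment as the third argument. Returns only on failure. */
int my_execv(const char *path, char *const argv[])
{
    return execve(path, argv, environ);
}
```

Blocking execv by name therefore would not stop a program that calls execve, or the underlying system call, directly.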
Just to illustrate that this is not possible, the following program:
int main() {
    return 0;
}
makes over 20 system calls as reported using strace. The calls include open (twice) which is one of the calls you seem to want to block.
Well, if you just want to block specific calls, why not just grep through the source code before attempting to compile it, and reject programs that use the insecure system calls?
Some projects have a similar idea. You can take a look at NaCl (Native Client): http://code.google.com/p/nativeclient/