Why do certain Linux x86_64 system calls require a stub? - c

If one tries to hook certain syscalls via sys_call_table hooking, e.g. sys_execve, this will fail, because they are called indirectly through a stub. For sys_execve this is stub_execve (compare the assembly code on LXR).
But what are these stubs good for? Why do only certain system calls like execve(2) and fork(2) require a stub and how is this connected to x86_64? Is there a workaround to hook stubbed syscalls (in a Loadable Kernel Module)?

From here, it says:
"Certain special system calls that need to save a complete full stack frame."
And I think execve is just one of these special system calls.
Looking at the code of stub_execve, if you want to hook it, there are at least two things you can try:
Work out what the assembly code does and reimplement it yourself, so that your own assembly stub calls your own function.
The assembly code contains a call sys_execve in the middle; you can replace the address of sys_execve there with the address of your own hook function.

Related

how to change stack protection via syscalls without parameters

This is a slightly strange question. I am trying to find a syscall that makes the stack executable and takes no parameters, on i386. I am doing a CTF and have found a way to invoke a syscall with control over eax and full control over the stack (through argv, so only pointers to my strings). Now I am jumping into the vDSO (that is all the code there is in the program, no DLLs or anything else) to run a syscall that will allow stack execution, but I have gone over the man pages again and again and have not found anything I can use.
$ uname -r
4.4.179-0404179-generic
There's no zero-arg Linux system call equivalent to mprotect(stack_base, stack_size, PROT_WRITE|PROT_READ|PROT_EXEC).
Not that I know of, and I wouldn't expect there to be one. Probably the only use case would be to help attackers, which is the opposite of hardening; normally you can make the stack executable via linker options or any specific pages via mprotect with args. There's no need for a shortcut for that.
There's also not one that can set the READ_IMPLIES_EXEC personality for an already-running process, even if you do allow args. (See Using personality syscall to make the stack executable - at best it will have an effect after execve.)
You might be able to use some ROP techniques to get some args set up for mprotect, and then return to the code you injected.
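For comparison, here is a minimal user-space sketch of the mprotect call with arguments mentioned above, applied to the single page that holds a given stack address. The helper names page_base and make_page_rwx are made up for this illustration, and the sketch assumes a Linux/POSIX system where the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page.
 * Assumes page_size is a power of two. (Hypothetical helper.) */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Ask the kernel to make the page containing addr readable, writable
 * and executable. Returns 0 on success and -1 on failure, like mprotect.
 * (Hypothetical helper.) */
static int make_page_rwx(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return -1;

    uintptr_t base = page_base((uintptr_t)addr, (uintptr_t)page_size);

    /* All three arguments are required; there is no zero-argument variant. */
    return mprotect((void *)base, (size_t)page_size,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Passing the address of a local variable to make_page_rwx targets the current stack page. Whether the call succeeds depends on the system; a security policy may refuse executable stack mappings.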

Calling system calls from the kernel code

I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from user space using the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
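A fuller user-space version of that call might look like the sketch below, which follows the pattern of the example in the perf_event_open(2) man page. The helper name open_instruction_counter and the choice of counting retired instructions for the calling thread are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc ships no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

/* Open a counter for instructions retired by the calling thread on any CPU.
 * Returns a file descriptor, or -1 with errno set (for example when the
 * system restricts perf access). (Hypothetical helper.) */
static long open_instruction_counter(void)
{
    struct perf_event_attr pe;

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;        /* start stopped; enable later via ioctl */
    pe.exclude_kernel = 1;  /* count user-space instructions only */
    pe.exclude_hv = 1;

    /* pid = 0: this thread, cpu = -1: any CPU, no group, no flags */
    return perf_event_open(&pe, 0, -1, -1, 0);
}
```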
I would like to invoke this call from the kernel space. I got some basic idea from here How do I use a Linux System call from a Linux Kernel Module
Here are the steps I took to achieve this:
To make sure that the virtual address of the kernel remains valid, I have used set_fs(), get_fs() and get_ds().
Since sys_perf_event_open() is defined in /include/linux/syscalls.h I have included that in the code.
Eventually, the code for calling the systems call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’". Why is this popping up when the header file defining it is already included? Does it have something to do with the way one should call system calls from within kernel code?
In general (not specific to Linux) the work done for system calls can be split into 3 categories:
Switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
Using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters, which is needed because the calling convention used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
The final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code), then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from the kernel would corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
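The second and third categories can be illustrated with a toy user-space model. This is only a sketch, not kernel code: struct toy_regs, do_add, toy_sys_add and toy_dispatch are all made-up names, and the "register" layout is an assumption chosen for simplicity.

```c
#include <errno.h>
#include <stddef.h>

/* Made-up register snapshot: the "calling convention" of the toy system call. */
struct toy_regs {
    long nr;    /* function number */
    long arg0;
    long arg1;
};

/* Category 3: the normal C function that does the real work.
 * Code that is already "inside the kernel" can call this directly. */
static long do_add(long a, long b)
{
    return a + b;
}

/* Category 2 glue: unpack the arguments from the register snapshot,
 * then call the normal C function. */
static long toy_sys_add(const struct toy_regs *regs)
{
    return do_add(regs->arg0, regs->arg1);
}

typedef long (*toy_handler)(const struct toy_regs *);

static const toy_handler toy_table[] = {
    toy_sys_add,    /* function number 0 */
};

/* Category 2: sanity-check the function number, look it up, call it. */
static long toy_dispatch(const struct toy_regs *regs)
{
    size_t count = sizeof(toy_table) / sizeof(toy_table[0]);

    if (regs->nr < 0 || (size_t)regs->nr >= count)
        return -ENOSYS;    /* no such function */
    return toy_table[regs->nr](regs);
}
```

A caller that already has its arguments in ordinary C variables gains nothing from building a toy_regs and going through toy_dispatch; calling do_add(2, 3) gives the same result with none of the glue.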
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess is your architecture), starting from kernel version v4.17 it is a hard requirement not to use that interface (with a few exceptions). It was possible to invoke system calls directly prior to this version, which is why there are plenty of tutorials on the web using sys_*, but now the error you are seeing pops up. The alternative proposed in the Linux documentation is to put the actual code of the syscall into a helper function that can be called within the kernel like any other function, and to make the syscall a thin wrapper around it:
int perf_event_open_wrapper(...) {
// actual perf_event_open() code
}
SYSCALL_DEFINE5(perf_event_open, ...) {
return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
Which kernel version are we talking about?
Anyhow, you could either get the address of the sys_call_table by looking at the System.map file or, if it is exported, look up the symbol (have a look at kallsyms.h). Once you have the address of the syscall table, you may treat it as an array of void pointers (void **) and index it to find your desired function; e.g. sys_call_table[__NR_open] would be open's address, so you could store it in a void pointer and then call it.
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to the userland, and should not be really used from inside the kernel, thus such practice should be avoided.
calling system calls from kernel code
(I am mostly answering to that title; to summarize: it is forbidden to even think of that)
I don't understand your actual problem (I feel you need to explain it more in your question, which is unclear and lacks a lot of useful motivation and context). But general advice, following the Unix philosophy, is to minimize the size and vulnerability area of your kernel or kernel module code and, as soon as your kernel code requires some system calls, to move as much of that code as is convenient into user-land, in particular with the help of systemd. Your question is by itself a violation of most Unix and Linux cultural norms.
Have you considered using efficient kernel to user-land communication, in particular netlink(7) with socket(7)? Perhaps you also want some driver-specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is an exact fit for your (unexplained) needs. And eventfd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain and I tend to believe it is a symptom of a major misunderstanding of operating system kernels in general. Please take time to read Operating Systems: Three Easy Pieces to understand OS philosophy.

how sysproc.c and sysfile.c are linked to xv6

I am trying to add a set of system calls to support semaphore in xv6.
I added a syssemaphore.c file (which will hold functions that pass the user arguments from the user stack using argptr, argint, etc.) and noticed that I can't find the header file that would link in the functions I will write.
Basically, I want to add files like sysproc.c and sysfile.c.
Is it possible?
Adding a new system call to xv6 means touching the entire system call flow: from user space invoking the system call interrupt with the system call number in the eax register, through the syscall function, which runs the right system call handler, and finally to the system call implementation (which includes a sys_something function to retrieve the user parameters and validate them).
If I understand your question correctly, your new file, syssemaphore.c, contains the sys_something functions that you wish to call from syscall in the syscall.c file.
The syscall function is the only function that should invoke your new sys_something wrappers. Therefore, it is sufficient to add prototypes for those functions (as extern functions) above the syscalls array in syscall.c, which then allows you to add your new functions to the syscalls array.
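As a rough stand-alone model of that layout (not actual xv6 source), the sketch below shows prototypes placed above a table indexed by system call number, followed by a dispatcher with the same kind of bounds check. The names sys_sem_init and sys_sem_wait, the numbers 22 and 23, and the dispatch function are all assumptions made for illustration; in xv6 the numbers would go in syscall.h and the function bodies in syssemaphore.c.

```c
#include <stddef.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical system call numbers (would live in syscall.h). */
#define SYS_sem_init 22
#define SYS_sem_wait 23

/* In syscall.c these would be extern prototypes; the definitions
 * would live in syssemaphore.c. */
int sys_sem_init(void);
int sys_sem_wait(void);

/* Table indexed by system call number; unlisted slots are NULL. */
static int (*syscalls[])(void) = {
    [SYS_sem_init] = sys_sem_init,
    [SYS_sem_wait] = sys_sem_wait,
};

/* Simplified stand-in for the syscall function: check the number,
 * then call through the table. Returns -1 for an unknown number. */
static int dispatch(int num)
{
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num])
        return syscalls[num]();
    return -1;
}

/* Placeholder bodies so the sketch links on its own; real versions
 * would fetch and validate user arguments with argint, argptr, etc. */
int sys_sem_init(void) { return 0; }
int sys_sem_wait(void) { return 1; }
```

No separate header is needed for the sys_something functions in this layout, because the table in syscall.c is their only caller.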
See additional information at How to pass a value into system call XV6

Is this a good way to intercept system calls?

I am writing a tool. Part of that tool will be the ability to log the parameters of system calls. Alright, I can use ptrace for that purpose, but ptrace is pretty slow. A faster method that came to my mind was to modify glibc. But this is getting difficult, as gcc magically inserts its own built-in functions as system call wrappers rather than using the code defined in glibc. Using -fno-builtin is also not helping there.
So I came up with this idea of writing a shared library, which includes every system call wrapper, such as mmap and then perform the logging before calling the actual system call wrapper function. For example pseudo code of what my mmap would look like is given below.
int mmap(...)
{
log_parameters(...);
call_original_mmap(...);
...
}
Then I can use LD_PRELOAD to load this library first. Do you think this idea will work, or am I missing something?
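A minimal sketch of such a wrapper, assuming glibc on Linux, could look like the following. It looks up the next definition of mmap with dlsym(RTLD_NEXT, ...), logs the arguments, and forwards the call. The counter logged_mmap_calls is a made-up name used only for illustration, and the sketch is not thread-safe.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

/* Number of calls seen by the wrapper (illustrative only). */
static unsigned long logged_mmap_calls;

/* Same name and signature as the libc function, so callers that are
 * resolved by the dynamic linker reach this definition first. */
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
    static mmap_fn real_mmap;
    char line[160];
    int n;

    if (!real_mmap)
        real_mmap = (mmap_fn)dlsym(RTLD_NEXT, "mmap");
    if (!real_mmap) {
        errno = ENOSYS;
        return MAP_FAILED;
    }

    /* Format into a stack buffer and emit with write(2), so logging
     * cannot re-enter the wrapper through an allocation made by stdio. */
    n = snprintf(line, sizeof(line),
                 "mmap(addr=%p, length=%zu, prot=%d, flags=%d, fd=%d, offset=%lld)\n",
                 addr, length, prot, flags, fd, (long long)offset);
    if (n > 0) {
        size_t len = (size_t)n < sizeof(line) ? (size_t)n : sizeof(line) - 1;
        ssize_t written = write(STDERR_FILENO, line, len);
        (void)written;    /* logging is best effort */
    }
    logged_mmap_calls++;

    return real_mmap(addr, length, prot, flags, fd, offset);
}
```

Built as a shared object (for example with gcc -shared -fPIC, plus -ldl on older glibc versions) and loaded through LD_PRELOAD, this only sees calls that go through the dynamically resolved mmap symbol. Statically linked programs and code that issues the system call instruction directly bypass it.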
No method that you can possibly dream up in user space will work seamlessly with every application. Fortunately for you, the kernel already supports doing exactly what you want. Kprobes and Kretprobes allow you to examine the state of the machine just preceding and following a system call.
Documentation here: https://www.kernel.org/doc/Documentation/kprobes.txt
As others have mentioned, if the binary is statically linked, the dynamic linker will skip over any attempts to intercept functions using libdl. Instead, you should consider launching the process yourself and detouring the entry point of the function you wish to intercept.
This means launching the process yourself, intercepting its execution, and rewriting its memory to place a jump instruction at the beginning of the function's definition in memory, pointing to a new function that you control.
If you want to intercept the actual system calls and can't use ptrace, you will either have to find the execution site for each system call and rewrite it, or you may need to overwrite the system call table in memory and filter out everything except the process you want to control.
All system calls from user space go through an interrupt handler to switch to kernel mode; if you find this handler, you can probably add something there.
EDIT: I found this: http://cateee.net/lkddb/web-lkddb/AUDITSYSCALL.html. Linux kernels 2.6.6–2.6.39 and 3.0–3.4 have support for system call auditing. This is a kernel configuration option (CONFIG_AUDITSYSCALL) that has to be enabled. Maybe you can look at the source for it if it's not too confusing.
If the code you are developing is process-related, sometimes you can develop alternative implementations without breaking the existing code. This is helpful if you are rewriting an important system call and would like a fully functional system with which to debug it.
In your case, suppose you are rewriting the mmap() algorithm to take advantage of an exciting new feature (or enhancing it with a new feature). Unless you get everything right on the first try, it would not be easy to debug the system: a nonfunctioning mmap() system call is certain to result in a nonfunctioning system. As always, there is hope.
Often, it is safe to keep the remaining algorithm in place and construct your replacement on the side. You can achieve this by using the user id (UID) as a conditional with which to decide which algorithm to use:
if (current->uid != 7777) {
/* old algorithm .. */
} else {
/* new algorithm .. */
}
All users except UID 7777 will use the old algorithm. You can create a special user, with UID 7777, for testing the new algorithm. This makes it much easier to test critical process-related code.

Where can I find system call source code?

In Linux where can I find the source code for all system calls given that I have the source tree? Also if I were to want to look up the source code and assembly for a particular system call is there something that I can type in terminal like my_system_call?
You'll need the Linux kernel sources in order to see the actual source of the system calls. Manual pages, if installed on your local system, only contain the documentation of the calls and not their source itself.
Unfortunately for you, system calls aren't stored in just one particular location in the whole kernel tree. This is because various system calls can refer to different parts of the system (process management, filesystem management, etc.) and therefore it would be infeasible to store them apart from the part of the tree related to that particular part of the system.
The best thing you can do is look for the SYSCALL_DEFINE[0-6] macro. It is used (obviously) to define the given block of code as a system call. For example, fs/ioctl.c has the following code:
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
/* do freaky ioctl stuff */
}
Such a definition means that the ioctl syscall is declared and takes three arguments. The number at the end of SYSCALL_DEFINE is the number of arguments. For example, in the case of getpid(void), declared in kernel/timer.c, we have the following code:
SYSCALL_DEFINE0(getpid)
{
return task_tgid_vnr(current);
}
Hope that clears things up a little.
From an application's point of view, a system call is an elementary and atomic operation done by the kernel.
The Assembly Howto explains what is happening, in terms of machine instruction.
Of course, the kernel is doing a lot of things when handling a syscall.
Actually, you could almost believe that the entire kernel code is devoted to handling system calls (this is not entirely true, but almost; from an application's point of view, the kernel is only visible through system calls). The other answer by Daniel Kamil Kozar explains which kernel function starts the handling of a given system call (but very often, many other parts of the kernel participate indirectly in system calls; for example, the scheduler participates indirectly in implementing fork because it manages the child process created by a successful fork syscall).
I know it's old, but I was searching for the source for _system_call() too and found this tidbit
Actual code for system_call entry point can be found in /usr/src/linux/kernel/sys_call.S Actual code for many of the system calls can be found in /usr/src/linux/kernel/sys.c, and the rest are found elsewhere. find is your friend.
I assume this is dated, because I don't even have that file. However, grep found ENTRY(system_call) in arch/x86/kernel/entry_64.S, which seems to be the thing that calls the individual system calls. I'm not up on my intel-syntax x86 asm right now, so you'll have to look and see if this is what you wanted.

Resources