Intercepting a system call

I have been trying to intercept a system call at the kernel level. I got the basic idea from this question. The system call I was trying to intercept was fork(), so I looked up the address of sys_fork() in System.map, which turned out to be 0xc1010e0c. Then I wrote the module below.
#include<linux/kernel.h>
#include<linux/module.h>
#include<linux/unistd.h>
#include<linux/semaphore.h>
#include<asm/cacheflush.h>
MODULE_LICENSE("GPL");
void **sys_call_table;
asmlinkage int (*original_call)(struct pt_regs);
asmlinkage int our_call(struct pt_regs regs)
{
printk("Intercepted sys_fork");
return original_call(regs);
}
static int __init p_entry(void)
{
printk(KERN_ALERT "Module Intercept inserted");
sys_call_table=(void *)0xc1010e0c;
original_call=sys_call_table[__NR_open];
set_memory_rw((long unsigned int)sys_call_table,1);
sys_call_table[__NR_open]=our_call;
return 0;
}
static void __exit p_exit(void)
{
sys_call_table[__NR_open]=original_call;
set_memory_ro((long unsigned int)sys_call_table,1);
printk(KERN_ALERT "Module Intercept removed");
}
module_init(p_entry);
module_exit(p_exit);
However, after compiling the module and trying to insert it into the kernel, I got a warning in the dmesg output.
Of course, it's not intercepting the system call. Can you help me figure out the problem? I am using version 3.2.0-4-686 of the Linux kernel.

http://lxr.linux.no/linux+*/arch/x86/mm/pageattr.c#L874 says
if (*addr & ~PAGE_MASK) {
*addr &= PAGE_MASK;
/*
* People should not be passing in unaligned addresses:
*/
WARN_ON_ONCE(1);
}
So the warning is because your sys_call_table variable is not page-aligned.
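To make the alignment check concrete, here is a minimal userspace sketch of the same mask arithmetic. It assumes 4 KiB pages, and the DEMO_* names and helper functions are made up for illustration; they are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's real PAGE_SIZE/PAGE_MASK come from
 * its own headers; these DEMO_* names are illustrative only. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Round an address down to the start of the page that contains it,
 * mirroring "*addr &= PAGE_MASK" in the snippet above. */
unsigned long demo_page_align_down(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* Non-zero when addr is not on a page boundary, i.e. the condition
 * "*addr & ~PAGE_MASK" that leads to the WARN_ON_ONCE. */
int demo_is_unaligned(unsigned long addr)
{
    return (addr & ~DEMO_PAGE_MASK) != 0;
}
```

With the address from the question, demo_is_unaligned(0xc1010e0c) is non-zero and demo_page_align_down(0xc1010e0c) gives 0xc1010000. Passing an address that is already rounded down avoids the warning; if the table happens to straddle a page boundary, the page count would also have to cover the second page.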
It should be said that patching the system call table is officially discouraged by the kernel maintainers, and they've put some deliberate roadblocks in your way -- you've probably already noticed that you can't access the real sys_call_table symbol, and the write protection is also deliberate. If you can possibly find another way to do what you want, then you should. Depending on your larger goal, you might be able to accomplish it using ptrace and no kernel module at all. The trace_sched_process_fork hook may also be useful.

original_call=sys_call_table[__NR_open];
....
sys_call_table[__NR_open]=our_call;
If you're intercepting fork, the entry for open is not what you want to change.
And instead of the address of the sys_fork() from System.map, you should have used the address of sys_call_table.

It is not clear whether you solved your problem, but depending on how you test your module, keep in mind that glibc's fork() no longer uses sys_fork; it uses sys_clone instead.

Related

Where does the linux kernel panic message go?

I don't know if it's related to SO. I know that when I use the Linux kernel panic function, its job is to freeze my system, but it takes 1 argument, a message. Where can I actually see the message if my system is completely frozen and I force shutdown my PC by holding the power-off button?
main.c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h> // panic
MODULE_LICENSE("GPL");
static int __init initialization_function(void)
{
panic("Module: my message!\n");
return 0;
}
static void __exit cleanup_funcion(void)
{
printk(KERN_INFO "Module: Cleanup done, exiting.\n");
}
module_init(initialization_function);
module_exit(cleanup_funcion);
By the way, how and where can I see the actual oops message?

It goes to the kernel console, the same place where printk() messages go. The Wikipedia article on kernel panic has a screenshot.
Usually, you will be able to see it if the kernel panic happens at boot time.
As for what happens if you have a running desktop system, unfortunately I don't remember. Either you won't see it, or the X/Wayland server will crash and you will see the message in the console.

As you have noticed yourself, this is pretty tricky since the system gets frozen. What you can do is have a look in the system log after the reboot. Exactly how that is done depends on the distribution. On a system with systemd you can use journalctl; for example, journalctl -k -b -1 lists the kernel messages from the previous boot, provided the journal is persistent.

Linux Kernel 4.2.x: Why does the expected system call address not match the actual address when checked?

Short Background
I'm currently writing a linux kernel module as a project to better understand linux kernel internals. I've written 'hello world'-type modules before, but I want to get beyond that, so I'm trying to replace some common system calls like open, read, write, and close with my own so that I can print a bit more information into the system log.
Some of the content I found while searching targets pre-2.6 kernels, which is not useful because the sys_call_table symbol stopped being exported in kernel 2.6.x. The material I found for 2.6.x or later seems to have problems of its own, even though it apparently worked at the time.
One particular O'Reilly article, which I found on the sys_call_table in linux kernel 2.6.18 post, suggests that what I'm trying to do ought to work, but it isn't. (Specifically, see the Intercepting sys_unlink() Using System.map section.)
I also read through the Linux Kernel: System call hooking example and Kernel sys_call_table address does not match address specified in system.map which, while somewhat informative, were not useful for me.
Problems and Questions
Part 1 - Unexpected Address Mismatch
I'm using Linux kernel 4.2.0-16-generic on a Kubuntu 15.10 x86_64 architecture installation. Since the sys_call_table symbol is no longer exported, I grepped the address from the system map file:
# grep 'sys_call_table' < System.map-4.2.0-16-generic
ffffffff818001c0 R sys_call_table
ffffffff81801580 R ia32_sys_call_table
With this in hand, I added the following line to my kernel module:
static unsigned long *syscall_table = (unsigned long *) 0xffffffff818001c0;
Based on this, I expected a simple check to confirm that I was pointing to the location I thought I was pointing to, i.e. the base address of the kernel's unexported sys_call_table. So I wrote a check like the one below into the module's init function to verify:
if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
pr_info("sys_close = 0x%p, syscall_table[__NR_close] = 0x%p\n", sys_close, syscall_table[__NR_close]);
return -ENXIO;
}
This check failed and different addresses were printed in the log.
I was not expecting the body of this if statement to get executed because I thought the address returned by syscall_table[__NR_close] would be the same as that of sys_close, but it does enter.
Q1: Have I missed something so far regarding the expected address-based comparison? If so, what?
Part 2 - Partially Successful?
If I remove this check, it seems I'm partially successful, because, apparently, I can at least replace the read call successfully using the code below:
static asmlinkage ssize_t (*original_read)(unsigned int fd, char __user *buf, size_t count);
// ...
static void systrap_replace_syscalls(void)
{
pr_debug("systrap: replacing system calls\n");
original_read = syscall_table[__NR_read];
original_write = syscall_table[__NR_write];
original_close = syscall_table[__NR_close];
write_cr0(read_cr0() & ~0x10000);
syscall_table[__NR_read] = systrap_read;
syscall_table[__NR_write] = systrap_write;
syscall_table[__NR_close] = systrap_close;
write_cr0(read_cr0() | 0x10000);
pr_debug("systrap: system calls replaced\n");
}
My replacement functions simply print a message and forward the call to the actual system call. For example, the read replacement function's code is below:
static asmlinkage ssize_t systrap_read(unsigned int fd, char __user *buf, size_t count)
{
pr_debug("systrap: reading from fd = %u\n", fd);
return original_read(fd, buf, count);
}
And the system log shows the following output when I insmod and rmmod the module:
kernel: [23226.797460] systrap: setting up module
kernel: [23226.797462] systrap: replacing system calls
kernel: [23226.797464] systrap: system calls replaced
kernel: [23226.797465] systrap: module setup complete
kernel: [23226.864198] systrap: reading from fd = 4279272912
<similar output ommitted for brevity>
kernel: [23235.560663] systrap: reading from fd = 2835745072
kernel: [23235.564774] systrap: reading from fd = 861079840
kernel: [23235.564986] systrap: cleaning up module
kernel: [23235.564990] systrap: trying to restore system calls
kernel: [23235.564993] systrap: restored sys_read
kernel: [23235.564995] systrap: restored sys_write
kernel: [23235.564997] systrap: restored sys_close
kernel: [23235.565000] systrap: system call restoration attempt complete
kernel: [23235.565002] systrap: module cleanup complete
I can let it run for a long time and, oddly enough, I never observe entries for the write and close function calls --only for the reads, which is why I thought I was only partially successful.
Q2: Have I missed something regarding the replaced system calls? If so, what?
Part 3 - Unexpected Error Message on rmmod Command
Even though the module seems to operate normally, I always get the following error when I rmmod the module from the kernel:
rmmod: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '(null)/modules.builtin.bin'
My module cleanup function simply calls another one (below) that tries to restore the function calls by doing the opposite of the replacement function above:
// called by the exit function
static void systrap_restore_syscalls(void)
{
pr_debug("systrap: trying to restore system calls\n");
write_cr0(read_cr0() & ~0x10000);
/* make sure no other modules have made changes before restoring */
if(syscall_table[__NR_read] == systrap_read)
{
syscall_table[__NR_read] = original_read;
pr_debug("systrap: restored sys_read\n");
}
else
{
pr_warn("systrap: sys_read not restored; address mismatch\n");
}
// ... omitted: same stuff for other sys calls
write_cr0(read_cr0() | 0x10000);
pr_debug("systrap: system call restoration attempt complete\n");
}
Q3: I don't know what causes the error message; any ideas here?
Part 4 - sys_open Marked for Deprecation?
In another unexpected turn of events, I find that the __NR_open macro is no longer defined by default. In order for me to see the definition, I have to #define __ARCH_WANT_SYSCALL_NO_AT before #include-ing the header files:
/*
* Force __NR_open definition. It seems sys_open has been replaced by sys_openat(?)
* See include/uapi/asm-generic/unistd.h:724-725
*/
#define __ARCH_WANT_SYSCALL_NO_AT
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
// ...
Going through the kernel source code (mentioned in comment above), you find the following comments:
/*
* All syscalls below here should go away really,
* these are provided for both review and as a porting
* help for the C library version.
*
* Last chance: are any of these important enough to
* enable by default?
*/
#ifdef __ARCH_WANT_SYSCALL_NO_AT
#define __NR_open 1024
__SYSCALL(__NR_open, sys_open)
// ...
Can anyone clarify:
Q4: ...the comments above on why __NR_open is not available by default?,
Q5: ...whether it's a good idea to do what I'm doing with the #define?, and
Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?
Epilogue - Crashing My System 😑
I tried using __NR_openat, replacing that call as I had done with the previous ones:
static asmlinkage long systrap_openat(int dfd, const char __user *filename, int flags, umode_t mode)
{
pr_debug("systrap: opening file dfd = %d, name = % s\n", filename);
return original_openat(dfd, filename, flags, mode);
}
But this simply helped me unceremoniously crash my own system 😑 by causing other processes to segfault when they tried to open a file, with gems such as:
kernel: [135489.202693] systrap: opening file dfd = 0, name = P^Q
kernel: [135489.202913] zsh[11806]: segfault at 410 ip 00007f3a380abe60 sp 00007ffd04c5b550 error 4 in libc-2.21.so[7f3a37fe1000+1c0000]
Trying to print argument data also showed odd/garbage info.
Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?
I've spent several days trying to work through this and I just hope I've not missed something utterly stupid...
Please, let me know if something's not entirely clear to you in the comments and I'll attempt to clarify.
It'd be most helpful if you could provide some code snippets that actually work and/or point me in a precise-enough direction that would allow me to understand what I'm doing wrong and how to quickly get this fixed.

I've managed to complete this and I'm now taking the time to document my findings.
Q1: Have I missed something so far regarding the expected address-based comparison?
The problem with this comparison is that, after checking /proc/kallsyms, I saw that sys_close and other related symbols are also no longer exported. I already knew this for some symbols, but I was still under the (mistaken) impression that some others were still available. So the check I was using (below) evaluates to true and causes the module to fail the 'safety' check.
if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
/* ... */
}
In short, you simply need to trust the assumption about the system call table address retrieved from the System.map-$(uname -r) file. The 'safety' check is unnecessary and will also not work as expected.
Q2: Have I missed something regarding the replaced system calls?
This problem was eventually traced to either one or both of the following header files I had included (I didn't bother to figure out which one.):
#include <uapi/asm-generic/unistd.h>
#include <uapi/asm-generic/errno-base.h>
These were causing the __NR_* macros to get redefined, and therefore expanded, to incorrect values --at least for the x86_64 architecture. For example, the indices for sys_read and sys_write in the system call table are supposed to be 0 and 1 respectively, but they were getting other values and ended up indexing to completely unexpected locations in the table.
Just removing the header files above fixed the issue without additional code changes.
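A quick way to see which values the __NR_* macros should have on a given machine is to ask the userspace headers. This is only a sanity-check sketch, assuming glibc's <sys/syscall.h> is available; the demo_* function names are made up for illustration.

```c
#include <sys/syscall.h>   /* brings in the __NR_* values for the build target */

/* Each helper just returns the syscall number the toolchain headers define.
 * On x86_64 these are expected to be 0, 1 and 3 respectively. */
long demo_nr_read(void)  { return __NR_read; }
long demo_nr_write(void) { return __NR_write; }
long demo_nr_close(void) { return __NR_close; }
```

Printing these from a small test program and comparing them with what the module logs for the same macros makes a header mix-up like the one above show up immediately.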
Q3: I don't know what causes the error message; any ideas here?
The error message was a side-effect of the previous issue. Obviously, the fact that the system call table was being indexed incorrectly (see Q2) caused other locations in memory to get modified.
Q4: ...the comments above on why __NR_open is not available by default?
This was a mis-report of the IDE, which I stopped using. The __NR_open macro was already defined; the fix on Q2 made it even more obvious.
Q5: ...whether it's a good idea to do what I'm doing with the #define?
Short answer: No, not a good idea and definitely not needed. See Q2 above.
Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open
Based on the answers to the previous questions, this is not a problem. Using __NR_open is just fine and expected. This part had gotten messed up due to the header files in Q2.
Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?
The crashes when using __NR_openat were likely caused by the macro being expanded to an incorrect value (see Q2 again). However, I can say that I had no real need to use it. I was supposed to be using __NR_open as specified above, but was trying out __NR_openat as a workaround for the issue fixed in Q2.
In short, the answer to Q2 helped fix several issues in a cascading effect.

How to use the function from a custom kernel module?

I have successfully implemented a custom syscall, getpeuid(), and now I need to write a dynamically loadable module that exports a function with exactly the same functionality as that syscall. The syscall gets the euid of the calling process's parent process. Here is the relevant part of the custom module:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <asm/uaccess.h>
#include <linux/cred.h>
static int *getpeuid(pid_t pid, uid_t *uid)
{
// Code to get the parent process euid
......;
}
EXPORT_SYMBOL(getpeuid);
/* This function is called when the module is loaded. */
int getpeuid_init(void)
{
printk(KERN_INFO "getpeuid() loaded\n");
return 0;
}
/* This function is called when the module is removed. */
void getpeuid_exit(void) {
printk(KERN_INFO "Removing getpeuid()\n");
}
/* Macros for registering module entry and exit points. */
module_init( getpeuid_init );
module_exit( getpeuid_exit );
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Return parent euid.");
MODULE_AUTHOR("CGG");
I have successfully compiled this module and inserted it into the kernel. Then I wrote a program to test the function exported from the loadable kernel module:
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
pid_t pid;
uid_t *uid;
uid = (uid_t *)malloc(sizeof(uid_t));
pid = getppid();
int retval = getpeuid(pid, uid);
if( retval < 0 )
{
perror("My system call returned with an error code.");
}
printf("My syscall's parameters: %ld \n", pid);
printf("My system call returned %d.\n", retval);
printf("Current values: uid=%ld \n", *uid);
return 0;
}
But when I am compiling the test script, it gives me the following error:
/tmp/ccV8WTx0.o: In function 'main':
hw5-test.c:(.text+0x33): undefined reference to `supermom'
collect2: error: ld returned 1 exit status
I checked the available symbols in the system using cat /proc/kallsyms, and the symbol I exported is there:
fa0eb000 T getpeuid [getpeuid]
I just don't know how I am supposed to use my custom function, since I don't have a header file for my module to include in my test program. Even if I need to write a header file, I don't know how to write one for a custom kernel module.
Could someone give me a hand here?
Thanks in advance!
EDIT:
I am only allowed to use the dynamically loadable kernel module to simulate the functionality of the syscall.
EDIT:
I am not allowed to modify the system call table in the module initialization code.
I got the following link from others as a hint:
https://www.linux.com/learn/linux-career-center/31161-the-kernel-newbie-corner-kernel-symbols-whats-available-to-your-module-what-isnt

Use sysfs
Check out the list of various Linux kernel <--> userspace interfaces.
To allow userspace to interact with a loadable kernel module, consider using sysfs.
To add support for sysfs within your loadable module, check out the basics of a sysfs entry.
A good guide to the best practices for creating sysfs entries should get you started the right way.
The userspace test will then change from
int retval = getpeuid(pid, uid);
to something that uses open(), write() and read()
to interact with the sysfs entry just like a regular file.
( Why file? because everything is a file on UNIX. )
You could further simplify this to using a shell-script that uses echo/cat commands to pass/gather data from the loadable kernel module via the sysfs entry.
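As a rough sketch of what the userspace side could look like: the helpers below write a request to an attribute file and read the reply back as text. The function names and the path /sys/kernel/getpeuid/peuid used in the note afterwards are hypothetical; the real path depends entirely on how the module registers its sysfs entry.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the string `request` to the attribute file at `path`.
 * Returns 0 on success, -1 on any error. */
int demo_attr_store(const char *path, const char *request)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(request);
    ssize_t n = write(fd, request, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Read the attribute file at `path` into `buf` as a NUL-terminated string.
 * Returns the number of bytes read, or -1 on error. */
ssize_t demo_attr_show(const char *path, char *buf, size_t size)
{
    if (size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, size - 1);
    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

The test program would then call something like demo_attr_store("/sys/kernel/getpeuid/peuid", "1234") to pass in a pid and demo_attr_show() on the same (hypothetical) path to get the euid back as text, instead of trying to link against getpeuid() directly.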
Alternate option : A beautiful/ugly Hack
Disclaimer: I agree that trying to use syscalls within a loadable kernel module is neither a proper solution, nor guaranteed to always work. I know what i am doing.
Check out this answer and related code that describes a potential "hack" to allow implementing custom syscalls in loadable modules in any unused locations within the current syscall table of the kernel.
Also carefully go through the several answers/comments to this question. They deal with overcoming the problem of not being able to modify the syscall table. One of the comments also emphasises the fact that hypervisors implementing their own extensions are not likely to be affected by this "exploit" as they offer better protection of the syscall table.
Note that such non-standard interfaces may not always work and even if they do, they can stop working anytime. Stick to standard interfaces for reliability.

EXPORT_SYMBOL exports the symbol within the kernel, so that other kernel modules can use it. It will not make it available to userland programs.
Adding a new system call doesn't appear to be possible via a kernel module:
https://unix.stackexchange.com/questions/47701/adding-a-new-system-call-to-linux-3-2-x-with-a-loadable-kernel-module

Use of "__kprobes" and how it works?

While reading the memory-management code of the Linux kernel, I found some functions that are not clear to me. One of them is shown below:
static inline int __kprobes notify_page_fault(struct pt_regs *regs)
{
int ret = 0;
/* kprobe_running() needs smp_processor_id() */
if (kprobes_built_in() && !user_mode_vm(regs)) {
preempt_disable();
if (kprobe_running() && kprobe_fault_handler(regs, 14))
ret = 1;
preempt_enable();
}
return ret;
}
I am confused by the "__kprobes" between the return type and the function name. When I looked up the definition of "__kprobes" in compiler.h, I found this:
/*Ignore/forbid kprobes attach on very low level functions marked by
this attribute:*/
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
#else
# define __kprobes
#endif
Well, I know that at compile time __kprobes is going to be replaced with its definition.
Questions:
1.) What is the significance of __attribute__((__section__(".kprobes.text")))?
and
2.) What does it do at compile time and at run time when it is used before "function_name"?
I read about kprobes and found that they have something to do with breakpoints and backtraces. My understanding is that kprobes help a debugger create backtraces and breakpoints. Could someone please explain in simple words how this really works, and correct me if I am wrong?

TL;DR
__attribute__((__section__(".kprobes.text"))) places the function in a separate section that kprobes refuses to probe, thus preventing infinite breakpoints.
You must put it before "function_name" so that the whole "function_name" symbol is placed in that separate section.
Real answer
kprobes (kernel probes) is a Linux kernel mechanism for dynamic tracing. It allows you to insert a breakpoint at almost any kernel function, invoke your handler, and then continue executing. It works by patching the kernel image at runtime with a so-called kernel probe (kprobe); see struct kprobe. The probe lets you pass control to your handler, which usually does some tracing.
So, here is what's going on under the hood:
1. You create your struct kprobe by defining the address at which to break and the handler to pass control to.
2. You register your probe with register_kprobe.
3. The kernel kprobe subsystem finds the address from your probe.
4. Then kprobes:
   - inserts a breakpoint CPU instruction (int 3 for x86) at the given address
   - adds some wrapper code to save the context (registers, etc.)
   - adds even more code to help you get access to function arguments or return values.
5. Now, when kernel execution hits that probed address:
   - it will fall into a CPU trap
   - it will save the context
   - it will pass control to your handler via notifier_call_chain
   - ...
   - after all that, it will restore the context
That's how it works. As you can see, it's a really neat and dirty hack, but some kernel functions are so terribly low-level that it's just pointless to probe them. notify_page_fault is one of those functions: as a part of notifier_call_chain, it's used in passing control to your handler.
So if you probe notify_page_fault, you'll get an infinite loop of breakpoints, which is not what you want. What you really want is to protect that kind of function, and kprobes does this by placing it in the separate section .kprobes.text. This prevents probing those functions, because kprobes will not accept an address in that section. And that's the job of __attribute__((__section__(".kprobes.text"))).
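The section trick can be tried out in userspace. The sketch below is only an analogy, not kernel code: it assumes GCC or Clang with the GNU linker on an ELF system, where the linker provides __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier (which is why the made-up section name here has no leading dot). All demo_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical marker modelled on __kprobes: put the function into its own
 * section. noinline/used keep the compiler from inlining or dropping it. */
#define demo_noprobe \
    __attribute__((section("demo_noprobe_text"), noinline, used))

/* Section boundaries, generated by the GNU linker for this section name. */
extern char __start_demo_noprobe_text[];
extern char __stop_demo_noprobe_text[];

/* Stands in for a low-level function that must never be probed. */
demo_noprobe int demo_low_level_helper(int x)
{
    return x + 1;
}

/* An ordinary function; it lands in the normal .text section. */
__attribute__((noinline)) int demo_ordinary_function(int x)
{
    return x * 2;
}

/* Returns 1 if a probe at addr would be accepted, 0 if addr lies inside
 * the protected section. This is a plain address range check. */
int demo_probe_allowed(uintptr_t addr)
{
    uintptr_t start = (uintptr_t)__start_demo_noprobe_text;
    uintptr_t end   = (uintptr_t)__stop_demo_noprobe_text;

    return !(addr >= start && addr < end);
}
```

Here demo_probe_allowed((uintptr_t)&demo_low_level_helper) returns 0 while the same call for demo_ordinary_function returns 1. The attribute itself does nothing at run time; it only decides where the linker puts the code, and the registration side can then refuse any address that falls inside that range.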

Linux device driver that periodically prints information

I need to write a Linux device driver that periodically prints some information. The information should be printed until the module is unloaded. I would like to write something like this:
int boolean = 1;
static int hello_init(void)
{
while(boolean){
printk(KERN_ALERT "An information\n");
msleep(1000);
}
return 0;
}
static void hello_exit(void)
{
boolean=0;
printk(KERN_ALERT "Goodbye, cruel world\n");
}
module_init(hello_init);
module_exit(hello_exit);
Obviously, this code doesn't work (I suppose because __init and __exit can't run concurrently, so the boolean value never changes). Can anyone help me solve this problem?

If the task you are performing periodically needs to go to sleep, you may not be able to use timer functions. Delayed workqueues can be used in that situation -- they are not as precise as the hrtimer but if the timing requirements aren't too strict, they work just fine.
I recently posted a question about doing things periodically here:
Calling spi_write periodically in a linux driver
I posted a workqueue example in it that you may find useful.
I also found this documentation to be helpful:
http://www.makelinux.net/ldd3/chp-7-sect-6
However, some changes have been made to the API since it was published. This article outlines these changes:
http://lwn.net/Articles/211279/

You should set up a timer with hrtimer_start() in hello_init().
The struct hrtimer *timer contains a function pointer that will be called at the time you set. That callback function should contain the printk(). You have to re-arm the timer each time the callback is called.
Don't forget to call hrtimer_cancel() in hello_exit().
You can use the ktime_set() function to calculate the expiry time you want. Have a look here, there are some related and useful functions: High-resolution timers