I have successfully implemented a custom syscall getpeuid(), and now I need to write a custom dynamically loadable module that exports a function with exactly the same functionality as the custom system call getpeuid(). This syscall gets the euid of the calling process's parent process. Here is the relevant segment of the custom module:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <asm/uaccess.h>
#include <linux/cred.h>
static int *getpeuid(pid_t pid, uid_t *uid)
{
// Code to get the parent process euid
......;
}
EXPORT_SYMBOL(getpeuid);
/* This function is called when the module is loaded. */
int getpeuid_init(void)
{
printk(KERN_INFO "getpeuid() loaded\n");
return 0;
}
/* This function is called when the module is removed. */
void getpeuid_exit(void) {
printk(KERN_INFO "Removing getpeuid()\n");
}
/* Macros for registering module entry and exit points. */
module_init( getpeuid_init );
module_exit( getpeuid_exit );
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Return parent euid.");
MODULE_AUTHOR("CGG");
I have successfully compiled this custom module and inserted it into the kernel. Then I wrote a test for the function exported from the loadable kernel module:
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
pid_t pid;
uid_t *uid;
uid = (uid_t *)malloc(sizeof(uid_t));
pid = getppid();
int retval = getpeuid(pid, uid);
if( retval < 0 )
{
perror("My system call returned with an error code.");
}
printf("My syscall's parameters: %ld \n", (long)pid);
printf("My system call returned %d.\n", retval);
printf("Current values: uid=%ld \n", (long)*uid);
return 0;
}
But when I compile the test program, I get the following error:
/tmp/ccV8WTx0.o: In function 'main':
hw5-test.c:(.text+0x33): undefined reference to `supermom'
collect2: error: ld returned 1 exit status
I checked the available symbols in the system using cat /proc/kallsyms, and the symbol I exported is there:
fa0eb000 T getpeuid [getpeuid]
I just don't know how I am supposed to use my custom function, since I don't have a header file for my custom module to include in my test program. Even if I need to write a header file, I don't know how to write one for a custom kernel module.
Could someone give me a hand here?
Thanks in advance!
EDIT:
I am only allowed to use the dynamically loadable kernel module to simulate the functionality of the syscall.
EDIT:
I am not allowed to modify the system call table in the module initialization code.
I got the following link from others as a hint:
https://www.linux.com/learn/linux-career-center/31161-the-kernel-newbie-corner-kernel-symbols-whats-available-to-your-module-what-isnt
Use sysfs
Check out the list of various Linux kernel <--> userspace interfaces.
To allow userspace to interact with a loadable kernel module, consider using sysfs.
To add support for sysfs within your loadable module, check out the basics of a sysfs entry.
A good guide to the best practices for creating sysfs entries should get you started the right way.
The userspace test will then change from
int retval = getpeuid(pid, uid);
to something that uses open(), write() and read()
to interact with the sysfs entry just like a regular file.
(Why a file? Because everything is a file on UNIX.)
You could further simplify this to using a shell-script that uses echo/cat commands to pass/gather data from the loadable kernel module via the sysfs entry.
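As a rough illustration of the userspace side, here is a minimal sketch of the write-then-read pattern described above. The helper name attr_query() and any attribute path passed to it are hypothetical; the module must actually create such a sysfs entry, and the format of the data exchanged is whatever the module's show/store handlers define.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `request` to the attribute file at `path`,
 * then read the reply into `reply` (always NUL-terminated on success).
 * Returns the number of bytes read, or -1 on any error.
 */
ssize_t attr_query(const char *path, const char *request,
                   char *reply, size_t reply_len)
{
    size_t len = strlen(request);
    ssize_t n;
    int fd;

    if (reply_len == 0)
        return -1;

    /* Pass the input to the module, like `echo 1234 > attr`. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    n = write(fd, request, len);
    close(fd);
    if (n < 0 || (size_t)n != len)
        return -1;

    /* Collect the result, like `cat attr`. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, reply, reply_len - 1);
    close(fd);
    if (n < 0)
        return -1;

    reply[n] = '\0';
    return n;
}
```

A test program would then call attr_query() with the path of the module's attribute and the parent PID formatted as text, and parse the euid out of the reply instead of calling getpeuid() directly.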
Alternate option : A beautiful/ugly Hack
Disclaimer: I agree that trying to use syscalls within a loadable kernel module is neither a proper solution, nor guaranteed to always work. I know what I am doing.
Check out this answer and the related code, which describe a potential "hack" that allows implementing custom syscalls in loadable modules in any unused slots of the kernel's current syscall table.
Also carefully go through the several answers/comments to this question. They deal with overcoming the problem of not being able to modify the syscall table. One of the comments also emphasises the fact that hypervisors implementing their own extensions are not likely to be affected by this "exploit" as they offer better protection of the syscall table.
Note that such non-standard interfaces may not always work and even if they do, they can stop working anytime. Stick to standard interfaces for reliability.
EXPORT_SYMBOL exports the symbol within the kernel, so that other kernel modules can use it. It will not make it available to userland programs.
Adding a new system call doesn't appear to be possible via a kernel module:
https://unix.stackexchange.com/questions/47701/adding-a-new-system-call-to-linux-3-2-x-with-a-loadable-kernel-module
I am making an online code judge using Replit, and I want to use seccomp to securely run submitted code.
Through reading a few tutorials, I have made a simple test program to test seccomp:
#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
int main(){
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
printf("Message #1\n");
fork();
printf("Message #2\n");
}
When I run the program, Message #2 prints twice, which must mean seccomp didn't do its job of stopping the fork. When I investigate using strace, I notice the following message in the output, though I am not sure what to do with it:
...
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) = -1 EINVAL (Invalid argument)
...
How can I fix this problem, and get seccomp running in strict mode? I do not own a Linux machine, so I am not sure if this problem is specific to Replit, or I am doing something wrong.
Seccomp is already in use on replit. Make your program do prctl(PR_GET_SECCOMP);, or check /proc/self/status, and you'll see it's already active and in filter mode. While I don't see anything about that in prctl's man page, I do in seccomp's (which fails the same way if you try syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL);):
EINVAL A secure computing mode has already been set, and
operation differs from the existing setting.
So if you want to use seccomp strict mode, you'll need to do so somewhere else. Setting up a Linux VM on your computer is easy and free, so that's what I'd recommend.
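A minimal sketch of the check suggested above: query the current mode with prctl(PR_GET_SECCOMP) before attempting to enter strict mode. The helper names are made up for illustration. Note that, per the prctl man page, calling PR_GET_SECCOMP while already in strict mode kills the process, so this is only meaningful before strict mode is entered.

```c
#include <sys/prctl.h>
#include <linux/seccomp.h>

/* Returns the current seccomp mode (0, 1 or 2), or -1 if the query fails. */
int current_seccomp_mode(void)
{
    return prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
}

/* Human-readable name for a mode value returned above. */
const char *seccomp_mode_name(int mode)
{
    switch (mode) {
    case SECCOMP_MODE_DISABLED: return "disabled";
    case SECCOMP_MODE_STRICT:   return "strict";
    case SECCOMP_MODE_FILTER:   return "filter";
    default:                    return "unknown";
    }
}

/*
 * Attempt strict mode only when no mode is set yet. Returns 0 on success,
 * -1 if a mode is already active or the prctl() call itself fails.
 */
int enter_strict_if_unset(void)
{
    if (current_seccomp_mode() != SECCOMP_MODE_DISABLED)
        return -1; /* e.g. filter mode already imposed by the host */
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0);
}
```

Printing seccomp_mode_name(current_seccomp_mode()) at the start of main() shows whether the environment has already put the process in filter mode, which is the situation described in this answer.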
I don't know if it's related to SO. I know that when I use the Linux kernel panic function, its job is to freeze my system, but it takes 1 argument, a message. Where can I actually see the message if my system is completely frozen and I force shutdown my PC by holding the power-off button?
main.c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h> // panic
MODULE_LICENSE("GPL");
static int __init initialization_function(void)
{
panic("Module: my message!\n");
return 0;
}
static void __exit cleanup_funcion(void)
{
printk(KERN_INFO "Module: Cleanup done, exiting.\n");
}
module_init(initialization_function);
module_exit(cleanup_funcion);
By the way, I also don't know where and how I can see the actual oops message.
It goes to the kernel console, the same place where printk() messages go. There is a screenshot in the Wikipedia article on kernel panic.
Usually, you will be able to see it if the kernel panic happens at boot time.
As for what happens if you have a running desktop system, unfortunately I don't remember. Either you won't see it, or the X/Wayland server will crash and you will see the message in the console.
As you have noticed yourself, this is pretty tricky since the system gets frozen. What you can do is to have a look in the system log after reboot. Exactly how that is done depends on the distribution. On a system with systemd you can use journalctl -xe.
The Linux kernel added a user space API to its crypto functions at 2.6 via a new socket family, AF_ALG. Also see crypto: af_alg - User-space interface for Crypto API on LWN.
I'm working with Gentoo, and it requires one to configure and build the kernel. It appears the default settings omit AF_ALG, so I'm [currently] working with a kernel that lacks the support. OpenSSL 1.1.0 has an Engine interface into the crypto API. It's failing its self tests due to lack of support for AF_ALG.
I'd like to know how to detect availability of AF_ALG at both compile time and runtime. I have not found a way to detect availability at compile time. I think we can use alg_get_type to detect runtime availability, but I'm not certain.
How can I determine availability of AF_ALG at compile time and at runtime?
The socket(2) man page has this to say: "Some socket types may not be implemented by all protocol families." But it does not discuss how to detect availability.
The kernel docs cover the API in Chapter 4. User Space Interface, but it does not appear to discuss how to detect availability.
For completeness, it looks like the following kernel configuration parameters need to be set for Gentoo (from Marek Vašut's Utilizing the crypto accelerators):
CONFIG_CRYPTO_USER_API=m
CONFIG_CRYPTO_USER_API_HASH=m
CONFIG_CRYPTO_USER_API_SKCIPHER=m
I'd say the only way to detect at compile time is to write a separate program that will do a runtime check for AF_ALG sockets and create a header file with a define such as #define AF_ALG_AVAILABLE.
Expanding on your answer to the second question, you may want to make sure errno holds EAFNOSUPPORT. Otherwise, another error, such as being out of file descriptors, will make your program falsely believe that AF_ALG is not supported, which could be bad if you are checking for AF_ALG at compile time using my method.
Checking at compile time:
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_alg.h>
int main(){
//Alternatively you can set the path to argv[1]
FILE *f = fopen("/path/to/output/file", "w");
if(f == NULL){
//Handle error; without an output file there is nothing to write to
return 1;
}
int sockfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
if(sockfd == -1){
if(errno == EAFNOSUPPORT){
//Unavailable, put whatever you want here
fprintf(f, "#define AF_ALG_UNAVAILABLE\n");
} else {
//Unable to detect for some other error
}
} else { //AF_ALG is available
fprintf(f, "#define AF_ALG_AVAILABLE\n");
close(sockfd); //Only close a descriptor that was actually opened
}
fclose(f);
return 0;
}
Then just compile and run that in your makefile, and you will find your header file where you put it. Then you can simply select what code to use using #ifdef AF_ALG_AVAILABLE.
I think I can answer some of the second question, runtime availability, with the following:
int ret = socket(AF_ALG, SOCK_SEQPACKET, 0);
if (ret != -1)
close(ret);
int supported = ret != -1;
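Combining this runtime probe with the errno check recommended earlier gives a three-way result. The following is a minimal sketch; the function name af_alg_supported() is made up, and the fallback definition of AF_ALG (38, the value used by the Linux headers) is only there for libc headers that predate it.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38 /* value from the Linux headers; fallback for older libc */
#endif

/*
 * Returns 1 if an AF_ALG socket can be created, 0 if the kernel reports
 * EAFNOSUPPORT, and -1 if the probe failed for some other reason
 * (for example, running out of file descriptors).
 */
int af_alg_supported(void)
{
    int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

    if (fd != -1) {
        close(fd);
        return 1;
    }
    return errno == EAFNOSUPPORT ? 0 : -1;
}
```

Treating the -1 case separately avoids concluding that AF_ALG is missing when the probe merely failed for an unrelated reason.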
Short Background
I'm currently writing a linux kernel module as a project to better understand linux kernel internals. I've written 'hello world'-type modules before, but I want to get beyond that, so I'm trying to replace some common system calls like open, read, write, and close with my own so that I can print a bit more information into the system log.
Some of the content I found while searching was for pre-2.6 kernels, which is not useful because the sys_call_table symbol stopped being exported starting with kernel 2.6.x. On the other hand, the material I found for 2.6.x or later seems to have problems of its own, even though it apparently worked at the time.
One particular O'Reilly article, which I found on the sys_call_table in linux kernel 2.6.18 post, suggests that what I'm trying to do ought to work, but it isn't. (Specifically, see the Intercepting sys_unlink() Using System.map section.)
I also read through the Linux Kernel: System call hooking example and Kernel sys_call_table address does not match address specified in system.map which, while somewhat informative, were not useful for me.
Problems and Questions
Part 1 - Unexpected Address Mismatch
I'm using Linux kernel 4.2.0-16-generic on a Kubuntu 15.10 x86_64 architecture installation. Since the sys_call_table symbol is no longer exported, I grepped the address from the system map file:
# grep 'sys_call_table' < System.map-4.2.0-16-generic
ffffffff818001c0 R sys_call_table
ffffffff81801580 R ia32_sys_call_table
With this in hand, I added the following line to my kernel module:
static unsigned long *syscall_table = (unsigned long *) 0xffffffff818001c0;
Based on this, I was expecting that a simple check would confirm that I was pointing to the location I thought I was pointing to, i.e. the base address of the kernel's unexported sys_call_table. So, I wrote a simple check like the one below into the module's init function to verify:
if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
pr_info("sys_close = 0x%p, syscall_table[__NR_close] = 0x%p\n", sys_close, syscall_table[__NR_close]);
return -ENXIO;
}
This check failed and different addresses were printed in the log.
I was not expecting the body of this if statement to get executed because I thought the address returned by syscall_table[__NR_close] would be the same as that of sys_close, but it does enter.
Q1: Have I missed something so far regarding the expected address-based comparison? If so, what?
Part 2 - Partially Successful?
If I remove this check, it seems I'm partially successful, because, apparently, I can at least replace the read call successfully using the code below:
static asmlinkage ssize_t (*original_read)(unsigned int fd, char __user *buf, size_t count);
// ...
static void systrap_replace_syscalls(void)
{
pr_debug("systrap: replacing system calls\n");
original_read = syscall_table[__NR_read];
original_write = syscall_table[__NR_write];
original_close = syscall_table[__NR_close];
write_cr0(read_cr0() & ~0x10000);
syscall_table[__NR_read] = systrap_read;
syscall_table[__NR_write] = systrap_write;
syscall_table[__NR_close] = systrap_close;
write_cr0(read_cr0() | 0x10000);
pr_debug("systrap: system calls replaced\n");
}
My replacement functions simply print a message and forward the call to the actual system call. For example, the read replacement function's code is below:
static asmlinkage ssize_t systrap_read(unsigned int fd, char __user *buf, size_t count)
{
pr_debug("systrap: reading from fd = %u\n", fd);
return original_read(fd, buf, count);
}
And the system log shows the following output when I insmod and rmmod the module:
kernel: [23226.797460] systrap: setting up module
kernel: [23226.797462] systrap: replacing system calls
kernel: [23226.797464] systrap: system calls replaced
kernel: [23226.797465] systrap: module setup complete
kernel: [23226.864198] systrap: reading from fd = 4279272912
<similar output omitted for brevity>
kernel: [23235.560663] systrap: reading from fd = 2835745072
kernel: [23235.564774] systrap: reading from fd = 861079840
kernel: [23235.564986] systrap: cleaning up module
kernel: [23235.564990] systrap: trying to restore system calls
kernel: [23235.564993] systrap: restored sys_read
kernel: [23235.564995] systrap: restored sys_write
kernel: [23235.564997] systrap: restored sys_close
kernel: [23235.565000] systrap: system call restoration attempt complete
kernel: [23235.565002] systrap: module cleanup complete
I can let it run for a long time and, oddly enough, I never observe entries for the write and close function calls --only for the reads, which is why I thought I was only partially successful.
Q2: Have I missed something regarding the replaced system calls? If so, what?
Part 3 - Unexpected Error Message on rmmod Command
Even though the module seems to operate normally, I always get the following error when I rmmod the module from the kernel:
rmmod: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '(null)/modules.builtin.bin'
My module cleanup function simply calls another one (below) that tries to restore the function calls by doing the opposite of the replacement function above:
// called by the exit function
static void systrap_restore_syscalls(void)
{
pr_debug("systrap: trying to restore system calls\n");
write_cr0(read_cr0() & ~0x10000);
/* make sure no other modules have made changes before restoring */
if(syscall_table[__NR_read] == systrap_read)
{
syscall_table[__NR_read] = original_read;
pr_debug("systrap: restored sys_read\n");
}
else
{
pr_warn("systrap: sys_read not restored; address mismatch\n");
}
// ... omitted: same stuff for other sys calls
write_cr0(read_cr0() | 0x10000);
pr_debug("systrap: system call restoration attempt complete\n");
}
Q3: I don't know what causes the error message; any ideas here?
Part 4 - sys_open Marked for Deprecation?
In another unexpected turn of events, I find that the __NR_open macro is no longer defined by default. In order for me to see the definition, I have to #define __ARCH_WANT_SYSCALL_NO_AT before #include-ing the header files:
/*
* Force __NR_open definition. It seems sys_open has been replaced by sys_openat(?)
* See include/uapi/asm-generic/unistd.h:724-725
*/
#define __ARCH_WANT_SYSCALL_NO_AT
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
// ...
Going through the kernel source code (mentioned in comment above), you find the following comments:
/*
* All syscalls below here should go away really,
* these are provided for both review and as a porting
* help for the C library version.
*
* Last chance: are any of these important enough to
* enable by default?
*/
#ifdef __ARCH_WANT_SYSCALL_NO_AT
#define __NR_open 1024
__SYSCALL(__NR_open, sys_open)
// ...
Can anyone clarify:
Q4: ...the comments above on why __NR_open is not available by default?,
Q5: ...whether it's a good idea to do what I'm doing with the #define?, and
Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?
Epilogue - Crashing My System 😑
I tried using __NR_openat, replacing that call as I had done with the previous ones:
static asmlinkage long systrap_openat(int dfd, const char __user *filename, int flags, umode_t mode)
{
pr_debug("systrap: opening file dfd = %d, name = % s\n", filename);
return original_openat(dfd, filename, flags, mode);
}
But this simply helped me unceremoniously crash my own system 😑 by causing other processes to segfault when they tried to open a file, with gems such as:
kernel: [135489.202693] systrap: opening file dfd = 0, name = P^Q
kernel: [135489.202913] zsh[11806]: segfault at 410 ip 00007f3a380abe60 sp 00007ffd04c5b550 error 4 in libc-2.21.so[7f3a37fe1000+1c0000]
Trying to print argument data also showed odd/garbage info.
Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?
I've spent several days trying to work through this and I just hope I've not missed something utterly stupid...
Please, let me know if something's not entirely clear to you in the comments and I'll attempt to clarify.
It'd be most helpful if you could provide some code snippets that actually work and/or point me in a precise-enough direction that would allow me to understand what I'm doing wrong and how to quickly get this fixed.
I've managed to complete this and I'm now taking the time to document my findings.
Q1: Have I missed something so far regarding the expected address-based comparison?
The problem with this comparison is that, after checking /proc/kallsyms, I saw that sys_close and other related symbols are also no longer exported. I already knew this for some symbols, but I was still under the (mistaken) impression that some others were still available. So the check I was using (below) evaluates to true and causes the module to fail the 'safety' check.
if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
/* ... */
}
In short, you simply need to trust the assumption about the system call table address retrieved from the System.map-$(uname -r) file. The 'safety' check is unnecessary and will also not work as expected.
Q2: Have I missed something regarding the replaced system calls?
This problem was eventually traced to either one or both of the following header files I had included (I didn't bother to figure out which one.):
#include <uapi/asm-generic/unistd.h>
#include <uapi/asm-generic/errno-base.h>
These were causing the __NR_* macros to get redefined, and therefore expanded, to incorrect values --at least for the x86_64 architecture. For example, the indices for sys_read and sys_write in the system call table are supposed to be 0 and 1 respectively, but they were getting other values and ended up indexing to completely unexpected locations in the table.
Just removing the header files above fixed the issue without additional code changes.
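One way to cross-check what the module sees is to print the same macros from a small userspace program built against the system's own headers, and confirm that they really reach the kernel entry points they name. This is only a sketch under that assumption: the helper names are made up, and it checks the userspace headers, not the kernel build's include path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Print the syscall numbers the installed headers define for this arch. */
void print_syscall_numbers(void)
{
    printf("__NR_read  = %d\n", (int)__NR_read);
    printf("__NR_write = %d\n", (int)__NR_write);
    printf("__NR_close = %d\n", (int)__NR_close);
}

/*
 * Returns 1 if invoking __NR_getpid through syscall() yields the same
 * value as getpid(), i.e. the number from the headers reaches the
 * expected kernel entry point; returns 0 otherwise.
 */
int syscall_numbers_consistent(void)
{
    return syscall(__NR_getpid) == (long)getpid();
}
```

If the module logs different values for the same macros with pr_info(), that points at the header mix-up described above.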
Q3: I don't know what causes the error message; any ideas here?
The error message was a side-effect of the previous issue. Obviously, the fact that the system call table was being indexed incorrectly (see Q2) caused other locations in memory to get modified.
Q4: ...the comments above on why __NR_open is not available by default?
This was a mis-report of the IDE, which I stopped using. The __NR_open macro was already defined; the fix on Q2 made it even more obvious.
Q5: ...whether it's a good idea to do what I'm doing with the #define?
Short answer: No, not a good idea and definitely not needed. See Q2 above.
Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open
Based on answers to previous questions, this is not a problem. Using __NR_open is just fine and expected. This part had gotten messed up due to the header files in Q2
Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?
The crashes when using __NR_openat were likely caused by the macro being expanded to an incorrect value (see Q2 again). However, I can say that I had no real need to use it. I was supposed to be using __NR_open as specified above, but was trying out __NR_openat as a workaround for the issue fixed in Q2.
In short, the answer to Q2 helped fix several issues in a cascading effect.
I have been trying to intercept a system call at the kernel level. I got the basic idea from this question. The system call I was trying to intercept was fork(). So I found the address of sys_fork() in System.map, and it turned out to be 0xc1010e0c. Then I wrote the module below.
#include<linux/kernel.h>
#include<linux/module.h>
#include<linux/unistd.h>
#include<linux/semaphore.h>
#include<asm/cacheflush.h>
MODULE_LICENSE("GPL");
void **sys_call_table;
asmlinkage int (*original_call)(struct pt_regs);
asmlinkage int our_call(struct pt_regs regs)
{
printk("Intercepted sys_fork");
return original_call(regs);
}
static int __init p_entry(void)
{
printk(KERN_ALERT "Module Intercept inserted");
sys_call_table=(void *)0xc1010e0c;
original_call=sys_call_table[__NR_open];
set_memory_rw((long unsigned int)sys_call_table,1);
sys_call_table[__NR_open]=our_call;
return 0;
}
static void __exit p_exit(void)
{
sys_call_table[__NR_open]=original_call;
set_memory_ro((long unsigned int)sys_call_table,1);
printk(KERN_ALERT "Module Intercept removed");
}
module_init(p_entry);
module_exit(p_exit);
However , after compiling the module and when I tried to insert it to the kernel, I got the following from the dmesg output.
Of course, it's not intercepting the system call. Can you help me figure out the problem? I am using version 3.2.0-4-686 of the Linux kernel.
http://lxr.linux.no/linux+*/arch/x86/mm/pageattr.c#L874 says
if (*addr & ~PAGE_MASK) {
*addr &= PAGE_MASK;
/*
* People should not be passing in unaligned addresses:
*/
WARN_ON_ONCE(1);
}
So the warning is because your sys_call_table variable is not page-aligned.
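To make the alignment arithmetic concrete, here is a small userspace sketch that mirrors the `addr & PAGE_MASK` step from the kernel snippet above. The helper names are made up, and the page size is passed in explicitly (it must be a power of two) instead of using the kernel's PAGE_MASK.

```c
#include <stdint.h>

/* Round an address down to the start of the page that contains it. */
uintptr_t page_align_down(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Non-zero when the address already sits on a page boundary. */
int is_page_aligned(uintptr_t addr, uintptr_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}
```

With 4096-byte pages, the address 0xc1010e0c from the question is not page-aligned, and the page containing it starts at 0xc1010000.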
It should be said that patching the system call table is officially discouraged by the kernel maintainers, and they've put some deliberate roadblocks in your way -- you've probably already noticed that you can't access the real sys_call_table symbol, and the write protection is also deliberate. If you can possibly find another way to do what you want, then you should. Depending on your larger goal, you might be able to accomplish it using ptrace and no kernel module at all. The trace_sched_process_fork hook may also be useful.
original_call=sys_call_table[__NR_open];
....
sys_call_table[__NR_open]=our_call;
If you're intercepting fork, the entry for open is not what you want to change.
And instead of the address of the sys_fork() from System.map, you should have used the address of sys_call_table.
It is not clear whether you solved your problem, but depending on how you test your module, note that glibc doesn't use sys_fork anymore; it uses sys_clone instead.