Hooking into syscall table with module - c

In my early endeavours into kernel programming I'm trying to replace/hook into the ioctl syscall, with the purpose of logging and eventually inspecting every ioctl call done.
The target system is a mips (o32) system with kernel 3.10.
Based on similar projects/examples I've seen for x86 based systems I've arrived at a basic snippet I thought would work. I don't have access to a System.map but I noticed the sys_call_table address, so I based my attempts on the address found in /proc/kallsyms on the target system. I know this address will change from kernel build to build but that doesn't matter at this point; this is for experimental purposes only.
The module in its entirety:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
static u32 **sct = (u32**)0x80008660; // `grep sys_call_table /proc/kallsyms`
asmlinkage int (*ioctl_orig)(s32 fd, u32 cmd, void* addr);
asmlinkage int ioctl_new(s32 fd, u32 cmd, void* addr)
{
    printk("[IOC] Intercepted ioctl 0x%x to addr 0x%p\n", cmd, addr);
    return ioctl_orig(fd, cmd, addr);
}
static int __init _enter(void)
{
    ioctl_orig = (void*)sct[__NR_ioctl];
    sct[__NR_ioctl] = (u32*)ioctl_new;
    printk("[IOC] Original IOCTL addr: %p\n", ioctl_orig);
    printk("[IOC] New IOCTL addr: %p\n", sct[__NR_ioctl]);
    return 0;
}
static void __exit _exit(void)
{
    sct[__NR_ioctl] = (u32 *)ioctl_orig;
    printk("[IOC] Unloaded\n");
}
module_init(_enter);
module_exit(_exit);
MODULE_LICENSE("GPL");
Obviously this doesn't work or I wouldn't be here scraping the walls. The module loads fine and the printks from _enter/_exit do indeed appear, but nothing happens when I do ioctls towards the kernel in any way (I would expect to see the "Intercepted ioctl" message from ioctl_new), which leads me to believe I'm modifying the wrong spot.
Questions:
Obviously: What am I doing wrong?
Can I rely on /proc/kallsyms providing the correct pointer to the beginning of the syscall table?
Am I right in my assumption that the value associated with sys_ioctl in /proc/kallsyms should match *sct[__NR_ioctl] or am I missing something?
Am I casting correctly?
Is this method of modifying the sctable even applicable on mips?

Looking at arch/mips/kernel/ftrace.c leads me to believe that you need to use the table called "sys32_call_table"

What am I doing wrong?
You are trying to modify the system call table from a kernel module. This is unsafe and unsupported. Don't do it.
If you want to inspect system calls, there are a number of better tools available in the kernel, such as ftrace, perf, and SystemTap. Which one is most appropriate for you will depend on your specific requirements.

@alexst provided the right answer!
According to linux/unistd.h for MIPS architecture:
#define __NR_Linux 4000
...
#define __NR_ioctl (__NR_Linux + 54)
So you need to subtract __NR_Linux from __NR_ioctl, e.g.:
ioctl_orig = (void*)sct[__NR_ioctl-__NR_Linux];

Related

Gaining access to heap metadata of a process from within itself

While I can write reasonable C code, my expertise is mainly with Java and so I apologize if this question makes no sense.
I am writing some code to help me do heap analysis. I'm doing this via instrumentation with LLVM. What I'm looking for is a way to access the heap metadata for a process from within itself. Is such a thing possible? I know that information about the heap is stored in many malloc_state structs (main_arena for example). If I can gain access to main_arena, I can start enumerating the different arenas, heaps, bins, etc. As I understand, these variables are all defined statically and so they can't be accessed.
But is there some way of getting this information? For example, could I use /proc/$pid/mem to leak the information somehow?
Once I have this information, I want to basically get information about all the different freelists. So I want, for every bin in each bin type, the number of chunks in the bin and their sizes. For fast, small, and tcache bins I know that I just need the index to figure out the size. I have looked at how these structures are implemented and how to iterate through them. So all I need is to gain access to these internal structures.
I have looked at malloc_info and that is my fallback, but I would also like to get information about tcache and I don't think that is included in malloc_info.
An option I have considered is to build a custom version of glibc that has the malloc_state variables declared non-statically. But from what I can see, it's not very straightforward to build your own custom glibc, as you have to build the entire toolchain. I'm using clang, so I would have to build LLVM from source against my custom glibc (at least this is what I've understood from researching this approach).
I had a similar requirement recently, so I do think that being able to get to main_arena for a given process does have its value, one example being post-mortem memory usage analysis.
Using dl_iterate_phdr and elf.h, it's relatively straightforward to resolve main_arena based on the local symbol:
#define _GNU_SOURCE
#include <fcntl.h>
#include <link.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
// Ignored:
// - Non-x86_64 architectures
// - Resource and error handling
// - Style
static int cb(struct dl_phdr_info *info, size_t size, void *data)
{
    if (strcmp(info->dlpi_name, "/lib64/libc.so.6") == 0) {
        int fd = open(info->dlpi_name, O_RDONLY);
        struct stat stat;
        fstat(fd, &stat);
        char *base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        Elf64_Ehdr *header = (Elf64_Ehdr *)base;
        Elf64_Shdr *secs = (Elf64_Shdr*)(base+header->e_shoff);
        for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
            if (secs[secinx].sh_type == SHT_SYMTAB) {
                Elf64_Sym *symtab = (Elf64_Sym *)(base+secs[secinx].sh_offset);
                char *symnames = (char *)(base + secs[secs[secinx].sh_link].sh_offset);
                unsigned symcount = secs[secinx].sh_size/secs[secinx].sh_entsize;
                for (unsigned syminx = 0; syminx < symcount; syminx++) {
                    if (strcmp(symnames+symtab[syminx].st_name, "main_arena") == 0) {
                        void *mainarena = ((char *)info->dlpi_addr)+symtab[syminx].st_value;
                        printf("main_arena found: %p\n", mainarena);
                        raise(SIGTRAP);
                        return 0;
                    }
                }
            }
        }
    }
    return 0;
}
int main()
{
    dl_iterate_phdr(cb, NULL);
    return 0;
}
dl_iterate_phdr is used to get the base address of the mapped glibc. The mapping does not contain the symbol table needed (.symtab), so the library has to be mapped again. The final address is determined by the base address plus the symbol value.
(gdb) run
Starting program: a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77f0700 (LWP 24834)]
main_arena found: 0x7ffff7baec60
Thread 1 "a.out" received signal SIGTRAP, Trace/breakpoint trap.
raise (sig=5) at ../sysdeps/unix/sysv/linux/raise.c:50
50 return ret;
(gdb) select 1
(gdb) print mainarena
$1 = (void *) 0x7ffff7baec60 <main_arena>
(gdb) print &main_arena
$3 = (struct malloc_state *) 0x7ffff7baec60 <main_arena>
The value matches that of main_arena, so the correct address was found.
There are other ways to get to main_arena without relying on the library itself. Walking the existing heap allows for discovering main_arena, for example, but that strategy is considerably less straightforward.
Of course, once you have main_arena, you need all internal type definitions to be able to inspect the data.
I am writing some code to help me do heap analysis.
What kind of heap analysis?
I want to basically get information about all the different freelists. So I want, for every bin in each bin type, the number of chunks in the bin and their sizes. For fast, small, and tcache bins I know that I just need the index to figure out the size.
This information only makes sense if you are planning to change the malloc implementation. It does not make sense to attempt to collect it if your goal is to analyze or improve heap usage by the application, so it sounds like you have an XY problem.
In addition, things like bin and tcache only make sense in a context of particular malloc implementation (TCMalloc and jemalloc would not have any bins).
For analysis of application heap usage, you may want to use TCMalloc, as it provides a lot of tools for heap profiling and introspection.

Can a Linux kernel module use UIO if it does not use any physical hardware?

I am planning on building a Linux kernel module which will need to interface with a user-space device driver, and I will need to export data to user-space. After some reading I figured that the UIO interface might be what I need.
I looked at some examples and they are all based on the assumption that the kernel module itself will interact directly with hardware, and reference things like a device structure, interrupts, etc.
Is it possible to write a software only kernel module and still use the UIO library? Or would just using sysfs directly be a better approach?
EDIT: I am attaching some test code I was working on. The goal was to try and read a string from user-space through the UIO interface, but I don't think this will work since I cannot see how to properly initiate a struct device which I think is required for uio_register_device.
#include <linux/module.h> // Needed by all modules
#include <linux/kernel.h> // Needed for KERN_ALERT
#include <linux/uio_driver.h>
#include <linux/slab.h> // GFP_ defs
#include <linux/device.h>
char test_data[] = "This is some test data to read in user-space via UIO\n";
int init_module(void)
{
    struct uio_info *info;
    struct device *dev;
    info = kzalloc(sizeof(struct uio_info), GFP_KERNEL);
    if (!info)
        return -ENOMEM;
    // need to use struct device for uio_register_device
    dev = kzalloc(sizeof(struct device), GFP_KERNEL);
    dev->parent = 0;
    dev->init_name = "UIO test driver";
    info->name = "uio_test";
    info->version = "0.0.1";
    info->mem[0].size = sizeof(test_data);
    info->mem[0].memtype = UIO_MEM_LOGICAL;
    info->mem[0].addr = (phys_addr_t) kmalloc(sizeof(test_data), GFP_KERNEL);
    snprintf((char *) info->mem[0].addr, sizeof(test_data), "%s", test_data);
    info->irq = UIO_IRQ_NONE;
    // now we need to register the device for it to create /dev/uioN and sysfs files
    if (uio_register_device(dev, info)) {
        printk(KERN_ALERT "uio_test: couldn't register UIO device\n");
        kfree(dev);
        kfree((char *) info->mem[0].addr);
        kfree(info);
        return -ENODEV;
    }
    printk(KERN_ALERT "uio_test: init complete\n");
    return 0;
}
void cleanup_module(void)
{
    printk(KERN_ALERT "uio_test: exit\n");
}
MODULE_LICENSE("GPL");
The whole point behind the kernel driver is to talk to hardware. If you don't have any hardware, then you probably don't need a kernel driver at all.
What is the kernel module doing if it isn't talking to hardware? Where is it getting its data from? To answer your question, it is entirely possible to write a kernel driver that doesn't actually talk to hardware and still talks to UIO, but I'm not sure what it would actually say.

Workqueue implementation in Linux Kernel

Can anyone help me understand the difference between the following two APIs in the Linux kernel:
struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);
I wrote sample modules using each of them. When I look at them with ps -aef, both have created a workqueue, but I was not able to see any difference.
I have referred to http://www.makelinux.net/ldd3/chp-7-sect-6, and according to LDD3:
If you use create_workqueue, you get a workqueue that has a dedicated thread for each processor on the system. In many cases, all those threads are simply overkill; if a single worker thread will suffice, create the workqueue with create_singlethread_workqueue instead.
But I was not able to see multiple worker threads (one per processor).
Workqueues have changed since LDD3 was written.
These two functions are actually macros:
#define create_workqueue(name) \
    alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
#define create_singlethread_workqueue(name) \
    alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))
The alloc_workqueue documentation says:
Allocate a workqueue with the specified parameters. For detailed
information on WQ_* flags, please refer to Documentation/workqueue.txt.
That file is too big to quote entirely, but it says:
alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal.
[...]
A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.
if (singlethread) {
    cwq = init_cpu_workqueue(wq, singlethread_cpu);
    err = create_workqueue_thread(cwq, singlethread_cpu);
    start_workqueue_thread(cwq, -1);
} else {
    list_add(&wq->list, &workqueues);
    for_each_possible_cpu(cpu) {
        cwq = init_cpu_workqueue(wq, cpu);
        err = create_workqueue_thread(cwq, cpu);
        start_workqueue_thread(cwq, cpu);
    }
}

How to set breakpoint to obtain address of a function in fork.c , in the kernel source?

Good day to all. I have a question that I hope someone can help me with; thanks in advance. I have spent hours searching but have been unable to find a solution.
My problem:
I need to obtain the address of the security_task_create(clone_flags) call in the following code snippet (line 926 of fork.c, as per /usr/src/linux-2.6.27/kernel/fork.c):
************************************ ************************************
static struct task_struct *copy_process(unsigned long clone_flags,
unsigned long stack_start,
struct pt_regs *regs,
unsigned long stack_size,
int __user *child_tidptr,
struct pid *pid,
int trace)
{
int retval;
struct task_struct *p;
int cgroup_callbacks_done = 0;
if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
/*
* Thread groups must share signals as well, and detached threads
* can only be started up within the thread group.
*/
if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
return ERR_PTR(-EINVAL);
/*
* Shared signal handlers imply shared VM. By way of the above,
* thread groups also imply shared VM. Blocking this case allows
* for various simplifications in other code.
*/
if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
return ERR_PTR(-EINVAL);
retval = security_task_create(clone_flags); /* <-- the call in question */
if (retval)
goto fork_out;
retval = -ENOMEM;
p = dup_task_struct(current);
if (!p)
goto fork_out;
rt_mutex_init_task(p);
************************************ ************************************
I've enabled KDB access over the keyboard on my Fedora Core 16 machine with kernel 3.1.7. Upon entering the KDB console (i.e. at the kdb[0]> prompt), I typed security_task_create and a hex address, e.g. 0x0040118e, was displayed.
My Questions:
1. Is the displayed hex address the address of security_task_create once the kernel is loaded?
2. If not, how can I obtain the address of the security_task_create function? How do I configure KDB to obtain it?
What I have in mind is to insert a breakpoint at line 926 in fork.c using KDB, so that it triggers when the kernel runs security_task_create. If that is indeed the proper solution, how do I obtain the address of security_task_create for it?
To get the address of any kernel symbol, simply look it up in the System.map file.
CONFIG_KALLSYMS needs to be enabled in the kernel configuration for the symbols to also be available at run time through /proc/kallsyms.
Just grep for printk in your source directory and I'm sure you'll find tons of examples.
printk(KERN_INFO "fork(): process `%s' used deprecated "
"clone flags 0x%lx\n",
get_task_comm(comm, current),
clone_flags & CLONE_STOPPED);

Unable to handle kernel paging request at X while intercepting the system call [duplicate]

Closed 10 years ago.
Possible Duplicate:
Linux Kernel: System call hooking example
I have been trying to hook system calls at the kernel level. I got the basic idea from this question. The system call I was trying to intercept was fork(). So I looked up the address of sys_call_table in System.map, and it turned out to be 0xc12c9e90. I then wrote the module below.
#include<linux/kernel.h>
#include<linux/module.h>
#include<linux/unistd.h>
#include<linux/semaphore.h>
#include<asm/cacheflush.h>
MODULE_LICENSE("GPL");
void **sys_call_table;
unsigned long addr;
asmlinkage int (*original_call)(struct pt_regs);
asmlinkage int our_call(struct pt_regs regs)
{
    printk("Intercepted sys_fork");
    return original_call(regs);
}
static int __init p_entry(void)
{
    struct page *pg;
    printk(KERN_ALERT "Module Intercept inserted");
    sys_call_table=(void *)0xc12c9e90;
    pg=virt_to_page(sys_call_table);
    addr=(unsigned long)page_address(pg);
    set_memory_rw(addr,1);
    original_call=sys_call_table[__NR_fork];
    sys_call_table[__NR_fork]=our_call;
    set_memory_ro(addr,1);
    return 0;
}
static void __exit p_exit(void)
{
    sys_call_table[__NR_fork]=original_call;
    set_memory_ro(addr,1);
    printk(KERN_ALERT "Module Intercept removed");
}
module_init(p_entry);
module_exit(p_exit);
I compiled the module and tried to insert it into the kernel. Unfortunately, dmesg showed the message BUG: unable to handle kernel paging request at c12c9e98, and here is the elaborate dmesg output
As an experiment to find out the problem, I simply commented out the line
sys_call_table[__NR_fork]=our_call;
After that I recompiled and inserted the module again, and it didn't show any errors. So I concluded that the line above, which assigns the new function into sys_call_table, is the problem. But I don't know what could be causing it or how to solve it. Can anyone help me solve it?
I would expect that your call to set_memory_rw isn't taking effect because you do not flush the TLB, so the CPU's TLB still holds the old read-only mapping when your write to sys_call_table happens. You need to flush the TLB. You can probably use local_flush_tlb().
