Syscall hijacking from kernel - c

I want to intercept the open() syscall, for testing each time a file is opened by the user, the message “OPEN IS!” should be displayed in dmesg.
The syscall table and open-call addresses in dmesg are displayed, but the message “OPEN IS!” is not visible. Kernel v. 4.18
I would like to know what the problem is. The code:
unsigned long cr0;
static unsigned long *__sys_call_table;
typedef asmlinkage int (*orig_open_t)(const char *, int, int);
orig_open_t orig_open;
unsigned long *
get_syscall_table_bf(void)
{
unsigned long *syscall_table;
unsigned long int i;
for (i = (unsigned long int)ksys_close; i < ULONG_MAX;
i += sizeof(void *)) {
syscall_table = (unsigned long *)i;
if (syscall_table[__NR_close] == (unsigned long)ksys_close) {
printk(KERN_INFO "syscall: %08lx\n", syscall_table);
return syscall_table;
}
}
return NULL;
}
asmlinkage int
hacked_open(const char *filename, int flags, int mode)
{
printk(KERN_INFO "OPEN IS!\n");
return 0;
}
static inline void
protect_memory(void)
{
write_cr0(cr0);
}
static inline void
unprotect_memory(void)
{
write_cr0(cr0 & ~0x00010000);
}
static int __init
diamorphine_init(void)
{
__sys_call_table = get_syscall_table_bf();
if (!__sys_call_table)
return -1;
cr0 = read_cr0();
orig_open = (orig_open_t)__sys_call_table[__NR_open];
unprotect_memory();
__sys_call_table[__NR_open] = (unsigned long)hacked_open;
printk(KERN_INFO "WE DO IT!\n");
printk(KERN_INFO "hacked is: %08lx\n", hacked_open);
protect_memory();
return 0;
}
static void __exit
diamorphine_cleanup(void)
{
unprotect_memory();
__sys_call_table[__NR_open] = (unsigned long)orig_open;
protect_memory();
}
module_init(diamorphine_init);
module_exit(diamorphine_cleanup);
MODULE_LICENSE("GPL");

I'm guessing something in your hooking is wrong. Either you're hooking a wrong offset of the syscall table or you're completely off. I couldn't understand why explicitly you start searching with ksys_close(), especially when it's an inlined function. You should try looking for the syscall table symbol as such:
typedef void (*_syscall_ptr_t)(void);
_syscall_ptr_t *_syscall_table = NULL;
_syscall_table=(_syscall_ptr_t *)kallsyms_lookup_name("sys_call_table");
A different (huge) issue I see with this is resetting CR0, which allows anything within your system to write to a read only memory at the time of your writing, instead of page-walking and setting the W bit on the specific page you're about to edit.
Additional one small word of advice: You should complete your hook to redirect to the original open syscall. Otherwise, you'll result in the entire system reading from STDIN for every newly opened file descriptor (which will kill your system, eventually)

Related

Finding the sys call table in memory 64-bit on 4.x.x Kernel

I'm trying to write a simple kernel module to find the sys_call_table in Linux and am having some trouble. I found a basic guide for 32-bit Linux here: https://memset.wordpress.com/2011/03/18/syscall-hijacking-dynamically-obtain-syscall-table-address-kernel-2-6-x-2/. I tried to adapt it to a modern 64-bit kernel and am having some trouble. My code is here:
#include<linux/init.h>
#include<linux/module.h>
#include<linux/kernel.h>
#include<linux/errno.h>
#include<linux/types.h>
#include<linux/unistd.h>
#include<asm/current.h>
#include<linux/sched.h>
#include<linux/syscalls.h>
#include<linux/utsname.h>
//#include<asm/system.h>
#include<linux/slab.h>
MODULE_LICENSE("GPL");
#define START_MEM PAGE_OFFSET
#define END_MEM ULLONG_MAX
unsigned long long *syscall_table;
unsigned long long **find(void) {
unsigned long long **sctable;
unsigned long long int i = START_MEM;
while ( i < END_MEM) {
sctable = (unsigned long long **)i;
if ( sctable[__NR_close] == (unsigned long long *) sys_close) {
printk(KERN_WARNING "------%p---------\n", (unsigned long long *) sys_close);
return &sctable[0];
}
i += sizeof(void *);
}
return NULL;
}
static int __init init_load(void) {
syscall_table = (unsigned long long *) find();
if (syscall_table != NULL) {
printk(KERN_WARNING "syscall table found at %p\n", syscall_table);
}
else {
printk(KERN_WARNING "syscall table not found!");
}
return 0;
}
static void __exit exit_unload(void) {
return;
}
module_init(init_load);
module_exit(exit_unload);
I compile the code using a simple makefile, no warnings or notices come up and here is what is printed (after insmod-ing):
$ dmesg
<snip>
[39592.352209] ------ffffffff8120a2a0---------
[39592.352214] syscall table found at ffff880001a001c0
</snip>
So all seems well right? But then I check out where the sys_call_table is by running:
punk#punk-vbox:~/dev/m-dev/rootkit-examples$ sudo cat /boot/System.map-4.4.0-36-generic | grep sys_call_table
ffffffff81a001c0 R sys_call_table
ffffffff81a01500 R ia32_sys_call_table
..meaning that my code seemingly gets the location of the sys_call_table totally wrong. So ffff880001a001c0 instead of ffffffff81a001c0.
I'm not entirely sure where to start debugging this. Am I doing something obviously wrong here? What would be a good first step to debugging this on my own? I'm fairly new to writing kernel modules and any help would be appreciated!
Your kernel may have enabled x32 compat.
There are two sys_call_tables in this kind of kernel. compat_sys_call_table(ia23_sys_call_table) for 32-bit and sys_call_table for 64-bit. And they use the same sys_close.
You may find sys_close in compat_sys_call_table, but __NR_close is different between 32-bit unistd.h and 64-bit unistd.h. You may be using 64-bit __NR_close, so you cannot get compat_sys_call_table nor sys_call_table correctly.
You can check my code, ASyScallHookFrame, it works fine on Android kernel 3.10.

How to get "valid data length" of a file?

There is a function to set the "valid data length" value: SetFileValidData, but I didn't find a way to get the "valid data length" value.
I want to know about given file if the EOF is different from the VDL, because writing after the VDL in case of VDL<EOF will cause a performance penalty as described here.
I found this page, claims that:
there is no mechanism to query the value of the VDL
So the answer is "you can't".
If you care about performance you can set the VDL to the EOF, but then note that you may allow access old garbage on your disk - the part between those two pointers, that supposed to be zeros if you would access that file without setting the VDL to point the EOF.
Looked into this. No way to get this information via any API, even the e.g. NtQueryInformationFile API (FileEndOfFileInformation only worked with NtSetInformationFile). So finally I read this by manually reading NTFS records. If anyone has a better way, please tell! This also obviously only works with full system access (and NTFS) and might be out of sync with the in-memory information Windows uses.
#pragma pack(push)
#pragma pack(1)
struct NTFSFileRecord
{
char magic[4];
unsigned short sequence_offset;
unsigned short sequence_size;
uint64 lsn;
unsigned short squence_number;
unsigned short hardlink_count;
unsigned short attribute_offset;
unsigned short flags;
unsigned int real_size;
unsigned int allocated_size;
uint64 base_record;
unsigned short next_id;
//char padding[470];
};
struct MFTAttribute
{
unsigned int type;
unsigned int length;
unsigned char nonresident;
unsigned char name_lenght;
unsigned short name_offset;
unsigned short flags;
unsigned short attribute_id;
unsigned int attribute_length;
unsigned short attribute_offset;
unsigned char indexed_flag;
unsigned char padding1;
//char padding2[488];
};
struct MFTAttributeNonResident
{
unsigned int type;
unsigned int lenght;
unsigned char nonresident;
unsigned char name_length;
unsigned short name_offset;
unsigned short flags;
unsigned short attribute_id;
uint64 starting_vnc;
uint64 last_vnc;
unsigned short run_offset;
unsigned short compression_size;
unsigned int padding;
uint64 allocated_size;
uint64 real_size;
uint64 initial_size;
};
#pragma pack(pop)
HANDLE GetVolumeData(const std::wstring& volfn, NTFS_VOLUME_DATA_BUFFER& vol_data)
{
HANDLE vol = CreateFileW(volfn.c_str(), GENERIC_WRITE | GENERIC_READ,
FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (vol == INVALID_HANDLE_VALUE)
return vol;
DWORD ret_bytes;
BOOL b = DeviceIoControl(vol, FSCTL_GET_NTFS_VOLUME_DATA,
NULL, 0, &vol_data, sizeof(vol_data), &ret_bytes, NULL);
if (!b)
{
CloseHandle(vol);
return INVALID_HANDLE_VALUE;
}
return vol;
}
int64 GetFileValidData(HANDLE file, HANDLE vol, const NTFS_VOLUME_DATA_BUFFER& vol_data)
{
BY_HANDLE_FILE_INFORMATION hfi;
BOOL b = GetFileInformationByHandle(file, &hfi);
if (!b)
return -1;
NTFS_FILE_RECORD_INPUT_BUFFER record_in;
record_in.FileReferenceNumber.HighPart = hfi.nFileIndexHigh;
record_in.FileReferenceNumber.LowPart = hfi.nFileIndexLow;
std::vector<BYTE> buf;
buf.resize(sizeof(NTFS_FILE_RECORD_OUTPUT_BUFFER) + vol_data.BytesPerFileRecordSegment - 1);
NTFS_FILE_RECORD_OUTPUT_BUFFER* record_out = reinterpret_cast<NTFS_FILE_RECORD_OUTPUT_BUFFER*>(buf.data());
DWORD bout;
b = DeviceIoControl(vol, FSCTL_GET_NTFS_FILE_RECORD, &record_in,
sizeof(record_in), record_out, 4096, &bout, NULL);
if (!b)
return -1;
NTFSFileRecord* record = reinterpret_cast<NTFSFileRecord*>(record_out->FileRecordBuffer);
unsigned int currpos = record->attribute_offset;
MFTAttribute* attr = nullptr;
while ( (attr==nullptr ||
attr->type != 0xFFFFFFFF )
&& record_out->FileRecordBuffer + currpos +sizeof(MFTAttribute)<buf.data() + bout)
{
attr = reinterpret_cast<MFTAttribute*>(record_out->FileRecordBuffer + currpos);
if (attr->type == 0x80
&& record_out->FileRecordBuffer + currpos + attr->attribute_offset+sizeof(MFTAttributeNonResident)
< buf.data()+ bout)
{
if (attr->nonresident == 0)
return -1;
MFTAttributeNonResident* dataattr = reinterpret_cast<MFTAttributeNonResident*>(record_out->FileRecordBuffer
+ currpos + attr->attribute_offset);
return dataattr->initial_size;
}
currpos += attr->length;
}
return -1;
}
[...]
NTFS_VOLUME_DATA_BUFFER vol_data;
HANDLE vol = GetVolumeData(L"\\??\\D:", vol_data);
if (vol != INVALID_HANDLE_VALUE)
{
int64 vdl = GetFileValidData(alloc_test->getOsHandle(), vol, vol_data);
if(vdl>=0) { [...] }
[...]
}
[...]
The SetValidData (according to MSDN) can be used to create for example a large file without having to write to the file. For a database this will allocate a (contiguous) storage area.
As a result, it seems the file size on disk will have changed without any data having been written to the file.
By implication, any GetValidData (which does not exist) just returns the size of the file, so you can use GetFileSize which returns the "valid" file size.
I think you are confused as to what "valid data length" actually means. Check this answer.
Basically, while SetEndOfFile lets you increase the length of a file quickly, and allocates the disk space, if you skip to the (new) end-of-file to write there, all the additionally allocated disk space would need to be overwritten with zeroes, which is kind of slow.
SetFileValidData lets you skip that zeroing-out. You're telling the system, "I am OK with whatever is in those disk blocks, get on with it". (This is why you need the SE_MANAGE_VOLUME_NAME priviledge, as it could reveal priviledged data to unpriviledged users if you don't overwrite the data. Users with this priviledge can access the raw drive data anyway.)
In either case, you have set the new effective size of the file. (Which you can read back.) What, exactly, should a seperate "read file valid data" report back? SetFileValidData told the system that whatever is in those disk blocks is "valid"...
Different approach of explanation:
The documentation mentions that the "valid data length" is being tracked; the purpose for this is for the system to know which range (from end-of-valid-data to end-of-file) it still needs to zero out, in the context of SetEndOfFile, when necessary (e.g. you closing the file). You don't need to read back this value, because the only way it could be different from the actual file size is because you, yourself, did change it via the aforementioned functions...

What does sparc_do_fork() exactly do?

I tried to find out the functionality of this function but I couldn't.. It is defined in Linux/arch/sparc/kernel/process_32.c Thanks
asmlinkage int sparc_do_fork(unsigned long clone_flags,
unsigned long stack_start,
struct pt_regs *regs,
unsigned long stack_size)
{
unsigned long parent_tid_ptr, child_tid_ptr;
unsigned long orig_i1 = regs->u_regs[UREG_I1];
long ret;
parent_tid_ptr = regs->u_regs[UREG_I2];
child_tid_ptr = regs->u_regs[UREG_I4];
ret = do_fork(clone_flags, stack_start, stack_size,
(int __user *) parent_tid_ptr,
(int __user *) child_tid_ptr);
/* If we get an error and potentially restart the system
* call, we're screwed because copy_thread() clobbered
* the parent's %o1. So detect that case and restore it
* here.
*/
if ((unsigned long)ret >= -ERESTART_RESTARTBLOCK)
regs->u_regs[UREG_I1] = orig_i1;
return ret;
}
It appears to be right there in the source code.
It's a wrapper around the regular Linux do_fork() call, one which saves and restores data (specifically, regs->u_regs[UREG_I1], which equates to the SPARC output register 1) that would otherwise be corrupted under certain circumstances:
/* If we get an error and potentially restart the system
* call, we're screwed because copy_thread() clobbered
* the parent's %o1. So detect that case and restore it
* here.
*/
It does this with:
unsigned long orig_i1 = regs->u_regs[UREG_I1]; // Save it.
ret = do_fork(...);
if ((unsigned long)ret >= -ERESTART_RESTARTBLOCK) // It may be corrupt
regs->u_regs[UREG_I1] = orig_i1; // so restore it.

Why do I get -38 error, while trying to insmod a kernel module probing do_fork?

I am trying to insmod a jprobe module to a rooted Android phone:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
/*
* Jumper probe for do_fork.
* Mirror principle enables access to arguments of the probed routine
* from the probe handler.
*/
/* Proxy routine having the same arguments as actual do_fork() routine */
static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
struct pt_regs *regs, unsigned long stack_size,
int __user *parent_tidptr, int __user *child_tidptr)
{
printk(KERN_INFO "jprobe: clone_flags = 0x%lx, stack_size = 0x%lx,"
" regs = 0x%p\n",
clone_flags, stack_size, regs);
/* Always end with a call to jprobe_return(). */
jprobe_return();
return 0;
}
static struct jprobe my_jprobe = {
.entry = jdo_fork,
.kp = {
.symbol_name = "do_fork",
},
};
static int __init jprobe_init(void)
{
int ret;
ret = register_jprobe(&my_jprobe);
if (ret < 0) {
printk(KERN_INFO "register_jprobe failed, returned %d\n", ret);
return -1;
}
printk(KERN_INFO "Planted jprobe at %p, handler addr %p\n",
my_jprobe.kp.addr, my_jprobe.entry);
return 0;
}
static void __exit jprobe_exit(void)
{
unregister_jprobe(&my_jprobe);
printk(KERN_INFO "jprobe at %p unregistered\n", my_jprobe.kp.addr);
}
module_init(jprobe_init)
module_exit(jprobe_exit)
MODULE_LICENSE("GPL");
but it is failed:
root#android:# insmod my_jprobe.ko
[3223.32]register_jprobe failed, returned -38
I get -38 error, and couldn't understand what is it, the only return value on failure I saw is -22, is it possible to insmod a jprobe module on arm based chip?
do_fork is in the System.map and is in the object table.
What flags do I need to turn on in the config file to support jpobes?
If you don't have register_probe or register_kprobe in your System.map, that means that CONFIG_KPROBES is not enabled in your current kernel config.
You would need to build kernel for your platform with it enabled and then try your module.
CONFIG_OPTPROBES=y
CONFIG_PREEMPT=y
CONFIG_OPTPROBES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULES=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_DEBUG_INFO=y
And one more config flag I needed specific to my platform:
CONFIG_MODULE_FORCE_LOAD=y
http://www.kernel.org/doc/Documentation/kprobes.txt
I was facing the same issue with Linux 4.17.0.
I found that jprobes has been abolished after 4.15:
https://lwn.net/Articles/735667/

why is access_ok failing for this ioctl

EDIT: I don't have a good answer yet as to why I'm getting a failure here... So let me rephrase this a little. Do I even need the verify_area() check? What is the point of that? I have tested out the fact that my structure gets passed successfully to this ioctl, I'm thinking of just removing the failing check, but I'm not 100% what it's in there to do. Thoughts?
END EDIT
I'm working to update some older linux kernel drivers and while testing one out I'm getting a failure which seems odd to me. Here we go:
I have a simple ioctl call in user space:
Config_par_t cfg;
int ret;
cfg.target = CONF_TIMING;
cfg.val1 = nBaud;
ret = ioctl(fd, CAN_CONFIG, &cfg);
The Config_par_t is defined in can4linux.h file (this is the CAN driver that comes with uCLinux):
typedef struct Command_par {
int cmd; /**< special driver command */
int target; /**< special configuration target */
unsigned long val1; /**< 1. parameter for the target */
unsigned long val2; /**< 2. parameter for the target */
int error; /**< return value */
unsigned long retval; /**< return value */
} Command_par_t ;
In the kernel side of things, the ioctl function calls verify_area, which is the failing procedure:
long can_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
void *argp;
long retval = -EIO;
Message_par_t Message;
Command_par_t Command;
struct inode *inode = file->f_path.dentry->d_inode;
argp = &Message;
Can_errno = 0;
switch(cmd) {
case CONFIG:
if( verify_area(VERIFY_READ, (void *) arg, sizeof(Command_par_t))) {
return(retval);
}
Now I know that verify_area() isn't used anymore so I updated it in a header file with this macro to access_ok:
#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 0)
#define verify_area(type, addr, size) access_ok(type, addr, size)
#endif
I'm on a x86 platform so I'm pretty sure the actual access_ok() macro being called is the one in /usr/src/linux/arch/x86/include/asm/uaccess.h as defined here:
#define access_ok(type, addr, size) (likely(__range_not_ok(addr, size) == 0))
#define __range_not_ok(addr, size) \
({ \
unsigned long flag, roksum; \
__chk_user_ptr(addr); \
asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
: "=&r" (flag), "=r" (roksum) \
: "1" (addr), "g" ((long)(size)), \
"rm" (current_thread_info()->addr_limit.seg)); \
flag; \
})
I guess to me this looks like it should be working. Any ideas why I'm getting a 1 return from this verify_area if check? Or any ideas on how I can go about narrowing down the problem?
if( verify_area(VERIFY_READ, (void *) arg, sizeof(Command_par_t))) {
The macro access_ok returns 0 if the block is invalid and nonzero if it may be valid. So in your test, if the block is valid you immediately return -EIO. The way things look, you might want to negate the result of access_ok, something like:
if (!access_ok(...))

Resources