Kernels older than 4.6 use assembly stubs that make it harder to hook critical system calls such as fork, clone and execve. For execve in particular, the following snippet from kernel 4.5 shows the entry stub:
ENTRY(stub_execve)
call sys_execve
return_from_execve:
...
END(stub_execve)
The system call table contains this stub's address, and the stub in turn calls the original sys_execve. So, to hook execve in this environment, we need to patch the call sys_execve instruction in the stub so that it calls our hooking routine, which does whatever we want and then calls the original sys_execve. All of this can be seen in action in execmon, a process execution monitoring utility for Linux. I have tested execmon successfully on Ubuntu 16.04 with kernel 4.4.
Starting with kernel 4.6, the scheme described above for protecting critical calls has changed. The stub now looks like this:
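Patching the call sys_execve instruction comes down to rewriting the 32-bit displacement that follows the 0xE8 opcode. The following is only a userspace sketch of that arithmetic, assuming the stub uses the near call rel32 encoding; the helper names are made up for illustration, and making kernel text writable (which a real module would also need) is not shown.

```c
#include <stdint.h>

#define CALL_REL32_OPCODE 0xE8u
#define CALL_REL32_LEN    5u   /* opcode byte + 4 displacement bytes */

/* Displacement = target minus the address of the instruction after the call. */
static uint32_t call_rel32_disp(uint64_t call_addr, uint64_t target)
{
    return (uint32_t)(target - (call_addr + CALL_REL32_LEN));
}

/*
 * Rewrite the call located at insn (which executes at call_addr) so that it
 * calls new_target. The caller must ensure the target is within +/- 2 GiB.
 * Returns 0 on success, -1 if insn does not start with a call rel32 opcode.
 */
static int patch_call_rel32(uint8_t *insn, uint64_t call_addr, uint64_t new_target)
{
    uint32_t disp;

    if (insn[0] != CALL_REL32_OPCODE)
        return -1;

    disp = call_rel32_disp(call_addr, new_target);
    /* x86 stores the displacement little-endian. */
    insn[1] = (uint8_t)(disp & 0xFF);
    insn[2] = (uint8_t)((disp >> 8) & 0xFF);
    insn[3] = (uint8_t)((disp >> 16) & 0xFF);
    insn[4] = (uint8_t)((disp >> 24) & 0xFF);
    return 0;
}
```

The hook itself would then call the saved original address once it has done its own work.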
ENTRY(ptregs_\func)
leaq \func(%rip), %rax
jmp stub_ptregs_64
END(ptregs_\func)
where \func expands to sys_execve for execve calls. Again, the system call table contains this stub and the stub ends up calling the original sys_execve, but now in a more secure manner than a plain call sys_execve.
This newer stub stores the called function's address in the RAX register and jumps to another stub, shown below (comments removed):
ENTRY(stub_ptregs_64)
cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
jne 1f
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
popq %rax
jmp entry_SYSCALL64_slow_path
1:
jmp *%rax /* called from C */
END(stub_ptregs_64)
Please have a look at this to see the comments and the other labels referenced in this stub.
I have tried hard to come up with a way to get past this protection and patch the original calls with hooking functions, but without success so far.
Would someone like to join me and help work this out?
I don't understand at all where you get the security angle from.
Neither the previous nor the current form of the function is "hardened".
You also never stated why you want to hook execve.
The standard hooking mechanism is kprobes; see systemtap for an example consumer.
I had a look at the aforementioned 'execmon' code and I find it to be of poor quality and not fit for learning. For instance, the code at https://github.com/kfiros/execmon/blob/master/kmod/syscalls.c#L65
accesses userspace memory directly (no get_user, copy_from_user etc.)
does it twice: first it computes the lengths (unbounded!) and then it copies the data in. In particular, if someone makes the strings longer after the computation but before they get copied, this triggers a buffer overflow.
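The measure-then-copy race goes away if each byte is fetched once and the copy is capped by the destination size. Below is a userspace analogy only (the function name is made up); in kernel code the corresponding job would normally be left to a helper such as strncpy_from_user rather than hand-rolled.

```c
#include <stddef.h>

/*
 * Copy a NUL-terminated string from src into dst (capacity cap) in a single
 * pass. src is volatile to model memory another thread may change under us.
 * Returns the copied length (excluding the NUL), or -1 if the string did not
 * fit. When cap > 0, dst is always NUL-terminated on return.
 */
static long bounded_copy(char *dst, size_t cap, const volatile char *src)
{
    size_t i;

    if (cap == 0)
        return -1;

    for (i = 0; i + 1 < cap; i++) {
        char c = src[i];        /* each source byte is read exactly once */
        dst[i] = c;
        if (c == '\0')
            return (long)i;
    }

    /* Out of room: terminate, and succeed only if the source ends here too. */
    dst[i] = '\0';
    return src[i] == '\0' ? (long)i : -1;
}
```

Because the length is never computed separately, a concurrent writer can at worst cause truncation, not an overflow of dst.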
Related
This is a slightly strange question. I am trying to find a syscall on i386 that allows code on the stack to be executed and that takes no parameters. I am doing a CTF and I have managed to find a way to invoke a syscall with control over eax, and I have full control of the stack (through argv, so just pointers to my strings). Now I am jumping into the vDSO (that is all the code there is in the program, no DLLs or anything else) to run a syscall that will allow stack execution. But I have gone over the man pages again and again and have not found anything I can use.
$ uname -r
4.4.179-0404179-generic
There's no zero-arg Linux system call equivalent to mprotect(stack_base, stack_size, PROT_WRITE|PROT_READ|PROT_EXEC).
Not that I know of, and I wouldn't expect there to be one. Probably the only use case would be to help attackers, which is the opposite of hardening; normally you can make the stack executable via linker options or any specific pages via mprotect with args. There's no need for a shortcut for that.
There's also not one that can set the READ_IMPLIES_EXEC personality for an already-running process, even if you do allow args. (See Using personality syscall to make the stack executable - at best it will have an effect after execve.)
You might be able to use some ROP techniques to get some args set up for mprotect, and then return to the code you injected.
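For reference, these are the three arguments a ROP chain would have to set up. This is a rough C sketch of the equivalent call, with hypothetical helper names; mprotect needs a page-aligned start address, and the kernel or a security policy may still refuse PROT_EXEC.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Widen [addr, addr + len) to whole pages; page must be a power of two. */
static void page_span(uintptr_t addr, size_t len, size_t page,
                      uintptr_t *base, size_t *size)
{
    uintptr_t mask = ~((uintptr_t)page - 1);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + len + page - 1) & mask;

    *base = start;
    *size = end - start;
}

/* Request read/write/execute on the pages covering buf; returns mprotect's result. */
static int make_executable(void *buf, size_t len)
{
    uintptr_t base;
    size_t size;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;
    page_span((uintptr_t)buf, len, (size_t)page, &base, &size);
    return mprotect((void *)base, size, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

On i386 with int $0x80 this corresponds to eax = 125 (mprotect), ebx = aligned base, ecx = size, edx = 7.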
When I use gdb to debug a process on ARM Linux, I can use a call such as call write(123,"abc",3)
How does gdb inject that call into the process and then restore everything?
GDB can read and write the memory of the inferior (the process being debugged) using the ptrace system call.
So it reads and saves in its own memory some chunk of instructions from inferior (say 100 bytes).
Then it overwrites this chunk with new instructions, which look something like:
r0 = 123
r1 = pointer to "abc"
r2 = 3
BLR write
BKPT
Now GDB saves the current inferior registers, sets the program counter to point to the chunk of instructions it just wrote, and resumes the inferior.
Inferior executes instructions until it reaches the breakpoint, at which point GDB regains control. It can now look at the return register to know what write returned and print it. GDB now restores the original instructions and original register values, and we are back as if nothing happened.
P.S. This is a general description of how "call function in inferior" works; I do not claim that this is exactly how it works.
There are also complications: if write calls back into the code that GDB overwrote, it wouldn't work. So in reality GDB uses some other mechanism to obtain suitable "scratch area" in the inferior. Also, the "abc" string requires scratch area as well.
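The save, overwrite and restore part of this can be tried out with plain ptrace calls. This is a minimal sketch and certainly not what GDB literally does: it traces a forked child, and a data word (scratch_word, a made-up name) stands in for the chunk of instructions. The register handling is architecture-specific and is left out.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for the instruction chunk; same address in the child after fork. */
static volatile long scratch_word = 0x1111;

/* Returns 0 if the child's word was saved, overwritten and restored; -1 otherwise. */
static int save_patch_restore(void)
{
    void *addr = (void *)&scratch_word;
    long saved, seen, after;
    int status, ok = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                       /* child: become the inferior */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);                   /* hand control to the tracer */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))              /* child exited instead of stopping */
        return -1;

    errno = 0;
    saved = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);            /* save */
    if (errno == 0 &&
        ptrace(PTRACE_POKEDATA, pid, addr, (void *)0x2222L) == 0) { /* overwrite */
        errno = 0;
        seen = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
        if (errno == 0 &&
            ptrace(PTRACE_POKEDATA, pid, addr, (void *)saved) == 0) { /* restore */
            errno = 0;
            after = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
            ok = (errno == 0 && saved == 0x1111 &&
                  seen == 0x2222 && after == 0x1111);
        }
    }

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

PTRACE_PEEKDATA returns the word itself, so errno has to be cleared beforehand and checked afterwards to tell an error from a legitimate -1.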
I am trying to understand the internals of the Linux kernel by reading Robert Love's Linux Kernel Development.
On page 74 he says the easiest way to pass arguments to a syscall is via registers:
Somehow, user-space must relay the parameters to the kernel during the
trap. The easiest way to do this is via the same means that the syscall
number is passed: The parameters are stored in registers. On x86-32,
the registers ebx, ecx, edx, esi, and edi contain, in order, the first
five arguments.
Now this is bothering me for a number of reasons:
All syscalls are defined with the asmlinkage option, which implies that the arguments are always to be found on the stack and not in registers. So what is all this business with the registers?
It may be that the values are copied onto the kernel stack before the syscall is performed. I have no idea why that would be efficient, but it is a possibility.
(This answer is for 32-bit x86 Linux to match your question; things are slightly different for 64-bit x86 and other architectures.)
The parameters are passed from userspace in registers as Love says.
When userspace invokes a system call with int $0x80, the kernel syscall entry code gets control. This is written in assembly language and can be seen here, for instance. One of the things this code does is to take the parameters from the registers and push them onto the stack, and then call the appropriate kernel sys_XXX() function (which is written in C). So those functions do indeed expect their arguments on the stack.
It wouldn't work as well to try to pass parameters from userspace to the kernel on the stack. When the system call is made, the CPU switches to a separate kernel stack, so the parameters would have to be copied from the userspace stack to the kernel stack, and this is somewhat complicated. And it would have to be done even for very simple system calls that just take a few numeric arguments and wouldn't otherwise need to access userspace memory at all (think about close() for instance).
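From userspace, the libc syscall() wrapper is a simple way to see this: it loads the syscall number and the arguments into the registers the ABI prescribes and then traps into the kernel. A small sketch follows; the raw_ function names are made up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* No arguments: only the syscall number is passed (in eax on x86-32). */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* One numeric argument (in ebx on x86-32); no userspace memory is touched. */
static long raw_close(int fd)
{
    return syscall(SYS_close, fd);
}
```

raw_close(-1) just returns -1 (with errno set to EBADF); the kernel never needs to look at the caller's stack to service it.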
I have seen this: Implementing a User-Level Threads Package and it doesn't apply.
While implementing Thread_new(int func(void*)), which allocates a thread and creates its stack, I cannot think of a way to set the program counter (%eip, if I am correct) so that, when the scheduler starts the thread, it begins at the entry point of the given function (func).
Although I have seen many C-only (no assembly) implementations, we have been given the following code (x86):
_thrstart:
pushl %edi
call *%esi
pushl %eax
call Thread_exit
Is there a specific reason to push %edi to the stack? I can't seem to find another use for esi/edi apart from byte copying.
I realize that the indirect call through %esi is probably used to call the function in the context of the new thread, but apart from that, I don't understand how %esi comes to hold a valid function address when _thrstart is reached from Thread_new.
NOTES:
Thread_exit is the thread cleanup function, implemented in C.
This is HOMEWORK
In general, you can break a scheduler down into 4 parts.
The first part is the mechanics of switching from one thread to another. This mostly involves storing the previous thread's state somewhere and loading the next thread's state from somewhere. Here, "somewhere" could be some sort of thread control block, or it could be the thread's stack, or both, or something else. A thread's state may include the contents of general purpose registers, its stack top (esp), its instruction pointer (eip), and anything else (MMX/SSE/AVX registers). However, for co-operative scheduling a thread's state could be much less (e.g. most of a thread's state is trashed by thread switching, and because scheduling is co-operative the thread itself knows when its state is going to be trashed and can prepare for that).
The second part is deciding when to do a thread switch and which thread to switch to. This varies widely for different schedulers.
The third part is starting a thread. This mostly involves constructing the data that would be loaded during a thread switch. However, it's possible to do this in a "lazy" way, where you only create the minimal amount of state when first creating a thread, and then finish creating the remainder of the thread's state after it has been given CPU time.
The fourth part is terminating a thread. This involves destroying/freeing the data that would be loaded during a thread switch; but can also mean cleaning up any resources that the thread failed to release (e.g. file handles, network connections, thread local storage, whatever) so that you don't end up with "resource leaks".
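To make the first two parts a bit more concrete, here is one possible shape for a thread control block together with a round-robin choice of the next thread. All names and fields are assumptions made for illustration, not taken from any particular scheduler.

```c
#include <stddef.h>
#include <stdint.h>

enum thread_state { T_UNUSED, T_READY, T_RUNNING, T_DEAD };

/* Part 1: what gets stored and loaded on a switch (here only the stack top). */
struct tcb {
    uintptr_t saved_sp;        /* stack pointer saved when switched out */
    enum thread_state state;
};

/*
 * Part 2: round-robin policy. Scan the n slots starting just after
 * `current` and return the index of the first READY thread, or -1.
 * Assumes n > 0 and 0 <= current < n.
 */
static int pick_next(const struct tcb *threads, int n, int current)
{
    int step;

    for (step = 1; step <= n; step++) {
        int i = (current + step) % n;
        if (threads[i].state == T_READY)
            return i;
    }
    return -1;
}
```

Parts three and four would then fill in or tear down a struct tcb entry and the stack it points to.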
Typically, in simple RTOSes, threads are not started by being called or jumped to - they are started by being returned or interrupt-returned to.
The trick is to assemble data at the top of the new stack so that it looks as if the thread has been running before and has either called the scheduler or entered it via an interrupt. At the bottom of this 'frame' should be the address of the thread function. You can then load the stack pointer with the address of the frame, enable interrupts and perform a RET or IRET to start the thread function.
It's convenient to also first shove on a parameter that the new thread can retrieve and a call to the 'TerminateThread' or 'Thread_Exit', so that if the thread function returns, the scheduler can terminate it.
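If hand-written assembly is not a requirement, the same idea can be tried in plain C with the ucontext API from glibc (obsolescent in POSIX, but still available). This swaps in a different technique from the frame-building described above: makecontext prepares the new stack so that the first switch to it lands at the entry function with its argument, and uc_link says where to go when that function returns. The names below are made up.

```c
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;
static char thread_stack[64 * 1024];
static int result;

/* Entry point of the new "thread"; makecontext passes int arguments. */
static void thread_body(int arg)
{
    result = arg * 2;
    /* Returning here resumes thread_ctx.uc_link, i.e. main_ctx. */
}

/* Start thread_body(arg) on its own stack and return what it computed. */
static int run_once(int arg)
{
    result = 0;
    if (getcontext(&thread_ctx) != 0)
        return -1;

    thread_ctx.uc_stack.ss_sp = thread_stack;
    thread_ctx.uc_stack.ss_size = sizeof thread_stack;
    thread_ctx.uc_link = &main_ctx;          /* where to go on return */
    makecontext(&thread_ctx, (void (*)(void))thread_body, 1, arg);

    /* Save the current context and "return into" the new thread. */
    if (swapcontext(&main_ctx, &thread_ctx) != 0)
        return -1;
    return result;
}
```

uc_link plays the role of the 'Thread_Exit' return path: control comes back to a known place when the thread function returns.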
It seems the problem wasn't as complicated as it first appeared.
Based on the answer given by @Martin James, the stack is prepared so that the return address is the _thrstart function.
Based on the assembly used to perform a context switch, the registers edi and esi are stored in specific locations on the stack (when the thread is inactive). By using edi and esi as general purpose registers, edi contains the void* argument, and esi contains the address of the function to be called from the new thread.
_thrstart:
pushl %edi #pushes argument for function func to the stack
call *%esi #indirect call to func
pushl %eax #Expect return value in eax, push to stack
call Thread_exit #Call thread cleanup
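For illustration, this is roughly how Thread_new could lay out the initial stack. It is only a sketch: it assumes the context switch ends with popl %edi, popl %esi, ret (the real layout depends on the switch code we were given), and it uses 32-bit values in place of real i386 addresses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build the initial frame below stack_top (the stack grows down).
 * Assumed to be consumed by a switch routine ending in:
 *     popl %edi ; popl %esi ; ret
 * Returns the value to store as the new thread's saved stack pointer.
 */
static uint32_t *prepare_stack(uint32_t *stack_top, uint32_t func,
                               uint32_t arg, uint32_t thrstart)
{
    uint32_t *sp = stack_top;

    *--sp = thrstart;   /* return address: the switch "returns" to _thrstart */
    *--sp = func;       /* popped into %esi, later used by call *%esi */
    *--sp = arg;        /* popped into %edi, later pushed as func's argument */
    return sp;
}
```

When the scheduler first switches to the thread, the pops load esi and edi and the ret lands in _thrstart, which is why those registers are already valid there.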
If one tries to hook certain syscalls, e.g. sys_execve, via sys_call_table hooking, this will fail, because they are called indirectly through a stub. For sys_execve this is stub_execve (compare the assembly code on LXR).
But what are these stubs good for? Why do only certain system calls like execve(2) and fork(2) require a stub and how is this connected to x86_64? Is there a workaround to hook stubbed syscalls (in a Loadable Kernel Module)?
From here, it says:
"Certain special system calls that need to save a complete full stack frame."
And I think execve is just one of these special system calls.
Looking at the code of stub_execve, if you want to hook it, you can at least try one of the following:
Work out what that assembly code does and reimplement it yourself; your own assembly stub can then call your own function.
In the middle of the assembly code there is a call sys_execve; you can replace the address of sys_execve in that call with the address of your own hook function.
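For the second option, the call instruction first has to be located inside the stub. One way, sketched here in userspace C with made-up names, is to scan the stub's bytes for a call rel32 (opcode 0xE8) whose decoded target equals the known address of sys_execve. How that address is obtained (for example from the kernel's symbol table) is assumed and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan len bytes of code (which would execute at code_addr) for a
 * `call rel32` whose destination is `target`. Returns the offset of the
 * 0xE8 opcode, or -1 if no such call is found.
 */
static long find_call_to(const uint8_t *code, size_t len,
                         uint64_t code_addr, uint64_t target)
{
    size_t i;

    for (i = 0; i + 5 <= len; i++) {
        uint32_t disp;
        uint64_t ext, dest;

        if (code[i] != 0xE8)
            continue;

        /* Little-endian 32-bit displacement after the opcode. */
        disp = (uint32_t)code[i + 1] |
               ((uint32_t)code[i + 2] << 8) |
               ((uint32_t)code[i + 3] << 16) |
               ((uint32_t)code[i + 4] << 24);
        /* Sign-extend to 64 bits. */
        ext = (disp & 0x80000000u) ? (disp | 0xFFFFFFFF00000000ull) : disp;
        /* Destination is relative to the next instruction. */
        dest = code_addr + i + 5 + ext;

        if (dest == target)
            return (long)i;
    }
    return -1;
}
```

A plain byte scan can hit 0xE8 in the middle of another instruction; checking the decoded destination against the expected address makes a false match unlikely, though not impossible.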