After learning C and the theory of operating systems, I decided to analyze a kernel rootkit for Linux, but there is one line of code I can't understand; I don't know how to read it:
*(void **)&((char *)h->original_function)[ASM_HOOK_CODE_OFFSET] = h->modified_function;
Full context:
#if defined __i386__
// push 0x00000000, ret
#define ASM_HOOK_CODE "\x68\x00\x00\x00\x00\xc3"
#define ASM_HOOK_CODE_OFFSET 1
// alternatively we could do `mov eax 0x00000000, jmp eax`, but it's a byte longer
//#define ASM_HOOK_CODE "\xb8\x00\x00\x00\x00\xff\xe0"
#elif defined __x86_64__
// there is no push that pushes a 64-bit immediate in x86_64,
// so we do things a bit differently:
// mov rax 0x0000000000000000, jmp rax
#define ASM_HOOK_CODE "\x48\xb8\x00\x00\x00\x00\x00\x00\x00\x00\xff\xe0"
#define ASM_HOOK_CODE_OFFSET 2
#else
#error ARCH_ERROR_MESSAGE
#endif
struct asm_hook {
void *original_function;
void *modified_function;
char original_asm[sizeof(ASM_HOOK_CODE)-1];
struct list_head list;
};
/**
* Patches machine code of the original function to call another function.
* This function should not be called directly.
*/
void _asm_hook_patch(struct asm_hook *h)
{
DISABLE_W_PROTECTED_MEMORY
memcpy(h->original_function, ASM_HOOK_CODE, sizeof(ASM_HOOK_CODE)-1);
*(void **)&((char *)h->original_function)[ASM_HOOK_CODE_OFFSET] = h->modified_function;
ENABLE_W_PROTECTED_MEMORY
}
Rootkit link:
https://github.com/nurupo/rootkit/blob/master/rootkit.c
(line 314 of rootkit.c)
I don't need an explanation of how the rootkit works; I only want to understand how to read that line of code. The first part of the line makes me dizzy.
If I am not mistaken: you start at the innermost part, which is
h->original_function
Then we see a closing parenthesis on the right, so we scan left for the matching opening parenthesis. Before reaching it, we see a cast to (char *), so the expression is now a pointer to char, and then the parenthesis closes.
On the right we now see array indexing, which selects element [ASM_HOOK_CODE_OFFSET], and on the left we see &, which takes its address. So we now have the address of a char located ASM_HOOK_CODE_OFFSET bytes past the start of original_function.
Now we can only go to the left, where we see *(void **): it casts this address to a pointer-to-pointer-to-void and then dereferences it, so the assignment stores the right-hand side (h->modified_function) as a pointer-sized value at that address, overwriting the zero placeholder bytes of the patched instruction.
I have to change the designated section of function_b so that it changes the stack in such a way that the program prints:
Executing function_a
Executing function_b
Finished!
Currently, it also prints Executed function_b between Executing function_b and Finished!.
I have the following code, and I have to fill in the part marked // ... insert code here:
#include <stdio.h>
void function_b(void){
char buffer[4];
// ... insert code here
fprintf(stdout, "Executing function_b\n");
}
void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
fprintf(stdout, "Executed function_b\n");
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}
I am using Ubuntu Linux with the gcc compiler. I compile the program with the following options: -g -fno-stack-protector -fno-omit-frame-pointer. I am using an intel processor.
Here is a solution. It is not exactly stable across environments, but it works for me on an x86_64 processor under Windows/MinGW64.
It may not work for you out of the box, but still, you might want to use a similar approach.
void function_b(void) {
char buffer[4];
buffer[0] = 0xa1; // part 1
buffer[1] = 0xb2;
buffer[2] = 0xc3;
buffer[3] = 0x04;
register int * rsp asm ("rsp"); // part 2
register size_t r10 asm ("r10");
r10 = 0;
while (*rsp != 0x04c3b2a1) {rsp++; r10++;} // part 3
while (*rsp != 0x00b1c2d3) rsp++; // part 4
rsp -= r10; // part 5
rsp = (int *) ((size_t) rsp & ~0xF); // part 6
fprintf(stdout, "Executing function_b\n");
}
The trick is that function_a and function_b each have only one local variable, and we can find the address of that variable just by searching around in memory.
First, we put a signature in the buffer, let it be the 4-byte integer 0x04c3b2a1 (remember that x86_64 is little-endian).
After that, we declare two variables to represent the registers: rsp is the stack pointer, and r10 is just some unused register.
This lets us avoid asm statements later in the code while still using the registers directly.
It is important that these variables don't take stack memory; they refer to the processor registers themselves.
After that, we move the stack pointer in 4-byte increments (since the size of int is 4 bytes) until we get to the buffer. We have to remember the offset from the stack pointer to the first variable here, and we use r10 to store it.
Next, we want to know how far in the stack are the instances of function_b and function_a. A good approximation is how far are buffer and beacon, so we now search for beacon.
After that, we have to move back from beacon, the first variable of function_a, to the start of function_a's frame on the stack.
That we do by subtracting the value stored in r10.
Finally, here comes a weirder bit.
At least on my configuration, the stack happens to be 16-byte aligned, and while the buffer array is aligned to the left of a 16-byte block, the beacon variable is aligned to the right of such a block.
Or is it something with a similar effect and a different explanation?
Anyway, so we just clear the last four bits of the stack pointer to make it 16-byte aligned again.
The 32-bit GCC doesn't align anything for me, so you might want to skip or alter this line.
When working on a solution, I found the following macro useful:
#ifdef DEBUG
#define show_sp() \
do { \
register void * rsp asm ("rsp"); \
fprintf(stdout, "stack pointer is %p\n", rsp); \
} while (0)
#else
#define show_sp() do {} while (0)
#endif
After this, when you insert show_sp(); in your code and compile with -DDEBUG, it prints the value of the stack pointer at that moment.
When compiling without -DDEBUG, the macro expands to an empty statement.
Of course, other variables and registers can be printed in a similar way.
OK, let's assume that the epilogue (i.e., the code at the closing } line) of function_a is the same as that of function_b.
Although functions A and B are not symmetric, we can assume this because they have the same signature (no parameters, no return value), the same calling convention, and the same size of local variables (4 bytes: int beacon = 0x0b1c2d3 vs char buffer[4];), and with optimization both locals should be dropped because they are unused. But we must not use additional local variables in function_b, so as not to break this assumption. The most problematic point is whether function_a or function_b uses nonvolatile registers (and as a result saves them in the prologue and restores them in the epilogue), but it looks like there is no reason for that here.
So my code is based on this assumption, epilogueA == epilogueB (Gassa's solution is really based on it too).
I also need to state very clearly that function_a and function_b must not be inlined. This is very important: without it, no solution is possible. So I allow myself to add the noinline attribute to function_a and function_b. Note that this is not a code change but an added attribute, which the author of the task implicitly assumes but does not state. In CL this is __declspec(noinline); in GCC the equivalent is __attribute__((noinline)).
The following code is written for the CL compiler, which has the intrinsic function
void * _AddressOfReturnAddress();
but I think GCC must have an analog of this function as well. I also use
void* _ReturnAddress();
but in fact _ReturnAddress() == *(void**)_AddressOfReturnAddress(), so we could use _AddressOfReturnAddress() alone. Using _ReturnAddress() simply makes the source code (but not the binary, which is identical) smaller and more readable.
The code works on both x86 and x64, and it works (tested) with any optimization level.
Although I use two static variables, the code is thread-safe: we can call main from multiple threads concurrently, or call it multiple times, and everything will work correctly (only, as I said at the beginning, if epilogueA == epilogueB).
I hope the comments in the code are self-explanatory.
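#include <stdio.h>
#include <intrin.h> // _ReturnAddress, _AddressOfReturnAddress, _InterlockedCompareExchangePointer
__declspec(noinline) void function_b(void){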
char buffer[4];
buffer[0] = 0;
static void *IPa, *IPb;
// save the IPa address
_InterlockedCompareExchangePointer(&IPa, _ReturnAddress(), 0);
if (_ReturnAddress() == IPa)
{
// we were called from function_a
function_b();
// <-- IPb
if (_ReturnAddress() == IPa)
{
// still in the frame called from function_a: change the return address so we return to IPb instead of IPa
*(void**)_AddressOfReturnAddress() = IPb;
return;
}
// we are on function_a's stack here.
// we should really be at point IPa
// and execute fprintf(stdout, "Executed function_b\n"); + '}' (epilogueA)
// but instead we execute fprintf(stdout, "Executing function_b\n"); + '}' (epilogueB)
// assume that epilogueA == epilogueB
}
else
{
// we were called from function_b
IPb = _ReturnAddress();
return;
}
fprintf(stdout, "Executing function_b\n");
// epilogueB
}
__declspec(noinline) void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
// <-- IPa
fprintf(stdout, "Executed function_b\n");
// epilogueA
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}
We're trying to implement some kind of "fibers" and want each one to have a "stack" allocated on the heap, of a fixed size of roughly 2MB for now.
//2MB ~ 2^21 B = 2097152 B
#define FIB_STACK_SIZE 2097152
#define reg_t uint32_t
typedef struct fiber fiber;
struct fiber{
...
//fiber's stack
reg_t esp;
...
};
During creation of a fiber, we allocate that "stack" and enqueue the created struct for later use in a ready queue.
void fib_create(...){
//fiber struct itself
f = malloc(sizeof(*f)); //f later enqueued
...
//fiber stack
f->stack = malloc(FIB_STACK_SIZE);
f->esp = (reg_t)f->stack;
...
}
fib is the struct taken from the ready queue for which we need to restore the context.
Obviously, we first need to restore the stack pointer so that we can restore everything else:
void fib_resume(){
//assumes `fib' holds fiber to resume execution
//restore stack pointers
__asm__(
"movl %0, %%esp;"
:
:"rm"(fib->esp)
);
...
}
However, that move instruction will result in a segfault. Why? And how can we circumvent that?
On i386 (which is apparent from the inline assembler) the stack grows down, towards lower addresses, so pushes and function calls decrement the stack pointer.
This means that when we're allocating a stack for a thread/process/etc. the normal way of doing it is to point the stack pointer register at the end of the allocated memory.
In your case this should be:
f->esp = (reg_t)f->stack + FIB_STACK_SIZE;
I'm still not sure if it's a good idea to do this with inline assembler in a C function rather than writing the function completely in assembler, but this should resolve the immediate problem.
I want to pretend that an array in C is an area of memory in a microprocessor, so that I can compile some code on a PC. I've written a small program to get the syntax right, but it won't run: it either crashes or won't compile when I change the way I access the variable. It's late and I can't see why. What is wrong with it?
// original code in microprocessor header that I need to change if I compile on the host
// BASE is simply a hex value that is later used as an address or a hex value
#define BASE (0x0000)
// used later in header like this (cannot change the way this is done)
#define OFFSET 0x0001
#define PERIPHERAL (BASE + OFFSET)
// also used like (also cannot change):
uint32_t var = PERIPHERAL | HEXMASK;
// here is how I intend to replace the uC specific code
// replace the BASE DEFINE with the next 2 lines of code:
// instead of writing to memory location, write to array of bytes instead, so declare it:
uint8_t BASE_memory[4] = {0, 0, 0, 0};
// define BASE as hex value that can be used as drop-in replacement in either of the 2 uses shown above
#define BASE ((uint32_t)(BASE_memory))
// now test usage
// access contents of BASE_memory[0]
printf("contents of BASE_memory[0] == %02x\n", *((uint32_t *)(BASE)));
// now I want to access PERIPHERAL, the second element of the array, i.e. BASE_memory[1]
printf("contents of BASE_memory[1] == %02x\n", *((uint32_t *)(PERIPHERAL)));
I think you are on a 64-bit system.
#include <stdint.h>
uint8_t BASE_memory[4] = {1, 2, 3, 4};
int func1()
{
return *(uint32_t *) (uint32_t) BASE_memory;
}
int func2()
{
return *(uint32_t *) (uintptr_t) BASE_memory;
}
Here's the assembly output for func1:
leaq _BASE_memory(%rip), %rax
movl %eax, %eax
movl (%rax), %eax
Here's the assembly for func2:
movl _BASE_memory(%rip), %eax
You can see that if you cast the address to uint32_t, there's an extra step (movl %eax, %eax) that sets the high 32 bits to zero. The address is then wrong, and you get a segmentation fault. That's why you use uintptr_t or intptr_t instead of uint32_t.
I would like to generate a function at runtime in C. And by this I mean I would essentially like to allocate some memory, point at it and execute it via function pointer. I realize this is a very complex topic and my question is naïve. I also realize there are some very robust libraries out there that do this (e.g. nanojit).
But I would like to learn the technique, starting with the basics. Could someone knowledgeable give me a very simple example in C?
EDIT: The answer below is great but here is the same example for Windows:
#include <Windows.h>
#define MEMSIZE 100*1024*1024
typedef void (*func_t)(void);
int main() {
HANDLE proc = GetCurrentProcess();
LPVOID p = VirtualAlloc(
NULL,
MEMSIZE,
MEM_RESERVE|MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
if (p == NULL)
return 1;
func_t func = (func_t)p;
PBYTE code = (PBYTE)p;
code[0] = 0xC3; // ret
if(FlushInstructionCache(
proc,
p,
1))
{
func();
}
CloseHandle(proc);
VirtualFree(p, 0, MEM_RELEASE);
return 0;
}
As said previously by other posters, you'll need to know your platform pretty well.
Ignoring the issue that casting an object pointer to a function pointer is, technically, UB, here's an example that works for x86/x64 OS X (and possibly Linux too). All the generated code does is return to the caller.
#include <unistd.h>
#include <sys/mman.h>
typedef void (*func_t)(void);
int main() {
/*
* Get a RWX bit of memory.
* We can't just use malloc because the memory it returns might not
* be executable.
*/
unsigned char *code = mmap(NULL, getpagesize(),
PROT_READ|PROT_EXEC|PROT_WRITE,
MAP_SHARED|MAP_ANON, -1, 0);
if (code == MAP_FAILED)
return 1;
/* Technically undefined behaviour */
func_t func = (func_t) code;
code[0] = 0xC3; /* x86 'ret' instruction */
func();
return 0;
}
Obviously, this will be different across different platforms but it outlines the basics needed: get executable section of memory, write instructions, execute instructions.
This requires you to know your platform. For instance, what is the C calling convention on your platform? Where are parameters stored? Which register holds the return value? Which registers must be saved and restored? Once you know that, you can essentially write some C code that assembles code into a block of memory, then cast that memory to a function pointer (though this is technically forbidden in ANSI C, and it will not work if your platform marks that memory as non-executable, aka the NX bit).
The simple way to go about this is simply to write some code, compile it, then disassemble it and look at what bytes correspond to which instructions. You can write some C code that fills allocated memory with that collection of bytes and then casts it to a function pointer of the appropriate type and executes.
It's probably best to start by reading the calling conventions for your architecture and compiler. Then learn to write assembly that can be called from C (i.e., follows the calling convention).
If you have tools, they can help you get some things right easier. For example, instead of trying to design the right function prologue/epilogue, I can just code this in C:
int foo(void* Data)
{
return (Data != 0);
}
Then, with Microsoft C under Windows, I feed it to "cl /Fa /c foo.c" and look at "foo.asm":
_Data$ = 8
; Line 2
push ebp
mov ebp, esp
; Line 3
xor eax, eax
cmp DWORD PTR _Data$[ebp], 0
setne al
; Line 4
pop ebp
ret 0
I could also use "dumpbin /all foo.obj" to see that the exact bytes of the function were:
00000000: 55 8B EC 33 C0 83 7D 08 00 0F 95 C0 5D C3
It just saves me some time getting the bytes exactly right...
I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out; see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%td\n", p - buffer); // computes offset (ptrdiff_t, hence %td)
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry into another byte. Otherwise the single-byte increment would not set the return address as we expected. This example may break if you get unlucky with the return address within main.)
I somehow overlooked the 64-bitness of the original question. The equivalent for x86-64 is 8(%rbp), because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above, substitute 8(%%rbp), using the double %% to get a single % in the output assembly. This is described in this ABI document; search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise, if the two addresses cross a 256-byte boundary, you will have exploited your exploit....
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = (struct stackframe *)(dummy + 24); /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.