Why do these two pointers that should be the same point to different data?

I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:
struct directory_entry {
uint8_t name[11];
uint8_t attrib;
uint8_t name_case;
uint8_t created_decimal;
uint16_t created_time;
uint16_t created_date;
uint16_t accessed_date;
uint16_t ignore;
uint16_t modified_time;
uint16_t modified_date;
uint16_t first_cluster;
uint32_t length;
} __attribute__ ((packed));
I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.
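The layout can also be checked without pointer subtraction. The following is a minimal sketch, assuming GCC or Clang (for __attribute__ ((packed)) and _Static_assert); check_aliasing is a made-up helper name used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct directory_entry {
    uint8_t  name[11];
    uint8_t  attrib;
    uint8_t  name_case;
    uint8_t  created_decimal;
    uint16_t created_time;
    uint16_t created_date;
    uint16_t accessed_date;
    uint16_t ignore;
    uint16_t modified_time;
    uint16_t modified_date;
    uint16_t first_cluster;
    uint32_t length;
} __attribute__ ((packed));

/* Compile-time checks: the build fails if the layout is not what we expect. */
_Static_assert(offsetof(struct directory_entry, name) == 0, "name starts the entry");
_Static_assert(offsetof(struct directory_entry, attrib) == 11, "attrib follows name");
_Static_assert(sizeof(struct directory_entry) == 32, "a FAT16 entry is 32 bytes");

/* Hypothetical runtime counterpart: name[0] and the first byte of the
   struct are the same byte, and attrib is the byte 11 past it. */
void check_aliasing(const struct directory_entry *e) {
    const uint8_t *raw = (const uint8_t *)e;
    assert(raw == e->name);
    assert(raw + 11 == &e->attrib);
}
```

If all of these hold, as they should for this struct, then a mismatch between e.name[0] and *(uint8_t *)&e cannot come from the struct layout itself.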
In my kernel, a void pointer to e is passed to a function which reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11) is 8, as expected for what's on the disk. However, e.name[0] and e.attrib both are 0.
What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.
Edit: The full source is in this gitlab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated right, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, the following is output by that part:
(void *)e.name - (void *)&e
=> 0
*(uint8_t *)&e
=> 80
e.name[0]
=> 0
(void *)&e.attrib - (void *)&e
=> 11
*((uint8_t *)&e + 11)
=> 8
e.attrib
=> 0
I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.
Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused.
u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)
lea eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax] ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push 0x31ce0 ;nbuf
push eax ;the byte we loaded
call 0x3162f ;u8_dec
add esp, 0x10
This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:
void u8_dec(uint8_t n, uint8_t *b) {
if (!n) {
*(uint16_t *)b = '0';
return;
}
bool zero = false;
for (uint32_t m = 100; m; m /= 10) {
uint8_t d = (n / m) % 10;
if (zero)
*(b++) = d + '0';
else if (d) {
zero = true;
*(b++) = d + '0';
}
}
*b = 0;
}
It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.

My kernel uses 32-bit protected mode segmenting. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a general protection fault if the stack ever overflowed, rather than allowing it to overflow into other kernel data and code.
However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.
I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.
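To make the mismatch concrete, here is a small host-side sketch of the address arithmetic. It only models "linear address = segment base + offset" and ignores limit checks; the function names and the example offset 0x7fd0 are assumptions chosen for illustration, while the two bases come from the segment layout described above.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_SEGMENT_BASE  0x00000000u  /* data segment base from the post */
#define STACK_SEGMENT_BASE 0x00038000u  /* stack segment base from the post */

/* Linear address the CPU would access for a segment-relative offset
   (limit checks omitted). */
uint32_t linear_address(uint32_t segment_base, uint32_t offset) {
    assert(offset <= UINT32_MAX - segment_base);  /* no wraparound in this model */
    return segment_base + offset;
}

/* Nonzero when the same offset names the same byte through both segments. */
int same_byte(uint32_t base_a, uint32_t base_b, uint32_t offset) {
    return linear_address(base_a, offset) == linear_address(base_b, offset);
}
```

With a hypothetical stack offset of 0x7fd0, the stack segment resolves it to linear 0x3ffd0 while the data segment resolves it to 0x7fd0, so a pointer written through one segment and read through the other touches two different bytes. With equal bases the two always agree, which is what GCC assumes.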

Related

How to make a Hook and Trampoline function in one for WinAPI hooking

So I have been learning about the concept of hooking and using trampolines in order to bypass/execute data in a WinAPI hook function (In a different executable file, using DLL injection). So far I know how to make it (the trampoline and hook) using a mixture of assembly and C, but I can't seem to do it with just using C, as I seem to be missing something. I'd appreciate if someone could tell me what I'm doing wrong and how to fix it up.
Right now my code:
#include <Windows.h>
unsigned char* address = 0;
__declspec(naked) int __stdcall MessageBoxAHookTrampoline(HWND Window, char* Message, char* Title, int Type) {
__asm
{
push ebp
mov ebp, esp
mov eax, address
add eax, 5
jmp eax
}
}
int __stdcall MessageBoxAHook(HWND Window, char* Message, char* Title, int Type) {
wchar_t* WMessage = L"Hooked!";
wchar_t* WTitle = L"Success!";
MessageBoxW(0, WMessage, WTitle, 0);
return MessageBoxAHookTrampoline(Window, Message, Title, Type);
}
unsigned long __stdcall Thread(void* Context) {
address = (unsigned char*)GetProcAddress(LoadLibraryA("user32"), "MessageBoxA");
ULONG OP = 0;
if (VirtualProtect(address, 1, PAGE_EXECUTE_READWRITE, &OP)) {
memset(address, 0x90, 5);
*address = 0xE9;
*(unsigned long*)(address + 1) = (unsigned long)MessageBoxAHook - (unsigned long)address - 5;
}
else {
MessageBoxA(0, "Failed to change protection", "RIP", 0);
}
return 1;
}
// Entry point.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
if (fdwReason == DLL_PROCESS_ATTACH) {
CreateThread(0, 0, Thread, 0, 0, 0);
}
else if (fdwReason == DLL_PROCESS_DETACH) {
}
return TRUE;
}
So question is: How would I make a function say InstallHook that will install the hook and return a trampoline so I can use it easily?
Function prototype probably would be: void* InstallHook(void* originalFunc, void* targetFunc, int jumpsize), or so I've understood reading online, but unsure what jumpsize would be used for.
So far I know that the first 5 bytes must be preserved and restored, and then there's a jump to the address of the original hooked function. So I'd have to use malloc to allocate memory, memcpy to copy bytes over, the 0xE9 is the value of a jump instruction and such, but I just don't know how to implement it using just pure C. I figure it would be something similar to the code in this question. So how can I write a hook function that returns a trampoline using pure C for WinAPI functions?
If I understood the question correctly, you want to avoid "hard-coding" the trampoline function in assembly, presumably so you could have multiple trampolines in use at the same time without duplicating the code. You can achieve this using VirtualAlloc (malloc won't work since the returned memory won't be executable).
I wrote this from memory without access to a compiler so it might have some minor bugs, but the general idea is here. Normally you would also use VirtualProtect to change the page permissions to r-x instead of rwx once you're done modifying it, but I've left that out for the sake of simplicity:
void *CreateTrampoline(void *originalFunc)
{
/* Allocate the trampoline function */
uint8_t *trampoline = VirtualAlloc(
NULL,
5 + 5, /* 5 for the prologue, 5 for the JMP */
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE); /* Make trampoline executable */
/* Copy the original function's prologue */
memcpy(trampoline, originalFunc, 5);
/* JMP rel32 opcode */
trampoline[5] = 0xE9;
/* JMP rel32 operand */
uint32_t jmpDest = (uint32_t)originalFunc + 5; /* Skip original prologue */
uint32_t jmpSrc = (uint32_t)trampoline + 10; /* Starting after the JMP */
uint32_t delta = jmpDest - jmpSrc;
memcpy(trampoline + 6, &delta, 4);
return trampoline;
}
Your InstallHook function would then just call CreateTrampoline to create a trampoline, then patch the first 5 bytes of the original function with a JMP rel32 to your hook.
Be warned, this only works on WinAPI functions, because Microsoft requires that they have a 5-byte prologue to enable hot-patching (which is what you're doing here). Normal functions do not have this requirement -- usually they only start with push ebp; mov ebp, esp which is only 3 bytes (and sometimes not even that, if the compiler decides to optimize it out).
Edit: here's how the math works:
                     _______delta________
                     |                  |
trampoline           |    originalFunc  |
|                    |        |         |
v                    |        v         v
[prologue][jmp delta]         [prologue][rest of func]
|________||_________|         |________|
    5     +    5                  5
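The delta arithmetic can be isolated into one helper that serves both the trampoline's jump back and the patch written over the original function. This is a sketch for 32-bit x86 only; encode_jmp_rel32 is a hypothetical name, and it writes into an ordinary buffer so it can be tried without touching executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a 5-byte "JMP rel32" into buf.
   src: address where the JMP instruction itself will live.
   dst: address the JMP should land on.
   The operand is relative to the end of the instruction (src + 5);
   unsigned arithmetic wraps modulo 2^32, so backward jumps work too. */
void encode_jmp_rel32(uint8_t *buf, uint32_t src, uint32_t dst) {
    assert(buf != NULL);
    uint32_t delta = dst - (src + 5);
    buf[0] = 0xE9;                        /* JMP rel32 opcode */
    buf[1] = (uint8_t)(delta & 0xFF);     /* operand, little-endian */
    buf[2] = (uint8_t)((delta >> 8) & 0xFF);
    buf[3] = (uint8_t)((delta >> 16) & 0xFF);
    buf[4] = (uint8_t)((delta >> 24) & 0xFF);
}
```

For the trampoline, src would be trampoline + 5 and dst would be originalFunc + 5, which gives the same delta as in CreateTrampoline. For the hook patch, src would be originalFunc and dst the hook function, matching the expression used in the question's Thread function.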

Memory mapped address in C (how to dereference)

I want to pretend that an array in C is an area of memory in a microprocessor, so I can compile some code on a PC. I've written a small program to try to get the syntax correct, but the program won't run, it either crashes or won't compile when I change the way I access the variable - it's late and I can't see why. What is wrong with this please?
// original code in microprocessor header that I need to change if I compile on the host
// BASE is simply a hex value that is later used as an address or a hex value
#define BASE (0x0000)
// used later in header like this (cannot change the way this is done)
#define OFFSET 0x0001
#define PERIPHERAL (BASE + OFFSET)
// also used like (also cannot change):
uint32_t var = PERIPHERAL | HEXMASK;
// here is how I intend to replace the uC specific code
// replace the BASE DEFINE with the next 2 lines of code:
// instead of writing to memory location, write to array of bytes instead, so declare it:
uint8_t BASE_memory[4] = {0, 0, 0, 0};
// define BASE as hex value that can be used as drop-in replacement in either of the 2 uses shown above
#define BASE ((uint32_t)(BASE_memory))
// now test usage
// access contents of BASE_memory[0]
printf("contents of BASE_memory[0] == %02x\n", *((uint32_t *)(BASE)));
// now I want to access PERIPHERAL, the second element of the array, i.e. BASE_memory[1]
printf("contents of BASE_memory[1] == %02x\n", *((uint32_t *)(PERIPHERAL)));
I think you are on a 64-bit system.
#include <stdint.h>
uint8_t BASE_memory[4] = {1, 2, 3, 4};
int func1()
{
return *(uint32_t *) (uint32_t) BASE_memory;
}
int func2()
{
return *(uint32_t *) (uintptr_t) BASE_memory;
}
Here's the assembly output for func1:
leaq _BASE_memory(%rip), %rax
movl %eax, %eax
movl (%rax), %eax
Here's the assembly for func2:
movl _BASE_memory(%rip), %eax
You can see that if you cast the address to uint32_t, then there's an extra step where the high bits are set to zero. The address is then wrong, and you get a segmentation fault. That's why you use uintptr_t or intptr_t instead of uint32_t.
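Applied to the question's macros, the fix might look like the sketch below. It assumes the peripheral registers are modelled one byte per array element, so it reads a single byte (a uint32_t read at BASE + 1 would run past the end of the 4-byte array). read_register is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the microcontroller's memory-mapped region. */
uint8_t BASE_memory[4] = {1, 2, 3, 4};

/* uintptr_t is wide enough to hold a pointer on both 32- and 64-bit hosts. */
#define BASE       ((uintptr_t)BASE_memory)
#define OFFSET     0x0001
#define PERIPHERAL (BASE + OFFSET)

/* Reads one byte from a simulated register address. */
uint8_t read_register(uintptr_t address) {
    /* Host-only safety net: stay inside the backing array. */
    assert(address >= BASE && address < BASE + sizeof BASE_memory);
    return *(volatile uint8_t *)address;
}
```

Here read_register(BASE) yields BASE_memory[0] and read_register(PERIPHERAL) yields BASE_memory[1], and the expression PERIPHERAL | HEXMASK still compiles because BASE remains an integer value.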

Generating functions at runtime in C

I would like to generate a function at runtime in C. And by this I mean I would essentially like to allocate some memory, point at it and execute it via function pointer. I realize this is a very complex topic and my question is naïve. I also realize there are some very robust libraries out there that do this (e.g. nanojit).
But I would like to learn the technique, starting with the basics. Could someone knowledgeable give me a very simple example in C?
EDIT: The answer below is great but here is the same example for Windows:
#include <Windows.h>
#define MEMSIZE (100*1024*1024)
typedef void (*func_t)(void);
int main() {
HANDLE proc = GetCurrentProcess();
LPVOID p = VirtualAlloc(
NULL,
MEMSIZE,
MEM_RESERVE|MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
func_t func = (func_t)p;
PDWORD code = (PDWORD)p;
code[0] = 0xC3; // ret
if(FlushInstructionCache(
proc,
NULL,
0))
{
func();
}
CloseHandle(proc);
VirtualFree(p, 0, MEM_RELEASE);
return 0;
}
As said previously by other posters, you'll need to know your platform pretty well.
Ignoring the issue of casting an object pointer to a function pointer being, technically, UB, here's an example that works for x86/x64 OS X (and possibly Linux too). All the generated code does is return to the caller.
#include <unistd.h>
#include <sys/mman.h>
typedef void (*func_t)(void);
int main() {
/*
* Get a RWX bit of memory.
* We can't just use malloc because the memory it returns might not
* be executable.
*/
unsigned char *code = mmap(NULL, getpagesize(),
PROT_READ|PROT_EXEC|PROT_WRITE,
MAP_SHARED|MAP_ANON, -1, 0); /* fd is -1 for anonymous mappings */
/* Technically undefined behaviour */
func_t func = (func_t) code;
code[0] = 0xC3; /* x86 'ret' instruction */
func();
return 0;
}
Obviously, this will be different across different platforms but it outlines the basics needed: get executable section of memory, write instructions, execute instructions.
This requires you to know your platform. For instance, what is the C calling convention on your platform? Where are parameters stored? What register holds the return value? What registers must be saved and restored? Once you know that, you can essentially write some C code that assembles code into a block of memory, then cast that memory into a function pointer (though this is technically forbidden in ANSI C, and it will not work if your platform marks the memory as non-executable, i.e. via the NX bit).
The simple way to go about this is simply to write some code, compile it, then disassemble it and look at what bytes correspond to which instructions. You can write some C code that fills allocated memory with that collection of bytes and then casts it to a function pointer of the appropriate type and executes.
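As a small illustration of "filling memory with that collection of bytes", the sketch below assembles a two-instruction x86 function (mov eax, imm32 followed by ret) into a caller-supplied buffer. emit_return_const is a hypothetical name, and the block deliberately stops short of executing anything: running the bytes would additionally need executable memory obtained as in the mmap or VirtualAlloc examples, on an x86 or x86-64 machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits x86 machine code equivalent to `return value;`:
       B8 imm32   mov eax, imm32
       C3         ret
   Returns the number of bytes written (always 6). */
size_t emit_return_const(uint8_t *buf, size_t capacity, uint32_t value) {
    assert(buf != NULL && capacity >= 6);
    buf[0] = 0xB8;                              /* mov eax, imm32 */
    buf[1] = (uint8_t)(value & 0xFF);           /* immediate, little-endian */
    buf[2] = (uint8_t)((value >> 8) & 0xFF);
    buf[3] = (uint8_t)((value >> 16) & 0xFF);
    buf[4] = (uint8_t)((value >> 24) & 0xFF);
    buf[5] = 0xC3;                              /* ret */
    return 6;
}
```

Because eax holds the integer return value in the common x86 and x86-64 C calling conventions, calling these bytes through an int (*)(void) pointer would be expected to return value, subject to the caveats above.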
It's probably best to start by reading the calling conventions for your architecture and compiler. Then learn to write assembly that can be called from C (i.e., follows the calling convention).
If you have tools, they can help you get some things right easier. For example, instead of trying to design the right function prologue/epilogue, I can just code this in C:
int foo(void* Data)
{
return (Data != 0);
}
Then (MicrosoftC under Windows) feed it to "cl /Fa /c foo.c". Then I can look at "foo.asm":
_Data$ = 8
; Line 2
push ebp
mov ebp, esp
; Line 3
xor eax, eax
cmp DWORD PTR _Data$[ebp], 0
setne al
; Line 4
pop ebp
ret 0
I could also use "dumpbin /all foo.obj" to see that the exact bytes of the function were:
00000000: 55 8B EC 33 C0 83 7D 08 00 0F 95 C0 5D C3
Just saves me some time getting the bytes exactly right...

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%d\n", (int)(p - buffer)); // computes offset
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise if the two addresses cross a 256-byte boundary you will have exploited your exploit....
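The 256-byte boundary problem can be shown with plain integer arithmetic. This sketch models what a char-sized += does to a little-endian return address versus a pointer-sized add; both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models `buffer[offset] += n` hitting the lowest byte of a little-endian
   address: only that byte changes, and any carry out of it is lost. */
uint64_t bump_low_byte_only(uint64_t addr, uint8_t n) {
    uint8_t low = (uint8_t)((addr & 0xFF) + n);   /* wraps modulo 256 */
    return (addr & ~(uint64_t)0xFF) | low;
}

/* Models the same increment done on a pointer-sized object. */
uint64_t bump_whole_address(uint64_t addr, uint8_t n) {
    assert(addr <= UINT64_MAX - n);               /* no overflow in this model */
    return addr + n;
}
```

For the return address in the question, 0x100000ec3, both give 0x100000eca, so the char-sized add happens to work. For an address ending in 0xfe, the byte-sized version wraps to a low byte of 0x05 without carrying, leaving the return address 256 bytes short of the intended target.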
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = (struct stackframe *)(dummy + 24); /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.

How to skip a line doing a buffer overflow in C

I want to skip a line in C, the line x=1; in the main section, using a buffer overflow; however, I don't know why I cannot skip from address 4002f4 to the next address 4002fb, in spite of the fact that I am counting 7 bytes from <main+35> to <main+42>.
I have also configured the randomization and execstack options in a Debian and AMD environment, but I am still getting x=1;. What is wrong with this procedure?
I have used dba to debug the stack and the memory addresses:
0x00000000004002ef <main+30>: callq 0x4002a4 <function>
0x00000000004002f4 <main+35>: movl $0x1,-0x4(%rbp)
0x00000000004002fb <main+42>: mov -0x4(%rbp),%esi
0x00000000004002fe <main+45>: mov $0x4629c4,%edi
void function(int a, int b, int c)
{
char buffer[5];
int *ret;
ret = buffer + 12;
(*ret) += 8;
}
int main()
{
int x = 0;
function(1, 2, 3);
x = 1;
printf("x = %i \n", x);
return 0;
}
You must be reading the Smashing the Stack for Fun and Profit article. I was reading the same article and ran into the same problem: it wasn't skipping that instruction. After a few hours of debugging in IDA I changed the code as shown below, and it now prints x=0 and b=5.
#include <stdio.h>
void function(int a, int b) {
int c=0;
int* pointer;
pointer =&c+2;
(*pointer)+=8;
}
void main() {
int x =0;
function(1,2);
x = 3;
int b =5;
printf("x=%d\n, b=%d\n",x,b);
getch();
}
In order to alter the return address within function() to skip over the x = 1 in main(), you need two pieces of information.
1. The location of the return address in the stack frame.
I used gdb to determine this value. I set a breakpoint at function() (break function), execute the code up to the breakpoint (run), retrieve the location in memory of the current stack frame (p $rbp or info reg), and then retrieve the location in memory of buffer (p &buffer). Using the retrieved values, the location of the return address can be determined.
(compiled w/ GCC -g flag to include debug symbols and executed in a 64-bit environment)
(gdb) break function
...
(gdb) run
...
(gdb) p $rbp
$1 = (void *) 0x7fffffffe270
(gdb) p &buffer
$2 = (char (*)[5]) 0x7fffffffe260
(gdb) quit
(frame pointer address + size of word) - buffer address = number of bytes from local buffer variable to return address
(0x7fffffffe270 + 8) - 0x7fffffffe260 = 24
If you are having difficulties understanding how the call stack works, reading the call stack and function prologue Wikipedia articles may help. This shows the difficulty in making "buffer overflow" examples in C. The offset of 24 from buffer assumes a certain padding style and compile options. GCC will happily insert stack canaries nowadays unless you tell it not to.
2. The number of bytes to add to the return address to skip over x = 1.
In your case the saved instruction pointer will point to 0x00000000004002f4 (<main+35>), the first instruction after function returns. To skip the assignment you need to make the saved instruction pointer point to 0x00000000004002fb (<main+42>).
Your calculation that this is 7 bytes is correct (0x4002fb - 0x4002f4 = 7).
I used gdb to disassemble the application (disas main) and verified the calculation for my case as well. This value is best resolved manually by inspecting the disassembly.
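Both numbers are simple subtractions, sketched here with the values from the gdb session and the disassembly. The function names are hypothetical, and the result is only valid for the particular build that produced those addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from a local buffer to the saved return address, which sits one
   word above the saved frame pointer on x86 and x86-64. */
uint64_t return_address_offset(uint64_t frame_pointer, uint64_t buffer_addr,
                               uint64_t word_size) {
    assert(frame_pointer + word_size >= buffer_addr);
    return (frame_pointer + word_size) - buffer_addr;
}

/* Bytes to add to the saved return address to land on the target instruction. */
uint64_t skip_distance(uint64_t return_addr, uint64_t target_addr) {
    assert(target_addr >= return_addr);
    return target_addr - return_addr;
}
```

With $rbp = 0x7fffffffe270, &buffer = 0x7fffffffe260 and 8-byte words, the first function gives 24; with <main+35> at 0x4002f4 and <main+42> at 0x4002fb, the second gives 7.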
Note that I used an Ubuntu 10.10 64-bit environment to test the following code.
#include <stdio.h>
void function(int a, int b, int c)
{
char buffer[5];
int *ret;
ret = (int *)(buffer + 24);
(*ret) += 7;
}
int main()
{
int x = 0;
function(1, 2, 3);
x = 1;
printf("x = %i \n", x);
return 0;
}
output
x = 0
This is really just altering the return address of function() rather than an actual buffer overflow. In an actual buffer overflow, you would be overflowing buffer[5] to overwrite the return address. However, most modern implementations use techniques such as stack canaries to protect against this.
What you're doing here doesn't seem to have much to do with a classic buffer overflow attack. The whole idea of a buffer overflow attack is to modify the return address of 'function'. Disassembling your program will show you where the ret instruction (assuming x86) takes its address from. This is what you need to modify to point at main+42.
I assume you want to explicitly provoke the buffer overflow here; normally you'd need to provoke it by manipulating the inputs of 'function'.
By just declaring a buffer[5] you're moving the stack pointer in the wrong direction (verify this by looking at the generated assembly); the return address is somewhere deeper inside the stack (it was put there by the call instruction). On x86, stacks grow downwards, that is, towards lower addresses.
I'd approach this by declaring an int* and moving it upward until I'm at the address where the return address has been pushed, then modify that value to point at main+42 and let function ret.
You can't do that this way.
Here's a classic buffer overflow code sample. See what happens once you feed it 5 and then 6 characters from your keyboard. If you go for more (16 chars should do) you'll overwrite the base pointer, then the function's return address, and you'll get a segmentation fault. What you want to do is figure out which 4 chars overwrite the return address and make the program execute your code. Google around for Linux stack and memory layout.
void ff(){
int a=0; char b[5];
scanf("%s",b);
printf("b:%p a:%p\n", (void *)b, (void *)&a);
printf("b:'%s' a:%d\n" ,b ,a);
}
int main() {
ff();
return 0;
}
