Memory-mapped address in C (how to dereference)

I want to pretend that an array in C is an area of memory in a microprocessor, so that I can compile some code on a PC. I've written a small program to try to get the syntax right, but it won't run: it either crashes or, when I change the way I access the variable, fails to compile. It's late and I can't see why. What is wrong with this, please?
// original code in microprocessor header that I need to change if I compile on the host
// BASE is simply a hex value that is later used as an address or a hex value
#define BASE (0x0000)
// used later in header like this (cannot change the way this is done)
#define OFFSET 0x0001
#define PERIPHERAL (BASE + OFFSET)
// also used like (also cannot change):
uint32_t var = PERIPHERAL | HEXMASK;
// here is how I intend to replace the uC specific code
// replace the BASE DEFINE with the next 2 lines of code:
// instead of writing to memory location, write to array of bytes instead, so declare it:
uint8_t BASE_memory[4] = {0, 0, 0, 0};
// define BASE as hex value that can be used as drop-in replacement in either of the 2 uses shown above
#define BASE ((uint32_t)(BASE_memory))
// now test usage
// access contents of BASE_memory[0]
printf("contents of BASE_memory[0] == %02x\n", *((uint32_t *)(BASE)));
// now I want to access PERIPHERAL, the second element of the array, i.e. BASE_memory[1]
printf("contents of BASE_memory[1] == %02x\n", *((uint32_t *)(PERIPHERAL)));

I think you are on a 64-bit system.
#include <stdint.h>
uint8_t BASE_memory[4] = {1, 2, 3, 4};
int func1()
{
return *(uint32_t *) (uint32_t) BASE_memory;
}
int func2()
{
return *(uint32_t *) (uintptr_t) BASE_memory;
}
Here's the assembly output for func1:
leaq _BASE_memory(%rip), %rax
movl %eax, %eax
movl (%rax), %eax
Here's the assembly for func2:
movl _BASE_memory(%rip), %eax
You can see that if you cast the address to uint32_t, there's an extra step (movl %eax, %eax) that sets the high 32 bits to zero. The truncated address is then wrong, and you get a segmentation fault. That's why you should use uintptr_t or intptr_t instead of uint32_t: those types are wide enough to hold a pointer.

Related

Why do these two pointers that should be the same point to different data?

I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:
struct directory_entry {
uint8_t name[11];
uint8_t attrib;
uint8_t name_case;
uint8_t created_decimal;
uint16_t created_time;
uint16_t created_date;
uint16_t accessed_date;
uint16_t ignore;
uint16_t modified_time;
uint16_t modified_date;
uint16_t first_cluster;
uint32_t length;
} __attribute__ ((packed));
I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.
In my kernel, a void pointer to e is passed to a function which reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11) is 8, as expected for what's on the disk. However, e.name[0] and e.attrib are both 0.
What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.
Edit: The full source is in this gitlab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated right, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, the following is output by that part:
(void *)e.name - (void *)&e
=> 0
*(uint8_t *)&e
=> 80
e.name[0]
=> 0
(void *)&e.attrib - (void *)&e
=> 11
*((uint8_t *)&e + 11)
=> 8
e.attrib
=> 0
I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.
Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused.
u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)
lea eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax] ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push 0x31ce0 ;nbuf
push eax ;the byte we loaded
call 0x3162f ;u8_dec
add esp, 0x10
This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:
void u8_dec(uint8_t n, uint8_t *b) {
if (!n) {
*(uint16_t *)b = '0';
return;
}
bool zero = false;
for (uint32_t m = 100; m; m /= 10) {
uint8_t d = (n / m) % 10;
if (zero)
*(b++) = d + '0';
else if (d) {
zero = true;
*(b++) = d + '0';
}
}
*b = 0;
}
It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.
My kernel utilizes 32-bit protected mode segmenting. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a general protection fault if the stack ever overflowed, rather than allowing it to overflow into other kernel data and code.
However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.
I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.

Complex casting

After learning C and some operating systems theory, I decided to analyze a kernel rootkit for Linux, but there is one line of code that I don't know how to read:
*(void **)&((char *)h->original_function)[ASM_HOOK_CODE_OFFSET] = h->modified_function;
Full context:
#if defined __i386__
// push 0x00000000, ret
#define ASM_HOOK_CODE "\x68\x00\x00\x00\x00\xc3"
#define ASM_HOOK_CODE_OFFSET 1
// alternatively we could do `mov eax 0x00000000, jmp eax`, but it's a byte longer
//#define ASM_HOOK_CODE "\xb8\x00\x00\x00\x00\xff\xe0"
#elif defined __x86_64__
// there is no push that pushes a 64-bit immediate in x86_64,
// so we do things a bit differently:
// mov rax 0x0000000000000000, jmp rax
#define ASM_HOOK_CODE "\x48\xb8\x00\x00\x00\x00\x00\x00\x00\x00\xff\xe0"
#define ASM_HOOK_CODE_OFFSET 2
#else
#error ARCH_ERROR_MESSAGE
#endif
struct asm_hook {
void *original_function;
void *modified_function;
char original_asm[sizeof(ASM_HOOK_CODE)-1];
struct list_head list;
};
/**
* Patches machine code of the original function to call another function.
* This function should not be called directly.
*/
void _asm_hook_patch(struct asm_hook *h)
{
DISABLE_W_PROTECTED_MEMORY
memcpy(h->original_function, ASM_HOOK_CODE, sizeof(ASM_HOOK_CODE)-1);
*(void **)&((char *)h->original_function)[ASM_HOOK_CODE_OFFSET] = h->modified_function;
ENABLE_W_PROTECTED_MEMORY
}
Rootkit link:
https://github.com/nurupo/rootkit/blob/master/rootkit.c
(line 314 of rootkit.c)
I don't need an explanation of how the rootkit works; I only want to understand how to read that line of code. The first part of the line makes me dizzy.
If I am not mistaken: you start at the innermost part, which is
h->original_function
Then we see a closing parenthesis on the right, so we scan left for the matching opening parenthesis. On the way we pass a cast to (char *), so the expression inside the parentheses is a pointer to char.
On the right we now see array indexing, which selects element [ASM_HOOK_CODE_OFFSET], and on the left we see &, which takes that element's address. So we now have the address of a char.
From here we can only go to the left, where we see *(void **). This casts that address to a pointer-to-pointer-to-void and then dereferences it, so the assignment stores the right-hand side (h->modified_function) at that location.

Is it possible to wrap shellcode in a C function such that control is returned to the caller after completion?

Suppose I have some arbitrary x86 instructions that I want to have executed in the context of some program, and I convert these instructions automatically or manually into shellcode. For example, the following instructions.
movq $1, %rax
cpuid
There are various questions, such as here and here, about casting shellcode to a function pointer and executing it by using a standard function invocation. However, arbitrary asm will generally not have the instructions to return to the caller after all the instructions have been completed.
I am interested in writing an "interpreter" of sorts for arbitrary shellcode, so that it can execute a bunch of instructions (perhaps they are in a file somewhere), read out the value of certain registers, and return control to the main C program. I assume the shellcode does not do something like exec and change the process, but merely runs instructions like rdpmc or cpuid.
I imagine something that looks like this, but I am not sure how I can patch the shellcode so that it returns control to the right place.
void executeAndReadRegisters(char* shellcode, int length, uint64_t* rax, uint64_t* rbx, uint64_t* rcx) {
// Modify the shellcode in some way so that it returns control to the
// current program's code after execution, right after "read out registers".
char* modifiedShellCode = malloc((length + EXTRA_NEEDED) * sizeof(char));
// How do I modify the shellcode to return to "Read out registers?"
int (*func)();
func = (int (*)()) modifiedShellCode;
(int)(*func)();
// Read out registers
asm("\t movq %%rax,%0" : "=r"(*rax));
asm("\t movq %%rbx,%0" : "=r"(*rbx));
asm("\t movq %%rcx,%0" : "=r"(*rcx));
}
int main(int argc, char **argv)
{
// Suppose this comes from a file somewhere
char shellcode[] = "...";
int length = ; // Get from external source
uint64_t rax,rbx,rcx;
executeAndReadRegisters(shellcode, length, &rax,&rbx, &rcx);
printf("%lu %lu %lu\n", rax,rbx,rcx);
}

Segmentation fault creating a user-level thread with C and assembly

I am trying to understand some OS fundamentals through some assignments. I have already posted a similar question and got satisfying answers. This one is slightly different, and I haven't been able to debug it. So here's what I do:
What I want to do is to start a main program, malloc a space, use it as a stack to start a user-level thread. My problem is with return address. Here's the code so far:
[I'm editing my code to make it up-to-date to the current state of my answer ]
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#define STACK_SIZE 512
void switch_thread(int*,int*);
int k = 0;
void simple_function()
{
printf("I am the function! k is: %d\n",k);
exit(0);
}
void create_thread(void (*function)())
{
int* stack = malloc(STACK_SIZE + 32);
stack = (int* )(((long)stack & (-1 << 4)) + 0x10);
stack = (int* ) ((long)stack + STACK_SIZE);
*stack = (long) function;
switch_thread(stack,stack);
}
int main()
{
create_thread(simple_function);
assert(0);
return 0;
}
switch_thread is an assembly code I've written as follows:
.text
.globl switch_thread
switch_thread:
movq %rdi, %rsp
movq %rsi, %rbp
ret
This code runs really well under GDB and gives the expected output (which is passing control to simple_function and printing "I am the function! k is: 0"). But when run separately, it gives a segmentation fault. I'm baffled by this result.
Any help would be appreciated. Thanks in advance.
Two problems with your code:
Unless your thread is actually inside a proper procedure (or a nested procedure), there's no such thing as "base pointer". This makes the value of %rbp irrelevant since the thread is not inside a particular procedure at the point of initialization.
Contrary to what you think, when the ret instruction gets executed, the value that %rsp is referring to becomes the new value of the program counter. This means that instead of *(base_pointer + 1), *(base_pointer) will be consulted when it gets executed. Again, the value of %rbp is irrelevant here.
Your code (with minimal modification to make it run) should look like this:
void switch_thread(char* stack_pointer,void (*entry_point)());
void create_thread(void (*function)())
{
char* stack_pointer = malloc(STACK_SIZE + 8); //char* so that the addition below is counted in bytes, not in ints
stack_pointer += STACK_SIZE; //you'd probably want to back up the original allocated address if you intend to free it later for any reason.
switch_thread(stack_pointer,function);
}
Your switch_thread routine should look like this:
.text
.globl switch_thread
switch_thread:
mov %rsp, %rax //move the original stack pointer to a scratch register
mov %rdi, %rsp //set stack pointer
push %rax //back-up the original stack pointer
call *%rsi //call the function (an indirect call needs the * in AT&T syntax)
pop %rsp //restore the original stack pointer
ret //return to create_thread
FYI: If you're initializing a thread on your own, I suggest that you first create a proper trampoline that acts as a thread entry point (e.g. ntdll's RtlUserThreadStart). This will make things much cleaner especially if you want to make your program multithreaded and also pass in any parameters to the start routine.
base_pointer needs to be suitably aligned to store void (*)() values, otherwise you're dealing with undefined behaviour. I think you mean something like this:
void create_thread(void (*function)())
{
size_t offset = STACK_SIZE + sizeof function - STACK_SIZE % sizeof function;
char *stack_pointer = malloc(offset + sizeof function);
void (**base_pointer)() = (void (**)())(stack_pointer + offset); // declared after stack_pointer, which it depends on
*base_pointer = function;
switch_thread(stack_pointer,base_pointer);
}
There is no need to cast malloc. It's generally a bad idea to cast pointers to integer types, or function pointers to object pointer types.
I understand that this is all portable-C nit-picky advice, but it really does help to write as much of your software as possible in portable code rather than relying upon undefined behaviour.

Buffer overflow in C

I'm attempting to write a simple buffer overflow using C on Mac OS X 10.6 64-bit. Here's the concept:
void function() {
char buffer[64];
buffer[offset] += 7; // i'm not sure how large offset needs to be, or if
// 7 is correct.
}
int main() {
int x = 0;
function();
x += 1;
printf("%d\n", x); // the idea is to modify the return address so that
// the x += 1 expression is not executed and 0 gets
// printed
return 0;
}
Here's part of main's assembler dump:
...
0x0000000100000ebe <main+30>: callq 0x100000e30 <function>
0x0000000100000ec3 <main+35>: movl $0x1,-0x8(%rbp)
0x0000000100000eca <main+42>: mov -0x8(%rbp),%esi
0x0000000100000ecd <main+45>: xor %al,%al
0x0000000100000ecf <main+47>: lea 0x56(%rip),%rdi # 0x100000f2c
0x0000000100000ed6 <main+54>: callq 0x100000ef4 <dyld_stub_printf>
...
I want to jump over the movl instruction, which would mean I'd need to increment the return address by 42 - 35 = 7 (correct?). Now I need to know where the return address is stored so I can calculate the correct offset.
I have tried searching for the correct value manually, but either 1 gets printed or I get abort trap – is there maybe some kind of buffer overflow protection going on?
Using an offset of 88 works on my machine. I used Nemo's approach of finding out the return address.
This 32-bit example illustrates how you can figure it out, see below for 64-bit:
#include <stdio.h>
void function() {
char buffer[64];
char *p;
asm("lea 4(%%ebp),%0" : "=r" (p)); // loads address of return address
printf("%td\n", p - buffer); // computes offset (%td because the difference is a ptrdiff_t)
buffer[p - buffer] += 9; // 9 from disassembling main
}
int main() {
volatile int x = 7;
function();
x++;
printf("x = %d\n", x); // prints 7, not 8
}
On my system the offset is 76. That's the 64 bytes of the buffer (remember, the stack grows down, so the start of the buffer is far from the return address) plus whatever other detritus is in between.
Obviously if you are attacking an existing program you can't expect it to compute the answer for you, but I think this illustrates the principle.
(Also, we are lucky that +9 does not carry out into another byte. Otherwise the single byte increment would not set the return address how we expected. This example may break if you get unlucky with the return address within main)
I overlooked the 64-bitness of the original question somehow. The equivalent for x86-64 is 8(%rbp) because pointers are 8 bytes long. In that case my test build happens to produce an offset of 104. In the code above substitute 8(%%rbp) using the double %% to get a single % in the output assembly. This is described in this ABI document. Search for 8(%rbp).
There is a complaint in the comments that 4(%ebp) is just as magic as 76 or any other arbitrary number. In fact the meaning of the register %ebp (also called the "frame pointer") and its relationship to the location of the return address on the stack is standardized. One illustration I quickly Googled is here. That article uses the terminology "base pointer". If you wanted to exploit buffer overflows on other architectures it would require similarly detailed knowledge of the calling conventions of that CPU.
Roddy is right that you need to operate on pointer-sized values.
I would start by reading values in your exploit function (and printing them) rather than writing them. As you crawl past the end of your array, you should start to see values from the stack. Before long you should find the return address and be able to line it up with your disassembler dump.
Disassemble function() and see what it looks like.
Offset needs to be positive, maybe 64+8, as it's a 64-bit address. Also, you should do the '+7' on a pointer-sized object, not on a char. Otherwise, if the two addresses straddle a 256-byte boundary, the carry is lost and you will have exploited your exploit...
You might try running your code in a debugger, stepping each assembly line at a time, and examining the stack's memory space as well as registers.
I always like to operate on nice data types, like this one:
struct stackframe {
char *sf_bp;
char *sf_return_address;
};
void function() {
/* the following code is dirty. */
char *dummy;
dummy = (char *)&dummy;
struct stackframe *stackframe = (struct stackframe *)(dummy + 24); /* try multiples of 4 here. */
/* here starts the beautiful code. */
stackframe->sf_return_address += 7;
}
Using this code, you can easily check with the debugger whether the value in stackframe->sf_return_address matches your expectations.
