void foo(int a)
{ printf ("In foo, a = %d\n", a); }
unsigned char code[9];
* ((DWORD *) &code[0]) = 0x042444FF; /* inc dword ptr [esp+4] */
code[4] = 0xe9; /* JMP */
* ((DWORD *) &code[5]) = (DWORD) &foo - (DWORD) &code[0] - 9;
void (*pf)(int/* a*/) = (void (*)(int)) &code[0];
pf (6);
Does anyone know where in the above code 6 is incremented by 1?
foo(), as well as your thunk, uses the __cdecl calling convention, which requires the caller to push parameters on the stack. So when pf(6) is called, 6 gets pushed onto the stack via a PUSH 6 instruction, and then the thunk is entered via a CALL pf instruction. The CALL pushes a 4-byte return address, so the memory that 6 occupies on the stack is located at ESP+4 when the thunk is entered, i.e. 4 bytes above the current value of the stack pointer register ESP. The first instruction of the thunk increments the value stored at ESP+4, so the value 6 becomes 7. foo() is then entered by the thunk's JMP foo instruction. foo() sees its parameter a as 7 instead of the original 6 because the thunk modified the argument on the stack before foo() ran.
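To make the byte layout of the thunk concrete, here is a minimal sketch that assembles the same 9 bytes into a plain buffer without executing them. The helper name build_inc_thunk and the example addresses are hypothetical; only the opcode bytes and the "- 9" in the offset calculation come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 9-byte thunk into out[].
   thunk_addr  = 32-bit address where the thunk will live (&code[0])
   target_addr = 32-bit address of the function to jump to (&foo) */
static void build_inc_thunk(uint8_t out[9], uint32_t thunk_addr,
                            uint32_t target_addr)
{
    /* FF 44 24 04 = inc dword ptr [esp+4]; these are the bytes produced
       by storing the DWORD 0x042444FF on a little-endian machine. */
    static const uint8_t inc_first_arg[4] = { 0xFF, 0x44, 0x24, 0x04 };
    uint32_t rel;

    assert(out != NULL);
    memcpy(out, inc_first_arg, sizeof inc_first_arg);

    out[4] = 0xE9; /* JMP rel32 */

    /* rel32 is measured from the end of the JMP instruction, which is
       thunk_addr + 9; that is where the "- 9" in the question comes from. */
    rel = target_addr - (thunk_addr + 9u);
    out[5] = (uint8_t)(rel & 0xFFu);
    out[6] = (uint8_t)((rel >> 8) & 0xFFu);
    out[7] = (uint8_t)((rel >> 16) & 0xFFu);
    out[8] = (uint8_t)((rel >> 24) & 0xFFu);
}
```

For example, with the thunk at 0x00401000 and foo at 0x00402000 (made-up addresses), the displacement is 0x00402000 - 0x00401009 = 0x0FF7.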
I'm doing an exercise for an Operating Systems class and getting a segfault when calling printf with arguments.
The objective of the exercise is to simulate the initialization of a thread and print a counter, which should not be very difficult. I have a table of 4 entries of 4096 bytes each; each entry represents one thread's stack:
#define STACK_SIZE 4096
char table[4][STACK_SIZE];
I defined a type called coroutine_t that holds only a stack address:
typedef void* coroutine_t;
Then I have the initialization code. It must take the end of the coroutine's stack, push the address of the coroutine and the initial register values, and return the pointer that will be the stack pointer for the coroutine.
coroutine_t init_coroutine(void *stack_begin, unsigned int stack_size,
void (*initial_pc)(void)) {
char *stack_end = ((char *)stack_begin) + stack_size;
void **ptr = (void**) stack_end;
ptr--;
*ptr = initial_pc;
ptr--;
*ptr = stack_end; /* Frame pointer */
ptr--;
*ptr = 0; /* RBX*/
ptr--;
*ptr = 0; /* R12 */
ptr--;
*ptr = 0; /* R13 */
ptr--;
*ptr = 0; /* R14 */
ptr--;
*ptr = 0; /* R15 */
return ptr;
}
Then I have this code in x86-64 assembly to enter the coroutine; it just pops the registers that were previously pushed:
.global enter_coroutine /* Makes enter_coroutine visible to the linker*/
enter_coroutine:
mov %rdi,%rsp /* RDI contains the argument to enter_coroutine. */
/* And is copied to RSP. */
pop %r15
pop %r14
pop %r13
pop %r12
pop %rbx
pop %rbp
ret /* Pop the program counter */
The rest of my code is this
coroutine_t cr;
void test_function() {
int counter = 0;
while(1) {
printf("counter1: %d\n", counter);
counter++;
}
}
int main() {
cr = init_coroutine(table[0], STACK_SIZE, &test_function);
enter_coroutine(cr);
return 0;
}
Now for the error.
If I run it as is, I get a segfault when the program calls printf. The output from gdb is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dfcfdd in __vfprintf_internal (s=0x7ffff7f9d760 <_IO_2_1_stdout_>, format=0x555555556004 "counter1: %d\n", ap=ap@entry=0x555555558f48 <table+3848>, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1385
I assume something is going wrong with the stack, for two reasons:
If I just print a string without parameters, I get no error.
If I remove the first ptr-- statement from the init_coroutine function, it also works, but then it writes past the end of the stack and hence into the other thread's stack.
I'm running this on an Intel(R) Core(TM) i5-5200U CPU with Ubuntu 21.10 and gcc version 11.2.0.
Could you shed some light on this?
I wasn't able to reproduce the problem on my x86_64 Linux box, but I was able to on Compiler Explorer, and the problem seems to be a simple stack overflow (i.e., 4096 bytes is too small a stack for printf).
Increasing the stack size (or choosing table[1], table[2], or table[3] instead of table[0], which is effectively the same as increasing the stack size) appears to make it work: https://gcc.godbolt.org/z/rnfMThbjo
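As a rough sketch of the suggestion above, the set-up code can be given a larger stack and a helper that reports how many bytes remain below the saved frame. The 64 KiB value, the entry_stub function and the coroutine_headroom helper are assumptions for illustration and not part of the original post; the slot order matches the pops in enter_coroutine. The function pointer is copied with memcpy to avoid a function-pointer-to-void* conversion, which assumes both pointer kinds have the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE (64 * 1024)  /* assumed size; the question used 4096 */
#define SAVED_SLOTS 7           /* r15, r14, r13, r12, rbx, rbp, return address */

typedef void *coroutine_t;

static_assert(sizeof(void (*)(void)) == sizeof(void *),
              "function and data pointers must have the same size");

/* Aligned so that the void* stores at the end of each stack are valid. */
static _Alignas(16) char table[4][STACK_SIZE];

/* Hypothetical entry point, only here so the sketch has something to store. */
static void entry_stub(void) { }

static coroutine_t init_coroutine(void *stack_begin, size_t stack_size,
                                  void (*initial_pc)(void))
{
    char *stack_end;
    void **ptr;
    int i;

    assert(stack_size >= SAVED_SLOTS * sizeof(void *));
    stack_end = (char *)stack_begin + stack_size;
    ptr = (void **)stack_end;

    ptr--;
    memcpy(ptr, &initial_pc, sizeof initial_pc); /* popped by ret */
    *--ptr = stack_end;                          /* popped into rbp */
    for (i = 0; i < 5; i++)
        *--ptr = NULL;                           /* rbx, r12, r13, r14, r15 */
    return ptr;
}

/* Bytes left between the start of the stack area and the saved frame. */
static size_t coroutine_headroom(void *stack_begin, coroutine_t cr)
{
    return (size_t)((char *)cr - (char *)stack_begin);
}
```

With these assumed values, coroutine_headroom(table[0], init_coroutine(table[0], STACK_SIZE, entry_stub)) is STACK_SIZE minus seven pointer-sized slots.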
I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:
struct directory_entry {
uint8_t name[11];
uint8_t attrib;
uint8_t name_case;
uint8_t created_decimal;
uint16_t created_time;
uint16_t created_date;
uint16_t accessed_date;
uint16_t ignore;
uint16_t modified_time;
uint16_t modified_date;
uint16_t first_cluster;
uint32_t length;
} __attribute__ ((packed));
I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.
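The layout expectation described here can also be checked at compile time. The following is a small sketch using offsetof and static_assert, assuming GCC or Clang (for __attribute__ ((packed))) and C11; the load_entry helper is hypothetical and simply copies 32 raw bytes into the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct directory_entry {
    uint8_t  name[11];
    uint8_t  attrib;
    uint8_t  name_case;
    uint8_t  created_decimal;
    uint16_t created_time;
    uint16_t created_date;
    uint16_t accessed_date;
    uint16_t ignore;
    uint16_t modified_time;
    uint16_t modified_date;
    uint16_t first_cluster;
    uint32_t length;
} __attribute__ ((packed));

/* Compilation fails if the packed layout is not the expected one. */
static_assert(offsetof(struct directory_entry, name) == 0, "name at offset 0");
static_assert(offsetof(struct directory_entry, attrib) == 11, "attrib at offset 11");
static_assert(sizeof(struct directory_entry) == 32, "entry is 32 bytes");

/* Hypothetical helper: fill an entry from 32 raw bytes. */
static void load_entry(struct directory_entry *e, const unsigned char raw[32])
{
    memcpy(e, raw, sizeof *e);
}
```

If these assertions compile, then e.name[0] and *(uint8_t *)&e necessarily refer to the same byte, so any difference seen at run time has to come from somewhere other than the struct layout.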
In my kernel, a void pointer to e is passed to a function which reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11) is 8, as expected for what's on the disk. However, e.name[0] and e.attrib are both 0.
What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.
Edit: The full source is in this gitlab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated right, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, the following is output by that part:
(void *)e.name - (void *)&e
=> 0
*(uint8_t *)&e
=> 80
e.name[0]
=> 0
(void *)&e.attrib - (void *)&e
=> 11
*((uint8_t *)&e + 11)
=> 8
e.attrib
=> 0
I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.
Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused.
u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)
lea eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax] ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push 0x31ce0 ;nbuf
push eax ;the byte we loaded
call 0x3162f ;u8_dec
add esp, 0x10
This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:
void u8_dec(uint8_t n, uint8_t *b) {
if (!n) {
*(uint16_t *)b = '0';
return;
}
bool zero = false;
for (uint32_t m = 100; m; m /= 10) {
uint8_t d = (n / m) % 10;
if (zero)
*(b++) = d + '0';
else if (d) {
zero = true;
*(b++) = d + '0';
}
}
*b = 0;
}
It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.
My kernel uses 32-bit protected mode segmentation. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a general protection fault if the stack ever overflowed, rather than allowing it to overflow into other kernel data and code.
However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.
I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.
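The mismatch can be illustrated with a tiny simulation of base-plus-offset addressing. This is only a sketch: the struct segment type and linear_address function are hypothetical, the bases and limits are taken from the ranges quoted above, and the offset 0x7FD0 is a made-up example of where a local variable might sit.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a segment: base address and highest valid offset. */
struct segment {
    uint32_t base;
    uint32_t limit;
};

/* Linear address produced when `offset` is used with segment `seg`. */
static uint32_t linear_address(struct segment seg, uint32_t offset)
{
    assert(offset <= seg.limit); /* real hardware would raise a fault here */
    return seg.base + offset;
}

/* Data segment 0x00000000-0x000FFFFF, stack segment 0x00038000-0x0003FFFF. */
static const struct segment data_seg  = { 0x00000000u, 0x000FFFFFu };
static const struct segment stack_seg = { 0x00038000u, 0x00007FFFu };
```

A local variable at stack offset 0x7FD0 lives at linear address 0x3FFD0 when accessed through the stack segment, but the same offset dereferenced through the data segment reads linear address 0x07FD0, which is a different location.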
In Xcode I tried to modify a const int value through a pointer. Here is the code:
const int con = 5;
int *p;
p = &con;
(*p) +=1;
printf("Add of constant:%p\n",&con);
printf("Add of pointer:%p\n",p);
printf("%d - %d",con,*p);
This is the result in Xcode:
Add of constant:0x7fff5fbff79c
Add of pointer:0x7fff5fbff79c
5 - 6
but on a Linux virtual machine the values of con and *p are the same: 6.
Why are the values different in Xcode?
I tried this with Visual Studio and got the same result as with Xcode. The assembly listing proved @Gerhardh's point:
(*p) +=1; successfully increases value of con in RAM
printf("Add of constant:%p\n",&con); prints address of this memory correctly
printf("%d - %d",con,*p); doesn't read the value of con from RAM, but passes 5 directly to printf. The optimizer threw away the unnecessary read of a value known at compile time. Here is the related assembly listing:
printf("%d - %d",con,*p);
mov eax,dword ptr [p] //get p
mov ecx,dword ptr [eax] //get *p
push ecx //push *p (3rd param)
push 5 //push 5 (2nd param). No read of con
push offset string "%d - %d" (415800h) //push addr of format string (1st param)
call dword ptr [__imp__printf (4182BCh)] //call printf()
Evidently, the compiler on your VM didn't perform the same optimization.
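Note that writing to an object defined as const is undefined behaviour in C, which is why both outcomes are allowed. As a minimal sketch of a well-defined variant (the function name bump_and_read is hypothetical), the object itself can be left non-const and only viewed through a pointer to const:

```c
#include <assert.h>
#include <stdio.h>

/* The object is modifiable; only `view` promises not to write through it. */
static int bump_and_read(void)
{
    int value = 5;            /* not const, so modifying it is well defined */
    const int *view = &value; /* read-only view of the same object */
    int *p = &value;

    (*p) += 1;

    assert(*view == value);   /* both names refer to the same storage */
    printf("%d - %d\n", value, *p);
    return *view;
}
```

Here the compiler may not assume the value stays 5, so both reads agree on every conforming compiler.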
I'm trying to hook the Windows API function FindWindowA(). I successfully did it with the code below without "hotpatching" it: I've overwritten the bytes at the beginning of the function. myHook() is called and a message box shows up when FindWindowA() is called.
user32.dll has hotpatching enabled and I'd like to overwrite the NOPs before the actual function instead of overwriting the function itself. However, the code below won't work when I set hotpatching to TRUE. It does nothing when FindWindowA() gets executed.
#include <stdio.h>
#include <windows.h>
void myHook()
{
MessageBoxA(NULL, "Hooked", "Hook", MB_ICONINFORMATION);
}
int main(int argc, char *argv[])
{
BOOLEAN hotpatching = FALSE;
LPVOID fwAddress = GetProcAddress(GetModuleHandleA("user32.dll"), "FindWindowA");
LPVOID fwHotpatchingAddress = (LPVOID)((DWORD)fwAddress - 5);
LPVOID myHookAddress = &myHook;
DWORD jmpOffset = (DWORD)&myHook - (DWORD)(!hotpatching ? fwAddress : fwHotpatchingAddress) - 5; // -5 because "JMP offset" = 5 bytes (1 + 4)
printf("fwAddress: %X\n", fwAddress);
printf("fwHotpatchingAddress: %X\n", fwHotpatchingAddress);
printf("myHookAddress: %X\n", myHookAddress);
printf("jmpOffset: %X\n", jmpOffset);
printf("Ready?\n\n");
getchar();
char JMP[1] = {0xE9};
char RETN[1] = {0xC3};
LPVOID offset0 = NULL;
LPVOID offset1 = NULL;
LPVOID offset2 = NULL;
if (!hotpatching)
offset0 = fwAddress;
else
offset0 = fwHotpatchingAddress;
offset1 = (LPVOID)((DWORD)offset0 + 1);
offset2 = (LPVOID)((DWORD)offset1 + 4);
DWORD oldProtect = 0;
VirtualProtect(offset0, 6, PAGE_EXECUTE_READWRITE, &oldProtect);
memcpy(fwAddress, JMP, 1);
memcpy(offset1, &jmpOffset, 4);
memcpy(offset2, RETN, 1);
VirtualProtect(offset0, 6, oldProtect, &oldProtect);
printf("FindWindowA() Patched");
getchar();
FindWindowA(NULL, "Test");
getchar();
return 0;
}
Could you tell me what's wrong?
Thank you.
Hotpatching enabled executable images are prepared by the compiler and linker to allow replacing the image while in use. The following two changes are applied (x86):
The function entry point is set to a 2-byte no-op mov edi, edi (/hotpatch).
Five consecutive nop's are prepended to each function entry point (/FUNCTIONPADMIN).
To illustrate this, here is a typical disassembly listing of a hotpatching-enabled function:
(2) 768C8D66 90 nop
768C8D67 90 nop
768C8D68 90 nop
768C8D69 90 nop
768C8D6A 90 nop
(1) 768C8D6B 8B FF mov edi,edi
(3) 768C8D6D 55 push ebp
768C8D6E 8B EC mov ebp,esp
(1) designates the function entry point with the 2-byte no-op. (2) is the padding provided by the linker, and (3) is where the non-trivial function implementation starts.
To hook into a function you have to overwrite (2) with a jump to your hook function jmp myHook, and make this code reachable by replacing (1) with a relative jump jmp $-5.
The hook function must leave the stack in a consistent state. It should be declared as __declspec(naked) to prevent the compiler from generating function prolog and epilog code. The final instruction must either perform stack cleanup in line with the calling convention of the hooked function, or jump back to the hooked function at the address designated by (3).
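The two patches described above can be sketched as plain byte sequences. This is only an illustration of the encoding, assuming 32-bit addresses; the helper name build_hotpatch is hypothetical, and actually writing the bytes (VirtualProtect, memcpy) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* pad[5]   receives the bytes for (2): entry_addr-5 .. entry_addr-1
   entry[2] receives the bytes for (1): entry_addr .. entry_addr+1 */
static void build_hotpatch(uint8_t pad[5], uint8_t entry[2],
                           uint32_t entry_addr, uint32_t hook_addr)
{
    uint32_t rel;

    assert(pad != NULL && entry != NULL);

    /* The 5-byte JMP rel32 starts at entry_addr - 5 and therefore ends at
       entry_addr, so the displacement is simply hook_addr - entry_addr. */
    rel = hook_addr - entry_addr;
    pad[0] = 0xE9;
    pad[1] = (uint8_t)(rel & 0xFFu);
    pad[2] = (uint8_t)((rel >> 8) & 0xFFu);
    pad[3] = (uint8_t)((rel >> 16) & 0xFFu);
    pad[4] = (uint8_t)((rel >> 24) & 0xFFu);

    /* The 2-byte short JMP ends at entry_addr + 2 and must land on
       entry_addr - 5, so its rel8 is -7, encoded as 0xF9. */
    entry[0] = 0xEB;
    entry[1] = 0xF9;
}
```

Writing the 5-byte jump first and the 2-byte jump last means the function keeps working until the final 2-byte store, which is the point of the mov edi, edi placeholder.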
#include <stdio.h>
#define uint unsigned int
#define AddressOfLabel(sectionname,out) __asm{mov [out],offset sectionname};
void* CreateFunction(void* start,void *end) {
uint __start=(uint)start,__end=(uint)end-1
,size,__func_runtime;
void* func_runtime=malloc(size=(((__end)-(__start)))+1);
__func_runtime=(uint)func_runtime;
memcpy((void*)(__func_runtime),start,size);
((char*)func_runtime)[size]=0xC3; //ret
return func_runtime;
}
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
main() {
void* _start,*_end;
AddressOfLabel(__start,_start);
AddressOfLabel(__end,_end);
void* func = CreateFunction(_start,_end);
CallRuntimeFunction(func); //I expected this method to print "Test"
//but this method raised exception
return 0;
__start:
printf("Test");
__end:
}
CreateFunction - takes two points in memory (inside a function's scope), allocates memory, copies the code between them into the allocated memory and returns it (the void* is used like a function, to be called from assembly)
CallRuntimeFunction - runs the function returned by CreateFunction
#define AddressOfLabel(sectionname,out) - writes the address of the label (sectionname) to the variable (out)
When I debugged this code, stepped into the call in CallRuntimeFunction and went to the disassembly,
I saw a lot of ??? instead of the assembly code between the __start and __end labels.
I tried to copy the machine code between the two labels and then run it, but I have no idea why I can't call a function that was allocated with malloc.
Edit:
I changed some code and got part of it working.
Runtime Function's memory allocate:
void* func_runtime=VirtualAlloc(0, size=(((__end)-(__start)))+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
Copy from function scope:
CopyMemory((void*)(__func_runtime),start,size-1);
But when I ran this program, I saw this:
mov esi,esp
push 0E4FD14h
call dword ptr ds:[0E55598h] ; <--- printf ,after that I don't know what is it
add esp,4
cmp esi,esp
call 000B9DBB ; <--- here
mov dword ptr [ebp-198h],0
lea ecx,[ebp-34h]
call 000B9C17
mov eax,dword ptr [ebp-198h]
jmp 000D01CB
ret
At that point it enters another function and weird things happen.
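One possible explanation, which the thread itself does not confirm: a direct call on x86 encodes its target as a 32-bit displacement relative to the end of the instruction, so a copied call points somewhere else unless the displacement is recomputed for the new location. A minimal sketch with hypothetical helper names and made-up addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 5-byte CALL/JMP rel32 located at insn_addr. */
static uint32_t rel32_target(uint32_t insn_addr, int32_t rel32)
{
    return insn_addr + 5u + (uint32_t)rel32;
}

/* Displacement needed for the instruction at insn_addr to reach target. */
static int32_t rel32_for(uint32_t insn_addr, uint32_t target)
{
    return (int32_t)(target - (insn_addr + 5u));
}

/* Returns 1 if an unchanged displacement still reaches the same target
   after the instruction has been moved from old_addr to new_addr. */
static int survives_copy(uint32_t old_addr, uint32_t new_addr, int32_t rel32)
{
    assert(old_addr != new_addr);
    return rel32_target(old_addr, rel32) == rel32_target(new_addr, rel32);
}
```

For example, a call at 0x1000 to 0x2000 has displacement 0xFFB; copied verbatim to 0x5000 it would call 0x6000 instead, and it needs displacement -0x3005 to reach 0x2000 again. An indirect call through an absolute address, like the printf call in the listing, is not affected by this.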
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
Here, address refers to a parameter of this function, and that parameter is itself a pointer, so you are dealing with a pointer to a pointer.
Use:
void CallRuntimeFunction(void* address) {
_asm {
mov ecx,[address] //we get address of "func"
mov ecx,[ecx] //we get "func"
call [ecx] //we jump func(ecx is an address. yes)
}
}
You want to call func, which is a pointer. When it is passed to your CallRuntimeFunction function, a new pointer is created that points to that pointer: a pointer of the second degree.
void* func = CreateFunction(_start,_end);
Yes, func is a pointer.
Important: check your compiler's "calling convention" options. Try the cdecl one.
Be sure to invalidate the caches (both instruction and data) between the function code generation and its calling. See self-modifying code for further info.
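A separate detail worth checking in the question's CreateFunction: it allocates size bytes and then writes the ret opcode at index size, one byte past the end of the allocation. A hedged sketch of the copy step with room for the trailing ret follows; copy_code_with_ret is a hypothetical name, and malloc is used only to keep the sketch portable (to execute the result on Windows, the buffer would have to come from VirtualAlloc with PAGE_EXECUTE_READWRITE, as in the question's edit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copies the bytes in [start, end) and appends a RET (0xC3).
   Returns NULL on allocation failure; the caller frees the buffer.
   If out_size is not NULL it receives the total size including the RET. */
static unsigned char *copy_code_with_ret(const unsigned char *start,
                                         const unsigned char *end,
                                         size_t *out_size)
{
    size_t body;
    unsigned char *buf;

    assert(start != NULL && end != NULL && end >= start);
    body = (size_t)(end - start);

    buf = malloc(body + 1);  /* one extra byte for the RET */
    if (buf == NULL)
        return NULL;

    memcpy(buf, start, body);
    buf[body] = 0xC3;        /* ret */

    if (out_size != NULL)
        *out_size = body + 1;
    return buf;
}
```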