Here’s the code:
__declspec ( naked ) void nseel_asm_assign(void)
{
__asm
{
fld qword ptr [eax]
fstp qword ptr [ebx]
}
}
__declspec ( naked ) void nseel_asm_assign_end(void) {}
The code that consumes them crashes. The debugger shows the addresses are OK, e.g.
&nseel_asm_assign 0x0f45e4a0 {vis_avs.dll!nseel_asm_assign(void)} void(*)()
&nseel_asm_assign_end 0x0f45e4b0 {vis_avs.dll!nseel_asm_assign_end(void)} void(*)()
However, when the address of these functions is taken by the actual C code, not by the debugger, the addresses are wrong, and the consuming code crashes because the computed size is negative:
fn 0x0f455056 {vis_avs.dll!_nseel_asm_assign} void(*)()
fn_e 0x0f45295f {vis_avs.dll!_nseel_asm_assign_end} void(*)()
The underscored functions contain just a single instruction, e.g. jmp nseel_asm_assign
How do I get the addresses of the real functions, without the underscore?
Update: in case you're wondering why I wrote code like this: it wasn't me, it's third-party code, and it worked just fine when built with VC++ 6.0.
Here’s how to get the real address.
#include <stdint.h>

static void* unwrapJumpAddress( void *f )
{
const uint8_t* pb = (const uint8_t*)f;
if( *pb == 0xE9 ) // JMP rel32: http://felixcloutier.com/x86/JMP.html
{
// The operand is a signed 32-bit displacement.
const int32_t offset = *(const int32_t*)( pb + 1 );
// The jump offset is relative to the start of the next instruction.
// This JMP takes 5 bytes.
return (void*)( pb + 5 + offset );
}
return f;
}
Related
I'm writing a FAT16 driver in GNU C for a hobby operating system, and I have a structure defined as such:
struct directory_entry {
uint8_t name[11];
uint8_t attrib;
uint8_t name_case;
uint8_t created_decimal;
uint16_t created_time;
uint16_t created_date;
uint16_t accessed_date;
uint16_t ignore;
uint16_t modified_time;
uint16_t modified_date;
uint16_t first_cluster;
uint32_t length;
} __attribute__ ((packed));
I was under the impression that name would be at the same address as the whole struct, and that attrib would be 11 bytes after that. And indeed, (void *)e.name - (void *)&e is 0 and (void *)&e.attrib - (void *)&e is 11, where e is of type struct directory_entry.
In my kernel, a void pointer to e is passed to a function that reads its contents from a disk. After this function, *(uint8_t *)&e is 80 and *((uint8_t *)&e + 11) is 8, as expected for what's on the disk. However, e.name[0] and e.attrib are both 0.
What gives here? Am I misunderstanding how __attribute__ ((packed)) works? Other structs with the same attribute work how I expect at other parts of my kernel. I can post a link to the full source if needed.
Edit: The full source is in this GitLab repository, on the stack-overflow branch. The relevant part is lines 34 to 52 of src/kernel/main.c. I'm sure that the data is being populated correctly, as I check *(uint8_t *)&e and *((uint8_t *)&e + 11). When I run it, that part outputs the following:
(void *)e.name - (void *)&e
=> 0
*(uint8_t *)&e
=> 80
e.name[0]
=> 0
(void *)&e.attrib - (void *)&e
=> 11
*((uint8_t *)&e + 11)
=> 8
e.attrib
=> 0
I'm very confused about why e.name[0] would be any different than *(uint8_t *)&e.
Edit 2: I disassembled this part using objdump, to see what the difference was in the compiled code, but now I'm even more confused.
u8_dec(*(uint8_t *)&e, nbuf); and u8_dec(e.name[0], nbuf); are both compiled to: (comments mine)
lea eax, [ebp - 0x30] ;loads address of e from stack into eax
movzx eax, byte [eax] ;loads byte pointed to by eax into eax, zero-extending
movzx eax, al ;not sure why this is here, as it's already zero-extended
sub esp, 0x8
push 0x31ce0 ;nbuf
push eax ;the byte we loaded
call 0x3162f ;u8_dec
add esp, 0x10
This passes in the first byte of the struct, as expected. I'm sure that u8_dec doesn't modify e, as its first argument is passed by value and not by reference. nbuf is an array declared at file-scope, while e is declared at function scope, so it's not that they overlap or anything. Perhaps u8_dec isn't doing its job right? Here's the source of that:
void u8_dec(uint8_t n, uint8_t *b) {
if (!n) {
*(uint16_t *)b = '0';
return;
}
bool zero = false;
for (uint32_t m = 100; m; m /= 10) {
uint8_t d = (n / m) % 10;
if (zero)
*(b++) = d + '0';
else if (d) {
zero = true;
*(b++) = d + '0';
}
}
*b = 0;
}
It's pretty clear now that packed structs do work how I think they do, but I'm still not sure what's causing the problem. I'm passing the same value to a function that should be deterministic, but I'm getting different results on different calls.
My kernel uses 32-bit protected-mode segmentation. I had my data segment as 0x0000.0000 - 0x000f.ffff and my stack segment as 0x0003.8000 - 0x0003.ffff, to trigger a fault (strictly, a stack-segment fault, #SS, rather than a general protection fault) if the stack overflowed, rather than allowing it to overflow into other kernel data and code.
However, when GCC compiles C code, it assumes that the stack and data segments have the same base, as this is most often the case. This was causing a problem as when I took the address of the local variable, it was relative to the stack segment (as local variables are on the stack), but when I dereferenced the pointer in the function that was called, it was relative to the data segment.
I have changed my segmenting model so that the stack is in the data segment instead of its own segment, and this has fixed the problem.
So I have been learning about hooking and about using trampolines to bypass/execute data in a WinAPI hook function (in a different executable, using DLL injection). So far I know how to build the trampoline and hook using a mixture of assembly and C, but I can't seem to do it using only C, as I seem to be missing something. I'd appreciate it if someone could tell me what I'm doing wrong and how to fix it.
Here is my code right now:
#include <Windows.h>
unsigned char* address = 0;
__declspec(naked) int __stdcall MessageBoxAHookTrampoline(HWND Window, char* Message, char* Title, int Type) {
__asm
{
push ebp
mov ebp, esp
mov eax, address
add eax, 5
jmp eax
}
}
int __stdcall MessageBoxAHook(HWND Window, char* Message, char* Title, int Type) {
wchar_t* WMessage = L"Hooked!";
wchar_t* WTitle = L"Success!";
MessageBoxW(0, WMessage, WTitle, 0);
return MessageBoxAHookTrampoline(Window, Message, Title, Type);
}
unsigned long __stdcall Thread(void* Context) {
address = (unsigned char*)GetProcAddress(LoadLibraryA("user32"), "MessageBoxA");
ULONG OP = 0;
if (VirtualProtect(address, 5, PAGE_EXECUTE_READWRITE, &OP)) { // cover all 5 bytes that get patched
memset(address, 0x90, 5);
*address = 0xE9;
*(unsigned long*)(address + 1) = (unsigned long)MessageBoxAHook - (unsigned long)address - 5;
}
else {
MessageBoxA(0, "Failed to change protection", "RIP", 0);
}
return 1;
}
// Entry point.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
if (fdwReason == DLL_PROCESS_ATTACH) {
CreateThread(0, 0, Thread, 0, 0, 0);
}
else if (fdwReason == DLL_PROCESS_DETACH) {
}
return true;
}
So my question is: how would I write a function, say InstallHook, that installs the hook and returns a trampoline so I can use it easily?
From what I've read online, the prototype would probably be void* InstallHook(void* originalFunc, void* targetFunc, int jumpsize), but I'm unsure what jumpsize would be used for.
So far I know that the first 5 bytes must be preserved and restored, followed by a jump to the address of the original hooked function. So I'd have to use malloc to allocate memory, memcpy to copy the bytes over, 0xE9 as the opcode of the jump instruction, and so on, but I just don't know how to implement it in pure C. I figure it would be something similar to the code in this question. So how can I write a hook function that returns a trampoline using pure C for WinAPI functions?
If I understood the question correctly, you want to avoid "hard-coding" the trampoline function in assembly, presumably so you could have multiple trampolines in use at the same time without duplicating the code. You can achieve this using VirtualAlloc (malloc won't work since the returned memory won't be executable).
I wrote this from memory without access to a compiler so it might have some minor bugs, but the general idea is here. Normally you would also use VirtualProtect to change the page permissions to r-x instead of rwx once you're done modifying it, but I've left that out for the sake of simplicity:
#include <windows.h>
#include <stdint.h>
#include <string.h>

/* 32-bit (x86) only: the address arithmetic below assumes 4-byte pointers. */
void *CreateTrampoline(void *originalFunc)
{
/* Allocate the trampoline function */
uint8_t *trampoline = VirtualAlloc(
NULL,
5 + 5, /* 5 for the prologue, 5 for the JMP */
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE); /* Make trampoline executable */
if (trampoline == NULL)
return NULL;
/* Copy the original function's prologue */
memcpy(trampoline, originalFunc, 5);
/* JMP rel/32 opcode */
trampoline[5] = 0xE9;
/* JMP rel/32 operand */
uint32_t jmpDest = (uint32_t)(uintptr_t)originalFunc + 5; /* Skip original prologue */
uint32_t jmpSrc = (uint32_t)(uintptr_t)trampoline + 10; /* Starting after the JMP */
uint32_t delta = jmpDest - jmpSrc;
memcpy(trampoline + 6, &delta, 4);
/* Make sure the CPU sees the freshly written code */
FlushInstructionCache(GetCurrentProcess(), trampoline, 10);
return trampoline;
}
Your InstallHook function would then just call CreateTrampoline to create a trampoline, then patch the first 5 bytes of the original function with a JMP rel/32 to your hook.
Be warned: this only works on functions with a 5-byte hot-patchable prologue, such as WinAPI functions, because Microsoft requires that they have a 5-byte prologue to enable hot-patching (which is what you're doing here). Normal functions do not have this requirement; usually they only start with push ebp; mov ebp, esp, which is only 3 bytes (and sometimes not even that, if the compiler decides to optimize it out).
Edit: here's how the math works:
trampoline:   [prologue][jmp delta]      (5 + 5 bytes)
originalFunc: [prologue][rest of func]   (prologue is 5 bytes)
delta is measured from the end of the trampoline's JMP (trampoline + 10) to the start of "rest of func" (originalFunc + 5).
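As a worked example with hypothetical addresses trampoline = 0x00500000 and originalFunc = 0x77D507EA:

$$
\delta = (\mathit{originalFunc} + 5) - (\mathit{trampoline} + 10) = \texttt{0x77D507EF} - \texttt{0x0050000A} = \texttt{0x778507E5}
$$

so the trampoline's JMP would be encoded as E9 E5 07 85 77 (operand stored little-endian). If the original function lies below the trampoline, the 32-bit subtraction wraps and yields the two's-complement encoding of a negative offset, which is exactly what the CPU expects.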
#include <stdio.h>
#define uint unsigned int
#define AddressOfLabel(sectionname,out) __asm{mov [out],offset sectionname};
void* CreateFunction(void* start,void *end) {
uint __start=(uint)start,__end=(uint)end-1
,size,__func_runtime;
void* func_runtime=malloc(size=(((__end)-(__start)))+1);
__func_runtime=(uint)func_runtime;
memcpy((void*)(__func_runtime),start,size);
((char*)func_runtime)[size]=0xC3; //ret
return func_runtime;
}
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
main() {
void* _start,*_end;
AddressOfLabel(__start,_start);
AddressOfLabel(__end,_end);
void* func = CreateFunction(_start,_end);
CallRuntimeFunction(func); //I expected this method to print "Test"
//but this method raised exception
return 0;
__start:
printf("Test");
__end:
}
CreateFunction - takes two addresses in memory (inside a function's body), allocates a buffer, copies the bytes between them into it, and returns it (the void* is then called like a function from assembly)
CallRuntimeFunction - runs the function returned by CreateFunction
#define AddressOfLabel(sectionname,out) - stores the address of the label (sectionname) in the variable (out)
When I debugged this code, stepped into the call to CallRuntimeFunction, and switched to the disassembly view,
I saw a lot of ??? instead of the assembly code between the __start and __end labels.
I'm trying to copy the machine code between the two labels and then run it, but I have no idea why I can't call a function allocated with malloc.
Edit:
I changed some code and got part of it working.
Allocating the runtime function's memory:
void* func_runtime=VirtualAlloc(0, size=(((__end)-(__start)))+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
Copy from function scope:
CopyMemory((void*)(__func_runtime),start,size-1);
But when I ran the program, I saw this:
mov esi,esp
push 0E4FD14h
call dword ptr ds:[0E55598h] ; <--- printf; I don't know what comes after this
add esp,4
cmp esi,esp
call 000B9DBB ; <--- here
mov dword ptr [ebp-198h],0
lea ecx,[ebp-34h]
call 000B9C17
mov eax,dword ptr [ebp-198h]
jmp 000D01CB
ret
At this point it enters another function and does weird stuff.
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
Here, address is a "pointer" to a parameter of this function, which is itself a pointer:
a pointer to a pointer.
use:
void CallRuntimeFunction(void* address) {
_asm {
mov ecx,[address] //we get address of "func"
mov ecx,[ecx] //we get "func"
call [ecx] //we jump func(ecx is an address. yes)
}
}
You want to call func, which is a pointer. When it is passed to your CallRunt... function, this generates a new pointer that points to that pointer: a pointer of the second degree.
void* func = CreateFunction(_start,_end);
Yes, func is a pointer.
Important: check your compiler's calling-convention options. Try __cdecl.
Be sure to invalidate the caches (both instruction and data) between the function code generation and its calling. See self-modifying code for further info.
I am writing a program in C that uses inline asm. The inline assembly code contains some addresses that I want to patch at runtime.
A quick sample of the code is this:
void __declspec(naked) inline(void)
{
__asm
{
mov eax, 0xAABBCCDD
call 0xAABBCCDD
}
}
And say I want to modify the 0xAABBCCDD value from the main C program.
What I tried was calling VirtualProtect on the function's pointer to make it writable, and then calling memcpy to write the appropriate values into the code.
DWORD old;
VirtualProtect(inline, len, PAGE_EXECUTE_READWRITE, &old);
However, VirtualProtect fails and GetLastError() returns 487 (ERROR_INVALID_ADDRESS, "Attempt to access invalid address."). Does anyone have a clue about this problem?
Thanks
Doesn't this work?
int X = 0xAABBCCDD;
void __declspec(naked) inline(void)
{
__asm
{
mov eax, [X]
call [X]
}
}
To do it in another process at runtime:
Create a variable that holds the module's base address.
Get the target's RVA (Relative Virtual Address).
Then calculate the real address like this: PA = RVA + BASE (PA here means the absolute virtual address, not a physical address).
Then call it from your inline assembly.
You can get the base address like this:
#include <windows.h>
#include <tlhelp32.h>
#include <tchar.h>

// Returns the base address of example.exe (a placeholder name) in the given process, or 0.
// DWORD is only wide enough for 32-bit processes.
DWORD dwGetModuleBaseAddress(DWORD dwProcessID)
{
DWORD dwModuleBaseAddress = 0;
// The module snapshot takes a process ID; no separate OpenProcess handle is needed.
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, dwProcessID);
if (hSnapshot != INVALID_HANDLE_VALUE)
{
MODULEENTRY32 ModuleEntry32 = { 0 };
ModuleEntry32.dwSize = sizeof(MODULEENTRY32);
if (Module32First(hSnapshot, &ModuleEntry32))
{
do
{
// _tcscmp/_T work in both ANSI and Unicode builds, unlike wcscmp.
if (_tcscmp(ModuleEntry32.szModule, _T("example.exe")) == 0)
{
dwModuleBaseAddress = (DWORD)(DWORD_PTR)ModuleEntry32.modBaseAddr;
break;
}
} while (Module32Next(hSnapshot, &ModuleEntry32));
}
CloseHandle(hSnapshot);
}
return dwModuleBaseAddress;
}
Assuming you have a local variable (at [ebp - 0x14] below) to save EAX in, and the base address in BaseAddress:
mov dword ptr ss : [ebp - 0x14] , eax;
mov eax, dword ptr BaseAddress;
add eax, RVA ; eax = BaseAddress + RVA, i.e. the absolute address
call eax;
mov eax, dword ptr ss : [ebp - 0x14] ;
You have to restore the register's value after the call returns, since it may be used later in the execution, assuming you're patching an existing application that may depend on eax after your call. This method has its disadvantages, but at least it gives an idea of what to do.
I'm using inline assembly to construct a set of passwords, which I will use to brute force against a given hash. I used this website as a reference for the construction of the passwords.
This works flawlessly in a single-threaded environment: it produces an endless sequence of incrementing passwords.
I have only basic knowledge of asm, but I understand the idea. GCC uses AT&T syntax by default, so I compile with -masm=intel.
During the attempt to multithread the program, I realize that this approach might not work.
The following code uses 2 global C variables, and I assume that this might be the problem.
__asm__("pushad\n\t"
"mov edi, offset plaintext\n\t" // global variable
"mov ebx, offset charsetTable\n\t" // again
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
It produces a non-deterministic result in the plaintext variable.
How can I make every thread access its own plaintext variable (if this is the problem)?
I tried modifying this code to use extended assembly, but failed every time, probably because all the tutorials use AT&T syntax.
I would really appreciate any help, as I've been stuck for several hours now :(
Edit: Running the program with 2 threads, and printing the content of plaintext right after the asm instruction, produces:
b
b
d
d
f
f
...
Edit2:
pthread_create(&thread[i], NULL, crack, (void *) &args[i]))
[...]
void *crack(void *arg) {
struct threadArgs *param = arg;
struct crypt_data crypt; // storage for reentrant version of crypt(3)
char *tmpHash = NULL;
size_t len = strlen(param->methodAndSalt);
size_t cipherlen = strlen(param->cipher);
crypt.initialized = 0;
for(int i = 0; i <= LIMIT; i++) {
// intel syntax
__asm__ ("pushad\n\t"
//mov edi, offset %0\n\t"
"mov edi, offset plaintext\n\t"
"mov ebx, offset charsetTable\n\t"
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
tmpHash = crypt_r(plaintext, param->methodAndSalt, &crypt);
if(0 == memcmp(tmpHash+len, param->cipher, cipherlen)) {
printf("success: %s\n", plaintext);
break;
}
}
return 0;
}
Since you're already using pthreads, another option is to make the variables that are modified by several threads into per-thread variables (thread-specific data); see the Open Group manpage for pthread_getspecific. It works like this:
In the main thread (before you create other threads), do:
static pthread_key_t tsd_key;
(void)pthread_key_create(&tsd_key, NULL); /* unlikely to fail; handle if you want. Pass a destructor instead of NULL to free each thread's data at thread exit */
and then within each thread, where you use the plaintext / charsetTable variables (or more such), do:
struct { char *plainText; char *charsetTable; } *str =
pthread_getspecific(tsd_key);
if (str == NULL) {
str = malloc(sizeof *str);
str->plainText = malloc(size_of_plaintext);
str->charsetTable = malloc(size_of_charsetTable);
initialize(str->plainText); /* put the data for this thread in */
initialize(str->charsetTable); /* ditto */
pthread_setspecific(tsd_key, str);
}
char *plaintext = str->plainText;
char *charsetTable = str->charsetTable;
Or create / use several keys, one per such variable; in that case, you don't get the str container / double indirection / additional malloc.
Intel assembly syntax with gcc inline asm is, hm, not great; in particular, specifying input/output operands is not easy. I think to get that to use the pthread_getspecific mechanism, you'd change your code to do:
__asm__("pushad\n\t"
"push tsd_key\n\t" // thread-specific data key (arg to call)
"call pthread_getspecific\n\t" // gets "str" as per above
"add esp, 4\n\t" // get rid of the func argument
"mov edi, [eax]\n\t" // first ptr == "plainText"
"mov ebx, [eax + 4]\n\t" // 2nd ptr == "charsetTable"
...
That way, it becomes lock-free, at the expense of using more memory (one plaintext / charsetTable per thread) and an additional function call (to pthread_getspecific()). Also, if you do the above, make sure each thread's specific data is freed, for example by passing a destructor to pthread_key_create() (it runs at thread exit for every non-NULL value), or else you'll leak.
If your function is fast to execute, then a lock is a much simpler solution because you don't need all the setup / cleanup overhead of threadspecific data; if the function is either slow or very frequently called, the lock would become a bottleneck though - in that case the memory / access overhead for TSD is justified. Your mileage may vary.
Protect this function with a mutex, outside the inline assembly block.