I am writing a program in C and i use inline asm. In the inline assembler code is have some addresses where i want to patch them at runtime.
A quick sample of the code is this:
void __declspec(naked) inline(void)
{
mov eax, 0xAABBCCDD
call 0xAABBCCDD
}
An say i want to modify the 0xAABBCCDD value from the main C program.
What i tried to do is to Call VirtualProtect an is the pointer of the function in order to make it Writeable, and then call memcpy to add the appropriate values to the code.
DWORD old;
VirtualProtect(inline, len, PAGE_EXECUTE_READWRITE, &old);
However VirtualProtect fails and GetLastError() returns 487 which means accessing invalid address. Anyone have a clue about this problem??
Thanks
Doesn't this work?
int X = 0xAABBCCDD;
void __declspec(naked) inline(void)
{
mov eax, [X]
call [X]
}
How to do it to another process at runtime,
Create a variable that holds the program base address
Get the target RVA (Relative Virtual Address)
Then calculate the real address like this PA=RVA + BASE
then call it from your inline assembly
You can get the base address like this
DWORD dwGetModuleBaseAddress(DWORD dwProcessID)
{
TCHAR zFileName[MAX_PATH];
ZeroMemory(zFileName, MAX_PATH);
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, true, dwProcessID);
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, dwProcessID);
DWORD dwModuleBaseAddress = 0;
if (hSnapshot != INVALID_HANDLE_VALUE)
{
MODULEENTRY32 ModuleEntry32 = { 0 };
ModuleEntry32.dwSize = sizeof(MODULEENTRY32);
if (Module32First(hSnapshot, &ModuleEntry32))
{
do
{
if (wcscmp(ModuleEntry32.szModule, L"example.exe") == 0)
{
dwModuleBaseAddress = (DWORD_PTR)ModuleEntry32.modBaseAddr;
break;
}
} while (Module32Next(hSnapshot, &ModuleEntry32));
}
CloseHandle(hSnapshot);
CloseHandle(hProcess);
}
return dwModuleBaseAddress;
}
Assuming you have a local variable and your base address
mov dword ptr ss : [ebp - 0x14] , eax;
mov eax, dword ptr BaseAddress;
add eax, PA;
call eax;
mov eax, dword ptr ss : [ebp - 0x14] ;
You have to restore the value of your Register after the call returns, since this value may be used somewhere down the code execution, assuming you're trying to patch an existing application that may depend on the eax register after your call. Although this method has it disadvantages, but at least it will give anyone idea on what to do.
Related
So I have been learning about the concept of hooking and using trampolines in order to bypass/execute data in a WinAPI hook function (In a different executable file, using DLL injection). So far I know how to make it (the trampoline and hook) using a mixture of assembly and C, but I can't seem to do it with just using C, as I seem to be missing something. I'd appreciate if someone could tell me what I'm doing wrong and how to fix it up.
Right now my code:
#include <Windows.h>
unsigned char* address = 0;
__declspec(naked) int __stdcall MessageBoxAHookTrampoline(HWND Window, char* Message, char* Title, int Type) {
__asm
{
push ebp
mov ebp, esp
mov eax, address
add eax, 5
jmp eax
}
}
int __stdcall MessageBoxAHook(HWND Window, char* Message, char* Title, int Type) {
wchar_t* WMessage = L"Hooked!";
wchar_t* WTitle = L"Success!";
MessageBoxW(0, WMessage, WTitle, 0);
return MessageBoxAHookTrampoline(Window, Message, Title, Type);
}
unsigned long __stdcall Thread(void* Context) {
address = (unsigned char*)GetProcAddress(LoadLibraryA("user32"), "MessageBoxA");
ULONG OP = 0;
if (VirtualProtect(address, 1, PAGE_EXECUTE_READWRITE, &OP)) {
memset(address, 0x90, 5);
*address = 0xE9;
*(unsigned long*)(address + 1) = (unsigned long)MessageBoxAHook - (unsigned long)address - 5;
}
else {
MessageBoxA(0, "Failed to change protection", "RIP", 0);
}
return 1;
}
// Entry point.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
if (fdwReason == DLL_PROCESS_ATTACH) {
CreateThread(0, 0, Thread, 0, 0, 0);
}
else if (fdwReason == DLL_PROCESS_DETACH) {
}
return true;
}
So question is: How would I make a function say InstallHook that will install the hook and return a trampoline so I can use it easily?
Function prototype probably would be: void* InstallHook(void* originalFunc, void* targetFunc, int jumpsize), or so I've understood reading online, but unsure what jumpsize would be used for.
So far I know that the first 5 bytes must be preserved and restored, and then there's a jump to the address of the original hooked function. So I'd have to use malloc to allocate memory, memcpy to copy bytes over, the 0xE9 is the value of a jump instruction and such, but I just don't know how to implement it using just pure C. I figure it would be something similar to the code in this question. So how can I write a hook function that returns a trampoline using pure C for WinAPI functions?
If I understood the question correctly, you want to avoid "hard-coding" the trampoline function in assembly, presumably so you could have multiple trampolines in use at the same time without duplicating the code. You can achieve this using VirtualAlloc (malloc won't work since the returned memory won't be executable).
I wrote this from memory without access to a compiler so it might have some minor bugs, but the general idea is here. Normally you would also use VirtualProtect to change the page permissions to r-x instead of rwx once you're done modifying it, but I've left that out for the sake of simplicity:
void *CreateTrampoline(void *originalFunc)
{
/* Allocate the trampoline function */
uint8_t *trampoline = VirtualAlloc(
NULL,
5 + 5, /* 5 for the prologue, 5 for the JMP */
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE); /* Make trampoline executable */
/* Copy the original function's prologue */
memcpy(trampoline, originalFunc, 5);
/* JMP rel/32 opcode */
trampoline[5] = 0xE9;
/* JMP rel/32 operand */
uint32_t jmpDest = (uint32_t)originalFunc + 5; /* Skip original prologue */
uint32_t jmpSrc = (uint32_t)trampoline + 10; /* Starting after the JMP */
uint32_t delta = jmpDest - jmpSrc;
memcpy(trampoline + 6, &delta, 4);
return trampoline;
}
Your InstallHook function would then just call CreateTrampoline to create a trampoline, then patch the first 5 bytes of the original function with a JMP rel/32 to your hook.
Be warned, this only works on WinAPI functions, because Microsoft requires that they have a 5-byte prologue to enable hot-patching (which is what you're doing here). Normal functions do not have this requirement -- usually they only start with push ebp; mov ebp, esp which is only 3 bytes (and sometimes not even that, if the compiler decides to optimize it out).
Edit: here's how the math works:
_______________delta______________
| |
trampoline | originalFunc |
| | | |
v | v v
[prologue][jmp delta] [prologue][rest of func]
|________||_________| |________|
5 + 5 5
i can use extended read functions of bios int 13h well from assembly,
with the below code
; *************************************************************************
; Setup DISK ADDRESS PACKET
; *************************************************************************
jmp strtRead
DAPACK :
db 010h ; Packet Size
db 0 ; Always 0
blkcnt:
dw 1 ; Sectors Count
db_add :
dw 07e00h ; Transfer Offset
dw 0 ; Transfer Segment
d_lba :
dd 1 ; Starting LBA(0 - n)
dd 0 ; Bios 48 bit LBA
; *************************************************************************
; Start Reading Sectors using INT13 Func 42
; *************************************************************************
strtRead:
mov si, OFFSET DAPACK; Load DPACK offset to SI
mov ah, 042h ; Function 42h
mov dl, 080h ; Drive ID
int 013h; Call INT13h
i want to convert this to be a c callable function but i have no idea about how to transfer the parameters from c to asm like drive id , sectors count, buffer segment:offset .... etc.
i am using msvc and masm and working with nothing except bios functions.
can anyone help ?!!
update :
i have tried the below function but always nothing loaded into the buffer ??
void read_sector()
{
static unsigned char currentMBR[512] = { 0 };
struct disk_packet //needed for int13 42h
{
byte size_pack; //size of packet must be 16 or 16+
byte reserved1; //reserved
byte no_of_blocks; //nof blocks for transfer
byte reserved2; //reserved
word offset; //offset address
word segment; //segment address
dword lba1;
dword lba2;
} disk_pack;
disk_pack.size_pack = 16; //set size to 16
disk_pack.no_of_blocks = 1; //1 block ie read one sector
disk_pack.reserved1 = 0; //reserved word
disk_pack.reserved2 = 0; //reserved word
disk_pack.segment = 0; //segment of buffer
disk_pack.offset = (word)¤tMBR[0]; //offset of buffer
disk_pack.lba1 = 0; //lba first 32 bits
disk_pack.lba2 = 0; //last 32 bit address
_asm
{
mov dl, 080h;
mov[disk_pack.segment], ds;
mov si, disk_pack;
mov ah, 42h;
int 13h
; jc NoError; //No error, ignore error code
; mov bError, ah; // Error, get the error code
NoError:
}
}
Sorry to post this as "answer"; I want to post this as "comment" but it is too long...
Different compilers have a different syntax of inline assembly. This means that the correct syntax of the following lines:
mov[disk_pack.segment], ds;
mov si, disk_pack;
... depends on the compiler used. Unfortunately I do not use 16-bit C compilers so I cannot help you in this point.
The next thing I see in your program is the following one:
disk_pack.segment = 0; //segment of buffer
disk_pack.offset = (word)¤tMBR[0]; //offset of buffer
With a 99% chance this will lead to a problem. Instead I would to the following:
struct disk_packet //needed for int13 42h
{
byte size_pack;
byte reserved;
word no_of_blocks; // note that this is 16-bit!
void far *data; // <- This line is the main change!
dword lba1;
dword lba2;
} disk_pack;
...
disk_pack.size_pack = 16;
disk_pack.no_of_blocks = 1;
disk_pack.reserved = 0;
disk_pack.data = ¤tMBR[0]; // also note the change here
disk_pack.lba1 = 0;
disk_pack.lba2 = 0;
...
Note that some compilers name the keyword "_far" or "__far" instead of "far".
A third problem is that some (buggy) BIOSes require ES to be equal to the segment value from the disk_pack and a fourth one is that many compilers require the inline assembly code not to modify any registers (AX, CX and DX is normally OK).
These two could be solved the following way:
push ds;
push es;
push si;
mov dl, 080h;
// TODO here: Set ds:si to disk_pack in a compiler-specific way
mov es,[si+6];
mov ah, 42h;
int 13h;
...
pop si;
pop es;
pop ds;
In my opinion the "#pragma pack" should not be neccessary because all elements in the structure are propperly aligned.
I'm trying to hook the Windows API function FindWindowA(). I successfully did it with the code below without "hotpatching" it: I've overwritten the bytes at the beginning of the function. myHook() is called and a message box shows up when FindWindowA() is called.
user32.dll has hotpatching enabled and I'd like to overwrite the NOPs before the actual function instead of overwriting the function itself. However, the code below won't work when I set hotpatching to TRUE. It does nothing when FindWindowA() gets executed.
#include <stdio.h>
#include <windows.h>
void myHook()
{
MessageBoxA(NULL, "Hooked", "Hook", MB_ICONINFORMATION);
}
int main(int argc, char *argv[])
{
BOOLEAN hotpatching = FALSE;
LPVOID fwAddress = GetProcAddress(GetModuleHandleA("user32.dll"), "FindWindowA");
LPVOID fwHotpatchingAddress = (LPVOID)((DWORD)fwAddress - 5);
LPVOID myHookAddress = &myHook;
DWORD jmpOffset = (DWORD)&myHook - (DWORD)(!hotpatching ? fwAddress : fwHotpatchingAddress) - 5; // -5 because "JMP offset" = 5 bytes (1 + 4)
printf("fwAddress: %X\n", fwAddress);
printf("fwHotpatchingAddress: %X\n", fwHotpatchingAddress);
printf("myHookAddress: %X\n", myHookAddress);
printf("jmpOffset: %X\n", jmpOffset);
printf("Ready?\n\n");
getchar();
char JMP[1] = {0xE9};
char RETN[1] = {0xC3};
LPVOID offset0 = NULL;
LPVOID offset1 = NULL;
LPVOID offset2 = NULL;
if (!hotpatching)
offset0 = fwAddress;
else
offset0 = fwHotpatchingAddress;
offset1 = (LPVOID)((DWORD)offset0 + 1);
offset2 = (LPVOID)((DWORD)offset1 + 4);
DWORD oldProtect = 0;
VirtualProtect(offset0, 6, PAGE_EXECUTE_READWRITE, &oldProtect);
memcpy(fwAddress, JMP, 1);
memcpy(offset1, &jmpOffset, 4);
memcpy(offset2, RETN, 1);
VirtualProtect(offset0, 6, oldProtect, &oldProtect);
printf("FindWindowA() Patched");
getchar();
FindWindowA(NULL, "Test");
getchar();
return 0;
}
Could you tell me what's wrong?
Thank you.
Hotpatching enabled executable images are prepared by the compiler and linker to allow replacing the image while in use. The following two changes are applied (x86):
The function entry point is set to a 2-byte no-op mov edi, edi (/hotpatch).
Five consecutive nop's are prepended to each function entry point (/FUNCTIONPADMIN).
To illustrate this, here is a typical disassembly listing of a hotpaching enabled function:
(2) 768C8D66 90 nop
768C8D67 90 nop
768C8D68 90 nop
768C8D69 90 nop
768C8D6A 90 nop
(1) 768C8D6B 8B FF mov edi,edi
(3) 768C8D6D 55 push ebp
768C8D6E 8B EC mov ebp,esp
(1) designates the function entry point with the 2-byte no-op. (2) is the padding provided by the linker, and (3) is where the non-trivial function implementation starts.
To hook into a function you have to overwrite (2) with a jump to your hook function jmp myHook, and make this code reachable by replacing (1) with a relative jump jmp $-5.
The hook function must leave the stack in a consistent state. It should be declared as __declspec(naked) to prevent the compiler from generating function prolog and epilog code. The final instruction must either perform stack cleanup in line with the calling convention of the hooked function, or jump back to the hooked function at the address designated by (3).
#include <stdio.h>
#define uint unsigned int
#define AddressOfLabel(sectionname,out) __asm{mov [out],offset sectionname};
void* CreateFunction(void* start,void *end) {
uint __start=(uint)start,__end=(uint)end-1
,size,__func_runtime;
void* func_runtime=malloc(size=(((__end)-(__start)))+1);
__func_runtime=(uint)func_runtime;
memcpy((void*)(__func_runtime),start,size);
((char*)func_runtime)[size]=0xC3; //ret
return func_runtime;
}
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
main() {
void* _start,*_end;
AddressOfLabel(__start,_start);
AddressOfLabel(__end,_end);
void* func = CreateFunction(_start,_end);
CallRuntimeFunction(func); //I expected this method to print "Test"
//but this method raised exception
return 0;
__start:
printf("Test");
__end:
}
CreateFunction - takes two points in memory (function scope), allocate, copy it to the allocated memory and returns it (The void* used like a function to call with Assembly)
CallRuntimeFunction - runs the functions that returns from CreateFunction
#define AddressOfLabel(sectionname,out) - Outs the address of label (sectionname) to variable (out)
When I debugged this code and stepped in the call of CallRuntimeFunction and go to disassembly ,
I saw alot of ??? instead of assembly code of between __start and __end labels.
I tried to copy machine code between two labels and then run it. But I don't have any idea why I can't call function that allocated with malloc.
Edit:
I changed some code and done part of the work.
Runtime Function's memory allocate:
void* func_runtime=VirtualAlloc(0, size=(((__end)-(__start)))+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
Copy from function scope:
CopyMemory((void*)(__func_runtime),start,size-1);
But when I ran this program I can that:
mov esi,esp
push 0E4FD14h
call dword ptr ds:[0E55598h] ; <--- printf ,after that I don't know what is it
add esp,4
cmp esi,esp
call 000B9DBB ; <--- here
mov dword ptr [ebp-198h],0
lea ecx,[ebp-34h]
call 000B9C17
mov eax,dword ptr [ebp-198h]
jmp 000D01CB
ret
At here it enters to another function and weird stuff.
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
here address is a "pointer" to a parameter of this function which is also a pointer.
pointer to a pointer
use:
void CallRuntimeFunction(void* address) {
_asm {
mov ecx,[address] //we get address of "func"
mov ecx,[ecx] //we get "func"
call [ecx] //we jump func(ecx is an address. yes)
}
}
you wanna call func which is a pointer. when passed in your CallRunt... function, this generates a new pointer to point to that pointer. Pointer of second degree.
void* func = CreateFunction(_start,_end);
yes func is a pointer
Important: check your compilers "calling convention" options. Try the decl one
Be sure to invalidate the caches (both instruction and data) between the function code generation and its calling. See self-modifying code for further info.
I'm using inline assembly to construct a set of passwords, which I will use to brute force against a given hash. I used this website as a reference for the construction of the passwords.
This is working flawlessly in a singlethreaded environment. It produces an infinite amount of incrementing passwords.
As I have only basic knowledge of asm, I understand the idea. The gcc uses ATT, so I compile with -masm=intel
During the attempt to multithread the program, I realize that this approach might not work.
The following code uses 2 global C variables, and I assume that this might be the problem.
__asm__("pushad\n\t"
"mov edi, offset plaintext\n\t" <---- global variable
"mov ebx, offset charsetTable\n\t" <---- again
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
It produces a non deterministic result in the plaintext variable.
How can i create a workaround, that every thread accesses his own plaintext variable? (If this is the problem...).
I tried modifying this code, to use extended assembly, but I failed every time. Probably due to the fact that all tutorials use ATT syntax.
I would really appreciate any help, as I'm stuck for several hours now :(
Edit: Running the program with 2 threads, and printing the content of plaintext right after the asm instruction, produces:
b
b
d
d
f
f
...
Edit2:
pthread_create(&thread[i], NULL, crack, (void *) &args[i]))
[...]
void *crack(void *arg) {
struct threadArgs *param = arg;
struct crypt_data crypt; // storage for reentrant version of crypt(3)
char *tmpHash = NULL;
size_t len = strlen(param->methodAndSalt);
size_t cipherlen = strlen(param->cipher);
crypt.initialized = 0;
for(int i = 0; i <= LIMIT; i++) {
// intel syntax
__asm__ ("pushad\n\t"
//mov edi, offset %0\n\t"
"mov edi, offset plaintext\n\t"
"mov ebx, offset charsetTable\n\t"
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
tmpHash = crypt_r(plaintext, param->methodAndSalt, &crypt);
if(0 == memcmp(tmpHash+len, param->cipher, cipherlen)) {
printf("success: %s\n", plaintext);
break;
}
}
return 0;
}
Since you're already using pthreads, another option is making the variables that are modified by several threads into per-thread variables (threadspecific data). See pthread_getspecific OpenGroup manpage. The way this works is like:
In the main thread (before you create other threads), do:
static pthread_key_y tsd_key;
(void)pthread_key_create(&tsd_key); /* unlikely to fail; handle if you want */
and then within each thread, where you use the plaintext / charsetTable variables (or more such), do:
struct { char *plainText, char *charsetTable } *str =
pthread_getspecific(tsd_key);
if (str == NULL) {
str = malloc(2 * sizeof(char *));
str.plainText = malloc(size_of_plaintext);
str.charsetTable = malloc(size_of_charsetTable);
initialize(str.plainText); /* put the data for this thread in */
initialize(str.charsetTable); /* ditto */
pthread_setspecific(tsd_key, str);
}
char *plaintext = str.plainText;
char *charsetTable = str.charsetTable;
Or create / use several keys, one per such variable; in that case, you don't get the str container / double indirection / additional malloc.
Intel assembly syntax with gcc inline asm is, hm, not great; in particular, specifying input/output operands is not easy. I think to get that to use the pthread_getspecific mechanism, you'd change your code to do:
__asm__("pushad\n\t"
"push tsd_key\n\t" <---- threadspecific data key (arg to call)
"call pthread_getspecific\n\t" <---- gets "str" as per above
"add esp, 4\n\t" <---- get rid of the func argument
"mov edi, [eax]\n\t" <---- first ptr == "plainText"
"mov ebx, [eax + 4]\n\t" <---- 2nd ptr == "charsetTable"
...
That way, it becomes lock-free, at the expense of using more memory (one plaintext / charsetTable per thread), and the expense of an additional function call (to pthread_getspecific()). Also, if you do the above, make sure you free() each thread's specific data via pthread_atexit(), or else you'll leak.
If your function is fast to execute, then a lock is a much simpler solution because you don't need all the setup / cleanup overhead of threadspecific data; if the function is either slow or very frequently called, the lock would become a bottleneck though - in that case the memory / access overhead for TSD is justified. Your mileage may vary.
Protect this function with mutex outside of inline Assembly block.