Hooking - hotpatching - c

I'm trying to hook the Windows API function FindWindowA(). I successfully did it with the code below without "hotpatching" it: I've overwritten the bytes at the beginning of the function. myHook() is called and a message box shows up when FindWindowA() is called.
user32.dll has hotpatching enabled and I'd like to overwrite the NOPs before the actual function instead of overwriting the function itself. However, the code below won't work when I set hotpatching to TRUE. It does nothing when FindWindowA() gets executed.
#include <stdio.h>
#include <windows.h>
void myHook()
{
MessageBoxA(NULL, "Hooked", "Hook", MB_ICONINFORMATION);
}
int main(int argc, char *argv[])
{
BOOLEAN hotpatching = FALSE;
LPVOID fwAddress = GetProcAddress(GetModuleHandleA("user32.dll"), "FindWindowA");
LPVOID fwHotpatchingAddress = (LPVOID)((DWORD)fwAddress - 5);
LPVOID myHookAddress = &myHook;
DWORD jmpOffset = (DWORD)&myHook - (DWORD)(!hotpatching ? fwAddress : fwHotpatchingAddress) - 5; // -5 because "JMP offset" = 5 bytes (1 + 4)
printf("fwAddress: %X\n", fwAddress);
printf("fwHotpatchingAddress: %X\n", fwHotpatchingAddress);
printf("myHookAddress: %X\n", myHookAddress);
printf("jmpOffset: %X\n", jmpOffset);
printf("Ready?\n\n");
getchar();
char JMP[1] = {0xE9};
char RETN[1] = {0xC3};
LPVOID offset0 = NULL;
LPVOID offset1 = NULL;
LPVOID offset2 = NULL;
if (!hotpatching)
offset0 = fwAddress;
else
offset0 = fwHotpatchingAddress;
offset1 = (LPVOID)((DWORD)offset0 + 1);
offset2 = (LPVOID)((DWORD)offset1 + 4);
DWORD oldProtect = 0;
VirtualProtect(offset0, 6, PAGE_EXECUTE_READWRITE, &oldProtect);
memcpy(fwAddress, JMP, 1);
memcpy(offset1, &jmpOffset, 4);
memcpy(offset2, RETN, 1);
VirtualProtect(offset0, 6, oldProtect, &oldProtect);
printf("FindWindowA() Patched");
getchar();
FindWindowA(NULL, "Test");
getchar();
return 0;
}
Could you tell me what's wrong?
Thank you.

Hotpatching enabled executable images are prepared by the compiler and linker to allow replacing the image while in use. The following two changes are applied (x86):
The function entry point is set to a 2-byte no-op mov edi, edi (/hotpatch).
Five consecutive nop's are prepended to each function entry point (/FUNCTIONPADMIN).
To illustrate this, here is a typical disassembly listing of a hotpaching enabled function:
(2) 768C8D66 90 nop
768C8D67 90 nop
768C8D68 90 nop
768C8D69 90 nop
768C8D6A 90 nop
(1) 768C8D6B 8B FF mov edi,edi
(3) 768C8D6D 55 push ebp
768C8D6E 8B EC mov ebp,esp
(1) designates the function entry point with the 2-byte no-op. (2) is the padding provided by the linker, and (3) is where the non-trivial function implementation starts.
To hook into a function you have to overwrite (2) with a jump to your hook function jmp myHook, and make this code reachable by replacing (1) with a relative jump jmp $-5.
The hook function must leave the stack in a consistent state. It should be declared as __declspec(naked) to prevent the compiler from generating function prolog and epilog code. The final instruction must either perform stack cleanup in line with the calling convention of the hooked function, or jump back to the hooked function at the address designated by (3).

Related

Buffer Overflow Return Address Attack

I am trying to learn more about buffer overflows so I have created a simple program to gain knowledge and try to exploit it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void failed(void)
{
puts("Did not exploit");
exit(0);
}
void pass(void)
{
puts("Good Job");
exit(1);
}
void foo()
{
char input[4];
gets(input);
}
int _main()
{
foo();
failed();
return 0;
}
I am trying to fill the buffer within foo() with random characters as well as the address of pass() such that the return address of foo() gets overwritten to the starting address of pass(). Using the GDB commands as follows to get relevant information.
x foo
-> 0x8049dd7 foo : 0xfb1e0ff3
disas foo
Dump of assembler code for function foo:
0x08049e09 <+0>: endbr32
0x08049e0d <+4>: push %ebp
0x08049e0e <+5>: mov %esp,%ebp
0x08049e10 <+7>: push %ebx
0x08049e11 <+8>: sub $0x14,%esp
0x08049e14 <+11>: call 0x8049e5a <__x86.get_pc_thunk.ax>
0x08049e19 <+16>: add $0x9b1e7,%eax
0x08049e1e <+21>: sub $0xc,%esp
0x08049e21 <+24>: lea -0xc(%ebp),%edx
0x08049e24 <+27>: push %edx
0x08049e25 <+28>: mov %eax,%ebx
0x08049e27 <+30>: call 0x8058850 <gets>
0x08049e2c <+35>: add $0x10,%esp
0x08049e2f <+38>: nop
0x08049e30 <+39>: mov -0x4(%ebp),%ebx
0x08049e33 <+42>: leave
0x08049e34 <+43>: ret
End of assembler dump.
I then created a python program which feeds its output into my vulnerable.c program as printing simply
print('A'*15 + '\x08\x04\x9d\xd7')
The A*15 is supposed to fill the buffer and the EBP then overwrites the return address with the address of foo (\x08\x04\x9d\xd7) but I continue to get segmentation faults. Any assistance would be great!
Any mistake and the attempt will segfault. You must:
have the right target address
put it in the right place on the stack
use the right byte order
The first one is difficult because the kernel will randomize address spaces on load,
primarily because of these kinds of attacks.
The other two you've gotten wrong.
If you'd like to play with something similar, here's an example
that changes the return address. Because of C calling conventions,
the stack is corrupted at the end of main, which can be fixed by using
stdcall or pascal calling conventions for the test function.
Syntax for that is compiler dependent.
#include <stdio.h>
#include <stdlib.h>
void oops() {
printf("oops!\n");
}
void /*__stdcall*/ test(int t)
{
/* x86 stack is top down, int is same size as pointer */
int *return_is_at = &t - 1;
/* replace parameter with our return address, for oops to return to */
*(&t) = *return_is_at; /* just-in-case avoid optimization*/
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
int main(int argc, char **argv)
{
test(1);
printf("test returned\n");
/* unless stdcall, at this point our stack is corrupted
and this return will crash, so:
*/
exit(1);
}
Here's an alternative function that uses a local variable to calculate
the return address location intead of the parameter.
This assumes a standard stack frame, which the compiler may optimize away.
It also corrupts the stack.
void test2()
{
/* x86 stack is top down, int is same size as pointer */
/* this relies on consistently defined stack frames */
int l;
int *return_is_at = &l + 2;
/* copy our return address up one,
for oops to return to (corrupting the stack)
*/
return_is_at[1] = *return_is_at;
/* replace our return address with address of oops */
*return_is_at = (int)oops;
}
FYI - It's possible to use a similar technique to track unique call trees for a function
(by walking up the stack frames) in order to fail specific call instances during testing.

GNU Lightning - Lisp like "Apply" function

I'm trying to make a sort of Lisp like "Apply" function with GNU Lightning: a function F that receive a pointer to a function, an argument count and an array of integers and call G with the right number of parameters.
My code is not working properly. What's wrong? How can I do?
Here the code:
#include <stdio.h>
#include <stdlib.h>
#include <lightning.h>
int f0() {
printf("f0();\n");
}
int f1(int p1) {
printf("f1(%d);\n", p1);
}
int f2(int p1, int p2) {
printf("f2(%d, %d);\n", p1, p2);
}
int f3(int p1, int p2, int p3) {
printf("f3(%d, %d %d)\n", p1, p2, p3);
}
int main (int argc, char **argv) {
init_jit(argv[0]);
jit_state_t *_jit = jit_new_state();
int (*f)(int *g, int argc, int *argv);
jit_prolog();
jit_node_t *p_g = jit_arg();
jit_node_t *p_argc = jit_arg();
jit_node_t *p_argv = jit_arg();
jit_getarg(JIT_R0, p_g);
jit_getarg(JIT_R1, p_argc);
jit_getarg(JIT_R2, p_argv);
jit_prepare();
/* for ( ; argc; argc--, argv++) */
jit_node_t *label = jit_label();
jit_node_t *zero = jit_beqi(JIT_R1, 0);
// *argv
jit_ldr_i(JIT_V0, JIT_R2);
jit_pushargr(JIT_V0);
// Go next
// argv++
jit_addi(JIT_R2, JIT_R2, sizeof(int));
// argc--
jit_subi(JIT_R1, JIT_R1, 1);
//
jit_patch_at(jit_jmpi(), label);
jit_patch(zero);
jit_finishr(JIT_R0);
jit_reti(0);
jit_epilog();
f = jit_emit();
jit_clear_state();
f((void *)f0, 0, NULL);
int a1[] = {10};
f((void *)f1, 1, a1);
int a2[] = {100, 200};
f((void *)f2, 2, a2);
int a3[] = {1000, 2000, 3000};
f((void *)f3, 3, a3);
finish_jit();
return 0;
}
Expected output:
f0();
f1(10);
f2(100,200);
f3(1000,2000,3000);
Real output:
f0();
f1(10);
f2(200, 2);
f3(3000, 3 -13360)
If we add a call to jit_disassemble() to your code, we see that the generated code for your f function looks like this:
0x7f2511280000 sub $0x30,%rsp
0x7f2511280004 mov %rbx,0x28(%rsp)
0x7f2511280009 mov %rbp,(%rsp)
0x7f251128000d mov %rsp,%rbp
0x7f2511280010 sub $0x18,%rsp
0x7f2511280014 mov %rdi,%rax
0x7f2511280017 mov %rsi,%r10
0x7f251128001a mov %rdx,%r11
0x7f251128001d nopl (%rax)
0x7f2511280020 test %r10,%r10
0x7f2511280023 je 0x7f2511280040
0x7f2511280029 movslq (%r11),%rbx
0x7f251128002c mov %rbx,%rdi
0x7f251128002f add $0x4,%r11
0x7f2511280033 sub $0x1,%r10
0x7f2511280037 jmpq 0x7f2511280020
0x7f251128003c nopl 0x0(%rax)
0x7f2511280040 callq *%rax
0x7f2511280042 xor %rax,%rax
0x7f2511280045 mov %rbp,%rsp
0x7f2511280048 mov 0x28(%rsp),%rbx
0x7f251128004d mov (%rsp),%rbp
0x7f2511280051 add $0x30,%rsp
0x7f2511280055 retq
If we look at the code that's generated by jit_pushargr, the problem becomes apparent:
0x7f251128002c mov %rbx,%rdi
So this sets the value of *argv as the first argument of the called function (rdi being the register that holds the first argument in the x64 calling convention). It does so multiple times because it's in a loop, but it's always the first argument that's being set. So when the function is called after the loop, rdi will hold the value that was last written to it in the loop (i.e. the last value inside argv) and the other argument registers / memory locations won't have been written to at all.
The way that jit_pushargr works is that it will write to the first argument when you call it the first time, then the second argument when you call it the second time and so on. But in your code you only call it once, so it only ever writes the first argument.
So what you'd need to do to create an apply function using lightning would be to actually call jit_pushargr argc times. That means that instead of generating a general apply function, you'd want to define the apply function itself in normal C and instead generate a helper function that pushes the given number of arguments.
Alternatively you could do this entirely without lightning and instead use libffi for this, which would be the more traditional tool for uses like this and wouldn't incur the overhead of generating new code every time apply is called. Of course this wouldn't prevent you to call your apply function from code generated by lightning.

How to make a Hook and Trampoline function in one for WinAPI hooking

So I have been learning about the concept of hooking and using trampolines in order to bypass/execute data in a WinAPI hook function (In a different executable file, using DLL injection). So far I know how to make it (the trampoline and hook) using a mixture of assembly and C, but I can't seem to do it with just using C, as I seem to be missing something. I'd appreciate if someone could tell me what I'm doing wrong and how to fix it up.
Right now my code:
#include <Windows.h>
unsigned char* address = 0;
__declspec(naked) int __stdcall MessageBoxAHookTrampoline(HWND Window, char* Message, char* Title, int Type) {
__asm
{
push ebp
mov ebp, esp
mov eax, address
add eax, 5
jmp eax
}
}
int __stdcall MessageBoxAHook(HWND Window, char* Message, char* Title, int Type) {
wchar_t* WMessage = L"Hooked!";
wchar_t* WTitle = L"Success!";
MessageBoxW(0, WMessage, WTitle, 0);
return MessageBoxAHookTrampoline(Window, Message, Title, Type);
}
unsigned long __stdcall Thread(void* Context) {
address = (unsigned char*)GetProcAddress(LoadLibraryA("user32"), "MessageBoxA");
ULONG OP = 0;
if (VirtualProtect(address, 1, PAGE_EXECUTE_READWRITE, &OP)) {
memset(address, 0x90, 5);
*address = 0xE9;
*(unsigned long*)(address + 1) = (unsigned long)MessageBoxAHook - (unsigned long)address - 5;
}
else {
MessageBoxA(0, "Failed to change protection", "RIP", 0);
}
return 1;
}
// Entry point.
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
if (fdwReason == DLL_PROCESS_ATTACH) {
CreateThread(0, 0, Thread, 0, 0, 0);
}
else if (fdwReason == DLL_PROCESS_DETACH) {
}
return true;
}
So question is: How would I make a function say InstallHook that will install the hook and return a trampoline so I can use it easily?
Function prototype probably would be: void* InstallHook(void* originalFunc, void* targetFunc, int jumpsize), or so I've understood reading online, but unsure what jumpsize would be used for.
So far I know that the first 5 bytes must be preserved and restored, and then there's a jump to the address of the original hooked function. So I'd have to use malloc to allocate memory, memcpy to copy bytes over, the 0xE9 is the value of a jump instruction and such, but I just don't know how to implement it using just pure C. I figure it would be something similar to the code in this question. So how can I write a hook function that returns a trampoline using pure C for WinAPI functions?
If I understood the question correctly, you want to avoid "hard-coding" the trampoline function in assembly, presumably so you could have multiple trampolines in use at the same time without duplicating the code. You can achieve this using VirtualAlloc (malloc won't work since the returned memory won't be executable).
I wrote this from memory without access to a compiler so it might have some minor bugs, but the general idea is here. Normally you would also use VirtualProtect to change the page permissions to r-x instead of rwx once you're done modifying it, but I've left that out for the sake of simplicity:
void *CreateTrampoline(void *originalFunc)
{
/* Allocate the trampoline function */
uint8_t *trampoline = VirtualAlloc(
NULL,
5 + 5, /* 5 for the prologue, 5 for the JMP */
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE); /* Make trampoline executable */
/* Copy the original function's prologue */
memcpy(trampoline, originalFunc, 5);
/* JMP rel/32 opcode */
trampoline[5] = 0xE9;
/* JMP rel/32 operand */
uint32_t jmpDest = (uint32_t)originalFunc + 5; /* Skip original prologue */
uint32_t jmpSrc = (uint32_t)trampoline + 10; /* Starting after the JMP */
uint32_t delta = jmpDest - jmpSrc;
memcpy(trampoline + 6, &delta, 4);
return trampoline;
}
Your InstallHook function would then just call CreateTrampoline to create a trampoline, then patch the first 5 bytes of the original function with a JMP rel/32 to your hook.
Be warned, this only works on WinAPI functions, because Microsoft requires that they have a 5-byte prologue to enable hot-patching (which is what you're doing here). Normal functions do not have this requirement -- usually they only start with push ebp; mov ebp, esp which is only 3 bytes (and sometimes not even that, if the compiler decides to optimize it out).
Edit: here's how the math works:
_______________delta______________
| |
trampoline | originalFunc |
| | | |
v | v v
[prologue][jmp delta] [prologue][rest of func]
|________||_________| |________|
5 + 5 5

Probable instruction Cache Synchronization issue in self modifying code?

A lot of related questions <How is x86 instruction cache synchronized? > mention x86 should properly handle i-cache synchronization in self modifying code. I wrote the following piece of code which toggles a function call on and off from different threads interleaved with its execution. I am using compare and swap operation as an additional guard so that the modification is atomic. But I am getting intermittent crashes (SIGSEGV, SIGILL) and analyzing the core dump makes me suspicious if the processor is trying to execute partially updated instructions. The code and the analysis given below. May be I am missing something here. Let me know if that's the case.
toggle.c
#include <stdio.h>
#include <inttypes.h>
#include <time.h>
#include <pthread.h>
#include <sys/mman.h>
#include <errno.h>
#include <unistd.h>
int active = 1; // Whether the function is toggled on or off
uint8_t* funcAddr = 0; // Address where function call happens which we need to toggle on/off
uint64_t activeSequence = 0; // Byte sequence for toggling on the function CALL
uint64_t deactiveSequence = 0; // NOP byte sequence for toggling off the function CALL
inline int modify_page_permissions(uint8_t* addr) {
long page_size = sysconf(_SC_PAGESIZE);
int code = mprotect((void*)(addr - (((uint64_t)addr)%page_size)), page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successfull! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;
}
// If the 8 bytes we need to modify straddles a page boundary make the next page writable too
if (page_size - ((uint64_t)addr)%page_size < 8) {
code = mprotect((void*)(addr-((uint64_t)addr)%page_size+ page_size) , page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successfull! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;;
}
}
return 1;
}
void* add_call(void* param) {
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = 50000;
while (1) {
if (!active) {
if (activeSequence != 0) {
int status = modify_page_permissions(funcAddr);
if (!status) {
return 0;
}
uint8_t* start_addr = funcAddr - 8;
fprintf(stderr, "Activating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*) start_addr,
*((uint64_t*)start_addr), activeSequence);
active = 1;
} else {
fprintf(stderr, "Active sequence not initialized..\n");
}
}
nanosleep(&ts, NULL);
}
}
int remove_call(uint8_t* addr) {
if (active) {
// Remove gets called first before add so we initialize active and deactive state byte sequences during the first call the remove
if (deactiveSequence == 0) {
uint64_t sequence = *((uint64_t*)(addr-8));
uint64_t mask = 0x0000000000FFFFFF;
uint64_t deactive = (uint64_t) (sequence & mask);
mask = 0x9090909090000000; // We NOP 5 bytes of CALL instruction and leave rest of the 3 bytes as it is
activeSequence = sequence;
deactiveSequence = deactive | mask;
funcAddr = addr;
}
int status = modify_page_permissions(addr);
if (!status) {
return -1;
}
uint8_t* start_addr = addr - 8;
fprintf(stderr, "Deactivating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*)start_addr,
*((uint64_t*)start_addr), deactiveSequence);
active = 0;
// fprintf(stderr, "Result : %p\n", res);
}
}
int counter = 0;
void foo(int i) {
// Use the return address to determine where we need to patch foo CALL instruction (5 bytes)
uint64_t* addr = (uint64_t*)__builtin_extract_return_addr(__builtin_return_address(0));
fprintf(stderr, "Foo counter : %d\n", counter++);
remove_call((uint8_t*)addr);
}
// This thread periodically checks if the method is inactive and if so reactivates it
void spawn_add_call_thread() {
pthread_t tid;
pthread_create(&tid, NULL, add_call, (void*)NULL);
}
int main() {
spawn_add_call_thread();
int i=0;
for (i=0; i<1000000; i++) {
// fprintf(stderr, "i : %d..\n", i);
foo(i);
}
fprintf(stderr, "Final count : %d..\n\n\n", counter);
}
Core dump analysis
Program terminated with signal 4, Illegal instruction.
#0 0x0000000000400a28 in main () at toggle.c:123
(gdb) info frame
Stack level 0, frame at 0x7fff7c8ee360:
rip = 0x400a28 in main (toggle.c:123); saved rip 0x310521ed5d
source language c.
Arglist at 0x7fff7c8ee350, args:
Locals at 0x7fff7c8ee350, Previous frame's sp is 0x7fff7c8ee360
Saved registers:
rbp at 0x7fff7c8ee350, rip at 0x7fff7c8ee358
(gdb) disas /r 0x400a28,+30
Dump of assembler code from 0x400a28 to 0x400a46:
=> 0x0000000000400a28 <main+64>: ff (bad)
0x0000000000400a29 <main+65>: ff (bad)
0x0000000000400a2a <main+66>: ff eb ljmpq *<internal disassembler error>
0x0000000000400a2c <main+68>: e7 48 out %eax,$0x48
(gdb) disas /r main
Dump of assembler code for function main:
0x00000000004009e8 <+0>: 55 push %rbp
...
0x0000000000400a24 <+60>: 89 c7 mov %eax,%edi
0x0000000000400a26 <+62>: e8 11 ff ff ff callq 0x40093c <foo>
0x0000000000400a2b <+67>: eb e7 jmp 0x400a14 <main+44>
So as can be seen the instruction pointer seems to positioned within an address inside the CALL instruction and processor is apparently trying to execute that misaligned instruction causing an illegal instruction fault.
I think your problem is that you replaced a 5-byte CALL instruction with 5 1-byte NOPs. Consider what happens when your thread has executed 3 of the NOPs, and then your master thread decides to swap the CALL instruction back in. Your thread's PC will be three bytes in the middle of the CALL instruction and will therefore execute an unexpected and likely illegal instruction.
What you need to do is swap the 5-byte CALL instruction with a 5-byte NOP. You just need to find a multibyte instruction that does nothing (such as or'ing a register against itself) and if you need some extra bytes, prepend some prefix bytes such as a gs override prefix and an address-size override prefix (both of which will do nothing). By using a 5-byte NOP, your thread will be guaranteed to either be at the CALL instruction or past the CALL instruction, but never inside of it.
On 80x86 most calls use a relative displacement, not an absolute address. Essentially its "call the code at here + < displacement >" and not "call the code at < address >".
For 64-bit code, the displacement may be 8 bits or 32-bits. It's never 64-bits.
For example, for a 2-byte "call with 8-bit displacement" instruction, you'd be trashing 6 bytes before the call instruction, the call opcode itself, and the instruction's operand (the displacement).
For another example, for a 5-byte "call with 32-bit displacement" instruction, you'd be trashing 3 bytes before the call instruction, the call opcode itself, and the instruction's operand (the displacement).
However...
These aren't the only way to call. For example, you can call using a function pointer, where the address of the code being called is not in the instruction at all (but may be in a register or be a variable in memory). There's also an optimisation called "tail call optimisation" where a call followed by a ret is replaced with a jmp (likely with some additional stack diddling for passing parameters, cleaning up the caller's local variables, etc).
Essentially; your code is severely broken, you can't cover all the possible corner cases, you shouldn't be doing this to begin with, and you probably should be using a function pointer instead of self modifying code (which would be faster and easier and portable too).

C - Inline asm patching at runtime

I am writing a program in C and i use inline asm. In the inline assembler code is have some addresses where i want to patch them at runtime.
A quick sample of the code is this:
void __declspec(naked) inline(void)
{
mov eax, 0xAABBCCDD
call 0xAABBCCDD
}
An say i want to modify the 0xAABBCCDD value from the main C program.
What i tried to do is to Call VirtualProtect an is the pointer of the function in order to make it Writeable, and then call memcpy to add the appropriate values to the code.
DWORD old;
VirtualProtect(inline, len, PAGE_EXECUTE_READWRITE, &old);
However VirtualProtect fails and GetLastError() returns 487 which means accessing invalid address. Anyone have a clue about this problem??
Thanks
Doesn't this work?
int X = 0xAABBCCDD;
void __declspec(naked) inline(void)
{
mov eax, [X]
call [X]
}
How to do it to another process at runtime,
Create a variable that holds the program base address
Get the target RVA (Relative Virtual Address)
Then calculate the real address like this PA=RVA + BASE
then call it from your inline assembly
You can get the base address like this
DWORD dwGetModuleBaseAddress(DWORD dwProcessID)
{
TCHAR zFileName[MAX_PATH];
ZeroMemory(zFileName, MAX_PATH);
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, true, dwProcessID);
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, dwProcessID);
DWORD dwModuleBaseAddress = 0;
if (hSnapshot != INVALID_HANDLE_VALUE)
{
MODULEENTRY32 ModuleEntry32 = { 0 };
ModuleEntry32.dwSize = sizeof(MODULEENTRY32);
if (Module32First(hSnapshot, &ModuleEntry32))
{
do
{
if (wcscmp(ModuleEntry32.szModule, L"example.exe") == 0)
{
dwModuleBaseAddress = (DWORD_PTR)ModuleEntry32.modBaseAddr;
break;
}
} while (Module32Next(hSnapshot, &ModuleEntry32));
}
CloseHandle(hSnapshot);
CloseHandle(hProcess);
}
return dwModuleBaseAddress;
}
Assuming you have a local variable and your base address
mov dword ptr ss : [ebp - 0x14] , eax;
mov eax, dword ptr BaseAddress;
add eax, PA;
call eax;
mov eax, dword ptr ss : [ebp - 0x14] ;
You have to restore the value of your Register after the call returns, since this value may be used somewhere down the code execution, assuming you're trying to patch an existing application that may depend on the eax register after your call. Although this method has it disadvantages, but at least it will give anyone idea on what to do.

Resources