Hooking a function I don't know the parameters to - c

Let's say there is a DLL A.DLL with a known entry point DoStuff that I have in some way hooked out with my own DLL fakeA.dll such that the system is calling my DoStuff instead. How do I write such a function so that it can then call the same entry point of the hooked DLL (A.DLL) without knowing the arguments of the function? I.e. my function in fakeA.DLL would look something like
LONG DoStuff(
// don't know what to put here
)
{
FARPROC pfnHooked;
HINSTANCE hHooked;
LONG lRet;
// get hooked library and desired function
hHooked = LoadLibrary("A.DLL");
pfnHooked = GetProcAddress(hHooked, "DoStuff");
// how do I call the desired function without knowing the parameters?
lRet = pfnHooked( ??? );
return lRet;
}
My current thinking is that the arguments are on the stack so I'm guessing I would have to have a sufficiently large stack variable (a big ass struct for example) to capture whatever the arguments are and then just pass it along to pfnHooked? I.e.
// actual arg stack limit is >1MB but we'll assume 1024 bytes is sufficient
typedef struct { char unknownData[1024]; } ARBITARY_ARG;
ARBITARY_ARG DoStuff(ARBITARY_ARG args){
ARBITARY_ARG aRet;
...
aRet = pfnHooked(args);
return aRet;
}
Would this work? If so, is there a better way?
UPDATE: After some rudimentary (and non-conclusive) testing passing in the arbitrary block as arguments DOES work (which is not surprising, as the program will just read what it needs off the stack). However collecting the return value is harder as if it's too large it can cause an access violation. Setting the arbitrary return size to 8 bytes (or maybe 4 for x86) may be a solution to most cases (including void returns) however that's still guesswork. If I had some way of knowing the return type from the DLL (not necessarily at runtime) that would be grand.

This should be a comment, but the meta answer is: yes, you can hook the function without knowing its calling convention and arguments on an x86/x64 platform. Can it be done purely in C? No; it also needs a good understanding of the various calling conventions and of assembly programming. A hooking framework will have some of its bits written in assembly.
Most hooking frameworks do this inherently by creating a trampoline that redirects execution from the called function's preamble to stub code that is generally independent of the function being hooked. In user mode a stack is guaranteed to be present, so you can push your own local variables onto it as long as you pop them and restore the stack to its original state.
You don't really need to copy the existing arguments into your own stack variable; you can just inspect the stack. Definitely read about calling conventions and how stacks are constructed on different architectures for the various types of invocation in assembly before you attempt anything.

Yes, it is possible to do generic hooking 100% correctly: one common hook for multiple functions with different argument counts and calling conventions, on both x86 and x64 (amd64).
It does require small asm stubs, which of course differ between x86 and x64, but they are very small, only several lines each: two stub procedures, one to filter the pre-call and one for the post-call. Most of the implementation (95%+) is platform independent and written in C++. (It can be done in C too, but compared with C++ the C source would be larger, uglier and harder to write.)
My solution allocates a small executable block of code for every hooked API (one block per hooked API). The block stores the function name, the original address (or the address to transfer control to after the pre-call; this depends on the hooking method) and one relative call instruction to the common asm pre-call stub. The magic of this call is not only that it transfers control to the common stub, but that the return address on the stack points to the block itself (with some offset, but with C++ and inheritance it points exactly to a base class from which the executable block class is derived). As a result, the common pre-call stub knows which API call is being hooked and can pass this information to the common C++ handler.
One note: because a relative call on x64 can only reach [rip-0x80000000, rip+0x7fffffff], these code blocks must be declared (allocated) inside our PE in a separate bss section, and that section must be marked RWE. We cannot simply use VirtualAlloc for the storage, because the returned address can be too far from the common pre-call stub.
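The range limit above comes from the 32-bit displacement in the `E8 rel32` call encoding. A minimal sketch of that arithmetic, assuming a 5-byte call instruction; the helper name `call_rel32` is hypothetical and not part of the answer's code:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper (not from the answer): compute the 32-bit displacement
   for a 5-byte `E8 rel32` call located at call_addr that should reach target.
   Returns false when the target is outside the +/-2 GB window. */
bool call_rel32(uint64_t call_addr, uint64_t target, int32_t *disp)
{
    /* The displacement is relative to the address of the NEXT instruction. */
    int64_t delta = (int64_t)(target - (call_addr + 5));
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *disp = (int32_t)delta;
    return true;
}
```

A block obtained from an arbitrary allocator can fail this check, which is why the answer places the blocks in a section of the same image as the stub.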
The common asm pre-call stub must save the rcx, rdx, r8 and r9 registers on x64 (this is absolutely mandatory) and the ecx and edx registers on x86. On x86 this is needed in case the function uses the __fastcall calling convention. The Windows API almost never uses __fastcall; only a few __fastcall functions exist among thousands of Win32 APIs (to verify this and find them, go to the LIB folder and search for the string __imp_@, the common __fastcall prefix). The stub then calls the common C++ handler, which must return the address of the original function (where to transfer control) to the stub. The stub restores rcx, rdx, r8, r9 (or ecx, edx) and jumps (not calls!) to this address.
If we only want to filter the pre-call, this is all we need. In most cases, however, we also need to filter (hook) the post-call, to view or modify the function's return value and out parameters. This is also possible, but needs a little more code.
To hook the post-call we obviously must replace the return address of the hooked API. But what do we change it to, and where do we save the original return address? We cannot use a global variable. We cannot even use a thread-local (__declspec(thread) or thread_local), because calls can be recursive. We cannot use a volatile register (it changes during the API call), and we cannot use a non-volatile register either, because we would have to save its old value in order to restore it later, which raises the same question: where?
The only (and nice) solution is to allocate a small block of executable memory (RWE) that contains one relative call instruction to the common post-call asm stub, plus some data: the saved original return address, the function parameters (to check out parameters in the post handler) and the function name.
Here again there is an issue on x64: this block must not be too far from the common post-call stub (+/- 2GB), so it is best to allocate these stubs in the same separate .bss section as the pre-call stubs.
How many of these ret-stubs are needed? One per API call (if we want to control the post-call), so no more than the number of API calls active at any one time. Usually, say, 256 pre-allocated blocks are more than enough. Even if we fail to allocate a block in the pre-call, we merely lose control of the post-call for that call; nothing crashes. We also need not control the post-call for every hooked API, only for some.
For very fast, interlocked alloc/free of these blocks, build stack semantics over them: allocate by interlocked pop and free by interlocked push. Pre-initialize the blocks (the call instruction) once at the start, while pushing them all onto the stack, so they need not be reinitialized on every pre-call.
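The pop-to-allocate, push-to-free idea can be sketched in portable C. This is a single-threaded illustration only, with no interlocking; the `Block` type, `POOL_SIZE` and the `pool_*` names are hypothetical stand-ins for the return stubs.

```c
#include <stddef.h>

/* Hypothetical fixed-size block; in the answer this would be a return stub. */
typedef struct Block {
    struct Block *next;      /* link used only while the block is free */
    unsigned char data[56];  /* stand-in for the stub's code and saved data */
} Block;

#define POOL_SIZE 4
static Block g_pool[POOL_SIZE];
static Block *g_free_head;

/* Push every block onto the free stack once, up front. */
void pool_init(void)
{
    g_free_head = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        g_pool[i].next = g_free_head;
        g_free_head = &g_pool[i];
    }
}

/* Allocate = pop; returns NULL when the pool is exhausted. */
Block *pool_alloc(void)
{
    Block *b = g_free_head;
    if (b != NULL)
        g_free_head = b->next;
    return b;
}

/* Free = push. */
void pool_free(Block *b)
{
    b->next = g_free_head;
    g_free_head = b;
}
```

An exhausted pool returns NULL instead of failing, which matches the point above that a failed allocation only loses the post-call for that one call.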
The common post-call stub in asm is very simple; here we do not need to save any registers. We simply call the C++ post handler with the address of the block (popped from the stack, where the call instruction in the block pushed it) and with the original return value (rax or eax). Strictly speaking, an API function can return a pair rax+rdx or eax+edx, but 99.9%+ of Windows APIs return a value in a single register and I assume we will hook only those. The code can be adjusted to handle the pair too if needed (in most cases it is not).
The C++ post-call handler restores the original return address (using _AddressOfReturnAddress()), can log the call and/or modify out parameters, and finally returns to the original caller of the API. Whatever our handler returns becomes the final return value of the API call. Usually we must return the original value.
c++ code
#if 0
#define __ASM_FUNCTION __pragma(message(__FUNCDNAME__" proc\r\n" __FUNCDNAME__ " endp"))
#define _ASM_FUNCTION {__ASM_FUNCTION;}
#define ASM_FUNCTION {__ASM_FUNCTION;return 0;}
#define CPP_FUNCTION __pragma(message("extern " __FUNCDNAME__ " : PROC ; " __FUNCTION__))
#else
#define _ASM_FUNCTION
#define ASM_FUNCTION
#define CPP_FUNCTION
#endif
class CODE_STUB
{
#ifdef _WIN64
PVOID pad;
#endif
union
{
DWORD code;
struct
{
BYTE cc[3];
BYTE call;
};
};
int offset;
public:
void Init(PVOID stub)
{
// int3; int3; int3; call stub
code = 0xe8cccccc;
offset = RtlPointerToOffset(&offset + 1, stub);
C_ASSERT(sizeof(CODE_STUB) == RTL_SIZEOF_THROUGH_FIELD(CODE_STUB, offset));
}
PVOID Function()
{
return &call;
}
// implemented in .asm
static void __cdecl retstub() _ASM_FUNCTION;
static void __cdecl callstub() _ASM_FUNCTION;
};
struct FUNC_INFO
{
PVOID OriginalFunc;
PCSTR Name;
void* __fastcall OnCall(void** stack);
};
struct CALL_FUNC : CODE_STUB, FUNC_INFO
{
};
C_ASSERT(FIELD_OFFSET(CALL_FUNC,OriginalFunc) == sizeof(CODE_STUB));
struct RET_INFO
{
union
{
struct
{
PCSTR Name;
PVOID params[7];
};
SLIST_ENTRY Entry;
};
INT_PTR __fastcall OnCall(INT_PTR r);
};
struct RET_FUNC : CODE_STUB, RET_INFO
{
};
C_ASSERT(FIELD_OFFSET(RET_FUNC, Entry) == sizeof(CODE_STUB));
#pragma bss_seg(".HOOKS")
RET_FUNC g_rf[1024];//max call count
CALL_FUNC g_cf[16];//max hooks count
#pragma bss_seg()
#pragma comment(linker, "/SECTION:.HOOKS,RWE")
class RET_FUNC_Manager
{
SLIST_HEADER _head;
public:
RET_FUNC_Manager()
{
PSLIST_HEADER head = &_head;
InitializeSListHead(head);
RET_FUNC* p = g_rf;
DWORD n = RTL_NUMBER_OF(g_rf);
do
{
p->Init(CODE_STUB::retstub);
InterlockedPushEntrySList(head, &p++->Entry);
} while (--n);
}
RET_FUNC* alloc()
{
return static_cast<RET_FUNC*>(CONTAINING_RECORD(InterlockedPopEntrySList(&_head), RET_INFO, Entry));
}
void free(RET_INFO* p)
{
InterlockedPushEntrySList(&_head, &p->Entry);
}
} g_rfm;
void* __fastcall FUNC_INFO::OnCall(void** stack)
{
CPP_FUNCTION;
// in case __fastcall function in x86 - param#1 at stack[-1] and param#2 at stack[-2]
// this need for filter post call only
if (RET_FUNC* p = g_rfm.alloc())
{
p->Name = Name;
memcpy(p->params, stack, sizeof(p->params));
*stack = p->Function();
}
return OriginalFunc;
}
INT_PTR __fastcall RET_INFO::OnCall(INT_PTR r)
{
CPP_FUNCTION;
*(void**)_AddressOfReturnAddress() = *params;
PCSTR name = Name;
char buf[8];
if (IS_INTRESOURCE(name))
{
sprintf(buf, "#%04x", (ULONG)(ULONG_PTR)name), name = buf;
}
DbgPrint("%p %s(%p, %p, %p ..)=%p\r\n", *params, name, params[1], params[2], params[3], r);
g_rfm.free(this);
return r;
}
struct DLL_TO_HOOK
{
PCWSTR szDllName;
PCSTR szFuncNames[];
};
void DoHook(DLL_TO_HOOK** pp)
{
PCSTR* ppsz, psz;
DLL_TO_HOOK *p;
ULONG n = RTL_NUMBER_OF(g_cf);
CALL_FUNC* pcf = g_cf;
while (p = *pp++)
{
if (HMODULE hmod = LoadLibraryW(p->szDllName))
{
ppsz = p->szFuncNames;
while (psz = *ppsz++)
{
if (pcf->OriginalFunc = GetProcAddress(hmod, psz))
{
pcf->Name = psz;
pcf->Init(CODE_STUB::callstub);
// do hook: pcf->OriginalFunc -> pcf->Function() - code for this skiped
DbgPrint("hook: (%p) <- (%p)%s\n", pcf->Function(), pcf->OriginalFunc, psz);
if (!--n)
{
return;
}
pcf++;
}
}
}
}
}
asm x64 code:
extern ?OnCall@FUNC_INFO@@QEAAPEAXPEAPEAX@Z : PROC ; FUNC_INFO::OnCall
extern ?OnCall@RET_INFO@@QEAA_J_J@Z : PROC ; RET_INFO::OnCall
?retstub@CODE_STUB@@SAXXZ proc
pop rcx
mov rdx,rax
call ?OnCall@RET_INFO@@QEAA_J_J@Z
?retstub@CODE_STUB@@SAXXZ endp
?callstub@CODE_STUB@@SAXXZ proc
mov [rsp+10h],rcx
mov [rsp+18h],rdx
mov [rsp+20h],r8
mov [rsp+28h],r9
pop rcx
mov rdx,rsp
sub rsp,18h
call ?OnCall@FUNC_INFO@@QEAAPEAXPEAPEAX@Z
add rsp,18h
mov rcx,[rsp+8]
mov rdx,[rsp+10h]
mov r8,[rsp+18h]
mov r9,[rsp+20h]
jmp rax
?callstub@CODE_STUB@@SAXXZ endp
asm x86 code
extern ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z : PROC ; FUNC_INFO::OnCall
extern ?OnCall@RET_INFO@@QAIHH@Z : PROC ; RET_INFO::OnCall
?retstub@CODE_STUB@@SAXXZ proc
pop ecx
mov edx,eax
call ?OnCall@RET_INFO@@QAIHH@Z
?retstub@CODE_STUB@@SAXXZ endp
?callstub@CODE_STUB@@SAXXZ proc
xchg [esp],ecx
push edx
lea edx,[esp + 8]
call ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z
pop edx
pop ecx
jmp eax
?callstub@CODE_STUB@@SAXXZ endp
You may ask where I got decorated names like ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z. Look at the very beginning of the C++ code, at the macros there: compile once with #if 1 and look in the output window. You will probably need to use the names your compiler prints rather than mine, since the decoration can differ.
And how do you call void DoHook(DLL_TO_HOOK** pp)? Like this:
DLL_TO_HOOK dth_kernel32 = { L"kernel32", { "VirtualAlloc", "VirtualFree", "HeapAlloc", 0 } };
DLL_TO_HOOK dth_ntdll = { L"ntdll", { "NtOpenEvent", 0 } };
DLL_TO_HOOK* ghd[] = { &dth_ntdll, &dth_kernel32, 0 };
DoHook(ghd);

Lets say there is a DLL A.DLL with a known entry point DoStuff
If the entry point DoStuff is known it ought to be documented somewhere, at the very least in some C header file. So a possible approach might be to parse that header to get its signature (i.e. the C declaration of DoStuff). Maybe you could fill some database with the signature of all functions declared in all system header files, etc... Or perhaps use debug information if you have it.
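Once the signature has been recovered, the forwarding hook is ordinary C and the compiler takes care of the ABI. A minimal sketch, assuming an invented signature `long DoStuff(int, const char *)`; the stand-in `real_DoStuff_stub` replaces the pointer that would really come from GetProcAddress, so the example stays portable:

```c
#include <stddef.h>

/* ASSUMPTION: this signature is invented for illustration; the real one must
   come from the header or debug information. */
typedef long (*DoStuffFn)(int count, const char *name);

static DoStuffFn g_real_DoStuff;   /* would be set from GetProcAddress */
static unsigned g_call_count;      /* example of pre-call bookkeeping */

/* Stand-in for the function exported by A.DLL so the sketch is portable. */
static long real_DoStuff_stub(int count, const char *name)
{
    return (name != NULL) ? count * 2L : -1L;
}

void install_hook(void)
{
    g_real_DoStuff = real_DoStuff_stub;
}

/* The hook: same signature, so the compiler handles the ABI details. */
long DoStuff(int count, const char *name)
{
    g_call_count++;                       /* pre-call work */
    long ret = g_real_DoStuff(count, name);
    return ret;                           /* post-call work could inspect ret */
}

unsigned hook_call_count(void) { return g_call_count; }
```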
If you call some function (in C) and don't give all the required parameters, the calling convention & ABI will still be used, and these (missing) parameters get garbage values (if the calling convention defines that parameter to be passed in a register, the garbage inside that register; if the convention defines that parameter to be passed on the call stack, the garbage inside that particular call stack slot). So you are likely to crash and surely have some undefined behavior (which is scary, since your program might seem to work but still be very wrong).
However, look also into libffi. Once you know (at runtime) what to pass to some arbitrary function, you can construct a call to it passing the right number and types of arguments.
My current thinking is that the arguments are on the stack
I think it is wrong (at least on many x86-64 systems). Some arguments are passed thru registers. Read about x86 calling conventions.
Would this work?
No, it won't work because some arguments are passed thru registers, and because the calling convention depends upon the signature of the called function (floating point values might be passed in different registers, or always on the stack; variadic functions have specific calling conventions; etc....)
BTW, some recent C optimizing compilers are able to do tail call optimizations, which might complicate things.

There is no standard way of doing this, because a lot of things, like calling conventions and pointer sizes, matter when passing arguments. You will have to read the ABI for your platform and write an implementation, which I fear again won't be possible in C. You will need some inline assembly.
One simple way to do it would be (for a platform like X86_64) -
MyDoStuff:
jmpq *__real_DoStuff
This hook does nothing but jump to the real function. If you want to do anything useful while hooking, you will have to save and restore some registers around the call (again, which ones depends on the ABI).

Related

Is Ghidra misinterpreting a function call?

When analyzing the assembly listing in Ghidra, I stumbled upon this instruction:
CALL dword ptr [EBX*0x4 + 0x402ac0]=>DAT_00402abc
I assumed that the program was calling a function whose address was stored in DAT_00402abc, which I had initially taken for a dword variable. Indeed, when I tried to create a function at the location of DAT_00402abc, Ghidra wouldn't let me.
The decompiler shows me this line of code for that instruction:
(*(code *)(&int2)[iVar2])();
So I was wondering, what does it mean and what's the program supposed to do with this call? Is there a possibility that Ghidra totally messed up? And if so, how should I interpret that instruction?
I'm not at all familiar with Ghidra, but I can tell you how to interpret the machine instruction...
CALL dword ptr [EBX*0x4 + 0x402ac0]
There is a table of function addresses at 0x402ac0; the EBX'th entry in that table is being called. I have no idea what DAT_00402abc means, but if you inspect memory in dword-sized chunks at address 0x0402ac0 you should find plausible function addresses. [EDIT: 0x0040_2abc = 0x0040_2ac0 - 4. I suspect this means Ghidra thinks EBX has value -1 when control reaches this point. It may be wrong, or maybe the program has a bug. One would expect EBX to have a nonnegative value when control reaches this point.]
The natural C source code corresponding to this instruction would be something like
extern void do_thing_zero(void);
extern void do_thing_one(void);
extern void do_thing_two(void);
extern void do_thing_three(void);
typedef void (*do_thing_ptr)(void);
const do_thing_ptr do_thing_table[4] = {
do_thing_zero, do_thing_one, do_thing_two, do_thing_three
};
// ...
void do_thing_n(unsigned int n)
{
if (n >= 4) abort();
do_thing_table[n]();
}
If the functions in the table take arguments or return values, you'll see argument-handling code before and after the CALL instruction you quoted, but the CALL instruction itself will not change.
You would be seeing something different and much more complicated if the functions didn't all take the same set of arguments.
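The arithmetic behind the EDIT note (0x402abc being 0x402ac0 - 4, i.e. index -1) can be checked with a small sketch of the operand's address computation. The function name is hypothetical; the 32-bit wraparound mirrors how a 32-bit x86 effective address is formed.

```c
#include <stdint.h>

/* Effective address of an x86 operand of the form [index*scale + disp],
   computed with 32-bit wraparound as the CPU would. */
uint32_t effective_address(uint32_t index, uint32_t scale, uint32_t disp)
{
    return index * scale + disp;
}
```

With index 0 this gives the table base 0x402ac0, and with index 0xFFFFFFFF (that is, -1) it gives 0x402abc, the address Ghidra annotated.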

How do I call hex data stored in an array with inline assembly?

I have an OS project that I am working on and I am trying to call data that I have read from the disk in C with inline assembly.
I have already tried reading the code and executing it with the assembly call instruction, using inline assembly.
void driveLoop() {
uint16_t sectors = 31;
uint16_t sector = 0;
uint16_t basesector = 40000;
uint32_t i = 40031;
uint16_t code[sectors][256];
int x = 0;
while(x==0) {
read(i);
for (int p=0; p < 256; p++) {
if (readOut[p] == 0) {
} else {
x = 1;
//kprint_int(i);
}
}
i++;
}
kprint("Found sector!\n");
kprint("Loading OS into memory...\n");
for (sector=0; sector<sectors; sector++) {
read(basesector+sector);
for (int p=0; p<256; p++) {
code[sector][p] = readOut[p];
}
}
kprint("Done loading.\n");
kprint("Attempting to call...\n");
asm volatile("call (%0)" : : "r" (&code));
}
When the inline assembly is called I expect it to run the code from the sectors I read from the "disk" (this is in a VM, because it's a hobby OS). What it does instead is it just hangs.
I probably don't much understand how variables, arrays, and assembly work, so if you could fill me in, that would be nice.
EDIT: The data I am reading from the disk is a binary file that was added
to the disk image file with
cat kernel.bin >> disk.img
and the kernel.bin is compiled with
i686-elf-ld -o kernel.bin -Ttext 0x4C4B40 *insert .o files here* --oformat binary
What it does instead is it just hangs.
Run your OS inside BOCHS so you can use BOCHS's built-in debugger to see exactly where it's stuck.
Being able to debug lockups, including with interrupts disabled, is probably very useful...
asm volatile("call (%0)" : : "r" (&code)); is unsafe because of missing clobbers.
But even worse than that it will load a new EIP value from the first 4 bytes of the array, instead of setting EIP to that address. (Unless the data you're loading is an array of pointers, not actual machine code?)
You have the %0 in parentheses, so it's an addressing mode. The assembler will warn you about an indirect call without *, but will assemble it like call *(%eax), with EAX = the address of code[0][0]. You actually want a call *%eax or whatever register the compiler chooses, register-indirect not memory-indirect.
&code and code are both just a pointer to the start of the array; &code doesn't create an anonymous pointer object storing the address of another address. &code takes the address of the array as a whole. code in this context "decays" to a pointer to the first object.
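The point about `&code` versus `code` can be seen in a small sketch: the addresses are equal, only the pointer types differ. It assumes a static array with the question's dimensions (the question's array is a local), and the helper names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define SECTORS 31

static uint16_t code[SECTORS][256];

/* code, &code and &code[0][0] all designate the same address... */
bool same_address(void)
{
    return (void *)code == (void *)&code
        && (void *)code == (void *)&code[0][0];
}

/* ...but the pointer types differ, so pointer arithmetic steps differently. */
size_t step_of_decayed(void)   /* code + 1 advances by one row */
{
    return (size_t)((char *)(code + 1) - (char *)code);
}

size_t step_of_whole(void)     /* &code + 1 advances past the whole array */
{
    return (size_t)((char *)(&code + 1) - (char *)&code);
}
```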
https://gcc.gnu.org/wiki/DontUseInlineAsm (for this).
You can get the compiler to emit a call instruction by casting the pointer to a function pointer.
__builtin___clear_cache(&code[0][0], &code[30][255]); // don't optimize away stores into the buffer
void (*fptr)(void) = (void*)code; // casting to void* instead of the actual target type is simpler
fptr();
That will compile (with optimization enabled) to something like lea 16(%esp), %eax / call *%eax, for 32-bit x86, because your code[][] buffer is an array on the stack.
Or to have it emit a jmp instead, do it at the end of a void function, or return funcptr(); in a non-void function, so the compiler can optimize the call/ret into a jmp tailcall.
If it doesn't return, you can declare it with __attribute__((noreturn)).
Make sure the memory page / segment is executable. (Your uint16_t code[]; is a local, so gcc will allocate it on the stack. This might not be what you want. The size is a compile-time constant so you could make it static, but if you do that for other arrays in other sibling functions (not parent or child), then you lose out on the ability to reuse a big chunk of stack memory for different arrays.)
This is much better than your unsafe inline asm. (You forgot a "memory" clobber, so nothing tells the compiler that your asm actually reads the pointed-to memory). Also, you forgot to declare any register clobbers; presumably the block of code you loaded will have clobbered some registers if it returns, unless it's written to save/restore everything.
In GNU C you do need to use __builtin___clear_cache when casting a data pointer to a function pointer. On x86 it doesn't actually clear any cache; it tells the compiler that the stores to that memory are not dead because the memory is going to be read by execution. See How does __builtin___clear_cache work?
Without that, gcc could optimize away the copying into uint16_t code[sectors][256]; because it looks like a dead store. (Just like with your current inline asm which only asks for the pointer in a register.)
As a bonus, this part of your OS becomes portable to other architectures, including ones like ARM without coherent instruction caches, where that builtin expands to actual instructions. (On x86 it purely affects the optimizer.)
read(basesector+sector);
It would probably be a good idea for your read function to take a destination pointer to read into, so you don't need to bounce data through your readOut buffer.
Also, I don't see why you'd want to declare your code as a 2D array; sectors are an artifact of how you're doing your disk I/O, not relevant to using the code after it's loaded. The sector-at-a-time thing should only be in the code for the loop that loads the data, not visible in other parts of your program.
char code[sectors * 512]; would be good.
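A sketch of the last two suggestions together: a read routine that takes a destination pointer, and a flat byte buffer instead of a 2D array. The names `read_sector`, `load_image` and the in-memory `fake_disk` are hypothetical; a real driver would talk to the disk controller instead.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define DISK_SECTORS 8u

/* Stand-in disk image so the sketch runs anywhere; a real driver would
   talk to the controller instead. */
static uint8_t fake_disk[DISK_SECTORS][SECTOR_SIZE];

/* Hypothetical replacement for read(): copies one sector straight into dst.
   Returns 0 on success, -1 if the sector is out of range. */
int read_sector(uint32_t lba, void *dst)
{
    if (lba >= DISK_SECTORS)
        return -1;
    memcpy(dst, fake_disk[lba], SECTOR_SIZE);
    return 0;
}

/* Load `count` consecutive sectors into one flat byte buffer. */
int load_image(uint32_t first_lba, uint32_t count, uint8_t *buf)
{
    for (uint32_t i = 0; i < count; i++) {
        if (read_sector(first_lba + i, buf + (size_t)i * SECTOR_SIZE) != 0)
            return -1;
    }
    return 0;
}
```

Only the loop in `load_image` knows about sectors; the rest of the program sees a contiguous buffer.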

Unwinding frame but do not return in C

My programming language compiles to C, I want to implement tail recursion optimization. The question here is how to pass control to another function without "returning" from the current function.
It is quite easy if the control is passed to the same function:
void f() {
__begin:
do something here...
goto __begin; // "call" itself
}
As you can see there is no return value and no parameters; those are passed in a separate stack addressed by a global variable.
Another option is to use inline assembly:
#ifdef __clang__
#define tail_call(func_name) asm("jmp " func_name " + 8");
#else
#define tail_call(func_name) asm("jmp " func_name " + 4");
#endif
void f() {
__begin:
do something here...
tail_call(f); // "call" itself
}
This is similar to goto, but whereas goto passes control to the first statement in a function, skipping the "entry code" generated by the compiler, jmp is different: its argument is a function pointer, and you need to add 4 or 8 bytes to skip the entry code.
Both of the above work, but only if the callee and the caller use the same amount of stack for local variables, which is allocated by the callee's entry code.
I was thinking of doing leave manually with inline assembly, then replacing the return address on the stack, then doing a legal function call like f(). But my attempts all crashed. You need to modify BP and SP somehow.
So again, how to implement this for x64? (Again, assuming functions have no arguments and return void). A portable way without inline assembly is better, but assembly is accepted. Maybe longjmp can be used?
Maybe you can even push the callee address on the stack, replacing the original return address and just ret?
Do not try to do this yourself. A good C compiler can perform tail-call elimination in many cases and will do so. In contrast, a hack using inline assembly has a good chance of going wrong in a way that is difficult to debug.
For example, see this snippet on godbolt.org. To duplicate it here:
The C code I used was:
int foo(int n, int o)
{
if (n == 0) return o;
puts("***\n");
return foo(n - 1, o + 1);
}
This compiles to:
.LC0:
.string "***\n"
foo:
test edi, edi
je .L4
push r12
mov r12d, edi
push rbp
mov ebp, esi
push rbx
mov ebx, edi
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
lea eax, [r12+rbp]
pop rbx
pop rbp
pop r12
ret
.L4:
mov eax, esi
ret
Notice that the tail call has been eliminated. The only call is to puts.
Since you don't need arguments and return values, how about combining all function into one and use labels instead of function names?
f:
__begin:
...
CALL(h); // a macro implementing traditional call
...
if (condition_ret)
RETURN; // a macro implementing traditional return
...
goto g; // tail recurse to g
The tricky part here is the RETURN and CALL macros. To return you should keep yet another stack, a stack of setjmp buffers, so when you return you call longjmp(ret_stack.pop()), and when you call you do ret_stack.push(setjmp(f)). This is a poetic rendition, of course; you'll need to fill in the details.
gcc can offer some optimization here with computed goto, which is more lightweight than longjmp. People who write VMs have similar problems, and seemingly have asm-based solutions for them even on MSVC; see example here.
Finally, even if such an approach saves memory, it may confuse the compiler and so cause performance anomalies. You are probably better off generating code for some portable assembler-like language, LLVM maybe? Not sure; it should be something that has computed goto.
The venerable approach to this problem is to use trampolines. Essentially, every compiled function returns a function pointer (and maybe an arg count). The top level is a tight loop that, starting with your main, simply calls the returned function pointer ad infinitum. You could use a function that longjmps to escape the loop, i.e., to terminate the program.
See this SO Q&A. Or Google "recursion tco trampoline."
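A minimal sketch of the trampoline idea in plain C, assuming (as in the question) that arguments and results live in globals. The even/odd pair is just a placeholder workload, and a NULL function pointer is used to stop the loop instead of the longjmp mentioned above.

```c
#include <stddef.h>

/* A step returns the next step to run; a struct wrapper is needed because
   a C function type cannot refer to itself directly. */
typedef struct Step Step;
struct Step { Step (*fn)(void); };

/* As in the question, arguments and results live in globals. */
static unsigned long g_n;
static int g_result;

static Step is_even(void);
static Step is_odd(void);

static Step is_even(void)
{
    if (g_n == 0) { g_result = 1; return (Step){ NULL }; }
    g_n--;
    return (Step){ is_odd };    /* "tail call": return instead of calling */
}

static Step is_odd(void)
{
    if (g_n == 0) { g_result = 0; return (Step){ NULL }; }
    g_n--;
    return (Step){ is_even };
}

/* The trampoline: a flat loop, so stack depth stays constant. */
int run_is_even(unsigned long n)
{
    g_n = n;
    Step next = { is_even };
    while (next.fn != NULL)
        next = next.fn();
    return g_result;
}
```

Each "tail call" costs one return plus one indirect call, which is the usual price of this approach compared with a real jump.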
For another approach, see Cheney on the MTA, where the stack just grows until it's full, which triggers a GC. This works once the program is converted to continuation passing style (CPS) since in that style, functions never return; so, after the GC, the stack is all garbage, and can be reused.
I will suggest a hack. The x86 call instruction, which is used by the compiler to translate your function calls, pushes the return address on the stack and then performs a jump.
What you can do is a bit of a stack manipulation, using some inline assembly and possibly some macros to save yourself a bit of headache. You basically have to overwrite the return address on the stack, which you can do immediately in the function called. You can have a wrapper function which overwrites the return address and calls your function - the control flow will then return to the wrapper which then moves to wherever you pointed it to.

C manually call function with stack and register

I know that manipulating the stack is a big deal, but I think it would be a great lesson for me.
I searched the internet and found out about calling conventions. I know how they work and why. I want to simulate one of the "callee clean-up" conventions, maybe stdcall or fastcall, it doesn't matter; the important thing is who cleans up the stack, because then I will have less work to do :)
For example,
I have a function in C
double __fastcall Add(int a, int b) {
return a + b;
}
It will be the callee,
and I have a pointer to this function of type void*,
void* p = reinterpret_cast<void*>(Add);
And I have a function Caller
void Call(void* p, int a, int b) {
//some code about push and pop arg
//some code about save p into register
//some code about move 'p' into CPU to call this function manually
//some code about take of from stack/maybe register the output from function
}
And that's it. It is helpful to use a "callee clean-up" calling convention because then I don't need
//some code about cleans-up this mess
I don't know how to do it. I know it can be done in assembler, but I'm afraid of it and I have never touched that language. I would be grateful for a way to simulate that call in C, but if anyone can do it in asm I will be happy :)
I should also say what I want to do with it:
once I know how to call a function manually, I will be able to call a function with several parameters (if I know their number and sizes) and of any type.
So I will be able to call any function in any language, as long as that function uses the right calling convention.
I'm using Windows x64 and MinGW.
First of all: C is intended to hide calling conventions and everything that is specific to how your code is executed from the programmer and provide an abstract layer above it.
The only condition when you need to (as you say) "manually" call a function is when you do it from asm.
C as a language has no direct control over the stack or the program counter.
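When the signature is known, the C way to call through an untyped pointer is to cast it back to the right function pointer type and let the compiler emit the calling sequence. A minimal sketch without any calling-convention attribute; the typedef name `BinaryIntFn` is hypothetical.

```c
/* The callee: an ordinary C function. */
int Add(int a, int b)
{
    return a + b;
}

/* Matching function pointer type; the compiler derives the calling
   sequence from this type. */
typedef int (*BinaryIntFn)(int, int);

/* Call through an untyped pointer by casting it back to the real type.
   Converting between void* and function pointers is not guaranteed by ISO C,
   but is supported on mainstream platforms (POSIX requires it for dlsym). */
int Call(void *p, int a, int b)
{
    BinaryIntFn fn = (BinaryIntFn)p;
    return fn(a, b);
}
```

If the typedef does not match the real function, the behavior is undefined, so this only helps when the signature is known in advance.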
To cite from GCC manual for fastcall for x86:
On the Intel 386, the `fastcall' attribute causes the compiler to
pass the first two arguments in the registers ECX and EDX.
Subsequent arguments are passed on the stack. The called function
will pop the arguments off the stack. If the number of arguments
is variable all arguments are pushed on the stack.
Also as far as I remember return values are passed in EAX.
So in order to call a function in this way you need to provide the arguments in ECX, EDX and then invoke the call instruction on the function address
int __fastcall Add(int a, int b) {
return a + b;
}
Please note I have changed the return type to int, because I do not remember how doubles are passed back.
int a, b;
// set a,b to something
void* p = reinterpret_cast<void*>(Add);
int return_val;
asm (
"call *%3" // indirect call through the register holding p
: "=a" (return_val) // return value is passed in eax
: "c" (a) // pass a in ecx
, "d" (b) // pass b in edx
, "r" (p) // pass p in a random free register
);
By calling convention it is up to the callee to clean up any used stack space. In this case we didn't use any, but if we did then your compiler will translate your function Add in such a way that it cleans up the stack automatically.
The code above is actually a hack in such a way that I use the GCC extended asm syntax to automatically put our variables into the appropriate registers. It will generate sufficient code around this asm call to make sure data is consistent.
If you wish to use a stack based calling convention then cdecl is the standard one
int __cdecl Add(int a, int b) {
return a + b;
}
Then we need to push the arguments to the stack prior to calling
asm (
"push %2\n" // cdecl pushes right to left: b first
"push %1\n" // then a
"call *%3\n" // indirect call through the register holding p
"add $8, %%esp" // with cdecl the caller removes the arguments
: "=a" (return_val) // return value is passed in eax
: "r" (a) // pass a in any register
, "r" (b) // pass b in any register
, "r" (p) // pass p in any register
);
One thing that I have not mentioned is that this asm call does not save any of our in-use registers, so I do not recommend putting this in a function that does anything else. In 32 bit x86 there is an instruction pushad that will push all general purpose registers to the stack and an equivalent (popad) to restore them. An equivalent for x86_64 is unavailable though. Normally when you compile C code the compiler will know which registers are in use and will save them in order for the callee not to overwrite them. Here it does not. If your callee uses registers that are in use by the caller - they will be overwritten!

C - How to create a pattern in code segment to recognize it in memory dump?

I dump my RAM (a piece of it, the code segment only) in order to find where each C function has been placed. I have no map file and I don't know exactly what the boot/init routines do.
I load my program into RAM, but when I dump the RAM it is very hard to find exactly which function is where. I'd like to use different patterns built into the C source so that I can recognize them in the memory dump.
I've tried to start every function with a different first variable containing the name of the function, like:
char this_function_name[]="main";
but it doesn't work, because this string will be placed in the data segment.
I have a simple 16-bit RISC CPU and an experimental proprietary compiler (not GCC or any other well-known one). The system has 16Mb of RAM, shared with other applications (bootloader, downloader). It is almost impossible to find, say, a unique sequence of N NOPs or something like 0xABCD. I would like to find all functions in RAM, so I need unique identifiers for the functions that are visible in the RAM dump.
What would be the best pattern for code segment?
If it were me, I'd use the symbol table, e.g. "nm a.out | grep main". Get the real address of any function you want.
If you really have no symbol table, make your own.
struct tab {
void *addr;
char name[100]; // For ease of searching, use an array.
} symtab[] = {
{ (void*)main, "main" },
{ (void*)otherfunc, "otherfunc" },
};
Search for the name, and the address will immediately precede it. Go to that address. ;-)
If your compiler has inline asm you can use it to create a pattern. Write some NOP instructions which you can easily recognize by their opcodes in the memory dump:
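As a rough sketch of that lookup done on a host machine: the helpers below search a raw dump for a NUL-terminated name and read the pointer stored just before it. The names find_bytes and lookup_in_dump are made up for illustration, and the code assumes the host has the same pointer size and byte order as the target, which may well not hold for a 16-bit CPU.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the table above: the address first, the name right after it. */
struct tab {
    void *addr;
    char name[100];
};

/* Offset of the first occurrence of needle in the dump, or -1 if absent. */
long find_bytes(const unsigned char *dump, size_t dump_len,
                const char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > dump_len)
        return -1;
    for (size_t i = 0; i + needle_len <= dump_len; i++)
        if (memcmp(dump + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}

/* Hypothetical helper: find "name\0" and return the pointer preceding it.
   Assumes host and target agree on pointer size and endianness. */
void *lookup_in_dump(const unsigned char *dump, size_t dump_len,
                     const char *name)
{
    long off = find_bytes(dump, dump_len, name, strlen(name) + 1);
    if (off < (long)sizeof(void *))
        return NULL;   /* not found, or no room for a pointer before it */
    void *addr;
    memcpy(&addr, dump + off - sizeof(void *), sizeof addr);
    return addr;
}
```

One caveat: a name that is the tail of another name (say "main" and "domain") can match in the wrong place, so distinct, non-overlapping names are safer.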
MOV r0,r0
MOV r0,r0
MOV r0,r0
MOV r0,r0
How about a completely different approach to your real problem, which is finding a particular block of code: Use diff.
Compile the code once with the function in question included, and once with it commented out. Produce RAM dumps of both. Then, diff the two dumps to see what's changed -- and that will be the new code block. (You may have to do some sort of processing of the dumps to remove memory addresses in order to get a clean diff, but the order of instructions ought to be the same in either case.)
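If a diff tool is not handy for binary dumps, the same idea can be sketched in a few lines of C: skip the common prefix and the common suffix of the two dumps, and what remains in the larger one is the candidate block. The name find_inserted_region is hypothetical, and the sketch assumes the only difference is one contiguous block; changed addresses elsewhere in the image would need the pre-processing mentioned above.

```c
#include <stddef.h>

/* Byte range [start, start + len) of the second dump that has no
   counterpart in the first one. */
struct region {
    size_t start;
    size_t len;
};

/* Hypothetical helper: locate a single contiguous insertion by trimming
   the bytes both dumps share at the front and at the back. */
struct region find_inserted_region(const unsigned char *without, size_t without_len,
                                   const unsigned char *with, size_t with_len)
{
    size_t limit = without_len < with_len ? without_len : with_len;
    size_t prefix = 0, suffix = 0;

    /* Length of the common prefix. */
    while (prefix < limit && without[prefix] == with[prefix])
        prefix++;

    /* Length of the common suffix, never overlapping the prefix. */
    while (suffix < limit - prefix &&
           without[without_len - 1 - suffix] == with[with_len - 1 - suffix])
        suffix++;

    struct region r = { prefix, with_len - prefix - suffix };
    return r;
}
```

The reported range is only a starting point for inspection, since identical bytes at the edges of the new block can shift the boundaries slightly.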
Numeric constants are placed in the code segment, encoded in the function's instructions. So you could try to use magic numbers like 0xDEADBEEF and so on.
I.e. here's the disassembly view of a simple C function with Visual C++:
void foo(void)
{
00411380 push ebp
00411381 mov ebp,esp
00411383 sub esp,0CCh
00411389 push ebx
0041138A push esi
0041138B push edi
0041138C lea edi,[ebp-0CCh]
00411392 mov ecx,33h
00411397 mov eax,0CCCCCCCCh
0041139C rep stos dword ptr es:[edi]
unsigned id = 0xDEADBEEF;
0041139E mov dword ptr [id],0DEADBEEFh
You can see the 0xDEADBEEF making it into the function's machine code as an immediate operand. Note that what you actually see in the executable depends on the endianness of the CPU (thanks, Richard).
This is an x86 example. But RISC CPUs (MIPS, etc.) have instructions moving immediates into registers, and these immediates can have special recognizable values as well (although only 16-bit for MIPS, IIRC).
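To make the endianness point concrete, here is a minimal sketch of a dump scanner that looks for a 32-bit constant in either byte order. The name find_magic32 is invented for this example. On little-endian x86 the constant 0xDEADBEEF shows up in memory as the bytes EF BE AD DE; on a 16-bit RISC target the value may instead be split across two instructions, in which case searching for each 16-bit half separately would be the analogous approach.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of a 32-bit constant
   in the dump, or -1. little_endian selects the byte order searched for. */
long find_magic32(const unsigned char *dump, size_t len,
                  uint32_t magic, int little_endian)
{
    unsigned char pattern[4];

    /* Lay the constant out byte by byte in the requested order. */
    for (int i = 0; i < 4; i++) {
        int shift = little_endian ? 8 * i : 8 * (3 - i);
        pattern[i] = (unsigned char)(magic >> shift);
    }

    for (size_t off = 0; off + sizeof pattern <= len; off++)
        if (memcmp(dump + off, pattern, sizeof pattern) == 0)
            return (long)off;
    return -1;
}
```

Giving each function its own constant (0xDEAD0001, 0xDEAD0002, and so on) would let one pass over the dump build a rough function map.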
Psihodelia - it's getting harder and harder to catch your intention. Is it just a single function you want to find? Then can't you just place 5 NOPs one after another and look for them? Do you control the compiler/assembler/linker/loader? What tools are at your disposal?
As you noted, this:
char this_function_name[]="main";
... will end up initializing the array on your stack from a copy of the string that lives in the data segment. However, this:
char this_function_name[]= { 'm', 'a', 'i', 'n' };
... will likely be compiled into instructions that store these bytes on your stack one by one, so the characters appear as immediates in your code and you will be able to recognize the string there (I just tried it on my platform).
Hope this helps
Why not get each function to dump its own address? Something like this:
void* fnaddr( char* fname, void* addr )
{
printf( "%s\t0x%p\n", fname, addr ) ;
return addr ;
}
void test( void )
{
static void* fnaddr_dummy = fnaddr( __FUNCTION__, test ) ;
}
int main (int argc, const char * argv[])
{
static void* fnaddr_dummy = fnaddr( __FUNCTION__, main ) ;
test() ;
test() ;
}
By making fnaddr_dummy static, the dump is done once per function. Note that initialising a static local with a function call is valid C++ but not C; in plain C you would guard the call with a static flag instead. Obviously you would need to adapt fnaddr() to support whatever output or logging means you have on your system. Unfortunately, because the initialisation is lazy, you'll only get the addresses of the functions that are actually called (which may be good enough).
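For a plain C compiler, the same once-per-function behaviour can be sketched with a static flag wrapped in a macro. LOG_FUNCTION_ONCE, worker and the fnaddr_calls counter are names invented for this sketch, and printing a function address through uintptr_t assumes a platform where function pointers fit in that type.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter, present only so the behaviour can be checked. */
int fnaddr_calls = 0;

/* Print a function's name and address; adapt the output to your system. */
void fnaddr(const char *fname, void (*addr)(void))
{
    fnaddr_calls++;
    printf("%s\t0x%" PRIxPTR "\n", fname, (uintptr_t)addr);
}

/* Log the enclosing function the first time it runs. The flag is a static
   local, so every expansion of the macro gets its own copy. */
#define LOG_FUNCTION_ONCE(fn)                           \
    do {                                                \
        static int logged_once_;                        \
        if (!logged_once_) {                            \
            logged_once_ = 1;                           \
            fnaddr(__func__, (void (*)(void))(fn));     \
        }                                               \
    } while (0)

void worker(void)
{
    LOG_FUNCTION_ONCE(worker);
    /* ... the function's real work would go here ... */
}
```

Calling worker() twice prints a single line, because the flag is already set on the second call.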
You could start each function with a call to the same dummy function like:
void identifyFunction( unsigned int identifier)
{
}
Each of your functions would call the identifyFunction function with a different parameter (1, 2, 3, ...). This will not give you a magic map file, but when you inspect the code dump you should be able to quickly find out where identifyFunction is, because there will be lots of jumps to that address. Next, scan for those jumps and look just before each one to see what parameter is passed. Then you can make your own map file. With some scripting this should be fairly automatic.
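A minimal sketch of the source side of this idea follows. The volatile variable last_identifier and the functions add and mul are illustrative additions, not part of the suggestion above; the volatile store is there on the assumption that an optimizing compiler might otherwise drop calls to an empty function. How the experimental compiler in question behaves is unknown.

```c
/* Hypothetical sink: writing to a volatile object gives the function an
   observable effect, so calls to it are not removed as dead code. */
volatile unsigned int last_identifier;

void identifyFunction(unsigned int identifier)
{
    last_identifier = identifier;
}

/* Each function starts with a call carrying its own number. */
int add(int a, int b)
{
    identifyFunction(1);
    return a + b;
}

int mul(int a, int b)
{
    identifyFunction(2);
    return a * b;
}
```

Even if the compiler inlines identifyFunction, each function still begins with a store of its own constant, which can serve as the marker in the dump.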

Resources