Manually calling a C function using the stack and registers - c

I know that manipulating the stack is a big deal, but I think it would be a great lesson for me.
I searched the internet and found out about calling conventions. I know how they work and why. I want to simulate one of the callee-clean-up conventions, maybe stdcall or fastcall, it doesn't matter which; the important thing is who cleans up the stack, because then I will have less work to do :)
For example,
I have a function in C:
double __fastcall Add(int a, int b) {
return a + b;
}
This will be the callee.
I also have a pointer to this function, of type void*:
void* p = (void*)Add;
And I have a function Caller:
void Call(void* p, int a, int b) {
//some code about push and pop arg
//some code about save p into register
//some code about move 'p' into CPU to call this function manually
//some code about take of from stack/maybe register the output from function
}
And that's it. It helps to use a callee-clean-up calling convention, because then I don't need
//some code that cleans up this mess
I don't know how to do it. I know it can be done in assembler, but I am afraid of it and have never touched that language. I would be grateful for a way to simulate that call in C, but if anyone can do it in ASM I will be happy :)
I should also say what I want to do with it:
once I know how to call a function manually, I will be able to call functions of any type with any number of parameters (if I know their number and sizes).
So I will be able to call any function in any language, as long as that function uses the right calling convention.
I'm using Windows x64 and MinGW.

First of all: C is intended to hide calling conventions and everything that is specific to how your code is executed from the programmer and provide an abstract layer above it.
The only condition when you need to (as you say) "manually" call a function is when you do it from asm.
C as a language has no direct control over the stack or the program counter.
To cite from GCC manual for fastcall for x86:
On the Intel 386, the `fastcall' attribute causes the compiler to
pass the first two arguments in the registers ECX and EDX.
Subsequent arguments are passed on the stack. The called function
will pop the arguments off the stack. If the number of arguments
is variable all arguments are pushed on the stack.
Also as far as I remember return values are passed in EAX.
So in order to call a function in this way, you need to put the arguments in ECX and EDX and then invoke the call instruction on the function address:
int __fastcall Add(int a, int b) {
return a + b;
}
Please note I have changed the return type to int, because I do not remember how doubles are passed back.
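int a, b;
// set a, b to something
void* p = (void*)Add;
int return_val;
asm volatile (
"call *%3" // indirect call through the register that holds p
: "=a" (return_val) // return value comes back in eax
, "+c" (a) // pass a in ecx; the callee may overwrite it
, "+d" (b) // pass b in edx; the callee may overwrite it
: "r" (p) // pass p in any free register
: "memory", "cc" // the callee may write memory and change the flags
);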
By calling convention it is up to the callee to clean up any used stack space. In this case we didn't use any, but if we did then your compiler will translate your function Add in such a way that it cleans up the stack automatically.
The code above is actually a hack in such a way that I use the GCC extended asm syntax to automatically put our variables into the appropriate registers. It will generate sufficient code around this asm call to make sure data is consistent.
If you wish to use a stack based calling convention then cdecl is the standard one
int __cdecl Add(int a, int b) {
return a + b;
}
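Then we need to push the arguments onto the stack, right to left, before the call, and remove them afterwards, because with cdecl the caller cleans up the stack (with stdcall the callee would do it):
asm volatile (
"push %2\n" // push b (the last argument) first
"push %1\n" // then push a
"call *%3\n" // indirect call through the register that holds p
"add $8, %%esp" // cdecl: the caller removes the two arguments
: "=a" (return_val) // return value is passed in eax
: "r" (a) // pass a in any register
, "r" (b) // pass b in any register
, "r" (p) // pass p in any register
: "ecx", "edx", "memory", "cc" // caller-saved state the callee may overwrite
);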
One caveat: the compiler knows nothing about what happens inside an asm statement beyond what its output, input and clobber lists declare, so I do not recommend putting this in a function that does anything else. Normally, when you compile C code, the compiler knows which registers are in use around a call and saves them so that the callee cannot overwrite them. Here it does not: the callee is free to overwrite the caller-saved registers (EAX, ECX and EDX on 32-bit x86), and any of them that is not declared as an output or a clobber may still hold a value the caller relies on. In 32-bit x86 there is an instruction, pushad, that pushes all general-purpose registers onto the stack, and a counterpart, popad, that restores them; x86_64 has no equivalent.
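When the signature of the callee is known at compile time, no assembly is needed at all: converting the void* back to a function pointer of the right type lets the compiler generate the argument passing, the call and any stack clean-up for whatever calling convention that type declares. A minimal sketch, assuming a platform where function pointers survive a round trip through void* (true on Windows and POSIX, but not guaranteed by ISO C); the names binop_fn and call_binop are made up for illustration:

```c
/* Hypothetical name: a function-pointer type matching Add's signature. */
typedef int (*binop_fn)(int, int);

int Add(int a, int b)
{
    return a + b;
}

/* Call the function behind p with two int arguments.
 * The cast tells the compiler the signature, so it emits the
 * correct argument passing and clean-up on its own. */
int call_binop(void *p, int a, int b)
{
    binop_fn fn = (binop_fn)p;
    return fn(a, b);
}
```

With these definitions, call_binop((void *)Add, 2, 3) returns 5. Inline assembly only becomes necessary when the signature is not known until run time.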

Related

Is Ghidra misinterpreting a function call?

When analyzing the assembly listing in Ghidra, I stumbled upon this instruction:
CALL dword ptr [EBX*0x4 + 0x402ac0]=>DAT_00402abc
I assumed that the program was calling a function whose address was inside DAT_00402abc, which I initially thought was a dword variable. Indeed, when trying to create a function at the location of DAT_00402abc, Ghidra wouldn't let me do it.
The decompiler shows to me this line of code to translate that instruction:
(*(code *)(&int2)[iVar2])();
So I was wondering, what does it mean and what's the program supposed to do with this call? Is there a possibility that Ghidra totally messed up? And if so, how should I interpret that instruction?
I'm not at all familiar with Ghidra, but I can tell you how to interpret the machine instruction...
CALL dword ptr [EBX*0x4 + 0x402ac0]
There is a table of function addresses at 0x402ac0; the EBX'th entry in that table is being called. I have no idea what DAT_00402abc means, but if you inspect memory in dword-sized chunks at address 0x0402ac0 you should find plausible function addresses. [EDIT: 0x0040_2abc = 0x0040_2ac0 - 4. I suspect this means Ghidra thinks EBX has value -1 when control reaches this point. It may be wrong, or maybe the program has a bug. One would expect EBX to have a nonnegative value when control reaches this point.]
The natural C source code corresponding to this instruction would be something like
#include <stdlib.h> // for abort()
extern void do_thing_zero(void);
extern void do_thing_one(void);
extern void do_thing_two(void);
extern void do_thing_three(void);
typedef void (*do_thing_ptr)(void);
const do_thing_ptr do_thing_table[4] = {
do_thing_zero, do_thing_one, do_thing_two, do_thing_three
};
// ...
void do_thing_n(unsigned int n)
{
if (n >= 4) abort();
do_thing_table[n]();
}
If the functions in the table take arguments or return values, you'll see argument-handling code before and after the CALL instruction you quoted, but the CALL instruction itself will not change.
You would be seeing something different and much more complicated if the functions didn't all take the same set of arguments.
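The address arithmetic behind the EDIT note in this answer can be checked with a few lines of C. This is only a sketch of how a 32-bit x86 effective address of the form [EBX*4 + base] is computed; the function name table_entry_address is made up for illustration:

```c
#include <stdint.h>

/* Address read by CALL dword ptr [EBX*4 + base] on 32-bit x86.
 * The arithmetic wraps modulo 2^32, as it does on the CPU, so a
 * negative index reaches memory just below the table. */
uint32_t table_entry_address(uint32_t base, int32_t ebx)
{
    return base + (uint32_t)ebx * 4u;
}
```

With base 0x402ac0, an index of -1 gives 0x402abc, which matches the DAT_00402abc label Ghidra attached to the instruction, while an index of 2 gives 0x402ac8.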

Unwinding frame but do not return in C

My programming language compiles to C, I want to implement tail recursion optimization. The question here is how to pass control to another function without "returning" from the current function.
It is quite easy if the control is passed to the same function:
void f() {
__begin:
do something here...
goto __begin; // "call" itself
}
As you can see, there is no return value and there are no parameters; those are passed on a separate stack addressed by a global variable.
Another option is to use inline assembly:
#ifdef __clang__
#define tail_call(func_name) asm("jmp " #func_name " + 8")
#else
#define tail_call(func_name) asm("jmp " #func_name " + 4")
#endif
void f() {
__begin:
do something here...
tail_call(f); // "call" itself
}
This is similar to goto, but where goto passes control to the first statement of the function, skipping the entry code generated by the compiler, jmp is different: its argument is a function address, and you need to add 4 or 8 bytes to skip the entry code.
Both approaches work, but only if the callee and the caller use the same amount of stack for local variables, which is allocated by the entry code of the callee.
I was thinking of doing leave manually with inline assembly, then replacing the return address on the stack, then doing a legal function call like f(). But my attempts all crashed. You need to modify BP and SP somehow.
So again, how can this be implemented for x64? (Again, assuming functions have no arguments and return void.) A portable way without inline assembly is better, but assembly is acceptable. Maybe longjmp can be used?
Maybe you can even push the callee address on the stack, replacing the original return address, and just ret?
Do not try to do this yourself. A good C compiler can perform tail-call elimination in many cases and will do so. In contrast, a hack using inline assembly has a good chance of going wrong in a way that is difficult to debug.
For example, see this snippet on godbolt.org. To duplicate it here:
The C code I used was:
int foo(int n, int o)
{
if (n == 0) return o;
puts("***\n");
return foo(n - 1, o + 1);
}
This compiles to:
.LC0:
.string "***\n"
foo:
test edi, edi
je .L4
push r12
mov r12d, edi
push rbp
mov ebp, esi
push rbx
mov ebx, edi
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
lea eax, [r12+rbp]
pop rbx
pop rbp
pop r12
ret
.L4:
mov eax, esi
ret
Notice that the tail call has been eliminated. The only call is to puts.
Since you don't need arguments and return values, how about combining all functions into one and using labels instead of function names?
f:
__begin:
...
CALL(h); // a macro implementing traditional call
...
if (condition_ret)
RETURN; // a macro implementing traditional return
...
goto g; // tail recurse to g
The tricky part here is the RETURN and CALL macros. To return, you keep yet another stack, a stack of setjmp buffers, so when you return you call longjmp(ret_stack.pop()), and when you call you do ret_stack.push(setjmp(f)). This is a poetic rendition, of course; you'll need to fill in the details.
gcc can offer some optimization here with computed goto, which is more lightweight than longjmp. Also, people who write VMs have similar problems, and seemingly have asm-based solutions for them even on MSVC, see example here.
And finally, such an approach, even if it saves memory, may confuse the compiler and so cause performance anomalies. You are probably better off generating code for some portable assembler-like language, LLVM maybe? Not sure; it should be something that has computed goto.
The venerable approach to this problem is to use trampolines. Essentially, every compiled function returns a function pointer (and maybe an arg count). The top level is a tight loop that, starting with your main, simply calls the returned function pointer ad infinitum. You could use a function that longjmps to escape the loop, i.e., to terminate the program.
See this SO Q&A. Or Google "recursion tco trampoline."
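The trampoline idea can be sketched in portable C as follows. This is a minimal illustration, not a full runtime: the names step, step_fn, count_down and sum_to are made up, state is passed through globals (as in the question), and a NULL pointer ends the loop instead of the longjmp mentioned above. The struct wrapper is needed because a C function type cannot directly name itself as its own return type.

```c
#include <stddef.h>

struct step;                              /* wraps "the next function to run" */
typedef struct step (*step_fn)(void);
struct step { step_fn next; };

/* Arguments and results live in globals, as in the question. */
static int counter;
static int total;

/* One "compiled" function: adds counter to total, then tail-calls itself
 * by returning its own address instead of calling it. */
static struct step count_down(void)
{
    if (counter == 0)
        return (struct step){ NULL };     /* nothing left to run */
    total += counter;
    counter--;
    return (struct step){ count_down };
}

/* The top-level loop: stack depth stays constant however long it runs. */
static void trampoline(step_fn start)
{
    struct step s = { start };
    while (s.next != NULL)
        s = s.next();
}

int sum_to(int n)
{
    counter = n;
    total = 0;
    trampoline(count_down);
    return total;
}
```

For example, sum_to(4) returns 10. The cost of the technique is one indirect call and one return per step, in exchange for never growing the C stack.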
For another approach, see Cheney on the MTA, where the stack just grows until it's full, which triggers a GC. This works once the program is converted to continuation passing style (CPS) since in that style, functions never return; so, after the GC, the stack is all garbage, and can be reused.
I will suggest a hack. The x86 call instruction, which is used by the compiler to translate your function calls, pushes the return address on the stack and then performs a jump.
What you can do is a bit of a stack manipulation, using some inline assembly and possibly some macros to save yourself a bit of headache. You basically have to overwrite the return address on the stack, which you can do immediately in the function called. You can have a wrapper function which overwrites the return address and calls your function - the control flow will then return to the wrapper which then moves to wherever you pointed it to.

Writing a thunk to verify SysV ABI compliance

The SysV ABI defines the C-level and assembly calling conventions for Linux.
I would like to write a generic thunk that verifies that a function satisfied the ABI restrictions on callee preserved registers and (perhaps) tried to return a value.
So given a target function like int foo(int, int) it's pretty easy [3] to write such a thunk in assembly, something like [1]:
foo_thunk:
push rbp
push rbx
push r12
push r13
push r14
push r15
call foo
cmp rbp, [rsp + 40]
jne bad_rbp
cmp rbx, [rsp + 32]
jne bad_rbx
cmp r12, [rsp + 24]
jne bad_r12
cmp r13, [rsp + 16]
jne bad_r13
cmp r14, [rsp + 8]
jne bad_r14
cmp r15, [rsp]
jne bad_r15
add rsp, 6 * 8 ; discard the six saved registers
ret
Now of course I don't actually want to write a separate foo_thunk method for each call, I just want one generic one. This one should take a pointer to the underlying function (let's say in rax), and would use an indirect call (call rax) rather than call foo, but would otherwise be the same.
What I can't figure out is how to implement the transparent use of the thunk at the C level (or in C++, where there seem to be more meta-programming options, but let's stick to C here). I want to take something like:
foo(1, 2);
and translate it to a call to the thunk, but still passing the same arguments in the same places (that's needed for the thunk to work).
It is expected that I modify the source, perhaps with macro or template magic, so the call above could be changed to:
CHECK_THUNK(foo, (1, 2));
Giving the macro the name of the underlying function. In principle it could translate this to [2]:
check_thunk(&foo, 1, 2);
How can I declare check_thunk though? The first argument is "some type" of function pointer. We could try:
check_thunk(void (*ptr)(void), ...);
So a "generic" function pointer (all function pointers can validly be cast to this, and we'll only actually call it from assembly, outside the claws of the language standard), plus varargs.
This doesn't work though: the ... has totally different promotion rules than a properly prototyped function. It will work for the foo(1, 2) example, but if you call foo(1.0, 2) instead, the varargs version will just leave the 1.0 as a double and you'll be calling foo with a totally wrong value (a double value punned as an integer).
The above also has the disadvantage of passing the function pointer as the first argument, which means the thunk no longer works as-is: it has to save the function pointer in rdi somewhere and then shift all the values over by one (i.e., mov rdi, rsi). If there are non-register args, things get really messy.
Is there any way to make this work smoothly?
Note: this type of thunk is basically incompatible with any passing of parameters on the stack, which is an acceptable limitation of this approach (it should simply not be used for functions with that many arguments or with MEMORY class arguments).
[1] This checks the callee-preserved registers, but the other checks are similarly straightforward.
[2] In fact, you don't even really need the macro for that, but it's also there so you can turn off the thunk in release builds and just do a direct call.
[3] Well, by "easy" I guess I mean one that doesn't work in all cases. The shown thunk doesn't correctly align the stack (easy to fix), and breaks if foo has any stack-passed arguments (significantly harder to fix).
One way to do this, in a gcc-specific way, is to take advantage of typeof and nested functions to create a function pointer that embeds the call to the underlying function, but itself doesn't have any arguments.
This pointer can be passed to the thunk method, which calls it and verifies ABI compliance.
Here's an example of transforming a call to int add3(int, int, int) using this method:
The original call looks like:
int res = add3(a, b, c);
Then you wrap the call in a macro, like this [2]:
CALL_THUNKED(int res, add3, (a,b,c));
... which expands into something like:
typedef typeof(add3 (a,b,c)) ret_type;
ret_type closure() {
return add3 (a,b,c);
}
typedef ret_type (*typed_closure)(void);
typedef ret_type (*thunk_t)(typed_closure);
thunk_t thunk = (thunk_t)closure_thunk;
int res = thunk(&closure);
We create the closure() function on the stack, which calls directly into add3 with the original arguments. We can take the address of this closure and pass it to an asm function without difficulty: calling it will have the ultimate effect of calling add3 with the arguments [1].
The rest of the typedefs is basically dealing with the return type. We have only a single closure_thunk method, declared like this void* closure_thunk(void (*)(void)); and implemented in assembly. It takes a function pointer (any function pointer is convertible to any other), but the return type is "wrong". We cast it to thunk_t which is a dynamically generated typedef for a function that has the "right" return type.
Of course, that's certainly not legal for C functions, but we are implementing the function in asm, so we kind of sidestep the issue (if you wanted to be a bit more compliant, you could perhaps ask the asm code for a function pointer of the right type, which can "generate" it each time, outside of the reach of the standard: of course it's just returning the same pointer each time).
The closure_thunk function in asm is implemented along the lines of:
GLOBAL closure_thunk:function
closure_thunk:
push rsi
push_callee_saved
call rdi
; set up the function name
mov rdi, [rsp + 48]
; now check whether any regs were clobbered
cmp rbp, [rsp + 40]
jne bad_rbp
cmp rbx, [rsp + 32]
jne bad_rbx
cmp r12, [rsp + 24]
jne bad_r12
cmp r13, [rsp + 16]
jne bad_r13
cmp r14, [rsp + 8]
jne bad_r14
cmp r15, [rsp]
jne bad_r15
add rsp, 7 * 8
ret
That is, push all the registers we want to check on the stack (along with the function name), call the function in rdi and then do your checks. The bad_* methods aren't shown, but they basically spit out an error message like "Function add3 overwrote rbp... naughty!" and abort() the process.
This breaks if any arguments are passed on the stack, but it does work for return values passed on the stack (because the ABI for that case passes a pointer to the location for the return value in rax).
[1] How this is accomplished is kind of magic: gcc actually writes a few bytes of executable code onto the stack, and the closure function pointer points there. The few bytes basically load a register with a pointer to the region that contains the captured variables (a, b, c in this case), and then call the actual (read-only) closure() code, which can then access the captured variables through that pointer (and pass them to add3).
[2] As it turns out, we could probably use gcc's statement expression syntax to write the macro in a more usual function-like syntax, something like int res = CALL_THUNKED(add3, (a,b,c)).
At the C source level (without modifying gcc or the linker to insert the thunk for you), you could define different prototypes for each thunk but still share the same implementation.
You could put multiple labels on the definition in the asm source, so check_thunk_foo has the same address as check_thunk_bar, but you can use a different C prototype for each.
Or you could make weak aliases like this:
int check_thunk_foo(void*, int, int)
__attribute__ ((weak, alias ("check_thunk_generic")));
// or maybe this should be ((weakref ("check_thunk_generic")))
#define foo(...) check_thunk_foo((void*)&foo, __VA_ARGS__)
// or to put the args in their original slots,
// but then you'd need different thunks for different numbers of integer args.
#define foo(x, y) check_thunk_foo((x), (y), (void*)&foo)
The major downside to this is that you need to copy+modify the original prototype for every function. You could hack this up with CPP macros so there's a single point of definition for the arg list, and the real prototype (and the thunk if enabled) both use it. Possibly by re-including the same .h twice, with a wrapper macro defined differently. Once for the real prototypes, again for the thunks.
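The "single point of definition" idea can also be done with an X-macro list instead of re-including a header. The sketch below is only an illustration under stated assumptions: CHECKED_FUNCTIONS, checked_add2, checked_negate and checked_calls are hypothetical names, and the wrapper is plain C that bumps a counter where the real asm thunk and its register checks would go.

```c
/* Hypothetical single point of definition for every checked function:
 * return type, name, parameter list, argument list. */
#define CHECKED_FUNCTIONS(X)                 \
    X(int, add2,   (int a, int b), (a, b))   \
    X(int, negate, (int a),        (a))

/* First expansion: the real prototypes. */
#define DECLARE_REAL(ret, name, params, args) ret name params;
CHECKED_FUNCTIONS(DECLARE_REAL)

/* Counts calls made through a wrapper (a stand-in for the real check). */
static int checked_calls;

/* Second expansion: one correctly typed wrapper per function. */
#define DEFINE_WRAPPER(ret, name, params, args)   \
    static ret checked_##name params              \
    {                                             \
        checked_calls++;                          \
        return name args;                         \
    }
CHECKED_FUNCTIONS(DEFINE_WRAPPER)

/* The real functions, defined as usual. */
int add2(int a, int b) { return a + b; }
int negate(int a)      { return -a; }
```

Because each wrapper has a real prototype, arguments are converted exactly as for a direct call, which avoids the varargs promotion problem from the question. The list entries cannot return void as written, since the wrapper body uses return.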
BTW, passing the function pointer as an extra arg to a generic thunk is potentially problematic. I think it's not possible to reliably remove the first arg and forward the rest in the x86-64 SysV ABI. You don't know how many stack args there are, for functions that take more than 6 integer args. And you don't know if there are FP stack args before the first integer stack arg.
This should work fine for functions that pass all their register-possible args in registers. (i.e. if there are any stack args, they're large structs by value or other things that couldn't go in an integer register.)
To solve this problem, the thunk could dispatch based on return address instead of an extra hidden arg, if you had something like debug info to map call site return addresses to call targets. Or you could maybe get gcc to pass a hidden arg in rax or r11. Running call from inline asm sucks a lot, so you'd maybe need to customize gcc with support for some special attribute that passed a function pointer in an extra register.
but if you call foo(1.0, 2) instead, the varargs version will just leave the 1.0 as a double and you'll be calling foo with a totally wrong value (a double value punned as an integer.
Not that it matters, but no, you'd be calling foo(2, garbage) with xmm0=(double)1.0. Variadic functions still use register args the same as non-variadic functions (or with the option of passing FP args on the stack before you run out of registers, and setting al= less than 8).

Hooking a function I don't know the parameters to

Lets say there is a DLL A.DLL with a known entry point DoStuff that I have in some way hooked out with my own DLL fakeA.dll such that the system is calling my DoStuff instead. How do I write such a function such that it can then call the same entry point of the hooked DLL (A.DLL) without knowing the arguments of the function? I.e. My function in fakeA.DLL would look something like
LONG DoStuff(
// don't know what to put here
)
{
FARPROC pfnHooked;
HINSTANCE hHooked;
LONG lRet;
// get hooked library and desired function
hHooked = LoadLibrary("A.DLL");
pfnHooked = GetProcAddress(hHooked, "DoStuff");
// how do I call the desired function without knowing the parameters?
lRet = pfnHooked( ??? );
return lRet;
}
My current thinking is that the arguments are on the stack so I'm guessing I would have to have a sufficiently large stack variable (a big ass struct for example) to capture whatever the arguments are and then just pass it along to pfnHooked? I.e.
// actual arg stack limit is >1MB but we'll assume 1024 bytes is sufficient
typedef struct { char unknownData[1024]; } ARBITARY_ARG;
ARBITARY_ARG DoStuff(ARBITARY_ARG args){
ARBITARY_ARG aRet;
...
aRet = pfnHooked(args);
return aRet;
}
Would this work? If so, is there a better way?
UPDATE: After some rudimentary (and non-conclusive) testing passing in the arbitrary block as arguments DOES work (which is not surprising, as the program will just read what it needs off the stack). However collecting the return value is harder as if it's too large it can cause an access violation. Setting the arbitrary return size to 8 bytes (or maybe 4 for x86) may be a solution to most cases (including void returns) however that's still guesswork. If I had some way of knowing the return type from the DLL (not necessarily at runtime) that would be grand.
This should be a comment, but the meta answer is: yes, you can hook the function without knowing the calling convention and arguments on an x86/x64 platform. Can it be done purely in C? No; it also needs a good understanding of the various calling conventions and of assembly programming. A hooking framework will have some of its bits written in assembly.
Most hooking frameworks do this by creating a trampoline that redirects the execution flow from the called function's preamble to stub code that is generally independent of the function being hooked. In user mode a stack is guaranteed to be present, so you can push your own local variables onto it too, as long as you pop them and restore the stack to its original state.
You don't really need to copy the existing arguments to your own stack variable. You can just inspect the stack. Definitely read a bit about calling conventions and how stacks are constructed on different architectures for various types of invocation in assembly before you attempt anything.
Yes, it is possible to do generic hooking that is 100% correct: one common handler for multiple functions with different argument counts and calling conventions, on both x86 and x64 (amd64).
This needs a little assembly: two small stub procedures of a few lines each, one to filter the pre-call and one for the post-call. They differ between x86 and x64, but most of the implementation (95%+) is platform independent and written in C++. (It can be done in C too, but the C source would be larger, uglier and harder to write.)
In my solution you allocate a small executable block of code for every hooked API (one block per hooked API). The block stores the function name, the original address (or wherever control should go after the pre-call; this depends on the hooking method) and one relative call instruction to the common asm pre-call stub. The trick with this call is that it not only transfers control to the common stub, but also leaves a return address on the stack that points into the block itself (at some offset, but with C++ inheritance it points exactly at a base class from which the executable block class is derived). As a result, the common pre-call stub knows which API call was hooked and can pass that information to the common C++ handler.
One note: on x64 a relative call can only reach targets in the range [rip-0x80000000, rip+0x7fffffff], so these code blocks must be declared inside our PE in a separate bss section that is marked RWE. We cannot simply use VirtualAlloc for the storage, because the returned address may be too far from the common pre-call stub.
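The range restriction comes from the encoding of the relative call: opcode E8 followed by a signed 32-bit displacement measured from the end of the 5-byte instruction. A small sketch of that computation, with a made-up helper name rel32_displacement, assuming call_addr is the address of the E8 byte:

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the rel32 field of an E8 call placed at call_addr that should
 * reach target. Returns false when target is out of range, which is the
 * case that forces the stubs to live near the common handler. */
bool rel32_displacement(uint64_t call_addr, uint64_t target, int32_t *out)
{
    /* The displacement is relative to the next instruction (call_addr + 5). */
    int64_t diff = (int64_t)(target - (call_addr + 5));

    if (diff < INT32_MIN || diff > INT32_MAX)
        return false;
    *out = (int32_t)diff;
    return true;
}
```

A block returned by VirtualAlloc may be more than 2 GB away from the stub, and this function would then report failure; placing the blocks in a section of the same image keeps the displacement in range.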
The common asm pre-call stub must save the rcx, rdx, r8 and r9 registers on x64 (this is absolutely mandatory) and the ecx and edx registers on x86. The x86 case matters if the function uses the __fastcall calling convention. The Windows API hardly uses __fastcall, though: only a handful of the thousands of Win API functions are __fastcall (to verify this and find those functions, go to the LIB folder and search for the string __imp_@, the common __fastcall prefix). The stub then calls the common C++ handler, which must return to the stub the address of the original function (where control should be transferred). The stub restores rcx, rdx, r8, r9 (or ecx, edx) and jumps to (does not call!) this address.
If we only want to filter the pre-call, this is all we need. In most cases, however, we also need to filter (hook) the post-call, to view or modify the function's return value and out parameters. This is also possible, but needs a little more code.
To hook the post-call we obviously must replace the return address of the hooked API. But what do we change it to, and where do we save the original return address? We cannot use a global variable. We cannot even use a thread-local one (__declspec(thread) or thread_local), because the call can be recursive. We cannot use a volatile register (it changes during the API call), and we cannot use a non-volatile register either, because we would have to save its old value to restore it later, which raises the same question: where?
There is only one (and nice) solution: allocate a small block of executable memory (RWE) that contains one relative call instruction to the common post-call asm stub, plus some data: the saved original return address, the function parameters (to check out parameters in the post handler) and the function name.
Here again there is an issue on x64: the block must not be too far from the common post-call stub (+/- 2GB), so it is best to allocate these stubs in the same separate .bss section as the pre-call stubs.
How many of these ret-stubs are needed? One per API call in flight (if we want to control the post-call), so no more than the number of API calls active at any time. Usually 256 pre-allocated blocks, say, are more than enough. And even if allocating a block fails in the pre-call, we merely lose control of that post-call; nothing crashes. Nor do we have to control the post-call for every hooked API; we can do it only for some.
For very fast, interlocked alloc/free of these blocks, build stack semantics over them: allocate with an interlocked pop and free with an interlocked push. Pre-initialize the blocks (write the call instruction) once at the beginning, while pushing them all onto the stack, so they need not be reinitialized on every pre-call.
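The stack-of-free-blocks idea can be sketched in portable C. This is a simplified, single-threaded illustration with made-up names (ret_block, pool_init, pool_alloc, pool_free) and a tiny hypothetical pool size; the answer's code uses the Windows interlocked SList functions to make the same push and pop safe across threads.

```c
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical; the answer suggests a few hundred */

struct ret_block {
    struct ret_block *next;   /* link, meaningful only while the block is free */
    void *saved_return;       /* original return address while in use */
};

static struct ret_block pool[POOL_SIZE];
static struct ret_block *free_head;

/* Push every block onto the free stack once, up front. */
void pool_init(void)
{
    free_head = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_head;
        free_head = &pool[i];
    }
}

/* Allocate = pop. Returns NULL when all blocks are in use, in which
 * case the caller simply skips the post-call hook for this call. */
struct ret_block *pool_alloc(void)
{
    struct ret_block *b = free_head;
    if (b != NULL)
        free_head = b->next;
    return b;
}

/* Free = push. */
void pool_free(struct ret_block *b)
{
    b->next = free_head;
    free_head = b;
}
```

Both operations are constant time, and running out of blocks degrades to "no post-call hook" rather than to a failure, which is the property the answer relies on.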
The common post-call stub in asm is very simple; here we do not need to save any registers. We simply call the C++ post handler with the address of the block (popped from the stack, where the call instruction in the block left it) and with the original return value (rax or eax). Strictly speaking, an API function can return a pair in rax+rdx or eax+edx, but 99.9%+ of Windows APIs return their value in a single register, and I assume we will hook only those. The code can be adjusted to handle pairs too if needed (in most cases it is not).
The C++ post-call handler restores the original return address (using _AddressOfReturnAddress()), can log the call and/or modify out parameters, and finally returns to the original caller of the API. Whatever our handler returns becomes the final return value of the API call. Usually we must return the original value.
c++ code
#if 0
#define __ASM_FUNCTION __pragma(message(__FUNCDNAME__" proc\r\n" __FUNCDNAME__ " endp"))
#define _ASM_FUNCTION {__ASM_FUNCTION;}
#define ASM_FUNCTION {__ASM_FUNCTION;return 0;}
#define CPP_FUNCTION __pragma(message("extern " __FUNCDNAME__ " : PROC ; " __FUNCTION__))
#else
#define _ASM_FUNCTION
#define ASM_FUNCTION
#define CPP_FUNCTION
#endif
class CODE_STUB
{
#ifdef _WIN64
PVOID pad;
#endif
union
{
DWORD code;
struct
{
BYTE cc[3];
BYTE call;
};
};
int offset;
public:
void Init(PVOID stub)
{
// int3; int3; int3; call stub
code = 0xe8cccccc;
offset = RtlPointerToOffset(&offset + 1, stub);
C_ASSERT(sizeof(CODE_STUB) == RTL_SIZEOF_THROUGH_FIELD(CODE_STUB, offset));
}
PVOID Function()
{
return &call;
}
// implemented in .asm
static void __cdecl retstub() _ASM_FUNCTION;
static void __cdecl callstub() _ASM_FUNCTION;
};
struct FUNC_INFO
{
PVOID OriginalFunc;
PCSTR Name;
void* __fastcall OnCall(void** stack);
};
struct CALL_FUNC : CODE_STUB, FUNC_INFO
{
};
C_ASSERT(FIELD_OFFSET(CALL_FUNC,OriginalFunc) == sizeof(CODE_STUB));
struct RET_INFO
{
union
{
struct
{
PCSTR Name;
PVOID params[7];
};
SLIST_ENTRY Entry;
};
INT_PTR __fastcall OnCall(INT_PTR r);
};
struct RET_FUNC : CODE_STUB, RET_INFO
{
};
C_ASSERT(FIELD_OFFSET(RET_FUNC, Entry) == sizeof(CODE_STUB));
#pragma bss_seg(".HOOKS")
RET_FUNC g_rf[1024];//max call count
CALL_FUNC g_cf[16];//max hooks count
#pragma bss_seg()
#pragma comment(linker, "/SECTION:.HOOKS,RWE")
class RET_FUNC_Manager
{
SLIST_HEADER _head;
public:
RET_FUNC_Manager()
{
PSLIST_HEADER head = &_head;
InitializeSListHead(head);
RET_FUNC* p = g_rf;
DWORD n = RTL_NUMBER_OF(g_rf);
do
{
p->Init(CODE_STUB::retstub);
InterlockedPushEntrySList(head, &p++->Entry);
} while (--n);
}
RET_FUNC* alloc()
{
return static_cast<RET_FUNC*>(CONTAINING_RECORD(InterlockedPopEntrySList(&_head), RET_INFO, Entry));
}
void free(RET_INFO* p)
{
InterlockedPushEntrySList(&_head, &p->Entry);
}
} g_rfm;
void* __fastcall FUNC_INFO::OnCall(void** stack)
{
CPP_FUNCTION;
// in case __fastcall function in x86 - param#1 at stack[-1] and param#2 at stack[-2]
// this need for filter post call only
if (RET_FUNC* p = g_rfm.alloc())
{
p->Name = Name;
memcpy(p->params, stack, sizeof(p->params));
*stack = p->Function();
}
return OriginalFunc;
}
INT_PTR __fastcall RET_INFO::OnCall(INT_PTR r)
{
CPP_FUNCTION;
*(void**)_AddressOfReturnAddress() = *params;
PCSTR name = Name;
char buf[8];
if (IS_INTRESOURCE(name))
{
sprintf(buf, "#%04x", (ULONG)(ULONG_PTR)name), name = buf;
}
DbgPrint("%p %s(%p, %p, %p ..)=%p\r\n", *params, name, params[1], params[2], params[3], r);
g_rfm.free(this);
return r;
}
struct DLL_TO_HOOK
{
PCWSTR szDllName;
PCSTR szFuncNames[];
};
void DoHook(DLL_TO_HOOK** pp)
{
PCSTR* ppsz, psz;
DLL_TO_HOOK *p;
ULONG n = RTL_NUMBER_OF(g_cf);
CALL_FUNC* pcf = g_cf;
while (p = *pp++)
{
if (HMODULE hmod = LoadLibraryW(p->szDllName))
{
ppsz = p->szFuncNames;
while (psz = *ppsz++)
{
if (pcf->OriginalFunc = GetProcAddress(hmod, psz))
{
pcf->Name = psz;
pcf->Init(CODE_STUB::callstub);
// do hook: pcf->OriginalFunc -> pcf->Function() - code for this skiped
DbgPrint("hook: (%p) <- (%p)%s\n", pcf->Function(), pcf->OriginalFunc, psz);
if (!--n)
{
return;
}
pcf++;
}
}
}
}
}
asm x64 code:
extern ?OnCall@FUNC_INFO@@QEAAPEAXPEAPEAX@Z : PROC ; FUNC_INFO::OnCall
extern ?OnCall@RET_INFO@@QEAA_J_J@Z : PROC ; RET_INFO::OnCall
?retstub@CODE_STUB@@SAXXZ proc
pop rcx
mov rdx,rax
call ?OnCall@RET_INFO@@QEAA_J_J@Z
?retstub@CODE_STUB@@SAXXZ endp
?callstub@CODE_STUB@@SAXXZ proc
mov [rsp+10h],rcx
mov [rsp+18h],rdx
mov [rsp+20h],r8
mov [rsp+28h],r9
pop rcx
mov rdx,rsp
sub rsp,18h
call ?OnCall@FUNC_INFO@@QEAAPEAXPEAPEAX@Z
add rsp,18h
mov rcx,[rsp+8]
mov rdx,[rsp+10h]
mov r8,[rsp+18h]
mov r9,[rsp+20h]
jmp rax
?callstub@CODE_STUB@@SAXXZ endp
asm x86 code:
extern ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z : PROC ; FUNC_INFO::OnCall
extern ?OnCall@RET_INFO@@QAIHH@Z : PROC ; RET_INFO::OnCall
?retstub@CODE_STUB@@SAXXZ proc
pop ecx
mov edx,eax
call ?OnCall@RET_INFO@@QAIHH@Z
?retstub@CODE_STUB@@SAXXZ endp
?callstub@CODE_STUB@@SAXXZ proc
xchg [esp],ecx
push edx
lea edx,[esp + 8]
call ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z
pop edx
pop ecx
jmp eax
?callstub@CODE_STUB@@SAXXZ endp
You may ask how I know decorated names like ?OnCall@FUNC_INFO@@QAIPAXPAPAX@Z. Look at the very beginning of the C++ code, at the macros there: compile once with #if 1 and look in the output window. You will probably need to use the names your own compiler prints rather than mine, because the decoration can differ.
And how do you call void DoHook(DLL_TO_HOOK** pp)? Like this:
DLL_TO_HOOK dth_kernel32 = { L"kernel32", { "VirtualAlloc", "VirtualFree", "HeapAlloc", 0 } };
DLL_TO_HOOK dth_ntdll = { L"ntdll", { "NtOpenEvent", 0 } };
DLL_TO_HOOK* ghd[] = { &dth_ntdll, &dth_kernel32, 0 };
DoHook(ghd);
Lets say there is a DLL A.DLL with a known entry point DoStuff
If the entry point DoStuff is known it ought to be documented somewhere, at the very least in some C header file. So a possible approach might be to parse that header to get its signature (i.e. the C declaration of DoStuff). Maybe you could fill some database with the signature of all functions declared in all system header files, etc... Or perhaps use debug information if you have it.
If you call some function (in C) and don't give all the required parameters, the calling convention & ABI will still be used, and these (missing) parameters get garbage values (if the calling convention defines that parameter to be passed in a register, the garbage inside that register; if the convention defines that parameter to be passed on the call stack, the garbage inside that particular call stack slot). So you are likely to crash and surely have some undefined behavior (which is scary, since your program might seem to work but still be very wrong).
However, look also into libffi. Once you know (at runtime) what to pass to some arbitrary function, you can construct a call to it passing the right number and types of arguments.
My current thinking is that the arguments are on the stack
I think that is wrong (at least on many x86-64 systems). Some arguments are passed through registers. Read about x86 calling conventions.
Would this work?
No, it won't work, because some arguments are passed through registers, and because the calling convention depends on the signature of the called function (floating-point values might be passed in different registers, or always on the stack; variadic functions have specific calling conventions; etc.).
BTW, some recent C optimizing compilers are able to do tail call optimizations, which might complicate things.
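Since the convention depends on the signature, the only portable way to make such a call from C is through a function pointer of the correct type, so that the compiler generates the register and stack setup itself. Below is a minimal sketch of that idea, assuming you know the signature at runtime and it is one of a small fixed set. The names sig_kind, value, call_by_signature, add_ints and halve are made up for illustration; a library like libffi generalizes this to arbitrary signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature tags: one per function shape we know how to call. */
typedef enum { SIG_INT_INT_INT, SIG_DOUBLE_DOUBLE } sig_kind;

/* Generic function pointer type. Function pointers may be converted to
   another function pointer type and back; the call itself must go
   through the function's real type. */
typedef void (*generic_fn)(void);

/* Hypothetical container for one argument or one result. */
typedef union { int i; double d; } value;

/* Calls fn with args according to sig. Each branch casts back to the
   real type, so the compiler applies the matching calling convention. */
value call_by_signature(generic_fn fn, sig_kind sig, const value *args)
{
    value result = {0};
    assert(fn != NULL && args != NULL);
    switch (sig) {
    case SIG_INT_INT_INT:
        result.i = ((int (*)(int, int))fn)(args[0].i, args[1].i);
        break;
    case SIG_DOUBLE_DOUBLE:
        result.d = ((double (*)(double))fn)(args[0].d);
        break;
    }
    return result;
}

/* Example targets, illustrative only. */
int add_ints(int a, int b) { return a + b; }
double halve(double x) { return x / 2.0; }
```

To use it, fill an array of value with the arguments and pass the target as (generic_fn)add_ints together with SIG_INT_INT_INT. Passing a tag that does not match the real function is undefined behavior, which is exactly the problem described above.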
There is no standard way of doing this, because a lot of things like calling conventions, pointer sizes, etc. matter when passing arguments. You will have to read the ABI for your platform and write an implementation, which I fear won't be possible in C alone. You will need some inline assembly.
One simple way to do it would be (for a platform like X86_64) -
MyDoStuff:
jmpq *__real_DoStuff
This hook does nothing but transfer control to the real function. If you want to do anything useful while hooking, you will have to save and restore some registers around the call (again, what to save depends on the ABI).
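For comparison, if the signature of DoStuff is known, the same kind of hook can be written in plain C and the compiler takes care of the register saving. This is only a sketch under that assumption: the int(int) signature, the stand-in real_DoStuff_impl and the call counter are all invented for illustration, and real_DoStuff plays the role of __real_DoStuff above (renamed because identifiers starting with two underscores are reserved in C).

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the real DoStuff; its int(int) signature is an assumption. */
static int real_DoStuff_impl(int x) { return x * 2; }

/* Pointer to the original entry point, as a hooking framework would save it. */
static int (*real_DoStuff)(int) = real_DoStuff_impl;

/* How many times the hook has run. */
static unsigned hook_calls;

/* The hook: does its own work, then forwards to the original.
   No hand-written register save/restore is needed, because the
   compiler knows the signature and follows the ABI. */
int MyDoStuff(int x)
{
    assert(real_DoStuff != NULL);
    ++hook_calls;
    fprintf(stderr, "DoStuff(%d) called\n", x);
    return real_DoStuff(x);
}

/* Accessor for the counter, so callers can observe the hook. */
unsigned MyDoStuff_call_count(void) { return hook_calls; }
```

This only works when the signature is known at compile time; for an unknown signature you are back to the assembly stub.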

Is there any operation in C analogous to this assembly code?

Today, I played around with incrementing function pointers in assembly code to create alternate entry points to a function:
.386
.MODEL FLAT, C
.DATA
INCLUDELIB MSVCRT
EXTRN puts:PROC
HLO DB "Hello!", 0
WLD DB "World!", 0
.CODE
dentry PROC
push offset HLO
call puts
add esp, 4
push offset WLD
call puts
add esp, 4
ret
dentry ENDP
main PROC
lea edx, offset dentry
call edx
lea edx, offset dentry
add edx, 13
call edx
ret
main ENDP
END
(I know, technically this code is invalid since it calls puts without the CRT being initialized, but it works without any assembly or runtime errors, at least on MSVC 2010 SP1.)
Note that in the second call to dentry I took the address of the function in the edx register, as before, but this time I incremented it by 13 bytes before calling the function.
The output of this program is therefore:
C:\Temp>dblentry
Hello!
World!
World!
C:\Temp>
The first output of "Hello!\nWorld!" is from the call to the very beginning of the function, whereas the second output is from the call starting at the "push offset WLD" instruction.
I'm wondering if this kind of thing exists in languages that are meant to be a step up from assembler like C, Pascal or FORTRAN. I know C doesn't let you increment function pointers but is there some other way to achieve this kind of thing?
AFAIK you can only write functions with multiple entry-points in asm.
You can put labels on all the entry points, so you can use normal direct calls instead of hard-coding the offsets from the first function-name.
This makes it easy to call them normally from C or any other language.
The earlier entry points work like functions that fall-through into the body of another function, if you're worried about confusing tools (or humans) that don't allow function bodies to overlap.
You might do this if the early entry-points do a tiny bit of extra stuff, and then fall through into the main function. It's mainly going to be a code-size saving technique (which might improve I-cache / uop-cache hit rate).
Compilers tend to duplicate code between functions instead of sharing large chunks of common implementation between slightly different functions.
However, you can probably accomplish it with only one extra jmp with something like:
int bigfunc(int x); /* forward declaration so foo and bar can call it */
int foo(int a) { return bigfunc(a + 1); }
int bar(int a) { return bigfunc(a + 2); }
int bigfunc(int x) { /* a lot of code */ }
See a real example on the Godbolt compiler explorer
foo and bar tailcall bigfunc, which is slightly worse than having bar fall-through into bigfunc. (Having foo jump over bar into bigfunc is still good, esp. if bar isn't that trivial.)
Jumping into the middle of a function isn't in general safe, because non-trivial functions usually need to save/restore some regs. So the prologue pushes them, and the epilogue pops them. If you jump into the middle, skipping the pushes, then the pops in the epilogue will unbalance the stack (i.e. pop the return address into a register, and return to a garbage address).
See also Does a function with instructions before the entry-point label cause problems for anything (linking)?
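If the goal is just the behavior of the dentry example from the question, and not literally two addresses inside one function body, C can get close with a selector argument and switch fall-through. This is a sketch, not an equivalent of the asm: the entry_point enum and the returned line count are additions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical selector naming the two "entry points" of the asm dentry. */
typedef enum { ENTRY_HELLO, ENTRY_WORLD } entry_point;

/* Starts executing at the chosen case and falls through to the end,
   like calling into the asm function at offset 0 or offset 13.
   Returns the number of lines printed. */
int dentry(entry_point entry)
{
    int lines = 0;
    switch (entry) {
    case ENTRY_HELLO:
        puts("Hello!");
        ++lines;
        /* fall through */
    case ENTRY_WORLD:
        puts("World!");
        ++lines;
        break;
    default:
        assert(!"unknown entry point");
        break;
    }
    return lines;
}
```

Calling dentry(ENTRY_HELLO) prints both lines and dentry(ENTRY_WORLD) prints only "World!". Unlike the asm version, the prologue and epilogue always run, so the stack stays balanced.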
You can use the longjmp function: http://www.cplusplus.com/reference/csetjmp/longjmp/
It's a fairly horrible function, but it'll do what you seek.
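A minimal sketch of what setjmp/longjmp looks like in practice, with one caveat: longjmp can only return to a setjmp call whose enclosing function is still active, so it resumes a function at a saved point rather than entering it at a new one. The names resume_point, passes, deep_work and run_demo are made up for this example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf resume_point; /* filled in by setjmp inside run_demo */
static int passes;           /* static storage, so its value survives longjmp */

/* Jumps back to the saved point instead of returning normally. */
static void deep_work(void)
{
    puts("deep_work: jumping back");
    longjmp(resume_point, 1); /* does not return */
}

/* Returns how many times control came out of the setjmp call. */
int run_demo(void)
{
    passes = 0;
    if (setjmp(resume_point) == 0) {
        ++passes; /* first pass: setjmp returned directly */
        deep_work();
        assert(!"unreachable: deep_work never returns");
    } else {
        ++passes; /* second pass: arrived here via longjmp */
    }
    return passes;
}
```

run_demo() returns 2: once for the direct return from setjmp and once for the return caused by longjmp.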

Resources