How to hook an unknown number of functions - x86 - c

Problem description
At runtime, I am given a list of addresses of functions (in the same process). Each time any of them is called, I need to log its address.
My attempt
If there was just one function (with help of a hooking library like subhook) I could create a hook:
create_hook(function_to_be_hooked, intermediate)
intermediate(args...):
    log("function with address {&function_to_be_hooked} got called")
    remove_hook(function_to_be_hooked)
    ret = function_to_be_hooked(args...)
    create_hook(function_to_be_hooked, intermediate)
    return ret
This approach does not trivially extend. I could add any number of functions at compile-time, but I only know how many I need at runtime. If I hook multiple functions with the same intermediate, it doesn't know who called it.
Details
It seems like this problem should be solved by a hooking library. I am using C/C++ and Linux and the only options seem to be subhook and funchook, but neither of them seems to support this functionality.

This should be fairly doable with assembly language manually, like if you were modifying a hook library. The machine code that overwrites the start of the original function can set a register or global variable before jumping to (or calling) the hook. Using call would push a unique return address which the hook likely wouldn't want to actually return to. (So it unbalances the return-address predictor stack, unless the hook uses ret with a modified return address, or it uses some prefixes as padding to make the call hook or call [rel hook_ptr] or whatever end at an instruction boundary of the original code so it can ret.)
Like mov al, imm8 if the function isn't variadic in the x86-64 System V calling convention, or mov r11b, imm8 in x86-64. Or mov ah, imm8 would work in x86-64 SysV without disturbing the AL= # of XMM args for a variadic function and still only be 2 bytes. Or use push imm8.
If the hook function itself was written in asm, it would be straightforward for it to look for a register, an extra stack arg, or just a return address from a call, as an extra arg without disturbing its ability to find the args for the hooked function. If it's written in C, looking in a global (or thread-local) variable avoids needing a custom calling convention.
But with existing hook libraries, assuming you're right that they don't pass an int id:
Using that library interface, it seems you'd need to generate an unknown number of unique things that are callable as a function pointer? That's not something ISO C can do. ISO C can be strictly ahead-of-time compiled, not needing to generate any new machine code at run-time. It's compatible with a strict Harvard architecture.
You could define a huge array of function pointers to hook1(), hook2(), etc. which each look for their own piece of side data in another struct member of that array. Enough hook functions that however many you need at run-time, you'll already have enough. Each one can hard-code the array element it should access for its unique string.
You could use some C preprocessor macros to define some large more-than-enough number of hooks, and separately get an array initialized with structs containing function pointers to them. Some CPP tricks may allow iterating over names so you don't have to manually write out define_hook(0) define_hook(1) ... define_hook(MAX_HOOKS-1). Or maybe have a counter as a CPP macro that gets #defined to a new higher value.
Unused hooks would be sitting in memory and in your executable on disk, but wouldn't ever be called so they wouldn't be hot in cache. Ones that didn't share a page with any other code wouldn't ever need to get paged in to RAM at all. Same for later parts of the array of pointers and side-data. It's inelegant and clunky, and doesn't allow an unbounded number, but if you can reasonably say that 1024 or 8000 "should be enough for everyone", then this can work.
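A minimal sketch of the fixed-pool idea, assuming hooks that take no arguments and only log. The names MAX_HOOKS, HOOK_LIST, hook_slots, and claim_hook are made up for illustration, and a real hook would also have to forward the hooked function's arguments, which this does not attempt. An X-macro list stands in for the "CPP tricks" mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 8 slots keep the sketch short; a real pool would be far larger. */
#define MAX_HOOKS 8
#define HOOK_LIST(X) X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7)

typedef void (*hook_fn)(void);

/* Side data for each pre-compiled hook: which target it stands in for. */
struct hook_slot {
    void         *target;  /* address of the hooked function */
    unsigned long calls;   /* how many times this hook has fired */
};

struct hook_slot hook_slots[MAX_HOOKS];
static size_t hooks_used;

/* Shared body: every hook passes its own hard-coded slot index. */
static void log_hit(size_t slot)
{
    assert(slot < MAX_HOOKS);
    hook_slots[slot].calls++;
    fprintf(stderr, "function with address %p got called\n",
            hook_slots[slot].target);
}

/* Expands to hook_0() ... hook_7(), each a distinct function pointer. */
#define DEFINE_HOOK(n) static void hook_##n(void) { log_hit(n); }
HOOK_LIST(DEFINE_HOOK)

#define HOOK_ENTRY(n) hook_##n,
static const hook_fn hook_entries[MAX_HOOKS] = { HOOK_LIST(HOOK_ENTRY) };

/* Hand out the next unused hook, or NULL once the pool is exhausted. */
hook_fn claim_hook(void *target)
{
    if (hooks_used == MAX_HOOKS)
        return NULL;
    hook_slots[hooks_used].target = target;
    return hook_entries[hooks_used++];
}
```

At run-time you would call claim_hook(addr) once per address in your list and pass the returned pointer to the hooking library as the intermediate.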
Another way also has many downsides, different but worse than the above. Especially that it requires calling the rest of your program from the bottom of a recursion (not just calling an init function that returns normally), and uses a lot of stack space. (You might ulimit -s to bump up your stack size limit over Linux's usual 8MiB.) Also it requires GNU extensions.
GNU C nested functions can make new callable entities, creating "trampoline" machine code on the stack when you take the address of a nested function. This would make your stack executable, so there's a security hardening downside. There'd be one copy of the actual machine code for the nested function, but n copies of trampoline code that sets up a pointer to the right stack frame. And n instances of a local variable that you can arrange to have different values.
So you could use a recursive function that went through your array of hooks like foo(counter+1, hooks+1), and have the hook be a nested function that reads counter. Or instead of a counter, it can be a char* or whatever you like; you just set it in this invocation of the function.
This is pretty nasty (the hook machine code and data is all on the stack) and uses potentially a lot of stack space for the rest of your program. You can't return from this recursion or your hooks will break. So the recursion base-case will have to be (tail) calling a function that implements the rest of your program, not returning to your ultimate caller until the program is ending.
C++ has some std:: callable objects, like std::function = std::bind of a member function of a specific object, but they're not type-compatible with function pointers.
You can't pass a std::function * pointer to a function expecting a bare void (*fptr)(void) function pointer; making that happen would potentially require the library to allocate some executable memory and generate machine code in it. But ISO C++ is designed to be strictly ahead-of-time compilable, so they don't support that.
std::function<void(void)> f = std::bind(&Class::member, hooks[i]); compiles, but the resulting std::function<void(void)> object can't convert to a void (*)() function pointer. (https://godbolt.org/z/TnYM6MYTP). The caller needs to know it's invoking a std::function<void()> object, not a function pointer. There is no new machine code, just data, when you do this.

My instinct is to follow a debugger path.
You would need
a uint8_t * -> uint8_t map,
a trap handler, and
a single step handler
In broad strokes,
When you get a request to monitor a function, add its address, and the byte pointed by it to the map. Patch the pointed-to byte with int3.
The trap handler shall get the offending address from the exception frame, and log it. (After int3 the saved instruction pointer is one byte past the patched address, so it must be moved back by one.) Then it shall unpatch the byte with the value from the map, set the single-step flag (TF) in FLAGS (again, in the exception frame), and return. That will execute the instruction, and raise a single-step exception.
You can set TF from user-space yourself and catch the resulting SIGTRAPs until you clear it (on a POSIX OS); it's more common for TF to only be used by debuggers, e.g. set by the kernel as part of Linux's ptrace(PTRACE_SINGLESTEP). But setting/clearing TF is not a privileged operation. (Patching bytes of machine code with int3 is how debuggers implement software breakpoints, not using x86's dr0-7 hardware debug registers. In your own process, no system call is necessary after an mprotect to make it writeable.)
The single-step handler shall re-patch int3, and return to let the program run until it hits int3 again.
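The bookkeeping half of those steps can be sketched as below. This is only the address -> saved-byte map with patch/unpatch helpers; it deliberately leaves out the mprotect call, the signal handlers and any thread safety. The names (bp_table, bp_add, and so on) and the fixed-size table are my own assumptions, not anything from an existing library.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC          /* one-byte x86 breakpoint opcode */
#define MAX_BREAKPOINTS 64 /* assumption: small fixed table, linear search */

struct breakpoint {
    uint8_t *addr;   /* first byte of the monitored function */
    uint8_t  saved;  /* original byte that int3 replaced */
};

static struct breakpoint bp_table[MAX_BREAKPOINTS];
static size_t bp_count;

/* Look up the entry for addr, or NULL if it is not monitored. */
struct breakpoint *bp_find(uint8_t *addr)
{
    for (size_t i = 0; i < bp_count; i++)
        if (bp_table[i].addr == addr)
            return &bp_table[i];
    return NULL;
}

/* Monitor request: remember the original byte, then patch in int3.
   Returns -1 if addr is already monitored or the table is full. */
int bp_add(uint8_t *addr)
{
    if (bp_find(addr) || bp_count == MAX_BREAKPOINTS)
        return -1;
    bp_table[bp_count].addr = addr;
    bp_table[bp_count].saved = *addr;
    bp_count++;
    *addr = INT3;
    return 0;
}

/* Trap handler's part: restore the original byte before single-stepping. */
void bp_unpatch(struct breakpoint *bp) { *bp->addr = bp->saved; }

/* Single-step handler's part: re-arm the breakpoint. */
void bp_repatch(struct breakpoint *bp) { *bp->addr = INT3; }
```

On real code the page holding addr has to be made writable first, so you can try the helpers on an ordinary byte buffer before pointing them at a function.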
In POSIX, the exception frame is pointed by uap argument to a sigaction handler.
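A sketch of how that handler is installed, assuming the three-argument form selected by SA_SIGINFO. It only counts traps and checks that a context was delivered. Actually reading or changing the instruction pointer and FLAGS goes through platform-specific fields of the ucontext_t (for example uc_mcontext.gregs[REG_RIP] on x86-64 Linux with glibc), which POSIX does not define. The names on_sigtrap and install_trap_handler are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t trap_count;
static volatile sig_atomic_t saw_context;

/* Three-argument handler form, selected by SA_SIGINFO. */
static void on_sigtrap(int signo, siginfo_t *info, void *uap)
{
    ucontext_t *uc = uap;  /* saved register state of the interrupted code */
    (void)signo;
    (void)info;
    saw_context = (uc != NULL);
    trap_count++;
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
int install_trap_handler(void)
{
    struct sigaction sa;
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTRAP, &sa, NULL);
}
```

Both the int3 trap and the single-step trap arrive as SIGTRAP on Linux, so the same handler has to tell them apart, for instance by remembering which breakpoint it just unpatched.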
PROS:
No bloated binary
No compile-time instrumentation
CONS:
Tricky to implement correctly. Remapping text segment writable; invalidating I-cache; perhaps something more.
Huge performance penalty; a no-go in real-time system.

Funchook now implements this functionality (on master branch, to be released with 2.0.0).

Related

Serialize a function pointer in C and save it in a file?

I am working on a C file register program that handles arbitrary generic data, so the user needs to supply functions to be used. These functions are saved in function pointers in the register struct and work nicely. But I need to be able to run these functions again when the program is restarted, ideally without the user needing to supply them again. I serialize important data about the register structure and write it into a header.
I was wondering how I can save the functions there too. A compiled C function is just raw binary data, right? So there must be a way to store it into a file and load the function pointers from the content in the file, but I am not sure how to do this. Can someone point me in the right direction?
I am assuming it's possible to do this in C since it allows you to do pretty much anything, but I might be missing something. Can I do this without system calls at all? Or if not, what would be the simplest way to do this in POSIX?
The functions are supplied when creating the register or creating new secondary indexes:
registerHandler* createAndOpenRecordFile(
int overwrite, char *filename, int keyPos, fn_keyCompare userCompare, fn_serialize userSerialize, fn_deserialize userDeserialize, int type, ...)
And saved as functions pointers:
typedef void* (*fn_serialize)(void*);
typedef void* (*fn_deserialize)(void*);
typedef int (*fn_keyCompare) (const void *, const void *);
typedef struct {
...
fn_serialize encode;
fn_deserialize decode;
fn_keyCompare compare;
} registerHandler;
While your logic makes some sort of sense, things are much, much more complex than that. My answer is going to contain most of the comments already made here, only in answer form...
Let's assume that you have a pointer to a function. If that function has a jump instruction in it, that jump instruction could jump to an absolute address. That means that when you deserialize the function, you have to have a way to force it to be loaded into the same address, so that the absolute jump jumps to the correct address.
Which brings us to the next point. Given that your question is tagged with posix: there is no POSIX-compliant way to load code into a specific address. There's MAP_FIXED, but it's not going to work unless you write your own dynamic linker. Why does that matter? Because the function's assembly code might reference the function's start address, for various reasons, most prominent of which is if the function itself gives its own address as an argument to another function.
Which actually brings us to our next point. If the serialized function calls other functions, you'd have to serialize them too. But that's the "easy" part. The hard part is if the function jumps into the middle of another function rather than call the other function, which could happen e.g. as a result of tail-call optimization. That means you have to serialize everything the function jumps into (recursively), but if the function jumps to 0x00000000ff173831, how many bytes will you serialize from that address?
For that matter, how do you know when any function ends in a portable way?
Even worse, are you even guaranteed that the function is contiguous in memory? Sure, all existing, sane OS memory managers and hardware architectures make it contiguous in memory, but is it guaranteed to be so 1 year from now?
Yet another issue is: What if the user passes a different function based on something dynamic? i.e. if the environment variable X is true, we want function x(), otherwise we want y()?
We're not even going to think about discussing portability across hardware architectures, operating systems, or even versions of the same hardware architecture.
But we are going to talk about security. Assuming that you no longer require the user to give you a pointer to their code, which might have had a bug that they fixed in a new version, you'll continue to use the buggy version until the user remembers to "refresh" your data structures with new code.
And when I say "bug" above, you should read "security vulnerability". If the vulnerable function you're serializing launches a shell, or indeed refers to anything outside the process, it becomes a persistent exploit.
In short, there's no way to do what you want to do in a sane and economic way. What you can do, instead, is to force the user to package these functions for you.
The most obvious way to do it is asking them to pass a filename of a library which you then open with dlopen().
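A rough sketch of the dlopen() route, assuming the user ships a shared library that exports a function with the question's fn_keyCompare signature. The helper name load_key_compare and the idea of storing the library path and symbol name in your file header are my assumptions. On older glibc you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*fn_keyCompare)(const void *, const void *);

/* Load `symbol` from the shared library at `path`.
   On success the library handle is stored in *handle_out so the caller can
   dlclose() it later; on failure NULL is returned and *handle_out is untouched. */
fn_keyCompare load_key_compare(const char *path, const char *symbol,
                               void **handle_out)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    void *sym = dlsym(handle, symbol);
    if (!sym) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolves to NULL");
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    /* POSIX requires this object-to-function pointer conversion to work. */
    return (fn_keyCompare)sym;
}
```

What gets serialized is then just two strings (library path and symbol name), not code, and the user fixes bugs by shipping a new library.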
Another way to do it is pass something like a Lua or JavaScript string and embed an engine to execute these strings as code.
Yet another way is to pass paths to executables, and execute these when the data needs to be processed. This is what git does.
But what you should probably do is just require that the user always passes these functions. Keep it simple.

What's inside the stack?

If I run a program, just like
#include <stdio.h>
int main(int argc, char *argv[], char *env[]) {
printf("My references are at %p, %p, %p\n", (void *)&argc, (void *)&argv, (void *)&env);
}
We can see that those regions are actually in the stack.
But what else is there? If we ran a loop through all the values in Linux 3.5.3 (for example, until segfault) we can see some weird numbers, and kind of two regions, separated by a bunch of zeros, maybe to try to prevent overwriting the environment variables accidentally.
Anyway, in the first region there must be a lot of numbers, such as all the frames for each function call.
How could we distinguish the end of each frame, where the parameters are, where the canary if the compiler added one, return address, CPU status and such?
Without some knowledge of the overlay, you only see bits, or numbers. While some of the regions are subject to machine specifics, a large number of the details are pretty standard.
If you didn't move too far outside of a nested routine, you are probably looking at the call stack portion of memory. With some generally considered "unsafe" C, you can write up fun functions that access function variables a few "calls" above, even if those variables were not "passed" to the function as written in the source code.
The call stack is a good place to start, as 3rd party libraries must be callable by programs that aren't even written yet. As such, it is fairly standardized.
Stepping outside of your process memory boundaries will give you the dreaded segmentation violation, as memory protection will detect an attempt by the process to access memory it is not authorized to use. Malloc does a little more than "just" return a pointer: on systems with memory protection features, it also arranges for the memory to be accessible to that process, and every memory access is checked against what the process has been assigned.
If you keep following this path, sooner or later, you'll get an interest in either the kernel or the object format. It's much easier to investigate one way of how things are done with Linux, where the source code is available. Having the source code allows you to not reverse-engineer the data structures by looking at their binaries. When starting out, the hard part will be learning how to find the right headers. Later it will be learning how to poke around and possibly change stuff that under non-tinkering conditions you probably shouldn't be changing.
PS. You might consider this memory "the stack" but after a while, you'll see that really it's just a large slab of accessible memory, with one portion of it being considered the stack...
The contents of the stack are basically:
Whatever the OS passes to the program.
Call frames (also called stack frames, activation areas, ...)
What does the OS pass to the program? A typical *nix will pass the environment, arguments to the program, possibly some auxiliary information, and pointers to them to be passed to main().
In Linux, you'll see:
a NULL
the filename for the program.
environment strings
argument strings (including argv[0])
padding full of zeros
the auxv array, used to pass information from the kernel to the program
pointers to environment strings, ended by a NULL pointer
pointers to argument strings, ended by a NULL pointer
argc
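To poke at the pointer arrays in that list yourself, something like the following could be a starting point. It only relies on both vectors being NULL-terminated; the helper names count_until_null and dump_vector are made up, and the addresses printed will differ from run to run.

```c
#include <stddef.h>
#include <stdio.h>

/* Count entries in a NULL-terminated pointer vector such as argv or envp. */
size_t count_until_null(char *const *vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Print where the pointer array lives and where each string it points to lives. */
void dump_vector(const char *name, char *const *vec)
{
    size_t n = count_until_null(vec);
    printf("%s: %zu pointers starting at %p\n", name, n, (void *)vec);
    for (size_t i = 0; i < n; i++)
        printf("  %s[%zu] at %p -> string at %p\n",
               name, i, (void *)&vec[i], (void *)vec[i]);
}
```

Calling dump_vector("argv", argv) and dump_vector("env", env) from a main(int argc, char *argv[], char *env[]) like the one in the question lets you compare the pointer arrays with the string area above them.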
Then, below that are stack frames, which contain:
arguments
the return address
possibly the old value of the frame pointer
possibly a canary
local variables
some padding, for alignment purposes
How do you know which is which in each stack frame? The compiler knows, so it just treats its location in the stack frame appropriately. Debuggers can use annotations for each function in the form of debug info, if available. Otherwise, if there is a frame pointer, you can identify things relative to it: local variables are below the frame pointer, arguments are above the frame pointer. Otherwise, you must use heuristics: things that look like code addresses are probably code addresses, but sometimes this results in incorrect and annoying stack traces.
The content of the stack will vary depending on the architecture ABI, the compiler, and probably various compiler settings and options.
A good place to start is the published ABI for your target architecture, then check that your particular compiler conforms to that standard. Ultimately you could analyse the assembler output of the compiler or observe the instruction level operation in your debugger.
Remember also that a compiler need not initialise the stack, and will certainly not "clear it down" when it has finished with it, so when it is allocated to a process or thread, it might contain any value. Even at power-on, SDRAM for example will not contain any specific or predictable value; if the physical RAM address has been previously used by another process since power-on, or even by an earlier called function in the same process, the content will be whatever was left in it. So just looking at the raw stack does not tell you much.
Commonly a generic stack frame may contain the address that control will jump to when the function returns, the values of all the parameters passed, and the value of all auto local variables in the function. However the ARM ABI for example passes the first four arguments to a function in registers R0 to R3, and holds the return address of a leaf function in the LR register, so it is not as simple in all cases as the "typical" implementation I have suggested.
The details are very dependent on your environment. The operating system generally defines an ABI, but that's in fact only enforced for syscalls.
Each language (and each compiler even if they compile the same language) in fact may do some things differently.
However there is some sort of system-wide convention, at least in the sense of interfacing with dynamically loaded libraries.
Yet, details vary a lot.
A very simple "primer" could be http://kernelnewbies.org/ABI
A very detailed and complete specification you could look at to get an idea of the level of complexity and details that are involved in defining an ABI is "System V Application Binary Interface AMD64 Architecture Processor Supplement" http://www.x86-64.org/documentation/abi.pdf

In c, are variables always pushed from their registers to the stack before they go out of scope?

After a function A calls a function B, can the code in B trash all the registers (aside from those that hold the stack pointers and B's parameters) without affecting variables local to A? Accordingly, after function B returns to function A, does function A pop all its locals back off the stack (reasoning that the register states might have changed while function B was executed)?
What about global variables? Does function B need to worry at all about any register operations affecting the state of global variables?
(The main reason I ask this, is that I feel like experimenting with injecting machine code at runtime as function B by using mprotect to make an array executable, and then casting the array pointer to function pointer and calling it. With the above questions I hope to figure out what the extent of B's playground is.)
This is calling convention, which is architecture, operating system, and compiler dependent.
Edit 0:
One more link for you: application binary interface. Drill down for your particular hardware/OS/compiler combination. You'll find what registers are used for parameters/return values, which are reserved for specific things, and which are free for any given function to clobber.
It's up to the functions how they handle calling other functions. It's normal to store all your local variables on the stack before branching to another function, but if you know for a fact that some other function only uses a specific two registers, and you avoid using those two anywhere, then you wouldn't need to store anything (other than the address to branch back to afterwards, of course) on the stack before branching to that function.
It is really just a low level implementation design decision (which is usually decided by a compiler) so you might find that some functions will trust B with what's currently in the registers, while other functions won't.

Single instruction push/pop for user stack instead accessory function calls?

On the processor stack, push, mov, pop and so on are single instructions.
When compiling source code the compiler generates the single machine instruction version, but during run-time, assuming the stack is ... well a regular stack container, accessing values stored on the stack during run-time takes function calls which translate into tons of machine code.
Is it possible to achieve the same level of efficiency for dynamic run-time objects instead of using setter and getter member functions, which are way longer than a single machine instruction?
My idea is of using a mark pointer, but I don't know how to literally push its value into a memory location or in from a memory location during run-time without resorting to function calls.
Inlining assembly is probably an option, one I would like to avoid if possible. But I guess I would still have to put it inside a function body so it won't be a single instruction.
Sounds like what you're trying to do is optimize out the extra call and ret from your getters/setters. In this case, you can use the keyword inline to ask your compiler to inline that particular function. Another way would be to code your getters/setters as C function-like macros if they are not too complex.
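For illustration, here is roughly what both variants could look like for a small user-level stack in C. The int_stack type and its fixed capacity are assumptions made for the example. With optimization enabled, compilers typically expand the static inline functions at the call site, though whether that ends up as a single instruction depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 64  /* assumption: fixed capacity keeps the sketch short */

typedef struct {
    int    data[STACK_CAPACITY];
    size_t top;  /* index of the next free slot, the "stack pointer" */
} int_stack;

/* Inline variant: type-checked, with bounds checks that vanish under NDEBUG. */
static inline void stack_push(int_stack *s, int value)
{
    assert(s->top < STACK_CAPACITY);
    s->data[s->top++] = value;
}

static inline int stack_pop(int_stack *s)
{
    assert(s->top > 0);
    return s->data[--s->top];
}

/* Macro variant: no bounds checks, and `s` is evaluated more than once,
   so do not pass an expression with side effects. */
#define STACK_PUSH(s, value) ((s)->data[(s)->top++] = (value))
#define STACK_POP(s)         ((s)->data[--(s)->top])
```

The inline functions are usually the better default; the macros are mostly useful with compilers that do not inline reliably.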

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
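A small example of what that means in practice. The names binary_op, pick_op, apply and copy_op are made up. Note that copying the pointer copies only the address, never the function's machine code.

```c
typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* A function pointer can be returned from a function... */
binary_op pick_op(char symbol)
{
    return symbol == '*' ? mul : add;
}

/* ...passed to one... */
int apply(binary_op op, int a, int b)
{
    return op(a, b);
}

/* ...and copied like any other value; both copies refer to the same code. */
binary_op copy_op(binary_op op)
{
    binary_op duplicate = op;
    return duplicate;
}
```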
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to represent the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth using a well defined protocol through a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you wanted to pass memcpy() to the kernel. Keep in mind that it is a very special function. It is defined in the C standard, quite simple, and of a quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there was a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also I know several compilers and linkers which do optimization on file level (not only function level). This results in opcode where parts of C functions are mixed into each other.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: Did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and don't have your function do anything fancy, no other function calls, not accessing data from outside function, you might even get away with doing it once. I'd say the safest possibility is to limit the maximum size of supported function to, say, 1 kilobyte and just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. Epilogue alone might be enough, but all of this can bomb if the searched sequence pops up too early, e.g. in the data embedded by the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that i wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please, WHY are you trying to do that, and see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor will almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
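On the userspace side, the wait could look something like this sketch. It assumes fd is whatever descriptor you got from opening the driver's character device (a pipe or socket behaves the same way for testing), and the function names wait_readable and read_event are invented for the example.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms (-1 = forever) for fd to become readable.
   Returns 1 if readable, 0 on timeout or if only an error/hangup condition
   was reported, -1 if poll() itself failed. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Block until the "job finished" event arrives, then read its payload.
   Returns the number of bytes read, or -1 on failure. */
ssize_t read_event(int fd, void *buf, size_t len)
{
    if (wait_readable(fd, -1) != 1)
        return -1;
    return read(fd, buf, len);
}
```

The same descriptor can just as well be added to an existing select/poll/epoll loop instead of blocking in a dedicated call.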
A function isn't just an object you can copy. What about cross-references, symbols and so on? Of course you can take something like the standard Linux "binutils" package and torture your binaries, but is that what you want?
By the way if you simply are trying to replace memcpy() implementation, look around LD_PRELOAD mechanics.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler and linker dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this, it's a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA where I've copied some low level render functions from flash (16 bit access slowish memory) to the high speed workspace ram (32 bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy, size = (int) (NextFuncPtr - SourceFuncPtr). This did work well but obviously can't be guaranteed on all platforms (does not work on Windows for sure).
I think one solution can be as below.
For ex: if you want to know func() size in program a.c, and have indicators before and after the function.
Try writing a perl script which will compile this file into object format (cc -c); make sure that the pre-processor statements are not removed. You need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.

Resources