Replace function behind a function pointer on microcontroller at runtime - c

I wonder if there is a way to load a C function, together with its data, into the text segment of a running microcontroller system at runtime. Once the function is placed in the text segment and its data is stored in the data segment, the function pointer to the newly loaded function would be invoked from the main application. The functionality would be similar to a boot loader, except that it does not load an entire binary before start-up. I know that you can use the scatter-loading features of the linker to place the function pointer at a fixed address or alter the alignment of the sections. Does anyone know if this is possible, and if not, why?
Many thanks

Technically it is possible. Keep in mind that any solution will be non-standard, not portable, and very tricky.
Many controllers can execute code only from read-only memory, which makes the whole concept of dynamic loading problematic:
you'd need to erase a whole page first, making sure that no other part of the application accesses this page during the load;
you'd need to flush the instruction cache (again, many controllers rely on the instruction cache always being valid).
In any case you'd need to ensure that the function being replaced has no active stack frame associated with it. That is very hard to enforce in a multithreaded system.
Any particular architecture may offer more traps.
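For illustration only, a minimal sketch of the install-and-swap step. It assumes the new code blob is fully position-independent, that the target allows execution from the reserved region, and it uses invented names for that region and for the cache-synchronisation hook:

#include <stdint.h>
#include <string.h>

typedef int (*handler_fn)(int);

/* The pointer the main application calls through. */
static volatile handler_fn active_handler;

/* Hypothetical: an executable RAM region reserved in the linker script. */
extern uint8_t __patch_region_start[];

/* Hypothetical hook: whatever the part needs to make new code fetchable
   (clean D-cache, invalidate I-cache, or erase/program a flash page). */
extern void sync_caches(void *addr, size_t len);

void install_handler(const uint8_t *blob, size_t len)
{
    memcpy(__patch_region_start, blob, len);    /* place the code    */
    sync_caches(__patch_region_start, len);     /* make it fetchable */
    /* Data-to-function pointer conversion is implementation-defined. */
    active_handler = (handler_fn)(void *)__patch_region_start;
}

Calling active_handler(...) from the main loop afterwards runs the newly loaded code; keeping the pointer assignment as the very last step is what makes the swap safe to observe from the rest of the application.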

Related

Serialize a function pointer in C and save it in a file?

I am working on a C file-register program that handles arbitrary generic data, so the user needs to supply functions to be used; these functions are saved as function pointers in the register struct and work nicely. But I need to be able to run these functions again when the program is restarted, ideally without the user needing to supply them again. I serialize important data about the register structure and write it into a header.
I was wondering how I can save the functions there too. A compiled C function is just raw binary data, right? So there must be a way to store it in a file and load the function pointers from the content of the file, but I am not sure how to do this. Can someone point me in the right direction?
I am assuming it's possible to do this in C, since it allows you to do pretty much anything, but I might be missing something. Can I do this without system calls at all? Or if not, what would be the simplest way to do this in POSIX?
The functions are supplied when creating the register or creating new secondary indexes:
registerHandler* createAndOpenRecordFile(int overwrite, char *filename, int keyPos,
                                         fn_keyCompare userCompare, fn_serialize userSerialize,
                                         fn_deserialize userDeserialize, int type, ...)
And saved as functions pointers:
typedef void* (*fn_serialize)(void*);
typedef void* (*fn_deserialize)(void*);
typedef int   (*fn_keyCompare)(const void *, const void *);

typedef struct {
    ...
    fn_serialize   encode;
    fn_deserialize decode;
    fn_keyCompare  compare;
} registerHandler;
While your logic makes some sort of sense, things are much, much more complex than that. My answer is going to contain most of the comments already made here, only in answer form...
Let's assume that you have a pointer to a function. If that function has a jump instruction in it, that jump instruction could jump to an absolute address. That means that when you deserialize the function, you have to have a way to force it to be loaded at the same address, so that the absolute jump lands on the correct address.
Which brings us to the next point. Given that your question is tagged with posix: there is no POSIX-compliant way to load code at a specific address. There is MAP_FIXED, but it's not going to work unless you write your own dynamic linker. Why does that matter? Because the function's assembly code might reference the function's start address, for various reasons, the most prominent of which is when the function passes its own address as an argument to another function.
Which actually brings us to our next point. If the serialized function calls other functions, you'd have to serialize them too. But that's the "easy" part. The hard part is if the function jumps into the middle of another function rather than call the other function, which could happen e.g. as a result of tail-call optimization. That means you have to serialize everything the function jumps into (recursively), but if the function jumps to 0x00000000ff173831, how many bytes will you serialize from that address?
For that matter, how do you know when any function ends in a portable way?
Even worse, are you even guaranteed that the function is contiguous in memory? Sure, all existing, sane OS memory managers and hardware architectures keep it contiguous in memory, but is that guaranteed to be so 1 year from now?
Yet another issue is: What if the user passes a different function based on something dynamic? i.e. if the environment variable X is true, we want function x(), otherwise we want y()?
We're not even going to think about discussing portability across hardware architectures, operating systems, or even versions of the same hardware architecture.
But we are going to talk about security. Once you no longer require the user to hand you a pointer to their code on every run, a bug that they fix in a new version does you no good: you'll continue to use the serialized, buggy version until the user remembers to "refresh" your data structures with new code.
And when I say "bug" above, you should read "security vulnerability". If the vulnerable function you're serializing launches a shell, or indeed refers to anything outside the processes, it becomes a persistent exploit.
In short, there's no way to do what you want to do in a sane and economic way. What you can do, instead, is to force the user to package these functions for you.
The most obvious way to do it is asking them to pass a filename of a library which you then open with dlopen().
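For example, a minimal sketch of that approach, assuming the user ships their functions in a shared library and you store the library path and symbol name in your header instead of raw code (link with -ldl; the names are placeholders):

#include <dlfcn.h>
#include <stdio.h>

typedef void* (*fn_serialize)(void*);

/* Resolve a user-supplied serializer by library path and symbol name,
   e.g. load_serializer("./libuserfns.so", "my_serialize"). */
fn_serialize load_serializer(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* POSIX sanctions converting dlsym's void* to a function pointer. */
    return (fn_serialize)dlsym(handle, sym);
}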
Another way to do it is to pass something like a Lua or JavaScript string and embed an engine to execute these strings as code.
Yet another way is to pass paths to executables, and execute these when the data needs to be processed. This is what git does.
But what you should probably do is just require that the user always passes these functions. Keep it simple.

GCC symbol table for local variables on stack

Of course, symbol and type information of each variable defined in a C/C++ program is available, otherwise the debuggers could not show them. But how to access this information?
A lot of info about the ELF format is available, but that is about linking and seems to cover only global variables, not local ones on the stack.
In a remote real time system (not under unix), I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
The best would be that the dump could be introduced at any time for any variable without the need to add some statements in the code upfront.
But how to access this information?
TL;DR: it's complicated.
You would need to build almost a complete debugger. You can watch this space. When the author gets around to step 9, you'll have an example to follow.
I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
RT systems do not usually lend themselves to easy debugging. The best you could probably do is take a snapshot of the entire used portion of the stack, and "fish out" variable values later.
To do that, you'll need to know the current values of the stack pointer and instruction pointer, the contents of the stack, and the load addresses of all ELF objects. And you'll need to re-implement a large part of a debugger (or modify an existing one).
The easiest approach might be to convert (post-process) the above info into an ELF core, and then use an existing debugger of your choice to analyse the values. You can look at Google's user-space coredumper to see what's involved. See also this answer.
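As a rough illustration of capturing those raw inputs on the target, a sketch using GCC builtins; the buffer size, the symbol names, and a downward-growing stack are all assumptions:

#include <stdint.h>
#include <string.h>

#define SNAP_BYTES 1024          /* how much of the stack to keep */

static uint8_t snap_stack[SNAP_BYTES];
static void   *snap_sp;          /* stack pointer at snapshot time */
static void   *snap_pc;          /* approximate PC: caller's return address */

void take_stack_snapshot(void)
{
    snap_sp = __builtin_frame_address(0);
    snap_pc = __builtin_return_address(0);
    /* On a descending stack, live frames sit at addresses >= the SP. */
    memcpy(snap_stack, snap_sp, SNAP_BYTES);
}

The captured buffers can then be shipped off-target and combined with the ELF/DWARF info (or converted into a core file, as suggested above) to recover variable names and values.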

__attribute__((section("name"))) usage?

I have run across code that uses __attribute__((section("name"))). I understand that for the gcc compiler this allows you to tell the linker to put the created object in a specific section "name" (with the absolute address of "name" declared in a linker file).
What is the point of doing this instead of just using the .data section?
There are many possible uses. [Edit to add note: this is just a sample of uses I've seen myself or considered, not a complete list.]
The Linux kernel, for instance, marks some code and data sections as used only during kernel bootstrap. These can be jettisoned after the kernel is running, reclaiming the space for other uses.
You can use this to mark code or data values that need patching on a particular processor variant, e.g., with or without a coprocessor.
You can use it to make things live in "special" address spaces that will be burned to PROM or saved on an EEPROM, rather than in ordinary memory.
You can use it to collect together code or data areas for purposes like initialization and cleanup, as with C++ constructors and destructors that run before the program starts and when it ends, or for using shorter addressing modes (I don't know how much that would apply on ARM as I have not written any ARM code myself).
The actual use depends on the linker script(s).
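As a small illustration of the attribute itself (the section names here are invented; where they actually end up is entirely up to the linker script):

/* A counter the linker script maps to battery-backed RAM, so it
   survives a warm reset instead of living in ordinary .data. */
int boot_count __attribute__((section(".persist"))) = 0;

/* A routine kept in a section used only during bring-up; its space
   can be reclaimed or left unmapped once the system is running. */
__attribute__((section(".init_only")))
void early_clock_setup(void)
{
    /* one-time hardware setup */
}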
From a usecase point of view, there are lots of different types of .data, like:
data local to a specific CPU and/or NUMA node
data shared between contexts (like user/kernelspace, as are the .vdso or vsyscall pages. Or, another example, bootloader and kernel)
readonly data or other data with specific access mode/type restrictions (say, cacheability or cache residency - the latter can be specified on some ARM SoCs)
data that persists "state transitions" (such as hibernation image loads, or crash kernel / fast reboot reinitializations)
data with specific lifetimes/lifecycles (only used in specific stages during boot or during operation, write-once data)
data specific to a particular kernel subsystem or particular kernel module
"code colocated" data (addressing offsets in x64 are plus/minus 2GB so if you want RIP-relative addressing, the data must be within that range of your currently executing code)
data mapped to some specific hardware register space VA range
So in the end it's often about attributes (the word here used in a more generic sense than what __attribute__(...) allows you to state from within gcc source code). Whether another section is needed and/or useful is ... in the eye of the beholder - the system designer, that is.
The availability of the section attribute, therefore, allows for flexibility and that is, IMHO, a good thing.
Years later, I'm going to add a specific detail because it's worth writing down.
If you create your own section, you can manage it yourself. In particular, you can use preprocessor macros to insert certain data items into your special section. If the only thing that uses that special section is your preprocessor macros, then you have the ability to create a data structure in a distributed fashion.
What does this mean? It means you can write a preprocessor macro like ADD_VAR_TO_SPECIAL_SECTION(...) and concatenate a bunch of different values in random order into what amounts to an array (or just a big old pile, if they aren't all the same type) in your section.
This gives you the ability to create a (randomly-ordered) array of data at compile time. There is no initialization, no registration, no overhead. You just compile and link your code, and all the macros that were in all the different source files have added all their values into one big array.
How can you use this? Create a bunch of "modules." Register the init functions and destroy functions in an ad-hoc array. Process the array at startup time. (You can add some kind of topological sort if you need to.) You don't need to have a master list of modules anywhere, it gets built automatically. Or, create a macro to register unit test functions into a test suite. Again, it creates an ad-hoc list with no "registration" required.
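A minimal sketch of that pattern with GCC and GNU ld, which automatically emit __start_/__stop_ symbols for custom sections whose names are valid C identifiers (all the names here are invented):

#include <stdio.h>

typedef struct {
    const char *name;
    void (*init)(void);
} module_t;

/* Drop one module_t into the custom "modtab" section; "used" keeps the
   compiler from discarding entries nothing references by name. */
#define REGISTER_MODULE(n, fn) \
    static const module_t mod_##n \
    __attribute__((section("modtab"), used)) = { #n, fn }

static void init_net(void)  { puts("net up");  }
static void init_disk(void) { puts("disk up"); }

REGISTER_MODULE(net,  init_net);
REGISTER_MODULE(disk, init_disk);

/* Provided by GNU ld because "modtab" is a valid C identifier. */
extern const module_t __start_modtab[];
extern const module_t __stop_modtab[];

int main(void)
{
    for (const module_t *m = __start_modtab; m < __stop_modtab; ++m)
        m->init();   /* runs every registered module, in link order */
    return 0;
}

The registration lines can live in any number of translation units; linking simply concatenates the section, which is exactly the "distributed data structure" described above.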

C code that checksums itself *in ram*

I'm trying to get a ram-resident image to checksum itself, which is proving easier said than done.
The code is first compiled on a cross development platform, generating an .elf output. A utility is used to strip out the binary image, and that image is burned to flash on the target platform, along with the image size. When the target is started, it copies the binary to the correct region of ram, and jumps to it. The utility also computes a checksum of all the words in the elf that are destined for ram, and that too is burned into the flash. So my image theoretically could checksum its own ram resident image using the a-priori start address and the size saved in flash, and compare to the sum saved in flash.
That's the theory anyway. The problem is that once the image begins executing, there is change in the .data section as variables are modified. By the time the sum is done, the image that has been summed is no longer the image for which the utility calculated the sum.
I've eliminated change due to variables defined by my application, by moving the checksum routine ahead of all other initializations in the app (which makes sense b/c why run any of it if an integrity check fails, right?), but the killer is the C run time itself. It appears that there are some items relating to malloc and pointer casting and other things that are altered before main() is even entered.
Is the entire idea of self-checksumming C code lame? If there was a way to force app and CRT .data into different sections, I could avoid the CRT thrash, but one might argue that if the goal is to integrity check the image before executing (most of) it, that initialized CRT data should be part of that. Is there a way to make code checksum itself in RAM like this at all?
FWIW, I seem stuck with a requirement for this. Personally I'd have thought that the way to go is to checksum the binary in the flash, before transfer to ram, and trust the loader and the ram. Paranoia has to end somewhere right?
Misc details: tool chain is GNU, image contains .text, .rodata and .data as one contiguously loaded chunk. There is no OS, this is bare metal embedded. Primary loader essentially memcpy's my binary into ram, at a predetermined address. No relocations occur. VM is not used. Checksum only needs testing once at init only.
updated
Found that by doing this..
__attribute__((constructor)) void sumItUp(void) {
    // sum it up
    // leave result where it can be found
}
.. that I can get the sum done before almost everything except the initialization of the malloc/sbrk vars by the CRT init, and some vars owned by "impure.o" and "locale.o". Now, the malloc/sbrk value is something I know from the project linker script. If impure.o and locale.o could be mitigated, we might be in business.
update
Since I can control the entry point (by what's stated in flash for the primary loader), it seems the best angle of attack now is to use a piece of custom assembler code to set up stack and sdata pointers, call the checksum routine, and then branch into the "normal" _start code.
If the checksum is done EARLY enough, you could use ONLY stack variables, and not write to any data-section variables - that is, make EVERYTHING you need to perform the checksumming [and all preceding steps to get to that point] ONLY use local variables for storing things in [you can read global data of course].
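A minimal sketch of that stack-only summing, assuming hypothetical linker-script symbols for the region to verify and for the reference sum that the build utility stored:

#include <stdint.h>

/* All three symbols are assumed to come from the linker script /
   build tooling: the image region to verify and the expected sum. */
extern const unsigned char __image_start[];
extern const unsigned char __image_end[];
extern const uint32_t      __stored_sum;

/* Writes only to locals (stack), so the region being summed is not
   disturbed by the act of checking it. */
static int image_checksum_ok(void)
{
    uint32_t sum = 0;
    for (const unsigned char *p = __image_start; p < __image_end; ++p)
        sum += *p;
    return sum == __stored_sum;
}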
I'm fairly convinced that the right way is to trust the flash & loader to load what is in the flash. If you want to checksum the code, sure, go and do that [assuming it's not being modified by the loader of course - for example runtime loading of shared libraries or relocation of the executable itself, such as random virtual address spaces and such]. But the data loaded from flash can't be relied upon once execution starts properly.
If there is a requirement from someone else that you should do this, then please explain to them that this is not feasible to implement, and that "the requirement, as it stands" is "broken".
I'd suggest approaching this like an executable packer, like upx.
There are several things in the other answers and in your question that, for lack of a better term, make me nervous.
I would not trust the loader or anything in flash that wasn't forced on me.
There is source code floating around on the net that was used to secure one of, I think, HTC's recent phones. Look around on forum.xda-developers.com and see if you can find it and use it for an example.
I would push back on this requirement. Cellphone manufacturers spend a lot of time on keeping their images locked down and, eventually, all of them are beaten. This seems like a vicious circle.
Can you use the linker script to place impure.o and locale.o before or after everything else, allowing you to checksum everything but those and the malloc/sbrk stuff? I'm guessing malloc and sbrk are called in the bootloader that loads your application, so the thrash caused by those cannot be eliminated?
It's not an answer to just tell you to fight this requirement, but I agree that this seems to be over-thought. I'm sure you can't go into any amount of detail, but I'm assuming the spec authors are concerned about malicious users/hackers, rather than regular memory corruption due to cosmic rays, etc. In this case, if a malicious user/hacker can change what's loaded into RAM, they can just change your checksumming routine (which is itself running from RAM, correct?) to always return a happy status, no matter how well the checksum routine they aren't running anymore is designed.
Even if they are concerned about regular memory corruption, this checksum routine would only catch that if the error occurred during the original copy to memory, which is actually the least likely time such an error would occur, simply because the system hasn't been running long enough to have a high probability of a corruption event.
In general, what you want to do is impossible, since on many (most?) platforms the program loader may "relocate" some program address constants.
Can you update the loader to perform the checksum test on the flash resident binary image, before it is copied to ram?

C memcpy() a function

Is there any method to calculate the size of a function? I have a pointer to a function and I have to copy the entire function using memcpy. I have to malloc some space and know the 3rd parameter of memcpy - the size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth using a well-defined protocol through a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you did want to pass memcpy() to the kernel. You have to remember that it is a very special function. It is defined in the C standard, quite simple, and of a quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there were a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area of memory. What if the compiled code has local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume you have a Harvard-architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also, I know several compilers and linkers that do optimization at file level (not only at function level). This results in opcode where parts of C functions are mixed into each other.
The only way I consider possible is:
Generate the opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and your function doesn't do anything fancy - no other function calls, no access to data outside the function - you might even get away with it. I'd say the safest possibility is to limit the maximum size of a supported function to, say, 1 kilobyte, just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that I wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture, please: WHY are you trying to do that? Then we can see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
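A sketch of that adjacent-function trick; it assumes the toolchain preserves source order and does no reordering or inter-procedural optimization, which is exactly what other answers here warn about:

#include <stdlib.h>
#include <string.h>

/* The function to copy, followed immediately by a dummy marker function. */
__attribute__((noinline)) void work(void)            { /* ... */ }
__attribute__((noinline)) void work_end_marker(void) { }

void *copy_work(void)
{
    /* Casting function pointers to char* is non-standard, but it is the
       whole idea of this trick. */
    size_t size = (size_t)((char *)work_end_marker - (char *)work);
    void *buf = malloc(size);
    if (buf)
        memcpy(buf, (void *)work, size);   /* raw bytes only - not callable
                                              from plain malloc'd memory   */
    return buf;
}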
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor would almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
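For illustration, a minimal userspace sketch of waiting on such a readable descriptor with poll(); the device path is a placeholder:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* "/dev/mydriver" stands in for whatever char device the driver exposes. */
    int fd = open("/dev/mydriver", O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror("open"); return 1; }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);   /* event details, if any */
        if (n > 0)
            printf("kernel reported %zd bytes of event data\n", n);
    }
    close(fd);
    return 0;
}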
A function isn't just an object you can copy. What about cross-references / symbols and so on? Of course you can take something like the standard Linux "binutils" package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look at the LD_PRELOAD mechanism.
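For instance, a minimal LD_PRELOAD interposer sketch, built as a shared object (gcc -shared -fPIC wrap_memcpy.c -o wrap_memcpy.so -ldl) and run with LD_PRELOAD=./wrap_memcpy.so ./your_program; the logging and the recursion guard are just illustrative:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <unistd.h>

void *memcpy(void *dst, const void *src, size_t n)
{
    static void *(*real_memcpy)(void *, const void *, size_t);
    static int resolving;

    if (!real_memcpy && !resolving) {
        resolving = 1;
        real_memcpy = (void *(*)(void *, const void *, size_t))
                      dlsym(RTLD_NEXT, "memcpy");
        resolving = 0;
    }
    if (!real_memcpy) {                    /* fallback while dlsym resolves */
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;
        return dst;
    }
    write(2, "memcpy intercepted\n", 19);  /* write(), not printf, to avoid
                                              re-entering memcpy           */
    return real_memcpy(dst, src, n);
}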
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
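A hedged sketch of that section-based approach as it is commonly done on bare-metal targets; the section and symbol names come from a hypothetical linker script that gives the section a flash load address and a RAM run address:

#include <string.h>

/* Assumed linker-script symbols: load address (LMA) in flash, run
   address (VMA) in RAM, and the end of the section. */
extern unsigned char __ramfunc_load[];
extern unsigned char __ramfunc_start[];
extern unsigned char __ramfunc_end[];

/* Compiled into its own section so the linker can give it a RAM run
   address; calls out of it may need long-call options on some targets. */
__attribute__((section(".ramfunc"), noinline))
void fast_path(void)
{
    /* time-critical or self-updating code */
}

void copy_ramfuncs(void)
{
    /* Copy the section image from flash to RAM before anything calls it. */
    memcpy(__ramfunc_start, __ramfunc_load,
           (size_t)(__ramfunc_end - __ramfunc_start));
}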
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA where I've copied some low level render functions from flash (16 bit access, slowish memory) to the high speed work RAM (32 bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy, size = (int)(NextFuncPtr - SourceFuncPtr). This did work well but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution can be as below.
For example: if you want to know the size of func() in program a.c, put indicators before and after the function.
Try writing a Perl script which compiles this file into object format (cc -c), making sure that the pre-processor statements are not removed. You need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.

Resources