Dynamic relocation of code section - c

Just out of curiosity I wonder if it is possible to relocate a piece of code during
the execution of a program. For instance, I have a function and this function should
be replaced in memory each time after it has been executed. One idea that came up our mind
is to use self-modifying code to do that. According to some online resources, self-modifying
code can be executed on Linux, but still I am not sure if such a dynamic relocation is possible. Has anyone experience with that?

Yes dynamic relocation is definitely possible. However, you have to make sure that the code is completely self-contained, or that it accesses globals/external functions by absolute references. If your code can be completely position independent, meaning the only references it makes are relative to itself, you're set. Otherwise you will need to do the fixups yourself at loading time.
With GCC, you can use -fpic to generate position independent code. Passing -q or --emit-relocs to the linker will make it emit relocation information. The ELF specification (PDF link) has information about how to use that relocation information; if you're not using ELF, you'll have to find the appropriate documentation for your format.

As Carl says, it can be done, but you're opening a can of worms. In practice, the only people who take the trouble to do this are academics or malware authors (now donning my flame proof cloak).
You can copy some code into a malloc'd heap region, then call it via function pointers, but depending on the OS you may have to enable execution in the segment. You can try to copy some code into the code segment (taking care not to overwrite the following function), but the OS likely has made this segment read-only. You might want to look at the Linux kernel and see how it loads its modules.

If all these different functions exist at compile time then you could simply use a function pointer to keep track of the next one that is to be called. If you absolutely have to modify the function at runtime and that modification can't be done in place then you could also use a function pointer that is updated with address of the new function when it is created/loaded. The rest of your system would then call the self-modifying function through the function pointer and therefore doesn't have to know or care about the self-modifying code and you only have to do the fixup in one place.

Related

Remote update-able function or code in a statically linked firmware of embedded device (microcontroller) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
Is it possible to remote update or patch only certain piece of code to an embedded device (microcontroller)?
I am writing a bare metal C program on a microcontroller. Lets say I have main program which comprises of function A(), B(), C() and D() each in their own .c file.
I would like to do a remote update (patching) of only B() which has its own custom section in the main program so that the address is fixed and known.
The thing that puzzles me is how do I produce an updated executable of B()? Given that it relies on standard C library or other global variables from the main program. I need to resolve all the symbols from the main program.
Would appreciate any suggestions or reference to other thread if this question has been asked before (I tried to search but couldn't find any)
Solutions:
from Russ Schultz: Compile the new B(), and link it with the symbol file previously generated from the main program using gcc --just-symbols option. The resultant elf will contain just B() that assumes all these other symbols are there. (I haven't tried it but this is the concept that I was looking for)
Compile the whole program with the new B(), and manually take only the B() part binary from the main program (because its section address and size are known). Send B() binary to the device for remote update (not efficient as it involves many manual works).
Dynamic loading and linking requires run-time support - normally provided by an OS. VxWorks for example includes support for that (although the code is normally loaded into RAM over a network or mass-storage file-system rather then Flash or other re-writable ROM).
You could in theory write your own run-time linker/loader. However for it to work, the embedded firmware must contain a symbol table in order to complete the link. The way it works in VxWorks, is the object code to be loaded is partially linked and contains unresolved symbols that are completed by the run-time linker/loader by reference to the embedded symbol table.
Other embedded operating systems that support dynamic loading are:
Precise/MQX
Nucleus
Another approach that does not require a symbol table is to provide services via a software interrupt API (as used in PC-BIOS and MS-DOS). The loaded module will necessarily have a restricted access to services provided by the API, but because they are interrupts, the actual location of such services does not need to be known to the loadable module, not explicitly linked at run-time.
There is an article on dynamically loading modules in small RTOS systems on Embedded.com:
Bring big system features to small RTOS devices with downloadable app modules.
The author John Carbone works for Express Logic who produce ThreadX RTOS, which gives you an idea of the kind of system it is expected to work on, although the article and method described is not specific to ThreadX.
Some approach like this requires that the programmer manually manages all code allocation with their own custom segments. You'd have to know the fixed address of the function and it can't be allowed to grow beyond a certain size.
The flash memory used will dictate the restrictions, namely how large an area do you need to erase before programming. If you can execute code from eeprom/data flash then that's the obvious choice.
Library calls etc are irrelevant as the library functions are most likely stored elsewhere. Or in the rare case where they are inlined, they'll be small. But you might have to write the function in assembler, since C compiler-generated machine code may screw up the calling convention or unexpectedly overwrite registers if taken out of the expected context.
Because all of the above is fairly complex, the normal approach is to only modify const variables, rather than code, and keep those in eeprom/data flash, then have your program act based on those values.
I'm assuming you're talking about bare metal with my answer.
First off, linking a new function B() with the original program is relatively simple, particularly with GCC. LD can take a 'symbol file' as input using the --just-symbols option. So, scrape your map file, create the symbol file and use it as an input to your new link. Your resultant elf will contain just your code that assumes all these other symbols are there.
At that point, compile your new function B(), which should be a different name than B() (so we'll choose B_()). It should have the exact same signature as B() or things won't work right. You have to compile with the same headers, etc. that your original code was compiled with or it likely won't work.
Now, depending on how you've architected your original program, life can be easy or a real mess.
If you make your original program with the idea of patching in mind, then the prep is relatively trivial. Identify which functions you might want to patch and then call them through function pointers, e.g.:
void OriginalB(void)
{
//Original implementation of B goes here
}
void (B*)(void) = OriginalB;
void main(void)
{
B(); //this calls OriginalB() through the function pointer B. Once you
//patch the global function pointer B to point to B_(), then this
//code will call your new function.
}
Now your patch program is the original program linked with your B_(), but you somehow have to update the global function pointer B to point to B_() (rather than OriginalB())
Assuming you can use your new elf (or hex file) to update your device, it's pretty easy to just go modify those to change the value of B or assign the new function pointer directly in your code.
If not, then whatever method of injection you need to do also needs to inject a change to the global pointer.
If you didn't prep your original program, then it can be a real bear (but doable) to go modify references to B() to instead jump to your new B_(). It might get super tricky if your new function is too far away for a relative jump, but still doable in theory. I've never actually done it. ;)
If you're trying to patch a ROM, you almost have to have prepped the original ROMmed program to use function points for potential patch points. Or have some support in the ROM hardware to allow limited patching (usually it's just a few locations it will let you patch).
Some of the details may be incorrect for GCC (I use the Keil tools in my professional flow), but the concept is the same. It's doable. It's fragile. There's no standard way of doing this and it's highly tool and application dependent.

How to circumvent dlopen() caching?

According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the libray?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer, no refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem to not believe the usecase is actually valid. I am working on a pure functional language, that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I am compiling a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variabels (e.g. the pointer to the young generation) and that is, or rather should be, totally fine. The library exposes exactly two functions: An initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. when I load the library, its internal symbols are relocated as expected, the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except. Since they will expose the same global symbols and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).

Hook and Replace Export Function in the Loaded ELF ( .so shared library )

I'm writing some C code to hook some function of .so ELF (shared-library) loaded into memory.
My C code should be able to re-direct an export function of another .so library that was loaded into the app/program's memory.
Here's a bit of elaboration:
Android app will have multiple .so files loaded. My C code has to look through export function that belongs to another shared .so library (called target.so in this case)
This is not a regular dlsym approach because I don't just want address of a function but I want to replace it with my own fuction; in that: when another library makes the call to its own function then instead my hook_func gets called, and then from my hook_func I should call the original_func.
For import functions this can work. But for export functions I'm not sure how to do it.
Import functions have the entries in the symbol table that have corresponding entry in relocation table that eventually gives the address of entry in global offset table (GOT).
But for the export functions, the symbol's st_value element itself has address of the procedure and not GOT address (correct me if I'm wrong).
How do I perform the hooking for the export function?
Theoretically speaking, I should get the memory location of the st_value element of dynamic symbol table entry ( Elf32_Sym ) of export function. If I get that location then I should be able to replace the value in that location with my hook_func's address. However, I'm not able to write into this location so far. I have to assume the dynamic symbol table's memory is read-only. If that is true then what is the workaround in that case?
Thanks a lot for reading and helping me out.
Update: LD_PRELOAD can only replace the original functions with my own, but then I'm not sure if there any way to call the originals.
In my case for example:
App initializes the audio engine by calling Audio_System_Create and passes a reference of AUDIO_SYSTEM object to Audio_System_Create(AUDIO_SYSTEM **);
AUDIO API allocates this struct/object and function returns.
Now if only I could access that AUDIO_SYSTEM object, I would easily attach a callback to this object and start receiving audio data.
Hence, my ultimate goal is to get the reference to AUIOD_SYSTEM object; and in my understanding, I can only get that if I intercept the call where that object is first getting allocated through Audio_System_Create(AUIOD_SYSTEM **).
Currently there is no straight way to grab the output audio at android. (all examples talk about recording audio that comes from microphone only)
Update2:
As advised by Basile in his answer, I made use of dladdr() but strangely enough it gives me the same address as I pass to it.
void *pFunc=procedure_addr; //procedure address calculated from the st_value of symbol from symbol table in ELF file (not from loaded file)
int nRet;
// Lookup the name of the function given the function pointer
if ((nRet = dladdr(pFunc, &DlInfo)) != 0)
{
LOGE("Symbol Name is: %s", DlInfo.dli_sname);
if(DlInfo.dli_saddr==NULL)
LOGE("Symbol Address is: NULL");
else
LOGE("Symbol Address is: 0x%x", DlInfo.dli_saddr);
}
else
LOGE("dladdr failed");
Here's the result I get:
entry_addr =0x75a28cfc
entry_addr_through_dlysm =0x75a28cfc
Symbol Name is: AUDIO_System_Create
Symbol Address is: 0x75a28cfc
Here address obtained through dlysm or calculated through ELF file is the address of procedure; while I need the location where this address itself is; so that I can replace this address with my hook_func address. dladdr() didn't do what I thought it will do.
You should read in details Drepper's paper: how to write shared libraries - notably to understand why using LD_PRELOADis not enough. You may want to study the source code of the dynamic linker (ld-linux.so) inside your libc. You might try to change with mprotect(2) and/or mmap(2) and/or mremap(2) the relevant pages. You can query the memory mapping thru proc(5) using /proc/self/maps & /proc/self/smaps. Then you could, in an architecture-specific way, replace the starting bytes (perhaps using asmjit or GNU lightning) of the code of original_func by a jump to your hook_func function (which you might need to change its epilogue, to put the overwritten instructions -originally at original_func- there...)
Things might be slightly easier if original_func is well known and always the same. You could then study its source and assembler code, and write the patching function and your hook_func only for it.
Perhaps using dladdr(3) might be helpful too (but probably not).
Alternatively, hack your dynamic linker to change it for your needs. You might study the source code of musl-libc
Notice that you probably need to overwrite the machine code at the address of original_func (as given by dlsym on "original_func"). Alternatively, you'll need to relocate every occurrence of calls to that function in all the already loaded shared objects (I believe it is harder; if you insist see dl_iterate_phdr(3)).
If you want a generic solution (for an arbitrary original_func) you'll need to implement some binary code analyzer (or disassembler) to patch that function. If you just want to hack a particular original_func you should disassemble it, and patch its machine code, and have your hook_func do the part of original_func that you have overwritten.
Such horrible and time consuming hacks (you'll need weeks to make it work) make me prefer using free software (since then, it is much simpler to patch the source of the shared library and recompile it).
Of course, all this isn't easy. You need to understand in details what ELF shared objects are, see also elf(5) and read Levine's book: Linkers and Loaders
NB: Beware, if you are hacking against a proprietary library (e.g. unity3d), what you are trying to achieve might be illegal. Ask a lawyer. Technically, you are violating most abstractions provided by shared libraries. If possible, ask the author of the shared library to give help and perhaps implement some plugin machinery in it.

CPU dependent code: how to avoid function pointers?

I have performance critical code written for multiple CPUs. I detect CPU at run-time and based on that I use appropriate function for the detected CPU. So, now I have to use function pointers and call functions using these function pointers:
void do_something_neon(void);
void do_something_armv6(void);
void (*do_something)(void);
if(cpu == NEON) {
do_something = do_something_neon;
}else{
do_something = do_something_armv6;
}
//Use function pointer:
do_something();
...
Not that it matters, but I'll mention that I have optimized functions for different cpu's: armv6 and armv7 with NEON support. The problem is that by using function pointers in many places the code become slower and I'd like to avoid that problem.
Basically, at load time linker resolves relocs and patches code with function addresses. Is there a way to control better that behavior?
Personally, I'd propose two different ways to avoid function pointers: create two separate .so (or .dll) for cpu dependent functions, place them in different folders and based on detected CPU add one of these folders to the search path (or LD_LIB_PATH). The, load main code and dynamic linker will pick up required dll from the search path. The other way is to compile two separate copies of library :)
The drawback of the first method is that it forces me to have at least 3 shared objects (dll's): two for the cpu dependent functions and one for the main code that uses them. I need 3 because I have to be able to do CPU detection before loading code that uses these cpu dependent functions. The good part about the first method is that the app won't need to load multiple copies of the same code for multiple CPUs, it will load only the copy that will be used. The drawback of the second method is quite obvious, no need to talk about it.
I'd like to know if there is a way to do that without using shared objects and manually loading them at runtime. One of the ways would be some hackery that involves patching code at run-time, it's probably too complicated to get it done properly). Is there a better way to control relocations at load time? Maybe place cpu dependent functions in different sections and then somehow specify what section has priority? I think MAC's macho format has something like that.
ELF-only (for arm target) solution is enough for me, I don't really care for PE (dll's).
thanks
You may want to lookup the GNU dynamic linker extension STT_GNU_IFUNC. From Drepper's blog when it was added:
Therefore I’ve designed an ELF extension which allows to make the decision about which implementation to use once per process run. It is implemented using a new ELF symbol type (STT_GNU_IFUNC). Whenever the a symbol lookup resolves to a symbol with this type the dynamic linker does not immediately return the found value. Instead it is interpreting the value as a function pointer to a function that takes no argument and returns the real function pointer to use. The code called can be under control of the implementer and can choose, based on whatever information the implementer wants to use, which of the two or more implementations to use.
Source: http://udrepper.livejournal.com/20948.html
Nonetheless, as others have said, I think you're mistaken about the performance impact of indirect calls. All code in shared libraries will be called via a (hidden) function pointer in the GOT and a PLT entry that loads/calls that function pointer.
For the best performance you need to minimize the number of indirect calls (through pointers) per second and allow the compiler to optimize your code better (DLLs hamper this because there must be a clear boundary between a DLL and the main executable and there's no optimization across this boundary).
I'd suggest doing these:
moving as much of the main executable's code that frequently calls DLL functions into the DLL. That'll minimize the number of indirect calls per second and allow for better optimization at compile time too.
moving almost all your code into separate CPU-specific DLLs and leaving to main() only the job of loading the proper DLL OR making CPU-specific executables w/o DLLs.
Here's the exact answer that I was looking for.
GCC's __attribute__((ifunc("resolver")))
It requires fairly recent binutils.
There's a good article that describes this extension: Gnu support for CPU dispatching - sort of...
Lazy loading ELF symbols from shared libraries is described in section 1.5.5 of Ulrich Drepper's DSO How To (updated 2011-12-10). For ARM it is described in section 3.1.3 of ELF for ARM.
EDIT: With the STT_GNU_IFUNC extension mentioned by R. I forgot that was an extension. GNU Binutils supports that for ARM, apparently since March 2011, according to changelog.
If you want to call functions without the indirection of the PLT, I suggest function pointers or per-arch shared libraries inside which function calls don't go through PLTs (beware: calling an exported function is through the PLT).
I wouldn't patch the code at runtime. I mean, you can. You can add a build step: after compilation disassemble your binaries, find all offsets of calls to functions that have multi-arch alternatives, build table of patch locations, link that into your code. In main, remap the text segment writeable, patch the offsets according to the table you prepared, map it back to read-only, flush the instruction cache, and proceed. I'm sure it will work. How much performance do you expect to gain by this approach? I think loading different shared libraries at runtime is easier. And function pointers are easier still.

How to get the function name of a C function pointer

I have the following problem: when I get a backtrace in C using the backtrace(3) function the symbols returned the name of the functions is easily determinable with the dwarf library and dladdr(3).
The problem is that if I have a simple function pointer (e.g. by passing it &function) the dladdr + dwarf functions can't help. It seems that the pointer is different from the one returned by backtrace(3) (it's not strange, since backtrace gets these function pointers straight from the stack).
My question is whether there is some method for resolving these names too? Also, I'd like to know exactly what is the difference between the two pointers.
Thanks!
UPDATE:
The difference between the pointers is quite significant:
The one I get with backtrace is: 0x8048ca4
The direct pointer version: 0x3ba838
Seems to me that the second one needs an offset.
Making a guess from the substantial differences in the typical addresses you cited, one is from an actual shared library, and the other from your main executable. Reading between the lines of a man page for dladdr(3), it could be the case that if the symbol isn't located in a module that was loaded by dlopen(3) then it might not be able to reconstruct the matching file and symbol names.
I assume you haven't stripped the symbols off any module you care about here, or all bets are off. If the executable has symbols, then it should be possible to look in them for one that is an exact match for the address of any nameable function. A pointer to a function is just that, after all.
addr2line(1) may be just the thing you're looking for.

Resources