The Symbol Relocation - linker

The following is how a function call (the first time it is made) is resolved in position-independent code (PIC):

1. Jump to the PLT entry of our symbol.
2. Jump to the GOT entry of our symbol.
3. Jump back to the PLT entry and push an offset onto the stack. The offset is actually an Elf_Rel structure describing how to patch the symbol.
4. Jump to the PLT stub entry.
5. Push a pointer to a link_map structure so that the linker can find which library the symbol belongs to.
6. Call the resolver routine.
7. Patch the GOT entry.

This is different from how a data reference is made, which just uses the GOT table.
So, why is there this difference? Why 2 different approaches?

What you described is lazy relocation.
You don't have to use it, and will not use it if e.g. LD_BIND_NOW=1 is set in the environment.
It's an optimization: it allows you to reduce the amount of work that the dynamic linker has to perform, when a particular program invocation does not exercise many possible program execution paths.
Imagine a program that can call foo(), bar() or baz(), depending on arguments, and which calls exactly one of the routines in any given execution.
If you didn't use lazy relocation, the dynamic loader would have to resolve all three routines at program startup. Lazy relocation allows the dynamic loader to perform only the one relocation that is actually required in any given execution (for the one function that is actually called), and at exactly the right time (when the function is first called).
Now, why can't variables also be resolved that way?
Because there is no convenient way for the dynamic loader to know when to perform that relocation.
Suppose the globals are a, b and c, and that foo() references a and b, bar() references b and c, and baz() references a and c. In theory the dynamic loader could scan bodies of foo, bar and baz, and build a map of "if calling foo, then also resolve globals a and b", etc. But it's much simpler and faster to just resolve all references to globals at startup.

Related

How are shared libraries referenced by various programs?

I understand that shared libraries are loaded into memory and used by various programs.
How can a program know where in memory the library is?
When a shared library is used, there are two parts to the linkage process. At compile time, the linker program, ld in Linux, links against the shared library in order to learn which symbols are defined by it. However, none of the code or data initializers from the shared library are actually included in the ultimate a.out file. Instead, ld just records which dynamic libraries were linked against and the information is placed into an auxiliary section of the a.out file.
The second phase takes place at execution time, before main gets invoked. The kernel loads a small helper program, ld.so, into the address space, and this gets executed. Therefore, the start address of the program is not main or even _start (if you have heard of it); rather, it is actually the start address of the dynamic library loader.
In Linux, the kernel maps the ld.so loader code into a convenient place in the process address space and sets up the stack so that the list of required shared libraries (and other necessary info) is present. The dynamic loader finds each of the required libraries by looking at a sequence of directories, which are often specified in the LD_LIBRARY_PATH environment variable. There is also a pre-defined list which is hard-coded into ld.so (and additional search paths can be hard-coded into the a.out at link time). For each of the libraries, the dynamic loader reads its header and then uses mmap to create memory regions for the library.
Now for the fun part.
Since the actual libraries used at run time to satisfy the requirements are not known at link time, we need a way to access functions defined in the shared library and global variables exported by the shared library (this practice is deprecated, since exporting global variables is not thread-safe, but it is still something we try to handle).
Global variables are assigned a static address at link time and are then accessed by absolute memory address.
For functions exported by the library, the user of the library is going to emit a series of call assembly instructions, which reference an absolute memory address. But, the exact absolute memory address of the referenced function is not known at link time. How do we deal with this?
Well, the linker creates what is known as a Procedure Linkage Table, which is a series of jmp (assembly jump) instructions. The target of the jump is filled in at run time.
Now, when dealing with the dynamic portions of the code (i.e. the .o files that have been compiled with -fpic), there are no absolute memory references whatsoever. In order to access global variables which are also visible to the static portion of the code, another table called the Global Offset Table is used. This table is an array of pointers. At link time, since the absolute memory addresses of the global variables are known, the linker populates this table. Then, at run time, dynamic code is able to access the global variables by first finding the Global Offset Table, then loading the address of the correct variable from the appropriate slot in the table, and finally dereferencing the pointer.

GCC on ARM Cortex M3: Calling functions from specific addresses

I need to call functions from specific addresses (similar to Double function indirection in C, but not exactly the same). I could pull the pointers from the mapping table and manipulate dynamically generated function pointers, which I prefer to avoid. E.g., I want to avoid this type of call:
int (*compute_volume)(void) = (int (*)(void)) 0x20001000;
int vol = (*compute_volume)();
Instead, I would prefer to use some sort of linker provided symbols or other methods to achieve the following, except that the compute_volume() function is provided by a different image, perhaps something like this:
extern int compute_volume(void);
vol = compute_volume();
In other words, I intend to split my code into multiple images, thus reducing the need to modify or overwrite the flash every time a symbol or computation changes.
Any suggestions/ideas?
You can define a jump table which always resides in the same flash region (you can define that region in the linker script, and with pragmas in the code, I think), and which, when called, jumps to the desired function.
In firmware part I you only define symbols which refer to the "passing" functions' addresses (if you always keep the table in the same region, it will make future updates MUCH easier). In firmware part II you create the jump table, which resides in the address space you were referring to in firmware part I and calls the actual functions.
I am not 100% sure I have described it correctly, but this should give you some notion of how to solve your problem. The link Ring Ø provided should help you with placing the jump table code in one place.

Win32, WinMain vs custom Entry Point (huge size difference), why?

As topic says.
I noticed that if I use WinMain or any other default entry point, a C application can be around 70 KB.
But if I just specify a custom entry point, say "RawMain", int RawMain(),
then the file will be around 6 KB.
So I am wondering: why is this? What does it add/reference to the file?
I could understand there being some small difference in size, but the difference is huge for an empty application.
Thanks!
When building for windows in most environments, the actual program entry point will be provided by a function in a small runtime library. That will do some environment preparation and then call a function you provide, such as main, wmain, WinMain, etc.
The code that runs before your user-provided main function includes running global C++ constructors, enabling TLS variables, initializing global mutexes so that standard-library calls work properly in a multithreaded environment, setting up the standard locale, and other stuff.
One thing that setting the entry point does is start the linker with an undefined symbol of the name you give the entry point; so, for example, if you're using mingw32, the linker will start out assuming that it needs to link libmingw32.a, and with the undefined symbol __tmainCRTStartup.
The linker will find (hopefully) __tmainCRTStartup in libmingw32.a, and include the object file crtexe.o which contains it, along with anything else needed to satisfy undefined symbols emanating from crtexe.o, which is where the extra size comes from.
When you set your own entry point, you override this, and just set the linker to look for whatever function you specify. You get a smaller executable, but you have to be careful that features you're using don't rely on any of the global initialization that would be done by the runtime's startup function.

Calls inside a module: resolved by the compiler or linker?

If a function f() is called and implemented in the same C file (module), who resolves this call: the compiler or the linker?
I think it's technically implementation-dependent, but typically references within the same file will be resolved by the compiler. There's no point deferring it until link time since the compiler knows which function is being called, and the compiler may be able to generate the code for the function call more efficiently if it doesn't have to leave a place for the linker to fill in an address. (For example, it may be able to use a relative jump instruction with a 16-bit offset for a call to a nearby function, instead of an absolute jump with a 32-bit or 64-bit address embedded in the code.)
This may change if the called function is declared as a weak symbol: in that case, although the function is defined in the current translation unit, that definition may be overridden by one from another module at link time, so the compiler has to treat it as a call to a function in another module.
It depends on the symbol's linkage. If f is an internal function, such as one declared/defined with static, then f is resolved at compile time (by the compiler). If f is a weak symbol, then it is resolved at dynamic link time (by the dynamic loader). If f is a strong symbol, then it is resolved at compile time (by the compiler).
In particular, when the program is compiled with optimization, f may be inlined directly into the caller's body, which is done by the compiler.

manually setting function address gcc

I've got a working binary used in an embedded system. Now I want to write some kind of patch for it. The patch will be loaded into RAM below the main program and then called from the main program. The question is how to tell gcc to use manually set addresses for some functions which will be used from the patch. In other words:
The old code has a function sin(), and I could use nm to find out the address of sin() in the old code. My patched code will use sin() (or something else from the main program), and I want to tell gcc (or maybe ld, or maybe something else) to use the static address of the function sin() while linking the patched code. Is this possible?
The problem is that you would have to replace all references to the original sin() function for the patched code. That would require the runtime system to contain all the object-code data used to resolve references, and for the original code to be modifiable (i.e. not in ROM, for example).
Windriver's RTOS VxWorks can do something close to what you are suggesting; the way it does it is you use "partial linking" (GNU linker option -r) to generate an object file with links that will be resolved at runtime - this allows an object file to be created with unresolved links - i.e. an incomplete executable. VxWorks itself contains a loader and runtime "linker" that can dynamically load partially linked object files and resolve references. A loaded object file however must be resolvable entirely using already loaded object code - so no circular dependencies, and in your example you would have to reload/restart the system so that the object file containing the sin() were loaded before those that reference it, otherwise only those loaded after would use the new implementation.
So if you were to use VxWorks (or an OS with similar capabilities), the solution is perhaps simple, if not you would have to implement your own loader/linker, which is of course possible, but not trivial.
Another, perhaps simpler possibility is to have all your code call functions through pointers that you hold in variables, so that all calls (or at least all calls you might want to replace) are resolved at runtime. You would have to load the patch and then modify the sin() function's pointer so that all calls thereafter are made to the new function. The problem with this approach is that you would either have to know a priori which functions you might later want to replace, or have all functions called that way (which may be prohibitively expensive in memory terms). It would perhaps be useful for this solution to have some sort of preprocessor or code generator that would allow you to mark functions as "dynamic" in this way and could automatically generate the pointers and calling code. So for example you might write code thus:
__dynamic void myFunction( void ) ;
...
myFunction() ;
and your custom preprocessor would generate:
void myFunction( void ) ;
void (*__dynamic_myFunction)(void) = myFunction ;
...
__dynamic_myFunction() ;
then your patch/loader code would reassign __dynamic_myFunction with the address of the replacement function.
You could generate a "dynamic symbol table" containing just the names and addresses of the __dynamic_xxxxx symbols and include that in your application, so that a loader could change the __dynamic_xxxxx variables by matching the xxxxx name with the symbols in the loaded object file. If you load a plain binary, however, you would have to provide the link information to the loader yourself - i.e. which __dynamic_xxxxx variable is to be reassigned and the address to assign to it.

Resources