How do dynamic linkers/loaders resolve symbols? - linker

I'm a CS student, and I'm doing a project on shared libraries and dynamic linking/loading. One of the questions I have to answer is how symbols are resolved with dynamic linking/loading. I've scoured the internet and haven't been able to find anything conclusive. I understand different linkers may resolve symbols differently across different operating systems. I'm just looking for a general, Windows-based answer: how are symbols resolved in dynamic linking?
Thank You!

Well, let's stick with Windows. I'll answer in a few words and then vote to move this question to the CS site instead of main SO.
First, dynamic linking can happen at program start (the prelinked variant) or when program code explicitly requests that a library be loaded. The same DLLs are used in both cases, but the details differ.
The prelinked variant works with a so-called import library, which is static but contains special thunks (AKA trampolines) that end up jumping to the real code once the dynamic loader attaches the real library (the DLL file) and fills in the import table with the actual addresses. For linkers that aren't aware of dynamic loading, an import library is enough to still provide dynamic linking; that is how it appeared in DOS/Windows. The calling code can stay ignorant of whether the code is provided by a static or a dynamic library.
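As a rough illustration of the prelinked variant, here is a toy model in C. It is not the real Windows loader, and every name in it (`export_entry`, `import_slot`, `bind_imports`, `lib_add`, `lib_mul`) is made up for this sketch. The calling code only ever calls the thunks; a loader-like routine fills each import slot by looking the name up in the library's export list.

```c
#include <stddef.h>
#include <string.h>

typedef int (*binop_fn)(int, int);

/* --- The "DLL": real implementations plus a table of exported names --- */
static int real_add(int a, int b) { return a + b; }
static int real_mul(int a, int b) { return a * b; }

struct export_entry { const char *name; binop_fn addr; };

static const struct export_entry dll_exports[] = {
    { "lib_add", real_add },
    { "lib_mul", real_mul },
};

/* --- The "executable": import slots that start out unresolved --- */
struct import_slot { const char *name; binop_fn addr; };

static struct import_slot import_table[] = {
    { "lib_add", NULL },
    { "lib_mul", NULL },
};

/* Thunks: the part an import library would contribute. Callers link
   against these statically; each one only jumps through its slot. */
int lib_add(int a, int b) { return import_table[0].addr(a, b); }
int lib_mul(int a, int b) { return import_table[1].addr(a, b); }

/* The "loader": resolve every import by name against the export list.
   Returns the number of imports that could not be resolved. */
int bind_imports(void) {
    int unresolved = 0;
    size_t n_imports = sizeof import_table / sizeof import_table[0];
    size_t n_exports = sizeof dll_exports / sizeof dll_exports[0];

    for (size_t i = 0; i < n_imports; i++) {
        import_table[i].addr = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(import_table[i].name, dll_exports[j].name) == 0) {
                import_table[i].addr = dll_exports[j].addr;
                break;
            }
        }
        if (import_table[i].addr == NULL)
            unresolved++;
    }
    return unresolved;
}
```

In this model `bind_imports()` has to run once before any thunk is called; a real loader performs the equivalent step before the program's entry point runs.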
On-the-fly loading uses functions like LoadLibrary to load (and activate) the library and GetProcAddress to get a pointer to the real function implementation. In that case, the application (or another library) is aware of the details of this mechanism.
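A minimal sketch of the on-the-fly variant follows. The helper names (`load_library`, `find_symbol`, `unload_library`) are invented for this example. On Windows they call LoadLibraryA, GetProcAddress and FreeLibrary; elsewhere they fall back to the POSIX equivalents dlopen, dlsym and dlclose so that the same code can be tried on Linux.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;
#else
#include <dlfcn.h>
typedef void *lib_handle;
#endif

/* Generic function-pointer type; the caller casts the result to the
   real signature of the function it asked for. */
typedef void (*generic_fn)(void);

/* Load a library by file name. Returns NULL on failure. */
lib_handle load_library(const char *path) {
#ifdef _WIN32
    return LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW);
#endif
}

/* Look up an exported function by name. Returns NULL if the handle is
   NULL or the library does not export that name. */
generic_fn find_symbol(lib_handle lib, const char *name) {
    if (lib == NULL)
        return NULL;
#ifdef _WIN32
    return (generic_fn)GetProcAddress(lib, name);
#else
    return (generic_fn)dlsym(lib, name);
#endif
}

/* Release the library again; safe to call with NULL. */
void unload_library(lib_handle lib) {
    if (lib == NULL)
        return;
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}
```

On Windows one could, for example, load "user32.dll", look up "MessageBoxA", and cast the returned pointer to the matching function type before calling it. Unlike the prelinked case, the program must handle a missing library or a missing symbol itself.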
I hope this brief overview gives you enough keywords for further digging.

Related

Instrumenting a C library

I have a binary library and a binary executable using that library, both written in C. I know the C API provided by the library, but I have the source of neither the library nor the executable. I would like to understand how the executable uses the library (compare my previous question How to know which functions of a library get called by a program).
The proposed solutions did not give satisfactory results. A possibility not mentioned seems to be to implement a wrapper library that imitates the known interface of the binary library I am interested in. My idea is to forward all of the calls to the wrapper to the binary library. This should allow me to log all the calls and passed parameters, in other words to instrument the library.
I succeeded in implementing the wrapper library on Linux as a dynamic link library (*.so), together with my own sample application connecting to the wrapper. The wrapper, in turn, uses the original binary library. Both libraries are accessed with dlopen and dlsym. However, I am facing the following practical problem: I cannot get the original binary executable to link against my wrapper library. That is related to the fact that the executable expects the library under a certain name. However, if I give my wrapper library that name, it conflicts with the original library. Surprisingly (to me), simply renaming the .so file of the original and linking the wrapper library against it does not work. (The program stops without an error message when the wrapper library calls dlopen, and the debugger tells me no more than that it seems to happen in a malloc.)
I tried a number of things: using symbolic links to move one of the libraries out of the search path of the run-time linker, adding paths to the LD_LIBRARY_PATH environment variable, using different relative locations of the .so files (and corresponding paths for dlopen), and trying different compiler options, so far without success.
To summarize, I would like
(executable)_orig->(lib.so)->(lib.so)_orig
where (executable)_orig and (lib.so)_orig (both binaries that I cannot influence) are such that
(executable)_orig->(lib.so)_orig
works. I have the sources of (lib.so) and can modify it as I wish. Also, I can modify the Linux host system as I like. The task of (lib.so) is to tell me how (executable)_orig and (lib.so)_orig interact.
I also have
(executable)->(wrapper.so)->(lib.so)_orig
working, which seems to indicate that the issue is related to the naming and loading conventions for the libraries.
This is a separate new question because it deals with the specific practical issue sketched above. Beyond that, some background info on why renaming the file corresponding to (lib.so)_orig to circumvent the issue may fail could also prove useful.
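The forwarding idea described in the question can be sketched roughly as below. This is only an illustration of the logging wrapper, not a fix for the naming conflict: the file name `./liborig_real.so`, the API function `api_compute`, and the helper `resolve_original` are all hypothetical, and returning -1 on failure is a placeholder.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical location of the original (renamed) library. */
#define ORIG_LIB_PATH "./liborig_real.so"

/* Hypothetical signature of one function from the known C API. */
typedef int (*api_compute_fn)(int);

static void *orig_handle;

/* Look up `name` in the original library, opening it on first use.
   Returns NULL (and logs the reason) if anything fails. */
void *resolve_original(const char *name) {
    if (orig_handle == NULL) {
        orig_handle = dlopen(ORIG_LIB_PATH, RTLD_NOW | RTLD_LOCAL);
        if (orig_handle == NULL) {
            const char *err = dlerror();
            fprintf(stderr, "wrapper: dlopen failed: %s\n",
                    err ? err : "unknown error");
            return NULL;
        }
    }
    void *sym = dlsym(orig_handle, name);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "wrapper: dlsym(%s) failed: %s\n",
                name, err ? err : "symbol is NULL");
    }
    return sym;
}

/* Exported under the same name as the original API function, so that
   callers reach the wrapper first. Logs the call, then forwards it. */
int api_compute(int x) {
    static api_compute_fn real;
    if (real == NULL)
        real = (api_compute_fn)resolve_original("api_compute");
    if (real == NULL)
        return -1; /* placeholder error value for this sketch */

    fprintf(stderr, "api_compute(%d) called\n", x);
    int result = real(x);
    fprintf(stderr, "api_compute(%d) -> %d\n", x, result);
    return result;
}
```

Each API function would get one such forwarder. The pointer is resolved lazily so that the wrapper does nothing at load time beyond being mapped.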

user space library in Linux kernel module - for testing

I know you shouldn't and it is bad practice etc., but is it possible to include a C userspace library in a kernel module?
I am writing the module for my own purposes to test some things and it will never be published or used by anyone else. I just want a quick hack not worrying about good practices.
(specifically I would like to use the __int128 datatype provided by gcc included in <stdint.h>)
Thanks
A C library is first of all a collection of functions; there is no real distinction between "user-space" and "kernel-space". However, using a dynamic shared library is not directly possible in kernel space, since there is no suitable dynamic loader for this. Indeed, loading the kernel module itself is a kind of dynamic loading, but the module itself cannot load another shared library in turn.
However, it should be possible to link code from a static library (.a) into your kernel module. This code then becomes an integral part of your kernel module and should work in kernel space as in userspace, as long as it doesn't depend on external symbols that are not present in kernel space (especially symbols of the libc, for example).
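On the specific type mentioned in the question: `__int128` is a type built into GCC on targets that support it (typically 64-bit ones), not something declared in `<stdint.h>`, so using it does not require pulling in a library. A minimal userspace sketch, with a made-up function name `mul_64x64` (kernel code would take its fixed-width types from kernel headers instead):

```c
#include <stdint.h> /* only for uint64_t; __int128 itself needs no header */

/* Multiply two 64-bit values and return the full 128-bit product,
   split into a low half (return value) and a high half (*high). */
uint64_t mul_64x64(uint64_t a, uint64_t b, uint64_t *high) {
    unsigned __int128 product = (unsigned __int128)a * b;
    *high = (uint64_t)(product >> 64);
    return (uint64_t)product;
}
```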

linking, loading, and virtual memory

I know these questions have been asked before - but I still can't reconcile everything together into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries only have the required modules copied and linked into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC as they are a part of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does a shared library refer to? I've heard it being used as (1) the overall umbrella term, synonymous to library, (2) directly to a dynamic library, (3) using virtual memory to map the same physical memory of a library to multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, libstdc++, ..) linked dynamically or statically by default? I never need to dlopen()..
static vs dynamic linking
how is this any different than static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static vs dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into MM before executing it
Dynamic loading means that only the executable header is copied into MM before executing; additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here, and I'm not necessarily looking for someone to address each issue. I'm hoping that by listing everything that confuses me, someone who understands this will see where my understanding lapses at a broad level and be able to paint a larger picture of how these things cooperate.
why 2 types of lib loading
dynamic saves space (you don't have hundreds of copies of the same code in all binaries using foo.lib)
dynamic lets the foo.lib vendor ship a new version of the library, and existing code takes advantage of it
static makes dependency management easier - in theory a binary can be one file
What is 'shared library'
Unix name for a dynamic library. Windows calls it a DLL
Are standard libraries static or dynamic
It depends on the platform. On some you can choose; on others it's chosen for you. For example, on Windows there are compiler switches to say whether you want static or dynamic runtimes. Note: don't confuse dynamic library usage with dlopen (see later)
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. Typically a static library is input to the linker just like any other compile unit. A dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app
Symbol resolution is finalized at load time for a DLL
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not even have existed when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine, for example, how a scripting language can load and call an arbitrary method
To use a .so on Unix you don't need to use dlopen. You can have it loaded for you (same on Windows). To really load a shared lib / DLL dynamically you need dlopen or LoadLibrary
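The difference between "loaded for you" and "really dynamically loaded" can be sketched as follows, assuming a dynamically linked program on a POSIX system. Both functions end up in the C library's strlen; the names `implicit_strlen` and `explicit_strlen` are made up for this example. Passing NULL as the file name to dlopen yields a handle for the running program and the libraries already loaded into it, so no particular library file name is assumed.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Implicit use: an ordinary call. The linker records the dependency on
   the C library and the loader binds the symbol; no dlopen is needed. */
size_t implicit_strlen(const char *s) {
    return strlen(s);
}

/* Explicit use: ask the loader for the symbol by name at run time.
   Returns (size_t)-1 if the lookup fails. */
size_t explicit_strlen(const char *s) {
    typedef size_t (*strlen_fn)(const char *);

    /* NULL file name: handle for the program's global symbols. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return (size_t)-1;

    strlen_fn fn = (strlen_fn)dlsym(self, "strlen");
    size_t n = fn ? fn(s) : (size_t)-1;

    dlclose(self);
    return n;
}
```

In the explicit version the function name is just a string, which is what makes plugins and scripting-language bindings possible: the name can come from a config file or user input rather than from the source code.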
Note that statically linked libraries load faster, since there is less disk searching for all the runtime library files. If the libraries are small, and very unusual, probably better to link statically. If there are serious version dependencies / functional differences like MFC, the DLLs need different names.

Hiding a library within a library

Here's the situation. I have an old legacy library that is broken in many places, but has a lot of important code built in (we do not have the source, just the lib + headers). The functions exposed by this library have to be handled in a "special" way, some post and pre-processing or things go bad. What I'm thinking is to create another library that uses this old library, and exposes a new set of functions that are "safe".
I quickly tried creating this new library, and linked that into the main program. However, it still links to the symbols in the old library that are exposed through the new library.
One thing would obviously be to ask people not to use these functions, but if I could hide them through some way, only exposing the safe functions, that would be even better.
Is it possible? Alternatives?
(it's running on an ARM microcontroller. the file format is ELF, and the OS is an RTOS from Keil, using their compiler)
[update]
Here's what I ended up doing: I created dummy functions within the new library that use the same prototypes as the ones in the old. I linked the new library into the main program, and if the other developers try to use the "bad" functions from the old library it will break the build with a "Symbol abcd multiply defined (by old_lib.o and new_lib.o)." Good enough for government work...
[update2]
I actually found out that I can manually hide components of a library when linking them in through the IDE =P, a much better solution. Sorry for taking up space here.
If you're using the GNU binutils, objcopy can prefix all symbols with a string of your choice. Just use objcopy --prefix-symbols=brokenlib_ old.so new.so (be careful: omitting new.so will cause old.so to be overwritten!)
Now you use brokenlib_foo() to call the original version of foo().
If you use libtool to compile and link the library instead of ld, you can provide -export-symbols to control the output symbols, but this will only work if your old library can be statically linked. If it is dynamically linked (.so, .dylib, or .dll), this will not be possible.
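Once the legacy symbols carry a prefix, the "safe" layer from the question could look roughly like this. Everything here is hypothetical: `brokenlib_sum` stands in for a renamed legacy function (its body below exists only so the sketch compiles; in a real build the definition comes from the prefixed library), and the specific pre- and post-processing steps are assumptions.

```c
#include <stddef.h>

/* Stand-in for the legacy function after renaming. In a real build this
   symbol is provided by the prefixed library, not written by you. */
int brokenlib_sum(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Safe wrapper: validate the arguments (pre-processing), call the legacy
   code, then normalize the result (post-processing).
   Returns 0 on success and -1 if the input is rejected. */
int safe_sum(const int *values, size_t count, int *out) {
    if (values == NULL || out == NULL || count == 0)
        return -1; /* inputs the old code is assumed to mishandle */

    int raw = brokenlib_sum(values, count);
    *out = raw < 0 ? 0 : raw; /* assumed fix-up: clamp negative results */
    return 0;
}
```

Only `safe_sum` would be declared in the public header, so callers that stick to that header never see the prefixed names.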

Program location in the memory and static/shared libraries

When I run a program (in Linux), does it all get loaded into physical memory? If so, does using shared libraries, instead of static libraries, help in terms of caching? In general, when should I use shared libraries and when should I use static libraries? My code is written in either C or C++, if that matters.
This article covers some decent ground on what you want. This article goes much deeper into the advantages of shared libraries.
SO also has covered this topic in depth
Difference between static and shared libraries?
When to use dynamic vs. static libraries
Almost all the above mentioned articles are shared library biased. Wikipedia tries to rescue static libraries :)
From wiki,
There are several advantages to statically linking libraries with an
executable instead of dynamically linking them. The most significant
is that the application can be certain that all its libraries are
present and that they are the correct version. This avoids dependency
problems. Usually, static linking will result in a significant
performance improvement.
Static linking can also allow the application
to be contained in a single executable file, simplifying distribution
and installation.
With static linking, it is enough to include those
parts of the library that are directly and indirectly referenced by
the target executable (or target library).
With dynamic libraries, the
entire library is loaded, as it is not known in advance which
functions will be invoked by applications. Whether this advantage is
significant in practice depends on the structure of the library.
Shared libraries are used mostly when you have functionality that could be used and "shared" across different programs. In that case, you will have a single point where all the programs will get their methods. However, this creates a dependency problem since now your compiled programs are dependent on that specific version of the library.
Static libraries are used mostly when you don't want to have dependency issues and don't want your program to care which X or Y libraries are installed on your target system.
So, which one to use? To decide, you should answer the following questions:
Will your program be used on different platforms or Linux distributions? (e.g. Red Hat, Debian, SLES11-SP1)
Do you have replicated code that is being used by different binaries?
Do you envision that in the future other programs could benefit from your work?
I think this is a case by case decision, and it is not a one size fits all kind of answer.

Resources