At dynamic linking, does the dynamic loader look at all object files for definitions, or only at those specified by the executable? - c

So I'm trying to wrap my head around static and dynamic linking. There are many resources on SO and on the web. I think I pretty much get it, but there's still one thing that seems to bother me. Also, please correct me if my overall understanding is wrong.
I think I understand static linking:
The linker unpacks the linked libraries, and actually includes the libraries' object files inside the produced executable. The unresolved-stubs in the application object files are then replaced by actual function-calling code, which calls functions in addresses known at build time.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime.
Then at runtime, the dynamic loader of the OS would look through precompiled libraries stored at standard filesystem locations. It would look in the object-files of the libraries, inside their symbol tables (?) and try to find a matching function definition for each unresolved-stub. It would then load the matching object-files into memory, and replace the stubs to point to the function definitions.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory? Or does it only look in object-files specified somewhere in the application-executable file? Is this the reason why at compile time we must specify all dynamic dependencies of our program? Also, is it true dynamic libraries expose a symbol-table too?

So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory?
No dynamic linker I'm aware of does this.
Or does it only look in object-files specified somewhere in the application-executable file?
Nor exactly this, either.
Details vary, but generally, a dynamic linker looks for specific shared libraries by name in various directories. The directories searched may be built into the linker, specified by the operating system, specified in the object being linked, or a combination. The linker does not (generally) examine libraries' symbol tables until after it locates them by name and selects them for linking.
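To make the "locate by name first" step concrete, here is a minimal sketch of such a search in C. It is not how any real dynamic linker is implemented; find_library and the directory list passed to it are hypothetical names invented for illustration. The point is only that the lookup is by file name, in directory order, and that no symbol table is consulted at this stage.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: look for `name` in each directory of `dirs`, in order.
 * On success, writes the full path into `out` and returns the index of the
 * directory that matched. Returns -1 if no directory contains the file. */
int find_library(const char *name, const char *const dirs[], int ndirs,
                 char *out, size_t outlen)
{
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outlen)
            continue;               /* path did not fit; skip this directory */
        if (access(out, R_OK) == 0)
            return i;               /* first match wins; later dirs are ignored */
    }
    if (outlen > 0)
        out[0] = '\0';              /* leave an empty string when nothing matched */
    return -1;
}
```

A caller would pass something like the directories /usr/lib and /usr/local/lib together with a name such as libfoo.so.1, and only then open the file that was found to read its symbol table.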
Is this the reason why at compile time we must specify all dynamic dependencies of our program?
Yes, though under some circumstances we do not need to specify all dynamic dependencies at compile time. Some dynamic linkers support on-demand dynamic loading as directed by the program itself. This can be used to implement plugin systems, among other purposes.
Also, is it true dynamic libraries expose a symbol-table too?
Yes. Dynamic libraries have their own symbol tables because (1) the dynamic linker uses them to do its work, and (2) dynamic libraries can have their own dynamic linking requirements, which are not necessarily reflected in the main program's.
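One way to observe this from inside a program is to ask the dynamic linker to look a name up in the symbol tables of whatever is already loaded. The sketch below assumes a POSIX system that provides dlopen and dlsym (older glibc versions need -ldl at link time); symbol_is_visible is a hypothetical helper name, not a library function.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if `name` can be resolved through the symbol tables of the
 * objects already loaded into this process (the main program plus the
 * shared libraries loaded at startup), 0 otherwise. */
int symbol_is_visible(const char *name)
{
    /* A NULL file name yields a handle for the main program; lookups through
     * it also search the libraries that were loaded at program startup. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 0;
    }
    int found = dlsym(handle, name) != NULL;
    dlclose(handle);
    return found;
}
```

In a dynamically linked program, a libc function such as printf should be found this way, while a function from a library the program was never linked against should not be, even if that library's file is sitting in a system directory.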

In the normal usage, "dynamic linking" is performed by the loader. "Static linking" is performed by the linker.
Generally, linkers can create either executable files or shared libraries. The linker output for both is an instruction stream that tells the loaders how to place the executable or library in memory.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime
That is not [usually] correct. The linker will locate the shared library in which the symbol exists. The executable will have an instruction to find the symbol in that shared library. Linkers generally puke if they cannot find all the symbols that need to be resolved.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory?
This is a system-specific question. In well-designed operating systems, the shared libraries are designated by the system manager. The loader uses the library specified by the system. Poorly designed systems frequently use some kind of search path to find the shared libraries (which creates a massive security hole).
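If you want to see which shared libraries the loader actually picked for a running process, glibc offers dl_iterate_phdr. The following is a rough sketch that assumes Linux with glibc (dl_iterate_phdr is a GNU extension declared in link.h); list_loaded_objects and print_object are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for dl_iterate_phdr in <link.h> */
#endif
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object (main program, shared libraries). */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* The main program is reported with an empty name. */
    printf("%2d: %s\n", *count,
           info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    (*count)++;
    return 0;                   /* returning non-zero would stop the iteration */
}

/* Prints every object currently mapped into the process; returns the count. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

The list normally contains only the executable, the libraries it named at link time (plus their own dependencies), and anything loaded later on request, which is a small subset of what is installed on the system.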

Related

Does everything that may end up in a shared library always need to be compiled with -fPIC?

I'm building a shared library. I need only one function in it to be public.
The shared library is built from a few object files and several static libraries. The linker complains that everything should be built with -fPIC. All the object files and most static libraries were built without this option.
This makes me ask a number of questions:
Do I have to rebuild every object file and every static library I need for this dynamic lib with -fPIC? Is it the only way?
The linker must be able to relocate object files statically, during linking. Correct? Otherwise if object files used hardcoded constant addresses they could overlap with each other. Shouldn't this mean that the linker has all the information necessary to create the global offset table for each object file and everything else needed to create a shared library?
Should I always use -fPIC for everything in the future as a default option, just in case something may be needed by a dynamic library some day?
I'm working on Linux on x86_64 currently, but I'm interested in answers about any platform.
You did not say which platform you use but on Linux it's a requirement to compile object files that go into your library as position independent code (PIC). This includes static libraries at least in practice.
Yes. See "Load-time relocation of shared libraries" and "Position Independent Code (PIC) in shared libraries".
I only use -fPIC when compiling object files that go into libraries, to avoid unnecessary overhead.

To use or not to use -fpic

My application needs to load one or more algorithms at run time and I use .so for this. The thing is that these libraries are not used by any other process but my application, so there is no need to share the .text section with others. Some parts of the .so come from other static libraries that I compile beforehand.
In this case, do I still have to use -fpic flag for the static files?
EDIT
I found this article. On page 7 it states: "So, if performance is important for a library or dynamically loadable module, you can compile it as non-PIC code. The primary downside to compiling the module as non-PIC is that loading time increases because the dynamic linker must make a large number of code patches when binding symbols."
Yes you do. Anything that will be loaded with dlopen must be compiled using -fpic (or -fPIC).
This is not about sharing the text segment, but about the different rules for accessing global data (including things that you might not realize are global data, such as the "procedure linkage table" trampolines used to call between global functions) in the main executable versus in shared libraries.

Can I force a dynamic library to link to a specific dynamic library dependency?

I'm building a dynamic library, libfoo.so, which depends on libcrypto.so.
Within my autotools Makefile.am file, I have a line like this:
libfoo_la_LIBADD += -L${OPENSSL_DIR}/lib -lcrypto
where $OPENSSL_DIR defaults to /usr but can be overridden by passing --with-openssl-dir=/whatever.
How can I ensure that an executable using libfoo.so uses ${OPENSSL_DIR}/lib/libcrypto.so (only) without the person building or running the executable having to use rpath or fiddle with LD_LIBRARY_PATH?
As things stand, I can build libfoo and pass --with-openssl-dir=/usr/local/openssl-special and it builds fine. But when I run ldd libfoo.so, it just points to the libcrypto.so in /usr/lib.
The only solution I can think of is statically linking libcrypto.a into libfoo.so. Is there any other approach possible?
Details of runtime dynamic linking vary from platform to platform. The Autotools can insulate you from that to an extent, but if you care about the details, which apparently you do, then it probably is not adequate to allow the Autotools to choose for you.
With that said, however, you seem to be ruling out just about all possibilities:
The most reliable way to ensure that at runtime you get the specific implementation you linked against at build time is to link statically. But you say you don't want that.
If you instead use dynamic libraries then you rely on the dynamic linker to associate a library implementation with your executable at run time. In that case, there are two general choices for how you can direct the DL to a specific library implementation:
Via information stored in the program / library binary. You are using terminology that suggests an ELF-based system, and for ELF shared objects, it is the RPATH and / or RUNPATH that convey information about where to look for required libraries. There is no path information associated with individual library requirements; they are identified by SONAME only. But you say you don't want to use RPATH*, and so I suppose not RUNPATH either.
Via static or dynamic configuration of the dynamic linker. This is where LD_LIBRARY_PATH comes in, but you say you don't want to use that. The dynamic linker typically also has a configuration file or files, such as /etc/ld.so.conf. There you can specify library directories to search, and, with a bit of care, the order to search them.
Possibly, then, you can cause your desired library implementation to be linked to your application by updating the dynamic linker's configuration files to cause it to search the wanted path first. This will affect the whole system, however, and it's brittle.
Alternatively, depending on details of the nature of the dependency, you could give your wanted version of libcrypto a distinct SONAME. Effectively, that would make it a different object (e.g. libdjcrypto) as far as the static and dynamic linkers are concerned. But that is risky, because if your library has both direct and indirect dependencies on libcrypto, or if a program using your library depends on libcrypto via another path, then you'll end up at run time (dynamically) linking both libraries, and possibly even using functions from both, depending on the origin of each call.
Note well that the above issue should be a concern for you if you link your library statically, too. If that leaves any indirect dynamic dependencies on libcrypto in your library, or any dynamic dependencies from other sources in programs using your library, then you will end up with multiple versions of libcrypto in use at the same time.
Bottom line
For an executable, the best options are either (1) all-static linkage or (2) (for ELF) RPATH / LD_LIBRARY_PATH / RUNPATH, ensuring that all components require the target library via the same SONAME. I tend to like providing a wrapper script that sets LD_LIBRARY_PATH, so that its effect is narrowly scoped.
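The wrapper idea can also be written as a tiny launcher program instead of a shell script. Here is a minimal sketch in C, assuming a POSIX system with setenv and execv; prepend_library_dir and run_with_library_dir are hypothetical names. The change is visible only to the launched program, because the dynamic linker reads LD_LIBRARY_PATH when that program starts.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Prepends `dir` to LD_LIBRARY_PATH in this process's environment.
 * Returns 0 on success, -1 on failure. */
int prepend_library_dir(const char *dir)
{
    const char *old = getenv("LD_LIBRARY_PATH");
    if (old == NULL || old[0] == '\0')
        return setenv("LD_LIBRARY_PATH", dir, 1);

    size_t len = strlen(dir) + 1 + strlen(old) + 1;   /* dir ':' old '\0' */
    char *value = malloc(len);
    if (value == NULL)
        return -1;
    snprintf(value, len, "%s:%s", dir, old);
    int rc = setenv("LD_LIBRARY_PATH", value, 1);     /* setenv copies the string */
    free(value);
    return rc;
}

/* Replaces the current process with `program`, which then sees the modified
 * LD_LIBRARY_PATH. Only returns (with -1) if something failed. */
int run_with_library_dir(const char *dir, const char *program,
                         char *const argv[])
{
    if (prepend_library_dir(dir) != 0)
        return -1;
    execv(program, argv);
    perror("execv");            /* reached only when execv itself failed */
    return -1;
}
```

With the directory from the question, a launcher's main would call run_with_library_dir with /usr/local/openssl-special/lib and the path of the real executable, leaving the rest of the system untouched.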
For a reusable library, "don't do that" is probably the best alternative. The high potential for ending up with programs simultaneously using two different versions of the other library (libcrypto in your case) makes all available options unattractive. Unless, of course, you're ok with multiple library versions being used by the same program, in which case static linkage and RPATH / RUNPATH (but not LD_LIBRARY_PATH) are your best available alternatives.
*Note that at least some versions of libtool have a habit of adding RPATH entries whether you ask for them or not -- something to watch out for. You may need to patch the libtool scripts installed in your project to avoid that.

How does the OS find the shared library path in two different kinds of linking: run-time linking (loading) and compile-time linking of a shared library in Linux?

I am a little confused about how shared libraries and the OS work together.
1st question: how does the OS manage shared libraries? How are they uniquely identified: by file name, by something else (say, an ID), or by full path?
2nd question: I know that when we compile and link code, the linker needs access to the shared library (.so) to perform linking. Later, when we execute the compiled program, the OS loads the shared library, and these libraries may be in different locations (am I wrong?). But I do not understand how the OS knows where to look for the shared library. Is library information (name? path? something else?) coded in the executable?
When compiling a program, libraries (other than the language runtime) must be explicitly specified in the build, otherwise they will not be included. There are some standard library directories, so for example you can specify -lfoo, and it will automatically look for libfoo.a or libfoo.so in the various usual directories like /usr/lib, /usr/local/lib etc.
Note, however, that a name like libfoo.so is usually a symlink to the actual library file name, which might be something like libfoo.so.1. This way, if there needs to be a backward-incompatible change to the ABI (the layout of some structure might change, say), then the new version of the library becomes libfoo.so.2, and binaries linked against the old version remain unaffected.
So the linker follows the symlink, and inserts a reference to the versioned name libfoo.so.1 into the executable, instead of the unversioned name libfoo.so. (Strictly speaking, on ELF systems it records the SONAME stored inside the library, which by convention matches the versioned file name.) It can also insert the full path, but this is usually not done. Instead, when the executable is run, there is a system search path, as configured in your systemwide /etc/ld.so.conf, that is used to find the library.
(Actually, ld.so.conf is just the human-readable source for your library search paths; this is compiled into binary form in /etc/ld.so.cache for speed, using the ldconfig command. This is why you need to run ldconfig every time you make changes to the shareable libraries on your system.)
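To watch the name-to-path step happen, you can hand the dynamic linker a bare versioned name and ask where it found the file. This is only a sketch, assuming Linux with glibc: dlinfo with RTLD_DI_LINKMAP is a GNU extension, libm.so.6 is the usual glibc name of the math library, and resolve_soname is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for dlinfo and RTLD_DI_LINKMAP */
#endif
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Loads the library known only by `soname` (no directory part) and copies
 * the path where the dynamic linker found it into `out`.
 * Returns 0 on success, -1 if the library could not be loaded or queried. */
int resolve_soname(const char *soname, char *out, size_t outlen)
{
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    struct link_map *map = NULL;
    int rc = -1;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL) {
        snprintf(out, outlen, "%s", map->l_name);   /* path chosen by the search */
        rc = 0;
    }
    dlclose(handle);
    return rc;
}
```

Calling resolve_soname with "libm.so.6" should report a full path under one of the system library directories, although the exact directory differs between distributions.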
That’s a very simplified explanation of what is going on. There is a whole lot more that is not covered here. Here and here are some reference docs that might be useful on the build process. And here is a description of the system executable loader.

linking, loading, and virtual memory

I know these questions have been asked before - but I still can't reconcile everything together into an overall picture.
static vs dynamic library
static libraries have their code copied and linked into the resulting executable
static libraries only copy and link the required modules into the executable, not the entire library implementation
static libraries don't need to be compiled as PIC as they are a part of the resulting executable
dynamic libraries copy and link in stubs that describe how to load/link (?) the function implementation at runtime
dynamic libraries can be PIC or relocatable
why are there separate static and dynamic libraries? All of the above seems to be the job of the static or dynamic linker. Why do I need 2 libraries that implement scanf?
(bonus #1) what does a shared library refer to? I've heard it being used as (1) the overall umbrella term, synonymous to library, (2) directly to a dynamic library, (3) using virtual memory to map the same physical memory of a library to multiple address spaces. Can you do this only with dynamic libraries? (4) having different versions of the same dynamic library in memory.
(bonus #2) are the standard libraries (libc, libc++, libstdc++, ..) linked dynamically or statically by default? I never need to dlopen()..
static vs dynamic linking
how is this any different than static vs dynamic libraries? I don't understand why there isn't just 1 library, and we use either a static or dynamic linker (other than the PIC issue). Instead of talking about static vs dynamic libraries, should we instead be discussing the more general static vs dynamic linking?
is symbol resolution still performed at compile-time for both?
static vs dynamic loading
Static loading means copying the full executable into MM before executing it
Dynamic loading means that only the executable header copied into MM before executing, additional functionality is loaded into MM when requested. How is this any different from paging?
If the executable is dynamically linked, why would it not be dynamically loaded?
both static loading and dynamic loading may or may not perform relocation
I know there are a lot of things I'm confused about here - and I'm not necessarily looking for someone to address each issue. I'm hoping by listing out everything that is confusing me, that someone that understands this will see where a lapse in my understanding is at a broad level, and be able to paint a larger picture about how these things cooperate together.
why 2 types of lib loading
dynamic saves space (you don't have hundreds of copies of the same code in all binaries using foo.lib)
dynamic allows the foo.lib vendor to ship a new version of the library, and existing code takes advantage of it
static makes dependency management easier - in theory a binary can be one file
What is 'shared library'
unix name for dynamic library. Windows calls it DLL
Are standard libraries static or dynamic
depends on platform. On some you can choose; on others it's chosen for you. For example, on Windows there are compiler switches to say whether you want static or dynamic runtimes. Note: don't confuse dynamic library usage with dlopen - see later
'why we talk about 2 different types of library'
Typically a static library is in a different format from a dynamic one. Typically a static library is input to the linker just like any other compile unit. A dynamic library is typically output by the linker. They are used differently even though they both deliver the same chunk of code to your app
Symbol resolution is finalized at load time for a DLL
Full dynamic loading. This is the realm of dlopen. This is where you want to call entry points in a library that might not even have existed when you compiled. Use cases:
plugins that conform to a well known interface but there can be many implementations (PAM and NSS are good examples). The app chooses to load one or more implementations from specified files at run time
an app needs to load a library and call an arbitrary function. Imagine, for example, how a scripting language can load and call an arbitrary method
To use a .so on unix you don't need to use dlopen. You can have it loaded for you (same on Windows). To really dynamically load a shared lib / DLL you need dlopen or LoadLibrary
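For the plugin case, the loading code might look roughly like the sketch below. It assumes a POSIX system with dlopen (older glibc versions need -ldl), and the plugin contract is made up for this example: the entry point name plugin_run, its signature, and the helper run_plugin are all hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical plugin contract: each plugin exports
 *     int plugin_run(int input);
 * The symbol name and signature are assumptions made for this sketch. */
typedef int (*plugin_run_fn)(int);

/* Loads the plugin at `path` and calls its entry point with `input`.
 * Stores the result in *result and returns 0, or returns -1 on failure. */
int run_plugin(const char *path, int input, int *result)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return -1;
    }

    /* Look the entry point up by name in the plugin's own symbol table. */
    plugin_run_fn run = (plugin_run_fn)dlsym(handle, "plugin_run");
    if (run == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s has no usable plugin_run: %s\n", path,
                err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *result = run(input);
    dlclose(handle);
    return 0;
}
```

The app never mentions the plugin at compile time; it only needs the path at run time, which is what separates this from a .so that the loader brings in automatically.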
Note that statically linked libraries load faster, since there is less disk searching for all the runtime library files. If the libraries are small, and very unusual, probably better to link statically. If there are serious version dependencies / functional differences like MFC, the DLLs need different names.
