How does the OS find the shared library path in the two kinds of linking: run-time linking (loading) and compile-time linking of shared libraries in Linux (C)?

I am a little confused about how shared libraries and the OS work.
First question: how does the OS manage shared libraries? How are they identified uniquely: by file name, by some other identifier (say, an ID), or by full path?
Second question: I know that when we compile and link code, the linker needs access to the shared library (.so) to perform linking. Then, when we execute the compiled program, the OS loads the shared library, and the library may be in a different location (am I wrong?). But I do not understand how the OS knows where to look for the shared library. Is library information (the name? the path? something else?) encoded in the executable?

When compiling a program, libraries (other than the language runtime) must be explicitly specified in the build, otherwise they will not be included. There are some standard library directories, so for example you can specify -lfoo, and it will automatically look for libfoo.a or libfoo.so in the various usual directories like /usr/lib, /usr/local/lib etc.
Note, however, that a name like libfoo.so is usually a symlink to the actual library file name, which might be something like libfoo.so.1. This way, if there needs to be a backward-incompatible change to the ABI (the layout of some structure might change, say), then the new version of the library becomes libfoo.so.2, and binaries linked against the old version remain unaffected.
So the linker follows the symlink and inserts into the executable a reference to the library's SONAME, the versioned name libfoo.so.1 that is recorded inside the library itself, instead of the unversioned name libfoo.so. It can also insert the full path, but this is usually not done. Instead, when the executable is run, the library is found using a system search path, as configured in your system-wide /etc/ld.so.conf.
(Actually, ld.so.conf is just the human-readable source for your library search paths; it is compiled into binary form in /etc/ld.so.cache for speed, using the ldconfig command. This is why you need to run ldconfig every time you make changes to the shared libraries on your system.)
That’s a very simplified explanation of what is going on. There is a whole lot more that is not covered here. Here and here are some reference docs that might be useful on the build process. And here is a description of the system executable loader.

Related

At dynamic linking, does the dynamic loader look at all object files for definitions, or only at those specified by the executable?

So I'm trying to wrap my head around static and dynamic linking. There are many resources on SO and on the web. I think I pretty much get it, but there's still one thing that seems to bother me. Also, please correct me if my overall understanding is wrong.
I think I understand static linking:
The linker unpacks the linked libraries and actually includes the libraries' object files inside the produced executable. The unresolved stubs in the application object files are then replaced by actual function-calling code, which calls functions at addresses known at build time.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime.
Then at runtime, the dynamic loader of the OS would look through precompiled libraries stored at standard filesystem locations. It would look in the object-files of the libraries, inside their symbol tables (?) and try to find a matching function definition for each unresolved-stub. It would then load the matching object-files into memory, and replace the stubs to point to the function definitions.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory? Or does it only look in object-files specified somewhere in the application-executable file? Is this the reason why at compile time we must specify all dynamic dependencies of our program? Also, is it true dynamic libraries expose a symbol-table too?
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory?
No dynamic linker I'm aware of does this.
Or does it only look in object-files specified somewhere in the application-executable file?
Not exactly this, either.
Details vary, but generally, a dynamic linker looks for specific shared libraries by name in various directories. The directories searched may be built into the linker, specified by the operating system, specified in the object being linked, or a combination. The linker does not (generally) examine libraries' symbol tables until after it locates them by name and selects them for linking.
Is this the reason why at compile time we must specify all dynamic dependencies of our program?
Yes, though under some circumstances we do not need to specify all dynamic dependencies at compile time. Some dynamic linkers support on-demand dynamic loading as directed by the program itself. This can be used to implement plugin systems, among other purposes.
Also, is it true dynamic libraries expose a symbol-table too?
Yes. Dynamic libraries have their own symbol tables because (1) the dynamic linker uses them to do its work, and (2) dynamic libraries can have their own dynamic linking requirements, which are not necessarily reflected in the main program's.
In normal usage, "dynamic linking" is performed by the loader. "Static linking" is performed by the linker.
Generally, linkers can create either executable files or shared libraries. The linker output for both is an instruction stream that tells the loaders how to place the executable or library in memory.
Dynamic linking on the other hand is what puzzles me more: I understand that in dynamic linking, the stubs in the object-code which reference yet-unresolved names, are going to stay as stubs until runtime
That is not [usually] correct. The linker will locate the shared library in which the symbol exists and record that library as a dependency of the executable. Linkers generally fail with an error if they cannot find all the symbols that need to be resolved.
So the part I'm missing is this: where does the OS dynamic loader look - does it look in the symbol tables for all object-files in the system-libraries directory?
This is a system-specific question. In well-designed operating systems, the shared libraries are designated by the system manager. The loader uses the library specified by the system. Poorly designed systems frequently use some kind of search path to find the shared libraries (which creates a massive security hole).

Resolve shared library path on Windows and *nix systems

When loading a shared library given its name, the system searches for the actual file (e.g. .dll) in some directories, based on a search order, or in a cache.
How can I programmatically get the resolved path of a DLL given its name, but without actually loading it? E.g. on Windows, for kernel32 or kernel32.dll it would probably return C:\windows\system32\kernel32.dll, whereas given foo it could be C:\Program Files\my\app\foo.dll.
If that can't be done, is there another way to determine whether a certain library belongs to the system? E.g. user32.dll or libc.so.6 are system libraries, but avcodec-55.dll or myhelperslib.so are not.
I'm interested in solutions that work on Windows, Linux and Mac OS.
On Windows, LoadLibraryEx has the LOAD_LIBRARY_AS_DATAFILE flag which opens the DLL without performing the operations you refer to as "actually loading it".
This can be combined with any of the search order flags (Yeah, there is more than just one search order).
Unfortunately, you cannot use GetModuleFileName. Use GetMappedFileName instead.
The LoadLibraryEx documentation also says specifically not to use the SearchPath function to locate DLLs and not to use the DONT_RESOLVE_DLL_REFERENCES flag mentioned in comments.
For Linux, there's an existing tool, ldd, whose source code is available. It does actually load the shared libraries, but with the special environment variable LD_TRACE_LOADED_OBJECTS set, which by convention makes the dynamic loader list the libraries and exit instead of running the program. Because this is just a convention, beware that malicious files can perform actions when loaded by ldd (CVE-2009-5064).

Can I force a dynamic library to link to a specific dynamic library dependency?

I'm building a dynamic library, libfoo.so, which depends on libcrypto.so.
Within my autotools Makefile.am file, I have a line like this:
libfoo_la_LIBADD += -L${OPENSSL_DIR}/lib -lcrypto
where $OPENSSL_DIR defaults to /usr but can be overridden by passing --with-openssl-dir=/whatever.
How can I ensure that an executable using libfoo.so uses ${OPENSSL_DIR}/lib/libcrypto.so (only) without the person building or running the executable having to use rpath or fiddle with LD_LIBRARY_PATH?
As things stand, I can build libfoo and pass --with-openssl-dir=/usr/local/openssl-special and it builds fine. But when I run ldd libfoo.so, it just points to the libcrypto.so in /usr/lib.
The only solution I can think of is statically linking libcrypto.a into libfoo.so. Is there any other approach possible?
Details of runtime dynamic linking vary from platform to platform. The Autotools can insulate you from that to an extent, but if you care about the details, which apparently you do, then it probably is not adequate to allow the Autotools to choose for you.
With that said, however, you seem to be ruling out just about all possibilities:
The most reliable way to ensure that at runtime you get the specific implementation you linked against at build time is to link statically. But you say you don't want that.
If you instead use dynamic libraries then you rely on the dynamic linker to associate a library implementation with your executable at run time. In that case, there are two general choices for how you can direct the DL to a specific library implementation:
Via information stored in the program / library binary. You are using terminology that suggests an ELF-based system, and for ELF shared objects, it is the RPATH and / or RUNPATH that convey information about where to look for required libraries. There is no path information associated with individual library requirements; they are identified by SONAME only. But you say you don't want to use RPATH*, and so I suppose not RUNPATH either.
Via static or dynamic configuration of the dynamic linker. This is where LD_LIBRARY_PATH comes in, but you say you don't want to use that. The dynamic linker typically also has a configuration file or files, such as /etc/ld.so.conf. There you can specify library directories to search, and, with a bit of care, the order to search them.
Possibly, then, you can cause your desired library implementation to be linked to your application by updating the dynamic linker's configuration files to cause it to search the wanted path first. This will affect the whole system, however, and it's brittle.
Alternatively, depending on details of the nature of the dependency, you could give your wanted version of libcrypto a distinct SONAME. Effectively, that would make it a different object (e.g. libdjcrypto) as far as the static and dynamic linkers are concerned. But that is risky, because if your library has both direct and indirect dependencies on libcrypto, or if a program using your library depends on libcrypto via another path, then you'll end up at run time (dynamically) linking both libraries, and possibly even using functions from both, depending on the origin of each call.
Note well that the above issue should be a concern for you if you link your library statically, too. If that leaves any indirect dynamic dependencies on libcrypto in your library, or any dynamic dependencies from other sources in programs using your library, then you will end up with multiple versions of libcrypto in use at the same time.
Bottom line
For an executable, the best options are either (1) all-static linkage or (2) (for ELF) RPATH / LD_LIBRARY_PATH / RUNPATH, ensuring that all components require the target library via the same SONAME. I tend to like providing a wrapper script that sets LD_LIBRARY_PATH, so that its effect is narrowly scoped.
For a reusable library, "don't do that" is probably the best alternative. The high potential for ending up with programs simultaneously using two different versions of the other library (libcrypto in your case) makes all available options unattractive. Unless, of course, you're ok with multiple library versions being used by the same program, in which case static linkage and RPATH / RUNPATH (but not LD_LIBRARY_PATH) are your best available alternatives.
*Note that at least some versions of libtool have a habit of adding RPATH entries whether you ask for them or not -- something to watch out for. You may need to patch the libtool scripts installed in your project to avoid that.

Does the linker prefer .so files over .a files?

I'm building Julia using a local LLVM build which contains both libLLVM*.so files and corresponding libLLVM*.a files. This was built first with BUILD_SHARED_LIBS=ON, which is responsible for the presence of the libLLVM*.so files.
libjulia.so, the library used by the julia executable, always linked to the libLLVM*.so files, even when I rebuilt LLVM with BUILD_SHARED_LIBS=OFF (the default config). The output of llvm-config --libs $LIB with and without BUILD_SHARED_LIBS=ON didn't vary much, and nothing seemed to hint at llvm-config issuing linking options that would direct the linker to link either *.so files or *.a files.
Why is this the case? Is it the default behaviour of the linker to use .so files even when .a files of the same name exist? Or is there a build configuration cache that Julia reuses?
Yes, to fulfil the option -lfoo, ld will by default link libfoo.so in preference to libfoo.a if both
are found in the same search directory, and when it finds either one it
will look no further.
You can enforce linkage of static libraries only by passing -static to the linkage,
but in that case static versions must be found for all libraries - including
default system libraries - not just those you explicitly mention.
To selectively link a static library libfoo.a, without specifying -static,
you can use the explicit form of the -l option: -l:libfoo.a rather than
-lfoo.
llvm-config will emit library options in the -lfoo form whether you build
static or shared libraries, since those options will work correctly for
either, but you need to understand when using them how the linker
behaves. If you don't tell it otherwise, it will link the shared rather
than the static library when it faces the choice.
Later
Why does ld prefer to link shared libraries over static ones?
AFAIK, it is not on record why the developers of ld made this decision long
ago, but the reason is obvious: If dynamic linkage is the default then
executables, by default, will not physically include additional copies of code
that can be provided to all executables by a single shared copy, from a shared library. Thus
executables, by default, will economize their code size and the aggregate of
executables that constitutes your system or mine will be vastly smaller than
it would have to be without sharing. Shared libraries and dynamic linkage
were invented so that systems need not be bloated with duplicated code.
Dynamic linkage brings with it the complication that an executable
linked with shared libraries, when distributed to a system other than the
one on which it was built, does not carry its dynamic dependencies with it. It's
for that reason that all the approved mechanisms for installing new binaries
on systems - package managers - ensure that all of their dynamic dependencies
are installed as well.

Can I make gcc ignore static libraries when linking shared libraries?

I've encountered a few cases building projects which use shared libraries or dynamic-loaded modules where the module/library depends on another library, but doesn't check that a shared copy is available before trying to link. This causes object files from a static archive (.a file) to get pulled into the resulting .so, and since these object files are non-PIC, the resulting .so file either has TEXTRELs (very bad load performance and memory usage) or fails altogether (on archs like x86_64 that don't support non-PIC shared libraries).
Is there any way I can make the gcc compiler driver refuse to link static library code into shared library output? It seems difficult and complicated by the possible need to link minimal amounts from libgcc.a and the like...
As you know, you can use -static to only link against static libraries, but there doesn't appear to be a good equivalent to only linking against dynamic libraries.
The following answer may be useful...
How to link using GCC without -l nor hardcoding path for a library that does not follow the libNAME.so naming convention?
You can use -l:[libraryname].so to list the dynamic libraries you want to link against from your library search path. Specifying the .so ending will probably help with your dynamic-library-only case. You will have to specify the whole file name, including the 'lib' prefix, rather than just the shortened version.
