Resolve shared library path on Windows and *nix systems - c

When loading shared library given its name, systems searches for the actual file (eg .dll) in some directories, based on search order, or in cache.
How can I programmatically get the resolved path of DLL given its name, but without actually loading it? E.g. on Windows, for kernel32 or kernel32.dll it would probably return C:\windows\system32\kernel32.dll whereas given foo it could be C:\Program Files\my\app\foo.dll.
If that can't be done, is there another way to determinate whether certain library belongs to system? E.g. user32.dll or libc.so.6 are system libraries but avcodec-55.dll or myhelperslib.so are not.
I'm interested solutions that work on Windows, Linux and Mac OS.

On Windows, LoadLibraryEx has the LOAD_LIBRARY_AS_DATAFILE flag which opens the DLL without performing the operations you refer to as "actually loading it".
This can be combined with any of the search order flags (Yeah, there is more than just one search order).
Unfortunately, you cannot use GetModuleFilename. Use GetMappedFileName instead.
The LoadLibraryEx documentation also says specifically not to use the SearchPath function to locate DLLs and not to use the DONT_RESOLVE_DLL_REFERENCES flag mentioned in comments.
For Linux, there's an existing tool ldd for which source code is available. It does actually load the shared libraries, but with a special environment variable LD_TRACE_LOADED_OBJECTS set that by convention causes them to skip doing anything. Because this is just a convention, beware that malicious files can perform actions when loaded by ldd CVE-2009-5064.

Related

Getting known library paths from ldconfig for use with dlopen

I have a program written in C that uses dlopen for loading plug-in modules. When the library is dynamically loaded, it runs constructor code which register pointer to structure with function implementations with the main application by use of exported function. I want to use absolute path for specifying the file to dlopen.
Then I have other part of the program with takes file, determine if it is ELF, then looks into the ELF header for specific ELF section, read this section and extract from it pertinent information. This way it filters only shared libraries which I have previously tagged as a plug-in module.
However, I am solving a problem how to discover them on the fly (in portable Linux way, i.e. it will run on Debian and on Fedora too and so on) from the main program. I have been thinking about using ldconfig for this. (As the modules will be installed by way of distro packaging system, APT for example.) Is there any way how to programmatically get the string list of known libraries from C program other than directly reading the /etc/ld.co.cache file? I was thinking that maybe there is some header library which will give char** when I ask.
Or, maybe is there any better solution to my problem?
(I am proponent of using standard system components that programming one-off solutions which will need support in the future.)

Can I force a dynamic library to link to a specific dynamic library dependency?

I'm building an dynamic library, libfoo.so, which depends on libcrypto.so.
Within my autotools Makefile.am file, I have a line like this:
libfoo_la_LIBADD += -L${OPENSSL_DIR}/lib -lcrypto
where $OPENSSL_DIR defaults to /usr but can be overridden by passing --with-openssl-dir=/whatever.
How can I ensure that an executable using libfoo.so uses ${OPENSSL_DIR}/lib/libcrypto.so (only) without the person building or running the executable having to use rpath or fiddle with LD_LIBRARY_PATH?
As things stand, I can build libfoo and pass --with-openssl-dir=/usr/local/openssl-special and it builds fine. But when I run ldd libfoo.so, it just points to the libcrypto.so in /usr/lib.
The only solution I can think of is statically linking libcrypto.a into libfoo.so. Is there any other approach possible?
Details of runtime dynamic linking vary from platform to platform. The Autotools can insulate you from that to an extent, but if you care about the details, which apparently you do, then it probably is not adequate to allow the Autotools to choose for you.
With that said, however, you seem to be ruling out just about all possibilities:
The most reliable way to ensure that at runtime you get the specific implementation you linked against at build time is to link statically. But you say you don't want that.
If you instead use dynamic libraries then you rely on the dynamic linker to associate a library implementation with your executable at run time. In that case, there are two general choices for how you can direct the DL to a specific library implementation:
Via information stored in the program / library binary. You are using terminology that suggests an ELF-based system, and for ELF shared objects, it is the RPATH and / or RUNPATH that convey information about where to look for required libraries. There is no path information associated with individual library requirements; they are identified by SONAME only. But you say you don't want to use RPATH*, and so I suppose not RUNPATH either.
Via static or dynamic configuration of the dynamic linker. This is where LD_LIBRARY_PATH comes in, but you say you don't want to use that. The dynamic linker typically also has a configuration file or files, such as /etc/ld.so.conf. There you can specify library directories to search, and, with a bit of care, the order to search them.
Possibly, then, you can cause your desired library implementation to be linked to your application by updating the dynamic linker's configuration files to cause it to search the wanted path first. This will affect the whole system, however, and it's brittle.
Alternatively, depending on details of the nature of the dependency, you could give your wanted version of libcrypto a distinct SONAME. Effectively, that would make it a different object (e.g. libdjcrypto) as far as the static and dynamic linkers are concerned. But that is risky, because if your library has both direct and indirect dependencies on libcrypto, or if a program using your library depends on libcrypto via another path, then you'll end up at run time (dynamically) linking both libraries, and possibly even using functions from both, depending on the origin of each call.
Note well that the above issue should be a concern for you if you link your library statically, too. If that leaves any indirect dynamic dependencies on libcrypto in your library, or any dynamic dependencies from other sources in programs using your library, then you will end up with multiple versions of libcrypto in use at the same time.
Bottom line
For an executable, the best options are either (1) all-static linkage or (2) (for ELF) RPATH / LD_LIBRARY_PATH / RUNPATH, ensuring that all components require the target library via the same SONAME. I tend to like providing a wrapper script that sets LD_LIBRARY_PATH, so that its effect is narrowly scoped.
For a reusable library, "don't do that" is probably the best alternative. The high potential for ending up with programs simultaneously using two different versions of the other library (libcrypto in your case) makes all available options unattractive. Unless, of course, you're ok with multiple library versions being used by the same program, in which case static linkage and RPATH / RUNPATH (but not LD_LIBRARY_PATH) are your best available alternatives.
*Note that at least some versions of libtool have a habit of adding RPATH entries whether you ask for them or not -- something to watch out for. You may need to patch the libtool scripts installed in your project to avoid that.

How the OS find shared library path in two different linking?:run-time linking (loading) and compile time linking shared library in linux

i am a little confused about how shared library and the OS works.
1st question : how the OS manages shared libraries?, how they are specified uniquely? by file name or some other(say an ID) things? or by full path?!
2nd question : i know first when we compile and link codes, the linker need to access the shared library(.so) to perform linking, then after this stage when we execute the compiled program the OS loads the shared library and this libraries may be in different locations(am I wrong?) BUT i do not understand how the OS knows where to look for shared library, is library information (name? path? or what?!) coded in the executable ?
When compiling a program, libraries (other than the language runtime) must be explicitly specified in the build, otherwise they will not be included. There are some standard library directories, so for example you can specify -lfoo, and it will automatically look for libfoo.a or libfoo.so in the various usual directories like /usr/lib, /usr/local/lib etc.
Note, however, that a name like libfoo.so is usually a symlink to the actual library file name, which might be something like libfoo.so.1. This way, if there needs to be a backward-incompatible change to the ABI (the layout of some structure might change, say), then the new version of the library becomes libfoo.so.2, and binaries linked against the old version remain unaffected.
So the linker follows the symlink, and inserts a reference to the versioned name libfoo.so.1 into the executable, instead of the unversioned name libfoo.so. It can also insert the full path, but this is usually not done. Instead, when the executable is run, there is a system search path, as configured in your systemwide /etc/ld.so.conf, that is used to find the library.
(Actually, ld.so.conf is just the human-readable source for your library search paths; this is compiled into binary form in /etc/ld.so.cache for speed, using the ldconfig command. This is why you need to run ldconfig every time you make changes to the shareable libraries on your system.)
That’s a very simplified explanation of what is going on. There is a whole lot more that is not covered here. Here and here are some reference docs that might be useful on the build process. And here is a description of the system executable loader.

Proxy shared library (sharedlib, shlib, so) for ELF?

On Windows, it's more or less common to create "proxy DLLs" which take place of the original DLL and forward calls to it (after any additional actions as needed). You can read about it here and here for example.
However, shlib munging culture under Linux is quite different. It starts with the fact that LD_PRELOAD is the builtin feature with ld.so under Linux, which simply injects separate shlib into process and uses any symbols it defines as override. And that "injection" technique seems to define whole direction of thought - here's a typical ELF hacking tool or this question, where gentleman seems to have the same usecase as me, but starts with asking how he can patch existing binaries.
No, thanks. I don't want to inject into or modify something which is nor mine. All I want to do is to make a standalone proxy shlib which will call out to the original. Ideally, there would be a tool which can be fed with the original .so and create a C source code which would just redirect to original's functions, while letting me easily override anything I want. So, where's such tool? ;-) Thanks.
Using LD_PRELOAD doesn't really involve modifying something which isn't yours, and the injection isn't all that different from normal dynamic library loading. The “typical ELF hacking tool” from the ERESI project is unrelated to LD_PRELOAD. You should not be afraid of it. A good introduction to writing LD_PRELOAD-able “proxies” is here.
That being said, if you want to create a system-wide proxy for some library, you might argue that globally setting LD_PRELOAD (and thus loading your proxy into every binary that ever runs on your system) is undesirable. It is commonly used to override functions from glibc by tools such as libeatmydata or socksify, but if you're overriding a function in a library that is bigger and/or less widespread than glibc, it makes sense to try to find another approach, to really create a proxy for just that one library.
One such approach is to use patchelf --replace-needed or --add-needed to hardcode the full pathname of the original library and then make sure the proxy library is found first by setting LD_LIBRARY_PATH¹. So, the complete procedure is:
create an LD_PRELOAD-able library that overrides some functions of the original one (test that it works using only LD_PRELOAD before proceeding further!)
compile and link this library with the original library so that ldd libwrapper-foo.so includes something like:
libfoo.so.0 => /usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000deadbeef0000)
hardcode the full path using patchelf:
patchelf --replace-needed libfoo.so.0 /usr/lib/x86_64-linux-gnu/libfoo.so.0 libwrapper-foo.so
symlink libwrapper-foo.so to libfoo.so.0
now LD_LIBRARY_PATH=. ldd $(which program-that-uses-libfoo) should include these lines:
libfoo.so.0 => ./libfoo.so.0 (0x0000dead56780000)
/usr/lib/x86_64-linux-gnu/libfoo.so.0 (0x0000dead1234000000)
set LD_LIBRARY_PATH to full path to the wrapper library in your .bashrc or somewhere
A real-life example of such proxy libary is my wrapper for libpango that enables subpixel positioning for all applications.
¹) It might also be possible to put this proxy library into /usr/local/lib, but ldconfig (the tool that updates shared libraries cache) refuses to use libraries with hardcoded absolute paths.
apitrace is a tool which covers detailed tracing of graphic libs (OpenGL, DirectX) calls for a number of platform. It's probably too detailed and complex for generic solution, but at least provides some reference and affinity.

Find a directory in shared library search path

I want dlopen() every shared library in a specific directory. In order to do that,
what is the cleanest way to retrieve linux's library search path. Or Is there a quicker way of find a specific directory in that path ?
posix would be better.
POSIX does not support a mechanism to find out the directories on the shared library search path (it does not mandate LD_LIBRARY_PATH, for example), so any solution is inherently somewhat platform specific.
Linux presents some problems because the values to be used could be based on the contents of /etc/ld.so.conf as well as any runtime value in LD_LIBRARY_PATH environment variable; other systems present comparable problems. The default locations also vary by system - with /lib and /usr/lib being usual for 32-bit Linux machines but /lib64 and /usr/lib64 being used on at least some 64-bit machines. However, other platforms use other locations for 64-bit software. For example, Solaris uses /lib/sparcv9 and /usr/lib/sparcv9, for example (though the docs mention /lib/64 and /usr/lib/64, they're symlinks to the sparcv9 directories). Solaris also has environment variables LD_LIBRARY_PATH_64 and LD_LIBRARY_PATH_32. HP-UX and AIX traditionally use other variables than LD_LIBRARY_PATH -- SHLIB_PATH and LIBPATH, IIRC -- though I believe AIX now uses LD_LIBRARY_PATH too. And, on Solaris, the tool for configuring shared libraries is 'crle' (configure runtime linking environment) and the analog of /etc/ld.so.conf is either /var/ld/ld.config or /var/ld/64/ld.config. Also, of course, the extensions on shared libraries varies (.so, .sl, .dylib, .bundle, etc).
So, your solution will be platform-specific. You will need to decide on the the default locations, the environment variables to read, and the configuration file to read, and the relevant file extension. Given those, then it is mainly a SMOP - Simple Matter Of Programming:
For each directory named by any of the sources:
Open the relevant sub-directory (opendir())
Read each file name (readdir()) in turn
Use dlopen() on the path of the relevant files.
Do whatever analysis is relevant to you.
Use dlclose()
Use closedir()
See also the notes in the comment below...the complete topic is modestly fraught with variations from platform to platform.
I'm not sure it's possible to do that and be portable. Since this question is about Linux, portability may not be of paramount importance. Then I do not understand the POSIX constraint. Could you clarify?
You'll probably have to either implement the search functionality detailed in man 8 ld.so, which includes scanning /etc/ld.so.conf in addition to LD_LIBRARY_PATH, or make /lib/ld.so do what you want for you and parse the output. A not-exactly-pretty command line for that could be:
export LD_PRELOAD=THISLIBRARYSODOESNOTEXIST
strace -s 4096 /bin/true 2>&1 | sed -n 's/^open("\([^"]*\)\/THISLIBRARYSODOESNOTEXIST".*$/\1\/YOURSUBDIRHERE/gp'
unset LD_PRELOAD
You can then enumerate files with the POSIX calls opendir(3) and readdir(3).

Resources