How to get dylibs in /opt/local/lib to be recognized by dlopen in MacOS Monterey? - macports

I am trying to compile some nim code that depends on libsass, and it fails with
dlopen(libsass.dylib, 0x0002): tried: 'libsass.dylib' (no such file), '/usr/local/lib/libsass.dylib' (no such file), '/usr/lib/libsass.dylib' (no such file), '/Users/emre/code/nimforum/libsass.dylib' (no such file)
could not load: libsass.dylib
On my system, that file is in /opt/local/lib, since I installed it with macports. I tried setting LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, and DYLD_FALLBACK_LIBRARY_PATH, to /opt/local/lib but this did not help. I believe macOS's System Integrity Protection module is the cause but I am not sure how best to accommodate it.

This has nothing to do with SIP.
You need to pass the full path to the library you want to open to the first argument of dlopen(). From the man page:
SYNOPSIS
#include <dlfcn.h>
void*
dlopen(const char* path, int mode);
DESCRIPTION
dlopen() examines the mach-o file specified by path.
If you really need to use dlopen() to load this, you should be passing dlopen("/opt/local/lib/libsass.dylib", RTLD_NOW).
However, dlopen() is generally frowned upon as it bypasses a lot of the performance and correctness benefits of linking against the library at build time. You should aim to link at build time (ie: pass -lsass to the linker) wherever possible.
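A minimal sketch of the full-path approach, assuming the library may sit in one of a few known directories. The helper name open_from_dirs and the directory list are hypothetical and only serve the illustration; dlopen(), dlerror() and RTLD_NOW are the standard <dlfcn.h> interface.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try each directory in turn, building a full path for dlopen().
 * Returns a handle on success, or NULL if no candidate could be loaded.
 * (open_from_dirs is a hypothetical helper, not a system API.) */
void *open_from_dirs(const char *leaf, const char *const dirs[], size_t ndirs)
{
    char path[1024];

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dirs[i], leaf);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;                       /* path too long, skip this directory */

        void *handle = dlopen(path, RTLD_NOW);
        if (handle != NULL)
            return handle;

        const char *err = dlerror();        /* explains why this candidate failed */
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    }
    return NULL;
}
```

With a directory list such as { "/opt/local/lib", "/usr/local/lib" } and the leaf name "libsass.dylib", the first candidate tried would be /opt/local/lib/libsass.dylib.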

Related

Modify glibc dynamic linker to check if a shared library has been loaded in another process

I want to modify the glibc dynamic linker/loader so that before mapping a shared library into a process, the linker/loader checks whether the library has been loaded/in-use by any other process in the system or not. The linker/loader will perform a specific operation on the shared library code only if the library has not been used/loaded by any other process. I understand that currently the linker/loader only linearly maps the shared library and waits for demand paging to physically load the library.
I have tried to use the shell command lsof /path/library.so from within the dynamic linker/loader code to accomplish that. To invoke lsof command from within dynamic linker code, I have tried
system("lsof /path/library.so")
FILE *fp = popen("lsof /path/library.so", "r");
Building dynamic linker, however, gives me "multiple definitions of x symbols" error as I tried to include stdio.h (for popen()) or stdlib.h (for system()) header files. Can you please suggest how to resolve the glibc build error or any other better way to solve my original problem?
Addition 1: Thanks @EmployedRussian. I also explored the option that you mentioned.
One possible answer is: store them in a file or a database. If that is your answer, then the solution becomes obvious: check if the file or a database entry exists. If it does, you don't need to do the computation again.
The main problem for both lsof or file/databased based solution is: when I add a new .c file and include <stdio.h> in that file to do file operations (such as FILE* fp = fopen()), the glibc build gives me errors like this for few functions:
'-Wl,-(' /path/glibc-2.30_build/elf/dl-allobjs.os /path/glibc-2.30_build/libc_pic.a -lgcc '-Wl,-)' -Wl,-Map,/path/glibc-2.30_build/elf/librtld.mapT
/usr/bin/ld: /path/glibc-2.30_build/libc_pic.a(dl-error.os): in function `__GI__dl_signal_exception':
/path/glibc_2.30_shared_library/elf/dl-error-skeleton.c:91: multiple definition of `_dl_signal_exception'; /path/glibc-2.30_build/elf/dl-allobjs.os:/path/glibc_2.30_shared_library/elf/dl-error-skeleton.c:91: first defined here
Building dynamic linker, however, gives me "multiple definitions of x symbols" error
This is because the dynamic linker is very special, and you are very restricted in what you can do in the dynamic linker.
It is special because it must be a stand-alone program -- it can't use any other library (including libc.so.6) -- it is responsible for loading all other libraries, so naturally it can't use anything that it has yet to load.
I just want to compute them once when the library is being physically loaded the first time.
This is still an XY Problem. What are you going to do with the result of this computation?
One possible answer is: store them in a file or a database.
If that is your answer, then the solution becomes obvious: check if the file or a database entry exists. If it does, you don't need to do the computation again.
Update:
The main problem for both lsof or file/databased based solution is: when I add a new .c file and include <stdio.h> in that file to do file operations (such as FILE* fp = fopen()), the glibc build gives me errors
This is the exact same problem: you are trying to use parts of libc.so which can't be used in a dynamic linker.
If you want to store the result of your computation in a file, you need to use low-level parts which are usable. Use open() and write() instead of fopen() and fprintf().
Alternatively, do it from within your library or program -- since you will no longer care about how many processes have loaded the library, there is no reason to try to perform this computation in the loader. (There might be a reason, but you are not explaining it; so we are back to XY problem.)
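A minimal sketch of the "check whether the file exists, otherwise compute and store" idea using only open() and write(). The function name store_result_once and the marker-file convention are hypothetical. This is written as ordinary user-space code; whether these exact entry points can be called from inside the loader build is an assumption that would need checking against the glibc sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if this call created the file and stored the result,
 * 0 if the file already existed (the computation was done earlier),
 * and -1 on any other error. (Hypothetical helper for illustration.) */
int store_result_once(const char *path, const void *data, size_t len)
{
    /* O_CREAT | O_EXCL makes creation atomic: only one caller can win. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted, retry the write */
            close(fd);
            unlink(path);                   /* do not leave a partial result behind */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd) == 0 ? 1 : -1;
}
```

A return value of 0 is the "entry already exists" case, so the caller can skip the computation.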

Is there a reliable way to know what libraries could be dlopen()ed in an elf binary?

Basically, I want to get a list of libraries a binary might load.
The unreliable way I came up with that seems to work (with possible false-positives):
comm -13 <(ldd elf_file | sed 's|\s*\([^ ]*\)\s.*|\1|'| sort -u) <(strings -a elf_file | egrep '^(|.*/)lib[^:/]*\.so(|\.[0-9]+)$' | sort -u)
This is not reliable. But it gives useful information, even if the binary was stripped.
Is there a reliable way to get this information without possible false-positives?
EDIT: More context.
Firefox is transitioning from using gstreamer to using ffmpeg.
I was wondering what versions of libavcodec.so will work.
libxul.so uses dlopen() for many optional features.
And the library names are hard-coded. So, the above command helps in this case.
I also have a general interest in package management and binary dependencies.
I know you can get direct dependencies with readelf -d, dependencies of dependencies with ldd. And I was wondering about optional dependencies, hence the question.
ldd tells you the libraries your binary has been linked against. These are not those that the program could open with dlopen.
The signature for dlopen is
void *dlopen(const char *filename, int flag);
So you could, still unreliably, run strings on the binary, but this could still fail if the library name is not a static string, but built or read from somewhere during program execution -- and this last situation means that the answer to your question is "no"... Not reliably. (The name of the library file could be read from the network, from a Unix socket, or even uncompressed on the fly, for example. Anything is possible! -- although I wouldn't recommend any of these ideas myself...)
edit: also, as John Bollinger mentioned, the library names could be read from a config file.
edit: you could also try substituting the dlopen library function with one of your own (this is done by the Boehm garbage collector with malloc, for example), so it would open the library, but also log its name somewhere. But if the program didn't open a specific library during execution, you still won't know about it.
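A minimal sketch of such a substitution on Linux with glibc, assuming the wrapper is built as a shared object and injected with LD_PRELOAD. It defines its own dlopen(), logs the requested name, and forwards to the real one found through dlsym(RTLD_NEXT, ...). The counter dlopen_calls_seen is a hypothetical addition for inspection only.

```c
#define _GNU_SOURCE                 /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Number of dlopen() calls seen so far (hypothetical, for inspection). */
static int dlopen_calls_seen;

/* Same signature as the real dlopen(): log the name, then forward the call. */
void *dlopen(const char *filename, int flags)
{
    static void *(*real_dlopen)(const char *, int);

    if (real_dlopen == NULL)
        real_dlopen = (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");

    dlopen_calls_seen++;
    fprintf(stderr, "dlopen(\"%s\")\n", filename ? filename : "(main program)");

    /* If the real function could not be found, report failure to the caller. */
    return real_dlopen ? real_dlopen(filename, flags) : NULL;
}
```

Built with something like gcc -shared -fPIC (file names are up to you) and listed in LD_PRELOAD, this logs only the libraries that the program actually tries to open during that run.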
(I am focusing on Linux; I guess that most of my answer fits every POSIX system; but on MacOSX dlopen wants .dylib dynamic library files, not .so shared objects)
A program could even emit some C code in some temporary file /tmp/foo1234.c, fork a compilation of that /tmp/foo1234.c into a shared library /tmp/foo1234.so by some gcc -O -shared -fPIC /tmp/foo1234.c -o /tmp/foo1234.so command -generated and executed at runtime of your program-, perhaps remove the /tmp/foo1234.c file -since it is not needed any more-, and dlopen that /tmp/foo1234.so (and perhaps even remove /tmp/foo1234.so after dlopen), all that in the same process. My GCC MELT plugin for gcc does exactly this, and so does Bigloo, and the GCCJIT library is doing something close.
So in general, your quest is impossible and even makes no sense.
Is there a reliable way to get this information without possible false-positives?
No, there is no reliable way to get such information without false positives (you could prove that equivalent to the halting problem, or to some other undecidable problem). See also Rice's theorem.
In practice, most dlopen happens on plugins provided by some configuration. They might not be named exactly as such in a configuration file (e.g. some Foo program might have a convention that a plugin named bar in some foo.conf configuration file is provided by a foo-bar.so plugin).
However, you might find some heuristic approximation. Most programs doing some dlopen have some plugin convention requesting some particular symbol names in the plugin. You could search for shared objects defining these names. Of course you'll get false positives.
For example, the zsh shell accepts plugins called zsh modules. The example module shows that enables_, boot_, features_, etc. functions are expected in zsh modules. You could use nm -D to find *.so files providing these (hence finding the plugins likely to be loadable by zsh).
(I am not convinced that such an approach is worthwhile; in fact you should usually know which plugins are useful on your system by which applications)
BTW, you could use strace(1) on the execution of some command to understand the syscalls it is doing, hence the plugins it is loading. You might also use ltrace(1), or pmap(1) (on some given process), or simply -for a process 1234- use cat /proc/1234/maps to understand its virtual address space, hence the plugins it has already loaded. See proc(5).
Notice that strace, ltrace, pmap exist on Linux, but many POSIX systems have similar programs.
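A minimal sketch of the /proc/<pid>/maps approach on Linux: it scans the maps file and prints the mapped files whose names contain ".so". The function name list_mapped_objects is hypothetical, and the ".so" substring filter is only a heuristic. Like the other runtime approaches, it reports only what has already been loaded.

```c
#include <stdio.h>
#include <string.h>

/* Print each shared object mapped into process `pid_str` ("self" for the
 * calling process), collapsing consecutive lines that name the same file.
 * Returns the number of names printed, or -1 if the maps file cannot be
 * opened. (Hypothetical helper for illustration.) */
int list_mapped_objects(const char *pid_str, FILE *out)
{
    char file[64], line[4352], last[4352] = "";

    snprintf(file, sizeof(file), "/proc/%s/maps", pid_str);
    FILE *maps = fopen(file, "r");
    if (maps == NULL)
        return -1;

    int count = 0;
    while (fgets(line, sizeof(line), maps) != NULL) {
        char *path = strchr(line, '/');     /* the pathname is the last field */
        if (path == NULL)
            continue;                       /* anonymous mapping, [heap], [stack], ... */
        path[strcspn(path, "\n")] = '\0';
        if (strstr(path, ".so") == NULL)
            continue;                       /* heuristic: keep shared-object names only */
        if (strcmp(path, last) != 0) {      /* one file usually spans several lines */
            fprintf(out, "%s\n", path);
            snprintf(last, sizeof(last), "%s", path);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Calling list_mapped_objects("1234", stdout) would correspond to inspecting cat /proc/1234/maps by hand.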
Also, a program could generate some machine code at runtime and execute it (SBCL does that at every REPL interaction!). Your program could also use some JIT techniques (e.g. with libjit, llvm, asmjit, GCCJIT or with hand-written code...) to do likewise. So plugin-like behavior can happen without dlopen (and you might mimic dlopen with mmap calls and some ELF relocation processing).
Addenda:
If you are installing firefox from its packaged version (e.g. the iceweasel package on Debian), its package is likely to handle the dependencies

Determine real executable when invoking dynamic linker directly

I am running an executable through my dynamic linker directly, calling execve() with the path to the dynamic linker. However, unlike when executing a binary directly, /proc/self/exe is a symlink to the dynamic linker rather than a symlink to the binary, which breaks certain applications that depend on the standard behaviour (mainly OpenJDK). Is there any way to determine the real executable path of a binary executed through a dynamic linker? Is there another file in /proc that I can read to get the path, then have a hacky LD_PRELOAD override for readlink() that translates accesses to /proc/*/exe to the real path?
For a bit of background - I'm trying to get fakechroot with the custom ELF loader parameter working for OpenJDK.
This is probably too easy and you've already thought about it, but can't you simply read /proc/*/cmdline and find the real executable as an argument to ld? Only if you have detected that /proc/*/exe is a symlink to ld, of course.
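A minimal sketch of reading one argument out of /proc/<pid>/cmdline, where arguments are stored back to back and each is terminated by a NUL byte. The function name cmdline_arg is hypothetical. When a program is started as "loader program args...", argument 0 is the loader and the program is typically argument 1, but that index is an assumption: loader options placed before the program path would shift it.

```c
#include <stdio.h>
#include <string.h>

/* Copy argument number `index` of process `pid_str` ("self" for the caller)
 * into `out`. Returns 0 on success, -1 if the argument does not exist or
 * does not fit. (Hypothetical helper for illustration.) */
int cmdline_arg(const char *pid_str, int index, char *out, size_t outlen)
{
    char file[64], buf[8192];

    snprintf(file, sizeof(file), "/proc/%s/cmdline", pid_str);
    FILE *f = fopen(file, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';                          /* guarantee the last argument is terminated */

    size_t pos = 0;
    for (int i = 0; pos < n; i++) {
        if (i == index) {
            if (strlen(buf + pos) >= outlen)
                return -1;                  /* caller's buffer is too small */
            strcpy(out, buf + pos);
            return 0;
        }
        pos += strlen(buf + pos) + 1;       /* skip this argument and its NUL */
    }
    return -1;
}
```

An LD_PRELOAD override for readlink() could use a lookup like this once it has detected that /proc/*/exe points at the loader.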

How can a C shared library function learn the executable's path

I am writing a C shared library in Linux in which a function would like to discover the path to the currently running executable. It does NOT have access to argv[0] in main(), and I don't want to require the program accessing the library to pass that in.
How can a function like this, outside main() and in the wild, get to the path of the running executable? So far I've thought of 2 rather unportable, unreliable ways: 1) try to read /proc/getpid()/exe and 2) try to climb the stack to __libc_start_main() and read the stack params. I worry about all machines having /proc mounted.
Can you think of another way? Is there something buried anywhere in dlopen(NULL, 0) ? Can I get a reliable proc image of self from the kernel??
Thanks for any thoughts.
/proc is your best chance, as "path of the executable" is not that well defined concept in Linux (you can even delete it while the program is running).
To get the breakdown of loaded modules (with the main executable usually being the first entry) you should look at /proc/<pid>/maps. It's a text formatted file which will allow you to associate executable and library paths with load addresses (if the former are known and still valid).
Unless you are writing software that may be used very early in system startup, you can safely assume that /proc will always be mounted on a Linux system. It contains quite a bit of data that is not accessible any other way, and thus must be mounted for a system to function properly. As such, you can pretty easily obtain a path to your executable using:
ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
if (len >= 0) buf[len] = '\0'; /* readlink() does not NUL-terminate */
If for some reason you want to avoid this, it's also possible to read it from the process's auxiliary vector:
#include <sys/auxv.h>
#include <elf.h>
const char *execpath = (const char *) getauxval(AT_EXECFN);
Note that this will require a recent version of glibc (2.16 or later). It'll also return the path that was used to execute your application (e.g, possibly something like ./binary), rather than its absolute path.
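A minimal sketch that wraps both approaches in functions a shared library could call. The function names executable_path and executable_as_invoked are hypothetical; readlink(), getauxval() and AT_EXECFN are the interfaces named above.

```c
#include <sys/auxv.h>
#include <unistd.h>

/* Resolve the running executable through /proc/self/exe.
 * Returns the path length, or -1 on error or if the path may have been
 * truncated. (Hypothetical helper for illustration.) */
ssize_t executable_path(char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;
    ssize_t len = readlink("/proc/self/exe", buf, buflen - 1);
    if (len < 0 || (size_t)len == buflen - 1)
        return -1;                          /* error, or the buffer was filled completely */
    buf[len] = '\0';                        /* readlink() does not NUL-terminate */
    return len;
}

/* Path as it was passed to execve(); may be relative, e.g. "./binary".
 * Returns NULL if the auxiliary vector has no AT_EXECFN entry. */
const char *executable_as_invoked(void)
{
    return (const char *)getauxval(AT_EXECFN);
}
```

The first function gives an absolute path as long as /proc is mounted; the second needs no /proc but inherits the relative-path caveat.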

What should I do if two libraries provide a function with the same name generating a conflict?

What should I do if I have two libraries that provide functions with equivalent names?
It is possible to rename symbols in an object file using objcopy --redefine-sym old=new file (see man objcopy).
Then just call the functions using their new names and link with the new object file.
If you control one or both: edit one to change the name and recompile. Or, equivalently, see Ben's and unknown's answers, which will work without access to the source code.
If you don't control either of them, you can wrap one of them up. That is, compile another (statically linked!) library that does nothing except re-export all the symbols of the original except the offending one, which is reached through a wrapper with an alternate name. What a hassle.
Added later: Since qeek says he's talking about dynamic libraries, the solutions suggested by Ferruccio and mouviciel are probably best. (I seem to live in long ago days when static linkage was the default. It colors my thinking.)
Apropos the comments: By "export" I mean to make visible to modules linking to the library---equivalent to the extern keyword at file scope. How this is controlled is OS and linker dependent. And it is something I always have to look up.
Under Windows, you could use LoadLibrary() to load one of those libraries into memory and then use GetProcAddress() to get the address of each function you need to call and call the functions through a function pointer.
e.g.
HMODULE lib = LoadLibrary("foo.dll");
void *p = GetProcAddress(lib, "bar");
// cast p to the appropriate function pointer type (fp) and call it
(*fp)(arg1, arg2...);
FreeLibrary(lib);
would get the address of a function named bar in foo.dll and call it.
I know Unix systems support similar functionality, but I can't think of their names.
If you have .o files there, a good answer here: https://stackoverflow.com/a/6940389/4705766
Summary:
objcopy --prefix-symbols=pre_string test.o to rename the symbols in .o file
or
objcopy --redefine-sym old_str=new_str test.o to rename the specific symbol in .o file.
Here's a thought. Open one of the offending libraries in a hex editor and change all occurrences of the offending strings to something else. You should then be able to use the new names in all future calls.
UPDATE: I just did it on this end and it seems to work. Of course, I've not tested this thoroughly - it may be no more than a really good way to blow your leg off with a hexedit shotgun.
You should not use them together. If I remember correctly, the linker issues an error in such a case.
I didn't try, but a solution may be with dlopen(), dlsym() and dlclose() which allow you to programmatically handle dynamic libraries. If you don't need the two functions at the same time, you could open the first library, use the first function and close the first library before using the second library/function.
Assuming that you use Linux, you first need to add
#include <dlfcn.h>
Declare function pointer variable in proper context, for example,
int (*alternative_server_init)(int, char **, char **);
Like Ferruccio stated in https://stackoverflow.com/a/678453/1635364 ,
load explicitly the library you want to use by executing (pick your favourite flags)
void* dlhandle;
void* sym;
dlhandle = dlopen("/home/jdoe/src/libwhatnot.so.10", RTLD_NOW|RTLD_LOCAL);
Read the address of the function you want to call later
sym = dlsym(dlhandle, "conflicting_server_init");
assign and cast as follows
alternative_server_init = (int (*)(int, char**, char**))sym;
Call it the same way as the original. Finally, unload by executing
dlclose(dlhandle);
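The steps above can be put together as a minimal sketch with error checking. The helper name call_from_library is hypothetical, and the double(double) signature is an assumption chosen so the sketch can be tried with cos from libm, the example the dlopen man page uses.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load `lib`, look up `name`, call it as double(double), then unload.
 * Returns 0 on success and stores the result in *out; -1 on failure.
 * (Hypothetical helper; the signature of the symbol is assumed.) */
int call_from_library(const char *lib, const char *name, double x, double *out)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen: %s\n", err ? err : "unknown error");
        return -1;
    }

    dlerror();                              /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return -1;
    }

    /* The lookup is per handle, so the same name can be fetched from two
     * different libraries and kept in differently named pointers. */
    double (*fn)(double) = (double (*)(double))sym;
    *out = fn(x);

    dlclose(handle);
    return 0;
}
```

Because dlsym() searches the given handle, calling this with two different library paths and the same symbol name reaches each library's own implementation.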
Swear? As far as I am aware, there isn't much you can do if you have two libraries that expose link points with the same name and you need to link against both.
This problem is the reason C++ has namespaces. There's not really a great solution in C for 2 third party libs having the same name.
If it's a dynamic object, you might be able to explicitly load the shared objects (LoadLibrary/dlopen/etc) and call it in that fashion. Alternately, if you don't need both libs at the same time in the same code, you can maybe do something with static linking (if you have the .lib/.a files).
None of these solutions apply to all projects, of course.
You should write a wrapper library around one of them.
Your wrapper library should expose symbols with unique names, and not expose the symbols of the non-unique names.
Your other option is to rename the function name in the header file, and rename the symbol in the library object archive.
Either way, to use both, it's gonna be a hack job.
The question is approaching a decade old, but there are new searches all the time...
As already answered, objcopy with the --redefine-sym flag is a good choice in Linux. See, for example, https://linux.die.net/man/1/objcopy for full documentation. It is a little clunky because you are essentially copying the entire library while making changes and every update requires this work to be repeated. But at least it should work.
For Windows, dynamically loading the library is a solution, and a permanent one, as the dlopen alternative would be on Linux. However, both dlopen() and LoadLibrary() add extra code that can be avoided if the only issue is duplicate names. Here the Windows solution is more elegant than the objcopy approach: just tell the linker that the symbols in a library are known by some other name and use that name. There are a few steps to doing it. You need to make a def file and provide the name translation in the EXPORTS section. See https://msdn.microsoft.com/en-us/library/hyx1zcd3.aspx (VS2015, it will eventually get replaced by newer versions) or http://www.digitalmars.com/ctg/ctgDefFiles.html (probably more permanent) for full syntax details of a def file. The process would be to make a def file for one of the libraries, then use this def file to build a lib file, and then link with that lib file. (For Windows DLLs, lib files are used only for linking, not code execution.) See "How to make a .lib file when have a .dll file and a header file" for the process of building the lib file. Here the only difference is adding the aliases.
For both Linux and Windows, rename the functions in the headers of the library whose names are being aliased. Another option that should work would be, in files referring to the new names, to #define old_name new_name, #include the headers of the library whose exports are being aliased, and then #undef old_name in the caller. If there are a lot of files using the library, an easier alternative is to make a header or headers that wraps the defines, includes and undefs and then use that header.
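A minimal sketch of the #define / #include / #undef pattern. All names here (server_init, other_server_init, the port parameter) are hypothetical. To keep the sketch compilable on its own, the header's declaration is written inline where the #include would go, and both functions are given stand-in bodies that the real libraries would normally provide.

```c
/* --- what a wrapper header would contain --- */
#define server_init other_server_init   /* alias is active only while the header is read */
/* #include "otherlib.h" would go here; its declaration is inlined for this sketch: */
int server_init(int port);              /* the compiler sees: int other_server_init(int port); */
#undef server_init
/* --- end of wrapper header --- */

/* Stand-in for the renamed symbol (normally supplied by the aliased library). */
int other_server_init(int port) { return port + 1; }

/* Stand-in for the first library's function, which keeps its original name. */
int server_init(int port) { return port; }
```

After the #undef, callers can use server_init for one library and other_server_init for the other. The pattern only changes what the compiler sees, so the symbol in the library itself still has to be renamed or aliased by one of the methods above.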
Hope this info was helpful!
I've never used dlsym, dlopen, dlerror, dlclose, dlvsym, etc., but I'm looking at the man page, and it gives an example of opening libm.so and extracting the cos function. Does dlopen go through the process of looking for collisions? If it doesn't, the OP could just load both libraries manually and assign new names to all the functions his libraries provide.
This also applies if one of the names is a builtin function.
For example, torch has a range method (deprecated), and Python's builtins have a range function as well.
I was having some issues, and all it took was adding builtins (after import builtins) before the function name.
range() => torch's version
builtins.range() => the built-in
