How to relink existing shared library with extra object file - linker

I have existing Linux shared object file (shared library) which has been stripped. I want to produce a new version of the library with some additional functions included. I had hoped that something like the following would work, but does not:
ld -o newlib.so newfuncs.o --whole-archive existinglib.so
I do not have the source to the existing library. I could get it but getting a full build environment with the necessary dependencies in place would be a lot of effort for what seems like a simple problem.

You might like to try coming at this from a slightly different angle by loading your object using preloading.
Set LD_PRELOAD to point to your new object
export LD_PRELOAD=/my/newfuncs/dir/newfuncs.o
and specify the existing library in the same way via your LD_LIBRARY_PATH.
This will then instruct the run time linker to search for needed symbols in your object before looking in objects located in your LD_LIBRARY_PATH.
BTW You can put calls in your object to then call the function that would've been called if you hadn't specified an LD_PRELOAD object or objects. This is why this is sometimes called interposing.
This is how many memory allocation analysis tools work. They interpose versions of malloc() and free() that records the calls to alloc() and free() before then calling the actual system alloc and free functions to perform the memory management.
There's plenty of tutorials on the interwebs on using LD_PRELOAD. One of the original and best is still "Building library interposers for fun and profit". Though written nine years ago and written for Solaris it is still an excellent resource.
HTH and good luck.

Completely untested idea:
# mv existinglib.so existinglib-real.so
# ld -o exlistinglib.so -shared newfuncs.o -lexistinglib-real
The dynamic linker, when loading a program that expects to load existinglib.so, will find your version, and also load existinglib-real.so on which it depends. It doesn't quite achieve the stated goal of your question, but should look as if it does to the program loading the library.

Shared libraries are not archives, they are really more like executables. You cannot therefore just stuff extra content into them in the same way you could for a .a static library.

Short answer: exactly what you asked for can't be done.
Longer answer: depending on why you want to do this, and exactly how existinglib.so has been linked, you may get close to desired behavior. In addition to LD_PRELOAD and renaming existinglib.so already mentioned, you could also use a linker script (cat /lib/libc.so to see what I mean).

Related

How to circumvent dlopen() caching?

According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the libray?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer, no refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem to not believe the usecase is actually valid. I am working on a pure functional language, that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I am compiling a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variabels (e.g. the pointer to the young generation) and that is, or rather should be, totally fine. The library exposes exactly two functions: An initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. when I load the library, its internal symbols are relocated as expected, the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).
POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. Except. Since they will expose the same global symbols and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).

How can I "dump" a Function to a file?

For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want write it to a file, so that I can use the system-call mmap to load it with PROT_EXEC and I can call it from another program.What should I do for it?
If you know what signature you need and a static library or the location of a shared library at compile time, you probably just want to include the header and link against the output library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the libary dynamically and retrieving the address of the function by name.
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B); modular architectures are often better (and more securely) achieved through inter-process communication (IPC), since dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with interprocess-communication, each component would be a separate process, and individual components would only have access to information that was given to it explicitly by the calling process, which would make it more difficult for a malicious component to steal data from the caller or other components or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that and mmaped then called it... it should work.
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!

Linking two object files with a common symbol

If i have two object files both defining a symbol (function) "foobar".
Is it possible to tell the linker to obey the obj file order i give in the command line call and always take the symbol from first file and never the later one?
AFAIK the "weak" pragma works only on shared libraries but not on object files.
Please answer for all the C/C++ compiler/linker/operating system combinations you know cause i'm flexibel and use a lot of compiles (sun studio, intel, msvc, gcc, acc).
I believe that you will need to create a static library from the second object file, and then link the first object file and then the library. If a symbol is resolved by an object file, the linker will not search the libraries for it.
Alternatively place both object files in separate static libraries, and then the link order will be determined by their occurrence in the command line.
Creating a static library from an object file will vary depending on the tool chain. In GCC use the ar utility, and for MSVC lib.exe (or use the static library project wizard).
There is a danger here, the keyword here is called Interpositioning dependant code.
Let me give you an example here:
Supposing you have written a custom routine called malloc. And you link in the standard libraries, what will happen is this, functions that require the usage of malloc (the standard function) will use your custom version and the end result is the code may become unstable as the unintended side effect and something will appear 'broken'.
This is just something to bear in mind.
As in your case, you could 'overwrite' (I use quotes to emphasize) the other function but then how would you know which foobar is being used? This could lead to debugging grief when trying to figure out which foobar is called.
Hope this helps,
Best regards,
Tom.
You can make it as a .a file... Then the compiler gets the symbol and doesn't crib later

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?
Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.
It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are a simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.
It depends on the linker.
eg. Microsoft Visual C++ has an option "Enable function level linking" so you can enable it manually.
(I assume they have a reason for not just enabling it all the time...maybe linking is slower or something)
Usually (static) libraries are composed of objects created from source files. What linkers usually do is include the object if a function that is provided by that object is referenced. if your source file only contains one function than only that function will be brought in by the linker. There are more sophisticated linkers out there but most C based linkers still work like outlined. There are tools available that split C source that contain multiple functions into artificially smaller source files to make static linking more fine granular.
If you are using shared libraries then you don't impact you compiled size by using more or less of them. However your runtime size will include them.
This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.
Without any optimization, yes, it'll be included. The linker, however, might be able to optimize out by statically analyzing the code and trying to remove unreachable code.
It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.
It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out
It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at a granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file - even if only one of the 2 functions were actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

What should I do if two libraries provide a function with the same name generating a conflict?

What should I do if I have two libraries that provide functions with equivalent names?
It is possible to rename symbols in an object file using objcopy --redefine-sym old=new file (see man objcopy).
Then just call the functions using their new names and link with the new object file.
If you control one or both: edit one to change the name and recompile Or equivalently see Ben and unknown's answers which will work without access to the source code.
If you don't control either of them you can wrap one of them up. That is compile another (statically linked!) library that does nothing except re-export all the symbols of the original except the offending one, which is reached through a wrapper with an alternate name. What a hassle.
Added later: Since qeek says he's talking about dynamic libraries, the solutions suggested by Ferruccio and mouviciel are probably best. (I seem to live in long ago days when static linkage was the default. It colors my thinking.)
Apropos the comments: By "export" I mean to make visible to modules linking to the library---equivalent to the extern keyword at file scope. How this is controlled is OS and linker dependent. And it is something I always have to look up.
Under Windows, you could use LoadLibrary() to load one of those libraries into memory and then use GetProcAddress() to get the address of each function you need to call and call the functions through a function pointer.
e.g.
HMODULE lib = LoadLibrary("foo.dll");
void *p = GetProcAddress(lib, "bar");
// cast p to the approriate function pointer type (fp) and call it
(*fp)(arg1, arg2...);
FreeLibrary(lib);
would get the address of a function named bar in foo.dll and call it.
I know Unix systems support similar functionality, but I can't think of their names.
If you have .o files there, a good answer here: https://stackoverflow.com/a/6940389/4705766
Summary:
objcopy --prefix-symbols=pre_string test.o to rename the symbols in .o file
or
objcopy --redefine-sym old_str=new_str test.o to rename the specific symbol in .o file.
Here's a thought. Open one of the offending libraries in a hex editor and change all occurrences of the offending strings to something else. You should then be able to use the new names in all future calls.
UPDATE: I just did it on this end and it seems to work. Of course, I've not tested this thoroughly - it may be no more than a really good way to blow your leg off with a hexedit shotgun.
You should not use them together. If I remember correctly, the linker issues an error in such a case.
I didn't try, but a solution may be with dlopen(), dlsym() and dlclose() which allow you to programmatically handle dynamic libraries. If you don't need the two functions at the same time, you could open the first library, use the first function and close the first library before using the second library/function.
Assuming that you use linux you first need to add
#include <dlfcn.h>
Declare function pointer variable in proper context, for example,
int (*alternative_server_init)(int, char **, char **);
Like Ferruccio stated in https://stackoverflow.com/a/678453/1635364 ,
load explicitly the library you want to use by executing (pick your favourite flags)
void* dlhandle;
void* sym;
dlhandle = dlopen("/home/jdoe/src/libwhatnot.so.10", RTLD_NOW|RTLD_LOCAL);
Read the address of the function you want to call later
sym = dlsym(dlhandle, "conflicting_server_init");
assign and cast as follows
alternative_server_init = (int (*)(int, char**, char**))sym;
Call in a similar way than the original. Finally, unload by executing
dlclose(dlhandle);
Swear? As far as I am aware, there isn't much you can do if you have two libraries that expose link points with the same name and you need to link against both.
This problem is the reason c++ has namespaces. There's not really a great solution in c for 2 third party libs having the same name.
If it's a dynamic object, you might be able to explicitly load the shared objects (LoadLibrary/dlopen/etc) and call it in that fashion. Alternately, if you don't need both libs at the same time in the same code, you can maybe do something with static linking (if you have the .lib/.a files).
None of these solutions apply to all projects, of course.
You should write a wrapper library around one of them.
Your wrapper library should expose symbols with unique names, and not expose the symbols of the non-unique names.
Your other option is to rename the function name in the header file, and rename the symbol in the library object archive.
Either way, to use both, it's gonna be a hack job.
The question is approaching a decade old, but there are new searches all the time...
As already answered, objcopy with the --redefine-sym flag is a good choice in Linux. See, for example, https://linux.die.net/man/1/objcopy for full documentation. It is a little clunky because you are essentially copying the entire library while making changes and every update requires this work to be repeated. But at least it should work.
For Windows, dynamically loading the library is a solution and a permanent one like the dlopen alternative in Linux would be. However both dlopen() and LoadLibrary() add extra code that can be avoided if the only issue is duplicate names. Here the Windows solution is more elegant than the objcopy approach: Just tell the linker that the symbols in a library are known by some other name and use that name. There a few steps to doing it. You need to make a def file and provide the name translation in the EXPORTS section. See https://msdn.microsoft.com/en-us/library/hyx1zcd3.aspx (VS2015, it will eventually get replaced by newer versions) or http://www.digitalmars.com/ctg/ctgDefFiles.html (probably more permanent) for full syntax details of a def file. The process would be to make a def file for one of the libraries then use this def file to build a lib file and then link with that lib file. (For Windows DLLs, lib files only are used for linking, not code execution.) See How to make a .lib file when have a .dll file and a header file for the process of building the lib file. Here the only difference is adding the aliases.
For both Linux and Windows, rename the functions in the headers of the library whose names are being aliased. Another option that should work would be, in files referring to the new names, to #define old_name new_name, #include the headers of the library whose exports are being aliased, and then #undef old_name in the caller. If there are a lot of files using the library, an easier alternative is to make a header or headers that wraps the defines, includes and undefs and then use that header.
Hope this info was helpful!
I've never used dlsym, dlopen, dlerror, dlclose, dlvsym, etc., but I'm looking at the man page, and it gives an example of opening libm.so and extracting the cos function. Does dlopen go through the process of looking for collisions? If it doesn't, the OP could just load both libraries manually and assign new names to all the functions his libraries provide.
If it's a builtin function.
for example, torch has range method(deprecated)and builtin has range method as well.
I was having some issues and all it took was adding __builtins__ before the function name.
range() => torch
builtins.range()

Resources