I have tested the simple program below:
/* a shared library */
#include <stdio.h>

void
dispatch_write_hello(void)
{
    fprintf(stderr, "hello\n");
}

extern void
print_hello(void)
{
    dispatch_write_hello();
}
My main program is like this:
#include <stdio.h>

extern void print_hello(void);

extern void
dispatch_write_hello(void)
{
    fprintf(stderr, "overridden\n");
}

int
main(int argc, char **argv)
{
    print_hello();
    return 0;
}
The result of the program is "overridden". To figure out why this happens, I used gdb. The call chain is like this:
_dl_runtime_resolve -> _dl_fixup ->_dl_lookup_symbol_x
I found that the comment on _dl_lookup_symbol_x in glibc says:
Search loaded objects' symbol tables for a definition of the symbol UNDEF_NAME, perhaps with a requested version for the symbol
So I think that when trying to find the symbol dispatch_write_hello, it first looks in the main object file, and only then in the shared library. This is the cause of the behavior. Is my understanding right? Many thanks for your time.
Given that you mention _dl_runtime_resolve, I assume that you're on a Linux system (thanks @Olaf for clarifying this).
A short answer to your question: yes, during symbol interposition the dynamic linker will first look inside the executable and only then scan the shared libraries, so the definition of dispatch_write_hello in the executable will prevail.
EDIT
In case you wonder why the runtime linker needs to resolve the call to dispatch_write_hello in print_hello to anything besides the dispatch_write_hello in the same translation unit - this is caused by so-called semantic interposition support in GCC. By default the compiler treats any call inside library code (i.e. code compiled with -fPIC) as potentially interposable at runtime unless you specifically tell it not to, via -fvisibility=hidden, -Wl,-Bsymbolic, -fno-semantic-interposition or __attribute__((visibility("hidden"))). This has been discussed on the net many times, e.g. in the infamous Sorry state of dynamic libraries on Linux.
As a side note, this feature incurs a significant performance penalty compared to other compilers (Clang, Visual Studio).
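To illustrate, here is one way to rebuild the shared library from the question so that the internal call can no longer be interposed - a minimal sketch using the visibility attribute (the file name libhello.c is assumed):
/* libhello.c -- build with: gcc -shared -fPIC libhello.c -o libhello.so */
#include <stdio.h>

/* Hidden visibility: the symbol is not exported, so the call below
   binds inside the library and cannot be interposed by the executable. */
__attribute__((visibility("hidden")))
void dispatch_write_hello(void)
{
    fprintf(stderr, "hello\n");
}

void print_hello(void)
{
    dispatch_write_hello();
}
With this change the program prints "hello"; linking the unchanged library with -Wl,-Bsymbolic should have the same effect on the internal call.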
Related
I'm investigating the topic of shared libraries. The way I understood it, when linking a source file with a shared library to form an executable, unresolved symbols remain unresolved until their first call, at which point lazy binding resolves them. Based on that, I assumed that using a function that isn't defined anywhere wouldn't cause a linker error, as the resolving job is left to the dynamic linker. But when I typed the following commands in the terminal:
gcc -c foo.c -fPIC
gcc -shared foo.o -o libfoos.so
gcc main.c -Wl,-rpath=. libfoos.so
I got an "undefined reference to 'foo2'" error.
This was all done with the following files in the same directory:
foo.h:
#ifndef __FOO_H__
#define __FOO_H__
int foo(int num);
#endif /* __FOO_H__ */
main.c:
#include <stdio.h>
#include "foo.h"
int main()
{
    int a = 5;

    printf("%d * %d = %d\n", a, a, foo(a));
    printf("%d + %d = %d\n", a, a, foo2(a));
    return (0);
}
and foo.c:
#include "foo.h"
int foo(int num)
{
    return (num * num);
}
So my questions are:
Is it true that symbols remain unresolved until they are called for the first time? If so, then how come I'm getting an error at linking time?
I'm guessing that maybe some check needs to be made for the very existence of the symbols (foo and foo2 in my example) in the shared library, already at link time. If so, then why not resolve them at the same time, since we're accessing some information in the library anyway?
Thanks!
Is it true that symbols remain unresolved until they are called for the first time?
I think you may be confusing the requirements and semantics of the source language (C) with the execution semantics of dynamic shared object formats and implementations, such as ELF.
The C language does not specify when symbols are resolved, only that there must be a definition for each identifier that is used to access an object or call a function.
Different DSO formats have different properties in and around this. With ELF, for example, resolution of dynamic symbols can be deferred until the symbol is first referenced, or it can be performed immediately upon loading the DSO. This is configurable both at runtime and at compile time. The semantics of other DSO formats may be different in this and other regards.
Bottom line: no, it is not necessarily true that dynamic symbols are resolved only when they are first referenced, but that might be the default for your particular implementation and environment.
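For example, with glibc on Linux the binding mode can be chosen at link time or overridden at run time. Using the build from the question (a sketch - the exact default depends on your toolchain configuration):
gcc main.c -Wl,-rpath=. libfoos.so                # lazy (deferred) binding, the traditional default
gcc main.c -Wl,-rpath=. -Wl,-z,now libfoos.so     # force immediate binding at link time
LD_BIND_NOW=1 ./a.out                             # force immediate binding at run time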
If so, then how come I'm getting an error at linking time?
The linker is checking the C language requirements at build time. It is perfectly reasonable and in fact desirable for it to do so even when building shared objects, for if there is an unresolvable symbol used then one would like to know about the problem and fix it before people try to use the program. This is not related to whether dynamic symbol resolution is deferred at runtime.
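You can even separate the two stages experimentally by telling the GNU linker to skip its build-time check (a sketch, not recommended outside experiments; the exact error wording varies):
gcc main.c -Wl,-rpath=. libfoos.so -Wl,--unresolved-symbols=ignore-all
./a.out     # foo(a) works; the program dies when foo2 must actually be
            # resolved, with something like: undefined symbol: foo2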
I'm guessing that maybe some check needs to be made for the very existence of the symbols (foo and foo2 in my example) in the shared library, already at link time.
Yes, that's basically it.
If so, then why not resolve them at the same time, since we're accessing some information in the library anyway?
How do you know that doesn't happen?
In a DSO system that does not feature symbol relocation, that can be done and is done. The dynamism in such a DSO system is primarily in whether a given library is loaded at all. DSOs in such a system have fixed load addresses and the symbols exported from them also have fixed addresses. This allows executables to be (much) smaller and for system memory to be used (much) more efficiently, relative to statically-linked executables.
But there are big practical problems with such an approach. For example, you have to contend with address-space collisions between different DSOs, updating DSOs is difficult and risky, and having well-known addresses is a security risk. Therefore, most modern DSO systems feature symbol relocation. In such a system, DSOs' load addresses are determined dynamically, at runtime, and typically even the relative offsets represented by their exported symbols are not fixed. This is the kind of DSO system that supports deferred symbol resolution, and with such a system, symbols from other DSOs cannot be resolved at build time because they are not known until run time, and they might even vary from run to run.
We have a software project with real-time constraints, largely written in C++ but making use of a number of C libraries, running on a POSIX operating system. To satisfy the real-time constraints, we have moved almost all of our text logging off of stderr and into shared-memory ring buffers.
The problem we have now is that when old code, or a C library, calls assert, the message ends up in stderr and not in our ring buffers with the rest of the logs. We'd like to find a way to redirect the output of assert.
There are three basic approaches here that I have considered:
1.) Make our own assert macro -- basically, don't use #include <cassert>, give our own definition for assert. This would work but it would be prohibitively difficult to patch all of the libraries that we are using that call assert to include a different header.
2.) Patch libc -- modify the libc implementation of __assert_fail. This would work, but it would be really awkward in practice because it would mean that we can't build libc without building our logging infra. We could make it so that, at run time, we can pass a function pointer to libc that is the "assert handler" (see the sketch after this list) -- that's something that we could consider. The question is whether there is a simpler / less intrusive solution than this.
3.) Patch the libc header so that __assert_fail is marked with __attribute__((weak)). This means that we can override it at link time with a custom implementation, but if our custom implementation isn't linked in, then we link to the regular libc implementation. Actually, I was hoping that this function would already be marked with __attribute__((weak)), and I was surprised to find that apparently it isn't.
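To make option (2) concrete, the run-time hook could look something like this - purely a hypothetical sketch; none of these names exist in glibc:
/* Hypothetical patched-libc interface; assert_handler_t and
   set_assert_handler are invented names, not real glibc symbols. */
typedef void (*assert_handler_t)(const char *assertion, const char *file,
                                 unsigned int line, const char *function);

static assert_handler_t custom_handler; /* NULL selects the default behavior */

void set_assert_handler(assert_handler_t handler)
{
    custom_handler = handler;
}

/* A patched __assert_fail would call custom_handler when it is
   non-NULL instead of printing to stderr. */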
My main question is: What are the possible downsides of option (3) -- patching libc so that this line: https://github.com/lattera/glibc/blob/master/assert/assert.h#L67
extern void __assert_fail (const char *__assertion, const char *__file,
unsigned int __line, const char *__function)
__THROW __attribute__ ((__noreturn__));
is marked with __attribute__((weak)) as well ?
Is there a good reason I didn't think of that the maintainers didn't already do this?
How could any existing program that is currently linking and running successfully against libc break after I patch the header in this way? It can't happen, right?
Is there a significant run-time cost to using weak-linking symbols here for some reason? libc is already a shared library for us, and I would think the cost of dynamic linking should swamp any case analysis regarding weak vs. strong resolution that the system has to do at load time?
Is there a simpler / more elegant approach here that I didn't think of?
Some functions in glibc, particularly, strtod and malloc, are marked with a special gcc attribute __attribute__((weak)). This is a linker directive -- it tells gcc that these symbols should be marked as "weak symbols", which means that if two versions of the symbol are found at link time, the "strong" one is chosen over the weak one.
The motivation for this is described on wikipedia:
Use cases
Weak symbols can be used as a mechanism to provide default implementations of functions that can be replaced by more specialized (e.g. optimized) ones at link-time. The default implementation is then declared as weak, and, on certain targets, object files with strongly declared symbols are added to the linker command line.
If a library defines a symbol as weak, a program that links that library is free to provide a strong one for, say, customization purposes.
Another use case for weak symbols is the maintenance of binary backward compatibility.
However, in both glibc and musl libc, it appears to me that the __assert_fail function (to which the assert.h macro forwards) is not marked as a weak symbol.
https://github.com/lattera/glibc/blob/master/assert/assert.h
https://github.com/lattera/glibc/blob/master/assert/assert.c
https://github.com/cloudius-systems/musl/blob/master/include/assert.h
You don't need __attribute__((weak)) on the symbol __assert_fail from glibc. Just write your own implementation of __assert_fail in your program, and the linker should use your implementation. For example:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void __assert_fail(const char *assertion, const char *file,
                   unsigned int line, const char *function)
{
    fprintf(stderr, "My custom message\n");
    abort();
}

int main()
{
    assert(0);
    printf("Hello World");
    return 0;
}
That's because, when the linker resolves symbols, __assert_fail will already be defined by your program, so the linker shouldn't pick the symbol defined by libc.
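Assuming the file is saved as myassert.c, compiling and running it should print something like:
$ gcc myassert.c -o myassert
$ ./myassert
My custom message
Aborted (core dumped)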
If you really need __assert_fail to be defined as a weak symbol inside libc, why not just run objcopy --weaken-symbol=__assert_fail /lib/libc.so /lib/libc_with_weak_assert_fail.so? I don't think you need to rebuild libc from source for that.
If I were you, I would probably opt for opening a pipe(2) and fdopen(2)'ing stderr to take the write end of that pipe. I'd service the read end of the pipe as part of the main poll(2) loop (or whatever the equivalent is in your system) and write the contents to the ring buffer.
This is obviously slower at handling the actual output, but from your write-up such output is rare, so the impact ought to be negligible (especially if you already have a poll or select this fd can piggyback on).
It seems to me that tweaking libc or relying on side-effects of the tools might break in the future and will be a pain to debug. I'd go for the guaranteed-safe mechanism and pay the performance price if at all possible.
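A minimal sketch of that setup, with error handling omitted and the ring-buffer write left as a hypothetical stub:
#include <unistd.h>

static int stderr_pipe[2];

/* Point fd 2 at the write end of a pipe; the read end is then
   serviced from the main poll(2) loop. */
static void capture_stderr(void)
{
    pipe(stderr_pipe);
    dup2(stderr_pipe[1], STDERR_FILENO);
    close(stderr_pipe[1]);
}

/* Call this when poll(2) reports stderr_pipe[0] readable. */
static void drain_stderr(void)
{
    char buf[512];
    ssize_t n = read(stderr_pipe[0], buf, sizeof buf);
    if (n > 0) {
        /* ring_buffer_write(buf, (size_t)n);  -- your logging infra */
    }
}
One caveat: if the pipe fills up before it is drained, writes to stderr will block, so the read end needs regular servicing.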
I suspect this is an issue with my understanding of how the linking of shared objects takes place on Linux.
Using Valgrind with OpenCL (which, from various other posts, I understand to be problematic in its own right), I'm encountering errors from a module that is part of the shared object but is never actually run.
Specifically, I have an OpenCL helper module that has a series of functions for performing OpenCL actions. I have removed all references to the functions within this module from the executing code. I would naively assume, therefore, that this module cannot raise any problems with Valgrind.
However, I am seeing issues that are raised through _dl_init.c (lots of them, showing how broken OpenCL is with Valgrind). This suggests to me that code within the OpenCL runtime is being executed at link time.
Can someone clarify (or point me to suitable clarification material) how _dl_init.c is involved in the linker process?
Is it universally true that the .so files execute some initialisation code, or is it a library option?
Is this something that is easily accessible to library writers, or does it involve nefarious hacks?
Shared objects (.so files) are permitted to have code that is executed as soon as the library is loaded, regardless of the use of any of the code in the library.
This is used, for example, to perform static initialization of C++ objects.
If you don't want to have valgrind complaining about things that are being done in the library behind your back, then you can run valgrind so that it generates output that can be used as a suppression file by passing in --gen-suppressions=all. You use the suppression output in a suppression file of your own when running against the library and it should mask out these issues.
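For example (assuming your binary is ./myprog; the suppression blocks that get printed can be pasted into a file such as opencl.supp):
valgrind --gen-suppressions=all ./myprog
valgrind --suppressions=opencl.supp ./myprog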
Cases when it appears:
If you're using C++ code and have globally scoped objects, then their constructors are called when the library is loaded
If you add the gcc-specific __attribute__((constructor)) to a function definition, it gets called when the library is loaded
e.g. (C++) at a global scope:
#include <iostream>
class demo {
public:
    demo() { std::cout << "Hello World" << std::endl; }
    ~demo() {}
};

demo ademo;
e.g. (C)
#include <stdio.h>
static void __attribute__((constructor)) my_init(void) {
    printf("Hello World\n");
}
There is a corresponding event for object destruction or __attribute__((destructor)) on library unloading.
The ELF spec defines the presence of .init and .fini sections, which are, for libraries, the mechanism used to run constructor-type and destructor-type code when the library is loaded/unloaded. For a standard executable it's this, plus the startup code that gets you to main with the appropriate parameters.
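You can inspect these entries for any given library with readelf; assuming the example above is built as libdemo.so:
readelf -d libdemo.so | grep -E 'INIT|FINI'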
You can explicitly change these entry points, but that's a little bit dangerous and can lead to crashes and unknown bugs. It makes more sense to hook into the mentioned supported mechanisms.
The title is clear: we can load a library with dlopen etc.
But how can I get the signatures of the functions in it?
This question cannot be answered in general. Technically, if you compiled your executable with exhaustive debugging information (the code may still be an optimized release version), the executable will contain extra sections providing some kind of reflectivity of the binary. On *nix systems (you referred to dlopen) this is implemented through DWARF debugging data in extra sections of the ELF binary. It works similarly for Mach-O universal binaries on Mac OS X.
Windows PE, however, uses a completely different format, so unfortunately DWARF is not truly cross-platform (actually, in the early development stages of my 3D engine I implemented an ELF/DWARF loader for Windows so that I could use a common format for the engine's various modules, so with some serious effort such a thing can be done).
If you don't want to go into implementing your own loaders or debugging-information accessors, then you may embed the reflection information through some extra exported symbols (following some standard naming scheme) which refer to a table of function names mapping to their signatures. In the case of C source files, writing a parser to extract the information from the source itself is rather trivial. C++, OTOH, is so notoriously difficult to parse correctly that you need a fully fledged compiler to get it right. For this purpose GCCXML was developed: technically a GCC that emits the AST in XML form instead of an object binary. The emitted XML is then much easier to parse.
From the extracted information, create a source file with some kind of linked-list/array/etc. structure describing each function. If you don't directly export each function's symbol but instead initialize some field in the reflection structure with the function pointer, you get a really nice and clean annotated exporting scheme. Technically you could place this information in a separate section of the binary, but putting it in the read-only data section does the job as well.
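A minimal sketch of such an annotated export table - all names here are illustrative, not part of any standard:
/* Illustrative reflection scheme: one exported symbol describes
   every function the library provides. */
typedef struct {
    const char *name;       /* function name, e.g. "add" */
    const char *signature;  /* encoded signature, e.g. "i(ii)" */
    void       *fnptr;      /* address of the implementation */
} reflected_fn;

static int add(int a, int b) { return a + b; }

/* The single symbol a consumer looks up by convention. */
const reflected_fn my_lib_exports[] = {
    { "add", "i(ii)", (void *)add },
    { 0, 0, 0 }             /* sentinel */
};
A consumer would dlsym() the single my_lib_exports symbol and walk the array instead of looking up each function individually.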
However, if you're given a 3rd-party binary - say, worst-case scenario, it has been compiled from C source with no debugging information and all symbols not externally referenced stripped - you're pretty much screwed. The best you could do is apply some binary analysis of the way the function accesses the various places in which parameters can be passed.
This will only tell you the number of parameters and the size of each parameter value, but not the type or name/meaning. When reverse engineering some program (e.g. malware analysis or a security audit), identifying the type and meaning of the parameters passed to functions is one of the major efforts. Recently I came across some driver I had to reverse for debugging purposes, and you wouldn't believe how astounded I was to find C++ symbols in a Linux kernel module (you can't use C++ in the Linux kernel in a sane way), but also relieved, because the C++ name mangling provided me with plenty of information.
On Linux (or Mac) you can use a combination of "nm" and "c++filt" (for C++ libraries)
nm mylibrary.so | c++filt
or
nm mylibrary.a | c++filt
"nm" will give you the mangled form and "c++filt" attempts to put them in a more human-readable format. You might want to use some options in nm to filter down the results, especially if the library is large (or you can "grep" the final output to find a particular item)
No, this is not possible. The signature of a function doesn't mean anything at runtime; it's a piece of information useful at compile time for the compiler to validate your program.
You can't. Either the library publishes a public API in a header, or you need to know the signature by some other means.
At the lower level, the parameters of a function depend on how many stack arguments in the stack frame you consider and how you interpret them. Therefore, once the function is compiled into object code, it is not possible to get the signature like that. One remote possibility is to disassemble the code and read how the function works to learn the number of parameters, but the types would still be difficult or impossible to determine. In a word, it is not possible.
This information is not available. Not even the debugger knows:
$ cat foo.c
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char foo[10] = { 0 };
char bar[10] = { 0 };
printf("%s\n", "foo");
memcpy(bar, foo, sizeof(foo));
return 0;
}
$ gcc -g -o foo foo.c
$ gdb foo
Reading symbols from foo...done.
(gdb) b main
Breakpoint 1 at 0x4005f3: file foo.c, line 5.
(gdb) r
Starting program: foo
Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at foo.c:5
5 {
(gdb) ptype printf
type = int ()
(gdb) ptype memcpy
type = int ()
(gdb)
Suppose I have an ELF binary that's dynamic linked, and I want to override/redirect certain library calls. I know I can do this with LD_PRELOAD, but I want a solution that's permanent in the binary, independent of the environment, and that works for setuid/setgid binaries, none of which LD_PRELOAD can achieve.
What I'd like to do is add code from additional object files (possibly in new sections, if necessary) and add the symbols from these object files to the binary's symbol table so that the newly added version of the code gets used in place of the shared library code. I believe this should be possible without actually performing any relocations in the existing code; even though they're in the same file, these should be able to be resolved at runtime in the usual PLT way (for what it's worth I only care about functions, not data).
Please don't give me answers along the lines of "You don't want to do this!" or "That's not portable!". What I'm working on is a way of interfacing binaries with slightly-ABI-incompatible alternate shared-library implementations. The platform in question is i386-linux (i.e. 32-bit) if it matters. Unless I'm mistaken about what's possible, I could write some tools to parse the ELF files and perform my hacks, but I suspect there's a fancy way to use the GNU linker and other tools to accomplish this without writing new code.
I suggest the elfsh et al. tools from the ERESI project, if you want to instrument the ELF files themselves. Compatibility with i386-linux is not a problem, as I've used it myself for the same purpose.
The relevant how-tos are here.
You could handle some of the dynamic linking in your program itself. Read the man page for dlsym(3) in particular, and dlopen(3), dlerror(3), and dlclose(3) for the rest of the dynamic linking interface.
A simple example -- say I want to override dup2(2) from libc. I could use the following code (let's call it "dltest.c"):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <dlfcn.h>
/* Pointer to the "next" dup2 in the lookup order, i.e. libc's. */
int (*prev_dup2)(int oldfd, int newfd);

/* Our override: log the call, then forward to the original. */
int dup2(int oldfd, int newfd) {
    printf("DUP2: %d --> %d\n", oldfd, newfd);
    return prev_dup2(oldfd, newfd);
}

int main(void) {
    int i;

    /* Skip our own definition and find the next "dup2". */
    prev_dup2 = (int (*)(int, int))dlsym(RTLD_NEXT, "dup2");
    if (!prev_dup2) {
        printf("dlsym failed to find 'dup2' function!\n");
        return 1;
    }
    if (prev_dup2 == dup2) {
        printf("dlsym found our own 'dup2' function!\n");
        return 1;
    }

    i = dup2(1, 3);
    if (i == -1) {
        perror("dup2() failed");
    }
    return 0;
}
Compile with:
gcc -o dltest dltest.c -ldl
The statically linked dup2() function overrides the dup2() from the library. This works even if the function is in another .c file (and is compiled as a separate .o).
If your overriding functions are themselves dynamically linked, you may want to use dlopen() rather than trusting the linker to get the libraries in the correct order.
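For instance, a sketch of resolving the original function explicitly via dlopen (assuming glibc's SONAME libc.so.6):
#include <dlfcn.h>
#include <stdio.h>

int (*real_dup2)(int, int);

/* Resolve libc's dup2 explicitly instead of relying on RTLD_NEXT. */
static int load_real_dup2(void)
{
    void *libc = dlopen("libc.so.6", RTLD_LAZY);
    if (!libc) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    real_dup2 = (int (*)(int, int))dlsym(libc, "dup2");
    return real_dup2 ? 0 : -1;
}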
EDIT: I suspect that if a different function within the overridden library calls an overridden function, the original function gets called rather than the override. I don't know what will happen if one dynamic library calls another.
I don't seem to be able to just add a comment to this question, so I'm posting it as an "answer". Sorry about that; doing it just to hopefully help other folks who search for an answer.
So, I seem to have a similar use case, but I explicitly find any modification to existing binaries unacceptable (for me), so I'm looking for a standalone proxy approach: Proxy shared library (sharedlib, shlib, so) for ELF?