Scan shared object inclusions at runtime (C)

I am working on a C program (under Linux) that relies on shared libraries as plugins.
I provide each plugin with several functions from a static library of mine. In order to change the workflow of my program, I need to know at runtime whether a plugin is going to call a certain function included from the aforementioned library.
What I need is the C equivalent of:
readelf -a ${PLUGIN_NAME} | grep ${FUNCTION_NAME}
Is it possible to use the <dlfcn.h> library to achieve that? Needless to say, I would prefer not to execute the one-liner above in a system() call.
Thanks

You can try libelf, which allows you to manipulate ELF binaries (e.g. read sections). You can find very nice examples here.
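For instance, here is a minimal sketch of what that could look like, assuming the function shows up in the plugin's symbol tables the way the readelf | grep one-liner relies on (link with -lelf):

/* Sketch: search a shared object's symbol tables for a function name.
   Returns 1 if found, 0 if not, -1 on error. */
#include <fcntl.h>
#include <gelf.h>
#include <libelf.h>
#include <string.h>
#include <unistd.h>

int plugin_references(const char *path, const char *funcname)
{
    if (elf_version(EV_CURRENT) == EV_NONE)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    Elf *elf = elf_begin(fd, ELF_C_READ, NULL);
    if (!elf) {
        close(fd);
        return -1;
    }

    int found = 0;
    Elf_Scn *scn = NULL;
    while (!found && (scn = elf_nextscn(elf, scn)) != NULL) {
        GElf_Shdr shdr;
        if (gelf_getshdr(scn, &shdr) == NULL)
            continue;
        /* look at both .dynsym and .symtab, like readelf -a would */
        if (shdr.sh_type != SHT_DYNSYM && shdr.sh_type != SHT_SYMTAB)
            continue;

        Elf_Data *data = elf_getdata(scn, NULL);
        if (!data || shdr.sh_entsize == 0)
            continue;
        size_t nsyms = shdr.sh_size / shdr.sh_entsize;
        for (size_t i = 0; i < nsyms; i++) {
            GElf_Sym sym;
            if (gelf_getsym(data, i, &sym) == NULL)
                continue;
            const char *name = elf_strptr(elf, shdr.sh_link, sym.st_name);
            if (name && strcmp(name, funcname) == 0) {
                found = 1;
                break;
            }
        }
    }

    elf_end(elf);
    close(fd);
    return found;
}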

Related

When building a DLL which will be loaded using LoadLibrary, do I need to link with the dependent binaries or is including the headers enough?

I'm building a plugin (extension module) system for a language interpreter I'm writing in C.
During runtime, the program uses LoadLibrary to load a specified DLL file. This seems to work with basic DLLs which don't depend on functions defined in the main program.
However, I'm trying to build a plugin DLL which does depend on functions defined in the main program binary.
To do so, I defined an interface.h header in the main code base for these plugins to include and use. It declares the functions they might require. The plugin does #include <interface.h> in its header.
I compile the plugin DLL like so:
gcc myplugin.c -shared -Wl,--subsystem,windows -D MYPLUGIN_EXPORTS -o myplugin.dll -I..\main_program_headers
I then get the following kind of errors:
undefined reference to 'some function name'
Does this mean I have to compile the plugins with dynamic linking to the actual binaries they depend on in the main program?
If so, does this mean I need to keep the individual .o files of the main program around, even after linking them to the result .exe? Can GCC link directly against the .o files?
Anyway, I really hoped LoadLibrary would take care of fixing up the function references at load time. Doesn't it?
Update:
As @tenfour kindly answered, a DLL must follow normal linking rules, and function references need to be resolved at build time. They suggested a scheme where the host program passes pointers to the needed functions into the plugin.
This approach will indeed work, but I would still like to know:
What kind of build process is necessary to allow the plugin to call functions from the main app directly, without any special mechanism at runtime (except for LoadLibrary)?
I would like to say that my main source of influence here is the extension system of the CPython interpreter. It seems to me, judging by its documentation, that a CPython extension doesn't receive function pointers from the host interpreter, yet is still able to directly call Py_* functions from it.
Any idea how such a thing can be done? How can the plugin DLL be built to support this? What and how do I need to link against?
Since you didn't post interface.h, I can only guess that you are forward-declaring functions there, for example:
int some_func();
If the plugin code tries to invoke this function, it will compile, but the linker has no reference to it: the body of that function only exists in your host application.
Instead, if you want to dynamically link, using LoadLibrary, you need to use function pointers, e.g.:
typedef int (*some_func_ptr)(); // declare the fn pointer type
...
some_func_ptr some_func = x; // assign it after the host passes you x
...
some_func(); // now you can call it without linker issues.
And voilà, you have used dynamic linking to create a plugin system. Of course, the design of your interface will make this more complex in practice, but that's mostly just labor once you understand the concept.
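For illustration, a minimal sketch of that scheme (all names here are hypothetical, and error checking is omitted):

/* interface.h -- shared by host and plugin */
typedef struct {
    int (*some_func)(int, int);   /* implemented in the host program */
} host_api;

/* myplugin.c -- keep the pointers the host hands us */
static host_api g_api;

__declspec(dllexport) void plugin_init(const host_api *api)
{
    g_api = *api;                 /* copy the function pointer table */
}

__declspec(dllexport) int plugin_work(void)
{
    return g_api.some_func(2, 3); /* calls back into the host */
}

/* host side -- load the plugin and pass it the API table */
#include <windows.h>

extern int some_func(int, int);   /* defined somewhere in the host */

void load_plugin(void)
{
    HMODULE h = LoadLibraryA("myplugin.dll");
    void (*init)(const host_api *) =
        (void (*)(const host_api *))GetProcAddress(h, "plugin_init");
    host_api api = { some_func };
    init(&api);
}

The host resolves nothing from the plugin except plugin_init; everything else flows through the struct, so the plugin links cleanly with no unresolved references.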

How to make a shared library an executable

I was searching for exactly this question. I saw this link https://hev.cc/2512.html which does exactly what I want, but there is no explanation of what is going on. I am also confused about whether a shared library without a main() can be made executable, and if yes, how? I can guess that I have to provide a global main(), but I know no details. Any further easy reference and guidance is much appreciated.
I am working on 64-bit x86-64 Ubuntu with kernel 3.13.
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and, on systems that operate that way, to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define (and generate code for) a task which can be started from a single entry point.
The code you linked to starts with the source code of the library and explicitly codes a main() which it invokes via the entry-point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library (in the absence of security features in a given OS preventing this), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possibly be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
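For reference, a minimal sketch of the trick described above (x86-64 Linux only; the names and the exact build line are my assumptions, not the linked code, which also copies argc/argv for main):

/* libtest.c -- a shared library that is also a runnable ELF executable */
#include <stdio.h>
#include <stdlib.h>

/* embed the path of the dynamic loader so the kernel can run us directly */
const char my_interp[] __attribute__((section(".interp"))) =
    "/lib64/ld-linux-x86-64.so.2";

int add(int a, int b) { return a + b; }  /* ordinary library function */

void entry(void)
{
    printf("add(2, 3) = %d\n", add(2, 3));
    exit(0);  /* there is no CRT to return to, so exit explicitly */
}

Build it with gcc -shared -fPIC -Wl,-e,entry libtest.c -o libtest.so; then both running ./libtest.so (after chmod +x) and linking against it as a normal library should work.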
To create a shared (dynamic) library, by example:
Suppose there are three object files: sum.o, mul.o and print.o, and the shared library name is libmno.so:
cc -shared -o libmno.so sum.o mul.o print.o
Then compile a program against it with:
cc main.c ./libmno.so

Is there a reliable way to know what libraries could be dlopen()ed in an elf binary?

Basically, I want to get a list of libraries a binary might load.
The unreliable way I came up with that seems to work (with possible false-positives):
comm -13 <(ldd elf_file | sed 's|\s*\([^ ]*\)\s.*|\1|'| sort -u) <(strings -a elf_file | egrep '^(|.*/)lib[^:/]*\.so(|\.[0-9]+)$' | sort -u)
This is not reliable. But it gives useful information, even if the binary was stripped.
Is there a reliable way to get this information without possible false-positives?
EDIT: More context.
Firefox is transitioning from using gstreamer to using ffmpeg.
I was wondering what versions of libavcodec.so will work.
libxul.so uses dlopen() for many optional features, and the library names are hard-coded, so the above command helps in this case.
I also have a general interest in package management and binary dependencies.
I know you can get direct dependencies with readelf -d, and dependencies of dependencies with ldd. And I was wondering about optional dependencies, hence the question.
ldd tells you the libraries your binary has been linked against. These are not those that the program could open with dlopen.
The signature for dlopen is
void *dlopen(const char *filename, int flag);
So you could, still unreliably, run strings on the binary, but this could still fail if the library name is not a static string, but built or read from somewhere during program execution -- and this last situation means that the answer to your question is "no"... Not reliably. (The name of the library file could be read from the network, from a Unix socket, or even uncompressed on the fly, for example. Anything is possible! -- although I wouldn't recommend any of these ideas myself...)
edit: also, as John Bollinger mentioned, the library names could be read from a config file.
edit: you could also try substituting the dlopen library call with one of your own (this is done by the Boehm garbage collector with malloc, for example), so it would open the library but also log its name somewhere. But if the program doesn't open a specific library during execution, you still won't know about it.
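A minimal sketch of such a logging substitute, assuming Linux/glibc; build it as a shared object and activate it with LD_PRELOAD:

/* dlopen_log.c -- log every dlopen() the target program makes.
   Build: gcc -shared -fPIC dlopen_log.c -o dlopen_log.so -ldl
   Run:   LD_PRELOAD=./dlopen_log.so ./program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

void *dlopen(const char *filename, int flags)
{
    /* find the real dlopen in the next object in search order */
    static void *(*real_dlopen)(const char *, int);
    if (!real_dlopen)
        real_dlopen = (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");

    fprintf(stderr, "dlopen(%s, %d)\n", filename ? filename : "NULL", flags);
    return real_dlopen(filename, flags);
}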
(I am focusing on Linux; I guess that most of my answer fits for every POSIX systems; but on MacOSX dlopen wants .dylib dynamic library files, not .so shared objects)
A program could even emit some C code into a temporary file /tmp/foo1234.c, fork a compilation of that /tmp/foo1234.c into a shared library /tmp/foo1234.so with a command like gcc -O -shared -fPIC /tmp/foo1234.c -o /tmp/foo1234.so (generated and executed at runtime by your program), perhaps remove the /tmp/foo1234.c file since it is not needed any more, and dlopen that /tmp/foo1234.so (perhaps even removing /tmp/foo1234.so after the dlopen), all of that in the same process. My GCC MELT plugin for gcc does exactly this, and so does Bigloo, and the GCCJIT library does something close.
So in general, your quest is impossible and does not even make sense.
Is there a reliable way to get this information without possible false-positives?
No, there is no reliable way to get such information without false positives (you could prove that it is equivalent to the halting problem, or to some other undecidable problem). See also Rice's theorem.
In practice, most dlopen calls happen on plugins specified by some configuration. The plugins might not be named exactly as such in a configuration file (e.g. some program foo might have a convention where a plugin named bar in its foo.conf configuration file is provided by the foo-bar.so shared object).
However, you might find some heuristic approximation. Most programs doing some dlopen have some plugin convention requesting some particular symbol names in the plugin. You could search for shared objects defining these names. Of course you'll get false positives.
For example, the zsh shell accepts plugins called zsh modules. The example module shows that enables_, boot_, features_, etc. functions are expected in zsh modules. You could use nm -D to find *.so files providing these (hence finding the plugins perhaps loadable by zsh).
(I am not convinced that such an approach is worthwhile; in fact, you should usually know which plugins are used by which applications on your system.)
BTW, you could use strace(1) on the execution of some command to understand the syscalls it is making, hence the plugins it is loading. You might also use ltrace(1), or pmap(1) (on some given process), or simply, for a process 1234, use cat /proc/1234/maps to understand its virtual address space, hence the plugins it has already loaded. See proc(5).
Notice that strace, ltrace, pmap exist on Linux, but many POSIX systems have similar programs.
Also, a program could generate some machine code at runtime and execute it (SBCL does that at every REPL interaction!). Your program could also use some JIT techniques (e.g. with libjit, llvm, asmjit, GCCJIT or with hand-written code...) to do likewise. So plugin-like behavior can happen without dlopen (and you might mimic dlopen with mmap calls and some ELF relocation processing).
Addenda:
If you are installing firefox from its packaged version (e.g. the iceweasel package on Debian), its package is likely to handle the dependencies for you.

How can I "dump" a function to a file?

For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want to write it to a file, so that I can use the mmap system call to load it with PROT_EXEC and call it from another program. What should I do?
If you know the signature you need, and you have a static library or the location of a shared library at compile time, you probably just want to include the header and link against the library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the library dynamically and retrieving the address of the function by name.
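For the dynamic case, a minimal POSIX sketch (assuming a hypothetical libfunc.so that exports func; link the loader with -ldl on Linux):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./libfunc.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    /* look the symbol up by name and cast it to the known signature */
    int (*func)(int, int) = (int (*)(int, int))dlsym(handle, "func");
    if (!func) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    printf("func(2, 3) = %d\n", func(2, 3));
    dlclose(handle);
    return 0;
}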
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this avoids you getting broken by incompatible version updates and also prevents your users from experiencing dependency hell (you require version A but some other application requires version B). Modular architectures are often better (and more securely) achieved through inter-process communication (IPC): dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with IPC each component is a separate process that only has access to information given to it explicitly by the calling process, which makes it more difficult for a malicious component to steal data from the caller or other components, or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, we'd have to do something similar to what the loader does: load the code at the particular address it was linked to work at, and run it. (There is also position-independent code, which can run at any address.)
Let's first get our function compiled and linked, then output it into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options; gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently, with the linker switch being /BINARY, for instance.)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32-bit mode) to confirm it looks right; I see an add instruction there, so it looks good to me.
If you loaded that, mmap()ed it, and then called it... it should work.
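Here's a minimal sketch of that loading step, assuming func.binary holds nothing but position-independent code for int func(int, int) starting at offset 0:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("func.binary", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    void *code = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC,
                      MAP_PRIVATE, fd, 0);
    if (code == MAP_FAILED) { perror("mmap"); return 1; }

    /* casting void* to a function pointer is not ISO C, but it is
       exactly what dynamic loaders do on POSIX systems */
    int (*func)(int, int) = (int (*)(int, int))code;
    printf("func(2, 3) = %d\n", func(2, 3));

    munmap(code, st.st_size);
    close(fd);
    return 0;
}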
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!

How to relink existing shared library with extra object file

I have existing Linux shared object file (shared library) which has been stripped. I want to produce a new version of the library with some additional functions included. I had hoped that something like the following would work, but does not:
ld -o newlib.so newfuncs.o --whole-archive existinglib.so
I do not have the source to the existing library. I could get it but getting a full build environment with the necessary dependencies in place would be a lot of effort for what seems like a simple problem.
You might like to try coming at this from a slightly different angle by loading your object using preloading.
Set LD_PRELOAD to point to your new object. (Note that the preloaded file must itself be a shared object, so first link newfuncs.o into one, e.g. gcc -shared -o newfuncs.so newfuncs.o.)
export LD_PRELOAD=/my/newfuncs/dir/newfuncs.so
and specify the existing library in the same way via your LD_LIBRARY_PATH.
This will then instruct the run time linker to search for needed symbols in your object before looking in objects located in your LD_LIBRARY_PATH.
BTW, you can put calls in your object that then call the function that would have been called had you not specified an LD_PRELOAD object. This is why the technique is sometimes called interposing.
This is how many memory allocation analysis tools work. They interpose versions of malloc() and free() that record the calls before then calling the actual malloc and free functions to perform the memory management.
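A minimal sketch of such an interposer, assuming Linux/glibc (real tools also interpose free(), realloc(), etc., and guard carefully against recursion):

/* malloc_log.c -- log every malloc() call.
   Build: gcc -shared -fPIC malloc_log.c -o malloc_log.so -ldl
   Run:   LD_PRELOAD=./malloc_log.so ./program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *malloc(size_t size)
{
    static void *(*real_malloc)(size_t);
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    void *p = real_malloc(size);

    /* use write(2), not stdio, since printf may itself call malloc */
    char buf[64];
    int n = snprintf(buf, sizeof buf, "malloc(%zu) = %p\n", size, p);
    write(2, buf, n);
    return p;
}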
There are plenty of tutorials on the interwebs on using LD_PRELOAD. One of the original and best is still "Building library interposers for fun and profit". Though written nine years ago, and for Solaris, it is still an excellent resource.
HTH and good luck.
Completely untested idea:
# mv existinglib.so existinglib-real.so
# ld -shared -o existinglib.so newfuncs.o ./existinglib-real.so
The dynamic linker, when loading a program that expects to load existinglib.so, will find your version, and also load existinglib-real.so on which it depends. It doesn't quite achieve the stated goal of your question, but should look as if it does to the program loading the library.
Shared libraries are not archives; they are really more like executables. You cannot, therefore, just stuff extra content into them the way you can with a .a static library.
Short answer: exactly what you asked for can't be done.
Longer answer: depending on why you want to do this, and exactly how existinglib.so has been linked, you may get close to desired behavior. In addition to LD_PRELOAD and renaming existinglib.so already mentioned, you could also use a linker script (cat /lib/libc.so to see what I mean).
