Returning a shared library symbol table - c

For instance:
void* sdl_library = dlopen("libSDL.so", RTLD_LAZY);
void* initializer = dlsym(sdl_library,"SDL_Init");
Assuming no errors, initializer will point to the function SDL_Init in the shared library libSDL.so.
However this requires knowing the symbol "SDL_Init" exists.
Is it possible to query a library for all its symbols? E.g., in this case it would return SDL_Init, the function pointer, and any other symbols exported by libSDL.so.

There is no libc function to do that. However, you can write one yourself (though the code is somewhat involved).
On Linux, dlopen() in fact returns the address of a link_map structure, which has a member named l_addr that points to the base address of the loaded shared object (assuming your system doesn't randomize shared library placement, and that your library has not been prelinked).
On Linux, a sure way to find the base address (the address of Elf*_Ehdr) is to use dl_iterate_phdr() after dlopen()ing the library.
Having the ELF header, you should be able to iterate over a list of exported symbols (the dynamic symbol table), by first locating the Elf*_Phdr of type PT_DYNAMIC, and then locating DT_SYMTAB, DT_STRTAB entries, and iterating over all symbols in the dynamic symbol table. Use /usr/include/elf.h to guide you.
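A minimal sketch of that walk, assuming glibc on Linux. Instead of dl_iterate_phdr() it uses dlinfo() with RTLD_DI_LINKMAP, whose link_map already carries a pointer (l_ld) to the PT_DYNAMIC array. The helper name list_dynamic_symbols is made up for illustration, and the symbol count relies on the common (but not guaranteed) layout in which .dynstr immediately follows .dynsym; a stricter version would read the count from DT_HASH or DT_GNU_HASH.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

/* Hypothetical helper: writes the name of every symbol defined in the
 * dynamic symbol table of a dlopen()ed object to `out` (if non-NULL).
 * Returns the number of names found, or -1 on failure. */
int list_dynamic_symbols(void *handle, FILE *out)
{
    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) != 0 || map == NULL)
        return -1;

    ElfW(Addr) symtab_addr = 0, strtab_addr = 0;
    size_t strsz = 0, syment = sizeof(ElfW(Sym));

    /* l_ld points at the object's PT_DYNAMIC array. */
    for (const ElfW(Dyn) *dyn = map->l_ld; dyn->d_tag != DT_NULL; dyn++) {
        switch (dyn->d_tag) {
        case DT_SYMTAB: symtab_addr = dyn->d_un.d_ptr; break;
        case DT_STRTAB: strtab_addr = dyn->d_un.d_ptr; break;
        case DT_STRSZ:  strsz = dyn->d_un.d_val; break;
        case DT_SYMENT: syment = dyn->d_un.d_val; break;
        }
    }
    if (symtab_addr == 0 || strtab_addr == 0 || syment == 0)
        return -1;

    /* glibc usually stores absolute addresses here; some loaders leave
     * file-relative offsets, which need the load bias added. */
    if (symtab_addr < map->l_addr) symtab_addr += map->l_addr;
    if (strtab_addr < map->l_addr) strtab_addr += map->l_addr;
    if (strtab_addr <= symtab_addr)
        return -1;  /* the layout assumption does not hold */

    const char *strtab = (const char *)strtab_addr;
    /* Assumption: the symbol table ends where the string table begins. */
    size_t count = (strtab_addr - symtab_addr) / syment;
    int found = 0;

    for (size_t i = 1; i < count; i++) {  /* entry 0 is reserved */
        const ElfW(Sym) *sym = (const ElfW(Sym) *)(symtab_addr + i * syment);
        if (sym->st_shndx == SHN_UNDEF)   /* imported, not defined here */
            continue;
        if (sym->st_name == 0 || sym->st_name >= strsz)
            continue;
        if (out != NULL)
            fprintf(out, "%s\n", strtab + sym->st_name);
        found++;
    }
    return found;
}
```

Calling list_dynamic_symbols(dlopen("libm.so.6", RTLD_LAZY), stdout) should print names such as sinf; the library name is only an example. On older glibc versions you may need to link with -ldl.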
Additionally, you could use libelf, but I'm unable to guide you since I don't have previous experience with it.
Finally note that the exercise is somewhat futile: you'll get a list of defined functions, but you'll have no idea how to call them (what parameters they expect), so what's the point?

I don't think there is a published API for this. You can either use the nm tool from binutils or examine its source code:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/binutils/?cvsroot=src
http://sourceware.org/cgi-bin/cvsweb.cgi/src/binutils/nm.c?rev=1.63&content-type=text/x-cvsweb-markup&cvsroot=src
(obviously assuming elf)

The Boost.DLL offers this functionality through the library_info::symbols function. Adapted from the tutorial on Querying libraries for symbols:
// Class `library_info` can extract information from a library
boost::dll::library_info inf(libpath);
// Getting exported symbols
std::vector<std::string> exports = inf.symbols();
// Printing symbols
for (std::size_t j = 0; j < exports.size(); ++j) {
    std::cout << exports[j] << std::endl;
}
Note that this only works for the symbols that nm lists without the --dynamic flag, i.e., those in the .symtab section. It seems like some libraries do not export any symbols in that section. I have opened a feature request to support falling back to the .dynsym section in that case.

void *dlsym(void *restrict handle, const char *restrict name);
Return Value
If handle does not refer to a valid
object opened by dlopen(), or if the
named symbol cannot be found within
any of the objects associated with
handle, dlsym() shall return NULL.
More detailed diagnostic information
shall be available through dlerror().
( Source: http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html )
In other words, if the symbol isn't found, dlsym() will return NULL. Not sure if that's what you're looking for, but that is the simplest way I can find.
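A small sketch of that check, for what it's worth. Since a symbol's value can legitimately be NULL, the usual idiom is to clear dlerror() first and test it afterwards rather than rely on the return value alone. The wrapper name lookup_symbol is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical wrapper: looks up `name` in `handle` and reports failure
 * through `*error` (set to NULL on success), using the dlerror() protocol. */
void *lookup_symbol(void *handle, const char *name, const char **error)
{
    dlerror();                            /* clear any stale error state */
    void *address = dlsym(handle, name);
    *error = dlerror();                   /* non-NULL only if dlsym() failed */
    return *error ? NULL : address;
}
```

This only tells you whether one particular name exists; it does not enumerate the names you did not think to ask for.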

The Linux nm command could be used: http://man.yolinux.com/cgi-bin/man2html?cgi_command=nm

Related

How to tell apart imported function vs imported global variable in a DLL's PE header?

I'm writing a small tool that should be able to inspect an arbitrary process of interest and check if any of its statically linked functions were trampolined. (An example of a trampoline could be what Microsoft Detours does to a process.)
For that I parse the PE header of the target process and retrieve all of its imported DLLs with all imported functions in them. Then I can compare the following between DLLs on disk and the DLLs loaded in the target process memory:
A. Entries in the Import Address Table for each imported function.
B. First N bytes of each function's machine code.
And if any of the above do not match, this will most certainly mean that a trampoline was applied to a particular function (or WinAPI.)
This works well, except in one situation: a target process can import a global variable instead of a function. For example, _acmdln is such a global variable. You can still find it in msvcrt.dll and use it as such:
//I'm not sure why you'd want to do it this way,
//but it will give you the current command line.
//So just to prove the concept ...
HMODULE hMod = ::GetModuleHandle(L"msvcrt.dll");
char* pVar = (char*)::GetProcAddress(hMod, "_acmdln");
char* pCmdLine = pVar ? *(char**)pVar : NULL;
So, what this means for my trampoline checking tool is that I need to differentiate between an imported function (WinAPI) and a global variable. Any idea how?
PS. If I don't do that, my algorithm that I described above will compare a global variable's "code bytes" as if it was a function, which is just a pointer to a command line that will most certainly be different, and then flag it as a trampolined function.
PS2. Not exactly my code, but a similar way to parse PE header can be found here. (Search for DumpImports function for extracting DLL imports.)
Global variables will typically be in a data section such as .data, not in the .text section. In addition, the section containing a variable will not have execute permission. You can use both of these characteristics to filter.
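A rough sketch of the permission check, written as portable C so it does not depend on windows.h. The struct pe_section type and the function name rva_is_code are made up for illustration; in a real tool the three fields would come from the VirtualAddress, Misc.VirtualSize and Characteristics members of each IMAGE_SECTION_HEADER, and the RVA from the exporting DLL's export table.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of IMAGE_SCN_MEM_EXECUTE from the PE/COFF specification. */
#define SCN_MEM_EXECUTE 0x20000000u

/* Hypothetical, simplified view of the section header fields needed here. */
struct pe_section {
    uint32_t virtual_address;   /* RVA where the section starts */
    uint32_t virtual_size;      /* size of the section in memory */
    uint32_t characteristics;   /* IMAGE_SCN_* flags */
};

/* Returns 1 if `rva` lies in an executable section, 0 if it lies in a
 * non-executable one, and -1 if no section contains it. */
int rva_is_code(const struct pe_section *sections, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].virtual_size)
            return (sections[i].characteristics & SCN_MEM_EXECUTE) ? 1 : 0;
    }
    return -1;
}
```

An import whose target RVA comes back as 0 would then be skipped by the byte comparison instead of being flagged as a trampoline.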

Is there a way in C to have the compiler/linker give an error if a function is not defined?

In my case I am writing a simple plugin system in C using dlfcn.h (linux). The plugins are compiled separately from the main program and result in a bunch of .so files.
There are certain functions that must be defined in the plugin in order for the plugin to be called properly by the main program. Ideally I would like each plugin to include a .h file or something similar that states what functions a valid plugin must have. If these functions are not defined in the plugin, I would like the plugin to fail compilation.
I don't think you can enforce that a function be defined at compile time. However, if you use gcc toolchain, you can use the --undefined flag when linking to enforce that a symbol be defined.
ld --undefined foo
will treat foo as though it is an undefined symbol that must be defined for the linker to succeed.
You cannot do that.
It's common practice to define only two exported functions in a library opened by dlopen(): one to import functions into your plugin and one to export functions from your plugin.
A few lines of code are better than any explanation:
struct plugin_import {
    void (*draw)(float);
    void (*update)(float);
};
struct plugin_export {
    int (*get_version)(void);
    void (*set_version)(int);
};
extern void import(struct plugin_import *);
extern void export(struct plugin_export *);
int setup(void)
{
    struct plugin_export out = {0};
    struct plugin_import in;
    /* give the plugin our function pointers */
    in.draw = &draw;
    in.update = &update;
    import(&in);
    /* get our functions out of the plugin */
    export(&out);
    /* verify that all functions are defined */
    if (out.get_version == NULL || out.set_version == NULL)
        return 1;
    return 0;
}
This is very similar to the system Quake 2 used. You can look at the source here.
The only difference is that Quake 2 exported a single function, which imports and exports the functions of the dynamic library in one call.
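A sketch of what such a single-entry-point variant might look like. This is not Quake 2's actual code; the name plugin_get_api and the bodies of the functions are invented for illustration, and both sides are shown in one file so the example is self-contained. In a real setup the host would obtain plugin_get_api from the shared object with dlsym().

```c
#include <stddef.h>

struct plugin_import { void (*draw)(float); void (*update)(float); };
struct plugin_export { int (*get_version)(void); void (*set_version)(int); };

/* --- plugin side (would live in the shared object) --- */
static struct plugin_import host;       /* functions handed over by the host */
static int version = 1;
static int  get_version(void)  { return version; }
static void set_version(int v) { version = v; }

/* Hypothetical single entry point: the only symbol the plugin exports.
 * It stores the host's functions and returns the plugin's own. */
struct plugin_export plugin_get_api(const struct plugin_import *in)
{
    struct plugin_export out = { get_version, set_version };
    host = *in;
    return out;
}

/* --- host side --- */
static void draw(float dt)   { (void)dt; }
static void update(float dt) { (void)dt; }

int setup_plugin(void)
{
    struct plugin_import in = { draw, update };
    struct plugin_export out = plugin_get_api(&in);

    /* verify that all functions are defined */
    if (out.get_version == NULL || out.set_version == NULL)
        return 1;
    out.set_version(out.get_version() + 1);
    return 0;
}
```

The check still happens at load time rather than at compile time, but a missing entry shows up as a NULL pointer in one well-defined place.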
Well after doing some research and asking a few people that I know of on IRC I have found the following solution:
Since I am using gcc I am able to use a linker script.
linker.script:
ASSERT(DEFINED(funcA), "must define funcA" ) ;
ASSERT(DEFINED(funcB), "must define funcB" ) ;
If either of those functions is not defined, a custom error message will be output when the program tries to link.
(more info on linker script syntax can be found here: http://www.math.utah.edu/docs/info/ld_3.html)
When compiling simply add the linker script file after the source file:
gcc -o test main.c linker.script
Another possibility:
Something that I didn't think of (it seems a bit obvious now) and that was brought to my attention: you can create a small program that loads your plugin and checks that you get valid function pointers for all of the functions that you want your plugin to have. Then incorporate this into your build system, be it a makefile or a script or whatever. This has the benefit that you are no longer limited to a particular compiler to make this work, and you can do some more sophisticated checks for other things. The only downside is that you have a little more work to do to get it set up.
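The core of such a checker could look roughly like this. The function name count_missing_symbols is made up; a build step would wrap it in a main() that takes the plugin path and exits non-zero when anything is missing.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical checker: returns the number of `required` symbols that
 * `path` fails to provide, or -1 if the library cannot be loaded. */
int count_missing_symbols(const char *path, const char *const *required,
                          size_t count)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return -1;
    }

    int missing = 0;
    for (size_t i = 0; i < count; i++) {
        if (dlsym(handle, required[i]) == NULL) {
            fprintf(stderr, "%s: missing symbol %s\n", path, required[i]);
            missing++;
        }
    }
    dlclose(handle);
    return missing;
}
```

Note that dlsym() on a library handle also searches that library's dependencies, so this confirms the symbol is reachable through the plugin, not that the plugin itself defines it.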

How can my C code find the symbol corresponding to an address at run-time (in Linux)?

Given a function or variable run-time address, my code needs to find out the name and, if it's a variable, type information of the symbol. Or at least provide enough information for later, off-line extraction of the name (and type info).
It is Linux code and it is assumed debug information is available.
I tried to look into the ELF file format, binutils and all but the subject is huge, so I was hoping somebody can help me narrow the scope.
I can see the following types of solutions:
find the range of the code/data segments of the modules currently loaded in memory - HOW TO DO THAT? Save the address's module and segment name and its offset in its segment. Off-line, then use binutils to find the symbol in the module's debug info - again, HOW TO DO THAT?
use some API/system services I do not know of to find the symbol and info at run-time - HOW?
Thank you in advance.
GNU libc provides a dladdr function for this exact purpose. However, it only works on functions, not variables.
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <dlfcn.h>
int dladdr(void *addr, Dl_info *info);
The function dladdr() takes a function pointer and tries to resolve
name and file where it is located. Information is stored in the
Dl_info structure:
typedef struct {
    const char *dli_fname;  /* Pathname of shared object that
                               contains address */
    void       *dli_fbase;  /* Address at which shared object
                               is loaded */
    const char *dli_sname;  /* Name of symbol whose definition
                               overlaps addr */
    void       *dli_saddr;  /* Exact address of symbol named
                               in dli_sname */
} Dl_info;
If no symbol matching addr could be found, then dli_sname and dli_saddr
are set to NULL.
dladdr() returns 0 on error, and nonzero on success.
Of course, usually I do this sort of thing from gdb, not within the program itself.
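For completeness, a short sketch of calling dladdr() from within the program, assuming glibc. The helper name describe_address is made up. It records the containing object and the offset from its load address, which is the kind of information that could be saved for off-line lookup, plus the symbol name when one is found.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: formats "object+0xOFFSET (symbol)" for `addr`
 * into `buf`. Returns 1 on success, 0 if no loaded object contains it. */
int describe_address(const void *addr, char *buf, size_t len)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return 0;

    /* Offset of the address from the object's load address. */
    unsigned long offset =
        (unsigned long)((const char *)addr - (const char *)info.dli_fbase);

    snprintf(buf, len, "%s+0x%lx (%s)", info.dli_fname, offset,
             info.dli_sname ? info.dli_sname : "?");
    return 1;
}
```

dladdr() only sees symbols in the dynamic symbol table, so functions in the main executable often show up as "?" unless it was linked with -rdynamic.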
What you want to look at is the Binary File Descriptor library specifically the symbol handling functions. libbfd provides a common set of functions for manipulating and reading various object formats. It does this by providing an abstract view of object files and then has specific back ends to handle the details of specific object types and architectures. ELF file formats are supported as is most likely the architecture you want to use.
I don't find libbfd difficult to use but I am always open to alternatives and libelf is another one. You will probably want to look at the gelf_getsym function specifically.
C is a fully-compiled language. The names and types and other info about variables are generally discarded in the compilation process.
An exception is that most compilers will produce an executable with debugging information included, so that a live debugger has access to this information. This info is totally OS-specific, and even compiler-specific, and might even be in parts of memory not accessible to the program.

Will a referenced library function be linked if it is not called?

Suppose we have the following interface to a library:
// my_interface.h
typedef float (*myfunc)(float);
myfunc get_the_func();
And suppose that the implementation is as follows:
// my_impl.c
#include <math.h>
myfunc get_the_func() {
return sinf;
}
And now suppose the client code does the following:
#include "my_interface.h"
...
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
Does the standard guarantee that get_the_func() will return the address of the standard math library sinf()? If so, where in the standard is this implied?
Please note the sinf() is not explicitly called anywhere.
The standard does not require external functions to be called: being referenced is enough for them to be kept in the result of translation. According to section 5.1.1.2, paragraph 1, of the standard, during the eighth (and last) phase of translation
All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation.
Since returning a pointer to a function is considered a reference to it, the standard guarantees that sinf will be linked to whatever implementation that you supply to the linker, which may or may not be the one coming from the standard math library.
The C standard doesn't guarantee that get_the_func() will return the address of the sinf function in the math library. It guarantees that whatever pointer is returned will be callable as if you had called sinf directly, and that it will compare equal to any other pointer to that function.
On some architectures you might get a pointer to a descriptor of the function which the compiler handles specially and I've also seen function pointer values pointing to a dynamic linking trampoline and not the function itself.
I recall a discussion on the gcc mailing lists around 10 years ago if a function pointer cast to (void *) actually needs to compare equal to another function pointer cast to (void *) which was in connection with how IA64 implements function pointers (they could point to different descriptor structs in different libraries, which meant that to correctly implement compares the compiler would have to compare the contents of the descriptors and not the pointers themselves). But I don't remember what the standards lawyers decided and/or if this was solved in the linker.
Okay so NOWHERE in the standard does it say that just because you #include <math.h> that you will also link to the standard math library ... As a matter of fact that DOES NOT happen.
You HAVE to link to the libm in order to use the standard math library, as in:
cc -o foo foo.c -lm
^^^^
The marked option is actually for the linker step. Without it there is no linkage, irrespective of whether you use a function in a library as a return value or actually call it.
External symbols are resolved either by explicitly specifying the archives, objects, or libraries at link time or, on systems that support it, lazily at run time through dynamic linking.
On environments that support dynamic linking, weak linking, lazy linking, etc. there is no guarantee that references will ever be resolved. For the resolution the execution path has to be traversed.
Let's say it is. Still, your client needs to provide a linkage path for resolving sinf as well either when they are linking or during runtime for environments that support it.
the point:
The client has the ability to resolve the symbol to an address in any way they see fit. So, in fact, there is no way to guarantee that your client will link to the standard library and that the symbol will resolve to the system library's sinf. The only thing you know is that if that path is executed, it will either result in an address that can be looked up using the sinf link, or it will crash unceremoniously in environments that don't have to resolve the symbols at link time.
update/clarification:
To clarify, if sinf is used as a variable; then it needs to be resolved but there is still NO guarantee that when the client code resolves the symbol when they go through their linking step that they will resolve it against the math library. If we are talking about GUARANTEES that is.
Now practically speaking, if the client links against the standard math library and doesn't pull off any of the overrides that they can (which I pointed out above), then yes, that symbol will be resolved against the standard library (either static or dynamic).
My original answer is a bit ehem "prissy" for which I apologize because we were talking about standards and guarantees thus the rantish nature. There is nothing for example stopping the client simply doing something like this:
foo.c:
#include "my_interface.h"
...
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
first pass compile:
cc -o foo foo.c
get an error that says sinf is unresolved and so the client edits her source file:
foo.c:
#include "my_interface.h"
...
void * sinf = NULL;
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
and now you have a fully resolved but nicely crashing program;

How to create modules in C

I have an interface with which I want to be able to statically link modules. For example, I want to be able to call all functions (albeit in separate files) called FOO or that match a certain prototype, and ultimately call a function in one file without a header in the other files. Don't say that it is impossible, since I found a hack that can do it, but I want a non-hacked method. (The hack is to use nm to get the functions and their prototypes; then I can dynamically call the function.) Also, I know you can do this with dynamic linking; however, I want to statically link the files. Any ideas?
Put a table of all functions into each translation unit:
struct functions MOD1FUNCS[]={
{"FOO", foo},
{"BAR", bar},
{0, 0}
};
Then put a table into the main program listing all these tables:
struct functions* ALLFUNCS[]={
MOD1FUNCS,
MOD2FUNCS,
0
};
Then, at run time, search through the tables, and lookup the corresponding function pointer.
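Put together, the tables and the run-time search might look like the sketch below. The definition of struct functions, the int (*)(int) prototype, the second module and the name find_function are all assumptions for the sake of a complete example; everything is in one file here, whereas in practice each module's table would live in its own translation unit and be declared extern in the main program.

```c
#include <stddef.h>
#include <string.h>

typedef int (*module_fn)(int);          /* assumed common prototype */

struct functions {
    const char *name;
    module_fn   fn;
};

/* Example module functions (placeholders). */
static int foo(int x) { return x + 1; }
static int bar(int x) { return x * 2; }
static int baz(int x) { return -x; }

/* One table per module, terminated by a {0, 0} entry. */
static struct functions MOD1FUNCS[] = { {"FOO", foo}, {"BAR", bar}, {0, 0} };
static struct functions MOD2FUNCS[] = { {"BAZ", baz}, {0, 0} };

/* Table of tables in the main program, terminated by 0. */
static struct functions *ALLFUNCS[] = { MOD1FUNCS, MOD2FUNCS, 0 };

/* Returns the function registered under `name`, or NULL if none is. */
module_fn find_function(const char *name)
{
    for (struct functions **table = ALLFUNCS; *table != NULL; table++)
        for (struct functions *entry = *table; entry->name != NULL; entry++)
            if (strcmp(entry->name, name) == 0)
                return entry->fn;
    return NULL;
}
```

Everything is resolved by the static linker; the only thing that happens at run time is the string comparison.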
This is somewhat common in writing test code. E.g., you want to call all functions that start with test_. So you have a shell script that greps through all your .c files and pulls out the function names that match test_.*. Then that script generates a test.c file that contains a function that calls all the test functions.
e.g., generated program would look like:
int main() {
initTestCode();
testA();
testB();
testC();
}
Another way to do it would be to use some linker tricks. This is what the Linux kernel does for its initialization. Functions that are init code are marked with the qualifier __init. This is defined in linux/init.h as follows:
#define __init __section(.init.text) __cold notrace
This causes the linker to put that function in the section .init.text. The kernel will reclaim memory from that section after the system boots.
For calling the functions, each module will declare an initcall function with some other macros core_initcall(func), arch_initcall(func), et cetera (also defined in linux/init.h). These macros put a pointer to the function into a linker section called .initcall.
At boot-time, the kernel will "walk" through the .initcall section calling all of the pointers there. The code that walks through looks like this:
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
static void __init do_initcalls(void)
{
    initcall_t *fn;
    for (fn = __early_initcall_end; fn < __initcall_end; fn++)
        do_one_initcall(*fn);
    /* Make sure there is no pending stuff from the initcall sequence */
    flush_scheduled_work();
}
The symbols __initcall_start, __initcall_end, etc. get defined in the linker script.
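The same trick can be approximated in user space without a custom linker script, assuming GCC or Clang with GNU ld (or a compatible linker) on an ELF target: when a section name is a valid C identifier, the linker provides __start_<name> and __stop_<name> symbols for it. The section name my_initcalls, the REGISTER_INITCALL macro and the example functions below are made up for illustration and are not the kernel's.

```c
typedef void (*initcall_t)(void);

/* Place a pointer to `fn` in the "my_initcalls" section. `used` keeps the
 * otherwise unreferenced variable from being discarded by the compiler. */
#define REGISTER_INITCALL(fn) \
    static initcall_t initcall_##fn \
        __attribute__((used, section("my_initcalls"))) = fn

/* Provided by the linker for sections whose names are C identifiers. */
extern initcall_t __start_my_initcalls[], __stop_my_initcalls[];

static int calls_made;

/* Example init functions; in practice these live in separate modules. */
static void init_a(void) { calls_made++; }
static void init_b(void) { calls_made++; }

REGISTER_INITCALL(init_a);
REGISTER_INITCALL(init_b);

/* Walk the section and call every registered function.
 * Returns the number of functions called. */
int run_initcalls(void)
{
    int n = 0;
    for (initcall_t *fn = __start_my_initcalls; fn < __stop_my_initcalls; fn++) {
        (*fn)();
        n++;
    }
    return n;
}
```

Each module only needs the macro; the main program never has to list the functions by name, and everything is still statically linked.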
In general, the Linux kernel does some of the cleverest tricks with the GCC pre-processor, compiler and linker that are possible. It's always been a great reference for C tricks.
You really need static linking and, at the same time, to select all matching functions at runtime, right? Because the latter is a typical case for dynamic linking, I'd say.
You obviously need some mechanism to register the available functions. Dynamic linking would provide just this.
I really don't think you can do it. C isn't exactly capable of late-binding or the sort of introspection you seem to be requiring.
Although I don't really understand your question. Do you want the features of dynamically linked libraries while statically linking? Because that doesn't make sense to me... to static link, you need to already have the binary in hand, which would make dynamic loading of functions a waste of time, even if you could easily do it.

Resources