reading the environment when executing ELF IFUNC dispatch functions - c

The IFUNC mechanism in recent ELF tools on (at least) Linux allows to choose a implementation of a function at runtime. Look at the iunc attribute in the GCC documentation for more detailed description: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
Another description of IFUNC mecanism : http://www.agner.org/optimize/blog/read.php?i=167
I would like to choose my implementation depending on the value of an environment variable. However, my experiments show me that the libc (at least the part about environment) is not yet initialized when the resolver function is run. So, the classical interfaces (extern char**environ or getenv()) do not work.
Does anybody know how to access the environment of a program in Linux at very early stage ? The environment is setup by the kernel at the execve(2) system call, so it is already somewhere (but where exactly ?) in the program address space at early initialization.
Thanks in advance
Vincent
Program to test:
#include <stdio.h>
#include <stdlib.h>
extern char** environ;
char** save_environ;
char* toto;
int saved=0;
extern int fonction ();
int fonction1 () {
return 1;
}
int fonction2 () {
return 2;
}
static typeof(fonction) * resolve_fonction (void) {
saved=1;
save_environ=environ;
toto=getenv("TOTO");
/* no way to choose between fonction1 and fonction2 with the TOTO envvar */
return fonction1;
}
int fonction () __attribute__ ((ifunc ("resolve_fonction")));
void print_saved() {
printf("saved: %dn", saved);
if (saved) {
printf("prev environ: %pn", save_environ);
printf("prev TOTO: %sn", toto);
}
}
int main() {
print_saved();
printf("main environ: %pn", environ);
printf("main environ[0]: %sn", environ[0]);
printf("main TOTO: %sn", getenv("TOTO"));
printf("main value: %dn", fonction());
return 0;
}
Compilation and execution:
$ gcc -Wall -g ifunc.c -o ifunc
$ env TOTO=ok ./ifunc
saved: 1
prev environ: (nil)
prev TOTO: (null)
main environ: 0x7fffffffe288
main environ[0]: XDG_VTNR=7
main TOTO: ok
main value: 1
$
In the resolver function, environ is NULL and getenv("TOTO") returns NULL. In the main function, the information is here.

Function Pointer
I found no way to use env in early stage legally. Resolver function runs in linker even earlier, than preinit_array functions. The only legal way to resolve this is to use function pointer and decide what function to use in function of .preinit_array section:
extern char** environ;
int(*f)();
void preinit(int argc, char **argv, char **envp) {
f = f1;
environ = envp; // actually, it is done a bit later
char *v = getenv("TOTO");
if (v && strcmp(v, "ok") == 0) {
f = f2;
}
}
__attribute__((section(".preinit_array"))) typeof(preinit) *__preinit = preinit;
ifunc & GNU ld inners
Glibc's linker ld contains a local symbol _environ and it is initialized, but it is rather hard to extract it. There is another way I found, but it is a bit tricky and rather unreliable.
At linker's entry point _start only stack is initialized. Program arguments and environmental values are sent to the process via stack. Arguments are stored in the following order:
argc, argv, argv + 1, ..., argv + argc - 1, NULL, ENV...
Linker ld shares a global symbol _dl_argv, which points to this place on the stack. With the help of it we can extract all the needful variables:
extern char** environ;
extern char **_dl_argv;
char** get_environ() {
int argc = *(int*)(_dl_argv - 1);
char **my_environ = (char**)(_dl_argv + argc + 1);
return my_environ;
}
typeof(f1) * resolve_f() {
environ = get_environ();
const char *var = getenv("TOTO");
if (var && strcmp(var, "ok") == 0) {
return f2;
}
return f1;
}
int f() __attribute__((ifunc("resolve_f")));

Related

executing init and fini

I just read about init and fini sections in ELF files and gave it a try:
#include <stdio.h>
int main(){
puts("main");
return 0;
}
void init(){
puts("init");
}
void fini(){
puts("fini");
}
If I do gcc -Wl,-init,init -Wl,-fini,fini foo.c and run the result the "init" part is not printed:
$ ./a.out
main
fini
Did the init part not run, or was it not able to print somehow?
Is there a any "official" documentation about the init/fini stuff?
man ld says:
-init=name
When creating an ELF executable or shared object, call
NAME when the executable or shared object is loaded, by
setting DT_INIT to the address of the function. By
default, the linker uses "_init" as the function to call.
Shouldn't that mean, that it would be enough to name the init function _init? (If I do gcc complains about multiple definition.)
Don't do that; let your compiler and linker fill in the sections as they see fit.
Instead, mark your functions with the appropriate function attributes, so that the compiler and linker will put them in the correct sections.
For example,
static void before_main(void) __attribute__((constructor));
static void after_main(void) __attribute__((destructor));
static void before_main(void)
{
/* This is run before main() */
}
static void after_main(void)
{
/* This is run after main() returns (or exit() is called) */
}
You can also assign a priority (say, __attribute__((constructor (300)))), an integer between 101 and 65535, inclusive, with functions having a smaller priority number run first.
Note that for illustration, I marked the functions static. That is, the functions won't be visible outside the file scope. The functions do not need to be exported symbols to be automatically called.
For testing, I suggest saving the following in a separate file, say tructor.c:
#include <unistd.h>
#include <string.h>
#include <errno.h>
static int outfd = -1;
static void wrout(const char *const string)
{
if (string && *string && outfd != -1) {
const char *p = string;
const char *const q = string + strlen(string);
while (p < q) {
ssize_t n = write(outfd, p, (size_t)(q - p));
if (n > (ssize_t)0)
p += n;
else
if (n != (ssize_t)-1 || errno != EINTR)
break;
}
}
}
void before_main(void) __attribute__((constructor (101)));
void before_main(void)
{
int saved_errno = errno;
/* This is run before main() */
outfd = dup(STDERR_FILENO);
wrout("Before main()\n");
errno = saved_errno;
}
static void after_main(void) __attribute__((destructor (65535)));
static void after_main(void)
{
int saved_errno = errno;
/* This is run after main() returns (or exit() is called) */
wrout("After main()\n");
errno = saved_errno;
}
so you can compile and link it as part of any program or library. To compile it as a shared library, use e.g.
gcc -Wall -Wextra -fPIC -shared tructor.c -Wl,-soname,libtructor.so -o libtructor.so
and you can interpose it into any dynamically linked command or binary using
LD_PRELOAD=./libtructor.so some-command-or-binary
The functions keep errno unchanged, although it should not matter in practice, and use the low-level write() syscall to output the messages to standard error. The initial standard error is duplicated to a new descriptor, because in many instances, the standard error itself gets closed before the last global destructor -- our destructor here -- gets run.
(Some paranoid binaries, typically security sensitive ones, close all descriptors they don't know about, so you might not see the After main() message in all cases.)
It is not a bug in ld but in the glibc startup code for the main executable. For shared objects the function set by the -init option is called.
This is the commit to ld adding the options -init and -fini.
The _init function of the program isn't called from file glibc-2.21/elf/dl-init.c:58 by the DT_INIT entry by the dynamic linker, but called from __libc_csu_init in file glibc-2.21/csu/elf-init.c:83 by the main executable.
That is, the function pointer in DT_INIT of the program is ignored by the startup.
If you compile with -static, fini isn't called, too.
DT_INIT and DT_FINI should definitely not be used, because they are old-style, see line 255.
The following works:
#include <stdio.h>
static void preinit(int argc, char **argv, char **envp) {
puts(__FUNCTION__);
}
static void init(int argc, char **argv, char **envp) {
puts(__FUNCTION__);
}
static void fini(void) {
puts(__FUNCTION__);
}
__attribute__((section(".preinit_array"), used)) static typeof(preinit) *preinit_p = preinit;
__attribute__((section(".init_array"), used)) static typeof(init) *init_p = init;
__attribute__((section(".fini_array"), used)) static typeof(fini) *fini_p = fini;
int main(void) {
puts(__FUNCTION__);
return 0;
}
$ gcc -Wall a.c
$ ./a.out
preinit
init
main
fini
$

Get names and addresses of exported functions in linux

I am able to get a list of exported function names and pointers from an executable in windows by using using the PIMAGE_DOS_HEADER API (example).
What is the equivalent API for Linux?
For context I am creating unit test executables and I am exporting functions starting with the name "test_" and I want the executable to just spin through and execute all of the test functions when run.
Example psuedo code:
int main(int argc, char** argv)
{
auto run = new_trun();
auto module = dlopen(NULL);
auto exports = get_exports(module); // <- how do I do this on unix?
for( auto i = 0; i < exports->length; i++)
{
auto export = exports[i];
if(strncmp("test_", export->name, strlen("test_")) == 0)
{
tcase_add(run, export->name, export->func);
}
}
return trun_run(run);
}
EDIT:
I was able to find what I was after using the top answer from this question:
List all the functions/symbols on the fly in C?
Additionally I had to use the gnu_hashtab_symbol_count function from Nominal Animal's answer below to handle the DT_GNU_HASH instead of the DT_HASH.
My final test main function looks like this:
int main(int argc, char** argv)
{
vector<string> symbols;
dl_iterate_phdr(retrieve_symbolnames, &symbols);
TRun run;
auto handle = dlopen(NULL, RTLD_LOCAL | RTLD_LAZY);
for(auto i = symbols.begin(); i != symbols.end(); i++)
{
auto name = *i;
auto func = (testfunc)dlsym(handle, name.c_str());
TCase tcase;
tcase.name = string(name);
tcase.func = func;
run.test_cases.push_back(tcase);
}
return trun_run(&run);
}
Which I then define tests in the assembly like:
// test.h
#define START_TEST(name) extern "C" EXPORT TResult test_##name () {
#define END_TEST return tresult_success(); }
// foo.cc
START_TEST(foo_bar)
{
assert_pending();
}
END_TEST
Which produces output that looks like this:
test_foo_bar: pending
1 pending
0 succeeded
1 total
I do get quite annoyed when I see questions asking how to do something in operating system X that you do in Y.
In most cases, it is not an useful approach, because each operating system (family) tends to have their own approach to issues, so trying to apply something that works in X in Y is like stuffing a cube into a round hole.
Please note: the text here is intended as harsh, not condesceding; my command of the English language is not as good as I'd like. Harshness combined with actual help and pointers to known working solutions seems to work best in overcoming nontechnical limitations, in my experience.
In Linux, a test environment should use something like
LC_ALL=C LANG=C readelf -s FILE
to list all the symbols in FILE. readelf is part of the binutils package, and is installed if you intend to build new binaries on the system. This leads to portable, robust code. Do not forget that Linux encompasses multiple hardware architectures that do have real differences.
To build binaries in Linux, you normally use some of the tools provided in binutils. If binutils provided a library, or there was an ELF library based on the code used in binutils, it would be much better to use that, rather than parse the output of the human utilities. However, there is no such library (the libbfd library binutils uses internally is not ELF-specific). The [URL=http://www.mr511.de/software/english.html]libelf[/URL] library is good, but it is completely separate work by chiefly a single author. Bugs in it have been reported to binutils, which is unproductive, as the two are not related. Simply put, there are no guarantees that it handles the ELF files on a given architecture the same way binutils does. Therefore, for robustness and reliability, you'll definitely want to use binutils.
If you have a test application, it should use a script, say /usr/lib/yourapp/list-test-functions, to list the test-related functions:
#!/bin/bash
export LC_ALL=C LANG=C
for file in "$#" ; do
readelf -s "$file" | while read num value size type bind vix index name dummy ; do
[ "$type" = "FUNC" ] || continue
[ "$bind" = "GLOBAL" ] || continue
[ "$num" = "$[$num]" ] || continue
[ "$index" = "$[$index]" ] || continue
case "$name" in
test_*) printf '%s\n' "$name"
;;
esac
done
done
This way, if there is an architecture that has quirks (in the binutils' readelf output format in particular), you only need to modify the script. Modifying such a simple script is not difficult, and it is easy to verify the script works correctly -- just compare the raw readelf output to the script output; anybody can do that.
A subroutine that constructs a pipe, fork()s a child process, executes the script in the child process, and uses e.g. getline() in the parent process to read the list of names, is quite simple and extremely robust. Since this is also the one fragile spot, we've made it very easy to fix any quirks or problems here by using that external script (that is customizable/extensible to cover those quirks, and easy to debug).
Remember, if binutils itself has bugs (other than output formatting bugs), any binaries built will almost certainly exhibit those same bugs also.
Being a Microsoft-oriented person, you probably will have trouble grasping the benefits of such a modular approach. (It is not specific to Microsoft, but specific to a single-vendor controlled ecosystem where the vendor-pushed approach is via overarching frameworks, and black boxes with clean but very limited interfaces. I think it as the framework limitation, or vendor-enforced walled garden, or prison garden. Looks good, but getting out is difficult. For description and history on the modular approach I'm trying to describe, see for example the Unix philosophy article at Wikipedia.)
The following shows that your approach is indeed possible in Linux, too -- although clunky and fragile; this stuff is intended to be done using the standard tools instead. It's just not the right approach in general.
The interface, symbols.h, is easiest to implement using a callback function that gets called for each symbol found:
#ifndef SYMBOLS_H
#ifndef _GNU_SOURCE
#error You must define _GNU_SOURCE!
#endif
#define SYMBOLS_H
#include <stdlib.h>
typedef enum {
LOCAL_SYMBOL = 1,
GLOBAL_SYMBOL = 2,
WEAK_SYMBOL = 3,
} symbol_bind;
typedef enum {
FUNC_SYMBOL = 4,
OBJECT_SYMBOL = 5,
COMMON_SYMBOL = 6,
THREAD_SYMBOL = 7,
} symbol_type;
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom);
#endif /* SYMBOLS_H */
The ELF symbol binding and type macros are word-size specific, so to avoid the hassle, I declared the enum types above. I omitted some uninteresting types (STT_NOTYPE, STT_SECTION, STT_FILE), however.
The implementation, symbols.c:
#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <fnmatch.h>
#include <dlfcn.h>
#include <link.h>
#include <errno.h>
#include "symbols.h"
#define UINTS_PER_WORD (__WORDSIZE / (CHAR_BIT * sizeof (unsigned int)))
static ElfW(Word) gnu_hashtab_symbol_count(const unsigned int *const table)
{
const unsigned int *const bucket = table + 4 + table[2] * (unsigned int)(UINTS_PER_WORD);
unsigned int b = table[0];
unsigned int max = 0U;
while (b-->0U)
if (bucket[b] > max)
max = bucket[b];
return (ElfW(Word))max;
}
static symbol_bind elf_symbol_binding(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_BIND(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_BIND(st_info)) {
#else
switch (ELF_ST_BIND(st_info)) {
#endif
case STB_LOCAL: return LOCAL_SYMBOL;
case STB_GLOBAL: return GLOBAL_SYMBOL;
case STB_WEAK: return WEAK_SYMBOL;
default: return 0;
}
}
static symbol_type elf_symbol_type(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_TYPE(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_TYPE(st_info)) {
#else
switch (ELF_ST_TYPE(st_info)) {
#endif
case STT_OBJECT: return OBJECT_SYMBOL;
case STT_FUNC: return FUNC_SYMBOL;
case STT_COMMON: return COMMON_SYMBOL;
case STT_TLS: return THREAD_SYMBOL;
default: return 0;
}
}
static void *dynamic_pointer(const ElfW(Addr) addr,
const ElfW(Addr) base, const ElfW(Phdr) *const header, const ElfW(Half) headers)
{
if (addr) {
ElfW(Half) h;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_LOAD)
if (addr >= base + header[h].p_vaddr &&
addr < base + header[h].p_vaddr + header[h].p_memsz)
return (void *)addr;
}
return NULL;
}
struct phdr_iterator_data {
int (*callback)(const char *libpath, const char *libname,
const char *objname, const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom);
void *custom;
};
static int iterate_phdr(struct dl_phdr_info *info, size_t size, void *dataref)
{
struct phdr_iterator_data *const data = dataref;
const ElfW(Addr) base = info->dlpi_addr;
const ElfW(Phdr) *const header = info->dlpi_phdr;
const ElfW(Half) headers = info->dlpi_phnum;
const char *libpath, *libname;
ElfW(Half) h;
if (!data->callback)
return 0;
if (info->dlpi_name && info->dlpi_name[0])
libpath = info->dlpi_name;
else
libpath = "";
libname = strrchr(libpath, '/');
if (libname && libname[0] == '/' && libname[1])
libname++;
else
libname = libpath;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_DYNAMIC) {
const ElfW(Dyn) *entry = (const ElfW(Dyn) *)(base + header[h].p_vaddr);
const ElfW(Word) *hashtab;
const ElfW(Sym) *symtab = NULL;
const char *strtab = NULL;
ElfW(Word) symbol_count = 0;
for (; entry->d_tag != DT_NULL; entry++)
switch (entry->d_tag) {
case DT_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab)
symbol_count = hashtab[1];
break;
case DT_GNU_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab) {
ElfW(Word) count = gnu_hashtab_symbol_count(hashtab);
if (count > symbol_count)
symbol_count = count;
}
break;
case DT_STRTAB:
strtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
case DT_SYMTAB:
symtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
}
if (symtab && strtab && symbol_count > 0) {
ElfW(Word) s;
for (s = 0; s < symbol_count; s++) {
const char *name;
void *const ptr = dynamic_pointer(base + symtab[s].st_value, base, header, headers);
symbol_bind bind;
symbol_type type;
int result;
if (!ptr)
continue;
type = elf_symbol_type(symtab[s].st_info);
bind = elf_symbol_binding(symtab[s].st_info);
if (symtab[s].st_name)
name = strtab + symtab[s].st_name;
else
name = "";
result = data->callback(libpath, libname, name, ptr, symtab[s].st_size, bind, type, data->custom);
if (result)
return result;
}
}
}
return 0;
}
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom)
{
struct phdr_iterator_data data;
if (!callback)
return errno = EINVAL;
data.callback = callback;
data.custom = custom;
return errno = dl_iterate_phdr(iterate_phdr, &data);
}
When compiling the above, remember to link against the dl library.
You may find the gnu_hashtab_symbol_count() function above interesting; the format of the table is not well documented anywhere that I can find. This is tested to work on both i386 and x86-64 architectures, but it should be vetted against the GNU sources before relying on it in production code. Again, the better option is to just use those tools directly via a helper script, as they will be installed on any development machine.
Technically, a DT_GNU_HASH table tells us the first dynamic symbol, and the highest index in any hash bucket tells us the last dynamic symbol, but since the entries in the DT_SYMTAB symbol table always begin at 0 (actually, the 0 entry is "none"), I only consider the upper limit.
To match library and function names, I recommend using strncmp() for a prefix match for libraries (match at the start of the library name, up to the first .). Of course, you can use fnmatch() if you prefer glob patterns, or regcomp()+regexec() if you prefer regular expressions (they are built-in to the GNU C library, no external libraries are needed).
Here is an example program, example.c, that just prints out all the symbols:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include "symbols.h"
static int my_func(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom __attribute__((unused)))
{
printf("%s (%s):", libpath, libname);
if (*objname)
printf(" %s:", objname);
else
printf(" unnamed");
if (size > 0)
printf(" %zu-byte", size);
if (binding == LOCAL_SYMBOL)
printf(" local");
else
if (binding == GLOBAL_SYMBOL)
printf(" global");
else
if (binding == WEAK_SYMBOL)
printf(" weak");
if (type == FUNC_SYMBOL)
printf(" function");
else
if (type == OBJECT_SYMBOL || type == COMMON_SYMBOL)
printf(" variable");
else
if (type == THREAD_SYMBOL)
printf(" thread-local variable");
printf(" at %p\n", addr);
fflush(stdout);
return 0;
}
int main(int argc, char *argv[])
{
int arg;
for (arg = 1; arg < argc; arg++) {
void *handle = dlopen(argv[arg], RTLD_NOW);
if (!handle) {
fprintf(stderr, "%s: %s.\n", argv[arg], dlerror());
return EXIT_FAILURE;
}
fprintf(stderr, "%s: Loaded.\n", argv[arg]);
}
fflush(stderr);
if (symbols(my_func, NULL))
return EXIT_FAILURE;
return EXIT_SUCCESS;
}
To compile and run the above, use for example
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 example.o symbols.o -ldl -o example
./example | less
To see the symbols in the program itself, use the -rdynamic flag at link time to add all symbols to the dynamic symbol table:
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 -rdynamic example.o symbols.o -ldl -o example
./example | less
On my system, the latter prints out
(): stdout: 8-byte global variable at 0x602080
(): _edata: global at 0x602078
(): __data_start: global at 0x602068
(): data_start: weak at 0x602068
(): symbols: 70-byte global function at 0x401080
(): _IO_stdin_used: 4-byte global variable at 0x401150
(): __libc_csu_init: 101-byte global function at 0x4010d0
(): _start: global function at 0x400a57
(): __bss_start: global at 0x602078
(): main: 167-byte global function at 0x4009b0
(): _init: global function at 0x4008d8
(): stderr: 8-byte global variable at 0x602088
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097da0
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): __asprintf: global function at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): free: global function at 0x7fc652097000
...
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): dlvsym: 118-byte weak function at 0x7fc6520981f0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cf14a0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc65208c740
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __libc_enable_secure: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __tls_get_addr: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global_ro: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_find_dso_for_object: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_starting_up: weak at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_argv: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): putwchar: 292-byte global function at 0x7fc651d4a210
...
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): vwarn: 224-byte global function at 0x7fc651dc8ef0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): wcpcpy: 39-byte weak function at 0x7fc651d75900
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229bae0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_get_tls_static_info: 21-byte global function at 0x7fc6522adaa0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_PRIVATE: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.3: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.4: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): free: 42-byte weak function at 0x7fc6522b2c40
...
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): malloc: 13-byte weak function at 0x7fc6522b2bf0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_allocate_tls_init: 557-byte global function at 0x7fc6522adc00
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _rtld_global_ro: 304-byte global variable at 0x7fc6524bdcc0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): __libc_enable_secure: 4-byte global variable at 0x7fc6524bde68
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_rtld_di_serinfo: 1620-byte global function at 0x7fc6522a4710
I used ... to mark where I removed lots of lines.
Questions?
To get a list of exported symbols from a shared library (a .so) under Linux, there are two ways: the easy one and a slightly harder one.
The easy one is to use the console tools already available: objdump (included in GNU binutils):
$ objdump -T /usr/lib/libid3tag.so.0
00009c15 g DF .text 0000012e Base id3_tag_findframe
00003fac g DF .text 00000053 Base id3_ucs4_utf16duplicate
00008288 g DF .text 000001f2 Base id3_frame_new
00007b73 g DF .text 000003c5 Base id3_compat_fixup
...
The slightly harder way is to use libelf and write a C/C++ program to list the symbols yourself. Have a look at the elfutils package, which is also built from the libelf source. There is a program called eu-readelf (the elfutils version of readelf, not to be confused with the binutils readelf). eu-readelf -s $LIB lists exported symbols using libelf, so you should be able to use that as a starting point.

How to start two instances of a process through fork without modifying the program

I am trying to fork a process from another at the start. For this I tried to modify the __libc_start_main function in glibc (a modified glibc that I use) and tried to put the fork there, but could not compile the glibc as it gives an error whenever I try to do that. What are other options and why inserting fork in __libc_start_main doesn't work?
Again notice that I want to do it in a way that no program modification is required, that is modification in glibc is OK but not the program.
In __libc_start_main, I try to fork like this.
if (__builtin_expect (! not_first_call, 1))
{
struct pthread *self;
fork(); // <-- here
self = THREAD_SELF;
/* Store old info. */
unwind_buf.priv.data.prev = THREAD_GETMEM (self, cleanup_jmp_buf);
unwind_buf.priv.data.cleanup = THREAD_GETMEM (self, cleanup);
/* Store the new cleanup handler info. */
THREAD_SETMEM (self, cleanup_jmp_buf, &unwind_buf);
/* Run the program. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
}
The error i get is the following.
file '/build/sunrpc/xbootparam_prot.T' already exists and may be overwritten
make[2]: *** [build/sunrpc/xbootparam_prot.stmp] Error 1
If you're statically linking to an unmodifiable object with the main entry point, you could use symbol wrapping to sneak the fork() before the object's main().
For example, main.o that can't be modified:
#include <stdio.h>
int main( int argc, char *argv[] ) {
printf( "In main()\n" );
return 0;
}
Your wrapper symbol within glibc:
#include <unistd.h>
#include <stdio.h>
int __wrap_main( int argc, char *argv[] ) {
printf( "In wrapper\n" );
if ( fork() ) {
return __real_main( argc, argv );
} else {
printf( "Other process did something else\n" );
return 0;
}
}
And use the --wrap command linker command:
gcc -o app main.o wrap.o -Wl,--wrap=main
$ ./app
In wrapper
In main()
$ Other process did something else

Find program's code address at runtime?

When I use gdb to debug a program written in C, the command disassemble shows the codes and their addresses in the code memory segmentation. Is it possible to know those memory addresses at runtime? I am using Ubuntu OS. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to have the address of myfunction() in the code memory segmentation when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on windows is similar) that then jumps to the function.
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is given at run-time, I once wrote a function to find out the symbol address dynamically using bfd library. Here is the x86_64 code, you can get the address via find_symbol("a.out", "myfunction") in the example.
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <type.h>
#include <string.h>
long find_symbol(char *filename, char *symname)
{
bfd *ibfd;
asymbol **symtab;
long nsize, nsyms, i;
symbol_info syminfo;
char **matching;
bfd_init();
ibfd = bfd_openr(filename, NULL);
if (ibfd == NULL) {
printf("bfd_openr error\n");
}
if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
printf("format_matches\n");
}
nsize = bfd_get_symtab_upper_bound (ibfd);
symtab = malloc(nsize);
nsyms = bfd_canonicalize_symtab(ibfd, symtab);
for (i = 0; i < nsyms; i++) {
if (strcmp(symtab[i]->name, symname) == 0) {
bfd_symbol_info(symtab[i], &syminfo);
return (long) syminfo.value;
}
}
bfd_close(ibfd);
printf("cannot find symbol\n");
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
compiled
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass (-rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
About a comment in an answer (getting the address of an instruction), you can use this very ugly trick
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!

How would a loaded library function call a symbol in the main application?

When loaded a shared library is opened via the function dlopen(), is there a way for it to call functions in main program?
Code of dlo.c (the lib):
#include <stdio.h>
// function is defined in main program
void callb(void);
void test(void) {
printf("here, in lib\n");
callb();
}
Compile with
gcc -shared -olibdlo.so dlo.c
Here the code of the main program (copied from dlopen manpage, and adjusted):
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void callb(void) {
printf("here, i'm back\n");
}
int
main(int argc, char **argv)
{
void *handle;
void (*test)(void);
char *error;
handle = dlopen("libdlo.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
*(void **) (&test) = dlsym(handle, "test");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
(*test)();
dlclose(handle);
exit(EXIT_SUCCESS);
}
Build with
gcc -ldl -rdynamic main.c
Output:
[js#HOST2 dlopen]$ LD_LIBRARY_PATH=. ./a.out
here, in lib
here, i'm back
[js#HOST2 dlopen]$
The -rdynamic option puts all symbols in the dynamic symbol table (which is mapped into memory), not only the names of the used symbols. Read further about it here. Of course you can also provide function pointers (or a struct of function pointers) that define the interface between the library and your main program. It's actually the method what i would choose probably. I heard from other people that it's not so easy to do -rdynamic in windows, and it also would make for a cleaner communication between library and main program (you've got precise control on what can be called and not), but it also requires more house-keeping.
Yes, If you provide your library a pointer to that function, I'm sure the library will be able to run/execute the function in the main program.
Here is an example, haven't compiled it so beware ;)
/* in main app */
/* define your function */
int do_it( char arg1, char arg2);
int do_it( char arg1, char arg2){
/* do it! */
return 1;
}
/* some where else in main app (init maybe?) provide the pointer */
LIB_set_do_it(&do_it);
/** END MAIN CODE ***/
/* in LIBRARY */
int (*LIB_do_it_ptr)(char, char) = NULL;
void LIB_set_do_it( int (*do_it_ptr)(char, char) ){
LIB_do_it_ptr = do_it_ptr;
}
int LIB_do_it(){
char arg1, arg2;
/* do something to the args
...
... */
return LIB_do_it_ptr( arg1, arg2);
}
The dlopen() function, as discussed by #litb, is primarily provided on systems using ELF format object files. It is rather powerful and will let you control whether symbols referenced by the loaded library can be satisfied from the main program, and generally does let them be satisfied. Not all shared library loading systems are as flexible - be aware if it comes to porting your code.
The callback mechanism outlined by #hhafez works now that the kinks in that code are straightened out.

Resources