executing init and fini - c

I just read about init and fini sections in ELF files and gave it a try:
#include <stdio.h>
int main(){
puts("main");
return 0;
}
void init(){
puts("init");
}
void fini(){
puts("fini");
}
If I run gcc -Wl,-init,init -Wl,-fini,fini foo.c and execute the result, the "init" part is not printed:
$ ./a.out
main
fini
Did the init part not run, or was it not able to print somehow?
Is there any "official" documentation about the init/fini stuff?
man ld says:
-init=name
When creating an ELF executable or shared object, call
NAME when the executable or shared object is loaded, by
setting DT_INIT to the address of the function. By
default, the linker uses "_init" as the function to call.
Shouldn't that mean that it would be enough to name the init function _init? (If I do, gcc complains about a multiple definition.)

Don't do that; let your compiler and linker fill in the sections as they see fit.
Instead, mark your functions with the appropriate function attributes, so that the compiler and linker will put them in the correct sections.
For example,
static void before_main(void) __attribute__((constructor));
static void after_main(void) __attribute__((destructor));
static void before_main(void)
{
/* This is run before main() */
}
static void after_main(void)
{
/* This is run after main() returns (or exit() is called) */
}
You can also assign a priority (say, __attribute__((constructor (300)))), an integer between 101 and 65535, inclusive. Constructors with a smaller priority number run first; for destructors the order is reversed.
Note that for illustration, I marked the functions static. That is, the functions won't be visible outside the file scope. The functions do not need to be exported symbols to be automatically called.
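To see the priority ordering in action, here is a minimal sketch. The names ctor_order, ctor_count, record and the three constructor functions are hypothetical and exist only for this illustration; the constructors are deliberately defined in the "wrong" order so that any reordering comes from the priorities alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log of the order in which the constructors ran. */
static int ctor_order[3];
static size_t ctor_count = 0;

static void record(int id)
{
    /* Guard against more constructors than the log can hold. */
    assert(ctor_count < sizeof ctor_order / sizeof ctor_order[0]);
    ctor_order[ctor_count++] = id;
}

/* Defined from highest to lowest priority number on purpose. */
__attribute__((constructor(300))) static void ctor_300(void) { record(3); }
__attribute__((constructor(200))) static void ctor_200(void) { record(2); }
__attribute__((constructor(101))) static void ctor_101(void) { record(1); }
```

By the time main() starts, ctor_order should hold 1, 2, 3 in that order, because the constructor with the smallest priority number runs first.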
For testing, I suggest saving the following in a separate file, say tructor.c:
#include <unistd.h>
#include <string.h>
#include <errno.h>
static int outfd = -1;
static void wrout(const char *const string)
{
if (string && *string && outfd != -1) {
const char *p = string;
const char *const q = string + strlen(string);
while (p < q) {
ssize_t n = write(outfd, p, (size_t)(q - p));
if (n > (ssize_t)0)
p += n;
else
if (n != (ssize_t)-1 || errno != EINTR)
break;
}
}
}
void before_main(void) __attribute__((constructor (101)));
void before_main(void)
{
int saved_errno = errno;
/* This is run before main() */
outfd = dup(STDERR_FILENO);
wrout("Before main()\n");
errno = saved_errno;
}
static void after_main(void) __attribute__((destructor (65535)));
static void after_main(void)
{
int saved_errno = errno;
/* This is run after main() returns (or exit() is called) */
wrout("After main()\n");
errno = saved_errno;
}
so you can compile and link it as part of any program or library. To compile it as a shared library, use e.g.
gcc -Wall -Wextra -fPIC -shared tructor.c -Wl,-soname,libtructor.so -o libtructor.so
and you can interpose it into any dynamically linked command or binary using
LD_PRELOAD=./libtructor.so some-command-or-binary
The functions keep errno unchanged, although it should not matter in practice, and use the low-level write() syscall to output the messages to standard error. The initial standard error is duplicated to a new descriptor, because in many instances, the standard error itself gets closed before the last global destructor -- our destructor here -- gets run.
(Some paranoid binaries, typically security sensitive ones, close all descriptors they don't know about, so you might not see the After main() message in all cases.)

It is not a bug in ld but in the glibc startup code for the main executable. For shared objects the function set by the -init option is called.
This is the commit to ld adding the options -init and -fini.
The program's _init function is not called by the dynamic linker through the DT_INIT entry (glibc-2.21/elf/dl-init.c:58); instead it is called from __libc_csu_init (glibc-2.21/csu/elf-init.c:83), which is part of the main executable.
That is, the function pointer in DT_INIT of the program is ignored by the startup code.
If you compile with -static, fini isn't called either.
DT_INIT and DT_FINI should definitely not be used, because they are old-style, see line 255.
The following works:
#include <stdio.h>
static void preinit(int argc, char **argv, char **envp) {
puts(__FUNCTION__);
}
static void init(int argc, char **argv, char **envp) {
puts(__FUNCTION__);
}
static void fini(void) {
puts(__FUNCTION__);
}
__attribute__((section(".preinit_array"), used)) static typeof(preinit) *preinit_p = preinit;
__attribute__((section(".init_array"), used)) static typeof(init) *init_p = init;
__attribute__((section(".fini_array"), used)) static typeof(fini) *fini_p = fini;
int main(void) {
puts(__FUNCTION__);
return 0;
}
$ gcc -Wall a.c
$ ./a.out
preinit
init
main
fini
$

Related

Is it possible for an LD_PRELOAD to only affect the main executable?

The Actual Problem
I have an executable that by default uses EGL and SDL 1.2 to handle graphics and user input respectively. Using LD_PRELOAD, I have replaced both with GLFW.
This works normally unless the user has installed the Wayland version of GLFW, which depends on EGL itself. Because all the EGL calls are either stubbed to do nothing or call GLFW equivalents, it doesn't work (i.e., eglSwapBuffers calls glfwSwapBuffers, which calls eglSwapBuffers, and so on). I can't remove the EGL stubs because then it would call both EGL and GLFW, and the main executable is closed-source, so I can't modify that.
Is there any way to make LD_PRELOAD affect the main executable but not GLFW? Or any other solution to obtain the same effect?
The Simplified Problem
I made a simplified example to demonstrate the problem.
Main Executable:
#include <stdio.h>
extern void do_something();
int main() {
do_something();
fputs("testing B\n", stderr);
}
Shared Library:
#include <stdio.h>
void do_something() {
fputs("testing A\n", stderr);
}
Preloaded Library:
#include <stdio.h>
int fputs(const char *str, FILE *file) {
// Do Nothing
return 0;
}
When the preloaded library isn't used, the output is:
testing A
testing B
When it is used, the output is nothing.
I'm looking for a way to make the preloaded library only affect the main executable, so that the output would be:
testing A
Thank you!
You can check if the return address is in the executable or the library, and then call either the "real" function or do your stub code, like this:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
static struct {
ElfW(Addr) start, end;
} *segments;
static int n;
static int (*real_fputs)(const char *, FILE *);
static int callback(struct dl_phdr_info *info, size_t size, void *data) {
n = info->dlpi_phnum;
segments = malloc(n * sizeof *segments);
for(int i = 0; i < n; ++i) {
segments[i].start = info->dlpi_addr + info->dlpi_phdr[i].p_vaddr;
segments[i].end = info->dlpi_addr + info->dlpi_phdr[i].p_vaddr + info->dlpi_phdr[i].p_memsz;
}
return 1;
}
__attribute__((__constructor__))
static void setup(void) {
real_fputs = dlsym(RTLD_NEXT, "fputs");
dl_iterate_phdr(callback, NULL);
}
__attribute__((__destructor__))
static void teardown(void) {
free(segments);
}
__attribute__((__noinline__))
int fputs(const char *str, FILE *file) {
ElfW(Addr) addr = (ElfW(Addr))__builtin_extract_return_addr(__builtin_return_address(0));
for(int i = 0; i < n; ++i) {
if(addr >= segments[i].start && addr < segments[i].end) {
// Do Nothing
return 0;
}
}
return real_fputs(str, file);
}
This has some caveats, though. For example, if your executable calls a library function that tail-calls a function you're hooking, then this will incorrectly consider that library call an executable call. (You could mitigate this problem by adding wrappers for those library functions too, that unconditionally forward to the "real" function, and compiling the wrapper code with -fno-optimize-sibling-calls.) Also, there's no way to distinguish whether anonymous executable memory (e.g., JITted code) originally came from the executable or a library.
To test this, save my code as hook_fputs.c, your main executable as main.c, and your shared library as libfoo.c. Then run these commands:
clang -fPIC -shared hook_fputs.c -ldl -o hook_fputs.so
clang -fPIC -shared libfoo.c -o libfoo.so
clang main.c ./libfoo.so
LD_PRELOAD=./hook_fputs.so ./a.out
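One possible refinement of the segment table built in the callback above: that callback copies every program header of the first object visited (the main program), including entries such as PT_DYNAMIC that merely describe parts of the loadable segments. The sketch below keeps only the PT_LOAD entries and separates the range check into its own function. The names struct range, collect_load_ranges and addr_in_ranges are hypothetical, and the code assumes glibc's <link.h>.

```c
#include <assert.h>
#include <elf.h>    /* PT_LOAD */
#include <link.h>   /* ElfW() */
#include <stddef.h>

/* Half-open address range [start, end) of one loadable segment. */
struct range {
    ElfW(Addr) start, end;
};

/* Copy only the PT_LOAD program headers into out[], relocated by base.
 * Returns the number of ranges written (at most out_cap). */
size_t collect_load_ranges(ElfW(Addr) base, const ElfW(Phdr) *phdr,
                           size_t phnum, struct range *out, size_t out_cap)
{
    size_t n = 0;

    assert(phdr != NULL && out != NULL);
    for (size_t i = 0; i < phnum && n < out_cap; ++i) {
        if (phdr[i].p_type != PT_LOAD)
            continue;               /* skip PT_DYNAMIC, PT_NOTE, ... */
        out[n].start = base + phdr[i].p_vaddr;
        out[n].end = out[n].start + phdr[i].p_memsz;
        ++n;
    }
    return n;
}

/* Nonzero if addr lies inside any of the n ranges. */
int addr_in_ranges(ElfW(Addr) addr, const struct range *r, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (addr >= r[i].start && addr < r[i].end)
            return 1;
    return 0;
}
```

Inside a dl_iterate_phdr callback, the call would presumably look like collect_load_ranges(info->dlpi_addr, info->dlpi_phdr, info->dlpi_phnum, ...), and the hooked function would pass its return address to addr_in_ranges.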
Implement the interposing library separately for the two cases.
Create a wrapper script or program that uses ldd to find out the exact versions and paths of the EGL libraries the target binary is dynamically linked against; then have it run ldd on the GLFW library to find out whether that is linked against EGL or not. Finally, have it execute the target binary with the path to the appropriate interposing library in the LD_PRELOAD environment variable.

How to append to __preinit_array_start on Linux?

On Linux with GCC if I define
__attribute__((constructor)) static void myfunc(void) {}
then the address of myfunc will be appended to __init_array_start in the .ctors section. But how can I append a function pointer to __preinit_array_start?
Is __preinit_array_start relevant in a statically linked binary?
As there's no __attribute__((preconstructor)), you can just mush the code into the relevant section using some section attributes e.g.
#include <stdio.h>
int v;
int w;
int x;
__attribute__((constructor)) static void
foo(void)
{
printf("Foo %d %d %d\n", v, w, x);
}
static void
bar(void)
{
v = 3;
}
static void
bar1(void)
{
w = 2;
}
static void
bar2(void)
{
x = 1;
}
__attribute__((section(".preinit_array"))) static void (*y[])(void) = { &bar, &bar1, &bar2 };
int
main(int argc, char **argv)
{
printf("Hello World\n");
}
File dumped into foo.c, compiled using: gcc -o foo foo.c, and then run yields an output of:
Foo 3 2 1
Hello World
File compiled using gcc -static -o foo foo.c, and then run yields the same output, so it does appear to work with statically linked binaries.
It will not work with .so files, though; the linker complains with:
/usr/bin/ld: /tmp/ccI0lMgd.o: .preinit_array section is not allowed in DSO
/usr/bin/ld: failed to set dynamic section sizes: Nonrepresentable section on output
I'd be inclined to avoid it, as code run in that section precedes all other initialization routines. If you're trying to perform some 'this is supposed to run first' initialization, then it's really not a good idea - you're just fighting a race condition which should be solved by some other mechanism.
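One such "other mechanism" could be initialization on first use, so that it no longer matters which initializer happens to run first. The following is only a sketch under that assumption: struct settings, get_settings, settings_init and early_user are hypothetical names, and the plain flag is not thread-safe (a real program would likely want pthread_once or similar).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical piece of state that "must be initialized first". */
struct settings {
    int verbosity;
};

static struct settings settings;
static int settings_ready = 0;  /* plain flag: NOT thread-safe */
static int init_calls = 0;      /* counts how often the real init ran */

static void settings_init(void)
{
    settings.verbosity = 2;
    ++init_calls;
}

/* Every user goes through this accessor, which initializes on demand. */
const struct settings *get_settings(void)
{
    if (!settings_ready) {
        settings_init();
        settings_ready = 1;
    }
    assert(settings_ready);
    return &settings;
}

/* A constructor that needs the settings: it works regardless of where it
 * falls in the initialization order, with no .preinit_array involved. */
static int seen_by_ctor = -1;

__attribute__((constructor)) static void early_user(void)
{
    seen_by_ctor = get_settings()->verbosity;
}
```

The constructor sees verbosity 2 even though nothing was explicitly ordered before it, and later calls to get_settings() do not run the initialization again.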

reading the environment when executing ELF IFUNC dispatch functions

The IFUNC mechanism in recent ELF tools on (at least) Linux allows choosing an implementation of a function at runtime. See the ifunc attribute in the GCC documentation for a more detailed description: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
Another description of the IFUNC mechanism: http://www.agner.org/optimize/blog/read.php?i=167
I would like to choose my implementation depending on the value of an environment variable. However, my experiments show that the libc (at least the part dealing with the environment) is not yet initialized when the resolver function is run. So the classical interfaces (extern char **environ or getenv()) do not work.
Does anybody know how to access the environment of a program on Linux at a very early stage? The environment is set up by the kernel at the execve(2) system call, so it is already somewhere (but where exactly?) in the program's address space at early initialization.
Thanks in advance
Vincent
Program to test:
#include <stdio.h>
#include <stdlib.h>
extern char** environ;
char** save_environ;
char* toto;
int saved=0;
extern int fonction ();
int fonction1 () {
return 1;
}
int fonction2 () {
return 2;
}
static typeof(fonction) * resolve_fonction (void) {
saved=1;
save_environ=environ;
toto=getenv("TOTO");
/* no way to choose between fonction1 and fonction2 with the TOTO envvar */
return fonction1;
}
int fonction () __attribute__ ((ifunc ("resolve_fonction")));
void print_saved() {
printf("saved: %d\n", saved);
if (saved) {
printf("prev environ: %p\n", save_environ);
printf("prev TOTO: %s\n", toto);
}
}
int main() {
print_saved();
printf("main environ: %p\n", environ);
printf("main environ[0]: %s\n", environ[0]);
printf("main TOTO: %s\n", getenv("TOTO"));
printf("main value: %d\n", fonction());
return 0;
}
Compilation and execution:
$ gcc -Wall -g ifunc.c -o ifunc
$ env TOTO=ok ./ifunc
saved: 1
prev environ: (nil)
prev TOTO: (null)
main environ: 0x7fffffffe288
main environ[0]: XDG_VTNR=7
main TOTO: ok
main value: 1
$
In the resolver function, environ is NULL and getenv("TOTO") returns NULL. In the main function, the information is available.
Function Pointer
I found no legitimate way to use the environment at this early stage. The resolver function runs in the dynamic linker, even earlier than the .preinit_array functions. The only legitimate way around this is to use a function pointer and choose the implementation in a function placed in the .preinit_array section:
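#include <stdlib.h>
#include <string.h>
extern char** environ;
int f1(void); /* the two candidate implementations, defined elsewhere */
int f2(void);
int(*f)();
void preinit(int argc, char **argv, char **envp) {
    f = f1;
    environ = envp; // the startup code does this itself, but only a bit later
    char *v = getenv("TOTO");
    if (v && strcmp(v, "ok") == 0) {
        f = f2;
    }
}
__attribute__((section(".preinit_array"))) typeof(preinit) *__preinit = preinit;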
ifunc & GNU ld internals
glibc's dynamic linker contains a local symbol _environ that is already initialized, but it is rather hard to get at. There is another way I found, but it is a bit tricky and rather unreliable.
At the dynamic linker's entry point _start, only the stack is initialized. Program arguments and environment variables are passed to the process on the stack. They are stored in the following order:
argc, argv, argv + 1, ..., argv + argc - 1, NULL, ENV...
The dynamic linker exports a global symbol _dl_argv, which points to this place on the stack. With its help we can extract all the necessary variables:
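#include <stdlib.h>
#include <string.h>
extern char** environ;
extern char **_dl_argv;
int f1(void); /* the two candidate implementations, defined elsewhere */
int f2(void);
char** get_environ() {
    int argc = *(int*)(_dl_argv - 1); /* argc sits just below argv[0] */
    char **my_environ = (char**)(_dl_argv + argc + 1); /* skip argv[] and its NULL */
    return my_environ;
}
typeof(f1) * resolve_f() {
    environ = get_environ();
    const char *var = getenv("TOTO");
    if (var && strcmp(var, "ok") == 0) {
        return f2;
    }
    return f1;
}
int f() __attribute__((ifunc("resolve_f")));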
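The stack layout described above can be exercised without touching _dl_argv at all. The sketch below takes an argc/argv pair laid out that way, steps over the terminating NULL to reach the environment block, and scans it directly instead of assigning to environ and calling getenv(). The helper names envp_from_argv and lookup_env are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given argc and argv laid out as argv[0..argc-1], NULL, ENV..., NULL,
 * return a pointer to the first environment entry. */
char **envp_from_argv(int argc, char **argv)
{
    assert(argv[argc] == NULL);   /* the argv terminator */
    return argv + argc + 1;       /* first entry after the NULL */
}

/* Minimal getenv() replacement that scans a NAME=value vector directly.
 * Returns a pointer to the value, or NULL if NAME is not present. */
const char *lookup_env(char **envp, const char *name)
{
    size_t len = strlen(name);

    for (; *envp != NULL; ++envp) {
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}
```

In a resolver, one would presumably call lookup_env(envp_from_argv(argc, _dl_argv), "TOTO"), with argc read as in get_environ() above; whether the string functions themselves are safe to call that early is a separate question this sketch does not answer.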

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the code and its addresses in the code segment of memory. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
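As a small illustration of the points above, the sketch below keeps the address in a properly typed function pointer (which ISO C fully supports), compares and calls through it, and converts to void * only at the moment of printing. That last conversion is the part ISO C does not define; POSIX systems support it. The body of myfunction and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the question's myfunction(); the body is made up. */
static int myfunction(void)
{
    return 42;
}

typedef int (*int_fn)(void);

/* The function name decays to its address; &myfunction is equivalent. */
int_fn myfunction_address(void)
{
    return myfunction;
}

/* Format a function address as text. The cast to void * is not defined by
 * ISO C, but works on POSIX systems. Returns snprintf()'s result. */
int format_function_address(char *buf, size_t len, int_fn fn)
{
    assert(buf != NULL && fn != NULL);
    return snprintf(buf, len, "%p", (void *)fn);
}
```

Function pointers of the same type can be compared with == and called directly, so code that only needs to store, compare, or invoke the address never has to go through void * at all.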
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is only known at run time, you can look up the symbol address dynamically using the BFD library; I once wrote a function for this. Here is the x86_64 code; in the example, you get the address via find_symbol("a.out", "myfunction").
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of the named symbol, or -1 if it cannot be found. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long result = -1;
    symbol_info syminfo;
    char **matching;
    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    nsize = bfd_get_symtab_upper_bound (ibfd);
    if (nsize <= 0 || (symtab = malloc(nsize)) == NULL) {
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            result = (long) syminfo.value;
            break;
        }
    }
    /* release everything on both the found and not-found paths */
    free(symtab);
    bfd_close(ibfd);
    if (result == -1)
        printf("cannot find symbol\n");
    return result;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
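The comment in trace_pom() mentions backtrace_symbols() as the way to get the strings yourself instead of writing them straight to a descriptor. A minimal sketch of that variant follows; capture_trace is a hypothetical helper name, and the code assumes glibc's <execinfo.h>.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdlib.h>

/* Capture up to max_frames return addresses and resolve them to strings.
 * On success, stores a malloc()ed array in *out and returns the number of
 * entries; returns -1 on failure. The caller frees only the array itself,
 * because the strings live in the same allocation. */
int capture_trace(char ***out, int max_frames)
{
    void **frames;
    char **symbols;
    int n;

    assert(out != NULL && max_frames > 0);
    frames = malloc((size_t)max_frames * sizeof *frames);
    if (frames == NULL)
        return -1;

    n = backtrace(frames, max_frames);
    symbols = backtrace_symbols(frames, n);
    free(frames);
    if (symbols == NULL)
        return -1;

    *out = symbols;
    return n;
}
```

The caller can then loop over the returned strings, log or filter them as needed, and finish with a single free() of the array. As with the example above, function names only appear in the strings when the program is linked with -rdynamic.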
Regarding a comment on another answer (getting the address of an instruction): you can use this very ugly trick.
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful: this looks machine-dependent, so triple-check my word. This is the output:
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!

How would a loaded library function call a symbol in the main application?

When a shared library is loaded via dlopen(), is there a way for it to call functions in the main program?
Code of dlo.c (the lib):
#include <stdio.h>
// function is defined in main program
void callb(void);
void test(void) {
printf("here, in lib\n");
callb();
}
Compile with
gcc -shared -olibdlo.so dlo.c
Here is the code of the main program (copied from the dlopen man page and adjusted):
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void callb(void) {
printf("here, i'm back\n");
}
int
main(int argc, char **argv)
{
void *handle;
void (*test)(void);
char *error;
handle = dlopen("libdlo.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
*(void **) (&test) = dlsym(handle, "test");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
(*test)();
dlclose(handle);
exit(EXIT_SUCCESS);
}
Build with
gcc -rdynamic main.c -ldl
Output:
[js@HOST2 dlopen]$ LD_LIBRARY_PATH=. ./a.out
here, in lib
here, i'm back
[js@HOST2 dlopen]$
The -rdynamic option puts all symbols in the dynamic symbol table (which is mapped into memory), not only the names of the symbols that are used. Read further about it here. Of course, you can also provide function pointers (or a struct of function pointers) that define the interface between the library and your main program. That is probably the method I would choose. I have heard from other people that the equivalent of -rdynamic is not so easy to do on Windows, and function pointers also make for cleaner communication between the library and the main program (you have precise control over what can and cannot be called), but they require more housekeeping.
Yes. If you provide your library with a pointer to that function, the library will be able to call the function in the main program.
Here is an example; I haven't compiled it, so beware ;)
/* in main app */
/* define your function */
int do_it( char arg1, char arg2);
int do_it( char arg1, char arg2){
/* do it! */
return 1;
}
/* some where else in main app (init maybe?) provide the pointer */
LIB_set_do_it(&do_it);
/** END MAIN CODE ***/
/* in LIBRARY */
int (*LIB_do_it_ptr)(char, char) = NULL;
void LIB_set_do_it( int (*do_it_ptr)(char, char) ){
LIB_do_it_ptr = do_it_ptr;
}
int LIB_do_it(){
char arg1 = 0, arg2 = 0; /* placeholder values */
/* do something to the args
...
... */
return LIB_do_it_ptr( arg1, arg2);
}
The dlopen() function, as discussed by @litb, is primarily provided on systems using ELF format object files. It is rather powerful and will let you control whether symbols referenced by the loaded library can be satisfied from the main program, and generally does let them be satisfied. Not all shared library loading systems are as flexible, so be aware of this when it comes to porting your code.
The callback mechanism outlined by @hhafez works now that the kinks in that code are straightened out.
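The "struct of function pointers" variant mentioned above could look roughly like this. To keep the sketch self-contained, both sides live in one file; in practice the library half would sit in the shared object and the host half in the main program. All names (struct host_api, LIB_set_api, main_api) are hypothetical, modeled on the do_it example.

```c
#include <assert.h>
#include <stddef.h>

/* Interface the main program hands to the library. */
struct host_api {
    int (*do_it)(char arg1, char arg2);
};

/* ---- library side ---- */
static const struct host_api *lib_host = NULL;

/* Called once by the main program, e.g. right after dlopen()/dlsym(). */
void LIB_set_api(const struct host_api *api)
{
    assert(api != NULL);
    lib_host = api;
}

/* Calls back into the main program; returns -1 if no callback is set. */
int LIB_do_it(void)
{
    if (lib_host == NULL || lib_host->do_it == NULL)
        return -1;
    return lib_host->do_it('a', 'b');
}

/* ---- main program side ---- */
static int do_it(char arg1, char arg2)
{
    return arg1 < arg2;   /* placeholder body */
}

static const struct host_api main_api = { .do_it = do_it };
```

The main program would call LIB_set_api(&main_api) once during start-up. Because the library only ever sees the struct, the main program needs neither -rdynamic nor exported symbols, and new callbacks can be added as further members.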

Resources