Location of global variables with DWARF (and relocation) - c

When dynamically linking a binary with libraries, relocation information is used to bind the variables/functions of the different ELF objects. However DWARF is not affected by relocation: how is a debugger supposed to resolve global variables?
Let's say I have liba.so (a.c) defining a global variable (using GNU/Linux with GCC or Clang):
#include <stdio.h>
int foo = 10;
int test(void) {
printf("&foo=%p\n", &foo);
}
and an program b linked against liba.so (b.c):
#include <stdio.h>
extern int foo;
int main(int argc, char** argv) {
test();
printf("&foo=%p\n", &foo);
return 0;
}
I expect that "foo" will be instanciated in liba.so
but in fact it is instanciated in both liba.so and b:
$ ./b
&foo=0x600c68 # <- b .bss
&foo=0x600c68 # <- b .bss
The foo variable which is used (both by b and by lib.so) is in the .bss of b
and not in liba.so:
[...]
0x0000000000600c68 - 0x0000000000600c70 is .bss
[...]
0x00007ffff7dda9c8 - 0x00007ffff7dda9d4 is .data in /home/foo/bar/liba.so
0x00007ffff7dda9d4 - 0x00007ffff7dda9d8 is .bss in /home/foo/bar/liba.so
The foo variable is instanciated twice:
once in liba.so (this instance is not used when linked with program b)
once in b (this instance is used instance of the other in b).
(I don't really understand why the variable is instanciated in the executable.)
There is only a declaration in b (as expected) in the DWARF informations:
$ readelf -wi b
[...]
<1><ca>: Abbrev Number: 9 (DW_TAG_variable)
<cb> DW_AT_name : foo
<cf> DW_AT_decl_file : 1
<d0> DW_AT_decl_line : 3
<d1> DW_AT_type : <0x57>
<d5> DW_AT_external : 1
<d5> DW_AT_declaration : 1
[...]
and a location is found in liba.so:
$ readelf -wi liba.so
[...]
<1><90>: Abbrev Number: 5 (DW_TAG_variable)
<91> DW_AT_name : foo
<95> DW_AT_decl_file : 1
<96> DW_AT_decl_line : 3
<97> DW_AT_type : <0x57>
<9b> DW_AT_external : 1
<9b> DW_AT_location : 9 bloc d'octets: 3 d0 9 20 0 0 0 0 0 (DW_OP_addr: 2009d0)
[...]
This address is the location of the (unsued) instance of foo in liba.so (.data).
I end up with 2 instances of the foo global variable (on in liba.so and one in b);
only the first one can be seen with DWARF;
only the secone one is used.
How is the debugger supposed to resolve the foo global variable?

I don't really understand why the variable is instanciated in the executable.
You can find the answer here.
How is the debugger supposed to resolve the foo global variable
The debugger reads symbol tables (in addition to debug info), and foo does get defined in both the main executable b, and in liba.so:
nm b | grep foo
0000000000600c68 B foo

(I read the Oracle doc provided by #Employed Russian.)
The global variable reinstanciation is done for non-PIC code in order to dereference the variable in a non-PIC way without patching the non-PIC code:
a copy of the variable is done for non-PIC code;
the variable is instanciated in the executable;
a copy relocation instruction is used to copy the data from the source shared objet at dynamic linking time;
the instance in the shared objet is not used (after the relocation copy has been done).
Copy relocation instructions:
$readelf -r b
Relocation section '.rela.dyn' at offset 0x638 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600c58 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000600ca8 001200000005 R_X86_64_COPY 0000000000600ca8 foo + 0
For functions, the GOT+PLT technique is used the same way they are used in PIC code.

Related

Function address nearly the same as other variables addresses [duplicate]

This question already has answers here:
Possible to know section of memory a variable is located?
(2 answers)
Closed 2 years ago.
Why are function addresses nearly the same as the address of static global variables or dynamically allocated variables? Here is the code for demonstration:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_var1;
int global_var2;
static int st_var = 3;
void func()
{
return;
}
int main(void)
{
int x;
int* x_m = malloc(sizeof(int));
printf("Malloc: %p\n", x_m);
printf("Local: %p\n", &x);
printf("Function: %p\n", &func);
printf("Global: %p\n", &global_var);
printf("Global: %p\n", &global_var1);
printf("Global: %p\n", &global_var2);
printf("Static: %p\n", &st_var);
free(x_m);
return 0;
}
Output:
Malloc: 0x55bede9ce2a0
Local: 0x7ffdbc67b25c
Function: 0x55bede7151a9
Global: 0x55bede718024
Global: 0x55bede718030
Global: 0x55bede718020
Static: 0x55bede718010
Can somebody explain this? Because I thought that just global and static variables are stored into the .bss segment.
This is because, usually, the .text section (containing function code) and the .bss section of an ELF executable are mapped "relatively near" each other.
You can check this with readelf:
$ gcc prog.c
$ readelf -S a.out
There are 29 section headers, starting at offset 0x1ac0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[14] .text PROGBITS 00000000000007e0 000007e0
0000000000000302 0000000000000000 AX 0 0 16
...
[24] .bss NOBITS 0000000000201010 00001010
0000000000000010 0000000000000000 WA 0 0 8
...
You can see from above from the "Address" field of .text and .bss that they will be loaded 0x201010-0x7e0 = 0x200830 bytes apart in virtual memory when the program runs.
In any case, this does not mean that your code is in the .bss section or that your variables are in the .text section. They are in two different yet "relatively near" sections.
The distance between the two is arbitrary, there is no real minimum or maximum requirement dictated by the ELF specification. You could write your own linker script to place them farther away if you really want.

Replacing symbols without LD_PRELOAD

Is it possible to set hooks on system calls in runtime? In portable way, without asm, maybe some dynamic linker functions?
I want to intercept system calls of 3rd party libraries. Don't want to use LD_PRELOAD, it needs external wrapper-launcher script setting env var
You can override a library call by redefining the function:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
void abort(void)
{
// If necessary, get a instance to the "real" function:
void (*real_abort)(void) = dlsym(RTLD_NEXT, "abort");
if (!real_abort) {
fpritnf(stderr, "Could not find real abort\n");
exit(1);
}
fprintf(stderr, "Calling abort\n");
real_abort();
}
with main
#include <stdlib.h>
int main(int argc, char** argv) {
abort();
}
Resulting in:
$ ./a.out
Calling abort
Aborted
If you want to do this in runtime for an abitrary function (without compiling your own version of the function), you might try to use the relocation informations of your ELF objects (executable and shared objects) and update the relocations at runtime.
Let's compile a simple hell world and look at its relocations:
$ LANG=C readelf -r ./a.out
Relocation section '.rela.dyn' at offset 0x348 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000006008d8 000300000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
Relocation section '.rela.plt' at offset 0x360 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000006008f8 000100000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
000000600900 000200000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
000000600908 000300000007 R_X86_64_JUMP_SLO 0000000000000000 __gmon_start__ + 0
Those are the relocations done by the dynamic linker: the first line of the .rela.plt tells the dynamic linker it needs to setup a PLT entry at 0x0000006008f8 for the puts symbol. In order to override the put function, we might find all occurences of the puts symbols in all shared objects and relocate them to the suitable function.

How to load library defined symbols to a specified location?

The test is on Ubuntu 12.04, 32-bit, with gcc 4.6.3.
Basically I am doing some binary manipulation work on ELF binaries, and what I have to do now is to assemble a assembly program and guarantee the libc symbols are loaded to a predefined address by me.
Let me elaborate it in an simple example.
Suppose in the original code, libc symbols stdout#GLIBC_2.0 is used.
#include <stdio.h>
int main() {
FILE* fout = stdout;
fprintf( fout, "hello\n" );
}
When I compile it and check the symbol address using these commands:
gcc main.c
readelf -s a.out | grep stdout
I got this:
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout#GLIBC_2.0 (2)
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout##GLIBC_2.0
and the .bss section is like this:
readelf -S a.out | grep bss
[25] .bss NOBITS 0804a020 001014 00000c 00 WA 0 0 32
Now what I am trying to do is to load the stdout symbol in a predefined address, so I did this:
echo "stdout = 0x804a024;" > symbolfile
gcc -Wl,--just-symbols=symbolfile main.c
Then when I check the .bss section and symbol stdout, I got this:
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
4: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
49: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
It seems that I didn't successfully load the symbol stdout##GLIBC_2.0, but just a wired stdout. (I tried to write stdout##GLIBC_2.0 in symbolfile, but it can't compile... )
It seems that as I didn't make it, the beginning address of .bss section has also changed, which makes the address of stdout symbol in a non-section area. During runtime, it throws a segmentation fault when loading from 0x804a024.
Could anyone help me on how to successfully load the library symbol at a predefined address? Thanks!

Replacing static function in kernel module

Folks,
I'm trying to hack a kernel module by modifying its symbol. The basic idea is to replace the original function with new function by overwriting its address in the symtab. However, I found when declaring the function as static, the hacking fails. But it works with non-static function. My example code is below:
filename: orig.c
int fun(void) {
printk(KERN_ALERT "calling fun!\n");
return 0;
}
int evil(void) {
printk(KERN_ALERT "===== EVIL ====\n");
return 0;
}
static int init(void) {
printk(KERN_ALERT "Init Original!");
fun();
return 0;
}
void clean(void) {
printk(KERN_ALERT "Exit Original!");
return;
}
module_init(init);
module_exit(clean);
Then I follow the styx's article to replace the original function "fun" in symtab to call function "evil", http://www.phrack.org/issues.html?issue=68&id=11
>objdump -t orig.ko
...
000000000000001b g F .text 000000000000001b evil
0000000000000056 g F .text 0000000000000019 cleanup_module
0000000000000036 g F .text 0000000000000020 init_module
0000000000000000 g F .text 000000000000001b fun
...
By executing the elfchger
>./elfchger -s fun -v 1b orig.ko
[+] Opening orig.ko file...
[+] Reading Elf header...
>> Done!
[+] Finding ".symtab" section...
>> Found at 0xc630
[+] Finding ".strtab" section...
>> Found at 0xc670
[+] Getting symbol' infos:
>> Symbol found at 0x159f8
>> Index in symbol table: 0x1d
[+] Replacing 0x00000000 with 0x0000001b... done!
I can successfully change the fun's symbol table to be equal to evil and inserting the module see the effects:
000000000000001b g F .text 000000000000001b evil
...
000000000000001b g F .text 000000000000001b fun
> insmod ./orig.ko
> dmesg
[ 7687.797211] Init Original!
[ 7687.797215] ===== EVIL ====
While this works fine. When I change the declaration of fun to be "static int fun(void)" and follows the same steps as mentioned above, I found the evil does not get called. Could anyone give me some suggestion?
Thanks,
William
Short version: Declaring a function as 'static' makes it local and prevents the symbol to be exported. Thus, the call is linked statically, and the dynamic linker does not effect the call in any way at load time.
Long Version
Declaring a symbol as 'static' prevents the compiler from exporting the symbol, making it local instead of global. You can verify this by looking for the (missing) 'g' in your objdump output, or at the lower-case 't' (instead of 'T') in the output of 'nm'. The compiler might also inline the local function, in which case the symbol table wouldn't contain it at all.
Local symbols have to be unique only for the translation unit in which they are defined. If your module consisted of multiple translation units, you could have a static fun() in each of them. An nm or objdump of the finished .ko may then contain multiple local symbols called fun.
This also implies that local symbols are valid only in their respective translation unit, and also can be referred (in your case: called) only from inside this unit. Otherwise, the linker just would not now, which one you mean. Thus, the call to static fun() is already linked at compile time, before the module is loaded.
At load time, the dynamic linker won't tamper with the local symbol fun or references (in particular: calls) to it, since:
its local linkage already done
there are potentially more symbols named 'fun' throughout and the dynamic linker would not be able to tell, which one you meant

If a global variable is initialized to 0, will it go to BSS?

All the initialized global/static variables will go to initialized data section.
All the uninitialized global/static variables will go to uninitialed data section(BSS). The variables in BSS will get a value 0 during program load time.
If a global variable is explicitly initialized to zero (int myglobal = 0), where that variable will be stored?
Compiler is free to put such variable into bss as well as into data. For example, GCC has a special option controlling such behavior:
-fno-zero-initialized-in-bss
If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS. This
can save space in the resulting code. This option turns off this
behavior because some programs explicitly rely on variables going to
the data section. E.g., so that the resulting executable can find the
beginning of that section and/or make assumptions based on that.
The default is -fzero-initialized-in-bss.
Tried with the following example (test.c file):
int put_me_somewhere = 0;
int main(int argc, char* argv[]) { return 0; }
Compiling with no options (implicitly -fzero-initialized-in-bss):
$ touch test.c && make test && objdump -x test | grep put_me_somewhere
cc test.c -o test
0000000000601028 g O .bss 0000000000000004 put_me_somewhere
Compiling with -fno-zero-initialized-in-bss option:
$ touch test.c && make test CFLAGS=-fno-zero-initialized-in-bss && objdump -x test | grep put_me_somewhere
cc -fno-zero-initialized-in-bss test.c -o test
0000000000601018 g O .data 0000000000000004 put_me_somewhere
It's easy enough to test for a specific compiler:
$ cat bss.c
int global_no_value;
int global_initialized = 0;
int main(int argc, char* argv[]) {
return 0;
}
$ make bss
cc bss.c -o bss
$ readelf -s bss | grep global_
32: 0000000000400420 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux
40: 0000000000400570 0 FUNC LOCAL DEFAULT 13 __do_global_ctors_aux
55: 0000000000601028 4 OBJECT GLOBAL DEFAULT 25 global_initialized
60: 000000000060102c 4 OBJECT GLOBAL DEFAULT 25 global_no_value
We're looking for the location of 0000000000601028 and 000000000060102c:
$ readelf -S bss
There are 30 section headers, starting at offset 0x1170:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[24] .data PROGBITS 0000000000601008 00001008
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601018 00001018
0000000000000018 0000000000000000 WA 0 0 8
It looks like both values are stored in the .bss section on my system: gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4).
The behavior is dependent upon the C implementation. It may end up in either .data or .bss, and to increase changes that it does not end up in .data taking redundant space up, it's better not to explicitly initialize it to 0, since it will be set to 0 anyway if the object is of static duration.

Resources