I am curious how programs like readelf, objdump and gdb know what to display next to callq instructions. Since the program has yet to run how do they know how far to 'fall through' the .plt? Do they guess based on the arguments passed to it? Or do they actually do a mock run of the program to find out?
For example:
400ca4: e8 e7 fb ff ff callq 400890 <printf@plt>
400ca9: 48 8b 85 28 ff ff ff mov -0xd8(%rbp),%rax
The above code knows to go to printf() in the .plt at 0x400890:
0000000000400890 <printf@plt>:
400890: ff 25 ba 17 20 00 jmpq *0x2017ba(%rip) # 602050 <_GLOBAL_OFFSET_TA$
400896: 68 07 00 00 00 pushq $0x7
40089b: e9 70 ff ff ff jmpq 400810 <_init+0x20>
This is just output from objdump -d so I'm not sure how the program knows it wants printf. The only correlation I can see is the relocation index (pushq $0x7) and the section .dynsym, though it is one value off because it starts at 0:
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
Another thing that confuses me is the reference to the GOT in the .plt entry (#602050). I see from readelf that it is part of .got.plt based on the address range, but how do these programs determine the value before the program is run?
[23] .got.plt PROGBITS 0000000000602000 00002000
00000000000000b8 0000000000000008 WA 0 0 8
** Edit **
Symbol table '.dynsym' contains 22 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND putchar@GLIBC_2.2.5 (2)
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strncpy@GLIBC_2.2.5 (2)
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fclose@GLIBC_2.2.5 (2)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strlen@GLIBC_2.2.5 (2)
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@GLIBC_2.4 (3)
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
10: 0000000000000000 0 FUNC GLOBAL DEFAULT UND ftell@GLIBC_2.2.5 (2)
11: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
12: 0000000000000000 0 FUNC GLOBAL DEFAULT UND malloc@GLIBC_2.2.5 (2)
13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _IO_getc@GLIBC_2.2.5 (2)
14: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fseek@GLIBC_2.2.5 (2)
15: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fopen@GLIBC_2.2.5 (2)
16: 0000000000000000 0 FUNC GLOBAL DEFAULT UND perror@GLIBC_2.2.5 (2)
17: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getopt@GLIBC_2.2.5 (2)
18: 0000000000000000 0 FUNC GLOBAL DEFAULT UND atoi@GLIBC_2.2.5 (2)
19: 0000000000000000 0 FUNC GLOBAL DEFAULT UND exit@GLIBC_2.2.5 (2)
20: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fwrite@GLIBC_2.2.5 (2)
21: 0000000000400a4d 34 FUNC GLOBAL DEFAULT 13 err
A little of this is going off of memory, but let's see if I can't help you out...
As to your first question, there's a chain of things that link together. I can't guarantee this is how these tools are doing things, but just to show that there is a way.
The PLT entries have a 1-to-1 correspondence (except for PLT[0], which is special) with the entries of the .rel(a).plt section. This section contains the relocations for the PLT entries.
Each .rel(a).plt entry has an info field which has a symbol table index, e.g. into .dynsym.
Each symbol table entry has an offset into the string table (e.g. .dynstr) for its name. This offset is a byte offset starting from the beginning of the string section.
So as you can see, you can follow the PLT to the rel(a).plt, to the symbol table, to the string table, where you'll find "printf."
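The chain above can be sketched in C. This is only an illustration of the lookup, not how objdump or gdb actually implement it. The function name plt_slot_symbol_name and its parameters are made up for this example, and it assumes the caller has already located .rela.plt, .dynsym and .dynstr in the file and that the object is a 64-bit ELF using RELA relocations.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a PLT slot to the name of the symbol it calls.
 * plt_slot is 1-based because PLT[0] is the special resolver stub, so
 * slot N uses relocation N-1 in .rela.plt.
 * Returns NULL if any index falls outside the supplied tables.
 */
const char *plt_slot_symbol_name(size_t plt_slot,
                                 const Elf64_Rela *rela_plt, size_t rela_count,
                                 const Elf64_Sym *dynsym, size_t dynsym_count,
                                 const char *dynstr, size_t dynstr_size)
{
    assert(rela_plt != NULL && dynsym != NULL && dynstr != NULL);

    if (plt_slot == 0 || plt_slot > rela_count)
        return NULL;

    /* Step 1: PLT slot -> relocation entry. */
    const Elf64_Rela *rel = &rela_plt[plt_slot - 1];

    /* Step 2: relocation info field -> symbol table index. */
    size_t sym_index = (size_t)ELF64_R_SYM(rel->r_info);
    if (sym_index >= dynsym_count)
        return NULL;

    /* Step 3: symbol entry -> byte offset into the string table. */
    Elf64_Word name_offset = dynsym[sym_index].st_name;
    if (name_offset >= dynstr_size)
        return NULL;

    /* Step 4: the NUL-terminated name lives at that offset. */
    return dynstr + name_offset;
}
```

For the binary in the question, and assuming the usual 16-byte PLT entries, the stub at 0x400890 sits 0x80 bytes past PLT[0] at 0x400810, which makes it slot 8, i.e. relocation 7. That matches the pushq $0x7 in the stub; the symbol index 8 then comes from that relocation's info field rather than from the pushed value itself.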
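To answer your second question, take a look at the program headers (readelf -Wl <program>), and you'll see the virtual addresses of the segments, along with which sections are mapped into each one. This looks like a non-PIE executable, so those addresses are fixed at link time and a tool can work out 0x602050 without running anything. That's where that address range comes from.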
Related
I have been trying to learn about x86-64 machine code and ELF files. For that purpose I wrote some code to generate an ELF file with some machine code in it. I use some machine code that I assembled using nasm (it just prints a message and calls the exit syscall; learning to assemble machine code myself comes next) and wrote a C program to write the correct ELF header/section headers/symbol table etc. manually into a file.
Now I am trying to link my file (with a single function in it) against another elf file, which I generate via gcc from C code (test.c):
// does not work with or without "extern"
extern void hello();
void _start()
{
hello();
// exit system call
asm(
"movl $60,%eax;"
"xorl %edi,%edi;" // on x86-64 the exit status is passed in %edi, not %ebx
"syscall");
}
The output of readelf -a on my ELF file is (hello.o):
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 64 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 9
Section header string table index: 8
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000280
0000000000000044 0000000000000000 AX 0 0 16
[ 2] .rela.text RELA 0000000000000000 000002c8
0000000000000030 0000000000000018 I 6 1 8
[ 3] .data PROGBITS 0000000000000000 00000300
0000000000000005 0000000000000000 WA 0 0 16
[ 4] .bss NOBITS 0000000000000000 00000310
0000000000000080 0000000000000000 A 0 0 16
[ 5] .rodata PROGBITS 0000000000000000 00000310
000000000000000d 0000000000000000 A 0 0 16
[ 6] .symtab SYMTAB 0000000000000000 00000320
0000000000000150 0000000000000018 7 14 8
[ 7] .strtab STRTAB 0000000000000000 00000470
0000000000000028 0000000000000000 0 0 1
[ 8] .shstrtab STRTAB 0000000000000000 00000498
000000000000003f 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There is no dynamic section in this file.
Relocation section '.rela.text' at offset 0x2c8 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000001a 000500000001 R_X86_64_64 0000000000000000 .rodata + 0
000000000024 00050000000a R_X86_64_32 0000000000000000 .rodata + d
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c
10: 0000000000000000 68 FUNC GLOBAL DEFAULT 1 hello
11: 0000000000000060 13 OBJECT LOCAL DEFAULT 5 msg
12: 000000000000000d 8 NOTYPE LOCAL DEFAULT ABS len
13: 0000000000000050 5 OBJECT GLOBAL DEFAULT 3 _test
No version information found in this file.
I have compiled test.c with
gcc -c -nostdlib -fno-asynchronous-unwind-tables test.c -o test.o
to then link with ld test.o hello.o, which unfortunately yields
ld: test.o: in function `_start':
test.c:(.text+0xa): undefined reference to `hello'
even though the hello function is defined in hello.o (note the entry in the symbol table named hello which is in section 1, the .text section, and seems to have the correct size/type/value/bind).
If I compile a file with just void hello(){} in it the same way I compiled test.c, those two object files can obviously be linked. Also, if I generate my own ELF file hello.o as an executable, renaming the hello function to _start, it executes just fine. I have been banging my head against the wall for a while now, and there are two things I would like to know: Obviously I would like to know my issue with the ELF file. But also I would like to know how I can debug such issues in the future. I have tried to build ld from source (cloning the GNU binutils repo) with debugging symbols, but I did not get very far debugging ld itself.
Edit: I have uploaded my elf file here:
https://drive.google.com/file/d/1cRNr0VPAjkEbueuWFYwLYbpijVnLySqq/view?usp=sharing
This was quite hard to debug.
Here is the output from readelf -WSs hello.o for the file you uploaded to Google drive (it doesn't match the info in your question):
There are 9 section headers, starting at offset 0x40:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000000000 000280 000044 00 AX 0 0 16
[ 2] .rela.text RELA 0000000000000000 0002c8 000030 18 I 6 1 8
[ 3] .data PROGBITS 0000000000000000 000300 000005 00 WA 0 0 16
[ 4] .bss NOBITS 0000000000000000 000310 000080 00 A 0 0 16
[ 5] .rodata PROGBITS 0000000000000000 000310 00000d 00 A 0 0 16
[ 6] .symtab SYMTAB 0000000000000000 000320 000150 18 7 14 8
[ 7] .strtab STRTAB 0000000000000000 000470 000028 00 0 0 1
[ 8] .shstrtab STRTAB 0000000000000000 000498 00003f 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 8
9: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c
10: 0000000000000000 68 FUNC GLOBAL DEFAULT 1 hello
11: 0000000000000060 13 OBJECT LOCAL DEFAULT 5 msg
12: 000000000000000d 8 NOTYPE LOCAL DEFAULT ABS len
13: 0000000000000050 5 OBJECT GLOBAL DEFAULT 3 _test
The issue is with the .sh_info value (14) of the .symtab section.
According to the ELF documentation, .sh_info for a SYMTAB section is supposed to contain "one greater than the symbol table index of the last local symbol (binding STB_LOCAL)."
So the value 14 tells the linker that all symbols in this file are local, and therefore can't possibly be used to resolve any external references to them.
You need to move all LOCAL symbols before GLOBAL ones (here, msg and len would need to move before hello), so that the symbol table looks like this:
...
9: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c
10: 0000000000000060 13 OBJECT LOCAL DEFAULT 5 msg
11: 000000000000000d 8 NOTYPE LOCAL DEFAULT ABS len
12: 0000000000000000 68 FUNC GLOBAL DEFAULT 1 hello
13: 0000000000000050 5 OBJECT GLOBAL DEFAULT 3 _test
and then set .sh_info for the .symtab section to 12.
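If you are generating the symbol table yourself, a small self-check along these lines can catch this before the linker does. It is only a sketch: the function names are made up for this example, and it assumes an in-memory array of Elf64_Sym as declared in <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: index of the first non-local symbol, which is the
 * value .sh_info should hold once all locals come first. Returns `count`
 * if every symbol is STB_LOCAL.
 */
size_t symtab_expected_sh_info(const Elf64_Sym *syms, size_t count)
{
    assert(syms != NULL || count == 0);
    size_t i = 0;
    while (i < count && ELF64_ST_BIND(syms[i].st_info) == STB_LOCAL)
        i++;
    return i;
}

/* Returns 1 if no STB_LOCAL symbol appears after the first non-local one. */
int symtab_locals_come_first(const Elf64_Sym *syms, size_t count)
{
    size_t first_nonlocal = symtab_expected_sh_info(syms, count);
    for (size_t i = first_nonlocal; i < count; i++) {
        if (ELF64_ST_BIND(syms[i].st_info) == STB_LOCAL)
            return 0;
    }
    return 1;
}
```

On the original table the second function would return 0, because msg and len (LOCAL) follow hello (GLOBAL). On the reordered table it returns 1 and the first function returns 12.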
But also I would like to know how I can debug such issues in the future.
As you've discovered, debugging binutils ld is very hard, partially because it uses libbfd, which is chock-full of macros and is itself very hard to debug.
I debugged this by building Gold from source, which fortunately produced the exact same failure.
I built OpenSSL-1.0.2n with the -g 386 shared options (to work with the basic assembly version) to generate the shared library libcrypto.so.1.0.0.
Inside the crypto/aes folder, aes-x86_64.s is generated, and it has various global functions/labels.
The total number of lines in aes-x86_64.s is 2535, and the labels appear at different places (line numbers in the .s file).
328 .globl AES_encrypt
.type AES_encrypt,@function
.align 16
.globl asm_AES_encrypt
.hidden asm_AES_encrypt
asm_AES_encrypt:
334 AES_encrypt:
775 .globl AES_decrypt
.type AES_decrypt,@function
.align 16
.globl asm_AES_decrypt
.hidden asm_AES_decrypt
asm_AES_decrypt:
781 AES_decrypt:
844 .globl private_AES_set_encrypt_key
.type private_AES_set_encrypt_key,@function
.align 16
847 private_AES_set_encrypt_key:
1105 .globl private_AES_set_decrypt_key
.type private_AES_set_decrypt_key,@function
.align 16
1108 private_AES_set_decrypt_key:
1292 .globl AES_cbc_encrypt
.type AES_cbc_encrypt,@function
.align 16
.globl asm_AES_cbc_encrypt
.hidden asm_AES_cbc_encrypt
asm_AES_cbc_encrypt:
1299 AES_cbc_encrypt:
1750 .LAES_Te:
.long 0xa56363c6,0xa56363c6
.long 0x847c7cf8,0x847c7cf8
.long 0x997777ee,0x997777ee
.long 0x8d7b7bf6,0x8d7b7bf6
.long 0x0df2f2ff,0x0df2f2ff
.long 0xbd6b6bd6,0xbd6b6bd6
....
....
2140 .LAES_Td:
.long 0x50a7f451,0x50a7f451
.long 0x5365417e,0x5365417e
.long 0xc3a4171a,0xc3a4171a
.long 0x965e273a,0x965e273a
.long 0xcb6bab3b,0xcb6bab3b
AES_cbc_encrypt is a global function declared at line 1292, and its label AES_cbc_encrypt is at line 1299.
The local labels .LAES_Te and .LAES_Td are at lines 1750 and 2140 respectively, where the .long data is stored.
I am able to access the global label AES_cbc_encrypt of the assembly file from another C program by linking with the shared library.
//test_glob.c
#include <stdio.h>
#include <stdlib.h>
extern void* AES_cbc_encrypt() ;
int main()
{
long *p;
int i;
p=(long *)(&AES_cbc_encrypt);
for(i=0;i<768;i++)
{
// %lx matches the long being read; %p expects a void *
printf("p+%d %p %lx\n",i, (void *)(p+i),*(p+i));
}
return 0;
}
gcc test_glob.c -lcrypto
./a.out
This gives some random output and later a segmentation fault.
There must be a way to find the offset of this data (local labels .LAES_Te and .LAES_Td) from the global label AES_cbc_encrypt
so that the data can be used in encryption/decryption.
I have the following questions.
1. How do I find the offset from the global label AES_cbc_encrypt to the local labels .LAES_Te and .LAES_Td, so that based on
that offset I can access the data from another C program?
2. Is there any other way to access the data in the assembly file from a C program?
3. Is there any way to find the location in memory where the data is loaded, and access that memory location to read the data?
I am using gcc-5.4 on Linux Ubuntu 16.04. Any help or link will be highly appreciated. Thanks in advance.
EDIT 1:
readelf -a aes-x86_64.o produces following output.
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 14672 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 16
Section header string table index: 13
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040 0000000000002e40 0000000000000000 AX 0 0 64
[ 2] .rela.text RELA 0000000000000000 00003808 0000000000000018 0000000000000018 I 14 1 8
[ 3] .data PROGBITS 0000000000000000 00002e80 0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00002e80 0000000000000000 0000000000000000 WA 0 0 1
[ 5] .note.GNU-stack PROGBITS 0000000000000000 00002e80 0000000000000000 0000000000000000 0 0 1
[ 6] .debug_line PROGBITS 0000000000000000 00002e80 00000000000005a4 0000000000000000 0 0 1
[ 7] .rela.debug_line RELA 0000000000000000 00003820 0000000000000018 0000000000000018 I 14 6 8
[ 8] .debug_info PROGBITS 0000000000000000 00003424 0000000000000071 0000000000000000 0 0 1
[ 9] .rela.debug_info RELA 0000000000000000 00003838 0000000000000060 0000000000000018 I 14 8 8
[10] .debug_abbrev PROGBITS 0000000000000000 00003495 0000000000000014 0000000000000000 0 0 1
[11] .debug_aranges PROGBITS 0000000000000000 000034b0 0000000000000030 0000000000000000 0 0 16
[12] .rela.debug_arang RELA 0000000000000000 00003898 0000000000000030 0000000000000018 I 14 11 8
[13] .shstrtab STRTAB 0000000000000000 000038c8 0000000000000085 0000000000000000 0 0 1
[14] .symtab SYMTAB 0000000000000000 000034e0 0000000000000228 0000000000000018 15 14 8
[15] .strtab STRTAB 0000000000000000 00003708 00000000000000fb 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rela.text' at offset 0x3808 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000fc0 001600000002 R_X86_64_PC32 0000000000000000 OPENSSL_ia32cap_P - 4
Relocation section '.rela.debug_line' at offset 0x3820 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000030 000100000001 R_X86_64_64 0000000000000000 .text + 0
Relocation section '.rela.debug_info' at offset 0x3838 contains 4 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000a0000000a R_X86_64_32 0000000000000000 .debug_abbrev + 0
00000000000c 000b0000000a R_X86_64_32 0000000000000000 .debug_line + 0
000000000010 000100000001 R_X86_64_64 0000000000000000 .text + 0
000000000018 000100000001 R_X86_64_64 0000000000000000 .text + 2e40
Relocation section '.rela.debug_aranges' at offset 0x3898 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 00090000000a R_X86_64_32 0000000000000000 .debug_info + 0
000000000010 000100000001 R_X86_64_64 0000000000000000 .text + 0
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 23 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4
4: 0000000000000000 483 FUNC LOCAL DEFAULT 1 _x86_64_AES_encrypt
5: 00000000000001f0 609 FUNC LOCAL DEFAULT 1 _x86_64_AES_encrypt_compa
6: 0000000000000520 465 FUNC LOCAL DEFAULT 1 _x86_64_AES_decrypt
7: 0000000000000700 737 FUNC LOCAL DEFAULT 1 _x86_64_AES_decrypt_compa
8: 0000000000000ae0 649 FUNC LOCAL DEFAULT 1 _x86_64_AES_set_encrypt_k
9: 0000000000000000 0 SECTION LOCAL DEFAULT 8
10: 0000000000000000 0 SECTION LOCAL DEFAULT 10
11: 0000000000000000 0 SECTION LOCAL DEFAULT 6
12: 0000000000000000 0 SECTION LOCAL DEFAULT 11
13: 0000000000000000 0 SECTION LOCAL DEFAULT 5
14: 0000000000000460 177 FUNC GLOBAL DEFAULT 1 AES_encrypt
15: 0000000000000460 0 NOTYPE GLOBAL HIDDEN 1 asm_AES_encrypt
16: 00000000000009f0 184 FUNC GLOBAL DEFAULT 1 AES_decrypt
17: 00000000000009f0 0 NOTYPE GLOBAL HIDDEN 1 asm_AES_decrypt
18: 0000000000000ab0 35 FUNC GLOBAL DEFAULT 1 private_AES_set_encrypt_k
19: 0000000000000d70 541 FUNC GLOBAL DEFAULT 1 private_AES_set_decrypt_k
20: 0000000000000f90 1411 FUNC GLOBAL DEFAULT 1 AES_cbc_encrypt
21: 0000000000000f90 0 NOTYPE GLOBAL HIDDEN 1 asm_AES_cbc_encrypt
22: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND OPENSSL_ia32cap_P
No version information found in this file.
EDIT 2:
nm aes-x86_64.o produces following output.
0000000000000f90 T AES_cbc_encrypt
00000000000009f0 T AES_decrypt
0000000000000460 T AES_encrypt
0000000000000f90 T asm_AES_cbc_encrypt
00000000000009f0 T asm_AES_decrypt
0000000000000460 T asm_AES_encrypt
U OPENSSL_ia32cap_P
0000000000000d70 T private_AES_set_decrypt_key
0000000000000ab0 T private_AES_set_encrypt_key
0000000000000520 t _x86_64_AES_decrypt
0000000000000700 t _x86_64_AES_decrypt_compact
0000000000000000 t _x86_64_AES_encrypt
00000000000001f0 t _x86_64_AES_encrypt_compact
0000000000000ae0 t _x86_64_AES_set_encrypt_key
Edit 3:
nm -a gives following output
0000000000000f90 T AES_cbc_encrypt
00000000000009f0 T AES_decrypt
0000000000000460 T AES_encrypt
0000000000000f90 T asm_AES_cbc_encrypt
00000000000009f0 T asm_AES_decrypt
0000000000000460 T asm_AES_encrypt
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 N .debug_abbrev
0000000000000000 N .debug_aranges
0000000000000000 N .debug_info
0000000000000000 N .debug_line
0000000000000000 n .note.GNU-stack
U OPENSSL_ia32cap_P
0000000000000d70 T private_AES_set_decrypt_key
0000000000000ab0 T private_AES_set_encrypt_key
0000000000000000 t .text
0000000000000520 t _x86_64_AES_decrypt
0000000000000700 t _x86_64_AES_decrypt_compact
0000000000000000 t _x86_64_AES_encrypt
00000000000001f0 t _x86_64_AES_encrypt_compact
0000000000000ae0 t _x86_64_AES_set_encrypt_key
If you hard-code an offset based on this version of the library, it could break with a different version that has any changes in aes-x86_64.s.
So you should add a .globl foo and foo: label to the .s at the position of the data you want to access, and declare it in C as extern uint32_t foo[].
Then the normal code-gen mechanisms for accessing static data from a shared library will kick in. (i.e. load the address from the GOT if necessary).
Also, unless you compile with -fno-plt, &AES_cbc_encrypt will be the address of the PLT stub / wrapper, not the actual function in the library.
If you only need it to work with a specific build of the library:
Then yes I think with -fno-plt, taking the address of a function in the library will compile/assemble to a load from the GOT, so you get the actual address after dynamic linking. -fno-plt is essential for this to work.
The table might be fairly far away if it's in another section (.rodata instead of .text probably), so your simple scan of 768 longs (768 * 8 bytes on x86-64) may not find it, though.
A better way to find the offset from a symbol you can use & on in C:
Use a debugger: single-step into a function that uses the data, and find what address it's loading from (gdb's built-in disassembly should work).
Or disassemble the binary and look at the little-endian rel32 offset in a RIP-relative load or LEA of the table address. (That offset won't be fixed-up at run-time). Look at the asm source to find an instruction that references the hidden symbol you want, then find that instruction in the disassembly.
That will give you the distance in bytes from the end of that instruction to the table. You can probably see the distance from that instruction to a symbol you can take the address of in C (like you're doing with the function pointer). Also, the disassembler will fill in absolute addresses (relative to some arbitrary base) for load addresses, and for symbols / instructions, so you can subtract those.
I have an elf binary which has the following dynsym symbol table as output by readelf:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000400440 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
4: 0000000000400460 0 FUNC GLOBAL DEFAULT UND fgets@GLIBC_2.2.5 (2)
What does the value column mean? Since this table has 400440 for printf, does that mean that the dynamic linker has to map printf at that address? If yes, how is this value decided? Is it random?
EDIT: Also, this is linux x86-64 with gcc
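It appears that the value of an undefined dynamic symbol of function type is just the address of its entry in the PLT. The x86-64 psABI describes this case: when an undefined function symbol in an executable has a non-zero value, that value is the address of its PLT entry, and the dynamic linker uses it as the function's address so that function pointers compare equal across modules. So no, printf itself is not mapped at 0x400440. Likewise, the value of an entry for a variable is probably just its entry in the GOT.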
glibc provides backtrace() and backtrace_symbols() to get the stack trace of a running program. But for this to work the program has to be built with linker's -rdynamic flag.
What is the difference between -g flag passed to gcc vs linker's -rdynamic flag ? For a sample code I did readelf to compare the outputs. -rdynamic seems to produce more info under Symbol table '.dynsym' But I am not quite sure what the additional info is.
Even if I strip a program binary built using -rdynamic, backtrace_symbols() continue to work.
When strip removes all the symbols from the binary why is it leaving behind whatever was added by the -rdynamic flag ?
Edit: Follow-up questions based on Mat's response below..
For the same sample code you took this is the difference I see with -g & -rdynamic
without any option..
Symbol table '.dynsym' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 218 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table '.symtab' contains 70 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400200 0 SECTION LOCAL DEFAULT 1
2: 000000000040021c 0 SECTION LOCAL DEFAULT 2
with -g there are more sections, more entries in .symtab table but .dynsym remains the same..
[26] .debug_aranges PROGBITS 0000000000000000 0000095c
0000000000000030 0000000000000000 0 0 1
[27] .debug_pubnames PROGBITS 0000000000000000 0000098c
0000000000000023 0000000000000000 0 0 1
[28] .debug_info PROGBITS 0000000000000000 000009af
00000000000000a9 0000000000000000 0 0 1
[29] .debug_abbrev PROGBITS 0000000000000000 00000a58
0000000000000047 0000000000000000 0 0 1
[30] .debug_line PROGBITS 0000000000000000 00000a9f
0000000000000038 0000000000000000 0 0 1
[31] .debug_frame PROGBITS 0000000000000000 00000ad8
0000000000000058 0000000000000000 0 0 8
[32] .debug_loc PROGBITS 0000000000000000 00000b30
0000000000000098 0000000000000000 0 0 1
Symbol table '.dynsym' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 218 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table '.symtab' contains 77 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400200 0 SECTION LOCAL DEFAULT 1
with -rdynamic no additional debug sections, .symtab entries are 70 (same as vanilla gcc invocation), but more .dynsym entries..
Symbol table '.dynsym' contains 19 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 218 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 00000000005008e8 0 OBJECT GLOBAL DEFAULT ABS _DYNAMIC
3: 0000000000400750 57 FUNC GLOBAL DEFAULT 12 __libc_csu_fini
4: 00000000004005e0 0 FUNC GLOBAL DEFAULT 10 _init
5: 0000000000400620 0 FUNC GLOBAL DEFAULT 12 _start
6: 00000000004006f0 86 FUNC GLOBAL DEFAULT 12 __libc_csu_init
7: 0000000000500ab8 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
8: 00000000004006de 16 FUNC GLOBAL DEFAULT 12 main
9: 0000000000500aa0 0 NOTYPE WEAK DEFAULT 23 data_start
10: 00000000004007c8 0 FUNC GLOBAL DEFAULT 13 _fini
11: 00000000004006d8 6 FUNC GLOBAL DEFAULT 12 foo
12: 0000000000500ab8 0 NOTYPE GLOBAL DEFAULT ABS _edata
13: 0000000000500a80 0 OBJECT GLOBAL DEFAULT ABS _GLOBAL_OFFSET_TABLE_
14: 0000000000500ac0 0 NOTYPE GLOBAL DEFAULT ABS _end
15: 00000000004007d8 4 OBJECT GLOBAL DEFAULT 14 _IO_stdin_used
16: 0000000000500aa0 0 NOTYPE GLOBAL DEFAULT 23 __data_start
17: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
18: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table '.symtab' contains 70 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400200 0 SECTION LOCAL DEFAULT 1
2: 000000000040021c 0 SECTION LOCAL DEFAULT 2
Now these are the questions I have..
In gdb you can do bt to get the backtrace. If that works with just -g, why do we need -rdynamic for backtrace_symbols to work?
Comparing the additions to .symtab with -g & additions to .dynsym with -rdynamic they are not exactly the same.. Does either one provide better debugging info compared to the other ?
FWIW, size of the output produced is like this: with -g > with -rdynamic > with neither option
What exactly is the usage of .dynsym ? Is it all the symbols exported by this binary ? In that case why is foo going into .dynsym because we are not compiling the code as a library.
If I link my code using all static libraries then -rdynamic is not needed for backtrace_symbols to work ?
According to the docs:
This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table.
Those are not debug symbols, they are dynamic linker symbols. Those are not removed by strip since it would (in most cases) break the executable - they are used by the runtime linker to do the final link stage of your executable.
Example:
$ cat t.c
void foo() {}
int main() { foo(); return 0; }
Compile and link without -rdynamic (and no optimizations, obviously)
$ gcc -O0 -o t t.c
$ readelf -s t
Symbol table '.dynsym' contains 3 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table '.symtab' contains 50 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400270 0 SECTION LOCAL DEFAULT 1
....
27: 0000000000000000 0 FILE LOCAL DEFAULT ABS t.c
28: 0000000000600e14 0 NOTYPE LOCAL DEFAULT 18 __init_array_end
29: 0000000000600e40 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
So the executable has a .symtab with everything. But notice that .dynsym doesn't mention foo at all - it has the bare essentials in there. This is not enough information for backtrace_symbols to work. It relies on the information present in that section to match code addresses with function names.
Now compile with -rdynamic:
$ gcc -O0 -o t t.c -rdynamic
$ readelf -s t
Symbol table '.dynsym' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 0000000000601018 0 NOTYPE GLOBAL DEFAULT ABS _edata
5: 0000000000601008 0 NOTYPE GLOBAL DEFAULT 24 __data_start
6: 0000000000400734 6 FUNC GLOBAL DEFAULT 13 foo
7: 0000000000601028 0 NOTYPE GLOBAL DEFAULT ABS _end
8: 0000000000601008 0 NOTYPE WEAK DEFAULT 24 data_start
9: 0000000000400838 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
10: 0000000000400750 136 FUNC GLOBAL DEFAULT 13 __libc_csu_init
11: 0000000000400650 0 FUNC GLOBAL DEFAULT 13 _start
12: 0000000000601018 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
13: 000000000040073a 16 FUNC GLOBAL DEFAULT 13 main
14: 0000000000400618 0 FUNC GLOBAL DEFAULT 11 _init
15: 00000000004007e0 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
16: 0000000000400828 0 FUNC GLOBAL DEFAULT 14 _fini
Symbol table '.symtab' contains 50 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000400270 0 SECTION LOCAL DEFAULT 1
....
27: 0000000000000000 0 FILE LOCAL DEFAULT ABS t.c
28: 0000000000600e14 0 NOTYPE LOCAL DEFAULT 18 __init_array_end
29: 0000000000600e40 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
Same thing for symbols in .symtab, but now foo has a symbol in the dynamic symbol section (and a bunch of other symbols appear there now too). This makes backtrace_symbols work - it now has enough information (in most cases) to map code addresses with function names.
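For reference, here is a minimal sketch of the glibc calls being discussed. The wrapper name print_backtrace is made up for this example; backtrace() and backtrace_symbols() are the real functions from <execinfo.h>. Whether the output shows names like foo or only raw addresses depends on the function being present in .dynsym, which is what -rdynamic arranges.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: print the current call stack to `out`, one frame
 * per line, and return the number of frames captured.
 */
int print_backtrace(FILE *out)
{
    assert(out != NULL);

    void *frames[32];
    int count = backtrace(frames, 32);

    /* backtrace_symbols() returns one malloc'd block holding all strings. */
    char **names = backtrace_symbols(frames, count);
    if (names == NULL)
        return count;

    for (int i = 0; i < count; i++)
        fprintf(out, "%s\n", names[i]);

    free(names); /* free the block only, not the individual strings */
    return count;
}
```

Calling print_backtrace(stderr) from foo() in t.c and building once with and once without -rdynamic is a quick way to compare the two outputs.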
Strip that:
$ strip --strip-all t
$ readelf -s t
Symbol table '.dynsym' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 0000000000601018 0 NOTYPE GLOBAL DEFAULT ABS _edata
5: 0000000000601008 0 NOTYPE GLOBAL DEFAULT 24 __data_start
6: 0000000000400734 6 FUNC GLOBAL DEFAULT 13 foo
7: 0000000000601028 0 NOTYPE GLOBAL DEFAULT ABS _end
8: 0000000000601008 0 NOTYPE WEAK DEFAULT 24 data_start
9: 0000000000400838 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
10: 0000000000400750 136 FUNC GLOBAL DEFAULT 13 __libc_csu_init
11: 0000000000400650 0 FUNC GLOBAL DEFAULT 13 _start
12: 0000000000601018 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
13: 000000000040073a 16 FUNC GLOBAL DEFAULT 13 main
14: 0000000000400618 0 FUNC GLOBAL DEFAULT 11 _init
15: 00000000004007e0 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
16: 0000000000400828 0 FUNC GLOBAL DEFAULT 14 _fini
$ ./t
$
Now .symtab is gone, but the dynamic symbol table is still there, and the executable runs. So backtrace_symbols still works too.
Strip the dynamic symbol table:
$ strip -R .dynsym t
$ ./t
./t: relocation error: ./t: symbol , version GLIBC_2.2.5 not defined in file libc.so.6 with link time reference
... and you get a broken executable.
An interesting read on what .symtab and .dynsym are used for is Inside ELF Symbol Tables. One thing to note is that .symtab is not needed at runtime, so the loader does not load it: that section never becomes part of the process's memory. .dynsym, on the other hand, is needed at runtime, so it is kept in the process image. That makes it available for things like backtrace_symbols to gather information about the current process from within itself.
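The difference between the two tables is visible in the section headers: sections that become part of the process image carry the SHF_ALLOC flag (the "A" in readelf -S output). As a rough sketch, not something taken from the linked article, the C function below reads the section header table of an ELF file and reports whether a named section has that flag. The name section_is_loaded is made up for illustration, and the code assumes a glibc Linux system (for <link.h> and ElfW) and a file of the native ELF class and byte order.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if section `name` in the ELF file `path`
 * has SHF_ALLOC set (it becomes part of the process image), 0 if the
 * section exists without SHF_ALLOC, and -1 if it is missing or the file
 * cannot be parsed. */
int section_is_loaded(const char *path, const char *name)
{
    int result = -1;
    ElfW(Ehdr) eh;
    ElfW(Shdr) *sh = NULL;
    char *names = NULL;
    size_t names_size = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] !=
            (sizeof(ElfW(Addr)) == 8 ? ELFCLASS64 : ELFCLASS32) ||
        eh.e_shentsize != sizeof(ElfW(Shdr)) ||
        eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Load the whole section header table. */
    sh = malloc(eh.e_shnum * sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Load .shstrtab, which holds the section names. */
    names_size = sh[eh.e_shstrndx].sh_size;
    names = malloc(names_size + 1);
    if (names == NULL ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_size, f) != names_size)
        goto done;
    names[names_size] = '\0';

    /* Find the requested section and test its SHF_ALLOC flag. */
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = (sh[i].sh_flags & SHF_ALLOC) ? 1 : 0;
            break;
        }
    }

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On a dynamically linked, unstripped executable, section_is_loaded(path, ".dynsym") should return 1 and section_is_loaded(path, ".symtab") should return 0, which is the distinction described above.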
So in short:
dynamic symbols are not stripped by strip since that would render the executable non-loadable
backtrace_symbols needs dynamic symbols to figure out which code belongs to which function
backtrace_symbols does not use debugging symbols
Hence the behavior you noticed.
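To try this out yourself, here is a minimal sketch of a function that prints the current call stack with backtrace and backtrace_symbols from glibc's <execinfo.h>. The name print_trace and the frame limit are my own choices for illustration, not part of the t.c used above.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: prints one line per stack frame to `out` and returns
 * the number of frames, or -1 if backtrace_symbols could not allocate.
 * Frames whose function is not in the dynamic symbol table show up without
 * a function name. */
int print_trace(FILE *out)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Returns a single malloc'd array of strings, one per frame. */
    char **names = backtrace_symbols(frames, n);
    if (names == NULL)
        return -1;

    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", names[i]);

    free(names);   /* one free() releases the array and all its strings */
    return n;
}
```

If you call print_trace(stderr) from foo and build with gcc -O0 -rdynamic, the lines for foo and main should include their names. Built without -rdynamic, the same frames should appear with addresses only. Static functions never get names either way, because they are not placed in .dynsym.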
For your specific questions:
gdb is a debugger. It uses debug information in the executable and libraries to display relevant information. It is much more complex than backtrace_symbols, and inspects the actual files on your drive in addition to the live process. backtrace_symbols does not: it works entirely in-process, so it cannot access sections that are not loaded into the process image. Debug sections are not loaded into the runtime image, so it can't use them.
.dynsym is not a debugging section. It is a section used by the dynamic linker. .symtab isn't a debugging section either, but it can be used by debuggers that have access to the executable (and library) files. -rdynamic does not generate debug sections, only that extended dynamic symbol table. The executable growth from -rdynamic depends entirely on the number of symbols in that executable (and alignment/padding considerations). It should be considerably less than -g.
Except for statically linked binaries, executables need external dependencies resolved at load time, such as printf and some application startup routines from the C library. These external symbols must be listed somewhere in the executable: this is what .dynsym is used for, and this is why the executable has a .dynsym even if you don't specify -rdynamic. When you do specify it, the linker adds other symbols that are not necessary for the process to work, but that can be used by things like backtrace_symbols.
backtrace_symbols will not resolve any function names if you link statically. Even if you specify -rdynamic, the .dynsym section will not be emitted into the executable. No symbol table gets loaded into the process image, so backtrace_symbols cannot map code addresses to symbols.
Hi, I'm working in a Linux environment and I have to link against an already compiled object file (services.o) that offers me some services. I know some of them, but I'd like to know all of the symbols it exports.
Is there any way to accomplish this without having the sources? If so, how?
Thank you very much.
Try nm -- this tool is there for just this purpose.
Another option is objdump, which can also show you a bunch of other stuff.
Or you can use readelf -s, which provides more detailed information.
Symbol table '.symtab' contains 19 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 9
9: 0000000000000000 0 SECTION LOCAL DEFAULT 11
10: 0000000000000000 0 SECTION LOCAL DEFAULT 12
11: 0000000000000000 0 SECTION LOCAL DEFAULT 14
12: 0000000000000000 0 SECTION LOCAL DEFAULT 16
13: 0000000000000000 0 SECTION LOCAL DEFAULT 17
14: 0000000000000000 0 SECTION LOCAL DEFAULT 15
15: 0000000000000000 71 FUNC GLOBAL DEFAULT 1 fa_global
16: 0000000000000000 4 OBJECT GLOBAL DEFAULT 4 a
17: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fb_ex
18: 0000000000000050 17 FUNC GLOBAL DEFAULT 1 test
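If you would rather do this from code than read the table by eye, the selection rule is simple: an exported symbol is one with GLOBAL or WEAK binding whose Ndx is not UND. The sketch below applies that rule to the .symtab of an ELF file. It is only an illustration: print_exported and read_at are made-up names, and it assumes a glibc Linux system (for <link.h> and ElfW) and a file of the native ELF class and byte order. In practice nm or readelf is the right tool.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): Elf32_* or Elf64_* for the native word size */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads `size` bytes at file offset `off` into a new buffer that has one
 * extra zero byte at the end; returns NULL on failure. */
static void *read_at(FILE *f, long off, size_t size)
{
    char *buf = calloc(size + 1, 1);
    if (buf != NULL &&
        (fseek(f, off, SEEK_SET) != 0 || fread(buf, 1, size, f) != size)) {
        free(buf);
        buf = NULL;
    }
    return buf;
}

/* Hypothetical helper: prints the name of every defined GLOBAL or WEAK
 * symbol in the .symtab of `path` to `out`. Returns the number printed,
 * 0 if the file has no .symtab, or -1 if the file cannot be parsed. */
int print_exported(const char *path, FILE *out)
{
    int count = -1;
    ElfW(Ehdr) eh;
    ElfW(Shdr) *sh = NULL;
    ElfW(Sym) *syms = NULL;
    char *strs = NULL;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] !=
            (sizeof(ElfW(Addr)) == 8 ? ELFCLASS64 : ELFCLASS32) ||
        eh.e_shentsize != sizeof(ElfW(Shdr)))
        goto done;
    sh = read_at(f, (long)eh.e_shoff, eh.e_shnum * sizeof *sh);
    if (sh == NULL)
        goto done;

    count = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)
            continue;
        count = -1;                     /* stays -1 if reading fails */
        if (sh[i].sh_link >= eh.e_shnum)
            break;
        /* sh_link names the string table that holds the symbol names. */
        size_t strs_size = sh[sh[i].sh_link].sh_size;
        syms = read_at(f, (long)sh[i].sh_offset, sh[i].sh_size);
        strs = read_at(f, (long)sh[sh[i].sh_link].sh_offset, strs_size);
        if (syms == NULL || strs == NULL)
            break;
        count = 0;
        for (size_t j = 0; j < sh[i].sh_size / sizeof *syms; j++) {
            /* ELF64_ST_BIND and ELF32_ST_BIND are the same bit operation. */
            unsigned bind = ELF64_ST_BIND(syms[j].st_info);
            if (syms[j].st_shndx == SHN_UNDEF ||              /* imported */
                (bind != STB_GLOBAL && bind != STB_WEAK) ||   /* local */
                syms[j].st_name >= strs_size)
                continue;
            fprintf(out, "%s\n", strs + syms[j].st_name);
            count++;
        }
        break;
    }

done:
    free(strs);
    free(syms);
    free(sh);
    fclose(f);
    return count;
}
```

Run against the object file whose table is shown above, this would be expected to print fa_global, a and test, and to skip fb_ex because it is undefined (something the object needs from elsewhere, not something it exports).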