`.comm` directive in GAS not showing in `.o` file? - gnu-assembler

This is a purely pedagogical question.
I have the following C code, in a file called comm.c:
#include <stdio.h>
int a;
int main(){
printf("%d", a);
return 0;
}
The code prints "0\n".
Compiling with gcc -S, I get the following assembly code:
.file "comm.c"
.text
.comm a,4,4
.section .rodata
.LC0:
.string "%d"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl a(%rip), %eax
movl %eax, %esi
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 9.1.0"
.section .note.GNU-stack,"",#progbits
I am confused as to what .comm a,4,4 is doing. According to 7.96 of
the GNU as manual, the .text directive, it assembles what follows into
the end of the .text section. Thus, I would think that the beginning
of the .text section contains four bytes allocated to storing the
contents of a. This appears to not be the case, because if we
disassemble the .o file, we find:
comm.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <main+0xa>
a: 89 c6 mov %eax,%esi
c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 13 <main+0x13>
13: b8 00 00 00 00 mov $0x0,%eax
18: e8 00 00 00 00 callq 1d <main+0x1d>
1d: b8 00 00 00 00 mov $0x0,%eax
22: 5d pop %rbp
23: c3 retq
Why aren't there four extra bytes at the beginning of =text=, as is
promised by the .text GAS directive? Of course, that would be stupid, to put data in the text segment.
So I guess my question is: what is .comm doing? Why is it placed under a .text directive?

.comm does not allocate in the .text section, but in the .bss section.
From https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC76 and https://sourceware.org/binutils/docs/as/Comm.html#Comm:
If ld does not see a definition for the symbol--just one or more common symbols--then it will allocate length bytes of uninitialized memory.
It is the linker's job to allocate and map the .comm symbols.
You can see this when you link the program and read the symbols table:
gcc comm.o -o comm
readelf comm -s
The relevant symbols:
Num: Value Size Type Bind Vis Ndx Name
24: 0000000000004030 0 SECTION LOCAL DEFAULT 24
31: 0000000000004030 1 OBJECT LOCAL DEFAULT 24 completed.7392
57: 0000000000004038 0 NOTYPE GLOBAL DEFAULT 24 _end
59: 0000000000004034 4 OBJECT GLOBAL DEFAULT 24 a
60: 0000000000004030 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
__bss_start(0000000000004030) is the start of the .bss section, and _end(0000000000004038) is the end of the executable(and in this case also the end of the .bss section).
As the 4 bytes of a are at addresses 0000000000004034-0000000000004037, a is obviously in the .bss section.
And it does show in the .o, just not where you were looking for.
You can read the symbols in the .o file the same way and something like this will show up:
$ readelf comm.o -s
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS comm.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM a
10: 0000000000000000 36 FUNC GLOBAL DEFAULT 1 main
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf

Related

How Does C Compiled to ASM Know Where to Branch to for External Functions?

How does C compiled to ARM ASM know where to branch to for external functions?
For example, here is a simple C program:
#include <stdio.h>
int main() {
printf("Hello World!");
return 0;
}
and its corresponding ARM ASM program:
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "Hello World!\000"
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.size main, .-main
.ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
.section .note.GNU-stack,"",%progbits
I dont see a "printf" tag anywhere so i am assuming that it links outside of the program. but how does it know where to search? it wouldnt look everywhere, because there might be duplicate tags, but there are also libraries that are placed (in the computers perspective) at random, though i also dont see anywhere where it defines a library location.
so where does it link, for more than just the standard C library?
and how can i compile it to not rely on those external dependencies?
or know where the libraries are so i know which files i can delete?
i am currently operating linux on a raspberry pi 400
My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.
I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c. It produced the file main.s with the following contents:
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello World!"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",#progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
There are a lot of assembler directives here that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it. Here is the output of the disassembly:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # f <main+0xf>
f: b8 00 00 00 00 mov $0x0,%eax
14: e8 00 00 00 00 callq 19 <main+0x19>
19: b8 00 00 00 00 mov $0x0,%eax
1e: 5d pop %rbp
1f: c3 retq
The first three instructions here are boilerplate, so we'll ignore them. The first interesting instruction is
lea 0x0(%rip),%rdi
This is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply be copying %rip into %rdi.
The next instruction puts a 0 into register %eax. I actually don't know why this is, but it's not really relevant to this discussion.
Then comes the actual call to printf:
callq 19 <main+0x19>
Once again, this uses an address that doesn't seem correct. You may notice that address 0x19 actually points to the next instruction.
The next 3 instructions basically perform the final return 0.
To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.
I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by this directive:
.section .rodata
whereas the main function is preceded by
.text
which is shorthand for
.section .text
These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file:
$ readelf -S main.o
There are 14 section headers, starting at offset 0x318:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000020 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000258
0000000000000030 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000060
000000000000000d 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 0000006d
000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 00000098
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.propert NOTE 0000000000000000 00000098
0000000000000020 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000b8
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000288
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f0
0000000000000138 0000000000000018 12 10 8
[12] .strtab STRTAB 0000000000000000 00000228
000000000000002a 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 000002a0
0000000000000074 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
If you can figure out how to read this output, you will see that the .text section is 0x20 bytes in size (which matches the above disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer your question, however, is in the relocation data:
$ readelf -r main.o
Relocation section '.rela.text' at offset 0x258 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000015 000c00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x288 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
This output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file, or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite part of the .text section with the missing addresses.
So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb references the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that that instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the next 4 bytes (currently all 0). The rightmost column tells us, in human readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that it will then subtract from that the final address of byte 0xb in the .text section, which will make that a valid PC-relative pointer to the "Hello World!" string data.
Similarly, the second entry tells the linker to copy the address of printf (minus 4) to offset 0x15 in the .text section, which would be the last 4 bytes of the callq instruction encoding. In this case, the type is R_X86_64_PLT32, which tells us that it's pointing to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.
As a note, that might answer some of your specific questions, your compiler automatically links all the runtime libraries needed to execute a program. This includes any standard library functions, which would be part of libc.so. The only way to run without "external dependencies" would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().

How would you explain this disassembly listing?

I have a simple function in C language, in separate file string.c:
void var_init(){
char *hello = "Hello";
}
compiled with:
gcc -ffreestanding -c string.c -o string.o
And then I use command
objdump -d string.o
to see disassemble listing. What I got is:
string.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <var_init>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # b <var_init+0xb>
b: 48 89 45 f8 mov %rax,-0x8(%rbp)
f: 90 nop
10: 5d pop %rbp
11: c3 retq
I lost in understanding this listing. The book "Writing OS from scratch" says something about old disassembly and slightly uncover the mistery, but their listing is completely different and I even not see that data interpreted as code in mine as author says.
In addition to the explaination from #VladfromMoscow, Just thought it might be helpful for the poster to see what happens when you compile to assembly rather than using objdump to see it, as the data can be seen more plainly then (IMO) and the RIP relative addressing may make a bit more sense.
gcc -S x.s
Yields
.file "x.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.globl var_init
.type var_init, #function
var_init:
.LFB0:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rax
movq %rax, -8(%rbp)
nop
popq %rbp
ret
.LFE0:
.size var_init, .-var_init
.ident "GCC: (Alpine 8.3.0) 8.3.0"
.section .note.GNU-stack,"",#progbits
This command
lea 0x0(%rip),%rax
stores the address of the string literal in the register rax.
And this command
mov %rax,-0x8(%rbp)
copies the address from the register rax into the allocated stack memory. The address occupies 8 bytes as it is seen from the offset in the stack -0x8.
This store only happens at all because you compiled in debug mode; it would normally be optimized away. The next thing that happens is that the local vars (in the red-zone below the stack pointer) are effectively discarded as the function tears down its stack frame and returns.
The material you're looking at probably included a sub $16, %rsp or similar to allocate space for locals below RBP, then deallocating that space later; the x86-64 System V ABI doesn't need that in leaf functions (that don't call any other functions); they can just use the read zone. (See also Where exactly is the red zone on x86-64?). Or compile with gcc -mno-red-zone, which you probably want anyway for freestanding code: Why can't kernel code use a Red Zone
Then it restores the saved value of the caller's RBP (which was earlier set up as a frame pointer; notice that space for locals was addressed relative to RBP).
pop %rbp
and exits, effectively popping the return address into RIP
retq

Loading a dynamic library at run-time yields inconsistent and unexpected results, missing symbols and empty PLT entries. Why?

I've been fighting with this problem for quite some time, and I've been unable to find a solution or even an explanation for it. So sorry if the question is long, but bear with me as I just want to make it 100% clear in the hopes that someone more experienced than me will be able to figure it out.
I'm keeping the C syntax highlight on for all snippets because it makes them a little bit clearer even if not really correct.
What I want to do
I have a C program which uses some functions from a dynamic library (libzip). Here it is boiled down to a minimal reproducible example (it basically does nothing, but it works just fine):
#include <zip.h>
int main(void) {
int err;
zip_t *myzip;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Normally, to compile it, I would simply do:
gcc -c prog.c
gcc -o prog prog.o -lzip
This creates, as expected, an ELF which requires libzip to run:
$ ldd prog
linux-vdso.so.1 (0x00007ffdafb53000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f81eedc7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f81ef780000)
libzip.so.4 => /usr/lib/x86_64-linux-gnu/libzip.so.4 (0x00007f81ef166000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f81eebad000)
(libz is just a dependency of libzip)
What I really want to do though, is to load the library myself using dlopen(). Pretty simple task, no? Well yes, or at least I thought.
To achieve this, I should just need to call dlopen and let the loader do its job:
#include <zip.h>
#include <dlfcn.h>
int main(void) {
void *lib;
int err;
zip_t *myzip;
lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
if (lib == NULL)
return 1;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Of course, since I want to manually load the library myself, I will not link it this time:
# Create prog.o
gcc -c prog.c
# Do a dry-run just to make sure all symbols are resolved
gcc -o /dev/null prog.o -ldl -lzip
# Now recompile only with libdl
gcc -o prog prog.o -ldl -Wl,--unresolved-symbols=ignore-in-object-files
The flag --unresolved-symbols=ignore-in-object-files tells ld to not worry about my prog.o having unresolved symbols at link time (I want to take care of that myself at runtime).
The problem
The above Should Just Work™, and indeed it does seem to... but I have two machines, and being the pedantic nerd I am I just thought "well, better make sure and compile it on both of them".
First machine
x86-64, Linux 4.9, Debian 9, gcc 6.3.0, ld 2.28. Here everything works as expected.
I can clearly see that the symbols are there:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main#GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
===> 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_close
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen#GLIBC_2.2.5 (3)
===> 6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_open
7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
9: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
10: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 25 _edata
11: 0000000000201048 0 NOTYPE GLOBAL DEFAULT 26 _end
12: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
13: 00000000000006a0 0 FUNC GLOBAL DEFAULT 11 _init
14: 0000000000000924 0 FUNC GLOBAL DEFAULT 15 _fini
The PLT entries are also there as expected and look fine:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
00000000000006c0 <.plt>:
6c0: ff 35 42 09 20 00 push QWORD PTR [rip+0x200942] # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
6c6: ff 25 44 09 20 00 jmp QWORD PTR [rip+0x200944] # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
6cc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000006d0 <zip_close#plt>:
6d0: ff 25 42 09 20 00 jmp QWORD PTR [rip+0x200942] # 201018 <zip_close>
6d6: 68 00 00 00 00 push 0x0
6db: e9 e0 ff ff ff jmp 6c0 <.plt>
00000000000006e0 <dlopen#plt>:
6e0: ff 25 3a 09 20 00 jmp QWORD PTR [rip+0x20093a] # 201020 <dlopen#GLIBC_2.2.5>
6e6: 68 01 00 00 00 push 0x1
6eb: e9 d0 ff ff ff jmp 6c0 <.plt>
00000000000006f0 <zip_open#plt>:
6f0: ff 25 32 09 20 00 jmp QWORD PTR [rip+0x200932] # 201028 <zip_open>
6f6: 68 02 00 00 00 push 0x2
6fb: e9 c0 ff ff ff jmp 6c0 <.plt>
And the program runs without any problem:
$ ./prog
$ echo $?
0
Even looking inside it with a debugger I can clearly see the symbols getting correctly resolved like any normal dynamic symbol:
0x55555555479b <main+43> lea rax, [rbp - 0x14]
0x55555555479f <main+47> mov rdx, rax
0x5555555547a2 <main+50> mov esi, 9
0x5555555547a7 <main+55> lea rdi, [rip + 0xc0] <0x7ffff7ffd948>
0x5555555547ae <main+62> call zip_open#plt <0x555555554620>
|
v ### PLT entry:
0x555555554620 <zip_open#plt> jmp qword ptr [rip + 0x200a02] <0x555555755028>
|
v
0x555555554626 <zip_open#plt+6> push 2
0x55555555462b <zip_open#plt+11> jmp 0x5555555545f0
|
v ### PLT stub:
0x5555555545f0 push qword ptr [rip + 0x200a12] <0x555555755008>
0x5555555545f6 jmp qword ptr [rip + 0x200a14] <0x7ffff7def0d0>
|
v ### Symbol gets correctly resolved
0x7ffff7def0d0 <_dl_runtime_resolve_fxsave> push rbx
0x7ffff7def0d1 <_dl_runtime_resolve_fxsave+1> mov rbx, rsp
0x7ffff7def0d4 <_dl_runtime_resolve_fxsave+4> and rsp, 0xfffffffffffffff0
0x7ffff7def0d8 <_dl_runtime_resolve_fxsave+8> sub rsp, 0x240
Second machine
x86-64, Linux 4.15, Ubuntu 18.04, gcc 7.4, ld 2.30. Here, something really strange is going on.
Compilation doesn't yield any warning or error, but I do not see the symbols:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main#GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen#GLIBC_2.2.5 (3)
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
The PLT entries are there, but they are filled with zeroes, and aren't even recognized by objdump:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
0000000000000560 <.plt>:
560: ff 35 4a 0a 20 00 push QWORD PTR [rip+0x200a4a] # 200fb0 <_GLOBAL_OFFSET_TABLE_+0x8>
566: ff 25 4c 0a 20 00 jmp QWORD PTR [rip+0x200a4c] # 200fb8 <_GLOBAL_OFFSET_TABLE_+0x10>
56c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled of 0x0
# zip_close#plt should be here instead...
0000000000000580 <dlopen#plt>:
580: ff 25 42 0a 20 00 jmp QWORD PTR [rip+0x200a42] # 200fc8 <dlopen#GLIBC_2.2.5>
586: 68 00 00 00 00 push 0x0
58b: e9 d0 ff ff ff jmp 560 <.plt>
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled of 0x0
# zip_open#plt should be here instead...
When the program is run, dlopen() works fine and loads libzip into memory, but then when zip_open() gets called, it just generates a segmentation fault:
$ ./prog
Segmentation fault (code dumped)
Taking a look with a debugger, the issue is even more obvious (in case it wasn't already obvious enough). The PLT entries filled with zeroes just end up decoding to a bunch of add instructions dereferencing rax, which contains an invalid address and makes the program segfault and die:
0x5555555546e5 <main+43> lea rax, [rbp - 0x14]
0x5555555546e9 <main+47> mov rdx, rax
0x5555555546ec <main+50> mov esi, 9
0x5555555546f1 <main+55> lea rdi, [rip + 0xc6]
0x5555555546f8 <main+62> call dlopen#plt+16 <0x555555554590>
|
v ### Broken PLT enrty (all 0x0, will cause a segfault):
0x555555554590 <dlopen#plt+16> add byte ptr [rax], al
0x555555554592 <dlopen#plt+18> add byte ptr [rax], al
0x555555554594 <dlopen#plt+20> add byte ptr [rax], al
0x555555554596 <dlopen#plt+22> add byte ptr [rax], al
0x555555554598 <dlopen#plt+24> add byte ptr [rax], al
0x55555555459a <dlopen#plt+26> add byte ptr [rax], al
0x55555555459c <dlopen#plt+28> add byte ptr [rax], al
0x55555555459e <dlopen#plt+30> add byte ptr [rax], al
### Next PLT entry...
0x5555555545a0 <__cxa_finalize#plt> jmp qword ptr [rip + 0x200a52] <0x7ffff7823520>
|
v
0x7ffff7823520 <__cxa_finalize> push r15
0x7ffff7823522 <__cxa_finalize+2> push r14
Questions
So, first of all... why is this happening?
I thought that this was supposed to work, isn't it? If not, why? And why only on one of the two machines?
But most importantly: how can I fix this?
For question 3 I want to emphasize that the whole point of this is that I want to load the library myself, without linking it, so please refrain from just commenting that this is bad practice, or whatever else.
The above Should Just Work™, and indeed it does seem to...
No, it should not, and if it appears to, that's more of an accident. In general, using --unresolved-symbols=... is a really bad idea™, and will almost never do what you want.
The solution is trivial: you just need to look up zip_open and zip_close, like so:
int main(void) {
void *lib;
zip_t *p_open(const char *, int, int *);
void *p_close(zip_t*);
int err;
zip_t *myzip;
lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
if (lib == NULL)
return 1;
p_open = (zip_t(*)(const char *, int, int *))dlsym(lib, "zip_open");
if (p_open == NULL)
return 1;
p_close = (void(*)(zip_t*))dlsym(lib, "zip_close");
if (p_close == NULL)
return 1;
myzip = p_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
p_close(myzip);
return 0;
}
To add to EmployedRussian's answer, you can achieve what you need with the help of Implib.so tool. It would generate stubs for all library symbols (e.g. zip_open) which would call dlopen/dlsym internally and forward calls from your program to shared library:
$ gcc -c prog.c
$ implib-gen.py path/to/libzip.so
$ gcc -o prog prog.o libzip.tramp.S libzip.init.c -ldl
(note that you no longer need fancy linker flags and linker dry runs).
As a side note what you are trying to do is called delayed loading and is a standard feature of Windows DLLS.

Builtins in Clang not so builtin?

If I have the following in strlen.c:
int call_strlen(char *s) {
return __builtin_strlen(s);
}
And then compile it with both gcc and clang like this:
gcc -c -o strlen-gcc.o strlen.c
clang -c -o strlen-clang.o strlen.c
I am surprised to see that strlen-clang.o contains a reference to "strlen", whereas gcc has expectedly inlined the function and has no such reference. (see objdumps below). Is this a bug in clang? I have tested it in several versions of the clang compiler, including 3.8.
Edit: the reason this is important for me is that I'm linking with -nostdlib, and the clang-compiled version gives me a link error that strlen is not found.
Clang
#> objdump -d strlen-clang.o
strlen-clang.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <call_strlen>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 48 8b 7d f8 mov -0x8(%rbp),%rdi
10: e8 00 00 00 00 callq 15 <call_strlen+0x15>
15: 89 c1 mov %eax,%ecx
17: 89 c8 mov %ecx,%eax
19: 48 83 c4 10 add $0x10,%rsp
1d: 5d pop %rbp
1e: c3 retq
#> objdump -t strlen-clang.o
strlen-clang.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 strlen.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 000000000000001f call_strlen
0000000000000000 *UND* 0000000000000000 strlen
GCC
#> objdump -d strlen-gcc.o
strlen-gcc.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <call_strlen>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 89 7d f8 mov %rdi,-0x8(%rbp)
8: 48 8b 45 f8 mov -0x8(%rbp),%rax
c: 48 c7 c1 ff ff ff ff mov $0xffffffffffffffff,%rcx
13: 48 89 c2 mov %rax,%rdx
16: b8 00 00 00 00 mov $0x0,%eax
1b: 48 89 d7 mov %rdx,%rdi
1e: f2 ae repnz scas %es:(%rdi),%al
20: 48 89 c8 mov %rcx,%rax
23: 48 f7 d0 not %rax
26: 48 83 e8 01 sub $0x1,%rax
2a: 5d pop %rbp
2b: c3 retq
#> objdump -t strlen-gcc.o
strlen-gcc.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 strlen.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text 000000000000002c call_strlen
Just to get optimisation out of the way:
With clang -O0:
t.o:
(__TEXT,__text) section
_call_strlen:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 subq $0x10, %rsp
0000000000000008 movq %rdi, -0x8(%rbp)
000000000000000c movq -0x8(%rbp), %rdi
0000000000000010 callq _strlen
0000000000000015 movl %eax, %ecx
0000000000000017 movl %ecx, %eax
0000000000000019 addq $0x10, %rsp
000000000000001d popq %rbp
000000000000001e retq
With clang -O3
t.o:
(__TEXT,__text) section
_call_strlen:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 popq %rbp
0000000000000005 jmp _strlen
Now, onto the problem:
The clang documentation claims that clang support all GCC-supported builtins.
However, the GCC documentation seems to treat builtin functions and the names of their library equivalents as synonyms:
Both forms have the same type (including prototype), the same address (when their address is taken), and the same meaning as the C library functions [...].
Also it does not guarantee a builtin function with a library equivalent (as is the case with strlen) to indeed get optimised:
Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
Further, the clang internals manual mentions __builtin_strlen only once:
__builtin_strlen and strlen: These are constant folded as integer constant expressions if the argument is a string literal.
Other than that they seem to make no promises.
Since in your case the argument to __builtin_strlen is not a string literal, and since the GCC documentation allows for calls to builtin functions to be converted to library function calls, clang's behaviour seems perfectly valid.
A "patch for review" on the clang developers mailing list also says:
[...] It will still fall back to runtime use of library strlen, if
compile-time evaluation is not possible/required [...].
That was in 2012, but the text indicates that at least back then, only compile-time evaluation was supported.
Now, I see two options:
If you only need to compile the program yourself and then use and/or distribute it, I suggest you simply use gcc.
If you need others to be able to compile your code under both gcc and clang, I suggest adding a C library as a dependency for static linking.
I strongly advise against rolling your own implementations of standard library functions, even for seemingly simple cases (if you disagree, try writing up your own strlen implementation, then compare it to the glibc one).
Neither GCC nor Clang promises to inline this builtin. You quoted some GCC documentation seeming to make such a promise:
...GCC built-in functions are always expanded inline...
but this is a sentence fragment pulled out of context. The complete sentence is
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained.
__builtin_strlen has the library equivalent strlen, so this sentence makes no promises about whether it gets inlined.

What c code would compile to something like `call *%eax`?

I'm working with c and assembly and I've seen call *%eax in a few spots. I wanted to write a small c program that would compile to something like this, but I'm stuck.
I was thinking about just writing up some assembly code like in this question: x86 assembly instruction: call *Reg only using AT&T syntax in my case to get a small example with the call in it. However, that wouldn't solve my burning question of what kind of c code compiles to that?
I understand that it is a call to the address that eax is pointing to.
Documentation: http://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html#Local-Reg-Vars
Try this
#include <stdio.h>
typedef void (*FuncPtr)(void);
void _Func(void){
printf("Hello");
}
int main(int argc, char *argv[]){
register FuncPtr func asm ("eax") = _Func;
func();
return 0;
}
And its relative assembly:
.file "functorTest.c"
.section .rdata,"dr"
LC0:
.ascii "Hello\0"
.text
.globl __Func
.def __Func; .scl 2; .type 32; .endef
__Func:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $LC0, (%esp)
call _printf
leave
ret
.def ___main; .scl 2; .type 32; .endef
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call ___main
movl $__Func, %eax
call *%eax ; see?
movl $0, %eax
movl %ebp, %esp
popl %ebp
ret
.def _printf; .scl 2; .type 32; .endef
Just to add another example with local variables instead.
#include <stdio.h>
// A normal function with an int parameter
// and void return type
void fun(int a)
{
printf("Value of a is %d\n", a);
}
int main()
{
// fun_ptr is a pointer to function fun()
void (*fun_ptr)(int) = &fun;
// Invoking fun() using fun_ptr
(*fun_ptr)(10);
return 0;
}
The Assembly (There are some extra lines, for this was compiled in x86 on an x64 computer. I can provide the x64 if requested by new viewers)
000011c7 <main>:
11c7: 8d 4c 24 04 lea 0x4(%esp),%ecx
11cb: 83 e4 f0 and $0xfffffff0,%esp
11ce: ff 71 fc pushl -0x4(%ecx)
11d1: 55 push %ebp
11d2: 89 e5 mov %esp,%ebp
11d4: 51 push %ecx
11d5: 83 ec 14 sub $0x14,%esp
11d8: e8 28 00 00 00 call 1205 <__x86.get_pc_thunk.ax>
11dd: 05 23 2e 00 00 add $0x2e23,%eax
11e2: 8d 80 99 d1 ff ff lea -0x2e67(%eax),%eax
11e8: 89 45 f4 mov %eax,-0xc(%ebp)
11eb: 83 ec 0c sub $0xc,%esp
11ee: 6a 0a push $0xa
11f0: 8b 45 f4 mov -0xc(%ebp),%eax
11f3: ff d0 call *%eax
11f5: 83 c4 10 add $0x10,%esp
11f8: b8 00 00 00 00 mov $0x0,%eax
11fd: 8b 4d fc mov -0x4(%ebp),%ecx
1200: c9 leave
1201: 8d 61 fc lea -0x4(%ecx),%esp
1204: c3 ret

Resources