I have the following library:
shared4.c:
int get_another_int(void){
return 10;
}
And the binary:
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
int get_another_int(void);
int main(void){
void *handle = dlopen("/home/me/c/build/libshar4.so", RTLD_GLOBAL | RTLD_NOW);
if(!handle){
exit(EXIT_FAILURE);
}
printf("handle = %p\n", handle);
printf("another_int = %d\n", get_another_int());
}
I did not link the binary with the library and added an ld option to ignore the symbol not found error:
-Wl,-z,lazy -Wl,--unresolved-symbols=ignore-all
With such options the binary compiled and linked "fine".
The plt section looks as follows:
$ objdump -d -j .plt ./build/bin_shared
./build/bin_shared: file format elf64-x86-64
Disassembly of section .plt:
00000000000005e0 <.plt>:
5e0: ff 35 22 0a 20 00 pushq 0x200a22(%rip) # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
5e6: ff 25 24 0a 20 00 jmpq *0x200a24(%rip) # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
5ec: 0f 1f 40 00 nopl 0x0(%rax)
...
0000000000000600 <dlopen@plt>:
600: ff 25 1a 0a 20 00 jmpq *0x200a1a(%rip) # 201020 <dlopen@GLIBC_2.2.5>
606: 68 00 00 00 00 pushq $0x0
60b: e9 d0 ff ff ff jmpq 5e0 <.plt>
0000000000000610 <__printf_chk@plt>:
610: ff 25 12 0a 20 00 jmpq *0x200a12(%rip) # 201028 <__printf_chk@GLIBC_2.3.4>
616: 68 01 00 00 00 pushq $0x1
61b: e9 c0 ff ff ff jmpq 5e0 <.plt>
0000000000000620 <exit@plt>:
620: ff 25 0a 0a 20 00 jmpq *0x200a0a(%rip) # 201030 <exit@GLIBC_2.2.5>
626: 68 02 00 00 00 pushq $0x2
62b: e9 b0 ff ff ff jmpq 5e0 <.plt>
I examined the objdump and noticed the following fragment of main:
66b: e8 a0 ff ff ff callq 610 <__printf_chk@plt>
670: e8 7b ff ff ff callq 5f0 <.plt+0x10>
But the application failed to start with the following error:
$ ./build/bin_shared
./build/bin_shared: error while loading shared libraries: unexpected PLT reloc type 0x00
I looked at all relocation types:
$ objdump -R ./build/bin_shared
./build/bin_shared: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
0000000000200dd8 R_X86_64_RELATIVE *ABS*+0x00000000000007a0
0000000000200de0 R_X86_64_RELATIVE *ABS*+0x0000000000000760
0000000000201040 R_X86_64_RELATIVE *ABS*+0x0000000000201040
0000000000200fd8 R_X86_64_GLOB_DAT _ITM_deregisterTMCloneTable
0000000000200fe0 R_X86_64_GLOB_DAT __libc_start_main@GLIBC_2.2.5
0000000000200fe8 R_X86_64_GLOB_DAT __gmon_start__
0000000000200ff0 R_X86_64_GLOB_DAT _ITM_registerTMCloneTable
0000000000200ff8 R_X86_64_GLOB_DAT __cxa_finalize@GLIBC_2.2.5
0000000000201020 R_X86_64_JUMP_SLOT dlopen@GLIBC_2.2.5
0000000000201028 R_X86_64_JUMP_SLOT __printf_chk@GLIBC_2.3.4
0000000000201030 R_X86_64_JUMP_SLOT exit@GLIBC_2.2.5
0000000000000000 R_X86_64_NONE *ABS* //<---- This relocation
Is there a way to work around this, so that symbols from a library that is loaded with dlopen (but was not given to ld at link time) are available to the dynamic linker and the application shown above runs as expected?
Is there a way to work around this?
The linker must know that the symbol will be coming from some shared library, and needs to know what kind of symbol this is, in order to properly build a dynamic symbol reference for that symbol in the main executable.
Since you don't want to (or can't) provide libshar4.so at link time, your other option is to pretend that some other library that you do link against provides this symbol.
For example, since you use dlopen, you could create a dlopen_stub.so, which provides both dlopen and get_another_int (the actual implementation of either function in the stub can be empty), set the SONAME of this stub library to libdl.so.2 (or whatever SONAME your real libdl.so uses), and link your binary with that stub (instead of linking with -ldl).
At runtime, provided that LD_BIND_NOW is not in effect, the binary will not attempt to resolve get_another_int until after you've loaded libshar4.so, and by that time the symbol will be available.
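For comparison, the conventional way to call a function from a library that was not present at link time is to look it up with dlsym instead of relying on lazy PLT binding. This is a different technique from the stub-library approach described above and is shown only as a minimal sketch. The helper name call_int_function is hypothetical; the library path and symbol name in the usage note come from the question.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load lib_path, look up a function of type
   int (*)(void) named symbol, call it and store the result in *out.
   Returns 0 on success and -1 on any failure. */
int call_int_function(const char *lib_path, const char *symbol, int *out)
{
    void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before calling dlsym */

    /* Idiom from the POSIX dlsym documentation for turning the returned
       void * into a function pointer without a direct cast. */
    int (*fn)(void);
    *(void **)(&fn) = dlsym(handle, symbol);

    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn();
    dlclose(handle);
    return 0;
}
```

With the library from the question, call_int_function("/home/me/c/build/libshar4.so", "get_another_int", &value) should store 10 in value, assuming the library builds and loads as described. On older glibc versions the program has to be linked with -ldl.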
I wrote nothing.c, which is just one line:
int main(){}
Then I compiled it with the command gcc nothing.c -o nothing
Here's what I get from the command readelf -x .text nothing
Hex dump of section '.text':
0x00001040 f30f1efa 31ed4989 d15e4889 e24883e4 ....1.I..^H..H..
0x00001050 f050544c 8d055601 0000488d 0ddf0000 .PTL..V...H.....
0x00001060 00488d3d c1000000 ff15722f 0000f490 .H.=......r/....
0x00001070 488d3d99 2f000048 8d05922f 00004839 H.=./..H.../..H9
0x00001080 f8741548 8b054e2f 00004885 c07409ff .t.H..N/..H..t..
0x00001090 e00f1f80 00000000 c30f1f80 00000000 ................
0x000010a0 488d3d69 2f000048 8d35622f 00004829 H.=i/..H.5b/..H)
0x000010b0 fe4889f0 48c1ee3f 48c1f803 4801c648 .H..H..?H...H..H
0x000010c0 d1fe7414 488b0525 2f000048 85c07408 ..t.H..%/..H..t.
0x000010d0 ffe0660f 1f440000 c30f1f80 00000000 ..f..D..........
0x000010e0 f30f1efa 803d252f 00000075 2b554883 .....=%/...u+UH.
0x000010f0 3d022f00 00004889 e5740c48 8b3d062f =./...H..t.H.=./
0x00001100 0000e829 ffffffe8 64ffffff c605fd2e ...)....d.......
0x00001110 0000015d c30f1f00 c30f1f80 00000000 ...]............
0x00001120 f30f1efa e977ffff fff30f1e fa554889 .....w.......UH.
0x00001130 e5b80000 00005dc3 0f1f8400 00000000 ......].........
0x00001140 f30f1efa 41574c8d 3da32c00 00415649 ....AWL.=.,..AVI
0x00001150 89d64155 4989f541 544189fc 55488d2d ..AUI..ATA..UH.-
0x00001160 942c0000 534c29fd 4883ec08 e88ffeff .,..SL).H.......
0x00001170 ff48c1fd 03741f31 db0f1f80 00000000 .H...t.1........
0x00001180 4c89f24c 89ee4489 e741ff14 df4883c3 L..L..D..A...H..
0x00001190 014839dd 75ea4883 c4085b5d 415c415d .H9.u.H...[]A\A]
0x000011a0 415e415f c366662e 0f1f8400 00000000 A^A_.ff.........
0x000011b0 f30f1efa c3 .....
So what does it do?
So what does it do?
You can see what it does:
objdump -d nothing
Disassembly of section .text:
0000000000001040 <_start>:
1040: 31 ed xor %ebp,%ebp
1042: 49 89 d1 mov %rdx,%r9
1045: 5e pop %rsi
1046: 48 89 e2 mov %rsp,%rdx
1049: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
104d: 50 push %rax
104e: 54 push %rsp
104f: 4c 8d 05 3a 01 00 00 lea 0x13a(%rip),%r8 # 1190 <__libc_csu_fini>
1056: 48 8d 0d d3 00 00 00 lea 0xd3(%rip),%rcx # 1130 <__libc_csu_init>
105d: 48 8d 3d c1 00 00 00 lea 0xc1(%rip),%rdi # 1125 <main>
1064: ff 15 76 2f 00 00 call *0x2f76(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
106a: f4 hlt
106b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
0000000000001070 <deregister_tm_clones>:
1070: 48 8d 3d b1 2f 00 00 lea 0x2fb1(%rip),%rdi # 4028 <__TMC_END__>
1077: 48 8d 05 aa 2f 00 00 lea 0x2faa(%rip),%rax # 4028 <__TMC_END__>
... etc.
The compiler and linker add code to your program beyond what you wrote. What gets added depends heavily on the operating system and the toolchain you are using. On Linux with GCC, most of the .text section shown above typically comes from the C runtime startup files that are linked in by default (for example crt1.o, which provides _start).
For example, on macOS the compiler also emits so-called 'unwind info', which describes how to unwind the stack when an exception is thrown.
To find out what ended up in your .text section besides the empty main, generate a linker map file, which lists each input object and the sections it contributed. The next question is then why each piece is there.
To generate a map file with GNU ld, use the following command (GNU ld spells the option -Map; -map is the spelling used by the macOS linker):
gcc -Wl,-Map,nothing.map nothing.c -o nothing
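As a small illustration that code outside main really does run, the sketch below uses the GCC/Clang constructor attribute (a compiler extension, not standard C). Functions marked this way are invoked during startup, before main is entered; exactly which piece of startup code makes the call varies with the libc version. The names run_before_main and startup_code_ran are made up for this example.

```c
/* Set by the constructor below, before main() starts. */
static int initialized_before_main = 0;

/* GCC/Clang extension: the startup path calls this function before main(). */
__attribute__((constructor))
static void run_before_main(void)
{
    initialized_before_main = 1;
}

/* Returns 1 if the constructor has already run, 0 otherwise. */
int startup_code_ran(void)
{
    return initialized_before_main;
}
```

Calling startup_code_ran() from main should return 1 on toolchains that support the attribute, even though main itself never calls run_before_main.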
I'm trying to develop some very low-level x86 code following this document. I wrote the following C program:
void main()
{
char* video_memory = (char*) 0xb8000;
*video_memory = 'X';
}
I compile and link it like so:
gcc -m32 -fno-pie -c main.c -o main.o
ld -m elf_i386 -o main.bin -Ttext 513 --oformat binary main.o
This produces a binary called main.bin which is over a hundred megabytes. I disassembled that binary and it's basically my code (ten or so lines), then a hundred meg of zeros, and then some kind of footer.
The extra bytes are all unnecessary, because I used head to snip off the ones that weren't my code and it still ran fine.
I'm using 32-bit flags because my test machine is an old 32-bit laptop, but you can get similar (but less extreme) behavior in 64-bit. This script:
gcc -fno-pie -c main.c -o main.o
ld -o main.bin -Ttext 513 --oformat binary main.o
produces a main.bin of over 4 MB. Again the pattern is the same: my code, 4 meg of zeros, and then a footer. A little bit of noise in between my code and the zeros. Here's the disassembled 4MB file:
0: f3 0f 1e fa endbr64
4: 55 push %ebp
5: 48 dec %eax
6: 89 e5 mov %esp,%ebp
8: 48 dec %eax
9: c7 45 f8 00 80 0b 00 movl $0xb8000,-0x8(%ebp)
10: 48 dec %eax
11: 8b 45 f8 mov -0x8(%ebp),%eax
14: c6 00 58 movb $0x58,(%eax)
17: 90 nop
18: 5d pop %ebp
19: c3 ret
...
aea: 00 00 add %al,(%eax)
aec: 00 14 00 add %dl,(%eax,%eax,1)
aef: 00 00 add %al,(%eax)
af1: 00 00 add %al,(%eax)
af3: 00 00 add %al,(%eax)
af5: 01 7a 52 add %edi,0x52(%edx)
af8: 00 01 add %al,(%ecx)
afa: 78 10 js 0xb0c
afc: 01 1b add %ebx,(%ebx)
afe: 0c 07 or $0x7,%al
b00: 08 90 01 00 00 1c or %dl,0x1c000001(%eax)
b06: 00 00 add %al,(%eax)
b08: 00 1c 00 add %bl,(%eax,%eax,1)
b0b: 00 00 add %al,(%eax)
b0d: f3 f4 repz hlt
b0f: ff (bad)
b10: ff 1a lcall *(%edx)
b12: 00 00 add %al,(%eax)
b14: 00 00 add %al,(%eax)
b16: 45 inc %ebp
b17: 0e push %cs
b18: 10 86 02 43 0d 06 adc %al,0x60d4302(%esi)
b1e: 51 push %ecx
b1f: 0c 07 or $0x7,%al
b21: 08 00 or %al,(%eax)
...
3ffaeb: 00 00 add %al,(%eax)
3ffaed: 04 00 add $0x0,%al
3ffaef: 00 00 add %al,(%eax)
3ffaf1: 10 00 adc %al,(%eax)
3ffaf3: 00 00 add %al,(%eax)
3ffaf5: 05 00 00 00 47 add $0x47000000,%eax
3ffafa: 4e dec %esi
3ffafb: 55 push %ebp
3ffafc: 00 02 add %al,(%edx)
3ffafe: 00 00 add %al,(%eax)
3ffb00: c0 04 00 00 rolb $0x0,(%eax,%eax,1)
3ffb04: 00 03 add %al,(%ebx)
3ffb06: 00 00 add %al,(%eax)
3ffb08: 00 00 add %al,(%eax)
3ffb0a: 00 00 add %al,(%eax)
...
The giant binary file works, but it's ugly and I'd like to understand what's going on.
I'm doing the compilation/linking on Ubuntu 20.04 on a 64-bit machine. Tool versions:
gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)
GNU ld (GNU Binutils for Ubuntu) 2.34
I assumed a procedure call within the same object module wouldn't require relocation at the link stage. For example, take the following code:
void callee()
{
printf("Should I be relocated\n");
}
void caller()
{
callee();
}
After compiling and assembling, I got the following:
Relocation section '.rel.text' at offset 0x438 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
00000009 00000501 R_386_32 00000000 .rodata
0000000e 00000a02 R_386_PC32 00000000 puts
0000001b 00000902 R_386_PC32 00000000 callee
And the result of disassembly:
Disassembly of section .text:
00000000 <callee>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: c7 04 24 00 00 00 00 movl $0x0,(%esp)
d: e8 fc ff ff ff call e <callee+0xe>
12: c9 leave
13: c3 ret
00000014 <caller>:
14: 55 push %ebp
15: 89 e5 mov %esp,%ebp
17: 83 ec 08 sub $0x8,%esp
1a: e8 fc ff ff ff call 1b <caller+0x7>
1f: c9 leave
20: c3 ret
Why does the procedure call within the same object module (1a: e8 fc ff ff ff call 1b) need a relocation? Does it depend on my toolchain? Can the PC-relative offset between the caller and the callee ever change when both are in the same object module? If not, why not just fix the code at 0x1a to "e8 e1 ff ff ff"?
Relocation tables have to be stored inside each module, for that actual module, to allow load-time relocation of shared libs.
Since the dynamic linker in (most) Unix distributions can override functions in shared libraries, this means that the function can be relocated, even if the call happens inside a single module. Tools like Valgrind benefit from such features for instrumentation and leak detection.
So, as noted in comments, if you mark the function static, then the compiler can skip this part altogether and hardcode the jump.
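The difference between the two kinds of callee can be sketched as follows. This is an illustrative example, not the asker's code: the function names are invented, and whether a relocation is actually emitted for the external call depends on the assembler, the target and the flags used.

```c
/* External linkage: the symbol is visible to other modules and, in a
   shared object, may be overridden at load time. A toolchain may
   therefore leave a relocation for calls to it, even from the same file. */
int callee_external(void)
{
    return 1;
}

/* Internal linkage: nothing outside this translation unit can replace
   the function, so the call offset can be fixed at assembly time (or
   the call can be inlined away entirely). */
static int callee_internal(void)
{
    return 2;
}

int caller(void)
{
    return callee_external() + callee_internal();
}
```

Compiling this with gcc -m32 -c and inspecting the object with readelf -r is one way to check which of the two calls still carries a relocation on a given toolchain.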
With the following CFLAGS:
-Wall -Werror -Wextra -pedantic -std=c99 -O3 -nostartfiles -nodefaultlibs
my __start entry point (notice -nostartfiles) is successfully compiled and put into an output executable.
However, when I add the -flto flag, both the entry point and the functions called only by it are optimized out. Moreover, the subsequent link step completes with neither errors nor warnings, but with an incorrect (random) entry point.
My question is how to prevent the __start function from being optimized out. I am also curious why the linker “forgets” about the external dependency on my entry point when the default one is absent.
My GCC version is gcc (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.2.
UPD:
Source code (fixed with the help of @FUZxxl, who pointed out the prepended underscores in the Windows ABI):
#include <windows.h>
void _start()
{
MessageBox(NULL, TEXT("Hello world."), TEXT(""), MB_OK);
ExitProcess(0);
}
Assembly output emitted by a linker (-S):
Non--flto version:
Disassembly of section .text:
00401000 <__start>:
401000: 83 ec 1c sub $0x1c,%esp
401003: c7 44 24 0c 00 00 00 movl $0x0,0xc(%esp)
40100a: 00
40100b: c7 44 24 08 00 20 40 movl $0x402000,0x8(%esp)
401012: 00
401013: c7 44 24 04 0d 20 40 movl $0x40200d,0x4(%esp)
40101a: 00
40101b: c7 04 24 00 00 00 00 movl $0x0,(%esp)
401022: ff 15 54 40 40 00 call *0x404054
401028: 83 ec 10 sub $0x10,%esp
40102b: c7 04 24 00 00 00 00 movl $0x0,(%esp)
401032: ff 15 4c 40 40 00 call *0x40404c
401038: 90 nop
401039: 90 nop
40103a: 90 nop
40103b: 90 nop
40103c: 90 nop
40103d: 90 nop
40103e: 90 nop
40103f: 90 nop
00401040 <__CTOR_LIST__>:
401040: ff (bad)
401041: ff (bad)
401042: ff (bad)
401043: ff 00 incl (%eax)
401045: 00 00 add %al,(%eax)
...
00401048 <__DTOR_LIST__>:
401048: ff (bad)
401049: ff (bad)
40104a: ff (bad)
40104b: ff 00 incl (%eax)
40104d: 00 00 add %al,(%eax)
-flto version (notice the lack of _start here, just a bunch of thunks for API entries):
Disassembly of section .text:
00401000 <_ExitProcess@4>:
401000: ff 25 4c 30 40 00 jmp *0x40304c
401006: 90 nop
401007: 90 nop
00401008 <_MessageBoxA@16>:
401008: ff 25 54 30 40 00 jmp *0x403054
40100e: 90 nop
40100f: 90 nop
00401010 <__CTOR_LIST__>:
401010: ff (bad)
401011: ff (bad)
401012: ff (bad)
401013: ff 00 incl (%eax)
401015: 00 00 add %al,(%eax)
...
00401018 <__DTOR_LIST__>:
401018: ff (bad)
401019: ff (bad)
40101a: ff (bad)
40101b: ff 00 incl (%eax)
40101d: 00 00 add %al,(%eax)
With all the exotic/embedded-related options you've set, you have to ensure that your symbol is seen as the entry point and is not garbage-collected by linker optimizations (--gc-sections does the same kind of thing: it collects "useless" sections).
You can end up with a completely empty output file if no section is reachable.
To tell the linker that you are using that symbol as the entry point (and to keep the linker from discarding it), just add the
-Wl,-e__start
option to your link command (or write a linker spec file where you declare your symbol, but the command line option is easier)
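A complementary measure on the source side, separate from the linker option above, is to mark the entry function with the GCC/Clang used attribute, which asks the compiler to emit the function even when nothing appears to reference it. The sketch below is a portable stand-in rather than the Windows code from the question; my_entry and entry_call_count are hypothetical names, and whether this alone is enough under -flto on a given MinGW version is an assumption to verify.

```c
/* Counts how often the entry function has run (only for demonstration). */
static int entry_calls = 0;

/* `used` is a GCC/Clang extension: code for this function must be
   emitted even if the compiler sees no reference to it. */
__attribute__((used))
void my_entry(void)
{
    entry_calls++;
}

/* Lets a caller observe that my_entry was kept and executed. */
int entry_call_count(void)
{
    return entry_calls;
}
```

The attribute only affects whether the function is kept; the linker still needs to be told which symbol is the entry point, as described above.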
I found this text in the book Professional Assembly Language by Richard Blum.
The compiling step converts the text programming language statements
into the instruction codes required to carry out the application
function. Each of the HLL lines of code are matched up with one or
more instruction codes pertaining to the specific processor on which
the application will run. For example, the simple HLL code
int main()
{
int i = 1;
exit(0);
}
is compiled into the following IA-32 instruction codes:
55
89 E5
83 EC 08
C7 45 FC 01 00 00 00
83 EC 0C
6A 00
E8 D1 FE FF FF
But when I try this program myself, I cannot reproduce these results.
First some details about my system and compiler.
$ cat /etc/debian_version
8.3
$ uname -a
Linux debian1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u2 (2016-01-02) x86_64 GNU/Linux
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ dpkg -l gcc-multilib
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-=========================-=========================-========================================================================================
ii gcc-multilib 4:4.9.2-2 amd64 GNU C compiler (multilib files)
Here is the program I wrote.
$ cat foo.c
#include <stdlib.h>
int main()
{
int i = 1;
exit(0);
}
Here are the results I get after compiling only.
$ gcc -m32 -c foo.c
$ objdump -d foo.o
foo.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 14 sub $0x14,%esp
11: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%ebp)
18: 83 ec 0c sub $0xc,%esp
1b: 6a 00 push $0x0
1d: e8 fc ff ff ff call 1e <main+0x1e>
Here are the results I get after compiling and linking.
$ gcc -m32 foo.c
$ objdump -d a.out | grep -A15 "<main>"
080483fb <main>:
80483fb: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483ff: 83 e4 f0 and $0xfffffff0,%esp
8048402: ff 71 fc pushl -0x4(%ecx)
8048405: 55 push %ebp
8048406: 89 e5 mov %esp,%ebp
8048408: 51 push %ecx
8048409: 83 ec 14 sub $0x14,%esp
804840c: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%ebp)
8048413: 83 ec 0c sub $0xc,%esp
8048416: 6a 00 push $0x0
8048418: e8 c3 fe ff ff call 80482e0 <exit@plt>
804841d: 66 90 xchg %ax,%ax
804841f: 90 nop
08048420 <__libc_csu_init>:
What can I do to reproduce the results provided by the author in the book?
The extra instructions not in the book are:
80483fb: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483ff: 83 e4 f0 and $0xfffffff0,%esp
8048402: ff 71 fc pushl -0x4(%ecx)
8048408: 51 push %ecx
8048409: 83 ec 14 sub $0x14,%esp
...
804841d: 66 90 xchg %ax,%ax
804841f: 90 nop
The first few lines align the stack to a 16-byte boundary. This improves performance (arguments can't straddle a cache-line boundary) and allows the use of SIMD instructions that only operate on 16-byte-aligned addresses.
The xchg %ax,%ax at the end is a 2-byte NOP, followed by a 1-byte nop. These 3 bytes of padding don't matter because they are unreachable anyway. They are there so that the next function, __libc_csu_init, starts at a suitably aligned address.
As for why the assembly differs, assembly is a programming language and there's usually more than one way to do things. You can't expect a C program to give the same output across compilers, versions of the same compiler or configurations of the same version.
In your specific case, the 16-byte stack alignment is due to -mpreferred-stack-boundary=4 (the value is an exponent, so 4 means 2^4 = 16 bytes) and the 3 bytes of nop padding are due to -falign-functions.
These are in effect by default when you call gcc, either directly or because they are implied by -O2 or similar.
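The alignment arithmetic behind and $0xfffffff0,%esp and -mpreferred-stack-boundary=4 can be written out as a short sketch. The function name align_down is invented for this example; it only mirrors the masking step and does not touch the real stack pointer.

```c
#include <stdint.h>

/* Round addr down to a multiple of 2^boundary_log2 bytes.
   With boundary_log2 = 4 this mirrors `and $0xfffffff0,%esp`. */
uint32_t align_down(uint32_t addr, unsigned boundary_log2)
{
    uint32_t alignment = (uint32_t)1 << boundary_log2; /* 4 -> 16 bytes */
    return addr & ~(alignment - 1);                    /* 16 -> mask 0xfffffff0 */
}
```

For example, align_down(0xbffff3a7, 4) yields 0xbffff3a0: the low four bits are cleared, so the result is a multiple of 16 that is never above the original address, which is what a downward-growing stack needs.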