GCC behavior for unresolved weak functions - c

Consider the simple program below:
__attribute__((weak)) void weakf(void);
int main(int argc, char *argv[])
{
weakf();
}
When compiling this with gcc and running it on a Linux PC, it segfaults. When running it on ARM CM0 (arm-none-eabi-gcc), the linker replaces the undefined symbol with a jump to the following instruction and a nop.
Where is this behavior documented? Are there ways to change it through command line options? I have been through the GCC and LD documentation, and there is no information about that.
If I check the ARM compiler doc however, this is clearly explained.

I was reading some docs and happened to come across a related quote for this. man nm says:
"V"
"v" The symbol is a weak object. When a weak defined symbol is linked with a normal defined symbol, the normal defined symbol is used with no error. When a weak undefined symbol is linked and
the symbol is not defined, the value of the weak symbol becomes zero with no error. On some systems, uppercase indicates that a default value has been specified.
"W"
"w" The symbol is a weak symbol that has not been specifically tagged as a weak object symbol. When a weak defined symbol is linked with a normal defined symbol, the normal defined symbol is
used with no error. When a weak undefined symbol is linked and the symbol is not defined, the value of the symbol is determined in a system-specific manner without error. On some systems,
uppercase indicates that a default value has been specified.
nm is part of Binutils, which GCC uses under the hood, so this should be canonical enough.
Then, example on your source file:
main.c
__attribute__((weak)) void weakf(void);
int main(int argc, char *argv[])
{
weakf();
}
we do:
gcc -O0 -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
nm main.out
which contains:
w weakf
and so it is a system-specific value. I can't find where the per-system behavior is defined however. I don't think you can do better than reading Binutils source here.
v would be fixed to 0, but that is used for undefined variables (which are objects); see: How to make weak linking work with GCC?
Then:
gdb -batch -ex 'disassemble/rs main' main.out
gives:
Dump of assembler code for function main:
main.c:
4 {
0x0000000000001135 <+0>: 55 push %rbp
0x0000000000001136 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000001139 <+4>: 48 83 ec 10 sub $0x10,%rsp
0x000000000000113d <+8>: 89 7d fc mov %edi,-0x4(%rbp)
0x0000000000001140 <+11>: 48 89 75 f0 mov %rsi,-0x10(%rbp)
5 weakf();
0x0000000000001144 <+15>: e8 e7 fe ff ff callq 0x1030 <weakf@plt>
0x0000000000001149 <+20>: b8 00 00 00 00 mov $0x0,%eax
6 }
0x000000000000114e <+25>: c9 leaveq
0x000000000000114f <+26>: c3 retq
End of assembler dump.
which means it gets resolved at the PLT.
Then since I don't fully understand PLT, I experimentally verify that it resolves to address 0 and segfaults:
gdb -nh -ex run -ex bt main.out
I'm supposing the same happens on ARM, it must just set it to 0 as well.

On ARM with gcc this code does not work for me (tested on armv7 with gcc Debian 4.6.3-14+rpi1). It looks like the ARM compiler toolchain has a different behavior.
I did not find useful documentation for this behavior. It seems that weakf equals NULL if it is undefined at link time.
So I suggest you test it:
if (weakf == NULL) printf("weakf not found\n");
else weakf();
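The check above can be fleshed out into a complete function. This is a minimal sketch, assuming GCC or Clang on an ELF target where the address of an undefined weak function compares equal to NULL; call_weakf_if_present is a name invented here for illustration and does not appear in the question.

```c
#include <stdio.h>

/* Weak declaration: the link succeeds even if no definition exists. */
__attribute__((weak)) void weakf(void);

/*
 * Hypothetical helper (not from the original post): calls weakf only
 * when the linker found a definition for it. Returns 1 if weakf was
 * called, 0 if the symbol stayed undefined.
 */
int call_weakf_if_present(void)
{
    if (weakf == NULL) {
        printf("weakf not found\n");
        return 0;
    }
    weakf();
    return 1;
}
```

Linking this with an object file that defines weakf makes the function take the second path, with no change to this source.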

Related

How to prevent all symbols from static library to load and why other symbols from same .o file get exported to test while linking static library

Suppose there are three c files, say a.c contains functions xx(), yy() and b.c contains nn(), mm() and c.c contains qq(), rr().
I made a static library stat.a out of a.o, b.o and c.o. If I link stat.a into a test which calls xx(), then symbol yy() also gets exported: nm test has both symbols xx and yy.
I would like to know why the symbols qq and rr do not get exported ?
Is there any method to prevent any other symbols than xx being loaded?
Here is an implementation of your scenario:
a.c
#include <stdio.h>
void xx(void)
{
puts(__func__);
}
void yy(void)
{
puts(__func__);
}
b.c
#include <stdio.h>
void nn(void)
{
puts(__func__);
}
void mm(void)
{
puts(__func__);
}
c.c
#include <stdio.h>
void qq(void)
{
puts(__func__);
}
void rr(void)
{
puts(__func__);
}
test.c
extern void xx(void);
int main(void)
{
xx();
return 0;
}
Compile all the *.c files to *.o files:
$ gcc -Wall -c a.c b.c c.c test.c
Make a static library stat.a, containing a.o, b.o, c.o:
$ ar rcs stat.a a.o b.o c.o
Link program test, inputting test.o and stat.a:
$ gcc -o test test.o stat.a
Run:
$ ./test
xx
Let's see the symbol tables of the object files in stat.a:
$ nm stat.a
a.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T xx
0000000000000013 T yy
b.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
0000000000000013 T mm
0000000000000000 T nn
U puts
c.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T qq
0000000000000013 T rr
The definitions (T) of xx, yy are in member stat.a(a.o). Definitions of nn, mm
are in stat.a(b.o). Definitions of qq, rr are in stat.a(c.o).
Let's see which of those symbols are also defined in the symbol table of the program test:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
000000000000065d T yy
xx, which is called in the program, is defined. yy, which is not called, is also
defined. nn, mm, qq and rr, none of which are called, are all absent.
That's what you've observed.
I would like to know why the symbols qq and rr do not get exported?
What is a static library, such as stat.a, and what is its special role in a linkage?
It is an ar archive that conventionally - but not necessarily - contains nothing
but object files. You can offer such an archive to the linker from which to select the
object files it needs, if any, to carry on the linkage. The linker needs those object
files in the archive that provide definitions for symbols that have been
referenced, but not yet defined, in input files it has already linked. The
linker extracts the needed object files from the archive and inputs them to the
linkage, exactly as if they were individually named input files and the static library
was not mentioned at all.
So what the linker does with an input static library is different from what it does
with an input object file. Any input object file is linked into the output file unconditionally
(whether it is needed or not).
In this light, let's redo the linkage of test with some diagnostics (-trace) to show what
files are actually linked:
$ gcc -o test test.o stat.a -Wl,--trace
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
Apart from all the boiler-plate files for a C program linkage that gcc adds by
default, the only files of ours in the linkage are the two object files:
test.o
(stat.a)a.o
The linkage:
$ gcc -o test test.o stat.a
is exactly the same as the linkage:
$ gcc -o test test.o a.o
Let's think that through.
test.o was the first linker input. This object file was linked unconditionally into the program.
test.o contains a reference (specifically, a function call) to xx but no definition of the function xx.
So the linker now needs to find a definition of xx to complete the linkage.
The next linker input is the static library stat.a.
The linker searches stat.a for an object file that contains a definition of xx.
It finds a.o. It extracts a.o from the archive and links it into the program.
There are no other unresolved symbol references in the linkage for which the
linker can find definitions in stat.a(b.o) or stat.a(c.o). So neither of those
object files is extracted and linked.
By extracting and linking (just) stat.a(a.o) the linker has got a definition
of xx that it needed to resolve the function call in test.o. But a.o also contains
the definition of yy. So that definition is also linked into the program.
nn, mm, qq and rr are not defined in the program because none of them
are defined in the object files that were linked into the program.
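The selection rule walked through above can be sketched as a toy model in C. This is only an illustration of the rule, not how ld is implemented: struct object, select_members and the helper names are invented here, and the model simplifies by treating every reference as still undefined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8

/* Toy model of an object file: only the symbols it defines and references. */
struct object {
    const char *name;
    const char *defs[MAX_SYMS];  /* NULL-terminated list */
    const char *refs[MAX_SYMS];  /* NULL-terminated list */
};

static bool in_list(const char *const *list, const char *sym)
{
    for (; *list != NULL; ++list)
        if (strcmp(*list, sym) == 0)
            return true;
    return false;
}

/* True if archive member m defines a symbol that object o references. */
static bool satisfies(const struct object *m, const struct object *o)
{
    for (const char *const *r = o->refs; *r != NULL; ++r)
        if (in_list(m->defs, *r))
            return true;
    return false;
}

/*
 * Mark in extracted[] the members of archive[0..n) that are needed by
 * `first` (an object linked unconditionally) or by members that were
 * already extracted. Repeats until a pass extracts nothing new.
 * Returns the number of members extracted.
 */
size_t select_members(const struct object *first,
                      const struct object *archive, size_t n,
                      bool *extracted)
{
    size_t count = 0;
    bool changed = true;

    for (size_t i = 0; i < n; ++i)
        extracted[i] = false;

    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            if (extracted[i])
                continue;
            bool needed = satisfies(&archive[i], first);
            for (size_t j = 0; j < n && !needed; ++j)
                if (extracted[j] && satisfies(&archive[i], &archive[j]))
                    needed = true;
            if (needed) {
                extracted[i] = true;
                ++count;
                changed = true;
            }
        }
    }
    return count;
}
```

Feeding it test.o (references xx) and the three members of stat.a marks only a.o, which is why yy comes along and nn, mm, qq, rr do not.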
That's the answer to your first question. Your second is:
Is there any method to prevent any other symbols than xx being loaded?
There are at least two ways.
One is simply to define each of xx, yy, nn, mm, qq, rr in a source
file by itself. Then compile object files xx.o, yy.o, nn.o, mm.o, qq.o, rr.o
and archive all of them in stat.a. Then, if the linker ever needs to find an
object file in stat.a that defines xx, it will find xx.o, extract and link it,
and the definition of xx alone will be added to the linkage.
There's another way that does not require you to code just one function in each source
file. This way depends on the fact that an ELF object file, as produced by the
compiler, is composed of various sections and these sections are in fact the
units that the linker distinguishes and merges together into the output file. By
default, there is a standard ELF section for each kind of symbol. The
compiler places all of the function definitions in one code section and
all data definitions in an appropriate data section. The reason that your
linkage of program test contains the definitions of both xx and yy is that
the compiler has placed both of these definitions in the single code section of a.o,
so the linker can either merge that code section into the program, or not: it can
only link the definitions of both xx and yy, or neither of them, so it is obliged
to link both, even though only xx is needed. Let's see the disassembly of the code section of a.o. By default the
code section is called .text:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
0000000000000013 <yy>:
13: 55 push %rbp
14: 48 89 e5 mov %rsp,%rbp
17: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1e <yy+0xb>
1e: e8 00 00 00 00 callq 23 <yy+0x10>
23: 90 nop
24: 5d pop %rbp
25: c3 retq
There you see the definitions of xx and yy, both in the .text section.
But you can ask the compiler to place the definition of each global symbol
in its own section in the object file. Then the linker can separate the code
section for any function definition from any other, and you can ask the linker
to throw away any sections that aren't used in the output file. Let's try that.
Compile all the source files again, this time asking for a separate section per symbol:
$ gcc -Wall -ffunction-sections -fdata-sections -c a.c b.c c.c test.c
Now look again at the disassembly of a.o:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text.xx:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Disassembly of section .text.yy:
0000000000000000 <yy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <yy+0xb>
b: e8 00 00 00 00 callq 10 <yy+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Now we've got two code sections in a.o: .text.xx, containing only the definition of xx,
and .text.yy, containing only the definition of yy. The linker can merge either of
these sections into a program and not merge the other.
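The same per-function layout can also be requested by hand for individual functions with GCC's section attribute, without passing -ffunction-sections. A minimal sketch, assuming an ELF target; the section names are chosen here only to mirror the ones shown in the disassembly above.

```c
#include <stdio.h>

/*
 * Same functions as a.c, but each one is placed in a named section by
 * hand. The names .text.xx and .text.yy are an assumption made to match
 * the -ffunction-sections output; any valid ELF section name would do.
 */
__attribute__((section(".text.xx"))) void xx(void)
{
    puts(__func__);
}

__attribute__((section(".text.yy"))) void yy(void)
{
    puts(__func__);
}
```

The compiler option is usually the more practical choice, since it covers every function without touching the source.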
Rebuild stat.a
$ rm stat.a
$ ar rcs stat.a a.o b.o c.o
Relink the program, this time asking the linker to discard unused input sections
(-gc-sections). We'll also ask it to trace the files it loads (-trace)
and to print a mapfile for us (-Map=mapfile):
$ gcc -o test test.o stat.a -Wl,-gc-sections,-trace,-Map=mapfile
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
The -trace output is exactly the same as before. But check again which of our
symbols are defined in the program:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
Only xx, which is what you want.
The output of the program is the same as before:
$ ./test
xx
Finally look at the mapfile. Near the top you see:
mapfile
...
Discarded input sections
...
...
.text.yy 0x0000000000000000 0x13 stat.a(a.o)
...
...
The linker was able to throw away the redundant code section .text.yy from
the input file stat.a(a.o). That's why the redundant definition of yy is
no longer in the program.
I would like to know why the symbols qq and rr do not get exported ?
You have to inform the linker of your intention; see How to force gcc to link an unused static library:
gcc -L./ -o test test.c -Wl,--whole-archive stat.a -Wl,--no-whole-archive
Is there any method to prevent any other symbols than xx being loaded?
From How do I include only used symbols when statically linking with gcc?
gcc -ffunction-sections -c a.c
gcc -L./ -o test test.c -Wl,--gc-sections stat.a

gcc subtracting from esp before call

I am planning to use C to write a small kernel and I really don't want it to bloat with unnecessary instructions.
I have two C files which are called main.c and hello.c. I compile and link them using the following GCC command:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o
I am dumping .text section using following OBJDUMP command:
objdump -w -j .text -D -mi386 -Maddr16,data16,intel main.o
and get the following dump:
00001000 <main>:
1000: 67 66 8d 4c 24 04 lea ecx,[esp+0x4]
1006: 66 83 e4 f0 and esp,0xfffffff0
100a: 67 66 ff 71 fc push DWORD PTR [ecx-0x4]
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 51 push ecx
1016: 66 83 ec 04 sub esp,0x4
101a: 66 e8 10 00 00 00 call 1030 <hello>
1020: 90 nop
1021: 66 83 c4 04 add esp,0x4
1025: 66 59 pop ecx
1027: 66 5d pop ebp
1029: 67 66 8d 61 fc lea esp,[ecx-0x4]
102e: 66 c3 ret
00001030 <hello>:
1030: 66 55 push ebp
1032: 66 89 e5 mov ebp,esp
1035: 90 nop
1036: 66 5d pop ebp
1038: 66 c3 ret
My questions are: Why are machine codes at the following lines being generated?
I can see that subtraction and addition completes each other, but why are they generated? I don't have any variable to be allocated on stack. I'd appreciate a source to read about usage of ECX.
1016: 66 83 ec 04 sub esp,0x4
1021: 66 83 c4 04 add esp,0x4
main.c
extern void hello();
void main(){
hello();
}
hello.c
void hello(){}
lscript.ld
SECTIONS{
.text 0x1000 : {*(.text)}
}
As I mentioned in my comments:
The first few lines (plus the push ecx) are to ensure the stack is aligned on a 16-byte boundary which is required by the Linux System V i386 ABI. The pop ecx and lea before the ret in main is to undo that alignment work.
@RossRidge has provided a link to another Stackoverflow answer that details this quite well.
In this case you seem to be doing real mode development. GCC isn't well suited for this but it can work and I will assume you know what you are doing. I mention some of the pitfalls of using -m16 in this Stackoverflow answer. I put this warning in that answer regarding real mode development with GCC:
There are so many pitfalls in doing this that I recommend against it.
If you remain undeterred and wish to continue forward you can do a few things to minimize the code. The 16-byte alignment of the stack at the point a function call is made is part of the more recent Linux System V i386 ABIs. Since you are generating code for a non-Linux environment you can change the stack alignment to 4 using compiler option -mpreferred-stack-boundary=2 . The GCC manual says:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
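The arithmetic behind that option can be sketched in C. This is only an illustration with invented function names: it computes the boundary implied by a given num and rounds a stack pointer value down to it, which is what the and esp,0xfffffff0 instruction in the first listing does for 16 bytes.

```c
#include <stdint.h>

/* Boundary in bytes implied by -mpreferred-stack-boundary=num (2^num). */
uint32_t stack_boundary_bytes(unsigned num)
{
    return (uint32_t)1 << num;
}

/*
 * Round a stack pointer value down to a multiple of boundary, which
 * must be a power of two. For boundary == 16 the mask is 0xfffffff0.
 */
uint32_t align_down(uint32_t sp, uint32_t boundary)
{
    return sp & ~(boundary - 1);
}
```

With num set to 2 the boundary is 4 bytes, which pushes of 32-bit values already maintain, so no realignment code is needed.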
If we add that to your GCC command we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2:
00001000 <main>:
1000: 66 55 push ebp
1002: 66 89 e5 mov ebp,esp
1005: 66 e8 04 00 00 00 call 100f <hello>
100b: 66 5d pop ebp
100d: 66 c3 ret
0000100f <hello>:
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 5d pop ebp
1016: 66 c3 ret
Now all the extra alignment code to get it on a 16-byte boundary has disappeared. We are left with typical function frame pointer prologue and epilogue code. This usually takes the form of push ebp and mov ebp,esp in the prologue and pop ebp in the epilogue. We can remove these with the -fomit-frame-pointer option, described in the GCC manual as:
The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.
If we add that option we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2 -fomit-frame-pointer:
00001000 <main>:
1000: 66 e8 02 00 00 00 call 1008 <hello>
1006: 66 c3 ret
00001008 <hello>:
1008: 66 c3 ret
You can then optimize for size with -Os. The GCC manual says this:
-Os
Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
This has a side effect that main will be placed into a section called .text.startup. If we display both with objdump -w -j .text -j .text.startup -D -mi386 -Maddr16,data16,intel main.o we get:
Disassembly of section .text:
00001000 <hello>:
1000: 66 c3 ret
Disassembly of section .text.startup:
00001002 <main>:
1002: e9 fb ff jmp 1000 <hello>
If you have functions in separate objects you can alter the calling convention so the first 3 Integer class parameters are passed in registers rather than the stack. The Linux kernel uses this method as well. Information on this can be found in the GCC documentation:
regparm (number)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to number if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
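As a small illustration of the attribute, here is a sketch of a function declared with regparm(3). REGPARM3 and add3 are names made up for this example; the macro expands to nothing on targets other than 32-bit x86, where the attribute does not apply.

```c
/*
 * regparm only applies to 32-bit x86. On other targets the
 * (hypothetical) REGPARM3 macro expands to nothing so the example
 * still compiles cleanly.
 */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* With regparm(3) on i386: a arrives in EAX, b in EDX, c in ECX. */
REGPARM3 int add3(int a, int b, int c)
{
    return a + b + c;
}
```

The attribute has to appear on the declaration seen by every caller, otherwise callers and callee disagree about where the arguments are.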
I wrote a Stackoverflow answer with code that uses __attribute__((regparm(3))) that may be a useful source of further information.
Other Suggestions
I recommend you consider compiling each object individually rather than all together. This is also advantageous since it can more easily be done in a Makefile later on.
If we look at your command line with the extra options mentioned above you'd have:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o \
-mpreferred-stack-boundary=2 -fomit-frame-pointer -Os
I recommend you do it this way:
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer main.c -o main.o
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer hello.c -o hello.o
The -c option (I added it to the beginning) forces the compiler to just generate the object file from the source and not to perform linking. You will also notice the -T lscript.ld has been removed. We have created .o files above. We can now use GCC to link all of them together:
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m16 \
-Tlscript.ld main.o hello.o -o main.elf
The -nostdlib option keeps the standard startup files and libraries (the C runtime) out of the link, -ffreestanding tells GCC that the target is a freestanding environment, and -Wl,--build-id=none tells the linker not to emit a build-id note in the executable. In order for this to really work you'll need a slightly more complex linker script that places the .text.startup before .text. This script also adds the .data section, the .rodata and .bss sections. The /DISCARD/ section removes exception handling data and other unneeded information.
ENTRY(main)
SECTIONS{
.text 0x1000 : SUBALIGN(4) {
*(.text.startup);
*(.text);
}
.data : SUBALIGN(4) {
*(.data);
*(.rodata);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
*(.note.gnu.build-id);
}
}
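The script defines __bss_start and __bss_end without showing them in use. One common use, assumed here rather than taken from the answer, is clearing .bss at startup, since a flat binary typically does not carry the contents of .bss. A minimal sketch: zero_range and clear_bss are invented names, and KERNEL_BUILD is a hypothetical macro that would be defined only when building with the linker script above.

```c
/*
 * Zero the half-open byte range [start, end). Written without memset
 * because a -nostdlib build has no C library to provide it; the
 * volatile pointer keeps the compiler from turning the loop back into
 * a memset call.
 */
void zero_range(unsigned char *start, unsigned char *end)
{
    for (volatile unsigned char *p = start; p < end; ++p)
        *p = 0;
}

#ifdef KERNEL_BUILD /* hypothetical macro, set only for the kernel build */
/* Symbols assigned by the linker script; only their addresses matter. */
extern unsigned char __bss_start[], __bss_end[];

void clear_bss(void)
{
    zero_range(__bss_start, __bss_end);
}
#endif
```

clear_bss would have to run before any code that relies on zero-initialized globals.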
If we look at a complete OBJDUMP with objdump -w -D -mi386 -Maddr16,data16,intel main.elf we would see:
Disassembly of section .text:
00001000 <main>:
1000: e9 01 00 jmp 1004 <hello>
1003: 90 nop
00001004 <hello>:
1004: 66 c3 ret
If you want to convert main.elf to a binary file that you can place in a disk image and read it (ie. via BIOS interrupt 0x13), you can create it this way:
objcopy -O binary main.elf main.bin
If you dump main.bin with NDISASM using ndisasm -b16 -o 0x1000 main.bin you'd see:
00001000 E90100 jmp word 0x1004
00001003 90 nop
00001004 66C3 o32 ret
Cross Compiler
I can't stress this enough: you should consider using a GCC cross compiler. The OSDev Wiki has information on building one. It also has this to say about why:
Why do I need a Cross Compiler?
You need to use a cross-compiler unless you are developing on your own operating system. The compiler must know the correct target platform (CPU, operating system), otherwise you will run into trouble. If you use the compiler that comes with your system, then the compiler won't know it is compiling something else entirely. Some tutorials suggest using your system compiler and passing a lot of problematic options to the compiler. This will certainly give you a lot of problems in the future and the solution is build a cross-compiler.

TCC -c option error

I am trying to convert my .c file to a .s file using TCC, however, I get the error: tcc: cannot specify multiple files with -c
tcc.exe main.c -c main.S
What should I do?
tcc, as far as I can tell, does not have an option to generate an assembly listing.
tcc -c foo.c takes the C source file foo.c as input and generates a binary object file foo.o.
It can also take assembly files as input:
tcc -c asm.S preprocesses and assembles the assembly source in the existing asm.S file and generates an object file asm.o.
tcc -c asm.s is similar, but it doesn't preprocess the input file before assembling it.
The man page says:
TCC options are a very much like gcc options. The main difference is
that TCC can also execute directly the resulting program and give it
runtime arguments.
If tcc had an option to generate an assembly listing, then surely it would use the same option that gcc (and many other Unix-based compilers) use, namely -S -- but:
$ tcc -S foo.c
tcc: error: invalid option -- '-S'
$
You can get an assembly listing of sorts using objdump:
$ cat foo.c
#include <stdio.h>
int main(void) {
puts("hello");
}
$ tcc -c foo.c
$ objdump -d foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 81 ec 00 00 00 00 sub $0x0,%rsp
b: 48 8d 05 fc ff ff ff lea -0x4(%rip),%rax # e <main+0xe>
12: 48 89 c7 mov %rax,%rdi
15: b8 00 00 00 00 mov $0x0,%eax
1a: e8 fc ff ff ff callq 1b <main+0x1b>
1f: c9 leaveq
20: c3 retq
$
but as you can see you lose some information that you'd get from a compiler-generated assembly listing. (Playing with objdump options might give you more information.)
I'm using tcc version 0.9.25 on a Linux x86_64 system.
(remyabel posted a similar but less detailed answer but deleted it, I'm not sure why.)

How can I get the _GLOBAL_OFFSET_TABLE_ address in my program?

I want to get the address of _GLOBAL_OFFSET_TABLE_ in my program. One way is to use the nm command in Linux, maybe redirect the output to a file and parse that file to get address of _GLOBAL_OFFSET_TABLE_. However, that method seems to be quite inefficient. What are some more efficient methods of doing it?
This appears to work:
// test.c
#include <stdio.h>
extern void *_GLOBAL_OFFSET_TABLE_;
int main()
{
printf("_GLOBAL_OFFSET_TABLE = %p\n", &_GLOBAL_OFFSET_TABLE_);
return 0;
}
In order to get consistent address of _GLOBAL_OFFSET_TABLE_, matching nm's result, you will need to compile your code with -fPIE to do code-gen as if linking into a position-independent executable. (Otherwise you get a small integer like 0x2ed6 with -fno-pie -no-pie). The GCC default for most modern Linux distros is -fPIE -pie, which would make nm addresses be just offsets relative to an image base, and the runtime address be ASLRed. (This is normally good for security, but you may not want it.)
$ gcc -fPIE -no-pie test.c -o test
It gives:
$ ./test
_GLOBAL_OFFSET_TABLE = 0x6006d0
However, nm thinks different:
$ nm test | fgrep GLOBAL
0000000000600868 d _GLOBAL_OFFSET_TABLE_
Or with a GCC too old to know about PIEs at all, let alone have -fPIE -pie as the default, -fpic can work.
If you use assembly language, you can get the _GLOBAL_OFFSET_TABLE_ address without get_pc_thunk.
It is a tricky way. :)
Here is the sample code:
$ cat test.s
.global main
main:
lea HEREIS, %eax # Now %eax holds address of _GLOBAL_OFFSET_TABLE_
.section .got
HEREIS:
$ gcc -o test test.s
This works because the .got section is immediately followed by .got.plt.
Therefore the symbol HEREIS and _GLOBAL_OFFSET_TABLE_ are located at the same address.
PS. You can check it works with objdump.
Disassembly of section .got:
080495e8 <HEREIS-0x4>:
80495e8: 00 00 add %al,(%eax)
...
Disassembly of section .got.plt:
080495ec <_GLOBAL_OFFSET_TABLE_>:
80495ec: 00 95 04 08 00 00 add %dl,0x804(%ebp)
80495f2: 00 00 add %al,(%eax)
80495f4: 00 00 add %al,(%eax)

What do R_X86_64_32S and R_X86_64_64 relocation mean?

Got the following error when I tried to compile a C application in 64-bit FreeBSD:
relocation R_X86_64_32S can not be used when making a shared object; recompile with -fPIC
What is R_X86_64_32S relocation and what is R_X86_64_64?
I've googled the error and its possible causes. It'd be great if anyone could tell me what R_X86_64_32S really means.
The R_X86_64_32S and R_X86_64_64 are names of relocation types, for code compiled for the amd64 architecture. You can look all of them up in the amd64 ABI.
According to it, R_X86_64_64 is broken down to:
R_X86_64 - all names are prefixed with this
64 - Direct 64 bit relocation
and R_X86_64_32S to:
R_X86_64 - prefix
32S - truncate value to 32 bits and sign-extend
which basically means "the value of the symbol pointed to by this relocation, plus any addend", in both cases. For R_X86_64_32S the linker then verifies that the generated value sign-extends to the original 64-bit value.
Now, in an executable file, the code and data segments are given a specified virtual base address. The executable code is not shared, and each executable gets its own fresh address space. This means that the compiler knows exactly where the data section will be, and can reference it directly. Libraries, on the other hand, can only know that their data section will be at a specified offset from the base address; the value of that base address can only be known at runtime. Hence, all libraries must be produced with code that can execute no matter where it is put into memory, known as position independent code (or PIC for short).
Now when it comes to resolving your problem, the error message speaks for itself.
For any of this to make sense, you must first:
see a minimal example of relocation: https://stackoverflow.com/a/30507725/895245
understand the basic structure of an ELF file: https://stackoverflow.com/a/30648229/895245
Standards
R_X86_64_64, R_X86_64_32 and R_X86_64_32S are all defined by the System V AMD64 ABI, which contains the AMD64 specifics of the ELF file format.
They are all possible values for the ELF32_R_TYPE field of a relocation entry, specified in the System V ABI 4.1 (1997) which specifies the architecture neutral parts of the ELF format. That standard only specifies the field, but not its arch-dependent values.
Under 4.4.1 "Relocation Types" we see the summary table:
Name Field Calculation
------------ ------ -----------
R_X86_64_64 word64 A + S
R_X86_64_32 word32 A + S
R_X86_64_32S word32 A + S
We will explain this table later.
And the note:
The R_X86_64_32 and R_X86_64_32S relocations truncate the computed value to 32-bits. The linker must verify that the generated value for the R_X86_64_32 (R_X86_64_32S) relocation zero-extends (sign-extends) to the original 64-bit value.
Example of R_X86_64_64 and R_X86_64_32
Let's first look into R_X86_64_64 and R_X86_64_32:
.section .text
/* Both a and b contain the address of s. */
a: .long s
b: .quad s
s:
Then:
as --64 -o main.o main.S
objdump -dzr main.o
Contains:
0000000000000000 <a>:
0: 00 00 add %al,(%rax)
0: R_X86_64_32 .text+0xc
2: 00 00 add %al,(%rax)
0000000000000004 <b>:
4: 00 00 add %al,(%rax)
4: R_X86_64_64 .text+0xc
6: 00 00 add %al,(%rax)
8: 00 00 add %al,(%rax)
a: 00 00 add %al,(%rax)
Tested on Ubuntu 14.04, Binutils 2.24.
Ignore the disassembly for now (which is meaningless since this is data), and look only to the labels, bytes and relocations.
The first relocation:
0: R_X86_64_32 .text+0xc
Which means:
0: acts on byte 0 (label a)
R_X86_64_: prefix used by all relocation types of the AMD64 system V ABI
32: the 64-bit address of the label s is truncated to a 32 bit address because we only specified a .long (4 bytes)
.text: we are on the .text section
0xc: this is the addend, which is a field of the relocation entry
The address of the relocation is calculated as:
A + S
Where:
A: the addend, here 0xC
S: the value of the symbol before relocation, here 00 00 00 00 == 0
Therefore, after relocation, the new address will be 0xC == 12 bytes into the .text section.
This is exactly what we expect, since s comes after a .long (4 bytes) and a .quad (8 bytes).
R_X86_64_64 is analogous, but simpler, since here there is no need to truncate the address of s. This is indicated by the standard through word64 instead of word32 on the Field column.
R_X86_64_32S vs R_X86_64_32
The difference between R_X86_64_32S and R_X86_64_32 is when the linker will complain with "relocation truncated to fit":
32: complains if the value truncated after relocation does not zero-extend to the old value, i.e. the truncated bytes must be zero:
E.g.: FF FF FF FF 80 00 00 00 to 80 00 00 00 generates a complaint because FF FF FF FF is not zero.
32S: complains if the value truncated after relocation does not sign-extend to the old value.
E.g.: FF FF FF FF 80 00 00 00 to 80 00 00 00 is fine, because the top bit of 80 00 00 00 and the truncated bits are all 1.
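The two checks can be written out as small C predicates. This is a sketch of the rule quoted from the ABI, not the linker's actual code, and the function names are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the value must zero-extend, i.e. the upper 32 bits are 0. */
bool fits_r_x86_64_32(uint64_t value)
{
    return (value >> 32) == 0;
}

/*
 * R_X86_64_32S: the value must sign-extend, i.e. bits 63..31 are all
 * equal (all 0 or all 1), so the dropped upper half repeats bit 31.
 */
bool fits_r_x86_64_32s(uint64_t value)
{
    uint64_t top33 = value >> 31;
    return top33 == 0 || top33 == 0x1FFFFFFFFULL;
}
```

For the value FF FF FF FF 80 00 00 00 used in the examples, the first predicate is false and the second is true.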
See also: What does this GCC error "... relocation truncated to fit..." mean?
R_X86_64_32S can be generated with:
.section .text
.global _start
_start:
mov s, %eax
s:
Then:
as --64 -o main.o main.S
objdump -dzr main.o
Gives:
0000000000000000 <_start>:
0: 8b 04 25 00 00 00 00 mov 0x0,%eax
3: R_X86_64_32S .text+0x7
Now we can observe the "relocation truncated to fit" error on 32S with a linker script:
SECTIONS
{
. = 0xFFFFFFFF80000000;
.text :
{
*(*)
}
}
Now:
ld -Tlink.ld main.o
Is fine, because: 0xFFFFFFFF80000000 gets truncated into 80000000, which is a sign extension.
But if we change the linker script to:
. = 0xFFFF0FFF80000000;
It now generates the error, because that 0 made it not be a sign extension anymore.
Rationale for using 32S for memory access but 32 for immediates: When is it better for an assembler to use sign extended relocation like R_X86_64_32S instead of zero extension like R_X86_64_32?
R_X86_64_32S and PIE (position independent executables)
R_X86_64_32S cannot be used in position independent executables, e.g. done with gcc -pie, otherwise link fails with:
relocation R_X86_64_32S against `.text' can not be used when making a PIE object; recompile with -fPIC
I have provided a minimal example explaining it at: What is the -fPIE option for position-independent executables in gcc and ld?
That means you compiled a shared object without using the -fPIC flag as you should:
gcc -shared foo.c -o libfoo.so # Wrong
You need to call
gcc -shared -fPIC foo.c -o libfoo.so # Right
On ELF platforms (Linux), shared objects are built from position-independent code: code that can run from any location in memory. If this flag is not given, the generated code is position dependent, so it cannot be used in a shared object.
I ran into this problem and found this answer didn't help me. I was trying to link a static library together with a shared library. I also investigated putting the -fPIC switch earlier on the command line (as advised in answers elsewhere).
The only thing that fixed the problem, for me, was changing the static library to shared. I suspect the error message about -fPIC can happen due to a number of causes but fundamentally what you want to look at is how your libraries are being built, and be suspicious of libraries that are being built in different ways.
In my case the issue arose because the program being compiled expected to find shared libraries in a remote directory, while only the corresponding static libraries were there by mistake.
Actually, this relocation error was a file-not-found error in disguise.
I have detailed how I coped with it in this other thread https://stackoverflow.com/a/42388145/5459638
The above answer demonstrates what these relocations are. I found that building x86_64 objects with GCC's -mcmodel=large flag can prevent R_X86_64_32S, because the compiler makes no assumption about the relocated address in this model.
In the following case:
extern int myarr[];
int test(int i)
{
return myarr[i];
}
Built with gcc -O2 -fno-pie -c test_array.c and disassembled with objdump -drz test_array.o, we have:
0: 48 63 ff movslq %edi,%rdi
3: 8b 04 bd 00 00 00 00 mov 0x0(,%rdi,4),%eax
6: R_X86_64_32S myarr
a: c3 ret
With -mcmodel=large, i.e. gcc -mcmodel=large -O2 -fno-pie -c test_array.c, we have:
0: 48 b8 00 00 00 00 00 movabs $0x0,%rax
7: 00 00 00
2: R_X86_64_64 myarr
a: 48 63 ff movslq %edi,%rdi
d: 8b 04 b8 mov (%rax,%rdi,4),%eax
10: c3 ret

Resources