Embed assembly-produced code into a C program using FASM

I am trying to link assembly-compiled code with C-compiled code, and I get an undefined reference error during the linking phase. This is how I do it:
[niko@dev1 test]$ cat ssefuncs.asm
format ELF64
EQUAL_ANY = 0000b
RANGES = 0100b
EQUAL_EACH = 1000b
EQUAL_ORDERED = 1100b
NEGATIVE_POLARITY = 010000b
BYTE_MASK = 1000000b
asm_sse:
movntdqa xmm0,[eax]
pcmpestri xmm0,[ecx],0x0
ret
[niko@dev1 test]$ fasm ssefuncs.asm ssefuncs.o
flat assembler version 1.71.50 (16384 kilobytes memory)
1 passes, 405 bytes.
[niko@dev1 test]$ ls -l ssefuncs.o
-rw-r--r-- 1 niko niko 405 Jan 31 14:52 ssefuncs.o
[niko@dev1 test]$ objdump -M intel -d ssefuncs.o
ssefuncs.o: file format elf64-x86-64
Disassembly of section .flat:
0000000000000000 <.flat>:
0: 67 66 0f 38 2a 00 movntdqa xmm0,XMMWORD PTR [eax]
6: 67 66 0f 3a 61 01 00 pcmpestri xmm0,XMMWORD PTR [ecx],0x0
d: c3 ret
[niko@dev1 test]$ cat stest.c
void asm_sse();
int main() {
asm_sse();
}
[niko@dev1 test]$ gcc -c stest.c
[niko@dev1 test]$ gcc -o stest ssefuncs.o stest.o
stest.o: In function `main':
stest.c:(.text+0xa): undefined reference to `asm_sse'
collect2: error: ld returned 1 exit status
[niko@dev1 test]$
Looking at the ELF file, it is very thin and I don't see any symbols:
[niko@dev1 test]$ readelf -a ssefuncs.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 149 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 4
Section header string table index: 3
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .flat PROGBITS 0000000000000000 00000040
000000000000000e 0000000000000000 WAX 0 0 8
[ 2] .symtab SYMTAB 0000000000000000 0000004e
0000000000000030 0000000000000018 3 2 8
[ 3] .strtab STRTAB 0000000000000000 0000007e
0000000000000017 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .flat
No version information found in this file.
[niko@dev1 test]$
What is the correct way to embed FASM produced assembly code into a C program?

Subroutine names in assembly code are usually simply labels for certain positions within the instruction stream. They are not automatically made visible for linking with external object code. To make that possible, the symbol must be declared public. Also, by convention the code in ELF files resides in the .text section. Your assembly file should look like this:
format ELF64
EQUAL_ANY = 0000b
RANGES = 0100b
EQUAL_EACH = 1000b
EQUAL_ORDERED = 1100b
NEGATIVE_POLARITY = 010000b
BYTE_MASK = 1000000b
section '.text' code readable executable
asm_sse:
movntdqa xmm0,[eax]
pcmpestri xmm0,[ecx],0x0
ret
public asm_sse

It depends very much on the compiler used. For example, GCC (and clang, which mirrors it) has a very extensive facility for writing assembly language snippets inline. It handles the routine details of interfacing with the surrounding code (saving clobbered registers as needed, placing inputs where they can be used and picking up results, and matching inputs/outputs with what is given). This is usually the easiest way to go.
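As a rough illustration of the inline route, here is a minimal sketch using GCC/clang extended asm. The function name add_with_inline_asm is made up for this example. The assembly branch assumes an x86 or x86-64 target, and a plain C fallback is used everywhere else.

```c
/* Hypothetical helper: adds two ints, using a single x86 instruction when
   the target allows it. */
int add_with_inline_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int result = a;
    /* "+r"(result): result is read and written, in a register the compiler picks.
       "r"(b):       b is an input in some other register.
       "cc":         the add instruction modifies the flags. */
    __asm__("addl %1, %0"
            : "+r"(result)
            : "r"(b)
            : "cc");
    return result;
#else
    /* Fallback for other targets or compilers. */
    return a + b;
#endif
}
```

The compiler chooses the registers and moves a and b into place, so nothing in the snippet depends on a particular calling convention.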
If the above isn't an option, you should start by writing a short C program and compiling it to assembly. Something like cc -g -S somefile.c should give you a somefile.s with assembly language. With -g (or other debugging enablement) the output carries source line annotations, which makes it easier to refer back to the C; GCC's -fverbose-asm adds explanatory comments as well. This will allow you to reverse engineer the compiler's result, and serve as a starting point for a standalone assembly file by messing with the innards of the compiled functions.
As the comment by @LaurentH says, compilers often mangle the names of source symbols in the generated assembly language to prevent clashes with outside symbols, e.g. by prepending _ or even some characters that are legal in the specific assembler but not in C, like . or $.

Related

Will an executable access shared-libraries' global variable via GOT?

I was learning dynamic linking recently and gave it a try:
dynamic.c
int global_variable = 10;
int XOR(int a) {
return global_variable;
}
test.c
#include <stdio.h>
extern int global_variable;
extern int XOR(int);
int main() {
global_variable = 3;
printf("%d\n", XOR(0x10));
}
The compiling commands are:
clang -shared -fPIC -o dynamic.so dynamic.c
clang -o test test.c dynamic.so
I was expecting that in the executable test, the main function would access global_variable via the GOT. On the contrary, global_variable is placed in test's data section, and XOR in dynamic.so accesses global_variable indirectly.
Could anyone tell me why the compiler didn't ask the test to access global_variable via GOT, but asked the shared object file to do so?
Part of the point of a shared library is that one copy gets loaded into memory, and multiple processes can access that one copy. But every program has its own copy of each of the library's variables. If they were accessed relative to the library's GOT then those would instead be shared among the processes using the library, just like the functions are.
There are other possibilities, but it is clean and consistent for each executable to provide for itself all the variables it needs. That requires the library functions to access all of its variables with static storage duration (not just external ones) indirectly, relative to the program. This is ordinary dynamic linking, just going the opposite direction from what you usually think of.
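The arrangement described here can be imitated in plain C. This is only a simulation with made-up names: exe_copy_of_global_variable stands in for the copy in the executable's data, library_got_slot for the library's GOT entry, and fake_loader_bind for the work a real dynamic loader does.

```c
#include <stddef.h>

/* Stands in for the copy of global_variable that lives in the executable. */
int exe_copy_of_global_variable = 10;

/* Stands in for the library's GOT slot; a real loader fills it at load time. */
int *library_got_slot = NULL;

/* Hypothetical stand-in for the dynamic loader resolving the symbol. */
void fake_loader_bind(void)
{
    library_got_slot = &exe_copy_of_global_variable;
}

/* What the library function effectively does: read through the slot. */
int library_read_global(void)
{
    return *library_got_slot;
}
```

After fake_loader_bind() runs, a direct store to exe_copy_of_global_variable by the "executable" is visible through library_read_global(), which mirrors main writing 3 and XOR reading it back.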
Turns out my clang produced PIC by default so it messed with results.
I will leave updated answer here, and the original can be read below it.
After digging a bit more into the topic, I noticed that compiling test.c does not generate a .got section by itself. You can check this by compiling the source into an object file and omitting the linking step for now (-c option):
clang -c -o test.o test.c
If you inspect the sections of resulting object file with readelf -S you will notice that there is no .got in there:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000035 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000210
0000000000000060 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000075
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000079
0000000000000013 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 0000008c
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.pr[...] NOTE 0000000000000000 00000090
0000000000000030 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000c0
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000270
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f8
00000000000000d8 0000000000000018 12 4 8
[12] .strtab STRTAB 0000000000000000 000001d0
000000000000003e 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000288
0000000000000074 0000000000000000 0 0 1
This means that the entirety of .got section present in the test executable actually comes from dynamic.so, as it is PIC and uses GOT.
Would it be possible to compile dynamic.so as non-PIC as well? Apparently it used to be, about 10 years ago (the article compiles its examples as 32-bit code; they don't have to work on 64-bit!). The linked article describes how a non-PIC shared library was relocated at load time: wherever the machine code contained an address that needed relocating after loading, that address was set to zeroes and a relocation of a certain type was recorded in the library. While loading the library, the loader filled in the zeroes with the actual runtime address of the data or code that was needed. Note that this cannot be applied in your case, though, as 64-bit shared libraries cannot be built from non-PIC code (Source).
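The patching step can be sketched in C. This is a simplified model and not the loader's actual code: the function name, the byte buffer and the address used in the usage note are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a load-time relocation: the 4-byte little-endian field
   at field_offset holds an addend (zero in the case described above), and the
   loader adds the runtime address to it. */
void apply_load_time_patch(uint8_t *image, size_t field_offset, uint32_t runtime_addr)
{
    uint32_t value = (uint32_t)image[field_offset]
                   | (uint32_t)image[field_offset + 1] << 8
                   | (uint32_t)image[field_offset + 2] << 16
                   | (uint32_t)image[field_offset + 3] << 24;

    value += runtime_addr;

    image[field_offset]     = (uint8_t)(value & 0xff);
    image[field_offset + 1] = (uint8_t)((value >> 8) & 0xff);
    image[field_offset + 2] = (uint8_t)((value >> 16) & 0xff);
    image[field_offset + 3] = (uint8_t)((value >> 24) & 0xff);
}
```

For instance, given the five bytes a1 00 00 00 00 (a 32-bit load of eax from an absolute address that is still zero), patching offset 1 with a hypothetical address 0xf7fc4010 turns them into a1 10 40 fc f7.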
If you compile dynamic.so as a shared 32-bit library instead and do not use the -fPIC option (you usually need special repositories enabled to compile 32-bit code and have 32-bit libc installed):
gcc -m32 dynamic.c -shared -o dynamic.so
You will notice that:
// readelf -s dynamic.so
(... lots of output)
27: 00004010 4 OBJECT GLOBAL DEFAULT 19 global_variable
// readelf -S dynamic.so
(... lots of output)
[17] .got PROGBITS 00003ff0 002ff0 000010 04 WA 0 0 4
[18] .got.plt PROGBITS 00004000 003000 00000c 04 WA 0 0 4
[19] .data PROGBITS 0000400c 00300c 000008 00 WA 0 0 4
[20] .bss NOBITS 00004014 003014 000004 00 WA 0 0 1
global_variable is at offset 0x4010 which is inside .data section. Also, while .got is present (at offset 0x3ff0), it only contains relocations coming from other sources than your code:
// readelf -r
Offset Info Type Sym.Value Sym. Name
00003f28 00000008 R_386_RELATIVE
00003f2c 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003ff0 00000106 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003ff4 00000206 R_386_GLOB_DAT 00000000 __cxa_finalize@GLIBC_2.1.3
00003ff8 00000306 R_386_GLOB_DAT 00000000 __gmon_start__
00003ffc 00000406 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
This article introduces the GOT as part of an introduction to PIC, and I have found that to be the case in plenty of places. That suggests the GOT is indeed only used by PIC code, although I am not 100% sure of it and I recommend researching the topic further.
What does this mean for you? A section in the first article I linked, called "Extra credit #2", contains an explanation for a similar scenario. Although it is 10 years old, uses 32-bit code, and the shared library is non-PIC, it shares some similarities with your situation and might explain the problem you presented in your question.
Also keep in mind that (although similar) -fPIE and -fPIC are two separate options with slightly different effects and that if your executable during inspection is not loaded at 0x400000 then it probably is compiled as PIE without your knowledge which might also have impact on results. In the end it all boils down to what data is to be shared between processes, what data/code can be loaded at arbitrary address, what has to be loaded at fixed address etc. Hope this helps.
Also two other answers on Stack Overflow which seem relevant to me: here and here. Both the answers and comments.
Original answer:
I tried reproducing your problem with exactly the same code and compilation commands as the ones you provided, but it seems that both main and XOR use the GOT to access global_variable. I will answer by providing example output of the commands that I used to inspect the data flow. If your output differs from mine, there is some other difference between our environments (I mean a big difference; if only addresses or values differ, that is fine). The best way to find that difference is for you to provide the commands you originally used as well as their output.
First step is to check what address is accessed whenever a write or read to global_variable happens. For that we can use objdump -D -j .text test command to disassemble the code and look at the main function:
0000000000001150 <main>:
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 48 8b 05 8d 2e 00 00 mov 0x2e8d(%rip),%rax # 3fe8 <global_variable>
115b: c7 00 03 00 00 00 movl $0x3,(%rax)
1161: bf 10 00 00 00 mov $0x10,%edi
1166: e8 d5 fe ff ff call 1040 <XOR@plt>
116b: 89 c6 mov %eax,%esi
116d: 48 8d 3d 90 0e 00 00 lea 0xe90(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1174: b0 00 mov $0x0,%al
1176: e8 b5 fe ff ff call 1030 <printf@plt>
117b: 31 c0 xor %eax,%eax
117d: 5d pop %rbp
117e: c3 ret
117f: 90 nop
Numbers in the first column are not absolute addresses. Instead, they are offsets relative to the base address at which the executable will be loaded. For the sake of explanation I will refer to them as "offsets".
The assembly at offsets 0x1154 and 0x115b comes directly from the line global_variable = 3; in your code. To confirm that, you could compile the program with -g for debug symbols and invoke objdump with -S. This will display the source code above the corresponding assembly.
We will focus on what these two instructions are doing. First instruction is a mov of 8 bytes from a location in memory to the rax register. The location in memory is given as relative to the current rip value, offset by a constant 0x2e8d. Objdump already calculated the value for us, and it is equal to 0x3fe8. So this will take 8 bytes present in memory at the 0x3fe8 offset and store them in the rax register.
Next instruction is again a mov, the suffix l tells us that data size is 4 bytes this time. It stores a 4 byte integer with value equal to 0x3 in the location pointed to by the current value of rax (not in the rax itself! brackets around a register such as those in (%rax) signify that the location in the instruction is not the register itself, but rather where its contents are pointing to!).
To summarize, we read a pointer to a 4 byte variable from a certain location at offset 0x3fe8 and later store an immediate value of 0x3 at the location specified by said pointer. Now the question is: where does that offset of 0x3fe8 come from?
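The 0x3fe8 that objdump prints can be recomputed by hand. A minimal sketch, with a made-up function name: the displacement is added to the address of the next instruction, so the 7-byte length of the mov at offset 0x1154 has to be taken into account.

```c
#include <stdint.h>

/* Target of a RIP-relative operand: by the time the displacement is applied,
   RIP already points at the instruction that follows. */
uint64_t rip_relative_target(uint64_t insn_offset, uint64_t insn_length, int32_t disp32)
{
    return insn_offset + insn_length + (uint64_t)(int64_t)disp32;
}
```

With the values from the listing, rip_relative_target(0x1154, 7, 0x2e8d) gives 0x3fe8, and the lea at 0x116d (also 7 bytes, displacement 0xe90) gives 0x2004, matching the comments objdump printed.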
It actually comes from GOT. To show the contents of the .got section we can use the objdump -s -j .got test command. -s means we want to focus on actual raw contents of the section, without any disassembling. The output in my case is:
test: file format elf64-x86-64
Contents of section .got:
3fd0 00000000 00000000 00000000 00000000 ................
3fe0 00000000 00000000 00000000 00000000 ................
3ff0 00000000 00000000 00000000 00000000 ................
The whole section is obviously set to zero, as the GOT is populated with data only after the program is loaded into memory, but what is important is the address range. The dump shows three 16-byte rows starting at 0x3fd0, so .got spans offsets 0x3fd0 up to (but not including) 0x4000. This range includes the 0x3fe8 offset, which means the location of global_variable is indeed stored in the GOT.
Another way of finding this information is to use readelf -S test to show sections of the executable file and scroll down to the .got section:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
(...lots of sections...)
[22] .got PROGBITS 0000000000003fd0 00002fd0
0000000000000030 0000000000000008 WA 0 0 8
Looking at the Address and Size columns, we can see that the section is loaded at offset 0x3fd0 in memory and its size is 0x30, which corresponds to what objdump displayed. Note that in the readelf output, "Offset" is actually the offset into the file from which the program is loaded, not the offset in memory that we are interested in.
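The range check can be written down explicitly. A small sketch with made-up helper names, using the Address (0x3fd0), Size (0x30) and EntSize (0x8) values from the readelf output:

```c
#include <stdint.h>

/* Nonzero when addr lies inside [start, start + size). */
int section_contains(uint64_t start, uint64_t size, uint64_t addr)
{
    return addr >= start && addr - start < size;
}

/* Index of the GOT entry that holds addr, given the entry size. */
uint64_t got_slot_index(uint64_t got_start, uint64_t entry_size, uint64_t addr)
{
    return (addr - got_start) / entry_size;
}
```

Offset 0x3fe8 falls inside the section, and with 8-byte entries it is entry number 3 counting from zero.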
By issuing the same commands on the dynamic.so library we get similar results:
00000000000010f0 <XOR>:
10f0: 55 push %rbp
10f1: 48 89 e5 mov %rsp,%rbp
10f4: 89 7d fc mov %edi,-0x4(%rbp)
10f7: 48 8b 05 ea 2e 00 00 mov 0x2eea(%rip),%rax # 3fe8 <global_variable@@Base-0x38>
10fe: 8b 00 mov (%rax),%eax
1100: 5d pop %rbp
1101: c3 ret
So we see that both main and XOR use GOT to find the location of global_variable.
As for the location of global_variable we need to run the program to populate GOT. For that we can use GDB. We can run our program in GDB by invoking it this way:
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:." gdb ./test
The LD_LIBRARY_PATH environment variable tells the dynamic linker where to look for shared objects, so we extend it to include the current directory "." so that it can find dynamic.so.
After the GDB loads our code, we may invoke break main to set up a breakpoint at main and run to run the program. The program execution should pause at the beginning of the main function, giving us a view into our executable after it was fully loaded into memory, with GOT populated.
Running disassemble main in this state will show us the actual absolute offsets into memory:
Dump of assembler code for function main:
0x0000555555555150 <+0>: push %rbp
0x0000555555555151 <+1>: mov %rsp,%rbp
=> 0x0000555555555154 <+4>: mov 0x2e8d(%rip),%rax # 0x555555557fe8
0x000055555555515b <+11>: movl $0x3,(%rax)
0x0000555555555161 <+17>: mov $0x10,%edi
0x0000555555555166 <+22>: call 0x555555555040 <XOR@plt>
0x000055555555516b <+27>: mov %eax,%esi
0x000055555555516d <+29>: lea 0xe90(%rip),%rdi # 0x555555556004
0x0000555555555174 <+36>: mov $0x0,%al
0x0000555555555176 <+38>: call 0x555555555030 <printf@plt>
0x000055555555517b <+43>: xor %eax,%eax
0x000055555555517d <+45>: pop %rbp
0x000055555555517e <+46>: ret
End of assembler dump.
(gdb)
Our 0x3fe8 offset has turned into an absolute address equal to 0x555555557fe8. We may again check that this location comes from the .got section by issuing maintenance info sections inside GDB, which prints a long list of sections and their memory mappings. For me .got is placed in this address range:
[21] 0x555555557fd0->0x555555558000 at 0x00002fd0: .got ALLOC LOAD DATA HAS_CONTENTS
Which contains 0x555555557fe8.
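The relation between the file-relative offsets and the runtime addresses is a single subtraction. A minimal sketch with a made-up function name, using the numbers from this particular GDB session (they will differ on other machines):

```c
#include <stdint.h>

/* Base address at which the image was loaded, recovered from one known
   (runtime address, file-relative offset) pair. */
uint64_t load_base(uint64_t runtime_addr, uint64_t file_relative_offset)
{
    return runtime_addr - file_relative_offset;
}
```

Both the GOT slot (0x555555557fe8 versus 0x3fe8) and main (0x555555555150 versus 0x1150) give the same base, 0x555555554000.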
To finally inspect the address of global_variable itself we may examine the contents of that memory by issuing x/xag 0x555555557fe8. Arguments xag of the x command deal with the size, format and type of data being inspected - for explanation invoke help x in GDB. On my machine the command returns:
0x555555557fe8: 0x7ffff7fc4020 <global_variable>
On your machine it may only display the address and the data, without the "<global_variable>" helper, which probably comes from an extension I have installed called pwndbg. That is fine, because the value at that address is all we need. We now know that global_variable is located in memory at address 0x7ffff7fc4020. Now we may issue info proc mappings in GDB to find out which address range this address belongs to. My output is pretty long, but among all the ranges listed there is one of interest to us:
0x7ffff7fc4000 0x7ffff7fc5000 0x1000 0x3000 /home/user/test_got/dynamic.so
The address is inside of that memory area, and GDB tells us that it comes from the dynamic.so library.
In case any of the outputs of these commands are different for you (a change in a value is fine; I mean a fundamental difference, like addresses not belonging to certain address ranges), please provide more information about what exactly you did to come to the conclusion that global_variable is stored in the .data section: what commands you invoked and what output they produced.

Assembly code different from gdb display of code

I'm learning about operating systems from the book Operating Systems from 0 to 1, and I'm trying to display the code in my kernel called main, however the code displayed in GDB is not the same even though I jumped to the address that is the entry point.
bootloader.asm
;*************************************************
; bootloader.asm
; A Simple Bootloader
;*************************************************
bits 16
start: jmp boot
;; constants and variable definitions
msg db "Welcome to My Operating System!", 0ah, 0dh, 0h
boot:
cli ; no interrupts
cld ; all that we need to init
mov ax, 0x0000
;; set buffer
mov es, ax
mov bx, 0x0600
mov al, 1 ; read one sector
mov ch, 0 ; track 0
mov cl, 2 ; sector to read
mov dh, 0 ; head number
mov dl, 0 ; drive number
mov ah, 0x02 ; read sectors from disk
int 0x13 ; call the BIOS routine
jmp 0x0000:0x0600 ; jump and execute the sector!
hlt ; halt the system
; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0
dw 0xAA55 ; Boot Signature
readelf -h main
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x600
Start of program headers: 52 (bytes into file)
Start of section headers: 12888 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 12
Section header string table index: 11
readelf -l main
Elf file type is EXEC (Executable file)
Entry point 0x600
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000000 0x00000000 0x00000000 0x00094 0x00094 R 0x4
LOAD 0x000000 0x00000000 0x00000000 0x00094 0x00094 R 0x4
LOAD 0x000100 0x00000600 0x00000600 0x00006 0x00006 R E 0x100
Section to Segment mapping:
Segment Sections...
00
01
02 .text
main.c
void main(){}
objdump -z -M intel -S -D build/os/main
Disassembly of section .text:
00000600 <main>:
void main(){}
600: 55 push ebp
601: 89 e5 mov ebp,esp
603: 90 nop
604: 5d pop ebp
605: c3 ret
But this is GDB's output by setting a breakpoint at main 0x600
0x600 <main> jg 0x647
0x602 <main+2> dec esp
0x603 <main+3> inc esi
0x604 <main+4> add DWORD PTR [ecx],eax
Why is this happening? Am I loading at the wrong address? How do I find the correct address to load at?
Edit:
Here is the code for compiling:
nasm -f elf bootloader.asm -F dwarf -g -o ../build/bootloader/bootloader.o
ld -m elf_i386 -T bootloader.lds ../build/bootloader/bootloader.o -o ../build/bootloader/bootloader.o.elf
objcopy -O binary ../build/bootloader/bootloader.o.elf ../build/bootloader/bootloader.o
gcc -ffreestanding -nostdlib -fno-pic -gdwarf-4 -m16 -ggdb3 -c main.c -o ../build/os/main.o
ld -m elf_i386 -nmagic -T os.lds ../build/os/main.o -o ../build/os/main
dd if=/dev/zero of=disk.img bs=512 count=2880
2880+0 records in
2880+0 records out
1474560 bytes (1.5 MB, 1.4 MiB) copied, 0.0150958 s, 97.7 MB/s
dd conv=notrunc if=build/bootloader/bootloader.o of=disk.img bs=512 count=1 seek=0
1+0 records in
1+0 records out
512 bytes copied, 0.000127745 s, 4.0 MB/s
dd conv=notrunc if=build/os/main.o of=disk.img bs=512 count=$((8504/512)) seek=1
16+0 records in
16+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000184251 s, 44.5 MB/s
qemu-system-i386 -machine q35 -fda disk.img -gdb tcp::26000 -S
And the GDB commands for displaying main's code:
set architecture i8086
target remote localhost:26000
b *0x7c00
set disassembly-flavor intel
layout asm
layout reg
symbol-file build/os/main
b main
jg / dec esp / inc esi is the ELF magic number, not machine code! You'll see the same thing from the start of the output of ndisasm -b32 /bin/ls. (ndisasm always treats its input as a flat binary; it doesn't look for any metadata.)
7F 45 4C 46 is the string "ELF" after a 0x7F byte, the ELF magic number that identifies the file format as ELF. It's followed by more ELF header bytes before the actual machine code for main. objdump -D disassembles all ELF sections, but it still parses the ELF headers instead of disassembling them the way ndisasm does. So you still just end up seeing the code from the .text section because the others are empty (because you linked this executable without libc or CRT startfiles, and with C main as the ELF entry point?!?)
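Both observations can be checked with a few lines of C. This is a sketch with made-up function names: the first helper tests for the four magic bytes, and the second computes where a 2-byte short jump such as 7F 45 lands, which is where GDB's jg 0x647 comes from.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero when the buffer starts with the ELF magic number 7F 'E' 'L' 'F'. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Target of a short conditional jump (one opcode byte plus a signed 8-bit
   displacement): the address after the 2-byte instruction plus rel8. */
unsigned long short_jump_target(unsigned long insn_addr, signed char rel8)
{
    return insn_addr + 2 + rel8;
}
```

Decoding the first two header bytes at address 0x600 as a jump gives 0x600 + 2 + 0x45 = 0x647.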
You're jumping to the start of the ELF file as if it was a flat binary. It's not; writing an ELF program loader is not that simple. The ELF program headers (which readelf can parse) tell you which file offset goes at which address. The start of the .text section will be at some offset into the file, not overlapping the ELF magic number for obvious reasons. (Although it can overlap with the ELF header if you can find a way to make it fit: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)
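To make the file-offset point concrete, here is a minimal sketch of the lookup a loader has to do. The struct and function names are made up; only the three fields needed for the lookup are modelled, with values taken from the LOAD lines of the readelf -l output in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Just the program header fields needed for this lookup. */
struct load_segment {
    uint32_t file_offset;  /* Offset column   */
    uint32_t vaddr;        /* VirtAddr column */
    uint32_t filesz;       /* FileSiz column  */
};

/* Returns 1 and stores the file offset backing vaddr, or returns 0 when no
   segment covers that address. */
int vaddr_to_file_offset(const struct load_segment *segs, size_t count,
                         uint32_t vaddr, uint32_t *out)
{
    for (size_t i = 0; i < count; i++) {
        if (vaddr >= segs[i].vaddr && vaddr - segs[i].vaddr < segs[i].filesz) {
            *out = segs[i].file_offset + (vaddr - segs[i].vaddr);
            return 1;
        }
    }
    return 0;
}
```

For the segment LOAD 0x000100 0x00000600 with FileSiz 0x6, address 0x600 maps to file offset 0x100, so the bytes of main are 0x100 bytes into the file rather than at its start.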
Then once you have the file mapped into memory as specified in the program headers, you jump to the ELF entry point address (0x600 in your case). (Which is normally not a function; under a real OS like Linux, you can't ret from the entry point. Instead you need to make an exit system call.) You can't here, either, because you jmp to it instead of call.
This is why _start is separate from main; building a program with a compiler-generated main as its entry point doesn't work.
Of course most of this effort is doomed because you're jumping to your main with the CPU still in 16-bit real mode. But your main is compiled/assembled for 32-bit mode. You could somewhat work around that with gcc -m16 to assemble gcc output for 16-bit mode, using operand-size + address-size prefixes as necessary.
The machine code for that do-nothing main will actually work in both 16 and 32-bit mode. If you'd used a return 0 without optimization, that wouldn't be the case: the opcode (without prefixes) for mov eax, imm32 implies a different instruction length depending on what mode the CPU decodes it in, so decoding in 16-bit mode would write AX and leave 2 bytes of zeros.
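The length difference can be put in numbers. A minimal sketch with made-up function names, covering only the unprefixed B8 opcode (mov eAX, immediate), where the immediate is as wide as the current operand size:

```c
/* Length in bytes of an unprefixed B8 instruction for a given default
   operand size: one opcode byte plus the immediate. */
unsigned mov_acc_imm_length(unsigned operand_size_bits)
{
    return 1u + operand_size_bits / 8u;
}

/* Bytes of a 32-bit encoding left over when it is decoded in 16-bit mode. */
unsigned leftover_bytes_in_16bit_mode(void)
{
    return mov_acc_imm_length(32) - mov_acc_imm_length(16);
}
```

B8 00 00 00 00 is one 5-byte instruction in 32-bit mode. In 16-bit mode only the first 3 bytes are consumed, and the remaining 2 zero bytes are decoded as a separate instruction.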
Most likely the easiest thing to do is turn your "kernel" into a flat binary, instead of writing an ELF program loader in your bootloader. Follow an osdev tutorial because lots can go wrong, and you have to be careful about static data for example.
Or see How to make the kernel for my bootloader? for an example bootloader that calls a C function after switching to 32-bit protected mode.
See more links in https://stackoverflow.com/tags/x86/info.

gdb incorrectly resolving stack variable location

While debugging a small kernel I am writing for fun/learning experience, I encountered a somewhat puzzling issue with gdb where it is apparently not correctly resolving local variable addresses on the stack. My investigation so far suggests that the debugging symbols are correct but somehow gdb still reads from a wrong memory location when displaying the contents of that variable.
The relevant C code in question is:
typedef union
{
uint16_t packed;
struct __attribute__((packed))
{
uint8_t PhysicalLimit;
uint8_t LinearLimit;
} limits;
} MemAddrLimits;
void KernelMain32()
{
ClearScreen();
SimplePrint("kernelMain32");
MemAddrLimits memAddr;
memAddr.packed = GetMemoryAddressLimits();
for (;;) {}
}
where GetMemoryAddressLimits() returns the memory address widths provided by the cpuid instruction in a 2-byte integer (0x3028 currently for my tests). However, when stepping through this function in gdb, printing the value of memAddr does not show the right result:
gdb> p memAddr
$1 = {packed = 0, limits = {PhysicalLimit = 0 '\000', LinearLimit = 0 '\000'}}
gdb> info locals
memAddr = {packed = 0, limits = {PhysicalLimit = 0 '\000', LinearLimit = 0 '\000'}}
gdb> info addr memAddr prints Symbol "memAddr" is a variable at frame base reg $ebp offset 8+-18, i.e., memAddr is located at ebp-10. Indeed, inspecting that address shows the expected content:
gdb> x/hx $ebp-10
0x8ffee: 0x3028
In contrast gdb> p &memAddr gives a value of (MemAddrLimits *) 0x7f6 at which location the memory is zeroed.
When declaring memAddr as a uint16_t instead of my union type these issues do not occur. In that case we get
gdb> info addr memAddr
Symbol "memAddr" is multi-location:
Range 0x8b95-0x8b97: a variable in $eax
.
However, the result is still (also) written to ebp-10, i.e., the disassembly of the function is identical - the only difference is in debug symbols.
Am I missing something here or has someone a good idea on what might be going wrong in this case?
More Details
Program versions and build flags
Using gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 and GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1.
Compiling with flags
-ffreestanding -m32 -fcf-protection=none -fno-pie -fno-pic -O0 -gdwarf-2 -fvar-tracking -fvar-tracking-assignments
and linking with
-m elf_i386 -nodefaultlibs -nostartfiles -Ttext 0x7c00 -e start -g
The linking phase produces the kernel.elf which I postprocess to extract the raw executable binary as well as a symbols file to load into gdb. So far, this has been working well for me.
There's obviously more code involved in the binary than what I have shown, most of which written in assembly, which shouldn't be relevant here.
Compiled Files
gcc generates the following code (snippet from objdump -d kernel.elf):
00008b74 <KernelMain32>:
8b74: 55 push ebp
8b75: 89 e5 mov ebp,esp
8b77: 83 ec 18 sub esp,0x18
8b7a: e8 f0 fe ff ff call 8a6f <ClearScreen>
8b7f: 68 41 8c 00 00 push 0x8c41
8b84: e8 7a ff ff ff call 8b03 <SimplePrint>
8b89: 83 c4 04 add esp,0x4
8b8c: e8 0f 00 00 00 call 8ba0 <GetMemoryAddressLimits>
8b91: 66 89 45 f6 mov WORD PTR [ebp-0xa],ax
8b95: eb fe jmp 8b95 <KernelMain32+0x21>
From that we can see that memAddr is indeed located at ebp-10 on the stack, consistent with what gdb> info addr memAddr told us.
Dwarf information (objdump --dwarf kernel.elf):
<1><4ff>: Abbrev Number: 20 (DW_TAG_subprogram)
<500> DW_AT_external : 1
<501> DW_AT_name : (indirect string, offset: 0x23c): KernelMain32
<505> DW_AT_decl_file : 2
<506> DW_AT_decl_line : 79
<507> DW_AT_decl_column : 6
<508> DW_AT_low_pc : 0x8b74
<50c> DW_AT_high_pc : 0x8b97
<510> DW_AT_frame_base : 0x20 (location list)
<514> DW_AT_GNU_all_call_sites: 1
<515> DW_AT_sibling : <0x544>
<2><519>: Abbrev Number: 21 (DW_TAG_variable)
<51a> DW_AT_name : (indirect string, offset: 0x2d6): memAddr
<51e> DW_AT_decl_file : 2
<51f> DW_AT_decl_line : 86
<520> DW_AT_decl_column : 19
<521> DW_AT_type : <0x4f3>
<525> DW_AT_location : 2 byte block: 91 6e (DW_OP_fbreg: -18)
and relevant snippet from objdump --dwarf=loc kernel.elf:
Offset Begin End Expression
00000000 <End of list>
objdump: Warning: There is an overlap [0x8 - 0x0] in .debug_loc section.
00000000 <End of list>
objdump: Warning: There is a hole [0x8 - 0x20] in .debug_loc section.
00000020 00008b74 00008b75 (DW_OP_breg4 (esp): 4)
0000002c 00008b75 00008b77 (DW_OP_breg4 (esp): 8)
00000038 00008b77 00008b97 (DW_OP_breg5 (ebp): 8)
00000044 <End of list>
[...]
These all seem to be what I'd expect. (I'm not sure if the warnings in the last one have significance, though).
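The two pieces of debug information can be combined by hand. A minimal sketch with made-up function names: the operand byte 0x6e of DW_OP_fbreg is a signed LEB128 value, and the frame base inside the function body is ebp + 8 according to the location list. The ebp value in the usage note is inferred from x/hx $ebp-10 printing address 0x8ffee.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed LEB128 value, the encoding used for the DW_OP_fbreg operand. */
int64_t decode_sleb128(const uint8_t *bytes, size_t len)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    size_t i = 0;

    while (i < len && shift < 64) {
        byte = bytes[i++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;                      /* high bit clear: last byte */
    }
    /* Sign-extend when bit 6 of the final byte is set. */
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    return (int64_t)result;
}

/* Address of a variable described as "frame base + offset". */
uint32_t fbreg_address(uint32_t frame_base, int64_t fbreg_offset)
{
    return (uint32_t)((int64_t)frame_base + fbreg_offset);
}
```

The byte 0x6e decodes to -18, and with ebp = 0x8fff8 the frame base is 0x90000, so the variable sits at 0x90000 - 18 = 0x8ffee, i.e. ebp-10.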
Additional Note
If I change compilation flag -gdwarf-2 to just -g I get
gdb> p &memAddr
$1 = (MemAddrLimits *) 0x8ffde
gdb> info addr memAddr
Symbol "memAddr" is a complex DWARF expression:
0: DW_OP_fbreg -18
.
gdb> p memAddr
$2 = {packed = 0, limits = {PhysicalLimit = 0 '\000', LinearLimit = 0 '\000'}}
gdb> p/x $ebp-10
$3 = 0x8ffee
So memAddr is still not resolved correctly but p &memAddr at least is in the stack frame and not somewhere completely different. However, info addr memAddr seems to have problems now...
After some more investigation, I have tracked this down to remote debugging 32-bit code (my kernel not yet having switched to long mode) on an x86-64 qemu emulated system.
If I debug the same code with qemu-system-i386 everything works just as it should.

Why do register_tm_clones and deregister_tm_clones reference an address past the .bss section? Where is this memory allocated?

register_tm_clones and deregister_tm_clones are referencing memory addresses past the end of my RW sections. How is this memory tracked?
Example: In the example below, deregister_tm_clones references memory address 0x601077, but the last RW section we allocated, .bss, starts at 0x601069 and has size 0x7; adding these gives 0x601070. So the reference is clearly past what's been allocated for the .bss section and should be in our heap space, but who is managing it?
objdump -d main
...
0000000000400540 <deregister_tm_clones>:
400540: b8 77 10 60 00 mov $0x601077,%eax
400545: 55 push %rbp
400546: 48 2d 70 10 60 00 sub $0x601070,%rax
40054c: 48 83 f8 0e cmp $0xe,%rax
...
readelf -S main
...
[25] .data PROGBITS 0000000000601040 00001040
0000000000000029 0000000000000000 WA 0 0 16
[26] .bss NOBITS 0000000000601069 00001069
0000000000000007 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00001069
0000000000000058 0000000000000001 MS 0 0 1
[28] .shstrtab STRTAB 0000000000000000 000019f2
000000000000010c 0000000000000000 0 0 1
[29] .symtab SYMTAB 0000000000000000 000010c8
00000000000006c0 0000000000000018 30 47 8
[30] .strtab STRTAB 0000000000000000 00001788
000000000000026a 0000000000000000 0 0 1
Note that the references start exactly at the end of the .bss section. When I examine the memory allocated using gdb, I see that there is plenty of space, so it works fine, but I don't see how this memory is managed.
Start Addr End Addr Size Offset objfile
0x400000 0x401000 0x1000 0x0 /home/nobody/main
0x600000 0x601000 0x1000 0x0 /home/nobody/main
0x601000 0x602000 0x1000 0x1000 /home/nobody/main
0x7ffff7a17000 0x7ffff7bd0000 0x1b9000 0x0 /usr/lib64/libc-2.23.so
I can find no other reference to it in any other sections. There is also no space reserved for it in the segment loaded for .bss:
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000259 0x0000000000000260 RW 200000
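The arithmetic in the question can be re-checked with a small C sketch. The helper names and the 4 KiB page size are assumptions made for illustration (the page size is consistent with the 0x1000-sized mappings listed by gdb above); the addresses and sizes are the ones quoted from readelf.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the 0x1000-sized mappings listed by gdb. */
#define SKETCH_PAGE_SIZE UINT64_C(0x1000)

/* First address past a region that starts at `start` and spans `size` bytes. */
static uint64_t region_end(uint64_t start, uint64_t size)
{
    return start + size;
}

/* Round an address up to the next page boundary. */
static uint64_t page_round_up(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Re-check the numbers quoted in the question. */
static void check_question_numbers(void)
{
    /* .bss: starts at 0x601069, size 0x7. */
    assert(region_end(0x601069, 0x7) == 0x601070);
    /* RW LOAD segment: virtual address 0x600e10, memory size 0x260. */
    assert(region_end(0x600e10, 0x260) == 0x601070);
    /* The page mapping that contains them runs on to the next page boundary. */
    assert(page_round_up(0x601070) == 0x602000);
}
```

Both the .bss section and the RW segment end at 0x601070, while the mapping that contains them extends to 0x602000, which matches the gdb listing.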
Can anyone clarify these functions? Where is the source? I've read all the references on transactional memory, but they cover programming, not implementation. I cannot find a compiler option to remove this code, except of course -nostdlib, which leaves you with nothing.
Are these part of malloc perhaps? Still for code that's not using malloc, threading, or STM, I'm not sure I agree these should be linked into my code.
See also What functions does gcc add to the linux ELF?
More details:
$ make main
cc -c -o main.o main.c
cc -o main main.o
$ which cc
/usr/bin/cc
$ cc --version
cc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cc --verbose
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto
--prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array
--disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC)
It is just very silly pointer arithmetic code generated by gcc for deregister_tm_clones(). It does not actually access the memory at those addresses.
Summary
No accesses are done at those pointers; they just act as address labels, and GCC is being silly about how it compares the two (relocated) addresses.
The two functions are needed as part of transaction support in C and C++. For further details, see GNU libitm.
Background
I'm running Ubuntu 16.04.3 LTS (Xenial Xerus) on x86-64, with GCC versions 4.8.5, 4.9.4, 5.4.1, 6.3.0, and 7.1.0 installed. The register_tm_clones() and deregister_tm_clones() get compiled in from /usr/lib/gcc/x86-64/VERSION/crtbegin.o. For all versions, register_tm_clones() is okay (no odd addresses). For versions 4.9.4, 5.4.1, and 6.3.0, the code for deregister_tm_clones() is the same, and includes a very odd pointer comparison test. The code for deregister_tm_clones() is fixed in 7.1.0, where it is a straightforward address test.
The sources for the two functions are in libgcc/crtstuff.c in the GCC sources.
On this machine, objdump -t /usr/lib/gcc/ARCH/VERSION/crtbegin.o shows .tm_clone_table, __TMC_LIST__, and __TMC_END__, for all GCC versions I mentioned above, so in the GCC sources, both USE_TM_CLONE_REGISTRY and HAVE_GAS_HIDDEN are defined. Thus, we can describe the two functions in C as
#include <stddef.h> /* size_t, NULL */
typedef void (*func_ptr) (void);
/* Weak references: their addresses are zero when libitm is not linked in. */
extern void _ITM_registerTMCloneTable(void *, size_t) __attribute__((weak));
extern void _ITM_deregisterTMCloneTable(void *) __attribute__((weak));
static func_ptr __TMC_LIST__[] = { };
extern func_ptr __TMC_END__[];
void deregister_tm_clones(void)
{
    void (*fn)(void *);
    /* Nothing to do when the clone table is empty. */
    if (__TMC_LIST__ != __TMC_END__) {
        fn = _ITM_deregisterTMCloneTable;
        if (fn != NULL)
            fn(__TMC_LIST__);
    }
}
void register_tm_clones(void)
{
    void (*fn)(void *, size_t);
    size_t size;
    /* The table holds pairs of function pointers. */
    size = (__TMC_END__ - __TMC_LIST__) / 2;
    if (size > 0) {
        fn = _ITM_registerTMCloneTable;
        if (fn != NULL)
            fn(__TMC_LIST__, size);
    }
}
Essentially, __TMC_LIST__ is an array of function pointers, and size is the number of function pointer pairs in the array. If the array is not empty, _ITM_registerTMCloneTable() or _ITM_deregisterTMCloneTable() is called; both are defined in libitm.a (GNU libitm). When the _ITM_registerTMCloneTable/_ITM_deregisterTMCloneTable symbols are not defined, the relocation code yields zero as their address.
So, when the array is empty, and/or the _ITM_registerTMCloneTable/_ITM_deregisterTMCloneTable symbols are not defined, the code does nothing: only some fancy pointer arithmetic.
Note that the code does not load the pointer values from any memory address. The addresses (of __TMC_LIST__, __TMC_END__, _ITM_registerTMCloneTable, and _ITM_deregisterTMCloneTable) are supplied by the linker/relocator, as immediate 32-bit literals in the code. (This is why, if you look at the disassembly of the object files, you see only zeros for the addresses.)
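The "undefined weak symbol yields a zero address" behaviour can be sketched in isolation. This assumes GCC or Clang on an ELF target; hypothetical_optional_hook is a made-up name that is deliberately never defined, and call_hook_if_present is a helper invented for this example, not part of libgcc or libitm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: declared weak and deliberately never defined, so on an
   ELF target the linker resolves its address to zero instead of failing. */
extern void hypothetical_optional_hook(void *table) __attribute__((weak));

/* Calls the hook only if it was linked in.
   Returns 1 if the hook was called, 0 if it was absent. */
static int call_hook_if_present(void *table)
{
    void (*fn)(void *) = hypothetical_optional_hook;

    if (fn != NULL) {
        fn(table);
        return 1;
    }
    return 0;
}

/* Self-check for this sketch: nothing defines the hook, so it must be absent. */
static void check_hook_is_absent(void)
{
    assert(call_hook_if_present(NULL) == 0);
}
```

This is the same pattern as the fn != NULL tests above: the program links and runs without the optional library, and the call is simply skipped.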
Investigation
The problematic code for deregister_tm_clones occurs at the very beginning:
004008c0 <deregister_tm_clones>:
4008c0: b8 57 bb 6c 00 mov $0x6cbb57,%eax
4008c5: 55 push %rbp
4008c6: 48 2d 50 bb 6c 00 sub $0x6cbb50,%rax
4008cc: 48 83 f8 0e cmp $0xe,%rax
4008d0: 48 89 e5 mov %rsp,%rbp
4008d3: 76 1b jbe 4008f0 <deregister_tm_clones+0x30>
4008d5: b8 00 00 00 00 mov $0x0,%eax
4008da: 48 85 c0 test %rax,%rax
4008dd: 74 11 je 4008f0 <deregister_tm_clones+0x30>
4008df: 5d pop %rbp
4008e0: bf 50 bb 6c 00 mov $0x6cbb50,%edi
4008e5: ff e0 jmpq *%rax
4008e7: (9-byte NOP)
4008f0: 5d pop %rbp
4008f1: c3 retq
4008f2: (14-byte NOP)
400900:
(This particular example comes from compiling a basic Hello, World! example in C using gcc-6.3.0 on x86-64 statically).
If we look at the section headers (objdump -h) for the same binary, we see that addresses 0x6cbb50 to 0x6cbb5f are actually not covered by any section:
24 .data 00001ad0 00000000006ca080 00000000006ca080 000ca080 2**5
25 .bss 00001878 00000000006cbb60 00000000006cbb60 000cbb50 2**5
i.e. .data covers addresses 0x6ca080 to 0x6cbb4f, and .bss covers 0x6cbb60 to 0x6cd3d7.
It would seem like the assembly code is using invalid addresses!
However, the 0x6cbb50 address is quite valid, because there is a zero-size hidden symbol at that address (objdump -t):
006cbb50 g O .data 0000000000000000 .hidden __TMC_END__
Because I compiled the binary statically, the __TMC_END__ symbol is part of the .data segment here; normally, it is in .bss. In any case, it does not matter, because __TMC_END__ symbol is of zero size: We can use its address as part of whatever calculations we want, we just cannot dereference it, because it contains no data, having zero size.
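A similar use of symbols purely as address labels can be tried with the linker-provided edata and end symbols. This is only an analogy to __TMC_END__, and it assumes a GNU/Linux toolchain where these symbols exist (they are documented in the end(3) manual page but are not standardized); the function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Linker-provided labels on GNU/Linux (see end(3)): edata marks the end of
   initialized data, end marks the end of .bss. Only their addresses are
   meaningful; there is no object of a known size behind them. */
extern char edata, end;

/* Distance in bytes between the two labels, computed purely from their
   addresses; neither symbol is ever dereferenced. */
static uintptr_t bytes_between_edata_and_end(void)
{
    return (uintptr_t)&end - (uintptr_t)&edata;
}

/* The end of initialized data never lies beyond the end of .bss. */
static void check_labels_are_ordered(void)
{
    assert((uintptr_t)&edata <= (uintptr_t)&end);
}
```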
This leaves the very first relocated address in the deregister_tm_clones function, 0x6cbb57 in this case.
If we look at what the code actually does with that value, it turns out that for some braindead reason, the compiled binary code is essentially computing
unsigned long temporary = relocated__TMC_END__address + 7;
unsigned long difference = temporary - relocated__TMC_LIST__address;
if (difference <= 14)
    return;
Because the comparison used (jbe) is an unsigned comparison, the above behaves exactly the same as
long difference = relocated__TMC_END__address - relocated__TMC_LIST__address;
if (difference >= -7 && difference <= 7)
    return;
that is, it returns whenever the two addresses are less than one 8-byte pointer apart.
In any case, it is obvious that __TMC_LIST__ == __TMC_END__, and that the relocated addresses are the same, in both OP's binary, and the binary above.
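The check in the disassembly can be modelled in C to see that it is only a roundabout "is the table empty?" test. This is a sketch under the assumptions that table entries are 8 bytes wide (x86-64) and that cmp/jbe is read as an unsigned comparison; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the machine code: ((end + 7) - list) <= 14, compared unsigned (jbe). */
static bool empty_as_compiled(uint64_t list_addr, uint64_t end_addr)
{
    return (end_addr + 7) - list_addr <= 14;
}

/* Models the C-level intent: the distance in 8-byte entries is zero. */
static bool empty_as_written(uint64_t list_addr, uint64_t end_addr)
{
    int64_t byte_diff = (int64_t)(end_addr - list_addr);
    return byte_diff / 8 == 0;
}

/* Both forms agree for every byte distance near zero. */
static void check_forms_agree(void)
{
    const uint64_t list_addr = 0x6cbb50; /* address taken from the listing above */

    for (int64_t d = -32; d <= 32; d++) {
        uint64_t end_addr = list_addr + (uint64_t)d;
        assert(empty_as_compiled(list_addr, end_addr) ==
               empty_as_written(list_addr, end_addr));
    }
    /* Identical addresses, as in both binaries discussed here: table is empty. */
    assert(empty_as_compiled(list_addr, list_addr));
}
```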
Addendum
I do not know exactly why GCC generates
if ((__TMC_END__ + 7) - __TMC_LIST__ <= 14)
rather than
if (__TMC_END__ <= __TMC_LIST__)
but in GCC bug 77813 Marc Glisse does mention that it (the former above) is indeed what GCC ends up generating. (The bug itself is not directly related to this, as it is about GCC optimizing the expression to zero, affecting only libitm users, and easily fixed.)
Also, between gcc-6.3.0 and gcc-7.1.0, when the generated code dropped that inanity, the C sources for the functions did not change. What changed is how GCC generates code (in some situations) for such pointer comparisons.

What do R_X86_64_32S and R_X86_64_64 relocation mean?

Got the following error when I tried to compile a C application in 64-bit FreeBSD:
relocation R_X86_64_32S can not be used when making a shared object; recompile with -fPIC
What is R_X86_64_32S relocation and what is R_X86_64_64?
I've googled the error and its possible causes. It'd be great if anyone could tell me what R_X86_64_32S really means.
The R_X86_64_32S and R_X86_64_64 are names of relocation types, for code compiled for the amd64 architecture. You can look all of them up in the amd64 ABI.
According to it, R_X86_64_64 is broken down to:
R_X86_64 - all names are prefixed with this
64 - Direct 64 bit relocation
and R_X86_64_32S to:
R_X86_64 - prefix
32S - truncate value to 32 bits and sign-extend
which basically means "the value of the symbol pointed to by this relocation, plus any addend", in both cases. For R_X86_64_32S the linker then verifies that the generated value sign-extends to the original 64-bit value.
Now, in an executable file, the code and data segments are given a specified virtual base address. The executable code is not shared, and each executable gets its own fresh address space. This means that the compiler knows exactly where the data section will be, and can reference it directly. Libraries, on the other hand, can only know that their data section will be at a specified offset from the base address; the value of that base address can only be known at runtime. Hence, all libraries must be produced with code that can execute no matter where it is put into memory, known as position independent code (or PIC for short).
Now when it comes to resolving your problem, the error message speaks for itself.
For any of this to make sense, you must first:
see a minimal example of relocation: https://stackoverflow.com/a/30507725/895245
understand the basic structure of an ELF file: https://stackoverflow.com/a/30648229/895245
Standards
R_X86_64_64, R_X86_64_32 and R_X86_64_32S are all defined by the System V AMD64 ABI, which contains the AMD64 specifics of the ELF file format.
They are all possible values for the ELF32_R_TYPE field of a relocation entry, specified in the System V ABI 4.1 (1997), which specifies the architecture-neutral parts of the ELF format. That standard only specifies the field, but not its arch-dependent values.
Under 4.4.1 "Relocation Types" we see the summary table:
Name          Field   Calculation
------------  ------  -----------
R_X86_64_64   word64  A + S
R_X86_64_32   word32  A + S
R_X86_64_32S  word32  A + S
We will explain this table later.
And the note:
The R_X86_64_32 and R_X86_64_32S relocations truncate the computed value to 32-bits. The linker must verify that the generated value for the R_X86_64_32 (R_X86_64_32S) relocation zero-extends (sign-extends) to the original 64-bit value.
Example of R_X86_64_64 and R_X86_64_32
Let's first look into R_X86_64_64 and R_X86_64_32:
.section .text
/* Both a and b contain the address of s. */
a: .long s
b: .quad s
s:
Then:
as --64 -o main.o main.S
objdump -dzr main.o
Contains:
0000000000000000 <a>:
0: 00 00 add %al,(%rax)
0: R_X86_64_32 .text+0xc
2: 00 00 add %al,(%rax)
0000000000000004 <b>:
4: 00 00 add %al,(%rax)
4: R_X86_64_64 .text+0xc
6: 00 00 add %al,(%rax)
8: 00 00 add %al,(%rax)
a: 00 00 add %al,(%rax)
Tested on Ubuntu 14.04, Binutils 2.24.
Ignore the disassembly for now (which is meaningless since this is data), and look only at the labels, bytes and relocations.
The first relocation:
0: R_X86_64_32 .text+0xc
Which means:
0: acts on byte 0 (label a)
R_X86_64_: prefix used by all relocation types of the AMD64 system V ABI
32: the 64-bit address of the label s is truncated to a 32 bit address because we only specified a .long (4 bytes)
.text: we are on the .text section
0xc: this is the addend, which is a field of the relocation entry
The address of the relocation is calculated as:
A + S
Where:
A: the addend, here 0xC
S: the value of the symbol before relocation, here 00 00 00 00 == 0
Therefore, after relocation, the new address will be 0xC == 12 bytes into the .text section.
This is exactly what we expect, since s comes after a .long (4 bytes) and a .quad (8 bytes).
R_X86_64_64 is analogous, but simpler, since here there is no need to truncate the address of s. This is indicated by the standard through word64 instead of word32 on the Field column.
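As a rough sketch of what the linker does with these two entries: it computes A + S and writes the result into a 4-byte (word32) or 8-byte (word64) field. The function names and the 0x400000 base address for .text are assumptions chosen only for illustration; a real linker also performs the range checks described in the ABI note.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value stored by R_X86_64_64, R_X86_64_32 and R_X86_64_32S: A + S. */
static uint64_t reloc_value(uint64_t symbol_value, int64_t addend)
{
    return symbol_value + (uint64_t)addend;
}

/* word64 field: store all 8 bytes. */
static void patch_word64(unsigned char *field, uint64_t value)
{
    memcpy(field, &value, sizeof value);
}

/* word32 field: store only the low 4 bytes (this is the truncation). */
static void patch_word32(unsigned char *field, uint64_t value)
{
    uint32_t low = (uint32_t)value;
    memcpy(field, &low, sizeof low);
}

/* Replays the example: a (.long) at offset 0, b (.quad) at offset 4, s at 0xc. */
static void check_reloc_example(void)
{
    unsigned char text[12] = {0};
    const uint64_t text_base = 0x400000; /* hypothetical address of .text */
    uint64_t s = reloc_value(text_base, 0xc);
    uint32_t a;
    uint64_t b;

    patch_word32(text + 0, s);
    patch_word64(text + 4, s);
    memcpy(&a, text + 0, sizeof a);
    memcpy(&b, text + 4, sizeof b);
    assert(a == 0x40000c && b == 0x40000c);
}
```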
R_X86_64_32S vs R_X86_64_32
The difference between R_X86_64_32S vs R_X86_64_32 is when the linker will complain "with relocation truncated to fit":
32: complains if the value truncated by the relocation does not zero-extend back to the original value, i.e. the truncated bytes must be zero:
E.g.: FF FF FF FF 80 00 00 00 to 80 00 00 00 generates a complaint because FF FF FF FF is not zero.
32S: complains if the value truncated by the relocation does not sign-extend back to the original value.
E.g.: FF FF FF FF 80 00 00 00 to 80 00 00 00 is fine, because the most significant bit of 80 00 00 00 and the truncated bits are all 1.
See also: What does this GCC error "... relocation truncated to fit..." mean?
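The two checks can be written down as predicates. This is a sketch of the rule quoted from the ABI, not linker source code; fits_32 and fits_32s are names invented here, and the narrowing casts assume the usual two's-complement behaviour of GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the truncated value must zero-extend back to the original. */
static bool fits_32(uint64_t value)
{
    return (uint64_t)(uint32_t)value == value;
}

/* R_X86_64_32S: the truncated value must sign-extend back to the original. */
static bool fits_32s(uint64_t value)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)value == value;
}

static void check_truncation_examples(void)
{
    /* FF FF FF FF 80 00 00 00: rejected by 32, accepted by 32S. */
    assert(!fits_32(UINT64_C(0xFFFFFFFF80000000)));
    assert(fits_32s(UINT64_C(0xFFFFFFFF80000000)));
    /* A small positive address satisfies both. */
    assert(fits_32(UINT64_C(0x40000c)) && fits_32s(UINT64_C(0x40000c)));
    /* 0x80000000 zero-extends but does not sign-extend. */
    assert(fits_32(UINT64_C(0x80000000)) && !fits_32s(UINT64_C(0x80000000)));
}
```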
R_X86_64_32S can be generated with:
.section .text
.global _start
_start:
mov s, %eax
s:
Then:
as --64 -o main.o main.S
objdump -dzr main.o
Gives:
0000000000000000 <_start>:
0: 8b 04 25 00 00 00 00 mov 0x0,%eax
3: R_X86_64_32S .text+0x7
Now we can observe the "relocation" truncated to fit on 32S with a linker script:
SECTIONS
{
. = 0xFFFFFFFF80000000;
.text :
{
*(*)
}
}
Now:
ld -Tlink.ld main.o
Is fine, because: 0xFFFFFFFF80000000 gets truncated into 80000000, which is a sign extension.
But if we change the linker script to:
. = 0xFFFF0FFF80000000;
It now generates the error, because that 0 made it not be a sign extension anymore.
Rationale for using 32S for memory access but 32 for immediates: When is it better for an assembler to use sign extended relocation like R_X86_64_32S instead of zero extension like R_X86_64_32?
R_X86_64_32S and PIE (position independent executables)
R_X86_64_32S cannot be used in position independent executables, e.g. done with gcc -pie, otherwise link fails with:
relocation R_X86_64_32S against `.text' can not be used when making a PIE object; recompile with -fPIC
I have provided a minimal example explaining it at: What is the -fPIE option for position-independent executables in gcc and ld?
That means that you compiled a shared object without using the -fPIC flag, as you should have:
gcc -shared foo.c -o libfoo.so # Wrong
You need to call
gcc -shared -fPIC foo.c -o libfoo.so # Right
On ELF platforms (Linux), shared objects are compiled as position independent code: code that can run from any location in memory. If this flag is not given, the generated code is position dependent, so it cannot be used to build the shared object.
I ran into this problem and found this answer didn't help me. I was trying to link a static library together with a shared library. I also investigated putting the -fPIC switch earlier on the command line (as advised in answers elsewhere).
The only thing that fixed the problem, for me, was changing the static library to shared. I suspect the error message about -fPIC can happen due to a number of causes but fundamentally what you want to look at is how your libraries are being built, and be suspicious of libraries that are being built in different ways.
In my case the issue arose because the program being compiled expected to find shared libraries in a remote directory, while only the corresponding static libraries were there by mistake.
Actually, this relocation error was a file-not-found error in disguise.
I have detailed how I coped with it in this other thread https://stackoverflow.com/a/42388145/5459638
The above answer demonstrates what these relocations are. I found that building x86_64 objects with GCC's -mcmodel=large flag can prevent R_X86_64_32S, because the compiler makes no assumption about the relocated address in this model.
In the following case:
extern int myarr[];
int test(int i)
{
return myarr[i];
}
Built with gcc -O2 -fno-pie -c test_array.c and disassembled with objdump -drz test_array.o, we have:
0: 48 63 ff movslq %edi,%rdi
3: 8b 04 bd 00 00 00 00 mov 0x0(,%rdi,4),%eax
6: R_X86_64_32S myarr
a: c3 ret
With -mcmodel=large, i.e. gcc -mcmodel=large -O2 -fno-pie -c test_array.c, we have:
0: 48 b8 00 00 00 00 00 movabs $0x0,%rax
7: 00 00 00
2: R_X86_64_64 myarr
a: 48 63 ff movslq %edi,%rdi
d: 8b 04 b8 mov (%rax,%rdi,4),%eax
10: c3 ret

Resources