What does this GCC error "... relocation truncated to fit..." mean?

I am programming the host side of a host-accelerator system. The host runs on the PC under Ubuntu Linux and communicates with the embedded hardware via a USB connection. The communication is performed by copying memory chunks to and from the embedded hardware's memory.
On the board's memory there is a memory region which I use as a mailbox where I write and read the data. The mailbox is defined as a structure and I use the same definition to allocate a mirror mailbox in my host space.
I used this technique successfully in the past so now I copied the host Eclipse project to my current project's workspace, and made the appropriate name changes. The strange thing is that when building the host project I now get the following message:
Building target: fft2d_host
Invoking: GCC C Linker
gcc -L/opt/adapteva/esdk/tools/host/x86_64/lib -o "fft2d_host" ./src/fft2d_host.o -le_host -lrt
./src/fft2d_host.o: In function `main':
fft2d_host.c:(.text+0x280): relocation truncated to fit: R_X86_64_PC32 against symbol `Mailbox' defined in COMMON section in ./src/fft2d_host.o
What does this error mean, and why won't it build in the current project when the older project builds fine?

You are attempting to link your project in such a way that the target of a relative addressing scheme is further away than can be supported with the 32-bit displacement of the chosen relative addressing mode. This could be because the current project is larger, because it is linking object files in a different order, or because there's an unnecessarily expansive mapping scheme in play.
This question is a perfect example of why it's often productive to do a web search on the generic portion of an error message - you find things like this:
http://www.technovelty.org/code/c/relocation-truncated.html
It offers some curative suggestions.
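To make the "32-bit displacement" limit concrete: for an R_X86_64_PC32 relocation the x86-64 psABI defines the stored value as S + A - P (symbol address plus addend minus the address of the field being patched), and that value must fit in a signed 32-bit field. The sketch below is a minimal illustration of that check; pc32_fits is a hypothetical helper, and the addend of -4 used in the tests is the typical value for a 4-byte displacement at the end of an instruction, not something taken from the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would an R_X86_64_PC32 relocation with symbol
   address S, addend A and patch location P be resolvable?
   The linker stores S + A - P in a signed 32-bit field and reports
   "relocation truncated to fit" when the value does not fit. */
static bool pc32_fits(uint64_t S, int64_t A, uint64_t P) {
    /* Add in unsigned arithmetic (wraps modulo 2^64), then reinterpret
       the result as signed; GCC and Clang define this conversion as
       modulo 2^64. */
    int64_t value = (int64_t)(S + (uint64_t)A - P);
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

For the Mailbox error, S would be the address of Mailbox (a COMMON/.bss symbol) and P an address inside main's .text; the check fails once the two end up more than about 2 GiB apart.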

Minimal example that generates the error
main.S moves an address into %eax (32-bit).
main.S
_start:
mov $_start, %eax
linker.ld
SECTIONS
{
/* This says where `.text` will go in the executable. */
. = 0x100000000;
.text :
{
*(*)
}
}
Compile on x86-64:
as -o main.o main.S
ld -o main.out -T linker.ld main.o
Outcome of ld:
(.text+0x1): relocation truncated to fit: R_X86_64_32 against `.text'
Keep in mind that:
as puts everything in .text if no other section is specified.
ld uses the start of .text as the default entry point if there is no ENTRY command (and no global _start symbol). Here _start is the very first byte of .text, so it is the entry point.
How to fix it: use this linker.ld instead, and subtract 1 from the start:
SECTIONS
{
. = 0xFFFFFFFF;
.text :
{
*(*)
}
}
Notes:
We cannot make _start global in this example with .global _start, otherwise it still fails. I think this happens because global symbols have alignment constraints (0xFFFFFFF0 works). TODO where is that documented in the ELF standard?
The .text segment also has an alignment constraint of p_align == 2M. But our linker is smart enough to place the segment at 0xFFE00000, fill with zeros until 0xFFFFFFFF and set e_entry == 0xFFFFFFFF. This works, but generates an oversized executable.
Tested on Ubuntu 14.04 AMD64, Binutils 2.24.
Explanation
First you must understand what relocation is with a minimal example: https://stackoverflow.com/a/30507725/895245
Next, take a look at objdump -Sr main.o:
0000000000000000 <_start>:
0: b8 00 00 00 00 mov $0x0,%eax
1: R_X86_64_32 .text
If we look into how instructions are encoded in the Intel manual, we see that:
b8 says that this is a mov to %eax
0 is an immediate value to be moved to %eax. Relocation will then modify it to contain the address of _start.
When moving to 32-bit registers, the immediate must also be 32-bit.
But here, the relocation has to modify those 32-bit to put the address of _start into them after linking happens.
0x100000000 does not fit into 32-bit, but 0xFFFFFFFF does. Thus the error.
This error can only happen on relocations that generate truncation, e.g. R_X86_64_32 (8 bytes to 4 bytes), but never on R_X86_64_64.
And there are some types of relocation that require sign extension instead of zero extension as shown here, e.g. R_X86_64_32S. See also: https://stackoverflow.com/a/33289761/895245
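The difference between these relocation types can be expressed as range checks on the final 64-bit value. This is a minimal sketch assuming the psABI semantics (R_X86_64_32 is zero-extended when used, R_X86_64_32S is sign-extended, R_X86_64_64 stores the full value); the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the 32-bit field is zero-extended back to 64 bits,
   so the value must be below 2^32. */
static bool fits_r_x86_64_32(uint64_t value) {
    return value <= UINT32_MAX;
}

/* R_X86_64_32S: the 32-bit field is sign-extended, so the value,
   read as a signed 64-bit number, must lie in [-2^31, 2^31 - 1]. */
static bool fits_r_x86_64_32s(uint64_t value) {
    int64_t v = (int64_t)value; /* modulo conversion on GCC/Clang */
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* R_X86_64_64: the field is 64 bits wide, so nothing is truncated. */
static bool fits_r_x86_64_64(uint64_t value) {
    (void)value;
    return true;
}
```

This mirrors the example above: placing .text at 0x100000000 fails the R_X86_64_32 check while 0xFFFFFFFF passes, and the same 0xFFFFFFFF would fail if the relocation were R_X86_64_32S.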
R_AARCH64_PREL32
Asked at: How to prevent "main.o:(.eh_frame+0x1c): relocation truncated to fit: R_AARCH64_PREL32 against `.text'" when creating an aarch64 baremetal program?

I ran into this problem while building a program that requires a huge amount of stack space (over 2 GiB). The solution was to add the flag -mcmodel=medium, which is supported by both GCC and Intel compilers.

On Cygwin, -mcmodel=medium is already the default and doesn't help. For me, adding -Wl,--image-base -Wl,0x10000000 to the GCC linker flags fixed the error.

Often, this error means your program is too large, and often it's too large because it contains one or more very large data objects. For example,
char large_array[1ul << 31];
int other_global;
int main(void) { return other_global; }
will produce a "relocation truncated to fit" error on x86-64/Linux, if compiled in the default mode and without optimization. (If you turn on optimization, it could, at least theoretically, figure out that large_array is unused and/or that other_global is never written, and thus generate code that doesn't trigger the problem.)
What's going on is that, by default, GCC uses its "small code model" on this architecture, in which all of the program's code and statically allocated data must fit into the lowest 2GB of the address space. (The precise upper limit is something like 2GB - 2MB, because the very lowest 2MB of any program's address space is permanently unusable. If you are compiling a shared library or position-independent executable, all of the code and data must still fit into two gigabytes, but they're not nailed to the bottom of the address space anymore.) large_array consumes all of that space by itself, so other_global is assigned an address above the limit, and the code generated for main cannot reach it. You get a cryptic error from the linker, rather than a helpful "large_array is too large" error from the compiler, because in more complex cases the compiler can't know that other_global will be out of reach, so it doesn't even try for the simple cases.
Most of the time, the correct response to getting this error is to refactor your program so that it doesn't need gigantic static arrays and/or gigabytes of machine code. However, if you really have to have them for some reason, you can use the "medium" or "large" code models to lift the limits, at the price of somewhat less efficient code generation. These code models are x86-64-specific; something similar exists for most other architectures, but the exact set of "models" and the associated limits will vary. (On a 32-bit architecture, for instance, you might have a "small" model in which the total amount of code and data was limited to something like 2^24 bytes.)
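As a sketch of the refactoring route, the buffer from the example above can be moved from static storage to the heap, so that only a pointer lives in .bss and other_global stays within reach of code built with the small code model. The helpers init_large_array and free_large_array are hypothetical names for illustration, and whether a 2 GiB allocation actually succeeds depends on the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Instead of: char large_array[1ul << 31];
   only this pointer occupies static storage. */
static char *large_array;
int other_global;

/* Allocates a zeroed buffer of `size` bytes at run time.
   Returns 0 on success, -1 if the allocation failed. */
static int init_large_array(size_t size) {
    large_array = calloc(size, 1);
    return large_array != NULL ? 0 : -1;
}

/* Releases the buffer and clears the pointer. */
static void free_large_array(void) {
    free(large_array);
    large_array = NULL;
}
```

In main you would call init_large_array((size_t)1 << 31), check for failure, and then use large_array as before.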

Remember to tackle error messages in order. In my case, the error above this one was "undefined reference", and I visually skipped over it to the more interesting "relocation truncated" error. In fact, my problem was an old library that was causing the "undefined reference" message. Once I fixed that, the "relocation truncated" went away also.

I may be wrong, but in my experience there's another possible reason for the error, the root cause being a compiler (or platform) limitation which is easy to reproduce and work around. Here is the simplest example:
Define an array of 1GB with:
char a[1024 * 1024 * 1024];
Result: it works, no warnings. Naturally, 1073741824 can be used instead of the triple product.
Double the previous array:
char a[2 * 1024 * 1024 * 1024];
Result in GCC: "error: size of array 'a' is negative" => That's a hint that the size expression is evaluated as a signed int, and the product overflows it
Based on the previous, cast the argument:
char a[(unsigned)2 * 1024 * 1024 * 1024];
Result: error relocation truncated to fit appears, along with this warning: "integer overflow in expression of type 'int'"
Workaround: use dynamic memory. The function malloc() takes an argument of type size_t, an unsigned type wide enough to hold the size of any object (unsigned long long on 64-bit Windows), thus avoiding the limitation.
This has been my experience using GCC on Windows. Just my 2 cents.
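The "negative size" and overflow messages above come from evaluating the size expression in int, which is 32 bits on these platforms and tops out at 2^31 - 1. A minimal sketch of doing the arithmetic in size_t instead and passing the result to malloc(), as the workaround suggests; gib, gib_fits_in_int and alloc_gib are hypothetical helpers, and the sketch assumes a 64-bit size_t.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Size of n GiB, computed in size_t (n is size_t, so the whole product
   is size_t) rather than int, where 2 * 1024 * 1024 * 1024 overflows. */
static size_t gib(size_t n) {
    return n * 1024u * 1024u * 1024u;
}

/* Would n GiB still be representable as a plain int? */
static bool gib_fits_in_int(size_t n) {
    return gib(n) <= (size_t)INT_MAX;
}

/* Heap allocation is not subject to the code model's limit on static
   data. Returns NULL on failure; the caller must free() the result. */
static char *alloc_gib(size_t n) {
    return malloc(gib(n));
}
```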

I encountered the "relocation truncated" error on a MIPS machine. The -mcmodel=medium flag is not available on MIPS; instead, -mxgot may help there.

I ran into the exact same issue. After compiling without the -fexceptions build flag, the file compiled with no issue

I ran into this error on 64-bit Windows when linking a C++ program which called a NASM function. I used NASM for assembly and g++ to compile the C++ and to link.
In my case this error meant I needed DEFAULT REL at the top of my nasm assembler code.
It's written up in the NASM documentation:
Chapter 11: Writing 64-bit Code (Unix, Win64)
Obvious in retrospect, but it took me days to arrive there, so I decided to post this.
This is a minimal version of the C++ program:
extern "C" { void matmul(void); }
int main(void) {
matmul();
return 0;
}
This is a minimal version of the nasm program:
; "DEFAULT REL" means we can access data in .bss, .data etc
; because we generate position-independent code in 64-bit "flat" memory model.
; see NASM docs
; Chapter 11: Writing 64-bit Code (Unix, Win64)
;DEFAULT REL
global matmul
section .bss
align 32 ; because we want to move 256 bit packed aligned floats to and from it
saveregs resb 32
section .text
matmul:
push rbp ; prologue: save the caller's frame pointer
mov rbp,rsp ; set up our own frame pointer
; preserve ymm6 in the static (.bss) variable 'saveregs'
vmovaps [saveregs], ymm6
; restore ymm6 from 'saveregs'
vmovaps ymm6, [saveregs]
mov rsp,rbp ; epilogue: restore the stack pointer
pop rbp ; restore the caller's frame pointer
ret
With DEFAULT REL commented out, I got the error message above:
g++ -std=c++11 -c SO.cpp -o SOcpp.o
\bin\nasm -f win64 SO.asm -o SOnasm.obj
g++ SOcpp.o SOnasm.obj -o SO.exe
SOnasm.obj:SO.asm:(.text+0x9): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
SOnasm.obj:SO.asm:(.text+0x12): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
collect2.exe: error: ld returned 1 exit status

With GCC, there's a -Wl,--default-image-base-low option that sometimes helps to deal with such errors, e.g. in some MSYS2 / MinGW configurations.

Related

Static executable segfaults if location counter is initialized as too small or too large in linker script

I'm trying to generate a static executable for this program (with musl):
main.S:
.section .text
.global main
main:
mov $msg, %rdi
mov $0, %rax
call printf
mov %rax, %rdi
mov $60, %rax
syscall
msg:
.ascii "hello world from printf\n\0"
Compilation command:
clang -g -c main.S -o main.o
Linking command (musl libc is placed in musl directory (version 1.2.1)):
ld main.o musl/crt1.o -o sm -Tstatic.ld -static -lc -lm -Lmusl
Linker script (static.ld):
ENTRY(_start)
SECTIONS
{
. = 0x100e8;
}
This config results in a working executable, but if I change the location counter offset to 0x10000 or 0x20000, the resulting executable crashes during startup with a segfault. On debugging I found that musl initialization code tries to read the program headers (location received in aux vector), and for some reason the memory address of program header as given by aux vector is unmapped in our address space.
What is the cause of this behavior? What exactly is the counter offset in a linker script? How does it affect the linker output other than altering the load address?
Note: The segfault occurs when the musl initialization code tries to access program headers.
There are a few issues here.
Your main.S has a stack alignment bug: on x86_64, you must realign the stack to 16-byte boundary before calling any other function (you can assume 8-byte alignment on entry).
Without this, I get a crash inside printf due to movaps %xmm0,0x40(%rsp) with misaligned $rsp.
Your link order is wrong: crt1.o should be linked before main.o
When you don't leave SIZEOF_HEADERS == 0xe8 space before starting your .text section, you are leaving it up to the linker to put program headers elsewhere, and it does. The trouble is: musl (and a lot of other code) assumes that the file header and program headers are mapped in (but the ELF format doesn't require this). So they crash.
The right way to specify start address:
ENTRY(_start)
SECTIONS
{
. = 0x10000 + SIZEOF_HEADERS;
}
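The 0xe8 figure is consistent with a 64-byte ELF64 file header followed by three 56-byte program headers (64 + 3 × 56 = 232 = 0xe8). A quick check against <elf.h>; the count of three program headers is an assumption about this particular binary, since the real count depends on what the linker emits, and elf64_headers_size is a hypothetical helper.

```c
#include <elf.h>
#include <stddef.h>

/* Bytes occupied by an ELF64 file header followed immediately by
   `phnum` program headers, which is what SIZEOF_HEADERS amounts to
   for a 64-bit ELF output with that many program headers. */
static size_t elf64_headers_size(size_t phnum) {
    return sizeof(Elf64_Ehdr) + phnum * sizeof(Elf64_Phdr);
}
```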
Update:
Why does the order matter?
Linkers (in general) will assemble initializers (constructors) left to right. When you call standard C library routines from main(), you expect the standard library to have initialized itself before main() was called. Code in crt1.o is responsible for performing such initialization.
If you link in the wrong order: crt1.o after main.o, construction may not happen correctly. Whether you'll be able to observe this depends on implementation details of the standard library, and exactly what parts of it you are using. So your binary may appear to work correctly. But it is still better to link objects in the correct order.
I'm leaving 0x10000 space, isn't it enough for headers?
You are interfering with the built-in default linker script, and instead giving it incomplete specification of how to lay out your program in memory. When you do so, you need to know how the linker will react. Different linkers will react differently.
The binutils ld reacts by not emitting a LOAD segment covering program headers. The ld.lld reacts differently -- it actually moves .text past program headers.
The resulting binaries still crash though, because the binary layout is not what the kernel expects, and the kernel-supplied AT_PHDR address in the aux vector is wrong.
It looks like the kernel expects the first LOAD segment to be the one which contains program headers. Arguably that is a bug in the kernel -- nothing in the ELF spec requires this. But all normal binaries do have program headers in the first LOAD segment, so you'll just have to do the same (or convince kernel developers to add code to handle your weird binary layout).
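To see the assumption described above from the program's side, here is a minimal sketch for 64-bit Linux (getauxval() is available in glibc 2.16+ and musl) that reads the program-header table through the AT_PHDR/AT_PHNUM aux-vector entries, as musl's startup code does, and checks whether it lies inside one of the executable's PT_LOAD segments. phdrs_are_mapped is a hypothetical name. In a binary laid out like the crashing one, the first read of phdr would fault just as musl does, so this is a diagnostic for well-formed binaries rather than a fix.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Assumes a 64-bit target (Elf64_Phdr). */
static bool phdrs_are_mapped(void) {
    const Elf64_Phdr *phdr = (const Elf64_Phdr *)getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);
    if (phdr == NULL || phnum == 0)
        return false;

    /* Load bias = runtime address - link-time address. PT_PHDR, if
       present, records the link-time address of the table itself;
       without it (static non-PIE) the bias is 0. */
    uintptr_t bias = 0;
    for (unsigned long i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            bias = (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;

    /* Is [lo, hi) fully inside some PT_LOAD segment? */
    uintptr_t lo = (uintptr_t)phdr;
    uintptr_t hi = lo + phnum * sizeof *phdr;
    for (unsigned long i = 0; i < phnum; i++) {
        if (phdr[i].p_type != PT_LOAD)
            continue;
        uintptr_t seg_lo = bias + (uintptr_t)phdr[i].p_vaddr;
        uintptr_t seg_hi = seg_lo + (uintptr_t)phdr[i].p_memsz;
        if (lo >= seg_lo && hi <= seg_hi)
            return true;
    }
    return false;
}
```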

Is it possible in practice to compile millions of small functions into a static binary?

I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function, using GCC (tested 4.8.5 or 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would "need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also Glibc mailing list and Redhat Bugzilla)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which uses RIP-relative addressing. The x86-64 RIP-relative addressing mode only supports a signed 32-bit displacement, see AMD64 Manual 3. So ld couldn't resolve the R_X86_64_GOTPCREL relocation once the code size surpassed 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
Patching
In short, you have to replace 32-bit relocations in the Assembly code. See System V Application Binary Interface AMD64 Architecture Process Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute — R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute the symbol entry GOT base offset and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ operand of movabs produces an R_X86_64_GOTPC64 relocation, which specifies the GOT's offset relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base residing on the %r15 register, now we have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with R_X86_64_GOTPC64 (from _GLOBAL_OFFSET_TABLE_) and R_X86_64_GOT64. Replace others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
Testing
Verify the patch correctness using the following test that requires the large code model. It doesn't contain a million small functions, having one huge function and one small function instead.
Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also, add flags -O0 -fPIC -static, link with gold.
extern int foo();
extern int bar();
int foo(){
bar();
// Call sys_exit
asm( "mov $0x3c, %%rax \n"
"xor %%rdi, %%rdi \n"
"syscall \n"
".zero 1 << 32 \n"
: : : "rax", "rdi", "rcx", "r11"); // syscall clobbers rcx and r11
return 0;
}
int bar(){
return 0;
}
int __libc_start_main(){
foo();
return 0;
}
int main(){
return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.

Why do I get overflow in printf? [duplicate]

This question already has answers here: Can't call C standard library function on 64-bit Linux from assembly (yasm) code; How to print a number in assembly NASM?
Hey, I have to call a glibc function in assembly for an exercise, so I found this code that calls printf.
section .rodata
format: db 'Hello %s', 10, 0 ; printf needs NUL-terminated strings
name: db 'Conrad', 0
section .text
global main
extern printf
main:
; printf(format, name)
mov rdi, format
mov rsi, name
call printf
; return 0
mov rax, 0
ret
But I get the error:
Symbol `printf' causes overflow in R_X86_64_PC32 relocation
Compiled it with:
nasm -f elf64 -o test.o test.asm
gcc -o test test.o
The error occurs after doing
./test
Change call printf to call printf@PLT (written call printf wrt ..plt in NASM syntax). The former only works if the actual definition of printf is within ±2GB of the call instruction, which can't be known if the definition is in a shared library (it would work if you link statically, though). "Overflow" is telling you that the relative address, which would need to be up to 64-bit, overflows the 32-bit offset of a call instruction.
By using printf@PLT, you'll instead get a relative address that resolves statically at link time to a thunk in the PLT, which loads and jumps to the address of the function definition, resolved at dynamic-linking time.
As Maxime B. noted, the loads of the addresses of format and name are also not correct for position-independent code. They should use the "rip-relative" form, which in NASM is written lea rdi, [rel format] (and likewise for name). You could, as Maxime B. suggested, build with -no-pie, but it would be better to fix your code so it doesn't depend on being linked at a particular fixed address.
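As a rough C analogy of the PLT mechanism described above (not how ld.so is actually implemented): callers always call a nearby stub, the stub jumps through a pointer slot, and on first use that slot points at a resolver that finds the real target and patches the slot, much as lazy binding (when enabled) fills in a GOT entry. All names here are hypothetical.

```c
/* Stand-in for the function that really lives in a shared library. */
static int real_add(int a, int b) { return a + b; }

/* Counts how often the "dynamic linker" had to resolve the symbol. */
static int resolver_calls = 0;

static int resolve_add(int a, int b);

/* The "GOT slot": initially points at the resolver. */
static int (*got_add)(int, int) = resolve_add;

/* First call lands here: patch the slot with the real address,
   then forward the call. Later calls bypass the resolver. */
static int resolve_add(int a, int b) {
    resolver_calls++;
    got_add = real_add;
    return got_add(a, b);
}

/* The "PLT stub": always within reach of the caller, it jumps
   through the GOT slot wherever the real function ended up. */
static int add_plt(int a, int b) { return got_add(a, b); }
```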
You should link your program with -no-pie.
This error is explained here. Quoting the original post:
Debian switched to PIC/PIE binaries in 64-bits mode & GCC in your case is trying to link your object as PIC, but it will encounter absolute address in mov $str, %rdi.

Disassemble, modify, and reassemble executable

How can I disassemble an executable on my mac using ndisasm and reassemble and link it using nasm and ld?
This is what I tried (I'm running MacOS X btw):
ndisasm a.out | cut -c 29- > main.asm
this generated clean assembler code with all the processor instructions in main.asm
nasm -f macho main.asm
this generated an object file main.o which I then tried to link
ld main.o
... this is where I'm stuck. I don't know why it generates the following error:
ld: in section __TEXT,__text reloc 0: R_ABS reloc but no absolute symbol at target adress file 'main.o' for inferred architecture i386.
I also tried specifying the architecture (ld -arch x86_64 main.o) but that didn't work either.
My goal is to disassemble any executable, modify it and then reassemble it again.
What am I doing wrong?
There is no reliable way to do this with normal assembler syntax. See How to disassemble, modify and then reassemble a Linux executable?. Section info is typically not faithfully disassembled, so you'd need a special format designed for modifying, reassembling and relinking.
Also, instruction lengths are a problem when code only works when padded by using longer encodings (e.g. in a table of jump targets for a computed goto). See Where are GNU assembler instruction suffixes like ".s" in x86 "mov.s" documented?, but note that disassemblers don't support disassembling into that format.
ndisasm doesn't understand object file formats, so it disassembles headers as machine code!
For this to have any hope of working, use a disassembler like Agner Fog's objconv which will output asm source (NASM, MASM, or GAS AT&T) which does assemble. It might not actually work if any of the code depended on a specific longer-than-default encoding.
I'm not sure how faithful objconv is with respect to emitting section .bss, section .rodata and other directives like that to place data where it found it in the object file, but that's what you need.
Re: absolute relocations: make sure you put DEFAULT REL at the top of your file. I forget if objconv does this by default. x86-64 Mach-O only supports PC-relative relocations, so you have to create position-independent code (e.g. using RIP-relative addressing modes).
ndisasm doesn't read the symbol table, so all its operands use absolute addressing. objconv makes up label names for jump targets and static data that doesn't appear in the symbol table.

Relocation truncated to fit: R_X86_64_32

I have a C driver file which declares an extern function in order to use it in my asm file. I am on a Windows 7 x64 machine.
I assembled the asm file with NASM with this command:
nasm avxmain.asm -f win64 -o avxmain.o
Then I compiled the C file like this:
gcc avxdriver.c -c -m64 -o avxdriver.o
Linking it all together, I ran:
gcc avxdriver.o avxmain.o -o final
Here are the errors I am getting:
avxmain.o:G:\Desktop\CPSC240:(.text+0x50): relocation truncated to fit: R_X86_64_32 against `.bss'
avxmain.o:G:\Desktop\CPSC240:(.text+0xb9): relocation truncated to fit: R_X86_64_32 against `.data'
avxmain.o:G:\Desktop\CPSC240:(.text+0xc2): relocation truncated to fit: R_X86_64_32 against `.data'
avxmain.o:G:\Desktop\CPSC240:(.text+0x14e): relocation truncated to fit: R_X86_64_32 against `.bss'
collect2: error: ld returned 1 exit status
avxdriver.c file:
#include <stdio.h>
#include <stdint.h>
extern double avxdemo();
int main()
{
double return_code = -99.9;
printf("%s","This program will test for the presence of AVX (Advanced Vector Extensions) also known as state component number 2.\n");
return_code = avxdemo();
printf("%s %1.12lf\n","The value returned to the driver is ", return_code);
printf("%s","The driver program will next send a zero to the operating system. Enjoy your programming.\n");
return 0;
}
avxmain.asm file:
http://pastebin.com/CfnjbpXM
I posted it here because it is very long due to comments provided by the professor.
I have tried the -fPIC and -mcmodel=medium options. I still get the same errors.
I'm completely lost and confused since this is the sample project I am supposed to run for my class. This subject is brand new to me. I've spent about half my day searching these errors and trying different things. I just need to be pointed in the right direction.
The problem is that general x64 instructions do not allow direct 64-bit addresses in their encodings. There are two ways around this:
Use the movabs rax, symbolNameHere instruction to set rax to the address, then use [rax] to access data at that address.
Use [rel symbolNameHere] as the operand; this creates a PC-relative reference to symbolNameHere. It's encoded as a 32-bit signed offset from whatever rip will be when that instruction is executed.
Method 1 allows you to encode the absolute address in the instruction, whereas method 2 is smaller code (you can always do lea rax, [rel symbolNameHere] to get the same effect as method 1).
You should use objdump -d on avxmain.o to find out which statements the linker is complaining about. However it's pretty clear which instructions are the problem:
xrstor [backuparea]
vmovupd ymm0, [testdata]
vmovupd ymm1, [testdata+32]
xsave [backuparea]
The problem, as Drew McGowen explained, is that the 64-bit instruction set has no way of encoding these addresses as 64-bit values in an instruction. Instead they become 32-bit signed displacements that are sign-extended to 64 bits to create the effective address. As apparently Windows loads 64-bit drivers in the address range starting at 0xFFFFF88`00000000, the truncated 32-bit displacements won't get sign-extended back to the correct value.
You should be able to use RIP relative addressing, like Drew McGowen suggested, to work around the problem. In that case the assembler should generate PC relative (RIP relative) relocations that the linker won't complain about:
xrstor [rel backuparea]
vmovupd ymm0, [rel testdata]
vmovupd ymm1, [rel testdata+32]
xsave [rel backuparea]
