How to avoid "relocation truncated to fit" from .ARM.exidx? - c

I am getting a "relocation truncated to fit" error for my embedded ARM application, compiled and linked with GCC 4.9.3. I relocate this function from external flash (0x70000000) to internal RAM (0x08000000) to improve the performance of my application, and this is one of the causes of the problem.
I have a small inline-assembly naked function to perform a short loop:
void ThreeCycleDelay(uint32_t count) __attribute__((naked))
{
__asm(" subs r0, #1\n"
" bne ThreeCycleDelay\n"
" bx lr");
}
But when linking, I receive the following error from ld:
D:/app/app.a(app_utils.obj):(.ARM.exidx.text.ThreeCycleDelay+0x0):
relocation truncated to fit: R_ARM_PREL31 against
`.text.ThreeCycleDelay'
I have seen suggestions on the internet to solve this issue, but none of them were helpful. Trying to "remove" .ARM.exidx section by -funwind-tables -fno-exceptions made no difference.
The error disappears when I perform no code relocation, and it does not show up for any other function. Removing the __attribute__((naked)) does not solve the issue either, so I suspected it is linked to the inline assembly jump, but the real question is: how can I solve this issue?

Related

Is it possible in practice to compile millions of small functions into a static binary?

I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function, using GCC (tested 4.8.5 or 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would
"need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, glibc doesn't support the large code model. (See also the glibc mailing list and Red Hat Bugzilla.)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emits R_X86_64_GOTPCREL for __libc_start_main, which uses RIP-relative addressing. The x86_64 CALL instruction doesn't support relative displacements wider than 32 bits (see the AMD64 manual, volume 3). So ld can't resolve the R_X86_64_GOTPCREL relocation once the code size surpasses 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
Patching
In short, you have to replace the 32-bit relocations in the assembly code. See the System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute — R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute the symbol entry GOT base offset and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ relocation specifies the offset relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base residing on the %r15 register, now we have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace the others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
Testing
Verify the patch correctness using the following test that requires the large code model. It doesn't contain a million small functions, having one huge function and one small function instead.
Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also, add flags -O0 -fPIC -static, link with gold.
extern int foo();
extern int bar();
int foo(){
bar();
// Call sys_exit
asm( "mov $0x3c, %%rax \n"
"xor %%rdi, %%rdi \n"
"syscall \n"
".zero 1 << 32 \n"
: : : "rax", "rdi", "rcx", "r11");
return 0;
}
int bar(){
return 0;
}
int __libc_start_main(){
foo();
return 0;
}
int main(){
return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.

How to fix the "`a local symbol' can not be used when making a shared object" error?

My compilation of c source files failed with the following error:
libservices.a(protocol.o): relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be used when making a shared object; recompile with -fPIC
libservices.a: error adding symbols: Bad value collect2.exe:
error: ld returned 1 exit status
I didn't make any changes other than adding an extra file to the source directory, and the build broke. It was working fine without these errors even though I have not used the -fPIC option. Since I didn't change the makefile, I am curious to know why this error occurs, what it means, and how to get rid of it. Help is appreciated.
You should compile with -fPIC. See how to recompile with -fPIC
Non-PIC code needs to be modified when it's relocated to another address. These modifications of the binary code and data are called relocations. There are many different types of relocations. They might involve putting the absolute address of a symbol into a data word, or modifying some of the bits of a MOVW or MOVT instruction to place part of a constant into the instruction itself. That latter one is your problem.
The static linker will do all of these relocations when you link your executable. If you use a shared library, then the dynamic linker must do them when the executable is run and dynamically linked to the shared library.
Some types of relocations, called text relocations or TEXTRELS, might not be supported by the dynamic linker. These involve modifying the text segment, with the actual code, and putting values inside instructions. Since it modifies the code, that part of the shared library won't be shared anymore between processes. You're supposed to make the library position independent via -fPIC, so the dynamic linker doesn't need to do text relocations.
It's possible that your previous code didn't cause the compiler to emit anything that needed an unsupported relocation. And then the file you added had something in it that did. For instance, accessing a global variable can trigger this on ARM.
int x = 1;
void foo(void) { x=42; }
Compiles to:
movw r3, #:lower16:x
mov r2, #42
movt r3, #:upper16:x
str r2, [r3]
bx lr
The lower and upper half of the address of x need to be placed into the movw and movt instructions. The dynamic linker doesn't support this. If the code had been compiled with -fPIC (or -mword-relocations), the compiler would produce different output that would not need these relocations.
"protocol.o" has unsupported relocations and before your change it wasn't there, or wasn't used, or didn't have them. If libservice.a was already there, keep in mind that code from a static library is only included if it's used. Maybe "protocol.o" wasn't used by anything, but whatever you added did use it.
It depends on which kind of files are added by you into the source folder.
For example, if you add a new library file and that new library file is built against -fPIC build option, you may meet similar case.
Actually, the error message gives you enough information to fix it.
Recompiling with the -fPIC build option should resolve it.
If you are building from the command line, append -fPIC to the command line.
If you are building via Makefiles, modify the makefiles as below:
CFLAGS += -fPIC
For ARM Compiler, please refer to
http://infocenter.arm.com/help/topic/com.arm.doc.dui0804b/CHDECFHF.html?resultof=%22%2d%66%70%69%63%22%20
For GCC Compiler, please refer to
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

ARM + gcc: don't use one big .rodata section

I want to compile a program with gcc with link time optimization for an ARM processor. When I compile without LTO, the system gets compiled. When I enable LTO
(with -flto), I get the following assembler-error:
Error: invalid literal constant: pool needs to be closer
Looking around the web I found out that this has something to do with the constants in my system, which are placed in a special section called .rodata, which is called a constant pool and is placed right after the .text section in my system. It seems that when compiling with LTO because of inlining and other optimizations this .rodata section gets too far away from the instructions, so that the addressing of the constants is not possible anymore. Is it possible to place the constants right after the function that uses them? Or is it possible to use another addressing mode so the .rodata section can still be addressed? Thanks.
This is an assembler message, not a linker message, so this happens before sections are generated.
The assembler has a pseudo instruction for loading constants into registers:
ldr r0, =0x12345678
this is expanded into
ldr r0, [pc, #offset_to_constant_12345678]
...
bx lr
constant_12345678:
.word 0x12345678
The constant pool usually follows the return instruction. With function inlining, the function can get long enough that the return instruction is too far away; unfortunately, the compiler has no idea of the distance between memory addresses, and the assembler has no idea of control flow other than "flow does not pass beyond the return instruction, so it is safe to emit the constant pool here".
Unfortunately, there is no good solution at the moment.
You could try an asm block containing
b 1f
.ltorg
1:
This will force-emit the constant pool at this point, at the cost of an extra branch instruction.
It may be possible to instruct the assembler to omit the branch if the constant pool is empty, but I cannot test that at the moment, so this is probably not valid:
.if (2f - 1f)
b 2f
.endif
1:
.ltorg
2:
"This is an assembler message, not a linker message, so this happens before sections are generated" - I am not sure, but I think it is a little bit more complicated with LTO. Compiling (including assembling) the individual C files with LTO enabled works fine and does not cause any problems. The problem occurs when I try to link them together with LTO enabled. I don't know exactly how LTO is done, but apparently it also includes calling the assembler again, and then I get this error message. When linking without LTO, everything is fine, and when I look at the disassembly I can see that my constants are not placed after a function. Instead, all constants are placed in the .rodata section. With LTO enabled, because of inlining, my functions probably get too large to reach the constant pool...

What does this GCC error "... relocation truncated to fit..." mean?

I am programming the host side of a host-accelerator system. The host runs on the PC under Ubuntu Linux and communicates with the embedded hardware via a USB connection. The communication is performed by copying memory chunks to and from the embedded hardware's memory.
On the board's memory there is a memory region which I use as a mailbox where I write and read the data. The mailbox is defined as a structure and I use the same definition to allocate a mirror mailbox in my host space.
I used this technique successfully in the past so now I copied the host Eclipse project to my current project's workspace, and made the appropriate name changes. The strange thing is that when building the host project I now get the following message:
Building target: fft2d_host
Invoking: GCC C Linker
gcc -L/opt/adapteva/esdk/tools/host/x86_64/lib -o "fft2d_host" ./src/fft2d_host.o -le_host -lrt
./src/fft2d_host.o: In function `main':
fft2d_host.c:(.text+0x280): relocation truncated to fit: R_X86_64_PC32 against symbol `Mailbox' defined in COMMON section in ./src/fft2d_host.o
What does this error mean, and why won't it build in the current project when it builds fine in the older project?
You are attempting to link your project in such a way that the target of a relative addressing scheme is further away than can be supported with the 32-bit displacement of the chosen relative addressing mode. This could be because the current project is larger, because it is linking object files in a different order, or because there's an unnecessarily expansive mapping scheme in play.
This question is a perfect example of why it's often productive to do a web search on the generic portion of an error message - you find things like this:
http://www.technovelty.org/code/c/relocation-truncated.html
Which offers some curative suggestions.
Minimal example that generates the error
main.S moves an address into %eax (32-bit).
main.S
_start:
mov $_start, %eax
linker.ld
SECTIONS
{
/* This says where `.text` will go in the executable. */
. = 0x100000000;
.text :
{
*(*)
}
}
Compile on x86-64:
as -o main.o main.S
ld -o main.out -T linker.ld main.o
Outcome of ld:
(.text+0x1): relocation truncated to fit: R_X86_64_32 against `.text'
Keep in mind that:
as puts everything in .text if no other section is specified
ld uses the start of .text as the default entry point if ENTRY is not specified. Thus _start is the very first byte of .text.
How to fix it: use this linker.ld instead, and subtract 1 from the start:
SECTIONS
{
. = 0xFFFFFFFF;
.text :
{
*(*)
}
}
Notes:
we cannot make _start global in this example with .global _start, otherwise it still fails. I think this happens because global symbols have alignment constraints (0xFFFFFFF0 works). TODO where is that documented in the ELF standard?
the .text segment also has an alignment constraint of p_align == 2M. But our linker is smart enough to place the segment at 0xFFE00000, fill with zeros until 0xFFFFFFFF and set e_entry == 0xFFFFFFFF. This works, but generates an oversized executable.
Tested on Ubuntu 14.04 AMD64, Binutils 2.24.
Explanation
First you must understand what relocation is with a minimal example: https://stackoverflow.com/a/30507725/895245
Next, take a look at objdump -Sr main.o:
0000000000000000 <_start>:
0: b8 00 00 00 00 mov $0x0,%eax
1: R_X86_64_32 .text
If we look into how instructions are encoded in the Intel manual, we see that:
b8 says that this is a mov to %eax
0 is an immediate value to be moved to %eax. Relocation will then modify it to contain the address of _start.
When moving to 32-bit registers, the immediate must also be 32-bit.
But here, the relocation has to modify those 32-bit to put the address of _start into them after linking happens.
0x100000000 does not fit into 32-bit, but 0xFFFFFFFF does. Thus the error.
This error can only happen on relocations that generate truncation, e.g. R_X86_64_32 (8 bytes to 4 bytes), but never on R_X86_64_64.
And there are some types of relocation that require sign extension instead of zero extension as shown here, e.g. R_X86_64_32S. See also: https://stackoverflow.com/a/33289761/895245
R_AARCH64_PREL32
Asked at: How to prevent "main.o:(.eh_frame+0x1c): relocation truncated to fit: R_AARCH64_PREL32 against `.text'" when creating an aarch64 baremetal program?
I ran into this problem while building a program that requires a huge amount of stack space (over 2 GiB). The solution was to add the flag -mcmodel=medium, which is supported by both GCC and Intel compilers.
On Cygwin, -mcmodel=medium is already the default and doesn't help. For me, adding -Wl,--image-base -Wl,0x10000000 to the GCC link command fixed the error.
Often, this error means your program is too large, and often it's too large because it contains one or more very large data objects. For example,
char large_array[1ul << 31];
int other_global;
int main(void) { return other_global; }
will produce a "relocation truncated to fit" error on x86-64/Linux, if compiled in the default mode and without optimization. (If you turn on optimization, it could, at least theoretically, figure out that large_array is unused and/or that other_global is never written, and thus generate code that doesn't trigger the problem.)
What's going on is that, by default, GCC uses its "small code model" on this architecture, in which all of the program's code and statically allocated data must fit into the lowest 2GB of the address space. (The precise upper limit is something like 2GB - 2MB, because the very lowest 2MB of any program's address space is permanently unusable. If you are compiling a shared library or position-independent executable, all of the code and data must still fit into two gigabytes, but they're not nailed to the bottom of the address space anymore.) large_array consumes all of that space by itself, so other_global is assigned an address above the limit, and the code generated for main cannot reach it. You get a cryptic error from the linker, rather than a helpful "large_array is too large" error from the compiler, because in more complex cases the compiler can't know that other_global will be out of reach, so it doesn't even try for the simple cases.
Most of the time, the correct response to getting this error is to refactor your program so that it doesn't need gigantic static arrays and/or gigabytes of machine code. However, if you really have to have them for some reason, you can use the "medium" or "large" code models to lift the limits, at the price of somewhat less efficient code generation. These code models are x86-64-specific; something similar exists for most other architectures, but the exact set of "models" and the associated limits will vary. (On a 32-bit architecture, for instance, you might have a "small" model in which the total amount of code and data was limited to something like 2^24 bytes.)
Remember to tackle error messages in order. In my case, the error above this one was "undefined reference", and I visually skipped over it to the more interesting "relocation truncated" error. In fact, my problem was an old library that was causing the "undefined reference" message. Once I fixed that, the "relocation truncated" went away also.
I may be wrong, but in my experience there's another possible reason for the error, the root cause being a compiler (or platform) limitation which is easy to reproduce and work around. Here is the simplest example:
Define an array of 1GB with:
char a[1024 * 1024 * 1024];
Result: it works, no warnings. You can of course use 1073741824 instead of the triple product.
Double the previous array:
char a[2 * 1024 * 1024 * 1024];
Result in GCC: "error: size of array 'a' is negative" => That's a hint that the size expression is evaluated as a signed int and overflows.
Based on the previous, cast the argument:
char a[(unsigned)2 * 1024 * 1024 * 1024];
Result: the "relocation truncated to fit" error appears, along with this warning: "integer overflow in expression of type 'int'"
Workaround: use dynamic memory. The malloc() function takes an argument of type size_t, which on 64-bit Windows is a typedef of unsigned long long int, thus avoiding the limitation.
This has been my experience using GCC on Windows. Just my 2 cents.
I encountered the "relocation truncated" error on a MIPS machine. The -mcmodel=medium flag is not available on mips, instead -mxgot may help there.
I ran into the exact same issue. After compiling without the -fexceptions build flag, the file compiled with no issue
I ran into this error on 64 bit Windows when linking a c++ program which called a nasm function. I used nasm for assembly and g++ to compile the c++ and for linking.
In my case this error meant I needed DEFAULT REL at the top of my nasm assembler code.
It's written up in the NASM documentation:
Chapter 11: Writing 64-bit Code (Unix, Win64)
Obvious in retrospect, but it took me days to arrive there, so I decided to post this.
This is a minimal version of the C++ program:
extern "C" { void matmul(void); }
int main(void) {
matmul();
return 0;
}
This is a minimal version of the nasm program:
; "DEFAULT REL" means we can access data in .bss, .data etc
; because we generate position-independent code in 64-bit "flat" memory model.
; see NASM docs
; Chapter 11: Writing 64-bit Code (Unix, Win64)
;DEFAULT REL
global matmul
section .bss
align 32 ; because we want to move 256 bit packed aligned floats to and from it
saveregs resb 32
section .text
matmul:
push rbp ; prologue
mov rbp,rsp ; aligns the stack pointer
; preserve ymm6 in local variable 'saveregs'
vmovaps [saveregs], ymm6
; restore ymm6 from local variable 'saveregs'
vmovaps ymm6, [saveregs]
mov rsp,rbp ; epilogue
pop rbp ; re-aligns the stack pointer
ret
With DEFAULT REL commented out, I got the error message above:
g++ -std=c++11 -c SO.cpp -o SOcpp.o
\bin\nasm -f win64 SO.asm -o SOnasm.obj
g++ SOcpp.o SOnasm.obj -o SO.exe
SOnasm.obj:SO.asm:(.text+0x9): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
SOnasm.obj:SO.asm:(.text+0x12): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
collect2.exe: error: ld returned 1 exit status
With GCC, there's a -Wl,--default-image-base-low option that sometimes helps to deal with such errors, e.g. in some MSYS2 / MinGW configurations.

What kind of error is this "c(.text+0x7): relocation truncated to fit: 8 .data"

I was compiling/linking my program
i386-gcc -o output.lnx func.opc mainc.opc
and I kept getting that error. I honestly have no idea what this means.
Any clue?
thanks,
This is usually a symptom of having too much code or data in the program. The relocation at offset 7 in .text segment (code) has been compiled with a fixed size (2 or 4), but the data/instruction it is referring to is more than 64k or 2G away.
Other than that, I can't tell you how to fix it without actually seeing the object files. Useful tools for pinpointing the problem are objdump (with flags -dr) and readelf programs.

Resources