Difference in position-independent code: x86 vs x86-64 - c

I was recently building a certain shared library (ELF) targeting x86-64 architecture, like this:
g++ -o binary.so -shared --no-undefined ... -lfoo -lbar
This failed with the following error:
relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
Of course, it means I need to rebuild it as position-independent code, so it's suitable for linking into a shared library.
But this works perfectly well on x86 with exactly the same build arguments. So the question is, how is relocation on x86 different from x86-64 and why don't I need to compile with -fPIC on the former?

I have found a nice and detailed explanation, which boils down to:
x86-64 uses IP-relative offset to load global data, x86-32 cannot, so it dereferences a global offset.
IP-relative offset does not work for shared libraries, because global symbols can be overridden, so x86-64 breaks down when not built with PIC.
If x86-64 code is built with PIC, the IP-relative offset dereference now yields a pointer to a GOT entry, which is then dereferenced.
x86-32, however, already uses a dereference of a global offset, so it is turned into GOT entry directly.

It is a code model issue. By default, static code is built assuming the whole program will stay in the lower 2 GB of the memory address space. Code for shared libraries needs to be compiled for another memory model: either PIC, or with -mcmodel=large, which compiles without making that assumption.
Note that -mcmodel=large is not implemented in older gcc versions (it is in 4.4, it isn't in 4.2, I don't know about 4.3).

It's purely an arbitrary requirement the ABI people have imposed on us. There's no logical reason why the dynamic linker on x86_64 couldn't support non-PIC libraries. However, since x86_64 is not under such horrible register pressure as x86 (and has better features for PIC), I don't know of any significant reason not to use PIC.

Related

Is it possible in practice to compile millions of small functions into a static binary?

I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function, using GCC (tested 4.8.5 or 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would
"need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also the Glibc mailing list and Red Hat Bugzilla.)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which used relative addressing. x86_64 CALL instruction didn't support relative jumps by more than 32-bit displacement, see AMD64 Manual 3. So, ld couldn't offset the relocation R_X86_64_GOTPCREL because the code size surpassed 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
Patching
In short, you have to replace 32-bit relocations in the Assembly code. See the System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute — R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute the symbol entry GOT base offset and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ relocation specifies the offset relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base residing on the %r15 register, now we have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
Testing
Verify the patch correctness using the following test that requires the large code model. It doesn't contain a million small functions, having one huge function and one small function instead.
Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also, add flags -O0 -fPIC -static, link with gold.
extern int foo();
extern int bar();
int foo(){
    bar();
    // Call sys_exit (syscall number 0x3c) with status 0, then pad the
    // function with 4 GiB of zeros so it exceeds the 2GB range
    asm( "mov $0x3c, %%rax \n"
         "xor %%rdi, %%rdi \n"
         "syscall \n"
         ".zero 1 << 32 \n"
         : : : "rax", "rdi");
    return 0;
}
int bar(){
    return 0;
}
int __libc_start_main(){
    foo();
    return 0;
}
int main(){
    return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.

How do I use the GNU linker instead of the Darwin Linker?

I'm running OS X 10.12 and I'm developing a basic text-based operating system. I have developed a boot loader and that seems to be running fine. My only problem is that when I attempt to compile my kernel into pure binary, the linker won't work. I have done some research and I think that this is because of the fact OS X runs the Darwin linker and not the GNU linker. Because of this, I have downloaded and installed the GNU binutils. However, it still won't work...
Here is my kernel:
void main() {
// Create pointer to a character and point it to the first cell of video
// memory (i.e. the top-left)
char* video_memory = (char*) 0xb8000;
// At that address, put an x
*video_memory = 'x';
}
And this is when I attempt to compile it:
Hazims-MacBook-Pro:32 bit root# gcc -ffreestanding -c kernel.c -o kernel.o
Hazims-MacBook-Pro:32 bit root# ld -o kernel.bin -T text 0x1000 kernel.o --oformat binary
ld: unknown option: -T
Hazims-MacBook-Pro:32 bit root#
I would love to know how to solve this issue. Thank you for your time.
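-T is an option of the GNU linker (gcc also accepts it and passes it on to the linker); the Darwin ld that ships with OS X does not understand it, hence the error. Have a look at this: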
With these components you can now actually build the final kernel. We use the compiler as the linker as it allows it greater control over the link process. Note that if your kernel is written in C++, you should use the C++ compiler instead.
You can then link your kernel using:
i686-elf-gcc -T linker.ld -o myos.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
Note: Some tutorials suggest linking with i686-elf-ld rather than the compiler, however this prevents the compiler from performing various tasks during linking.
The file myos.bin is now your kernel (all other files are no longer needed). Note that we are linking against libgcc, which implements various runtime routines that your cross-compiler depends on. Leaving it out will give you problems in the future. If you did not build and install libgcc as part of your cross-compiler, you should go back now and build a cross-compiler with libgcc. The compiler depends on this library and will use it regardless of whether you provide it or not.
This is all taken directly from OSDev, which documents the entire process, including a bare-bones kernel, very clearly.
You're correct that you probably want binutils for this, especially if you're coding bare metal; while clang as shipped purports to be a cross compiler, it's far from optimal or usable here, for various reasons. I infer you're developing on ARM, in which case you want this:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm
Aside from the fact that gcc does this markedly better than clang, there's also the issue that ld from the binutils package does not build on OS X. In some configurations it silently fails, so you may in fact never have installed it despite watching libiberty etc. build; sometimes it will even go through the motions of compiling the source for that target and then just refuse to link it. To the fellow with the lousy tone blaming the OP: if you had relevant experience, i.e. had ever built this under these conditions, you would know that is patently obnoxious. It'd be nice if you'd refrain from discouraging people from asking legitimate questions.
In the CXXfilt package they mumble about apple-darwin not being a target; try changing FAKE_TARGET from mn10003000-whatever (or whatever they used) to apple-rhapsody some time.
You're still in way better shape just building them from current if you, say, need to strip relocations from something or want to work on restoring static linkage to the system, which is missing by default from that clang installation as well. Anyhow, it's not really that ld couldn't work with Mach-O; it's all there codewise, that I am sure of.
Regarding locating things in memory, you may want to refer to a linker script:
http://svn.screwjackllc.com/?p=noid.git;a=blob_plain;f=new_mbed_bs.link_script.ld
I have some code in there that places things directly in memory; going with the linker script is more reproducible than doing it on the command line. It's a little complex, but what it does is set up a couple of regions of memory to be used with my memory allocators. You can use malloc, but you should prefer not to use actual malloc; dynamic memory is fine when it isn't dynamic... heh...
The script also sets flags for the stack and heap locations, although they are just markers, not loaded until go time; the stack and heap actually get placed by the startup code, which is in assembly and rather readable and well commented (hard to believe, I know). Neat trick: you have some persistence in volatile memory, so I set aside a very tiny bit to flip, and you can do things like have it control which bootloader to run on the next power cycle. Again, you are 100% correct regarding the linker; it seems you are headed in the right direction.
Incidentally, there is another way to modify objects prior to loading them and to preload things in memory, similar to this method (well, there are a ton of ways): check out objcopy and objdump. You can use gdb to dump S-records of structures in memory, note the address, and then, before linking but after assembly, use dd to insert the records you extracted with gdb back into the extracted sections. It's one of my favorite ways, just because it's the smartass route :D Also, if you are ever tight on memory and need to precalculate constants, it's one way to optimize things. That way is actually closer to what ld is doing, just done by hand. The path of least resistance on this now, though, is probably the linker script.

Is the -mx32 GCC flag implemented (correctly)?

I am trying to build a program that communicates with a 32-bit embedded system, that runs on a Linux based x86_64 machine (host). On the host program I have a structure containing a few pointers that reflects an identical structure on the embedded system.
The problem is that on the host, pointers are natively 64-bits, so the offset of the structure members is not the same as in the embedded system. Thus, when copying the structure (as memcpy), the contents end up at the wrong place in the host copy.
struct {
float a;
float b;
float *p;
float *q;
} mailbox;
// sizeof(mailbox) is 4*4=16 on the embedded, but 2*4+2*8=24 on the host
Luckily, I found out here that gcc has an option -mx32 for generating 32-bit pointers on x86_64 machines. But, when trying to use this, I get an error saying:
$ gcc -mx32 test.c -o test.e
cc1: error: unrecognized command line option "-mx32"
This is for gcc versions 4.4.3 and 4.7.0 20120120 (experimental).
Why doesn't this option work? Is there a way around this?
EDIT: According to the v4.4.7 manual, there was no -mx32 option available, and this is true up to v4.6.3. OTOH, v4.7.0 does show that option, so it may be that the Jan-20 version I am using is not the final one?!
Don't do this. First, x32 is a separate architecture. It's not merely a compiler switch. You need an x32 version of every library you link against to make this work. Linux distros aren't yet producing x32 versions, so that means you'll be either linking statically or rolling your own library environment.
More broadly: that's just asking for trouble. If your structure contains pointers they should be pointers. If it contains "32 bit addresses" they should be a 32 bit integer type.
You might need a newer version of binutils
Though I think gcc 4.8 is recommended
But in general you need a kernel compiled multilib with it: https://unix.stackexchange.com/questions/121424/linux-and-x32-abi-how-to-use

dynamically loaded object loaded into a C program gives undefined symbol errors on x86_64

I have a C program that dynamically loads a .so file at runtime in order to connect to a MySQL database. On an x86 (32bit) kernel this works fine but when I recompile my program on an x86_64 (64 bit) kernel I get runtime errors like this:
dlerror: mysql-1.932-x86_64-freebsd7.2.so::plugin_tweak_products: Undefined symbol "plugin_filter_cart"
dlerror: mysql-1.932-x86_64-freebsd7.2.so::plugin_shutdown: Undefined symbol "plugin_post_action"
Obviously from the error message above you can see that this program is running on a FreeBSD 7.2 x86_64 machine. Both the C program and the .so file are compiled for 64 bit.
I am passing RTLD_LAZY to dlopen() when I load the .so file. I think the problem is that for some reason on x86_64 it is not dynamically loading parts of the library as needed but on 32 bit x86 it is. Is there some flag I can put in my Makefile.am to get this to work on x86_64? Any other ideas?
Here is what the file command lists for my C program
ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), for FreeBSD 7.2, dynamically linked (uses shared libs), FreeBSD-style, not stripped
and for the .so file
ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), not stripped
Just a wild guess. The prefix plugin seems to indicate there might be some callbacks with function pointers going on. Also probably your compiler versions are not the same for 32 and 64 bit? Do you use C99's or gcc's inline feature?
Such things can happen if one variant of your compiler is able to inline some function (static or inline) and the other isn't. Then an external symbol might be produced or not. This depends a lot on your compiler version; gcc has had different strategies to handle such situations over time. Try to enforce the implementation of the function in at least one of your objects. And as roguenut indicates, check with nm for the missing symbols.
It looks like this was being caused by the same problem as
dlerror: Undefined symbol "_nss_cache_cycle_prevention_function" on FreeBSD 7.2
You need to call dlerror() first and ignore its return value, to clear out any error left over from previous calls, before you check dlerror()'s return value.

GCC, ARMboot - Creating standalone application without any library and any OS

I have an embedded hardware system which contains a bootloader based on ARMboot (which is very similar to Uboot and PPCboot).
This bootloader normally serves to load uClinux image from the flash. However, now I am trying to use this bootloader to run a standalone helloworld application, which does not require any linked library. Actually, it contains only while(1){} code in the main function.
My problem is that I cannot find out which GCC settings I should use in order to build a standalone, properly formatted binary.
I use the following build command:
cr16-elf-gcc -o helloworld helloworld.c -nostdlib
which produces warning message:
warning: cannot find entry symbol _start; defaulting to 00000004
Thereafter, within the bootloader, I upload a produced application and start it at some address:
tftpboot 0xa00000 helloworld
go 0xa00004
But it doesn't work :(
The system reboots.
Normally it should just hang (because of while(1)).
I don't know that loader, but I think you should use objcopy like this to dump your executable data to a raw binary file. Don't jump to ELF headers, people :)
objcopy -O binary ./a.out o.bin
Also try to compile position independent code and to read ld and gcc manuals.
The linker is complaining about missing startup code.
You need to provide two things: startup code and a linker command file that defines the address map of your target processor.
In your case the startup code can be as simple as "bl main", but usually the startup code will at least initialize the stack pointer before branching to main.
If you know you are loading your example into RAM, you can start your program at main directly. You'll need to determine main()'s address and use that for your "go" command.
I work on ARM with no OS and no libraries all day, every day. These are my current gcc options:
arm-whatever-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c hello.c -o hello.o
Then I use the linker to combine the C code with the vector tables and such. Even if it is not an image that needs a vector table, using one makes it easy to put your entry point on the first instruction.
Any reason you can't statically link at least the standard libraries in? You should have a working program and the benefits of the standard libraries without external dependencies.
Also, does your toolchain/IDE differentiate between "standalone application" and "linux application"? The IDE for the AVR32 has that distinction and is able to generate either a program that runs within the embedded linux environment or a standalone program that basically becomes the OS.

Resources