Why specify the target architecture to the linker? - c

I've been working on using the Meson build system for an embedded project. Since I'm working on an embedded platform, I've written a custom linker script and a custom invocation of the linker. I didn't have any problems until I tried to link newlib into the project, at which point the link started failing. Just before I got it working, the last error was undefined reference to main, even though main was clearly defined in the project.
By happenstance, I tried adding -mcpu=cortex-m4 to my linker invocation (I am using gcc to link; I am told this is quite typical instead of calling ld directly). It worked! Now, my only question is "why"?
Perhaps I am missing something about how the linking process actually works, but considering I am just producing an ELF file, I didn't think it would be important to specify the CPU architecture to the linker. Is this a newlib thing, or has gcc just been doing magic behind the scenes for me that I haven't seen before?
For reference, here's my project (it's not complete)

In general, you should always link via the compiler driver (link forms of the gcc command), not via direct invocation of ld. If you're developing for bare metal on a particular exact target, it's possible to determine the set of linker arguments you need and use ld directly, but there's a lot that the compiler driver takes care of for you, and it's usually better to let it. (If you don't have a single fixed target, there are unlimited combinations of possibilities and no way you can reproduce all present and future ones someone may care about.)
You can still pass whatever options you like to the linker, e.g. custom linker scripts, via -Wl,... option forms.
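For example, a bare-metal link driven through gcc might look something like this (a hedged sketch; the toolchain prefix, linker-script name, object files and map-file name are placeholders for whatever your project actually uses):
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -nostartfiles -Wl,-T,my_script.ld -Wl,-Map=firmware.map -o firmware.elf main.o startup.o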
As for why the specific target architecture/ISA level could matter to linking: linking is not a dumb process of just sticking binary chunks together. It can involve patching up code (relocations) or even generating code (thunks for distant jump targets, etc.), in which case the linker may need to care what particular ISA level/variant it's targeting.

Such linker options ensure that the appropriate standard library and start-up code are linked when these are defaulted rather than explicitly specified or overridden.
A single ARM toolchain supports a variety of ARM architecture variants and options; they may be big- or little-endian, have various instruction sets (ARM, Thumb, Thumb-2, ARM64, etc.), and various extensions such as SIMD or DSP units. The linker requires the architecture information to select the correct library variant to link, for both performance and binary compatibility.
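You can see that selection in action with the toolchain's multilib queries (a hedged example using the GNU Arm Embedded toolchain; the exact directory names printed depend on how your particular toolchain was configured):
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -print-multi-directory
arm-none-eabi-gcc -print-multi-lib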

Related

GCC preprocessor directives for Arch Linux

Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
I need to check that my software restricts itself from compiling under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
Does anyone know how to guarantee through GCC preprocessor directives that my binaries are only compilable under Arch Linux?
Of course I can always
#ifdef __linux__
...
#endif
But this is not precise enough.
Edit: This must be done through C source code and not by any build system, so, for example, doing this through CMake is completely ruled out.
Edit 2: Users faking this behaviour is not a problem since the software is distributed to selected clients and thus, actively trying to "misuse" our source code is "their decision".
Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
No, because there's nothing inherently specific to Arch Linux at the binary level. For what it's worth, when compiling, the only things you/the compiler have to care about are the target architecture (i.e. what kind of CPU the code is going to run on), data type sizes and alignments, and function calling conventions.
Then later on, when it's time to link the compiled translation unit objects into the final binary executable, the runtime libraries present on the system become a concern as well. Without taking special precautions you're essentially locking yourself into the specific brand of runtime libraries (glibc vs. e.g. musl; libstdc++ vs. libc++) pulled in by the linker.
One can easily sidestep the latter problem by linking statically, but that limits the range of system and mid-level APIs available to the program. For example, on Linux a naively, purely statically linked program wouldn't be able to use graphics acceleration APIs like OpenGL 3.x or Vulkan, since those rely on loading components of the GPU drivers into the process. You can however still use X11 and indirect GLX OpenGL, since those work using wire protocols going over sockets, which are implemented using direct syscalls to the kernel.
And these kernel syscalls are exactly the same at the binary level for each and every Linux kernel of every distribution out there. Although inside the kernel there's a lot of leeway when it comes to redefining interfaces, when it comes to the interfaces toward userland (i.e. regular programs) there's this holy, dogmatic, ironclad rule that YOU NEVER BREAK USERLAND! Kernel developers breaking this rule, intentionally or not, are chewed out publicly by Linus Torvalds in his in-/famous rants.
The bottom line is that there is no such thing as a "Linux distribution specific identifier at the binary level". At the end of the day, a Linux distribution is just that: a distribution of stuff. That means one or more people decided on a set of files that make up a working Linux system, wrapped it up somehow and slapped a name on it. That's it. There's nothing inherently specific to "Arch" Linux other than that it's called "Arch" and (for the time being) relies on the pacman package manager. Everything else about "Arch", or any other Linux distribution, is just a matter of happenstance.
If you really want to sort different Linux distributions into certain bins regarding binary compatibility, then you'd have to pigeonhole the combinations of:
Minimum required set of supported syscalls. This translates into a minimum required kernel version.
Which libc variant is being used, and potentially which version, although it's perfectly possible to link against only a minimal set of functions that has been around almost "forever" (see the compile-time sketch after this list).
Which variant of the C++ standard library the distribution decided upon. This also affects programs that might appear to be purely C, because certain system-level libraries (*cough* Mesa *cough*) will internally pull in a lot of C++ infrastructure (even compilers), also triggering other "fun" problems¹.
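If you do go down the libc-pinning route, here is a minimal compile-time sketch. It assumes a glibc-based system (the macros come from glibc's features.h), and the 2.31 cutoff is an arbitrary example version:
/* refuse to build against anything older than the glibc we tested with */
#include <features.h>
#if !defined(__GLIBC__) || __GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 31)
#error "this build expects glibc >= 2.31"
#endif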
I need to check that my software restricts itself from running under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
You couldn't find resources on the Internet because there's nothing specific at the binary level that makes "Arch" Arch. For what it's worth, right now, this instant, I could create a fork of Arch, change out its choice of default XDG skeleton – so that by default user directories are populated with subdirs called leech, flicks, beats, pics – and call it "l33tz" Linux. For all intents and purposes it's no longer Arch. It does behave significantly differently from the default Arch behaviour, which would also be of concern to you if you'd relied on any specific thing, however minute.
Your employer doesn't seem to understand what Linux is or what distinguishes distributions from each other.
Hint: it's not binary compatibility. As a matter of fact, as long as you stay within the boring old realm of boring old glibc + libstdc++, Linux distributions are shockingly compatible with each other. There might be slight differences in where they put libraries other than libc.so, libdl.so and ld-linux[-${arch}].so, but those can usually be found under /lib. And once ld-linux[-${arch}].so and libdl.so take over (that means pulling in all libraries loaded at runtime), all the specifics of where shared objects and libraries are to be found are abstracted away by the dynamic linker.
1: Like a process becoming multithreaded only after global constructors were executed, while libstdc++ had decided it wanted to be single-threaded because libpthread wasn't linked into a program that didn't create a single thread of its own. That was a really weird bug I unearthed, but yshui finally understood it: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3199
You can list the predefined preprocessor macros with
gcc -dM -E - </dev/null
clang -dM -E - </dev/null
None of those indicate what operating system the compiler is running under. So not only can you not tell whether the program is being compiled under Arch Linux, you can't even tell whether it is being compiled under Linux. The macros __linux__ and friends indicate that the program is being compiled for Linux. They are defined when cross-compiling from another system to Linux, and not defined when cross-compiling from Linux to another system.
You can artificially make your program more difficult to compile by specifying absolute paths for system headers and relying on non-portable headers (e.g. /usr/include/bits/foo.h). That can make cross-compilation or compilation for anything other than Linux practically impossible without modifying the source code. However, most Linux distributions install headers in the same location, so you're unlikely to pinpoint a specific distribution.
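As a concrete instance of that idea (hedged: bits/ headers are glibc-internal, so both the absolute path and the header name below are illustrative and may differ between systems):
/* deliberately non-portable: hard-codes a glibc-internal header by absolute path */
#include </usr/include/bits/wordsize.h>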
You're very likely asking the wrong question. Instead of asking how to restrict compilation to Arch Linux, start from why you want to restrict compilation to Arch Linux. If the answer is “because the resulting program wouldn't be what I want under another distribution”, then start from there and make sure that the difference results in a compilation error rather than incorrect execution. If the answer to “why” is something else, then you're probably looking for a technical solution to a social problem, and that rarely ends well.
No, it doesn't. And even if it did, it wouldn't stop anyone from compiling the code on an Arch Linux distro and then running it on a different Linux.
If you need to prevent your software "from running under anything but Arch Linux", you'll need to insert a run-time check. Although, to be honest, I have no idea what that check might consist of, since Linux distros are not monolithic products. The actual check would probably have to do with your reasons for imposing the restriction.
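A minimal sketch of such a run-time check, assuming the target systems populate /etc/os-release with ID=arch (current Arch installs do, but derivatives and edited installs can carry a different ID, so treat this as illustrative only):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void require_arch(void)
{
    char line[256];
    FILE *f = fopen("/etc/os-release", "r");
    if (f) {
        while (fgets(line, sizeof line, f)) {
            if (strcmp(line, "ID=arch\n") == 0) {  /* Arch's os-release carries ID=arch */
                fclose(f);
                return;
            }
        }
        fclose(f);
    }
    fprintf(stderr, "this program is only supported on Arch Linux\n");
    exit(EXIT_FAILURE);
}
Call require_arch() early in main(); the function name and the exit policy are of course up to you.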

Where in the GCC source code does it compile to the different assembly languages?

Where is the code in the GCC source code that actually constructs the assembly for the different architectures?
Wondering how many different assembly languages it compiles to, and how it actually does this (by taking a look at the source code).
Is it in the gcc repo somewhere, or in another repo? I have started to dig around but haven't found anything.
https://github.com/gcc-mirror/gcc
For example, here is some of the assembly generating code in V8:
https://github.com/v8/v8-git-mirror/tree/master/src/x64
Is there anything equivalent for GCC?
I am wondering because it's a mystery how GCC does this, and it would be a great way to learn how compilers are actually implemented down to the assembly level.
The .md (machine description) files of the GCC source contain the information used to generate assembly. GCC contains several specialized C/C++ code generators (some of which translate the .md files into code that emits assembly).
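As a rough illustration (a schematic pattern in the style of the GCC internals manual, not copied from any real back-end), a machine-description entry teaching GCC how to emit a 32-bit integer add looks roughly like this:
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add %0,%1,%2")
The bracketed RTL template is what the middle-end output gets matched against, and the final string is the assembly printed for it.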
GCC is a very complex program. The documentation of GCC MELT (an obsolete project) contains several interesting links and slides, notably referring to the Indian GCC Resource Center.
Most of the optimization in GCC happens in the middle-end (which is mostly independent of the source language and of the target system), notably in the many passes working on the GIMPLE representations.
The GCC repo is an SVN repository.
See also this answer, notably the pictures inside it.
The actual source code for GCC is most accessible from here:
https://gcc.gnu.org/svn.html
The software is accessible via SVN (Subversion), a source code control system. It is installed on many versions of Linux/UNIX, but if it is not on your platform, you can install the svn kit and then fetch the source using the following command:
svn checkout svn://gcc.gnu.org/svn/gcc/trunk SomeLocalDir
GCC is complex and would take significant experience to understand the nature of how the application actually compiles to different architectures.
In a nutshell, GCC has three major components: front-end, middle-end and back-end processing. The front-end parses the source language (C, C++, Objective-C, etc.) and deconstructs the code into a portable intermediate construct, which is then passed down the pipeline for compilation to the target environment.
The middle-end performs code analysis and optimisation, attempting to prioritise the code to generate the best possible output at the end of the full process. Technically, optimisation can occur at any part of the process as patterns are discovered during analysis.
The back-end lowers the code to a low-level intermediate form (not yet final executable code). Based on what the expected output is designed to be, this "pseudo-code" is optimised with respect to register usage, bit sizes, endianness, and so on. The final code is then generated during the assembly phase, which converts the back-end representation into machine-executable instructions.
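You can watch these stages on any small file yourself; the dump switches below are standard GCC options, though the exact set of dump files produced varies by GCC version:
gcc -O2 -S -fdump-tree-all -fdump-rtl-all test.c
This leaves the final assembly in test.s plus a numbered series of GIMPLE ("tree") and RTL dump files, roughly one per pass.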
It's important to note that the compiler has many options to deal with output formats, so you can create output for many classes of architecture, usually out of the box. For cross-compiling and target compiler options, try checking out this link:
https://gcc.gnu.org/install/configure.html

GCC Error while compiling for ARM

I am getting the following error while trying to compile some code for an ARM Cortex-M4
using
gcc -mcpu=cortex-m4 arm.c
`-mcpu=' is deprecated. Use `-mtune=' or '-march=' instead.
arm.c:1: error: bad value (cortex-m4) for -mtune= switch
I was following GCC 4.7.1 ARM options. Not sure whether I am missing some critical option. Any kickstart for using GCC for ARM will also be really helpful.
As starblue implied in a comment, that error is because you're using a native compiler built for compiling for x86 CPUs, rather than a cross-compiler for compiling to ARM.
GCC only supports a single general architecture type in any given compiler binary -- so, although the same copy of GCC can compile for both 32-bit and 64-bit x86 machines, you can't compile to both x86 and ARM with the same copy of GCC -- you need an ARM-specific GCC.
(As auselen suggests, getting a pre-built one will save you quite a lot of work, even if you're only using it as a starting point to get things set up. You need to have GCC, binutils, and a C library as a minimum, and those are all separate open-source projects that the pre-built versions have already done the work of combining. I'll recommend Sourcery CodeBench Lite since that's the one my company makes and I do think it's a fairly good one.)
As the error message says, -mcpu is deprecated, and you should use the other options stated. However "deprecated" simply means that its use may not continue to be supported; it will still work.
ARM Cortex-M4 is ARM Architecture V7E-M, so you should use -march=armv7-m (the documentation does not specifically list armv7e-m, but that may have been added since the documentation was last updated). The E is essentially the difference between the M3 and M4 - the DSP instructions - so the compiler will not generate code that takes advantage of these instructions. Using ARM's Cortex-M DSP library is probably the best way to use these instructions to benefit your application. If your part has an FPU, then other options will be needed to enable code generation for it.
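With an ARM cross-compiler installed, an invocation along these lines is typical (a hedged example: -mcpu=cortex-m4 is accepted by ARM-targeted GCC, and the -mfloat-abi/-mfpu flags only apply if your part actually has the single-precision FPU):
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -c arm.c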
Like others have already pointed out, you are using a compiler for your host machine, and you need a compiler that generates code for your target processor instead (a cross compiler). Like @Brooks suggested, you can use a pre-built toolchain, but if you want to roll your own cross-compiler, libc and binutils, there is a nice tool called crosstool-NG. It greatly simplifies the process of building a cross-compiler optimized to generate code for a specific processor, so you're not stuck with a generic prebuilt toolchain, which usually builds code for a family of compatible processors (e.g. you could tune the toolchain to generate assembly for your specific target, or floating-point code for a hardware FPU specific to your processor, instead of using only the software floating-point routines that are the default in most pre-built toolchains).

What is the point of using `-L` when there is `LD_LIBRARY_PATH`?

After reading this question, my first reaction was that the user is not seeing the error because he specifies the location of the library with -L.
However, apparently, the -L option only influences where the linker looks, and has no influence over where the loader looks when you try to run the compiled application.
My question then is what's the point of -L? Since you won't be able to run your binary unless you have the proper directories in LD_LIBRARY_PATH anyway, why not just put them there in the first place, and drop the -L, since the linker looks in LD_LIBRARY_PATH automatically?
It might be the case that you are cross-compiling and the linker is targeting a system other than your own. For instance, MinGW can be used to compile Windows binaries on Linux. Here -L will point to the DLLs needed for linking and LD_LIBRARY_PATH will point to any libraries needed by the linker itself to run. This allows compiling and linking for different architectures, OS ABIs, or processor types.
It's also helpful when trying to build special targets. It might be the case that one links a static version of a program against a different static library. This is the first step in Linux From Scratch, where one creates a separate mini-environment on the main system to become a chroot jail.
Setting LD_LIBRARY_PATH will affect all the commands you run to build your code (including the compiler itself).
That's not desirable in general (e.g. you might not want your compiler to run debug/instrumented libraries while it compiles - it might even go as far as breaking your compiles).
Use -L to tell the compiler where to look, LD_LIBRARY_PATH to influence runtime linking.
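A hedged example with made-up names (libfoo and a local ./libs directory) to show the division of labour:
gcc main.c -L./libs -lfoo -o app        # -L: searched by the linker when resolving -lfoo at build time
LD_LIBRARY_PATH=./libs ./app            # LD_LIBRARY_PATH: searched by the dynamic loader at run time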
Building the binary and running the binary are two completely independent and unrelated processes. You seem to suggest that the running environment should affect the building environment, i.e. you seem to be assuming that the code built in some setup (account, machine) will later be run in the same setup. I find this assumption rather strange. I'd even say that in most cases the building and the running are done in different environments. I would actually prefer my compilers not to derive any assumptions about the future running environment from the environment those compilers are invoked in. Looking at the LD_LIBRARY_PATH of the building environment would be a major no-no.
The other answers are all good, but one thing nobody has mentioned yet is static libraries. Most of the time when you use -L it's with a static library built locally in your build tree that you don't intend to install, and that has nothing to do with LD_LIBRARY_PATH.
Compilers on Solaris support -R /runtime/path/to/some/libs, which adds to the path where libraries are to be searched by the run-time linker. On Linux the same can be achieved with -Wl,-rpath,/runtime/path/to/some/libs, which passes the -rpath /runtime/path/to/some/libs option on to ld. GNU ld also supports -R /path/to/libs for compatibility with other ELF linkers, but this should be avoided, since -R is normally used to specify symbol files to GNU ld.
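For example, to bake the run-time search path into the binary instead of relying on LD_LIBRARY_PATH (the paths and library name are placeholders):
gcc main.c -L/opt/foo/lib -lfoo -Wl,-rpath,/opt/foo/lib -o app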

Binary compatibility between avr-gcc 3.4.0 and avr-gcc 4.3.x

I have inherited an application that links to a library which MAY HAVE been built with gcc3. Or maybe with the ImageCraft compiler. That information has now vanished to the heavenly bitfield and I am left with a libXXX.a library against which to link my app. I cannot recompile libXXX.a because it requires certain unknown headers from ImageCraft and elsewhere, which at some point may have been ubiquitous in my environment but are now nowhere to be found.
My question is this: given that compiling my app with avr-gcc version 3.4.0 (and linking to that "special" libXXX) resulted in a working binary image, is it reasonable to expect that I could compile all the other parts of my app with avr-gcc 4 (this having some very nice and proven benefits), link with libXXX and still get a working program?
Essentially, it all boils down to: is avr-gcc binary compatible with "mysterious compiler X which just may have been avr-gcc 3.something"?
To be honest, I have successfully compiled the rest of my app with avr-gcc4 and linked it with the library, and verified that the result works, but what kind of side effects or quirks should I be on the lookout for?
Linking libraries from different compilers (or compiler versions) will work reliably if both compilers use the same ABI (Application Binary Interface).
The ABI of a specific platform is typically specified by the dominant compiler for that platform, but that could be done by referencing an external specification.
ABI changes are rare, especially if the platform supports third-party libraries/applications, because an ABI change means that literally everything has to be rebuilt.

Resources