How can the Linux kernel compile itself?

I don't quite understand the compiling process of the Linux kernel when I install
a Linux system on my machine.
Here are some things that confused me:
The kernel is written in C, however how did the kernel get compiled without a compiler installed?
If the C compiler is installed on my machine before the kernel is compiled, how can the compiler itself get compiled without a compiler installed?
I was so confused for a couple of days, thanks for the response.

The first round of binaries for your Linux box were built on some other Linux box (probably).
The binaries for the first Linux system were built on some other platform.
The binaries for that computer can trace their root back to an original system that was built on yet another platform.
...
Push this far enough, and you find compilers built with more primitive tools, which were in turn built on machines other than their host.
...
Keep pushing and you find computers built so that their instructions could be entered by setting switches on the front panel of the machine.
Very cool stuff.
The rule is "build the tools to build the tools to build the tools...". Very much like the tools which run our physical environment. Also known as "pulling yourself up by the bootstraps".

I think you should distinguish between:
compile, v: To use a compiler to process source code and produce executable code [1].
and
install, v: To connect, set up or prepare something for use [2].
Compilation produces binary executables from source code. Installation merely puts those binary executables in the right place to run them later. So, installation and use do not require compilation if the binaries are available. Think of "compile" and "install" as "cook" and "serve", respectively.
Now, your questions:
The kernel is written in C, however how did the kernel get compiled without a compiler installed?
The kernel cannot be compiled without a compiler, but it can be installed from a compiled binary.
Usually, when you install an operating system, you install a precompiled kernel (a binary executable) that was compiled by someone else. Only if you want to compile the kernel yourself do you need the source, the compiler, and all the other tools.
Even in "source-based" distributions like Gentoo, you start by running a compiled binary.
So, you can live your entire life without compiling kernels, because you have them compiled by someone else.
If the C compiler is installed on my machine before the kernel is compiled, how can the compiler itself get compiled without a compiler installed?
The compiler cannot be run if there is no kernel (OS). So one has to install a compiled kernel to run the compiler, but does not need to compile the kernel himself.
Again, the most common practice is to install compiled binaries of the compiler, and use them to compile anything else (including the compiler itself and the kernel).
Now, the chicken-and-egg problem: the first binary is compiled by someone else... See the excellent answer by dmckee.

The term describing this phenomenon is bootstrapping; it's an interesting concept to read up on. If you think about embedded development, it becomes clear that a lot of devices that require software (say alarm clocks, microwaves, remote controls) aren't powerful enough to compile their own software. In fact, these sorts of devices typically don't have enough resources to run anything remotely as complicated as a compiler.
Their software is developed on a desktop machine and then copied over once it's been compiled.
If this sort of thing interests you, an article that comes to mind is Reflections on Trusting Trust (pdf). It's a classic and a fun read.

The kernel doesn't compile itself -- it's compiled by a C compiler in userspace. In most CPU architectures, the CPU has a number of bits in special registers that represent what privileges the code currently running has. In x86, these are the current privilege level bits (CPL) in the code segment (CS) register. If the CPL bits are 00, the code is said to be running in security ring 0, also known as kernel mode. If the CPL bits are 11, the code is said to be running in security ring 3, also known as user mode. The other two combinations, 01 and 10 (security rings 1 and 2 respectively) are seldom used.
The rules about what code can and can't do in user mode versus kernel mode are rather complicated, but suffice to say, user mode has severely reduced privileges.
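As a minimal sketch of the privilege-level bits described above: the CPL is the low two bits of the selector loaded in CS, so it can be extracted with a mask. The function names are hypothetical, and the selector values mentioned in the note below are what x86-64 Linux typically uses, stated here as an assumption rather than taken from the answer.

```c
#include <stdint.h>

/* The low two bits of an x86 segment selector hold a privilege level.
   When the selector is the one loaded in CS, those bits are the CPL. */
unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* Ring 0 is kernel mode, ring 3 is user mode; rings 1 and 2 are seldom used. */
const char *ring_name(unsigned cpl)
{
    switch (cpl) {
    case 0:  return "kernel mode (ring 0)";
    case 3:  return "user mode (ring 3)";
    default: return "ring 1 or 2 (seldom used)";
    }
}
```

For example, `cpl_from_cs(0x33)` yields 3 (a typical user-mode code selector), while `cpl_from_cs(0x10)` yields 0 (a typical kernel-mode code selector).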
Now, when people talk about the kernel of an operating system, they're referring to the portions of the OS's code that get to run in kernel mode with elevated privileges. Generally, the kernel authors try to keep the kernel as small as possible for security reasons, so that code which doesn't need extra privileges doesn't have them.
The C compiler is one example of such a program -- it doesn't need the extra privileges offered by kernel mode, so it runs in user mode, like most other programs.
In the case of Linux, the kernel consists of two parts: the source code of the kernel, and the compiled executable of the kernel. Any machine with a C compiler can compile the kernel from the source code into the binary image. The question, then, is what to do with that binary image.
When you install Linux on a new system, you're installing a precompiled binary image, usually from either physical media (such as a CD or DVD) or from the network. The BIOS will load the (binary image of the) kernel's bootloader from the media or network, and then the bootloader will install the (binary image of the) kernel onto your hard disk. Then, when you reboot, the BIOS loads the kernel's bootloader from your hard disk, and the bootloader loads the kernel into memory, and you're off and running.
If you want to recompile your own kernel, that's a little trickier, but it can be done.

Which came first, the chicken or the egg?
Eggs have been around since the time of the dinosaurs.
Some confuse everything by saying chickens are actually descendants of the great beasts. Long story short: the technology (egg) existed prior to the current product (chicken).
You need a kernel to build a kernel, i.e. you build one with the other.
The first kernel can be anything you want (preferably something sensible that can create your desired end product ^__^)
This tutorial from Bran's Kernel Development teaches you to develop and build a smallish kernel which you can then test with a Virtual Machine of your choice.
Meaning: you write and compile a kernel someplace, and load it on an empty (no OS) virtual machine.
What happens with those Linux installs follows the same idea with added complexity.

It's not turtles all the way down. Just like you say, you can't compile an operating system that has never been compiled before on a system that's running that operating system. Similarly, at least the very first build of a compiler must be done on another compiler (and usually some subsequent builds too, if that first build turns out not to be able to compile its own source code just yet).
I think the very first Linux kernels were compiled on a Minix box, though I'm not certain about that. GCC was available at the time. One of the very early goals of many operating systems is to run a compiler well enough to compile their own source code. Going further, the first compiler was almost certainly written in assembly language. The first assemblers were written by those poor folks who had to write in raw machine code.
You may want to check out the Linux From Scratch project. You actually build two systems in the book: a "temporary system" that is built on a system you didn't build yourself, and then the "LFS system" that is built on your temporary system. The way the book is currently written, you actually build the temporary system on another Linux box, but in theory you could adapt it to build the temporary system on a completely different OS.

If I am understanding your question correctly, the kernel isn't "compiling itself" these days. Most Linux distributions today provide system installation through a Linux live CD. The kernel is loaded from the CD into memory and operates as it normally would if it were installed to disk. With a Linux environment up and running on your system, it is easy to just commit the necessary files to your disk.
If you were talking about the bootstrapping issue, dmckee summed it up pretty nicely.
Just offering another possibility...

Related

GCC preprocessor directives for Arch Linux

Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
I need to check that my software restricts itself from compiling under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
Does anyone know how to guarantee through GCC preprocessor directives that my binaries are only compilable under Arch Linux?
Of course I can always
#ifdef __linux__
...
#endif
But this is not precise enough.
Edit: This must be done through C source code and not by any building systems, so, for example, doing this through CMake is completely discarded.
Edit 2: Users faking this behaviour is not a problem since the software is distributed to selected clients and thus, actively trying to "misuse" our source code is "their decision".
Does GCC (or alternatively Clang) define any macro when it is compiled for the Arch Linux OS?
No, because there's nothing inherently specific to Arch Linux on the binary level. For what it's worth, when compiling, the only things you (or the compiler) have to care about are the target architecture (i.e. what kind of CPU it's going to run on), data type sizes and alignments, and function calling conventions.
Then later on, when it's time to link the compiled translation unit objects into the final binary executable, the runtime libraries around are also of concern. Without taking special precautions you're essentially locking yourself into the specific brand of runtime libraries (glibc vs. e.g. musl; libstdc++ vs. libc++) pulled by the linker.
One can easily sidestep the latter problem by linking statically, but that limits the range of system and midlevel APIs available to the program. For example on Linux a purely naively statically linked program wouldn't be able to use graphics acceleration APIs like OpenGL-3.x or Vulkan, since those rely on loading components of the GPU drivers into the process. You can however still use X11 and indirect GLX OpenGL, since those work using wire protocols going over sockets, which are implemented using direct syscalls to the kernel.
And these kernel syscalls are exactly the same on the binary level for each and every Linux kernel of every distribution out there. Although inside of the kernel there's a lot of leeway when it comes to redefining interfaces, when it comes to the interfaces toward the userland (i.e. regular programs) there's this holy, dogmatic, ironclad rule that YOU NEVER BREAK USERLAND! Kernel developers breaking this rule, intentionally or not are chewed out publicly by Linus Torvalds in his in-/famous rants.
The bottom line is that there is no such thing as a "Linux distribution specific identifier on the binary level". At the end of the day, a Linux distribution is just that: A distribution of stuff. That means one or more people decided on a set of files that make up a working Linux system, wrapped it up somehow and slapped a name on it. That's it. There's nothing inherently specific to "Arch" Linux other than it's called "Arch" and (for the time being) relies on the pacman package manager. That's it. Everything else about "Arch", or any other Linux distribution, is just a matter of happenstance.
If you really want to sort different Linux distributions into certain bins regarding binary compatibility, then you'd have to pigeonhole the combinations of
Minimum required set of supported syscalls. This translates into minimum required kernel version.
What libc variant is being used; and potentially which version, although it's perfectly possible to link against a minimally supported set of functions, that has been around for almost "forever".
What variant of the C++ standard library the distribution decided upon. This actually also affects programs that might appear to be purely C, because certain system level libraries (*cough* Mesa *cough*) will internally pull in a lot of C++ infrastructure (even compilers), also triggering other "fun" problems¹
I need to check that my software restricts itself from running under anything but Arch Linux (the reason behind this is off-topic). I couldn't find any relevant resources on the internet.
You couldn't find resources on the Internet, because there's nothing specific on the binary level that makes "Arch" Arch. For what it's worth, right now, this instant, I could create a fork of Arch, change out its choice of default XDG skeleton – so that by default user directories are populated with subdirs called leech, flicks, beats, pics – and call it "l33tz" Linux. For all intents and purposes it's no longer Arch. It does behave significantly differently from the default Arch behavior, which would also be of concern to you if you relied on any specific thing, however minute.
Your employer doesn't seem to understand what Linux is or what distinguishes distributions from each other.
Hint: It's not the binary compatibility. As a matter of fact, as long as you stay within the boring old realm of boring old glibc + libstdc++, Linux distributions are shockingly compatible with each other. There might be slight differences in where they put libraries other than libc.so, libdl.so and ld-linux[-${arch}].so, but those can almost always be found under /lib. And once ld-linux[-${arch}].so and libdl.so take over (that means pulling in all libraries loaded at runtime) all the specifics of where shared objects and libraries are to be found are abstracted away by the dynamic linker.
1: like becoming multithreaded only after global constructors were executed and libstdc++ deciding it wants to be singlethreaded, because libpthread wasn't linked into a program that didn't create a single thread on its own. That was a really weird bug I unearthed, but yshui finally understood https://gitlab.freedesktop.org/mesa/mesa/-/issues/3199
You can list the predefined preprocessor macros with
gcc -dM -E - </dev/null
clang -dM -E - </dev/null
None of those indicate what operating system the compiler is running under. So not only can you not tell whether the program is compiled under Arch Linux, you can't even tell whether the program is compiled under Linux. The macros __linux__ and friends indicate that the program is being compiled for Linux. They are defined when cross-compiling from another system to Linux, and not defined when cross-compiling from Linux to another system.
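To make the target-versus-host point concrete, here is a small sketch of a function whose return value is fixed at compile time by the predefined target macros. The function name is hypothetical; `__linux__`, `_WIN32` and `__APPLE__` are the usual compiler-defined target macros. Nothing in it can reveal the distribution, or even the OS, on which the compiler itself ran.

```c
/* Report which OS the compiler is targeting, based on predefined macros.
   This says nothing about the machine the compiler itself runs on. */
const char *target_os(void)
{
#if defined(__linux__)
    return "linux";
#elif defined(_WIN32)
    return "windows";
#elif defined(__APPLE__)
    return "apple";
#else
    return "unknown";
#endif
}
```

Cross-compiling this file for Linux from another system would still produce a binary that reports "linux", which is exactly why these macros cannot single out Arch.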
You can artificially make your program more difficult to compile by specifying absolute paths for system headers and relying on non-portable headers (e.g. /usr/include/bits/foo.h). That can make cross-compilation or compilation for anything other than Linux practically impossible without modifying the source code. However, most Linux distributions install headers in the same location, so you're unlikely to pinpoint a specific distribution.
You're very likely asking the wrong question. Instead of asking how to restrict compilation to Arch Linux, start from why you want to restrict compilation to Arch Linux. If the answer is “because the resulting program wouldn't be what I want under another distribution”, then start from there and make sure that the difference results in a compilation error rather than incorrect execution. If the answer to “why” is something else, then you're probably looking for a technical solution to a social problem, and that rarely ends well.
No, it doesn't. And even if it did, it wouldn't stop anyone from compiling the code on an Arch Linux distro and then running it on a different Linux.
If you need to prevent your software "from running under anything but Arch Linux", you'll need to insert a run-time check. Although, to be honest, I have no idea what that check might consist of, since Linux distros are not monolithic products. The actual check would probably have to do with your reasons for imposing the restriction.
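One possible shape for such a run-time check, offered purely as an assumption and not as something the answers above endorse, is to look at the `ID=` field of an os-release style file (commonly `/etc/os-release`). The function names are hypothetical, and the sketch assumes Arch reports `ID=arch`. A derivative distribution or a user can change that file, so this only identifies what the system claims to be.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if an os-release line is exactly ID=<id> (optionally quoted). */
int os_release_line_has_id(const char *line, const char *id)
{
    if (strncmp(line, "ID=", 3) != 0)
        return 0;                       /* also rejects ID_LIKE=... */
    const char *v = line + 3;
    size_t n = strcspn(v, "\r\n");      /* value length without the newline */
    if (n >= 2 && (v[0] == '"' || v[0] == '\'') && v[n - 1] == v[0]) {
        v++;                            /* strip matching quotes */
        n -= 2;
    }
    return n == strlen(id) && strncmp(v, id, n) == 0;
}

/* Scan a file in os-release format; return 1 on a match, 0 otherwise
   (including when the file cannot be opened). */
int os_release_file_has_id(const char *path, const char *id)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    char line[256];
    int found = 0;
    while (!found && fgets(line, sizeof line, f))
        found = os_release_line_has_id(line, id);
    fclose(f);
    return found;
}
```

A program could call `os_release_file_has_id("/etc/os-release", "arch")` at startup and refuse to continue when it returns 0.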

How to run executable file a.out created in my laptop gcc environment in other laptops?

I have written a program in C, compiled it with GCC, and run it. I want to share the executable file without sharing the actual source code. Is there any way to share my program without revealing the source code, so that the executable can run on other computers that have GCC installed?
Is there any way to share my program without revealing actual source code so that executable file could run on other computers with gcc compilers?
TL;DR: yes, provided a greater degree of similarity than just having GCC. One simply copies the binary file and any needed auxiliary files to a compatible system and runs it.
In more detail
It is quite common to distribute compiled binaries without source code, for execution on machines other than the ones on which those binaries were built. This mode of distribution does present potential compatibility issues (as described below), but so does source distribution. In broad terms, you simply install (copy) the binaries and any needed supporting files to suitable locations on a compatible system and execute them. This is the manner of distribution for most commercial software.
Architecture dependence
Compiled binaries are certainly specific to a particular hardware architecture, or in certain special cases to a small, predetermined set of two or more architectures (e.g. old Mac universal binaries). You will not be able to run a binary on hardware too different from what it was built for, but "architecture" is quite a different thing from CPU model.
For example, there is a very wide range of CPUs that implement the x86_64 architecture. Most programs targeting that architecture will run on any such CPU. Indeed, the x86 architecture is similar enough to x86_64 that most programs built for x86 will also run on x86_64 (but not vice versa). It is possible to introduce finer-grained hardware dependency, but you do not generally get that by default.
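On Linux, executables are normally ELF files, and the architecture a binary was built for is recorded in the header's e_machine field. The following is a minimal sketch, with a hypothetical function name, that reads that field from the first bytes of a file image. It assumes the standard ELF layout (magic bytes, byte-order flag at offset 5, e_machine at offset 18).

```c
#include <stddef.h>

/* Read the e_machine field from the start of an ELF file image.
   Returns the machine code (e.g. 3 = x86, 62 = x86-64, 40 = ARM,
   183 = AArch64) or -1 if the buffer does not look like an ELF header. */
int elf_machine(const unsigned char *buf, size_t len)
{
    if (len < 20)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    /* Byte 5 of the identification block gives the byte order:
       1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        return buf[18] | (buf[19] << 8);
    if (buf[5] == 2)
        return (buf[18] << 8) | buf[19];
    return -1;
}
```

Reading the first 20 bytes of a.out and passing them to `elf_machine` would tell you whether the binary targets x86-64 or something else, before you try to run it on another machine.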
Operating system dependence
Furthermore, most binaries are built to run in the context of a host operating system. You will not be able to run a binary on an operating system too different from the one it was built for.
For example, Linux binaries do not run (directly) on Windows. Windows binaries do not run (directly) on OS X. Etc.
Library dependence
Additionally, a program built against shared libraries requires a compatible version of each required shared library to be available in the runtime environment. That does not necessarily have to be exactly the same version against which it was built; that depends on the library, which of its functions and data are used, and whether and how those changed over time.
You can sidestep this issue by linking every needed library statically, up to and including the C standard library, or by distributing shared libraries along with your binary. It's fairly common to just live with this issue, however, and therefore to support only a subset of all possible environments with your binary distribution(s).
Other
There is a veritable universe of other potential compatibility issues, but it's unlikely that any of them would catch you by surprise with respect to a program that you wrote yourself and want to distribute. For example, if you use nVidia CUDA in your program then it might require an nVidia GPU, but such a requirement would surely be well known to you.
Executables are often specific to the environment/machine they were created on. Even if the same processor/hardware is involved, there may be dependencies on libraries that may prevent executables from just running on other machines.
A program that uses only "standard libraries" and links all libraries statically does not need any other dependency (in the sense that all the code it needs is either in the binary itself or in OS libraries that, being part of the system itself, are already present).
You have to link the standard library statically. Otherwise it will only work if the version of the standard library for your compiler is installed in your OS by default (which you can't rely on, in general).

native compilation & build linux kernel embedded system

I have cross-compiled a kernel, in an autodidactic manner, on a raspberry pi twice in the past.
This kind of thing can sometimes be a pain in the ... But fortunately there are some step-by-step tutorials.
So I am wondering whether there are general steps that have to be taken and that are the same on all the embedded systems (rpi, beaglebone, atmega controllers, etc...) in order to successfully cross-compile the kernel and make everything work?
My guess:
1) download the kernel source code
2) generate a .config file (which seems necessary)
3) get into the blue screen to do additional adjustments
with e.g.: make ARCH=arm CROSS_COMPILE=/usr/bin/arm-linux-gnueabi- menuconfig
4) compile the kernel:
make ARCH=arm CROSS_COMPILE=/usr/bin/arm-linux-gnueabi-
5) put it on the SD card or anything else
Would this be a correct general scheme for any cross-compilation on an embedded system?
Sorry for my ignorance, as I mentioned above I learned it by myself.
I would like to be able to setup a kernel on any embedded device.
Any more information or explanation would be more than welcome! As it seems this kind of thing can always be done in multiple ways, it gets me confused.
I'd say your first two steps haven't much to do with cross-compiling. In fact it just comes down to having a cross toolchain targeting your platform correctly installed on your system.
The CROSS_COMPILE make variable of the kernel doesn't do anything other than prepending the string it is set to to any toolchain command (like e.g. gcc for compiling), so if your cross toolchain is installed in your search path, it would be enough to set it to just the desired target triplet with added hyphen, e.g. in your case CROSS_COMPILE=arm-linux-gnueabi-. This would lead to using the command arm-linux-gnueabi-gcc for compiling and so on.
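The prefixing described above amounts to simple string concatenation. As an illustrative sketch in C (the function name is hypothetical, and this is not how the kernel's Makefiles are implemented), the tool name is just the prefix followed by the bare command:

```c
#include <stdio.h>

/* Build "<prefix><tool>" into out, the way a prefix such as
   "arm-linux-gnueabi-" turns "gcc" into "arm-linux-gnueabi-gcc".
   Returns the length of the resulting name, or -1 if out is too small. */
int cross_tool(char *out, size_t outsz, const char *prefix, const char *tool)
{
    int n = snprintf(out, outsz, "%s%s", prefix, tool);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```

With an empty prefix the result is the native tool name, which mirrors what happens when CROSS_COMPILE is left unset.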
For other embedded devices, you might need different cross toolchains (depending on their architecture), but the general process would indeed stay the same.

Cross Toolchain for ARM U-Boot Build Questions

I'm trying to build my own toolchain for a Raspberry Pi.
I know there are plenty of prebuilt Toolchains. This work is for educational reasons.
I'm following the embedded arm linux from scratch book.
So far I have succeeded in building gcc and uClibc.
I'm building for the target arm-unknown-linux-eabi.
Now that it comes to preparing a bootable filesystem, I'm wondering about the bootloader build.
The part about the bootloader for this system seems to be incomplete.
So how do I build U-Boot for this system with my arm-unknown-linux-eabi toolchain?
Do I need to build a toolchain which doesn't depend on Linux kernel calls?
My first research led me to the point that there are separate kinds of toolchain:
the OS-dependent ones (Linux kernel syscalls etc.) and the ones which don't need to have a kernel underneath, sometimes referred to as "bare-metal" or "standalone" toolchains.
Some sources mention that it would be possible to build U-Boot with the Linux toolchain.
If this is true, why and how should this work?
And if I have to build a second, "bare-metal" toolchain, where can I find information about the difference between these two? Do I need another libstdc?
You can build U-Boot with the same cross-toolchain used to build the kernel - and most probably the rest of the user-space of the system.
A bootloader is - by definition - self-contained and doesn't care about your choice of C-runtime library because it doesn't use it. Therefore the issue of sys-calls doesn't come into it.
A toolchain is always going to need to be hosted by a fully functioning development system - invariably not your target system. Whatever references you see to a 'bare-metal toolchain' are not referring to the compiler's use of sys-calls (it relies heavily on the operating system for I/O). What is important when building bootloaders and kernels is that compiler and linker are configured to produce statically linked code that can run at a specific memory address.
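To illustrate what "self-contained" means for bootloader-style code, here is a small sketch of helpers that do their work without calling into any C runtime library. The names `boot_copy` and `boot_strlen` are hypothetical and are not taken from U-Boot; the only header used is `<stddef.h>`, which provides types but no library code.

```c
#include <stddef.h>  /* types only, no library functions */

/* Hypothetical helper: copy n bytes without calling into any C library. */
void *boot_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical helper: length of a NUL-terminated string, again libc-free. */
size_t boot_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Because code like this makes no system calls, the same compiler that targets Linux user space can build it. One caveat: GCC may still emit calls to memcpy, memmove, memset and memcmp on its own, so a freestanding project is expected to supply those itself.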
In almost all possible ways, there is no difference between the embedded and the Linux toolchain. But there is one exception.
That exception is __clear_cache - a function that can be generated by the compiler and in a "Linux"-toolchain includes a system call to synchronize instruction and data caches. (See http://blogs.arm.com/software-enablement/141-caches-and-self-modifying-code/ for more information about that bit.)
Now, unless you explicitly add a call to that function, the only way I know for it to be invoked is by writing nested functions in C (a GCC extension that should be avoided).
But it is a difference.
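For reference, an explicit call looks roughly like the sketch below. It uses `__builtin___clear_cache`, the GCC and Clang builtin form of this function, after copying instruction bytes into a buffer; the wrapper name `install_code` is hypothetical. The sketch only synchronizes the caches and does not jump to the buffer, which would additionally require executable memory.

```c
#include <stddef.h>
#include <string.h>

/* Copy machine code into a buffer and then synchronize the instruction
   and data caches for that range, as must be done before jumping to it. */
void *install_code(void *dst, const void *code, size_t len)
{
    memcpy(dst, code, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return dst;
}
```

On architectures with coherent instruction caches the builtin may compile to nothing. On ARM with a Linux-targeting toolchain it is where the system call mentioned above would come in, which is why it matters for code that runs without a kernel.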

What alternative environments exist for building projects?

I was looking at the Linux From Scratch project a while ago and was sort of disappointed that you needed an existing copy of Linux on your machine to build it. I know that Linux is very easy to obtain, install, etc. but I was hoping to build the LFS project outside of the modern operating systems (Unix/Linux/OS-X/Windows/Etc.) and in something like DOS.
My question is, how might I build a project whether it be C, C++ or some other language with a C compiler, without building that project within another operating system. By operating system I mean Unix, Linux, OS-X, Windows, and every other GUI capable 'modern-ish' OS.
So specifically I'm looking for something that works much like DOS. I'm not above using DOS if that's all that is available, however I'm thinking something that has the ability to use all available memory, processing power, etc. I want to start my computer and be welcomed by a "prompt" from which I can build or execute a program (like another Operating System).
In order to build a program you need to: execute other programs (compiler, linker), access a filesystem both for reading the code and writing out the compiled files, and so on. You need a "real" operating system, even more so if you want to "use all available memory" and processing power. If you don't like the "high level appearance" of GUI capable OSes, just try one of the many stripped-down Linux distros: for instance, "Damn Small Linux" comes to mind.
I think the closest you're going to come is a Gentoo Linux Stage 1 install. It basically gives you a prompt and then you compile EVERYTHING, including the kernel, from that minimal starting point. It's about as close as you're going to get without keying in the binary for the bootloader by hand ;)
My guess is it will be lots of work, but this DOS compiler may help: DJGPP. Minix may also be an option, but it does have X Windows. Beyond that, you are going to be hard pressed to find anything.

Resources