What are the first operations that the Linux Kernel executes on boot?

After the boot loader hands execution over to the kernel, what happens? I know assembler, so what are the first few instructions the kernel executes? Or is there a C function that does this? What is the startup sequence before the kernel can execute an arbitrary binary?

I'll assume that you're talking about x86 here...
It depends where you consider the boundary between "boot loader" and "kernel" to be: the start of the kernel proper is 32-bit protected mode code, but the kernel itself provides some boot code to get there from real mode.
The real mode code is in arch/x86/boot/: start_of_setup does some basic setup of the environment for C, and calls main(), which does some fairly dull stuff, ending with the actual jump to protected mode (see pmjump.S).
Where you end up now depends on whether or not the kernel is compressed. If it is, the entry point is actually a self-decompression routine. This is fairly dull stuff as well, and essentially transparent: the decompression code and compressed kernel are moved higher up in memory out of the way, then the kernel is uncompressed to the original location, and then jumped into as if it had been uncompressed all along. This code is in arch/x86/boot/compressed/ (the entry point is startup_32 in head_32.S).
The kernel really gets going properly at startup_32 in arch/x86/kernel/head_32.S. The code there ends up by calling i386_start_kernel() in arch/x86/kernel/head32.c, which finally calls the generic kernel startup code in start_kernel().

It's the C function asmlinkage void __init start_kernel(void), defined in init/main.c.


What are the exact programming instructions that are in user space?

I know that a process switches between user mode and kernel mode while running. I am confused about whether every line of code possibly needs the kernel. Below is an example; could I get an explanation of the kernel's role in executing the following lines of code? Does the following actually require kernel mode?
if(a < 0)
a++
I am confused about whether every line of code possibly needs the kernel.
Most code in user-space is executed without the kernel being involved. The kernel becomes involved (and the CPU switches from user-space to kernel) when:
a) The user-space code explicitly asks the kernel to do something (calls a system call).
b) There's an IRQ (from a device) that interrupts user-space code.
c) The kernel is providing some functionality that user-space code is unaware of. The most common reason is virtual memory management; but debugging and profiling are other reasons.
d) Asynchronous notifications (e.g. something causing a switch to kernel so that kernel can redirect the program to a suitable signal handler).
e) The user-space code does something illegal (crashes).
Does the following actually require kernel mode?
That code (if(a < 0) a++;) probably won't require kernel's assistance; but it possibly might. For example, if the variable a is in memory that was previously sent to swap space, then any attempt to access a is a request for the kernel to fetch that data from swap space. In a similar way, if the executable file was memory mapped but not loaded yet (a common optimization to improve program startup time), then attempting to execute any instruction (regardless of what the instruction is) could ask the kernel to fetch the code from the executable file on disk.
Short answer:
It depends on what you are trying to do. The following code, depending on the environment and how it's compiled, shouldn't need to use the kernel at all. The CPU executes machine code directly, only trapping to the kernel on instructions like syscall, or on faults like a page fault or an interrupt.
The ISA is designed so that a kernel can set up the page tables in a way that stops user-space from taking over the machine, even though the CPU is fetching the bytes of its machine code directly. This is how user-space code can run just as efficiently when it's only operating on its own data, doing pure computation rather than hardware access.
Long answer:
Comparing a value and incrementing it shouldn't require the kernel at all. On the x86-64 architecture, the code could be represented like this (in NASM syntax):
; a is in RAX, perhaps a return value from some earlier function
    cmp  rax, 0       ; if (a < 0), implemented as
    jnl  no_increase  ; a jump over the inc if a is Not Less-than 0
    inc  rax
no_increase:
Actual compilers do it branchlessly, with various tricks as you can see on the Godbolt compiler explorer.
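For reference, the branchless form computes the same thing as this small C function (a sketch; the function name is made up):
/* Branchless equivalent of `if (a < 0) a++;`: the comparison yields 0 or 1,
   which is simply added to a, so no conditional jump is needed. */
long increment_if_negative(long a)
{
    return a + (a < 0);
}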
Clearly there aren't any system calls here, so this code can be run on any x86 machine, but on its own it wouldn't be very meaningful.
What requires a kernel is the system calls. System calls aren't strictly required to produce output: in theory you could output something by writing to a memory location that corresponds to video memory, manipulating pixels to put something on the screen, but for userland code this isn't possible because of virtual memory.
A userspace application needs a kernel to exist; if a kernel did not exist then userspace wouldn't exist :) and please note that not every kernel provides a userspace.
So doing something like:
write(STDOUT_FILENO, "windows sucks linux rocks", 25);
would obviously require a kernel. (There is no need to open() stdout; it is already open as file descriptor 1.)
Writing to or reading from an arbitrary memory location, for example 0xB8000 to manipulate VGA text-mode video memory, doesn't need a kernel.
TL;DR: The example code you provided needs a kernel to run in userspace, but it could also run on a system where userspace and kernel don't exist at all and work perfectly fine (e.g. microcontrollers).
In simpler words: it doesn't require a kernel to work, since it doesn't use any system calls, but for meaningful operation on a modern operating system it would at least need an exit syscall to terminate with an exit code; otherwise you will see a segmentation fault even though you did no dynamic allocation yourself.
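To make that concrete, here is a minimal freestanding sketch for x86-64 Linux, built with something like gcc -nostdlib -static (the build flags and the value of a are illustrative): the comparison and increment run entirely in user mode, and the only kernel interaction is the final exit system call (number 60 on x86-64).
/* Pure computation in user mode, then one exit syscall. */
void _start(void)
{
    long a = -5;
    if (a < 0)                       /* no kernel involvement here */
        a++;

    /* exit(a): the shell sees the status truncated to 8 bits */
    __asm__ volatile ("syscall"
                      :
                      : "a"(60), "D"(a)       /* rax = __NR_exit, rdi = status */
                      : "rcx", "r11", "memory");
    __builtin_unreachable();
}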
Whatever code we write obviously runs in the realm of user mode. Kernel mode only comes into the picture when the code performs a system call,
and since the if() is not calling any system function, it is not going to run in kernel mode.

Put unchanging code in separate memory section, to reduce OTA update size

I am writing C code for the STM32L010RB microcontroller, which has 128 KB of flash memory where the program will reside. I want to implement over-the-air updates of this program code, and I've done this once before with another Cortex-M0+ based microcontroller. I divided the flash memory into four sections: the bootloader, the main program, a new-firmware download area, and a persistent-memory area.
The actual program which executes would reside in the Main program section, and when the time comes to update it, it would contact our servers and download the new program into the New firmware section. A CRC check is used to ensure the integrity of the downloaded FW. A flag in the Persistent memory section is set to signal that new FW is available, and the program then restarts into the bootloader. The bootloader checks the flag and sees that an update is available, erases the contents of the Main program and copies the content of New firmware into the Main program section. The flag is cleared, and it then branches into the now updated main program. This works perfectly, and is robust against power loss or errors at any point in the process.
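A minimal sketch of that bootloader decision logic, with all names, addresses and flash-driver helpers invented for illustration:
/* Hypothetical flash layout and helpers; adjust to the real linker script and driver. */
#define MAIN_APP_ADDR   0x08004000u
#define NEW_FW_ADDR     0x08020000u
#define FW_MAX_SIZE     0x0001C000u

extern int  flag_new_fw_pending(void);          /* reads the persistent-memory flag */
extern void flag_clear(void);
extern int  crc_ok(const void *img, unsigned len);
extern void flash_erase(unsigned addr, unsigned len);
extern void flash_copy(unsigned dst, unsigned src, unsigned len);
extern void jump_to_app(unsigned addr);         /* sets MSP and PC from the app's vector table */

void bootloader_main(void)
{
    if (flag_new_fw_pending() && crc_ok((void *)NEW_FW_ADDR, FW_MAX_SIZE)) {
        flash_erase(MAIN_APP_ADDR, FW_MAX_SIZE);
        flash_copy(MAIN_APP_ADDR, NEW_FW_ADDR, FW_MAX_SIZE);
        flag_clear();   /* cleared only after a successful copy: a power loss mid-copy
                           leaves the flag set, so the copy is simply redone on next boot */
    }
    jump_to_app(MAIN_APP_ADDR);
}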
I want to do a very similar implementation on the STM32. However, this time around my Main program uses a few dependencies that take up quite a lot of flash memory. For instance a printf implementation that is very useful, but takes up quite a lot of space (relative to what I have available). These libraries will naturally be used, probably unchanged, in all future versions of the program.
My question is whether it is possible to relocate certain parts of code that will be used, unchanged, by all future versions of the program, to a separate (fifth) section in the flash memory. This would save space in that (for instance) the printf library would only be allocated once in flash, instead of having to be included within the main program and all updates.
Is there a way to do this, if so, how? And is it a viable way to go in your opinion? Any pitfalls that must be avoided?
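One approach that can work (a sketch, not a drop-in solution): build the shared routines into a region the linker places at a fixed flash address, and let the application reach them only through a small function table at a known location, so application updates never need the library's internal addresses. All section names, symbol names and the address below are invented for illustration, and the matching linker-script changes are not shown.
/* --- In the resident library image, placed at a fixed flash address --- */
__attribute__((section(".shared_lib")))
int lib_printf(const char *fmt, ...)
{
    (void)fmt;              /* real implementation omitted */
    return 0;
}

typedef struct {
    int (*xprintf)(const char *fmt, ...);
} lib_api_t;

__attribute__((section(".shared_lib_table")))
const lib_api_t lib_api = { .xprintf = lib_printf };

/* --- In the updatable main program: only the table address is needed --- */
#define LIB_API ((const lib_api_t *)0x08020000u)   /* hypothetical fixed address */

void app_log(int value)
{
    LIB_API->xprintf("value = %d\r\n", value);
}
The main pitfall is that the library region must then be frozen for good: any change to its layout or function signatures breaks every already-deployed main program, and any state the library keeps in RAM (or shares with the C runtime) needs careful handling across updates.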

Calling system calls from the kernel code

I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from user space via the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
I would like to invoke this call from kernel space. I got the basic idea from here: How do I use a Linux System call from a Linux Kernel Module
Here are the steps I took to achieve this:
To make sure that the kernel's virtual address range remains valid, I have used set_fs(), get_fs() and get_ds().
Since sys_perf_event_open() is declared in include/linux/syscalls.h, I have included that header in the code.
Eventually, the code for calling the systems call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’". Why is this popping up when the header file declaring it is already included? Does it have something to do with the way one should call system calls from within kernel code?
In general (not specific to Linux) the work done for system calls can be split into 3 categories:
switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters that's needed because the calling conventions used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
the final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code); then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from kernel will corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
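For the second case, exposing the function is usually just a matter of exporting its symbol where it is defined (a sketch; foo() is a made-up name, and callers also need a declaration in some shared header):
/* In the kernel source file that defines the function: */
int foo(int arg)
{
        return arg + 1;
}
EXPORT_SYMBOL_GPL(foo);   /* or EXPORT_SYMBOL(foo); modules can now call foo() */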
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess is your architecture), and starting from kernel version v4.17, it is now a hard requirement not to use that interface (apart from a few exceptions). It was possible to invoke system calls directly prior to this version, but now the error you are seeing pops up (that's why there are plenty of tutorials on the web using sys_*). The proposed alternative in the Linux documentation is to define a wrapper between the syscall and the actual syscall's code that can be called from within the kernel like any other function:
int perf_event_open_wrapper(...) {
    // actual perf_event_open() code
}

SYSCALL_DEFINE5(perf_event_open, ...) {
    return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
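For this particular use case there is also an in-kernel perf API, so no syscall is needed at all. A rough sketch of counting retired instructions on one CPU from kernel code follows; double-check the exact signatures in include/linux/perf_event.h for your kernel version, as this is an assumption-laden outline rather than drop-in code.
#include <linux/perf_event.h>
#include <linux/err.h>

static struct perf_event *event;

static int counter_start(void)
{
        struct perf_event_attr attr = {
                .type   = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_INSTRUCTIONS,
                .size   = sizeof(attr),
                .pinned = 1,
        };

        /* CPU 0, no specific task, no overflow handler */
        event = perf_event_create_kernel_counter(&attr, 0, NULL, NULL, NULL);
        return IS_ERR(event) ? PTR_ERR(event) : 0;
}

static u64 counter_read(void)
{
        u64 enabled, running;

        return perf_event_read_value(event, &enabled, &running);
}

static void counter_stop(void)
{
        perf_event_release_kernel(event);
}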
Which kernel version are we talking about?
Anyhow, you could either get the address of sys_call_table by looking at the System.map file, or, if it is exported, look up the symbol (have a look at kallsyms.h). Once you have the address of the syscall table, you may treat it as a void pointer array (void **) and index the desired function; i.e. sys_call_table[__NR_open] would be open's address, so you could store it in a void pointer and then call it.
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to userland and should not really be used from inside the kernel, so such practice should be avoided.
calling system calls from kernel code
(I am mostly answering that title; to summarize: it is forbidden to even think of doing that)
I don't understand your actual problem (I feel you need to explain it more in your question, which is unclear and lacks a lot of useful motivation and context). But a general piece of advice, following the Unix philosophy, is to minimize the size and attack surface of your kernel or kernel-module code, and to move such code to user-land as much as is convenient, in particular with the help of systemd, as soon as your kernel code requires some system calls. Your question is by itself a violation of most Unix and Linux cultural norms.
Have you considered using efficient kernel-to-user-land communication, in particular netlink(7) with socket(7)? Perhaps you also want some driver-specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is exactly what fits your (unexplained) needs. And eventfd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain and I tend to believe it is a symptom of a major misunderstanding of operating system kernels in general. Please take time to read Operating Systems: Three Easy Pieces to understand OS philosophy.

How does a system call translate to CPU instructions?

Let's say there is a simple program like:
#include <stdio.h>
#include <fcntl.h>

int main(void)
{
    int x;
    printf("Cool");
    int fd = open("/tmp/cool.txt", O_RDONLY);
    return 0;
}
The open is a system call here. I suppose when the shell runs the program, it makes some hundred other system calls to implement it? How about a declaration like int x - at some point should it trigger some additional system calls in the background to get the memory from the computer?
I am not sure where the boundary is between a system call and normal stuff ... everything, in the end, needs the operating system's help, right?!
Or is it that the C compiler generates an executable (code) which can be run on the processor and needs no OS assistance until a system call is reached - at which point it has to do something to load the OS instructions etc ...
A bit vague :) Please clarify.
I'm not answering the questions in order, so I'm prefixing my answers with the questions. I've taken the liberty of editing them a bit. You didn't specify the processor architecture, but I'm assuming you want to know about x86, so the processor-level details will pertain to x86. Other architectures can behave differently (memory management, how system calls are made, etc.). I'm also using Linux for examples.
Does the c compiler generate executable code that can be run straight on the processor without need for OS assistance until a system call is reached, at which point it has to do something to load the OS instructions?
Yes, that is correct. The compiler generates native machine code that can be run straight on the processor. The executable files that you get from the compiler, however, contain both the code and other needed data, for example, instructions on where to load the code in the memory. On Linux the ELF format is typically used for executables.
If the process is completely loaded into memory and has sufficient stack space, it will not need further OS assistance before it wants to make a system call. When you make a system call, it is just an instruction in the machine code that calls the OS. The program itself does not need to "load the OS instructions" in any way. The processor handles transferring execution to the OS code.
With Linux on the x86 architecture, one way for the machine code to make a system call is to use the software interrupt vector 128 to transfer execution to the operating system. In x86 assembly (Intel syntax), that is expressed as int 0x80. Linux will then perform tasks based on the values that the calling program placed into processor registers before making the system call: the system call number is found in the eax processor register and the system call parameters are found in other processor registers. After the OS is done, it will return a result in the eax register, and has possibly modified buffers pointed to by the system call parameters etc. Note however, that this is not the only way to make a system call.
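As a concrete sketch of that convention (32-bit x86 Linux, so it must be built as 32-bit code, e.g. with -m32; __NR_write is 4 in asm/unistd_32.h):
/* write(fd, buf, count) issued directly via int 0x80 (32-bit x86 Linux only). */
long sys_write_int80(int fd, const void *buf, unsigned long count)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)                              /* result comes back in eax */
                      : "a"(4), "b"(fd), "c"(buf), "d"(count)  /* eax = __NR_write, ebx/ecx/edx = args */
                      : "memory");
    return ret;
}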
However, if the process is not entirely in memory, and execution moves to a part of the code that is not in memory at the moment, the processor causes a page fault, which moves execution to the operating system, which then loads the required part of the process into memory and transfers execution back to the process, which can then continue execution normally, without even noticing that anything happened.
I'm not entirely sure on the next point, so take it with a grain of salt. The Wikipedia article on stack overflow (the computer error, not this site :) seems to indicate that stacks are usually of fixed size, so int x; should not cause the OS to run, unless that part of the stack is not in the memory (see previous paragraph). If you had a system with dynamic stack size (if it is even possible, but as far as I can see, it is), int x; could also cause a page fault when the stack space is used up, prompting the operating system to allocate more stack space for the process.
Page faults cause the execution to move to the operating system, but are not system calls in the usual sense of the word. System calls are explicit calls to the OS when you want it to perform some work for you. Page faults and other such events are implicit. Hardware interrupts continuously transfer the execution from your process to the OS so that it can react to them. After that it transfers the execution back to your process, or some other process.
On a multitasking OS, you can run many programs at once even if you have only one processor/core. This is accomplished by running only one program at a time, but switching between programs quickly. The hardware timer interrupt makes sure that control is transferred back to the OS in a timely fashion, so that one process can't hog the CPU all for itself. When control is passed to the OS and it has done what it needs to, it may always start a different process from the one that was interrupted. The OS handles all this totally transparently, so you don't have to think about it, and your process won't notice it. From the viewpoint of your process, it is executing continuously.
In short: Your program executes system calls only when you explicitly ask it to. The operating system may also swap parts of your process in and out of memory when it wants to, and generally does things related and unrelated to your process in the background, but you don't normally need to think about that at all. (You can reduce the number of page faults, though, by keeping your program as small as possible, and things like that.)
In this case open() is an explicit system call, but I suppose when the shell runs it, it makes some hundred other system calls to implement it.
No, the shell has got nothing to do with an open() call in your c program. Your program makes that one system call, and shell doesn't come into the picture at all.
The shell will only affect your program when it starts it. When you start your program with the shell, the shell does a fork system call to fork off a second process, which then does an execve system call to replace itself with your program. After that, your program is in control. Before control gets to your main() function, though, it executes some initialization code that was put there by the compiler. If you want to see what system calls a process makes, on Linux you can use strace to view them. Just say strace ls, for example, to see what system calls ls makes during its execution. If you compile a C program with just a main() function that returns immediately, you can see with strace what system calls the initialization code makes.
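That fork-then-execve sequence looks roughly like this (a simplified sketch; ./prog is a placeholder and error handling is trimmed):
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                     /* fork(): clone the shell process */
    if (pid == 0) {
        char *argv[] = { "./prog", NULL };
        char *envp[] = { NULL };
        execve("./prog", argv, envp);       /* child replaces itself with ./prog */
        _exit(127);                         /* only reached if execve failed */
    }
    waitpid(pid, NULL, 0);                  /* the shell waits for the program to finish */
    return 0;
}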
How does the process get its memory from the computer etc.? It has to involve some system calls again right? I am not sure what is the boundary between a system call and normal stuff. Everything in the end needs the OS help, right?
Yep, system calls. When your program is loaded into memory with the execve system call, it takes care of getting enough memory for your process. When you need more memory and call malloc(), it will make a brk system call to grow the data segment of your process if it has run out of internally cached memory to give you.
Not everything needs explicit help from the OS. If you have enough memory, have all your input in memory, and you write your output data to memory, you won't need the OS at all. That is, as long as you only do calculations on data you already have in memory, don't need more memory, and don't need to communicate with the outside world, you don't need the OS. On the other hand, a program that does not communicate with the outside world at all is a pretty useless one, because it can't get any input, and cannot give any output. Even if you calculate the millionth decimal of pi, it doesn't matter at all if you don't output it to the user.
This answer got quite big, so in case I missed something or didn't explain something clearly enough, please leave me a comment and I'll try to elaborate. If anyone spots any mistakes, be sure to point them out also.

Flow of startup code in an embedded system, concept of boot loader?

I am working with an embedded board, but I don't know the flow of its startup code (C/assembly).
Can we discuss the general modules/steps performed by the startup code in an embedded system?
Just a high-level (algorithmic) overview is enough. All examples are welcome.
/Kanu__
CPU gets a power on reset, and jumps to a defined point: the reset vector, beginning of flash, ROM, etc.
The startup code (crt - C runtime) is run. This is an important piece of code generated by your compiler/libc, which performs:
Configure and turn on any external memory (if absolutely required, otherwise left for later user code).
Establish a stack pointer
Clear the .bss segment (usually). .bss is the name for the uninitialized (or zeroed) global memory region. Global variables, arrays, etc which don't have an initializing value (beyond 0) are located here. The general practice on a microcontroller is to loop over this region and set all bytes to 0 at startup.
Copy the non-const .data from the end of .text (i.e., from flash) to RAM. As most microcontrollers run from flash, they cannot store variable data there. For statements such as int thisGlobal = 5;, the value of thisGlobal must be copied from a persistent area (usually placed after the program in flash by your linker) to RAM. This applies to static values, including static values inside functions. Values which are left undefined are not copied, but are instead cleared as part of the .bss clearing step above (see the sketch after this list).
Perform other static initializers.
Call main()
From here, your code is run. Generally, the CPU is left in an interrupts-off state (platform dependent).
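A minimal Cortex-M-style sketch of the .bss-clearing and .data-copying steps described above; the symbol names (_sidata, _sdata, _edata, _sbss, _ebss) come from a typical linker script and may differ in yours:
extern unsigned long _sidata, _sdata, _edata, _sbss, _ebss;  /* provided by the linker script */
extern int main(void);

void Reset_Handler(void)
{
    unsigned long *src = &_sidata;          /* load address of .data, in flash */
    unsigned long *dst;

    for (dst = &_sdata; dst < &_edata; )    /* copy initialized .data into RAM */
        *dst++ = *src++;

    for (dst = &_sbss; dst < &_ebss; dst++) /* zero the .bss region */
        *dst = 0;

    main();                                 /* bare metal: main() should never return */
    for (;;)
        ;
}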
Pretty open-ended question, but here are a few things I have picked up.
For super simple processors, there is no true startup code. The CPU gets power and then starts running the first instruction in its memory: no muss, no fuss.
A little further up we have MCUs like AVRs and PICs. These have very little startup code. The only thing that really needs to be done is to set up the interrupt jump table with appropriate addresses. After that it is up to the application code (the only program) to do its thing. The good news is that you as the developer generally don't have to worry about these things: that's what libc is for.
After that we have things like simple ARM-based chips; more complicated than the AVRs and PICs, but still pretty simple. These also have to set up the interrupt table, as well as make sure the clock is set correctly, and start any needed on-chip components (basic interrupts etc.). Have a look at this PDF from Atmel; it details the startup procedure for an ARM7 chip.
Farther up the food chain we have full-on PCs (x86, amd64, etc.). The startup code for these is really the BIOS, which is horrendously complicated.
The big question is whether or not your embedded system will be running an operating system. In general, you'll either want to start your operating system, start up some form of inversion of control (an example I remember from a school project was a telnet server that would listen for requests using RL-ARM or an open-source TCP/IP stack and then execute callbacks when connections were made or data was received), or enter your own control loop (maybe displaying a menu, then looping until a key has been pressed).
Functions of Startup Code for C/C++
Disables all interrupts
Copies any initialized data from ROM to RAM
Sets the uninitialized data area (.bss) to zero
Allocates space for and initializes the stack
Initializes the processor’s stack pointer
Creates and initializes the heap
Executes the constructors and initializers for all global variables (C++ only)
Enables interrupts
Calls main
Where is "BOOT LOADER" placed then? It should be placed before the start-up code right?
As per my understanding, from the reset vector the control goes to the boot loader. There the code waits for a small period of time during which it expects for data to be flashed/downloaded to the controller/processor. If it does not detect a data then the control gets transferred to the next step as specified by theatrus. But my doubt is whether the BOOT LOADER code can be re-written. Eg: Can a UART bootloader be changed to a ETHERNET/CAN bootloader or is it that data sent using any protocol are converted to UART using a gateway and then flashed.
