Strange load instructions produced by mipsel-gcc when compiling glibc

I'm trying to get a small hello-world MIPS program running in the gem5 simulator. The program was compiled with gcc 4.9.2 and glibc 2.19 (built by crosstool-ng) and runs fine in qemu, but it crashes with a page fault (trying to access address 0) in gem5.
The code is rather simple:
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
The output of file ./test:
./test: ELF 32-bit LSB executable, MIPS, MIPS-I version 1, statically
linked, for GNU/Linux 3.15.4, not stripped
After some debugging with gdb, I figured out that the page fault is triggered by the _dl_setup_stack_chk_guard function in glibc. It takes a void pointer called _dl_random, passed in by the __libc_start_main function, which happens to be NULL. However, as far as I know, these functions never dereference the pointer in that case, yet instructions were generated that load values from the memory _dl_random points to. Some code pieces might help with understanding:
In __libc_start_main (the macro THREAD_SET_STACK_GUARD is not defined):
/* Initialize the thread library at least a bit since the libgcc
functions are using thread functions if these are available and
we need to setup errno. */
__pthread_initialize_minimal ();
/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif
In _dl_setup_stack_chk_guard (always inlined):
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
union
{
uintptr_t num;
unsigned char bytes[sizeof (uintptr_t)];
} ret = { 0 };
if (dl_random == NULL)
{
ret.bytes[sizeof (ret) - 1] = 255;
ret.bytes[sizeof (ret) - 2] = '\n';
}
else
{
memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
}
return ret.num;
}
The disassembly:
0x00400ea4 <+228>: jal 0x4014b4 <__pthread_initialize_minimal>
0x00400ea8 <+232>: nop
0x00400eac <+236>: lui v0,0x4a
0x00400eb0 <+240>: lw v0,6232(v0)
0x00400eb4 <+244>: li a0,-256
0x00400eb8 <+248>: lwl v1,3(v0)
0x00400ebc <+252>: lwr v1,0(v0)
0x00400ec0 <+256>: addiu v0,v0,4
0x00400ec4 <+260>: and v1,v1,a0
0x00400ec8 <+264>: lui a0,0x4a
0x00400ecc <+268>: sw v1,6228(a0)
0x4a1858 (0x4a0000 + 6232) is the address of _dl_random
0x4a1854 (0x4a0000 + 6228) is the address of __stack_chk_guard
The page fault occurs at 0x00400eb8. I don't quite get how the instructions at 0x00400eb8 and 0x00400ebc were generated. Could someone shed some light on this, please? Thanks.

Here is how I found the root of this problem, together with my suggested solution.
I think it is helpful to dive into the glibc source code to see what really happens. Starting from either _dl_random or __libc_start_main is fine.
Since the value of _dl_random is unexpectedly NULL, we need to find out how this variable is initialized and where it is assigned. With the help of code analysis tools, we can see that _dl_random is only assigned a meaningful value in the function _dl_aux_init, and this function is called by __libc_start_main.
_dl_aux_init iterates over its parameter -- auxvec -- and acts according to auxvec[i].a_type; the AT_RANDOM case is where _dl_random gets assigned. So the problem is that there is no AT_RANDOM element, and therefore _dl_random is never assigned.
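For reference, the relevant part of _dl_aux_init looks roughly like this (paraphrased from memory rather than copied verbatim from glibc 2.19):
void _dl_aux_init (ElfW(auxv_t) *av)
{
  for (; av->a_type != AT_NULL; ++av)
    switch (av->a_type)
      {
      /* ... other AT_* cases ... */
      case AT_RANDOM:
        _dl_random = (void *) av->a_un.a_val;
        break;
      }
}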
Since the program runs well in user-mode qemu, the root of the problem lies in the system environment provider -- gem5 -- which is responsible for constructing auxvec. Searching for that keyword, we find that the auxv is constructed in gem5/src/arch/<arch-name>/process.cc.
The current auxv for MIPS is constructed as below:
// Set the system page size
auxv.push_back(auxv_t(M5_AT_PAGESZ, MipsISA::PageBytes));
// Set the frequency at which time() increments
auxv.push_back(auxv_t(M5_AT_CLKTCK, 100));
// For statically linked executables, this is the virtual
// address of the program header tables if they appear in the
// executable image.
auxv.push_back(auxv_t(M5_AT_PHDR, elfObject->programHeaderTable()));
DPRINTF(Loader, "auxv at PHDR %08p\n", elfObject->programHeaderTable());
// This is the size of a program header entry from the elf file.
auxv.push_back(auxv_t(M5_AT_PHENT, elfObject->programHeaderSize()));
// This is the number of program headers from the original elf file.
auxv.push_back(auxv_t(M5_AT_PHNUM, elfObject->programHeaderCount()));
//The entry point to the program
auxv.push_back(auxv_t(M5_AT_ENTRY, objFile->entryPoint()));
//Different user and group IDs
auxv.push_back(auxv_t(M5_AT_UID, uid()));
auxv.push_back(auxv_t(M5_AT_EUID, euid()));
auxv.push_back(auxv_t(M5_AT_GID, gid()));
auxv.push_back(auxv_t(M5_AT_EGID, egid()));
Now we know what to do: we just need to provide an accessible address to _dl_random, tagged with AT_RANDOM. gem5's ARM arch implements this already (code); we can take it as an example.
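As a rough sketch only -- the M5_AT_RANDOM constant name and the choice of address are my assumptions, modeled on the existing entries and on the ARM port, not taken from gem5's MIPS code -- the fix amounts to pushing one more entry:
// Give glibc an AT_RANDOM pointer it can safely read 16 bytes from.
// Ideally this should point at 16 bytes of random data written onto the
// new process stack (as the ARM port does); pointing it at any readable,
// mapped address in the image is enough to avoid the NULL dereference.
auxv.push_back(auxv_t(M5_AT_RANDOM, elfObject->programHeaderTable()));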

Related

Custom Instruction crashing with SIGNAL 4 (Illegal Instruction): RISC-V (32) GNU-Toolchain with QEMU

I have been wanting to develop and understand the process of creating custom extensions for a large-scale task I have, involving RISC-V compilation using the QEMU emulator. I have been loosely following this guide: https://chowdera.com/2021/04/20210430120004272q.html in order to understand where and how an instruction is parsed, implemented and executed by QEMU. I have edited and created an entry for my R-type instruction, the same as in the guide; however, when I try to issue it as an .insn directive in ASM, the following exception is thrown:
'Program stopped with Signal 4 (Illegal Instruction)'
The code that produced this exception:
#include <stdio.h>
static int custom_cube(int addr)
{
int cube;
asm volatile(".insn r 0x7b, 6, 6, %0, %1, x0" : "=r"(cube) : "r"(addr));
return cube;
}
int main() {
int a = 3;
int ret = 0;
ret = custom_cube((int)&a);
if(ret == a * a * a) {
printf("Success");
}
else {
printf("Failed");
}
return 0;
}
I have attempted to look into the encoding of the instruction (exactly as the tutorial describes, with the same opcode and bit fields) to see if there was any conflict and whether the helper functions were incorrectly implemented. I also rebuilt and reconfigured QEMU after the implementation and ran again, reaching the same error. I then changed the encoding to something different, changed the .insn opcode call accordingly, and got the same behaviour, even after verifying and type-checking. I am still rather new to the build process of implementing custom RISC-V instructions specifically through QEMU, and would appreciate any input on anything that either I or perhaps this tutorial has neglected to include in this process. A similar guide using the RiscFree IDE had a similar process (implementing trans and helper functions, then rebuilding QEMU and running), so I am still unsure why the Linux kernel is throwing this exception. Is there something blindingly obvious I might have missed? Should I have rebuilt the entire toolchain and not just QEMU? From what the tutorial told me, I presumed nothing further was required.
Disassembly of the custom function, with the corresponding source:
static int custom_cube(int addr)
{
1018c: fd010113 addi sp,sp,-48
10190: 02812623 sw s0,44(sp)
10194: 03010413 addi s0,sp,48
10198: fca42e23 sw a0,-36(s0)
int cube;
asm volatile (
1019c: fdc42783 lw a5,-36(s0)
101a0: 0c07e7fb .4byte 0xc07e7fb
101a4: fef42623 sw a5,-20(s0)
".insn r 0x7b, 6, 6, %0, %1, x0"
:"=r"(cube)
:"r"(addr)
);
return cube;
101a8: fec42783 lw a5,-20(s0)
}
101ac: 00078513 mv a0,a5
101b0: 02c12403 lw s0,44(sp)
101b4: 03010113 addi sp,sp,48
101b8: 00008067 ret
(Since it doesn't appear to disassemble as an existing RISC-V instruction, I presume that I did put it in a free spot? I changed the variation a fair few times and still experienced this, even after validating all other possible overlaps -- none were found -- so I feel as if it has to do with the implementation or its call.)
UPDATE TO ISSUE: I have since used a different encoding for the instruction
OPCODE = "0110011", FUNCT3 = "111" and FUNCT7 = "0100000".
Upon recompiling QEMU and running this using the .insn directive, QEMU appears to just hang upon decoding the instruction - I am unsure what further implementation is required after using TCG directives.
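For reference, converting those binary fields to the numeric operands the .insn directive expects (this is just my arithmetic on the fields above, not code from the guide): opcode 0b0110011 = 0x33, funct3 0b111 = 7, funct7 0b0100000 = 0x20, so the call would look roughly like:
static int custom_cube(int addr)
{
    int cube;
    /* .insn r opcode, funct3, funct7, rd, rs1, rs2 */
    asm volatile(".insn r 0x33, 7, 0x20, %0, %1, x0" : "=r"(cube) : "r"(addr));
    return cube;
}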

SPARC assembly jmp \boot

I'll explain the problem briefly. I have a Leon3 board (gr-ut-g99). Using GRMON2 I can load executables at a desired address on the board.
I have two programs. Let's call them A and B. I tried to load both in memory and individually they work.
What I would like to do now is to make the A program call the B program.
Both programs are written in C using a variant of the gcc compiler (the Gaisler Sparc GCC).
To do the jump, I wrote a tiny inline assembly function in program A that jumps to the memory address where I loaded program B.
Below is a snippet of program A:
unsigned int return_address;
unsigned int * const RAM_pointer = (unsigned int *) RAM_ADDRESS;
printf("RAM pointer set to: 0x%08x \n",(unsigned int)RAM_pointer);
printf("jumping...\n");
__asm__(" nop;" //clean the pipeline
"jmp %1;" // jmp to programB
:"=r" (return_address)
:"r" (RAM_pointer)
);
RAM_ADDRESS is a #define
#define RAM_ADDRESS 0x60000000
Program B is a simple hello world. It is loaded at address 0x60000000. If I run it on its own, it works!
int main()
{
printf ("HELLO! I'M BOOTED! \n");
fflush(stdout);
return 0;
}
What I expect when I run program A is to see the "jumping..." message on the console and then the "HELLO! I'M BOOTED!" message from program B.
What happens instead is an IU exception.
Below I have posted the messages shown by the GRMON2 monitor. I have also included the "inst" report, which should show the last operations performed before the exception.
grmon2> run
IU exception (tt = 0x07, mem address not aligned)
0x60004824: 9fc04000 call %g1
grmon2> inst
TIME ADDRESS INSTRUCTION RESULT SYMBOL
407085 600047FC mov %i3, %o2 [600063B8] -
407086 60004800 cmp %i4 [00000013] -
407089 60004804 be 0x60004970 [00000000] -
407090 60004808 mov %i0, %o0 [6000646C] -
407091 6000480C mov %i4, %o3 [00000013] -
407092 60004810 cmp %i4, %l0 [80000413] -
407108 60004814 bleu 0x60004820 [00000000] -
407144 60004818 ld [%i1 + 0x20], %o1 [FFFFFFFF] -
407179 60004820 ld [%i1 + 0x28], %g1 [FFFFFFFF] -
407186 60004824 call %g1 [ TRAP ] -
I also tried to substitute the "jmp" with a "jmpl" or a "call", but it did not work.
I'm quite confused.
I do not know how to approach the problem, and therefore I do not know what other information is necessary to provide.
I can say that program B is loaded at 0x60000000 and its entry point is, of course, 0x60000000. Running program B directly from that entry point works fine!
Thanks in advance for your help!
Looks to me like you did execute the jump, and it got to program B, as evidenced by the addresses of the instructions in the trace buffer. But where you crashed was in stdio trying to print stuff. Stdio makes extensive use of function pointers, and the sequence clearly shows a call instruction with the target address in a register, which indicates use of a function pointer.
I suggest putting fflush(stdout) in program A just before the jump, so that you can see the messages before the jump happens. Then, in program B, instead of using printf, just put some known value in memory that you can examine later via the monitor to verify that it got there.
My guess is that the stdio library has some data or parameter that needs to be set up at the start of the program, and that's not being done or not done properly. Not sure about the platform you are running on, but do you have some sort of debugging or single stepping ability, like in a debugger? If so, just single step through the jump and follow where the program goes.
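A minimal sketch of that suggestion (the marker address 0x600FFFF0 and the value are just examples; pick any RAM location you can inspect from GRMON2 afterwards). In program A, right before the inline asm jump:
printf("jumping...\n");
fflush(stdout);
And in program B, avoid stdio entirely and just leave a marker the monitor can read back:
int main(void)
{
    volatile unsigned int *marker = (volatile unsigned int *) 0x600FFFF0;
    *marker = 0xB007ED;   /* "booted" */
    for (;;)
        ;                 /* park here; examine 0x600FFFF0 from GRMON2 */
}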

How to corrupt the stack in a C program

I have to change the designated section of function_b so that it changes the stack in such a way that the program prints:
Executing function_a
Executing function_b
Finished!
As it stands, the program also prints Executed function_b in between Executing function_b and Finished!.
I have the following code, and I have to fill something in where it says // ... insert code here:
#include <stdio.h>
void function_b(void){
char buffer[4];
// ... insert code here
fprintf(stdout, "Executing function_b\n");
}
void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
fprintf(stdout, "Executed function_b\n");
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}
I am using Ubuntu Linux with the gcc compiler. I compile the program with the following options: -g -fno-stack-protector -fno-omit-frame-pointer. I am using an Intel processor.
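A compile command matching those options might look like the line below (the file name is just a placeholder):
gcc -g -fno-stack-protector -fno-omit-frame-pointer -o stackdemo stackdemo.c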
Here is a solution. It is not exactly stable across environments, but it works for me on an x86_64 processor under Windows/MinGW64.
It may not work for you out of the box, but still, you might want to use a similar approach.
void function_b(void) {
char buffer[4];
buffer[0] = 0xa1; // part 1
buffer[1] = 0xb2;
buffer[2] = 0xc3;
buffer[3] = 0x04;
register int * rsp asm ("rsp"); // part 2
register size_t r10 asm ("r10");
r10 = 0;
while (*rsp != 0x04c3b2a1) {rsp++; r10++;} // part 3
while (*rsp != 0x00b1c2d3) rsp++; // part 4
rsp -= r10; // part 5
rsp = (int *) ((size_t) rsp & ~0xF); // part 6
fprintf(stdout, "Executing function_b\n");
}
The trick is that each of function_a and function_b has only one local variable, and we can find the address of that variable just by searching around in memory.
First, we put a signature in the buffer, let it be the 4-byte integer 0x04c3b2a1 (remember that x86_64 is little-endian).
After that, we declare two variables to represent the registers: rsp is the stack pointer, and r10 is just some unused register.
This allows us to avoid asm statements later in the code, while still being able to use the registers directly.
It is important that these variables don't actually take stack memory; they refer to the processor registers themselves.
After that, we move the stack pointer in 4-byte increments (since the size of int is 4 bytes) until we get to the buffer. We have to remember the offset from the stack pointer to the first variable here, and we use r10 to store it.
Next, we want to know how far apart the stack frames of function_b and function_a are. A good approximation is how far apart buffer and beacon are, so we now search for beacon.
After that, we have to move back from beacon, the first variable of function_a, to the start of function_a's frame on the stack.
We do that by subtracting the value stored in r10.
Finally, here comes a weirder bit.
At least on my configuration, the stack happens to be 16-byte aligned, and while the buffer array is aligned to the left of a 16-byte block, the beacon variable is aligned to the right of such a block.
Or is it something with a similar effect and different explanation?..
Anyway, so we just clear the last four bits of the stack pointer to make it 16-byte aligned again.
The 32-bit GCC doesn't align anything for me, so you might want to skip or alter this line.
When working on a solution, I found the following macro useful:
#ifdef DEBUG
#define show_sp() \
do { \
register void * rsp asm ("rsp"); \
fprintf(stdout, "stack pointer is %016X\n", rsp); \
} while (0);
#else
#define show_sp() do{}while(0);
#endif
After this, when you insert a show_sp(); in your code and compile with -DDEBUG, it prints the value of the stack pointer at that moment.
When compiling without -DDEBUG, the macro just compiles to an empty statement.
Of course, other variables and registers can be printed in a similar way.
OK, let's assume that the epilogue (i.e. the code at the } line) of function_a and function_b is the same.
Although functions A and B are not symmetric, we can assume this because they have the same signature (no parameters, no return value), the same calling convention, and the same size of local variables (4 bytes: int beacon = 0x0b1c2d3 vs. char buffer[4]; with optimization both must be dropped anyway because they are unused). But we must not use additional local variables in function_b, so as not to break this assumption. The most problematic point here is whether function_a or function_b uses non-volatile registers (and as a result saves them in the prologue and restores them in the epilogue), but it looks like there is no need for that here.
So the code below is based on this assumption: epilogueA == epilogueB (the solution by @Gassa is really based on it too).
It also needs to be stated very clearly that function_a and function_b must not be inlined. This is very important; without it, no solution is possible. So I allowed myself to add a noinline attribute to function_a and function_b. Note that this is not a code change, just an attribute that the author of this task implicitly assumes but does not clearly state. In GCC a function can be marked with __attribute__((noinline)); in CL, __declspec(noinline) is used for this.
I wrote the following code for the CL compiler, which has this intrinsic function:
void * _AddressOfReturnAddress();
I think GCC must also have an analog of this function. I also use
void* _ReturnAddress();
However, _ReturnAddress() is really equal to *(void**)_AddressOfReturnAddress(), so we could use _AddressOfReturnAddress() alone; using _ReturnAddress() simply makes the source code (but not the binary -- that is the same) smaller and more readable.
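As an aside not in the original answer: a possible GCC mapping (a sketch only, untested) would be
/* hypothetical GCC analog of the CL intrinsic */
#define _ReturnAddress() __builtin_return_address(0)
There is no direct GCC counterpart of _AddressOfReturnAddress(); one would have to locate the return-address slot relative to __builtin_frame_address(0) in an ABI-specific way.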
The following code works for both x86 and x64, and it works (tested) at any optimization level.
Although I use two global variables, the code is thread safe: we can really call main from multiple threads concurrently, or call it multiple times, and everything will still work correctly (provided, of course, that epilogueA == epilogueB, as I said at the beginning).
I hope the comments in the code are self-explanatory enough.
__declspec(noinline) void function_b(void){
char buffer[4];
buffer[0] = 0;
static void *IPa, *IPb;
// save the IPa address
_InterlockedCompareExchangePointer(&IPa, _ReturnAddress(), 0);
if (_ReturnAddress() == IPa)
{
// we called from function_a
function_b();
// <-- IPb
if (_ReturnAddress() == IPa)
{
// we called from function_a, change return address for return to IPb instead IPa
*(void**)_AddressOfReturnAddress() = IPb;
return;
}
// we at stack of function_a here.
// we must be really at point IPa
// and execute fprintf(stdout, "Executed function_b\n"); + '}' (epilogueA)
// but we will execute fprintf(stdout, "Executing function_b\n"); + '}' (epilogueB)
// assume that epilogueA == epilogueB
}
else
{
// we called from function_b
IPb = _ReturnAddress();
return;
}
fprintf(stdout, "Executing function_b\n");
// epilogueB
}
__declspec(noinline) void function_a(void) {
int beacon = 0x0b1c2d3;
fprintf(stdout, "Executing function_a\n");
function_b();
// <-- IPa
fprintf(stdout, "Executed function_b\n");
// epilogueA
}
int main(void) {
function_a();
fprintf(stdout, "Finished!\n");
return 0;
}

snprintf() prints garbage floats with newlib nano

I am running a bare metal embedded system with an ARM Cortex-M3 (STM32F205). When I try to use snprintf() with float numbers, e.g.:
float f;
f = 1.23;
snprintf(s, 20, "%5.2f", f);
I get garbage in s. The format seems to be honored, i.e. the garbage is a well-formed string with digits, a decimal point, and two trailing digits. However, if I repeat the snprintf, the string may change between the two calls.
Floating point mathematics seems to work otherwise, and snprintf works with integers, e.g.:
snprintf(s, 20, "%10d", 1234567);
I use the newlib-nano implementation with the -u _printf_float linker switch. The compiler is arm-none-eabi-gcc.
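(For context, the link step that selects newlib-nano and pulls in float formatting support typically looks something like the line below; the file names are placeholders.)
arm-none-eabi-gcc --specs=nano.specs -u _printf_float -o firmware.elf main.o syscalls.o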
I do have a strong suspicion of memory allocation problems, as integers are printed without any hiccups, but floats act as if they got corrupted in the process. The printf family of functions calls malloc when printing floats, but not integers.
The only piece of code not belonging to newlib I am using in this context is my _sbrk(), which is required by malloc.
#include <errno.h>     // for errno and ENOMEM
#include <sys/types.h> // for caddr_t

caddr_t _sbrk(int incr)
{
extern char _Heap_Begin; // Defined by the linker.
extern char _Heap_Limit; // Defined by the linker.
static char* current_heap_end;
char* current_block_address;
// first allocation
if (current_heap_end == 0)
current_heap_end = &_Heap_Begin;
current_block_address = current_heap_end;
// increment and align to 4-octet border
incr = (incr + 3) & (~3);
current_heap_end += incr;
// Overflow?
if (current_heap_end > &_Heap_Limit)
{
errno = ENOMEM;
current_heap_end = current_block_address;
return (caddr_t) -1;
}
return (caddr_t)current_block_address;
}
As far as I have been able to track, this should work. It seems that no-one ever calls it with negative increments, but I guess that is due to the design of the newlib malloc. The only slightly odd thing is that the first call to _sbrk has a zero increment. (But this may be just malloc's curiosity about the starting address of the heap.)
The stack should not collide with the heap, as there is around 60 KiB RAM for the two. The linker script may be insane, but at least the heap and stack addresses seem to be correct.
In case someone else gets bitten by the same bug, I am posting an answer to my own question. However, it was @Notlikethat's comment that pointed to the correct answer.
This is a lesson in "Thou shalt not steal." I borrowed the gcc linker script that came with the STM32CubeMX code generator. Unfortunately, the script, along with the startup file, is broken.
The relevant part of the original linker script:
_estack = 0x2000ffff;
and its counterparts in the startup script:
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
...
g_pfnVectors:
.word _estack
.word Reset_Handler
...
The first interrupt vector position (at 0) should always point to the top of the startup stack. When the reset handler is reached, it also loads the stack pointer. (As far as I can tell, the latter is unnecessary, as the hardware reloads SP from vector 0 before calling the reset handler anyway.)
The Cortex-M stack pointer should always point to the last item on the stack. At startup there are no items on the stack, so the pointer should point to the first address above the actual memory, 0x20010000 in this case. With the original linker script the stack pointer is set to 0x2000ffff, which actually results in sp = 0x2000fffc (the hardware forces a word-aligned stack). After this the stack is misaligned by 4.
I changed the linker script by removing the constant definition of _estack and replacing it with _stacktop as shown below. The memory definitions were already there; I changed the name just to see where the value is used.
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
}
_stacktop = ORIGIN(RAM) + LENGTH(RAM);
After this the value of _stacktop is 0x20010000, and my numbers float beautifully... The same problem could arise with any external (library) function using double-length parameters, as the ARM Cortex ABI states that the stack must be 8-byte aligned when calling external functions.
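Equivalently -- a sketch of an alternative, not what the original fix did -- you could keep the _estack symbol name, so the startup file needs no change, and only correct its definition:
_estack = ORIGIN(RAM) + LENGTH(RAM); /* = 0x20010000, top of RAM */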
snprintf accepts the buffer size as its second argument. You might want to go through this example: http://www.cplusplus.com/reference/cstdio/snprintf/
/* snprintf example */
#include <stdio.h>
int main ()
{
char buffer [100];
int cx;
cx = snprintf ( buffer, 100, "The half of %d is %d", 60, 60/2 );
snprintf ( buffer+cx, 100-cx, ", and the half of that is %d.", 60/2/2 );
puts (buffer);
return 0;
}

Does Linux kernel have main function?

I am learning device driver and kernel programming. According to Jonathan Corbet's book, we do not have a main() function in device drivers.
#include <linux/init.h>
#include <linux/module.h>
static int my_init(void)
{
return 0;
}
static void my_exit(void)
{
return;
}
module_init(my_init);
module_exit(my_exit);
Here I have two questions:
Why do we not need a main() function in device drivers?
Does the kernel have a main() function?
start_kernel
In 4.2, start_kernel from init/main.c does a considerable amount of initialization and could be compared to a main function.
It is the first arch-independent code to run, and it sets up a large part of the kernel. So, much like main, start_kernel is preceded by some lower-level setup code (done by the crt* objects in the case of a userland main), after which the "main" generic C code runs.
How start_kernel gets called in x86_64
arch/x86/kernel/vmlinux.lds.S, a linker script, sets:
ENTRY(phys_startup_64)
and
phys_startup_64 = startup_64 - LOAD_OFFSET;
and:
#define LOAD_OFFSET __START_KERNEL_map
arch/x86/include/asm/page_64_types.h defines __START_KERNEL_map as:
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
which is the kernel entry address. TODO how is that address reached exactly? I have to understand the interface Linux exposes to bootloaders.
arch/x86/kernel/vmlinux.lds.S sets the very first bootloader section as:
.text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .;
/* bootstrapping code */
HEAD_TEXT
include/asm-generic/vmlinux.lds.h defines HEAD_TEXT:
#define HEAD_TEXT *(.head.text)
arch/x86/kernel/head_64.S defines startup_64. That is the very first x86 kernel code that runs. It does a lot of low level setup, including segmentation and paging.
That is then the first thing that runs because the file starts with:
.text
__HEAD
.code64
.globl startup_64
and include/linux/init.h defines __HEAD as:
#define __HEAD .section ".head.text","ax"
so the same as the very first thing of the linker script.
At the end, it calls x86_64_start_kernel a bit awkwardly with an lretq:
movq initial_code(%rip),%rax
pushq $0 # fake return address to stop unwinder
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
lretq
and:
.balign 8
GLOBAL(initial_code)
.quad x86_64_start_kernel
arch/x86/kernel/head64.c defines x86_64_start_kernel which calls x86_64_start_reservations which calls start_kernel.
arm64 entry point
The very first arm64 code that runs in a v5.7 uncompressed kernel is defined at https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L72, so it is either the add x13, x18, #0x16 or the b stext, depending on CONFIG_EFI:
__HEAD
_head:
/*
* DO NOT MODIFY. Image header expected by Linux boot-loaders.
*/
#ifdef CONFIG_EFI
/*
* This add instruction has no meaningful effect except that
* its opcode forms the magic "MZ" signature required by UEFI.
*/
add x13, x18, #0x16
b stext
#else
b stext // branch to kernel start, magic
.long 0 // reserved
#endif
le64sym _kernel_offset_le // Image load offset from start of RAM, little-endian
le64sym _kernel_size_le // Effective size of kernel image, little-endian
le64sym _kernel_flags_le // Informative flags, little-endian
.quad 0 // reserved
.quad 0 // reserved
.quad 0 // reserved
.ascii ARM64_IMAGE_MAGIC // Magic number
#ifdef CONFIG_EFI
.long pe_header - _head // Offset to the PE header.
These are also the very first bytes of an uncompressed kernel image.
Both of those cases jump to stext which starts the "real" action.
As mentioned in the comment, these two instructions are the start of a documented 64-byte header described at: https://github.com/cirosantilli/linux/blob/v5.7/Documentation/arm64/booting.rst#4-call-the-kernel-image
arm64 first MMU enabled instruction: __primary_switched
I think it is __primary_switched in head.S:
/*
* The following fragment of code is executed with the MMU enabled.
*
* x0 = __PHYS_OFFSET
*/
__primary_switched:
At this point, the kernel appears to create page tables + maybe relocate itself such that the PC addresses match the symbols of the vmlinux ELF file. Therefore at this point you should be able to see meaningful function names in GDB without extra magic.
arm64 secondary CPU entry point
secondary_holding_pen defined at: https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L691
Entry procedure further described at: https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L691
Fundamentally, there is nothing special about a routine being named main(). As alluded to above, main() serves as the entry point for an executable load module. However, you can define different entry points for a load module; in fact, you can define more than one entry point. For example, refer to your favorite DLL.
From the operating system's (OS) point of view, all it really needs is the address of the entry point of the code that will function as a device driver. The OS will pass control to that entry point when the device driver is required to perform I/O to the device.
A system programmer defines (each OS has its own method) the connection between a device, a load module that functions as the device's driver, and the name of the entry point in the load module.
Each OS has its own kernel (obviously), and some might start with main(), but I would be surprised to find a kernel that used main() other than a simple one, such as UNIX! By the time you are writing kernel code, you have long moved past the requirement to name every module you write main().
Hope this helps?
I found this code snippet from the Unix Version 6 kernel. As you can see, main() is just another function, trying to get things started!
main()
{
extern schar;
register i, *p;
/*
* zero and free all of core
*/
updlock = 0;
i = *ka6 + USIZE;
UISD->r[0] = 077406;
for(;;) {
if(fuibyte(0) < 0) break;
clearsig(i);
maxmem++;
mfree(coremap, 1, i);
i++;
}
if(cputype == 70)
for(i=0; i<62; i=+2) {
UBMAP->r[i] = i<<12;
UBMAP->r[i+1] = 0;
}
// etc. etc. etc.
Several ways to look at it:
Device drivers are not programs. They are modules that are loaded into another program (the kernel). As such, they do not have a main() function.
The fact that all programs must have a main() function is only true for userspace applications. It does not apply to the kernel, nor to device drivers.
By main() you probably mean what main() is to a program, namely its "entry point".
For a module, that is init_module().
From Linux Device Drivers, 2nd Edition:
Whereas an application performs a single task from beginning to end, a module registers itself in order to serve future requests, and its "main" function terminates immediately. In other words, the task of the function init_module (the module's entry point) is to prepare for later invocation of the module's functions; it's as though the module were saying, "Here I am, and this is what I can do." The second entry point of a module, cleanup_module, gets invoked just before the module is unloaded. It should tell the kernel, "I'm not there anymore; don't ask me to do anything else."
Yes, the Linux kernel has a main function; it is located in the arch/x86/boot/main.c file. But kernel execution starts in the arch/x86/boot/header.S assembly file, and the main() function is called from there by a "calll main" instruction.
Here is that main function:
void main(void)
{
/* First, copy the boot header into the "zeropage" */
copy_boot_params();
/* Initialize the early-boot console */
console_init();
if (cmdline_find_option_bool("debug"))
puts("early console in setup code.\n");
/* End of heap check */
init_heap();
/* Make sure we have all the proper CPU support */
if (validate_cpu()) {
puts("Unable to boot - please use a kernel appropriate "
"for your CPU.\n");
die();
}
/* Tell the BIOS what CPU mode we intend to run in. */
set_bios_mode();
/* Detect memory layout */
detect_memory();
/* Set keyboard repeat rate (why?) and query the lock flags */
keyboard_init();
/* Query Intel SpeedStep (IST) information */
query_ist();
/* Query APM information */
#if defined(CONFIG_APM) || defined(CONFIG_APM_MODULE)
query_apm_bios();
#endif
/* Query EDD information */
#if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
query_edd();
#endif
/* Set the video mode */
set_video();
/* Do the last things and invoke protected mode */
go_to_protected_mode();
}
While the function name main() is just a common convention (there is no real reason to use it in kernel mode), the Linux kernel does have a main() function for many architectures, and of course user-mode Linux has a main function.
Note that the OS runtime calls the main() function to start an app; when an operating system boots there is no such runtime, and the kernel is simply loaded to an address by the boot loader, which is loaded by the MBR, which is loaded by the hardware. So while a kernel may contain a function called main, it need not be the entry point.
See Also:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms633559%28v=vs.85%29.aspx
Linux kernel source:
x86: linux-3.10-rc6/arch/x86/boot/main.c
arm64: linux-3.10-rc6/arch/arm64/kernel/asm-offsets.c

Resources