Before the main() function in the user's application runs, the startup code IMPORTs __main and executes it. What does this function do?
__main
copy rw variables from flash to ram?
initialize bss section?
initialize stack/heap section?
anything else?
Does it initialize according to the scatter file, which defines the execution regions?
Copied from https://developer.arm.com/documentation/100748/0618/Embedded-Software-Development/Application-startup
Application startup
In most embedded systems, an initialization sequence executes to set up the system before the main task is executed.
The following figure shows the default initialization sequence.
Figure 1. Default initialization sequence
__main is responsible for setting up the memory and __rt_entry is responsible for setting up the run-time environment.
__main performs code and data copying, decompression, and zero initialization of the ZI data. It then branches to __rt_entry to set up the stack and heap, initialize the library functions and static data, and call any top level C++ constructors. __rt_entry then branches to main(), the entry to your application. When the main application has finished executing, __rt_entry shuts down the library, then hands control back to the debugger.
The function label main() has a special significance. The presence of a main() function forces the linker to link in the initialization code in __main and __rt_entry. Without a function labeled main(), the initialization sequence is not linked in, and as a result, some standard C library functionality is not supported.
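The copy and zero-fill steps attributed to __main above can be pictured with a small host-side simulation. This is only a sketch: the region_t descriptor, the scatter_load name and the arrays standing in for flash and RAM are hypothetical, and the real Arm library code is driven by linker-generated region tables rather than by anything shaped like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one execution region (not the Arm format). */
typedef struct {
    const unsigned char *load_addr; /* where the RW image sits in "flash" */
    unsigned char *exec_addr;       /* where it must live in "RAM" */
    size_t rw_size;                 /* bytes of initialized (RW) data */
    size_t zi_size;                 /* bytes of zero-initialized (ZI) data */
} region_t;

/* Copy RW data to its execution address, then zero the ZI area after it. */
void scatter_load(const region_t *r)
{
    memcpy(r->exec_addr, r->load_addr, r->rw_size);
    memset(r->exec_addr + r->rw_size, 0, r->zi_size);
}

/* Stand-ins for flash and RAM; RAM starts out holding garbage (0xFF). */
const unsigned char scatter_flash[4] = { 42, 1, 2, 3 };
unsigned char scatter_ram[8] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

/* Demo: 4 bytes of RW data followed by 4 bytes of ZI data. */
int scatter_demo(void)
{
    region_t r = { scatter_flash, scatter_ram, 4, 4 };
    scatter_load(&r);
    return scatter_ram[0]; /* first RW byte, copied from "flash" */
}
```

After scatter_demo() runs, the first four bytes of scatter_ram hold the values from scatter_flash and the last four are zero, which is the state a C program expects for its initialized and uninitialized globals when main() starts.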
The linker's basic function is to link object code with other object code (which can be standard library code).
#include<stdio.h>
int main()
{
printf("hello");
}
I want to know whether the linker will replace the printf() call with its definition (like an inline function in C++), or whether it will place the printf() function outside the main() function and pass "hello" as an argument to it.
For printf("hello");, the compiler generates an instruction to call a subroutine. It leaves the address of the subroutine not completely filled in. The object module the compiler generates has some notes about what routine’s address should be filled in there.
The linker may work in different ways. For static linking, the linker will find the implementation of printf in a library and copy the object module for it from the library into the executable file it is building. Depending on certain characteristics of the link, the linker might then complete the call instruction with the final address of the printf routine or it might leave notes in the executable file about the relationship between the call instruction and the printf routine. Later, when the program is being loaded into memory, the program loader will complete the address in the instruction.
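The "notes" the compiler leaves and the patching the linker does can be modelled in a few lines. The sketch below is a toy: reloc_t, symbol_t and toy_link are invented names, the addresses are made up, and real object formats such as ELF encode relocations quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Toy relocation: "at code[offset], store the address of symbol `name`". */
typedef struct {
    size_t offset;
    const char *name;
} reloc_t;

/* Toy symbol table entry, as a linker might build from its input libraries. */
typedef struct {
    const char *name;
    unsigned address;
} symbol_t;

/* Patch every relocation; return 0 on success, -1 on an undefined symbol. */
int toy_link(unsigned *code, const reloc_t *relocs, size_t nrelocs,
             const symbol_t *syms, size_t nsyms)
{
    for (size_t i = 0; i < nrelocs; i++) {
        size_t j;
        for (j = 0; j < nsyms; j++) {
            if (strcmp(relocs[i].name, syms[j].name) == 0) {
                code[relocs[i].offset] = syms[j].address;
                break;
            }
        }
        if (j == nsyms)
            return -1; /* undefined reference */
    }
    return 0;
}

/* Slot 1 is the unfilled operand of a call to printf. */
unsigned toy_code[3] = { 0xE8, 0, 0xC3 };

int toy_demo(void)
{
    const reloc_t relocs[] = { { 1, "printf" } };
    const symbol_t syms[] = { { "puts", 0x1000 }, { "printf", 0x2000 } };
    return toy_link(toy_code, relocs, 1, syms, 2);
}
```

When the link is deferred, the same table of relocations is simply kept in the executable so that the loader can run the equivalent of toy_link at load time.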
For dynamic linking, the linker will find the implementation of printf in a library (or in a file with sufficient information about the library). It will not copy the printf function’s object module into the executable file, but it will include notes about the relationship between the call instruction and the printf routine and its library in the executable file. Later, the program loader will copy the printf function’s object module into the memory of the process. (This might be done by mapping part of the process’ virtual address space to physical memory that already contains the object module from the library and that is shared by other processes on the system. This sharing reduces the load on the system and makes dynamic loading more favorable in this regard.) And the loader will complete the address in the call instruction.
Some dynamic loading is not done as soon as the program is loaded. When a process is started, the loader might load just the program entry point and some essential parts. Some call instructions might be left incomplete. They will have been filled in with the addresses of special subroutines of the program loader (or dynamic library loader). When one of these subroutines is called, it will then load the desired routine, change the address in the call instruction (or otherwise arrange for future calls to call the desired routine), and then jump to the desired routine. This is beneficial because routines that are not used by your program in a particular run do not have to be loaded into memory at all. For example, if your program has a lot of code and data to log errors and inform the user when certain errors occur, that code and data do not have to be loaded into memory if those errors do not occur in a particular session.
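The deferred scheme described above can be imitated in plain C with a function pointer that initially points at a resolver stub. This is a rough analogy only: add_slot, lazy_stub and real_add are hypothetical names, and a real dynamic loader patches tables in the executable image rather than an ordinary C variable.

```c
int lazy_resolve_count = 0; /* how many times the resolver ran */

/* The "real" routine that would live in a shared library. */
int real_add(int a, int b) { return a + b; }

int lazy_stub(int a, int b); /* forward declaration */

/* Call slot: initially points at the stub, like an unresolved call. */
int (*add_slot)(int, int) = lazy_stub;

/* First call lands here: resolve, patch the slot, then forward the call. */
int lazy_stub(int a, int b)
{
    lazy_resolve_count++;
    add_slot = real_add; /* future calls bypass the stub */
    return add_slot(a, b);
}

/* Callers always go through the slot. */
int call_add(int a, int b) { return add_slot(a, b); }
```

The first call_add() pays for the resolution, and every later call goes straight to real_add. If call_add() is never called, the resolver never runs at all.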
I'm reading a textbook which describes how loader works:
When the loader runs, it copies chunks of the executable object file into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start function. The _start function calls the system startup function, __libc_start_main
From the answer of this question What is __libc_start_main and _start? we have the below pseudo-code about the execution flow:
_start:
call __setup_for_c ; set up C environment
call __libc_start_main ; set up standard library
call _main ; call your main
call __libc_stop_main ; tear down standard library
call __teardown_for_c ; tear down C environment
jmp __exit ; return to OS
My questions are:
I used objdump to check the assembly code of the program and found that _start only calls __libc_start_main.
What about the rest of the functions, like __setup_for_c, _main, etc.? In particular, I can't see how my program's main function gets called. So is the pseudo-code about the execution flow correct?
What does "__libc_start_main sets up the standard library" mean? Why does the standard library need to be set up? Doesn't the standard library just need to be linked by the dynamic linker when the program is loaded?
Pseudo-code isn't code ;) __libc_start_main() can call the application's main() because the address of main() will have been fixed up by the linker. The order in which the code generated by the compiler does initialization might be interesting, but you shouldn't assume it will be the same from one compiler to another, or even one release to another. It's probably best not to rely on things being done in a particular way if you can avoid it.
As to what needs to be initialized -- standard C libraries like glibc are hugely complex, and a lot of stuff needs to be initialized. To take one example, the memory allocator's block table has to be set up, so that malloc() doesn't start with a random pattern of memory allocation.
The other function calls described in the linked answer give a synopsis of what needs to happen; the actual implementation details in the GNU C library are different, either using “constructors” (_dl_start_user), or explicitly in __libc_start_main. __libc_start_main also takes care of calling the user’s main, which is why you don’t see it called in your disassembly, but its address is passed along (see the lea just before the callq). __libc_start_main also takes care of the program exit, and never returns; that’s the reason for the hlt just after the callq, which will crash the program if the function returns.
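The idea that main's address is handed over as data, and that the library then calls it and exits with its result, can be sketched in C. Everything here is a stand-in: sketch_libc_start_main is not glibc's function and does not share its signature, and sketch_exit only records the status so the example can be run on a host.

```c
#include <stddef.h>

typedef int (*main_fn_t)(int, char **, char **);

int sketch_init_done = 0;    /* set once "library setup" has run */
int sketch_exit_status = -1; /* records what would be passed to exit() */

/* Stand-in for library setup (TLS, stdio, atexit tables, ...). */
void sketch_libc_init(void) { sketch_init_done = 1; }

/* Stand-in for exit(); the real one never returns. */
void sketch_exit(int status) { sketch_exit_status = status; }

/* Receives main's address as an argument, the way _start passes it along. */
void sketch_libc_start_main(main_fn_t main_fn, int argc, char **argv, char **envp)
{
    sketch_libc_init();
    sketch_exit(main_fn(argc, argv, envp));
}

/* A hypothetical user main: returns argc only if setup already happened. */
int sketch_user_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    return sketch_init_done ? argc : -2;
}

int sketch_demo(void)
{
    char *argv[] = { "prog", "arg", NULL };
    char *envp[] = { NULL };
    sketch_libc_start_main(sketch_user_main, 2, argv, envp);
    return sketch_exit_status;
}
```

Seen this way, a disassembly of _start that only loads an address and makes one call is exactly what one would expect: the call to main happens inside the library routine.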
The library needs quite a lot of setup nowadays:
some of its own relocation
thread-local storage setup
pthread setup
destructor registration
vDSO setup (on Linux)
ctype initialisation
copying the program name, arguments and environment to various library variables
etc. See the x86-64-specific sysdeps/x86_64/start.S and the generic csu/libc-start.c, csu/init-first.c, and misc/init-misc.c among others.
what about the rest of functions like call __setup_for_c ,_main etc?
Those are just fancy made-up readable names used in the linked answer to transfer the meaning of that answer better.
how it get called
Your standard library implementation provides no functions named __setup_for_c or _main, so they don't exist and can't be called. Every implementation may choose different names for its functions.
is the pseudo-code about the execution flow correct?
Yes - and the word "pseudo-code" you used implies that you are aware that it's not real code.
what does __libc_start_main setup standard library mean?
It means a symbol with the name __libc_start_main. __libc_start_main is a function that initializes all standard library things and runs main in glibc. It initializes libc, pthreads, atexit and finally runs main. glibc is open source, so just look at it.
why standard library needs to be setup?
Because it was written in a way that depends on it. The simplest case is when you write:
int var = 42; // variable with static storage duration
int main() {
return var == 42;
}
(Assuming the optimizer doesn't kick in) the value 42 has to be written into the memory reserved for var before main is executed. So something has to execute before main and actually write the 42 into the memory of var. This is the simplest case of why something has to execute before main. Global variables are used in many places and all of them need to be set up. For example, a variable named program_invocation_name in glibc holds the name of the program, so some code needs to actually query the environment or kernel for the name of the program and store the value (potentially after parsing it) as a string in a global variable (and also remember to free() that string on exit if it was dynamically allocated). Some code "has to do it", and that code is in the standard library initialization.
There are many more cases: in C++ and other languages there are constructors, and there are the GCC extension __attribute__((__constructor__)) and the .init/.preinit sections, all of which are executed before main. Destructors have to execute on exit, but not on _exit, so the atexit machinery is initialized before main and all destructors may be registered with it, depending on the implementation.
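A minimal illustration of code that runs before main, assuming a GCC-compatible compiler (the constructor attribute is a compiler extension, not standard C). The names ctor_ran, cleanup_ran, note_cleanup and early_init are made up for this sketch.

```c
#include <stdlib.h>

int ctor_ran = 0;    /* set by the constructor, before main() */
int cleanup_ran = 0; /* set by the atexit handler, after main() */

void note_cleanup(void) { cleanup_ran = 1; }

/* GCC/Clang extension: runs during startup, before main() is entered. */
__attribute__((constructor))
void early_init(void)
{
    ctor_ran = 1;
    atexit(note_cleanup); /* runs on exit() or return from main, not on _exit() */
}
```

By the time main() starts, ctor_ran is already 1, even though nothing in main() called early_init. The handler registered with atexit() has not run yet at that point; it runs only when the program exits normally.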
The environment needs to be initialized, and potentially the stack and some more stuff. Thread-local variables also need to be allocated only for the current thread, so that when you pthread_create another thread they don't get copied along with the non-thread-local variables.
isn't that standard library just need to be linked by the dynamic linker when the program is loaded?
It is - when the program is loaded, the standard library is just linked. The compiler, when generating the program, uses crt code to include some startup code into the program - for example a call to __libc_start_main.
For the past few days I have been trying to understand what happens behind the curtain when we execute a C program. However, even after reading numerous posts, I cannot find a detailed and accurate explanation. Can someone please help me out?
You would usually find special names like this for specific uses when compiling and linking programs.
Keeping in mind that this answer is of a general nature rather than a specific implementation of starting up a C environment, you would typically have something like a _start label, which would be the actual entry point for an executable (from the hosting environment's point of view).
This would be located in some object file or library (like crt0.o for the C runtime start-up code) and would normally be added automagically to your executable file by the linker, similar to the way the C runtime library is added(a).
The operating system code for starting a program would then be akin to (pseudo-code, obviously, and with much less error checking than it should have):
def spawnProg(progName):
    id = newProcess()                          # make process space
    loadProgram(pid = id, file = progName)     # load program into it
    newThread(pid = id, initialPc = '_start')  # make thread to run it
Even though you yourself create a main when coding in C, that's not really where things start happening. There's a whole slew of things that need to be done even before your main program starts. Hence the content of the C start-up code would be along the lines of (at its most simplistic):
_start: ;; Weave magic here to set up C and libc.
;; Note this is example code for a mythical implementation,
;; intended to show how it could work. It is not specifically
;; bound to any given implementation.
call __setup_for_c ; Set up C environment.
call __libc_start_main ; Set up standard library.
call _main ; Call your main.
call __libc_stop_main ; Tear down standard library.
call __teardown_for_c ; Tear down C environment.
jmp __exit ; Return to OS.
The "weaving of magic" is whatever it takes to make the environment ready for a C program. This may include things like:
setting up static data (this is supposed to be initialised to zeros so it's probably just an allocation of a chunk of memory, which is then zeroed by the start-up code - otherwise you would need to store a chunk of that size, already zeroed, in the executable file);
preparing argc and argv on the stack, and even preparing the stack itself (there are specific calling conventions that may be used for C, and it's likely the operating system doesn't necessarily set up the stack at all when calling _start since the needs of the process are not known);
setting up thread-specific data structures (things like random number generators, or error variables, per thread);
initialising the C library in other ways; and so on.
Only once all that is complete will it be okay to call your main function. There's also the likelihood that work needs to be done after your main exits, such as:
invoking atexit handlers (things you want run automatically on exit, no matter where the exit occurs);
detaching from shared resources (for example, shared memory if the OS doesn't do this automatically when it shuts down a process); and
freeing up any other resources not automatically cleaned when the process exits, that would otherwise hang around.
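The atexit bookkeeping mentioned in the list above can be mimicked with a small table of function pointers. This is a simplified model with hypothetical names (sim_atexit, sim_run_exit_handlers); a real C library keeps its own internal structures, but the standard does require handlers to run in reverse order of registration.

```c
#define SIM_MAX_HANDLERS 32

typedef void (*handler_t)(void);

handler_t sim_handlers[SIM_MAX_HANDLERS];
int sim_handler_count = 0;

/* Register a handler; returns 0 on success, -1 if the table is full. */
int sim_atexit(handler_t fn)
{
    if (sim_handler_count == SIM_MAX_HANDLERS)
        return -1;
    sim_handlers[sim_handler_count++] = fn;
    return 0;
}

/* Run handlers in reverse order of registration, as exit() does. */
void sim_run_exit_handlers(void)
{
    while (sim_handler_count > 0)
        sim_handlers[--sim_handler_count]();
}

/* Demo handlers append a digit so the call order can be inspected. */
int sim_trace = 0;
void sim_first(void)  { sim_trace = sim_trace * 10 + 1; }
void sim_second(void) { sim_trace = sim_trace * 10 + 2; }

int sim_exit_demo(void)
{
    sim_atexit(sim_first);
    sim_atexit(sim_second);
    sim_run_exit_handlers();
    return sim_trace; /* sim_second ran first, so the trace reads 21 */
}
```

The table itself is one of the things that must exist and be empty before main runs, which is part of why the library needs setting up at all.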
(a) Many linkers can be told to not do that if, for example, you're writing something that doesn't use the standard C library, or if you want to provide your own _start routine for low-level work.
Is there good documentation of what happens when I run an executable in Linux? For example: I start ./a.out, so probably some bootloader assembly is run (does it come with the C runtime?), and it finds the start symbol in the program, does dynamic relocation, and finally calls main.
I know the above is not correct, but looking for detailed documentation of how this process happen. Can you please explain, or point to links or books that do?
For dynamic linked programs, the kernel detects the PT_INTERP header in the ELF file and first mmaps the dynamic linker (/lib/ld-linux.so.2 or similar), and starts execution at the e_entry address from the main ELF header of the dynamic linker. The initial state of the stack contains the information the dynamic linker needs to find the main program binary (already in memory). It's responsible for reading this and finding all the additional libraries that must be loaded, loading them, performing relocations, and jumping to the e_entry address of the main program.
For static linked programs, the kernel uses the e_entry address from the main program's ELF header directly.
In either case, the main program begins with a routine written in assembly traditionally called _start (but the name is not important as long as its address is in the e_entry field of the ELF header). It uses the initial stack contents to determine argc, argv, environ, etc. and calls the right implementation-internal functions (usually written in C) to run global constructors (if any) and perform any libc initialization needed prior to the entry to main. This usually ends with a call to exit(main(argc, argv)); or equivalent.
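How startup code might recover argc, argv and the environment from the initial stack can be sketched as follows. The layout assumed here (argc, then the argv pointers, a NULL, then the environment pointers, a NULL) matches the common Linux convention, but the start_args_t type and decode_initial_stack function are hypothetical, and a real _start does this in assembly or in ABI-specific C.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int argc;
    char **argv;
    char **envp;
} start_args_t;

/* Decode a block laid out as: argc, argv[0..argc-1], NULL, envp..., NULL.
   The argc slot is stored as a pointer-sized integer. */
start_args_t decode_initial_stack(char **sp)
{
    start_args_t a;
    a.argc = (int)(uintptr_t)sp[0];
    a.argv = sp + 1;
    a.envp = a.argv + a.argc + 1; /* skip argv's NULL terminator */
    return a;
}

/* Build a fake initial stack on the host and check that it decodes. */
int stack_demo(void)
{
    static char prog[] = "./a.out", arg[] = "-v", env[] = "HOME=/root";
    char *fake_stack[] = { (char *)(uintptr_t)2, prog, arg, NULL, env, NULL };
    start_args_t a = decode_initial_stack(fake_stack);
    return a.argc == 2 && a.argv[1][1] == 'v' && a.argv[2] == NULL
        && a.envp[0][0] == 'H' && a.envp[1] == NULL;
}
```

With the three values decoded, the remaining work is the exit(main(argc, argv)) call mentioned above, preceded by whatever library initialization the implementation needs.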
The book "Linkers and Loaders" gives a detailed description of the loading process. It may help with this problem.
Is it possible to RUN 2 different C programs (i.e. 2 main() functions), stored in the flash of a microcontroller, one at a time?
I have bootloader code, which is a separate program and resides in a separate protected section of ROM. Then I have my application program, which resides in a separate ROM section. Residing in memory is not an issue, but how will the linker interpret this? How can I switch between the 2 programs? Is this possible?
For example:
Once I am done with the bootloader, I can make it jump to the application function, but how will the linker know this function?
Just to add, I am using the Freescale HCS08 series and the IDE is CodeWarrior.
Further, here is the sequence of steps:
I load a Bootloader code in ROM. Then this bootloader code is required to load my application code. And then my application code should take over.
Bootloader Code:
Program Application Area ROM
Start Application Program
Application Code:
Check whether to run the bootloader code or the application itself.
Main is just a function. You may rename it and write another main which calls either of them.
If you do not want to rename main in the source, you can change its name with a define or a compiler option:
cc -Dmain=main1 ...
(for first program), and
cc -Dmain=main2 ...
(for the second). Selector main:
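int main1(void); /* first program's main, renamed via -Dmain=main1 */
int main2(void); /* second program's main, renamed via -Dmain=main2 */
int main(void) {
    if (x) return main1(); /* x: whatever condition selects the program */
    else return main2();
}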
Then link all together and download to your controller.
But there's a problem with ISRs: you cannot assign two routines to a single IRQ vector. If the vectors are hardcoded to some flash location (as in most 8-bit controllers) you cannot switch ISRs. You will have to write an ISR wrapper that recognizes which program is running and calls the appropriate ISR.
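A wrapper of that kind might look like the sketch below. The names (active_program, timer_isr_wrapper, program1_timer_isr, program2_timer_isr) are hypothetical, and the compiler-specific syntax for installing the wrapper in the vector table is deliberately left out because it differs between toolchains.

```c
enum { PROGRAM_1 = 1, PROGRAM_2 = 2 };

volatile int active_program = PROGRAM_1; /* set by the selector main() */

int isr1_calls = 0;
int isr2_calls = 0;

/* Each program's own handler for the same interrupt source. */
void program1_timer_isr(void) { isr1_calls++; }
void program2_timer_isr(void) { isr2_calls++; }

/* The one routine actually installed in the hardware vector. */
void timer_isr_wrapper(void)
{
    if (active_program == PROGRAM_1)
        program1_timer_isr();
    else
        program2_timer_isr();
}

/* Host-side check: one interrupt for program 1, two for program 2. */
int isr_demo(void)
{
    active_program = PROGRAM_1;
    timer_isr_wrapper();
    active_program = PROGRAM_2;
    timer_isr_wrapper();
    timer_isr_wrapper();
    return isr1_calls * 10 + isr2_calls;
}
```

The cost is one extra comparison and call per interrupt, which may matter for very time-critical handlers on a small 8-bit part.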
UPD
A second issue is that statically allocated variables from the first and second programs will be in RAM simultaneously, while only one set of them is used. This may exhaust RAM (which is often scarce on a microcontroller) too early.
UPD2
Oh, now I really understand. If you want to link and download them separately, you should deal with linker maps. In this case identical symbol names (such as multiple main's) are not an issue. In the linker map you should define a known entry point [set to an absolute address] from which each application's code starts. The startup code (commonly assembly code) should be linked at this address. From the selector you should decide and jump to the defined location directly. (Do this only for the bootloader if your app is also a selector.)
The entry point provided by the linker may be accessible to the program as an extern function:
extern int app2_start(void); /* this symbol is defined in the linker map, not in any source */
void selector(void)
{
    .... /* condition check */
    app2_start();
}
But this is not the address of its main(), because the C RTL has to perform many initialisations (stack, initialised variables, heap, IO, etc.) before main() can start.
The more common way is for the bootloader to decide whether it should run itself or the application, because if the application code fails, the bootloader may become inaccessible.
The way I've seen this done is to stick the entry point into a header for the application. Then have the boot loader pull that entry point out and jump to it with an appropriate inline assembly instruction. You may need a linker script to get the entry point itself from the application. GNU ld uses ENTRY.
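The header approach could be sketched like this. The app_header_t layout, the APP_MAGIC value and the function names are all assumptions made for illustration; on a real target the header would sit at a fixed flash address chosen in the linker script, and the final jump would typically be done in assembly and never return.

```c
#include <stddef.h>
#include <stdint.h>

#define APP_MAGIC 0xB007AB1Eu /* arbitrary marker chosen for this sketch */

typedef void (*entry_fn_t)(void);

/* Hypothetical header placed at a known address at the start of the app image. */
typedef struct {
    uint32_t magic;   /* identifies a valid image */
    entry_fn_t entry; /* address of the app's startup code, not its main() */
} app_header_t;

/* Return the entry point if the header looks valid, otherwise NULL. */
entry_fn_t app_entry_from_header(const app_header_t *hdr)
{
    if (hdr == NULL || hdr->magic != APP_MAGIC)
        return NULL;
    return hdr->entry;
}

/* Host-side stand-ins so the logic can be exercised without hardware. */
int fake_app_started = 0;
void fake_app_startup(void) { fake_app_started = 1; }

int boot_demo(void)
{
    app_header_t good = { APP_MAGIC, fake_app_startup };
    app_header_t blank = { 0xFFFFFFFFu, NULL }; /* looks like erased flash */
    entry_fn_t entry = app_entry_from_header(&good);
    if (entry != NULL)
        entry(); /* on a real target this call would not return */
    return fake_app_started && app_entry_from_header(&blank) == NULL;
}
```

Checking a magic value before jumping lets the bootloader stay in control when the application area is empty or only partly programmed.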