What is so special about the main() function in C?
In my embedded C compiler, it tells the program counter where to start from. Whatever instruction appears first in the main function is placed first in flash memory. So what about PC programs? What is the meaning of main() when we program for a PC?
On a hosted implementation (basically, anything with an operating system), main is defined to be the entry point of the program. It's the function that will be called by the runtime environment when the program is launched.
On a freestanding implementation (embedded systems, PLCs, etc.), the entry point is whatever the implementation says it is. That could be main, or it could be something else.
In simple terms:
There is nothing special about the main function apart from the fact that it is called by the system when your program is started.
The main function is where the "C program" starts, as far as the C standard is concerned. But in the real world outside the standard, where there is hardware, other things need to be done before main() is called.
On a typical embedded system, you have a reset interrupt service routine, where you end up after power-on-reset (or other reset reasons). From this ISR, the following should be done, in this order:
Set the stack pointer.
Set all other memory mapping-related things (MMU registers)
Safety features like watchdog and low voltage detect are initialized.
All static storage duration variables are initialized.
main() is called.
So when main() is called, you have a stable enough environment for standard C programs to execute as expected.
To use main() as the reset vector is unorthodox and non-standard. The C standard requires that static storage duration variables are already initialized before main() is called. Also, you really don't want to do fundamental things like setting the stack pointer inside main(), because that would mess up all local variables you have in main().
When your OS runs a program, the OS needs to pass control over to that program. And the OS only knows where to begin inside of your program at the main() function.
Have you searched on the internet? Take a look in here, and also here.
When the operating system runs a program in C, it passes control of
the computer over to that program ... the key point is that the operating system needs to know where
inside your program the control needs to be passed. In the case of a C
language program, it's the main() function that the operating system
is looking for.
Function main is special - your program begins executing at the beginning of
main. This means that every program must have a main somewhere.
main will usually call other functions to help perform its job, some that you wrote, and others
from libraries that are provided for you.
You find it in every possible C book.
The main function allows the C program to find the beginning of the program. The main function is always called when the program is started.
I'm reading a textbook which describes how loader works:
When the loader runs, it copies chunks of the executable object file into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start function. The _start function calls the system startup function, __libc_start_main
From the answer of this question What is __libc_start_main and _start? we have the below pseudo-code about the execution flow:
_start:
call __setup_for_c ; set up C environment
call __libc_start_main ; set up standard library
call _main ; call your main
call __libc_stop_main ; tear down standard library
call __teardown_for_c ; tear down C environment
jmp __exit ; return to OS
My questions are:
I used objdump to check the assembly code of the program and I found that _start only calls __libc_start_main, as the picture below shows:
What about the rest of the functions, like __setup_for_c, _main, etc.? Especially my program's main function: I can't see how it gets called. So is the pseudo-code about the execution flow correct?
What does "__libc_start_main sets up the standard library" mean? Why does the standard library need to be set up? Doesn't the standard library just need to be linked by the dynamic linker when the program is loaded?
Pseudo-code isn't code ;) __libc_start_main() can call the application's main() because the address of main() will have been fixed up by the linker. The order in which the code generated by the compiler does initialization might be interesting, but you shouldn't assume it will be the same from one compiler to another, or even one release to another. It's probably best not to rely on things being done in a particular way if you can avoid it.
As to what needs to be initialized -- standard C libraries like glibc are hugely complex, and a lot of stuff needs to be initialized. To take one example, the memory allocator's block table has to be set up, so that malloc() doesn't start with a random pattern of memory allocation.
The other function calls described in the linked answer give a synopsis of what needs to happen; the actual implementation details in the GNU C library are different, either using “constructors” (_dl_start_user), or explicitly in __libc_start_main. __libc_start_main also takes care of calling the user’s main, which is why you don’t see it called in your disassembly, but its address is passed along (see the lea just before the callq). __libc_start_main also takes care of the program exit, and never returns; that’s the reason for the hlt just after the callq, which will crash the program if the function returns.
The library needs quite a lot of setup nowadays:
some of its own relocation
thread-local storage setup
pthread setup
destructor registration
vDSO setup (on Linux)
ctype initialisation
copying the program name, arguments and environment to various library variables
etc. See the x86-64-specific sysdeps/x86_64/start.S and the generic csu/libc-start.c, csu/init-first.c, and misc/init-misc.c among others.
what about the rest of functions like call __setup_for_c ,_main etc?
Those are just fancy made-up readable names used in the linked answer to transfer the meaning of that answer better.
how it get called
Your standard library implementation doesn't provide functions named __setup_for_c or _main, so they don't exist and don't get called. Every implementation may choose different names for the functions.
is the pseudo-code about the execution flow correct?
Yes - and the word "pseudo-code" you used implies that you are aware that it's not real code.
what does __libc_start_main setup standard library mean?
It means a symbol with the name __libc_start_main. __libc_start_main is a function that initializes all standard library things and runs main in glibc. It initializes libc, pthreads, atexit and finally runs main. glibc is open source, so just look at it.
why standard library needs to be setup?
Because it was written in the way that it depends on it. The simplest is, when you write:
int var = 42; // variable with static storage duration
int main() {
return var == 42;
}
(Assuming the optimizer doesn't kick in) then the value 42 has to be written into the memory held for var before main is executed. So something has to execute before main and actually write the 42 into the memory of var. This is the simplest case of why something has to execute before main. Global variables are used in many places and all of them need to be set up. For example, a variable named program_invocation_name in glibc holds the name of the program, so some code needs to query the environment or kernel for the name of the program, potentially parse it, and store it as a string in a global variable (and also remember to free() that string on exit if it was dynamically allocated). Some code "has to do it", and that code is in the standard library initialization.
There are many more cases - in C++ and other languages there are constructors, there is gcc GNU extension __attribute__((__constructor__)) and .init/.preinit sections - all of them executed before main. And destructors have to execute on exit, but not on _exit - thus atexit stuff is initialized before main and all destructors may be registered with it, depending on implementation.
The environment needs to be initialized, potentially the stack and some more stuff. And thread-local variables need to be allocated only for the current thread, so that when you pthread_create another thread they don't get copied along with the non-thread-local variables.
isn't that standard library just need to be linked by the dynamic linker when the program is loaded?
It is - when the program is loaded, the standard library is just linked. The compiler, when generating the program, uses crt code to include some startup code into the program - for example a call to __libc_start_main.
I'm writing a university project in standard C99. One of the requirements is that the exit() function must not be used. Is it possible to implement a similar function?
I tried to make a function that calls main with a negative argc to detect the exit. It was a stupid attempt, because the first main continues.
The project description just specifies that points will be deducted for exiting via exit(). I understand that it wants me to pass state through pointers and report errors in the functions' return values. I'm more interested in how this works in practice, only for myself.
I think you misunderstood the requirement: They probably said something like do not use exit(). This does not mean you are supposed to implement your own exit(), quite to the contrary: they probably mean that the only exit-point of your program shall be the end of your main-function (or a return-statement within the main function) which is considered good programming style.
exit() is a system-level facility that you can't implement on your own without knowing how the operating system (Linux? Windows? an embedded system?) implements it. As Daniel Fischer mentioned, you could call abort(), which also quits the program, although unlike exit() it does not run atexit() handlers and it signals abnormal termination.
There are other "hacks" to get your program to abort without calling exit() explicitly, but these are just hacks and should not be used in production code.
Create a C++ function with C linkage and throw an exception
#include <exception>
extern "C" void MyExit() { throw std::exception(); }
Call raise(SIGKILL) (POSIX only; signal() merely installs a handler and cannot send a signal)
Call abort()
Write some assembly code to unwind the call stack until it gets to the function that called main, insert the return value into the proper return register, and go from there. I don't think you can do this in pure C, as the ABI is not accessible directly. But at least this would be the only method that doesn't involve the operating system (just the ABI).
I want to know who calls the main function in C.
What is the actual use of the main function (why is main special/necessary)?
Can I write a C program without a main function?
The main function is called, in practice, by the C runtime.
You can write a program without main but it must have an entry point. Different operating systems allow you to specify different entry points for your program, but they all serve the same purpose as main. On Windows, you can use WinMain. On Linux you can link without the CRT and define your own _start function (but it cannot return!)
A program without an entry point is like a car without wheels: it doesn't go anywhere.
When you ask your operating system to run a file, it loads it into memory and jumps to its starting point (_start, etc.). At that point there is code that calls main and then exit (the linker is responsible for this part). If you write a program without a main function, the linker will give you an error, since it can't find it.
Your program (which is a series of code bundled inside functions) must have a starting point, right?
Something must be called first to run the rest.
So that starting point is main, which is called by the parent process in your OS (whatever that is) and lets your program run.
Simplest answer is this: the user of your program calls the main function when they start your application. Have you ever used a command terminal? If you have you will know that you can pass arguments to a command. For example:
$ grep word myfile
What is going on under the covers is that the terminal looks at what you typed, then calls the main function of the grep program and passes [grep, word, myfile] as the second argument (argv) to this function. This is a simplification but I hope it helps.
Quoting from one of the unix programming books,
When a C program is executed by the
kernel, by one of the exec functions,
a special start-up routine is
called before the main
function is called. The executable
program file specifies this routine as
the starting address for the program;
this is set up by the link editor when
it is invoked by the C compiler. This
start-up routine takes values from the
kernel (the command-line arguments and
the environment) and sets things up so
that the main function is called as
shown earlier.
Why do we need a middleman start-up routine? The exec function could have called the main function straight away, and the kernel could have passed the command-line arguments and environment directly to the main function. Why do we need the start-up routine in between?
Because C has no concept of "plug in". So if you want to use, say, malloc() someone has to initialize the necessary data structures. The C programmers were lazy and didn't want to have to write code like this all the time:
main() {
initialize_malloc();
initialize_stdio();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
... oh wow, can we start already? ...
}
So the C compiler figures out what needs to be done, generates the necessary code and sets up everything so you can start with your code right away.
The start-up routine initializes the CRT (i.e. creates the CRT heap so that malloc/free work, initializes standard I/O streams, etc.); in case of C++ it also calls the globals' constructors. There may be other system-specific setup, you should check the sources of your run-time library for more details.
Calling main() is a C thing, while calling _start() is a kernel thing, indicated by the entry point in the binary format header. (for clarity: the kernel doesn't want or need to know that we call it _start)
If you had a non-C binary, you might not have a main() function; you might not even have the concept of a "function" at all.
So the actual question would be: why doesn't a compiler give the address of main() as a starting point? That's because typical libc implementations want to do some initializations before really starting the program, see the other answers for that.
Edit: as an example, you can change the entry point like this:
$ cat entrypoint.c
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* exit */
int blabla() { printf("Yes it works!\n"); exit(0); }
int main() { printf("not called\n"); }
$ gcc entrypoint.c -e blabla
$ ./a.out
Yes it works!
It is also important to know that an application program is executed in user mode, and any system call sets the privileged bit and switches to kernel mode. This helps increase OS security by preventing the user from accessing kernel-level facilities directly, and avoids a myriad of other complications. So a call to printf that ends up in a system call (such as write) will trap, set the kernel mode bit, execute the kernel code, then reset to user mode and return to your application.
The CRT is required to help you and allow you to use the languages you want in Windows and Linux. It provides some very fundamental bootstrapping into the OS to provide you with feature sets for development.
The main() function in an avr-gcc program saves the register state on the stack, but when the runtime calls it I understand on a microcontroller there isn't anything to return to. Is this a waste of RAM? How can this state saving be prevented?
How can the compiler be sure that you aren't going to recursively call main()?
It's all about the C-standard.
Nothing forbids you from exiting main at some time. You may not do it in your program, but others may do it.
Furthermore you can register cleanup-handlers via the atexit runtime function. These functions need a defined register state to execute properly, and the only way to guarantee this is to save and restore the registers around main.
It could even be useful to do this:
I don't know about the AVR, but other microcontrollers can go into a low-power state when they're done with their job and waiting for a reset. Doing this from a cleanup handler may be a good idea because this handler gets called if you exit main the normal way and (as far as I know) if your program gets interrupted via a kill signal.
Most likely main is just compiled in the same way as a standard function. In C it pretty much needs to be because you might call it from somewhere.
Note that in C++ it's illegal to call main recursively, so a C++ compiler might be able to optimize this more. But in C, as your question stated, it's legal (if a bad idea) to call main recursively, so it needs to be compiled in the same way as any other function.
How can this state saving be prevented?
The only thing you can do is to write your own C startup routine. That means messing with assembler, but you can then JUMP to your main() instead of just CALLing it.
In my tests with avr-gcc 4.3.5, it only saves registers if not optimizing much. Normal levels (-Os or -O2) cause the push instructions to be optimized away.
One can further specify in a function declaration that it will not return with __attribute__((noreturn)). It is also useful to do full program optimization with -fwhole-program.
The initial code in avr-libc does use call to jump to main, because it is specified that main may return, and then jumps to exit (which is declared noreturn and thus generates no call). You could link your own variant if you think that is too much. exit() in turn simply disables interrupts and enters an infinite loop, effectively stopping your program, but not saving any power. That's four instructions and two bytes of stack memory overhead if your main() never returns or calls exit().