Is a main() required for a C program? - c

Well the title says it all. Is a main() function absolutely essential for a C program?
I am asking this because I was looking at the Linux kernel code, and I didn't see a main() function.

No, the ISO C standard states that a main function is only required for a hosted environment (such as one with an underlying OS).
For a freestanding environment like an embedded system (or an operating system itself), it's implementation defined. From C99 5.1.2:
Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.
As to how Linux itself starts, the start point for the Linux kernel is start_kernel though, for a more complete picture of the entire boot process, you should start here.

Well, no, but ...
C99 specifies that main() is called in the hosted environment "at program startup", however, you don't have to use the C runtime support. Your operating system executes image files and starts a program at an address provided by the linker.
If you are willing to write your program to conform to the operating system's requirements rather than C99's, you can do it without main(). The more modern (and complex) the system, though, the more trouble you will have with the C library making assumptions that the standard runtime startup is used.
Here is an example for Linux...
$ cat > nomain.S
.text
_start:
call iamnotmain
movl $0xfc, %eax
xorl %ebx, %ebx
int $0x80
.globl _start
$ cat > demo.c
void iamnotmain(void) {
static char s[] = "hello, world\n";
write(1, s, sizeof s);
}
$ as -o nomain.o nomain.S
$ cc -c demo.c
$ ld -static nomain.o demo.o -lc
$ ./a.out
hello, world
It's arguably not "a C99 program" now, though, just a "Linux program" with a object module written in C.

The main() function is called by an object file included with the libc. Since the kernel doesn't link against the libc it has its own entry point, written in assembler.

Paxdiablo's answer covers two of the cases where you won't encounter a main. Let me add a couple of more:
Many plug-in systems for other programs (like, say, browsers or text editors or the like) have no main().
Windows programs written in C have no main(). (They have a WinMain() instead.)

The operating systems loader has to call a single entry point; in the GNU compiler, the entry point is defined in the crt0.o linked object file, the source for this is the assembler file crt0.s - that invokes main() after performing various run-time start-up tasks (such as establishing a stack, static initialisation). So when building an executable that links the default crt0.o, you must have a main(), otherwise you will get a linker error since in crt0.o main() is an unresolved symbol.
It would be possible (if somewhat perverse and unnecessary) to modify crt0.s to call a different entry point. Just make sure that you make such an object file specific to your project rather than modifying the default version, or you will break every build on that machine.
The OS itself has its own C runtime start-up (which will be called from the bootloader) so can call any entry point it wishes. I have not looked at the Linux source, but imagine that it has its own crt0.s that will call whatever the C code entry point is.

main is called by glibc,that is a part of application(ring 3), not the kernel(ring 0).
the driver has another entry point,for example windows driver base on WDM is start from DRIVERENTRY

In machine language things get executed sequentially, what comes first is executed first. So, the default is for the compiler place a call to you main method to fit the C standard.
Your program works like a library, which is a collection of compiled functions. The main difference between a library and a standard executable is that for the second one the compiler generates assembly code which calls one of the functions in your program.
But you could write assembly code which calls your an arbitrary C program function (the same way calls to library functions work, actually) and this would work the same way other executables do. But the thing is you cannot do it in plain standard C, you have to resort to assembly or even some other compiler specific tricks.
This was intended as a general and superficial explanation, there are some technical differences I avoided on purpose as they don't seem relevant.

Related

Step by step C compilation using GCC?

I am trying to make the four steps that takes to transform from C source into an executable with GCC. The first 3 steps works as expected, but the last one gives me problems. I have two files: writeByte.h and writeByte.c, which contains the following:
// writeByte.h
// USED GCC COMMANDS BY ORDER:
// 1 - "gcc writeByte.c -o pre-processed.i -E"
// 2 - "gcc pre-processed.i -o assembled.s -S"
// 3 - "gcc assembled.s -o compiled.o -c"
// 4 - ???
void writeByte(char* addr, char val);
and
// writeByte.c
#include "writeByte.h"
void writeByte(char* addr, char val) { *addr = val; }
Supposedly, to link a file, I have to execute gcc compiled.o -o executable, but it says that at .text+0x20, reference to main is undefined, so I don't know how to follow.
TL;DR: Define a main() function, and your program will work.
From the C specification C11 (ISO/IEC 9899:201x / N1548)
5.1.2 Execution environments
Two execution environments are defined: freestanding and hosted.
[…]
5.1.2.1 Freestanding environment
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.
[…]
5.1.2.2 Hosted environment
[…]
5.1.2.2.1 Program startup
The function called at program startup is named main. […]
Furthermore:
J2 Undefined behavior
The behavior is undefined in the following circumstances:
[…]
A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).
Hosted Environment
This is most likely your case.
This applies to typical operating systems, such as Linux, Unix, Mac OS X, Windows, Amiga OS, and many more. In that case, your environment typically is the hosted environment.
Given you run your C compiler and linker without any options that would select or influence the environment, and given the target is such a typical operating system, and you finally link, the assumption made by the C compiler and linker will be a hosted environment. As described above, this means that you need to provide a main() function so that the hosted environment knows where to start your C program. Because you did not provide a main function, clause J2 Undefined behavior applied, and the linker refused to complete its job.
Note: Under the hood, these operating systems actually provide their own custom interface, see below.
Solution: Provide a main() function.
Freestanding Environment
Freestanding environments typically occur when developing firmware or operating system yourself. In that case, the entry point is defined by the CPU. Most CPUs will either start execution at a pre-defined address, or at a configurable address read from a vector table specified at a pre-defined address.
Custom Environments
Besides that, there are custom environments. The two most common custom environments are:
The typical OS
In order to be able to do more than specified by the C specification, operating systems define their own environment. This environment is typically an extension of the hosted environment, and will use the entry point specified by the linker. Typically, that entry point is actually another function, often called _start, which is provided by a default library such as libglibc. This function _start is called by the OS, and that function _start then actually calls main.
So, instead of providing main, it would be possible to provide _start, or an equivalent, instead.
You could write your own _start function or entry point.
However, that risks that your program would unnecessarily be less portable, and that you have to deal with operating system issues that a hosted environment is hiding from you.
It is therefore not recommended for "normal" programs by "normal" developers.
DLL environments
When programs are supposed to run as plugins for other programs, those other programs define a custom environment.
Typically, that custom environment is realized as DLL (Dynamic Link Library).
Embedded environments
For several embedded systems, the toolchains (compiler etc.) come with libraries which provide a custom environment for that system. These environments have features which ranks somewhere between a freestanding environment and a hosted environment, and the entry points depend on the corresponding toolchain. To avoid confusion that stems from unexpected entry point names, toolchains usually use main, start, _start, Start or _Start as names for the entry point.

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions which in turn make a jump to another function like _exp. This is not what I wanted, I needed a fully functional assembly code in a single file, with no dependency to other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
It is not possible in general, there are exceptions sure, I could craft one so that means other folks can too, but it isnt an interesting program.
Normally your C program, your main() entry point is only a percentage of the code. There is a bootstrap that contains the actual entry point for the operating system to launch your program, this does some things that prepare your virtual memory space so that your program can run. Zeros .bss and other such things. that is often and or should be written in assembly language (otherwise you get a chicken and egg problem) but not an assembly language file you will see unless you go find the sources for the C library, you will often get an object as part of the toolchain along with other compiler libraries, etc.
Then if you make any C calls or create code that results in a compiler library call (perform a divide on a platform that doesnt support divide, perform floating point on a platform that doesnt have floating point, etc) that is another object that came from some other C or assembly that is part of the library or compiler sources and is not something you will see during the compile/assemble/link (the chain in toolchain) process.
So except for specifically crafted trivial programs or specifically crafted tools for this purpose (for specific likely baremetal platforms), you will not see your whole program turn into one big assembly source file before it gets assembled then linked.
If not baremetal then there is of course the operating system layer which you certainly would not get to see as part of your source code, ultimately the C library calls that need the system will have a place where they do that, all compiled to object/lib before you use them, and the assembly sources for the operating system side is part of some other source and build process somewhere else.

What is the need for C startup routine?

Quoting from one of the unix programming books,
When a C program is executed by the
kernelby, one of the exec functions
calls special start-up routine. This
function is called before the main
function is called. The executable
program file specifies this routine as
the starting address for the program;
this is set up by the link editor when
it is invoked by the C compiler. This
start-up routine takes values from the
kernel the command-line arguments and
the environment and sets things up so
that the main function is called as
shown earlier.
Why do we a need a middle man start-up routine. The exec function could have straightway called the main function and the kernel could have directly passed the command line arguments and environment to the main function. Why do we need the start-up routine in between?
Because C has no concept of "plug in". So if you want to use, say, malloc() someone has to initialize the necessary data structures. The C programmers were lazy and didn't want to have to write code like this all the time:
main() {
initialize_malloc();
initialize_stdio();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
initialize_...();
... oh wow, can we start already? ...
}
So the C compiler figures out what needs to be done, generates the necessary code and sets up everything so you can start with your code right away.
The start-up routine initializes the CRT (i.e. creates the CRT heap so that malloc/free work, initializes standard I/O streams, etc.); in case of C++ it also calls the globals' constructors. There may be other system-specific setup, you should check the sources of your run-time library for more details.
Calling main() is a C thing, while calling _start() is a kernel thing, indicated by the entry point in the binary format header. (for clarity: the kernel doesn't want or need to know that we call it _start)
If you would have a non-C binary, you might not have a main() function, you might not even have the concept of a "function" at all.
So the actual question would be: why doesn't a compiler give the address of main() as a starting point? That's because typical libc implementations want to do some initializations before really starting the program, see the other answers for that.
edit as an example, you can change the entry point like this:
$ cat entrypoint.c
int blabla() { printf("Yes it works!\n"); exit(0); }
int main() { printf("not called\n"); }
$ gcc entrypoint.c -e blabla
$ ./a.out
Yes it works!
Important to know also is that an application program is executed in user mode, and any system calls out, set the privileged bit and go into kernel mode. This helps increase OS security by preventing the user from accessing kernel level system calls and a myriad of other complications. So a call to printf will trap, set kernel mode bit, execute code, then reset to user mode and return to your application.
The CRT is required to help you and allow you to use the languages you want in Windows and Linux. it provides some very fundamental bootstrapping into the OS to provide you with feature sets for development.

x86 Assembly: What's the main prologue and epilogue?

I'm following this tutorial on x86 assembly. Every example so far uses what the author calls a "c-driver" program, compiled with the assembly module, for means of some "initialization". Something like:
int main(void) {
int ret = asm_main();
return ret;
}
And then the asm_main function is written normally, using a C calling convention. I'm wondering what exactly is the required initialization that's being generated by the C compiler, and if it can be done in a portable manner.
Infos: I'm on Windows XP, 32bit box, using the NASM assembler and mingw32-gcc for linking.
The initialisation isn't generated by the c compiler, it is part of the c library (which makes it easier to tailor for each OS/processor).
The code in question is normally very simple on windows/unixy systems - typically does a bit of library initialisation (opens STDIN, STDOUT, STDERR, sets timezone etc), sets up the environment, processes the command line for passing to main; catches the return from main() and calls exit etc.
The start up code in most c libraries is in a file called crt0.c, crt1.c or something similar (crt = c run time).
On more primitive or bare systems it will also set up the stack and other registers and clear the BSS data area - in this case it would often be in assembler (typically crt0.S).
Here is a link to the BSD c startup code - link text
And the start up code for mingw for windows is in crt1.c here - http://mingw.cvs.sourceforge.net/viewvc/mingw/runtime/
You could write your main in assembly if you want. But a lot of people want to put debugging statements in the main and those are easier in C than in asm.
If you wrote main in asm you might have to deal with main actually being called _main or using an alternate calling convention (especially under Windows) or other strange things like that that the C compiler handles for you automatically when generating code for a function with the name "main". This way makes it so you don't have to do that either.
The stack, registers, and program's file sections (data, rodata, bss, etc) have to be initialized before main() is called. C runtime library (CRT) provides this initialzsation.
CRT also provides prologue and epilogue code that is executed before and after each function is called. The prologue and epilogue code updates the stack and frame pointers.

LINUX: Is it possible to write a working program that does not rely on the libc library?

I wonder if I could write a program in the C-programming language that is executable, albeit not using a single library call, e.g. not even exit()?
If so, it obviously wouldn't depend on libraries (libc, ld-linux) at all.
I suspect you could write such a thing, but it would need to have an endless loop at the end, because you can't ask the operation system to exit your process. And you couldn't do anything useful.
Well start with compiling an ELF program, look into the ELF spec and craft together the header, the program segments and the other parts you need for a program. The kernel would load your code and jump to some initial address. You could place an endless loop there. But without knowing some assembler, that's hopeless from the start on anyway.
The start.S file as used by glibc may be useful as a start point. Try to change it so that you can assemble a stand-alone executable out of it. That start.S file is the entry point of all ELF applications, and is the one that calls __libc_start_main which in turn calls main. You just change it so it fits your needs.
Ok, that was theoretical. But now, what practical use does that have?
Answer to the Updated Question
Well. There is a library called libgloss that provides a minimal interface for programs that are meant to run on embedded systems. The newlib C library uses that one as its system-call interface. The general idea is that libgloss is the layer between the C library and the operation system. As such, it also contains the startup files that the operation system jumps into. Both these libraries are part of the GNU binutils project. I've used them to do the interface for another OS and another processor, but there does not seem to be a libgloss port for Linux, so if you call system calls, you will have to do it on your own, as others already stated.
It is absolutely possible to write programs in the C programming language. The linux kernel is a good example of such a program. But also user programs are possible. But what is minimally required is a runtime library (if you want to do any serious stuff). Such one would contain really basic functions, like memcpy, basic macros and so on. The C Standard has a special conformance mode called freestanding, which requires only a very limited set of functionality, suitable also for kernels. Actually, i have no clue about x86 assembler, but i've tried my luck for a very simple C program:
/* gcc -nostdlib start.c */
int main(int, char**, char**);
void _start(int args)
{
/* we do not care about arguments for main. start.S in
* glibc documents how the kernel passes them though.
*/
int c = main(0,0,0);
/* do the system-call for exit. */
asm("movl %0,%%ebx\n" /* first argument */
"movl $1,%%eax\n" /* syscall 1 */
"int $0x80" /* fire interrupt */
: : "r"(c) :"%eax", "%ebx");
}
int main(int argc, char** argv, char** env) {
/* yeah here we can do some stuff */
return 42;
}
We're happy, it actually compiles and runs :)
Yes, it is possible, however you will have to make system calls and set up your entry point manually.
Example of a minimal program with entry point:
.globl _start
.text
_start:
xorl %eax,%eax
incl %eax
movb $42, %bl
int $0x80
Or in plain C (no exit):
void __attribute__((noreturn)) _start() {
while(1);
}
Compiled with:
gcc -nostdlib -o example example.s
gcc -nostdlib -o example example.c
In pure C? As others have said you still need a way to make syscalls, so you might need to drop down to inline asm for that. That said, if using gcc check out -ffreestanding.
You'd need a way to prevent the C compiler from generating code that depends on libc, which with gcc can be done with -fno-hosted. And you'd need one assembly language routine to implement syscall(2). They're not hard to write if you can get suitable OS doco. After that you'd be off to the races.
Well, you would need to use some system calls to load all it's information into memory, so I doubt it.
And you would almost have to use exit(), just because of the way that Linux works.
Yes you can, but it's pretty tricky.
There is essentially absolutely no point.
You can statically link a program, but then the appropriate pieces of the C library are included in its binary (so it doesn't have any dependencies).
You can completely do without the C library, in which case you need to make system calls using the appropriate low-level interface, which is architecture dependent, and not necessarily int 0x80.
If your goal is making a very small self-contained binary, you might be better off static-linking against something like uclibc.

Resources