x86 Assembly: What's the main prologue and epilogue?

I'm following this tutorial on x86 assembly. Every example so far uses what the author calls a "C driver" program, compiled together with the assembly module, for the purpose of some "initialization". Something like:
int asm_main(void); /* defined in the assembly module */
int main(void) {
int ret = asm_main();
return ret;
}
And then the asm_main function is written normally, using a C calling convention. I'm wondering what exactly is the required initialization that's being generated by the C compiler, and if it can be done in a portable manner.
Info: I'm on a 32-bit Windows XP box, using the NASM assembler and mingw32-gcc for linking.

The initialisation isn't generated by the C compiler; it is part of the C library (which makes it easier to tailor for each OS/processor).
On Windows and Unix-like systems the code in question is normally very simple: it typically does a bit of library initialisation (opens stdin, stdout and stderr, sets the timezone, etc.), sets up the environment, processes the command line for passing to main(), and then catches the return value from main() and passes it to exit().
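As a rough illustration of that glue, here is a minimal sketch, assuming a simplified whitespace-only command-line parser. `split_command_line`, `user_main` and `crt_like_start` are hypothetical names, not mingw's actual functions; the real Windows parser also handles quotes and backslashes, and a real runtime calls exit() with main's result rather than returning it.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split a mutable command-line string on whitespace
   into argv[], a simplified stand-in for the runtime's real parser.
   max_args must be at least 1 (one slot is reserved for the NULL). */
static int split_command_line(char *cmd, char **argv, int max_args)
{
    int argc = 0;
    while (argc < max_args - 1) {
        while (isspace((unsigned char)*cmd))
            cmd++;                      /* skip separators */
        if (*cmd == '\0')
            break;
        argv[argc++] = cmd;             /* start of a new token */
        while (*cmd != '\0' && !isspace((unsigned char)*cmd))
            cmd++;
        if (*cmd != '\0')
            *cmd++ = '\0';              /* terminate the token in place */
    }
    argv[argc] = NULL;                  /* argv[argc] is a null pointer */
    return argc;
}

/* Hypothetical stand-in for the program's main(). */
static int user_main(int argc, char **argv)
{
    (void)argv;
    return argc;                        /* pretend the exit status is argc */
}

/* Sketch of the steps a hosted runtime performs around main(). */
static int crt_like_start(char *command_line)
{
    char *argv[16];
    /* 1. Library setup (stdio streams, environment, timezone) would go here. */
    /* 2. Turn the raw command line into argc/argv. */
    int argc = split_command_line(command_line, argv, 16);
    /* 3. Call main; a real runtime then passes the result to exit(). */
    return user_main(argc, argv);
}
```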
The startup code in most C libraries is in a file called crt0.c, crt1.c or something similar (crt = C runtime).
On more primitive or bare systems it will also set up the stack and other registers and clear the BSS data area - in this case it would often be in assembler (typically crt0.S).
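On such a bare system, the "clear the BSS" step usually goes together with copying the initial values of .data from ROM into RAM. The following is a minimal sketch that assumes the section bounds are passed in as pointers. On a real target they come from linker-script symbols whose names vary by toolchain, so the function name and parameters here are hypothetical.

```c
#include <stdint.h>

/* Sketch of the section setup a bare-metal crt0 performs before main().
   Written as plain loops because no C library can be assumed yet; a real
   crt0 must also make sure the compiler does not turn them back into
   library calls. */
static void init_sections(uint32_t *data_ram, uint32_t *data_ram_end,
                          const uint32_t *data_rom,
                          uint32_t *bss, uint32_t *bss_end)
{
    /* Copy initialised globals (.data) from their load address to RAM. */
    while (data_ram < data_ram_end)
        *data_ram++ = *data_rom++;

    /* Zero the uninitialised globals (.bss), as C requires. */
    while (bss < bss_end)
        *bss++ = 0;
}
```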
Here is a link to the BSD C startup code.
And the startup code for MinGW on Windows is in crt1.c here: http://mingw.cvs.sourceforge.net/viewvc/mingw/runtime/

You could write your main in assembly if you want. But a lot of people want to put debugging statements in the main and those are easier in C than in asm.
If you wrote main in asm, you might have to deal with main actually being called _main, with an alternate calling convention (especially under Windows), or with other oddities that the C compiler handles for you automatically when it generates code for a function named "main". Writing main in C spares you all of that.

The stack, registers, and the program's sections (data, rodata, bss, etc.) have to be initialized before main() is called. The C runtime library (CRT) provides this initialization.
The per-function prologue and epilogue, on the other hand, are emitted by the compiler into each function rather than supplied by the CRT. The prologue sets up the function's stack frame (saving and updating the frame pointer, adjusting the stack pointer), and the epilogue tears it down again before returning.
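To make the prologue/epilogue idea concrete, here is a small C function with, in comments, roughly what a 32-bit GCC emits for it at -O0 with the frame pointer kept. Treat the listing as typical, not exact: it varies with compiler version and flags, and on 32-bit Windows mingw would name the symbol _add_ints. `add_ints` is a hypothetical example function. A hand-written asm_main in NASM would follow the same push ebp / mov ebp, esp ... pop ebp / ret pattern.

```c
/* Typical 32-bit GCC -O0 output (AT&T syntax), shown for illustration:

       add_ints:
           pushl %ebp            # prologue: save the caller's frame pointer
           movl  %esp, %ebp      #           establish this function's frame
           movl  8(%ebp), %edx   # first argument (cdecl: passed on the stack)
           movl  12(%ebp), %eax  # second argument
           addl  %edx, %eax      # result is returned in EAX
           popl  %ebp            # epilogue: restore the caller's frame pointer
           ret
*/
int add_ints(int a, int b)
{
    return a + b;
}
```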

Related

C startup code is only written in assembly confusion

I understand that the C startup code is for initializing the C runtime environment, initializes static variables, sets up the stack pointer etc. and finally branches to main().
They say that this can only be written in assembly language as it's platform-specific. However, can't this still be written in C and compiled for the specific platform?
Function calls of course would not be possible, because we more than likely don't have the stack pointer set up at that stage, but I still can't see any other main reasons. Thanks in advance.
Startup code can be written in C only if:
the implementation provides all the necessary intrinsic functions to set hardware features that cannot be set using standard C, and
the toolchain provides a mechanism for placing fragments of code and data at specific places and in a specific order (gcc with ld linker scripts, for example).
If both conditions are met, you can write the startup code in C.
I use my own startup code written in C (instead of one provided by the chip vendors) for Cortex-M microcontrollers as ARM provides CMSIS header files with all needed inline assembly functions and gcc based toolchain gives me full memory layout control.
Most of the problem with writing early startup code in C is, in fact, the absence of a properly structured stack. It's worse than just not being able to make function calls. All of a C compiler's generated machine code assumes the existence of a stack, pointed to by the ABI-specified register, that can be used for scratch storage at any time. Changing this assumption would be so much work as to amount to a complete second "back end" for the compiler—way more work than continuing to write early startup code by hand in assembly.
Early bootstrap code, bringing up the machine from power-on, also has to do a bunch of special operations that can't usually be accessed from C, like configuring interrupts and virtual memory. And it may have to deal with the code not having been loaded at the address it was linked for, or the relocation table not having been processed, or other similar problems; these also break pervasive assumptions made by the C compiler (e.g. that it can inject a call to memcpy whenever it wants).
Despite all that, most of a user mode C library's startup code will, in fact, be written in C, for exactly the reason you are thinking. Nobody wants to write more code in assembly, over and over for each supported ISA, than absolutely necessary.
A minimal C runtime environment requires a stack, and a jump to a start address. Setting the stack pointer on most architectures requires assembly code. Once a stack is available it is possible to run code generated from C source.
ARM Cortex-M devices load the stack pointer and start address from the vector table on reset, so can in fact boot directly into code generated from C source.
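A minimal sketch of what that looks like in C, assuming a GCC toolchain and a linker script that places a section named .isr_vector at the boot address. The names Reset_Handler, Default_Handler and .isr_vector follow common vendor startup files but are assumptions here, and the static `stack_area` array stands in for the linker-provided end-of-stack symbol a real project would use.

```c
#include <stdint.h>

typedef void (*vector_t)(void);

/* Hypothetical stack region; real projects take the initial stack
   pointer from a linker-script symbol instead. */
#define STACK_WORDS 256
static uint32_t stack_area[STACK_WORDS];

void Reset_Handler(void)
{
    /* A real reset handler would copy .data, zero .bss, run static
       constructors and then call main(). This stub just spins. */
    for (;;) {
    }
}

void Default_Handler(void)
{
    for (;;) {
    }
}

/* The core reads entry 0 as the initial SP and entry 1 as the reset
   vector, so no assembly is needed to reach C code. Only the first
   few entries are shown. */
__attribute__((section(".isr_vector"), used))
const vector_t vector_table[] = {
    (vector_t)(stack_area + STACK_WORDS), /* 0: initial SP (stack grows down) */
    Reset_Handler,                        /* 1: reset */
    Default_Handler,                      /* 2: NMI */
    Default_Handler,                      /* 3: HardFault */
};
```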
On other architectures, the minimal assembly required is to set the stack pointer and jump to the start address. Thereafter it is possible to write the other start-up tasks in C (or even C++). Such startup code is responsible for establishing the full C runtime, so it must not assume static initialisation or library initialisation (no heap or filesystem, for example), because those are things the startup code itself must do.
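One of those start-up tasks, running static constructors, can be sketched in C as walking a table of function pointers. With GNU ld the table's bounds are usually provided as __init_array_start and __init_array_end; here a local array stands in for that region, and `run_init_array`, `init_a`, `init_b` and `demo_table` are hypothetical names.

```c
typedef void (*init_fn)(void);

/* Call every initialisation function in [start, end), in order, the way
   startup code walks the linker-provided .init_array region. */
static void run_init_array(init_fn const *start, init_fn const *end)
{
    for (; start < end; ++start)
        (*start)();
}

/* Demo "constructors" standing in for compiler-generated ones. */
static int init_counter;
static void init_a(void) { init_counter += 1; }
static void init_b(void) { init_counter *= 10; }

static init_fn const demo_table[] = { init_a, init_b };
```

Order matters: running the demo table gives (0 + 1) * 10, which is why startup code must walk the region in the order the linker laid it out.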
In that sense you can run code generated from C source, but the environment is not strictly conforming until main() has been called, so there are some constraints.
Even where assembly code is used, it need not be the whole start-up code that is in assembly.

Calling Mips from C

I'd like to call assembly (specifically MIPS) code from my C program and call back into C from the assembly.
I've decided on GNU GCC as my compiler (I'm also guessing I need an emulator?).
I'm on a x86 Win 7 machine.
There are some things that are very unclear to me how this can/should work out.
Will MIPS be using a load-store architecture with 32 registers while the C code continues to use a register-memory architecture because I'm on x86?
Now that I want to call MIPS assembly instead of x86 assembly, can I still use asm()?
If MIPS uses more registers than C, will I be able to access those registers from my C code?
Can anyone help me out with this, perhaps by pointing out where I could learn this bit of sorcery?
Thanks
Disclaimer: I am working on a project on the verification of self-modifying code for credit in school, and this code is going to be used as an example, but I am not getting any credit for this code.
The most common MIPS calling convention is described here. The easiest way to write a C-callable assembly routine is to write a skeleton of the routine in C, and then copy the assembly code output by the compiler into your assembly source (use gcc's -S option). Say you want to call an assembler function declared in C as int foo(int a, int b). You would write a simple version of that function in C. For example, put the following into foo.c:
int foo(int a, int b) {
return a+b; // some simple code to access all arguments and the return value
}
Then compile that function with a MIPS cross compiler (see below) using gcc's -S and -O0 options. This produces a text file, foo.s, containing MIPS assembler source that accesses the arguments of foo and shows where to put the return value. Copy that source into your own assembler source, and add the assembler calculations you need to compute foo.
Calling C from assembly is straightforward once you have calling in the other direction figured out.
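For the other direction, here is a sketch assuming the o32 ABI (the common 32-bit MIPS convention): a C function plus, in a comment, roughly the call sequence an assembly caller would use. `c_helper` is a hypothetical name, and the frame size and $ra slot in the listing are just one valid choice.

```c
/* Hypothetical C function to be called from MIPS assembly. Under o32, a
   caller would do roughly the following:

       addiu $sp, $sp, -24      # 16-byte argument save area + room for $ra
       sw    $ra, 20($sp)       # jal overwrites $ra, so save it first
       li    $a0, 2             # x: first argument in $a0
       li    $a1, 3             # y: second argument in $a1
       li    $a2, 4             # z: third argument in $a2
       jal   c_helper           # $t0-$t9, $a0-$a3, $v0-$v1 may be clobbered
       nop                      # branch delay slot (under .set noreorder)
       lw    $ra, 20($sp)
       addiu $sp, $sp, 24
                                # the result (2*3+4 = 10) is now in $v0
*/
int c_helper(int x, int y, int z)
{
    return x * y + z;
}
```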
You can download a free MIPS gcc cross compiler tool chain from Mentor Graphics (formerly CodeSourcery).
You can download a free, fully functional (it boots and runs Linux) MIPS simulator from here. Don't use SPIM or MARS, since they do not completely model the MIPS architecture.

Mixing Assembly language and C programs

I am using a bootloader program which is in Assembly and I am calling a C function frequently to SEND and RECEIVE a Character at a time. The controller I am using seems to have just 3 general purpose registers which it uses frequently. Apart from that I am storing some bytes in fixed RAM locations.
So, my question is:
Will the C function overwrite these RAM locations, which were defined in assembly?
I am doing PUSH and PULL of the concerned registers before calling and after returning from these C functions.
If I understand your question correctly, you are concerned about the RAM locations used in your assembly module overlapping with some variable declared in a C module. You can examine the list file output by your linker to determine if this is the case. The linker list file will show all of the RAM addresses used by your C modules which you can compare to the fixed RAM locations used in the assembly module.
Note that if your linker does not produce a list file automatically, you will have to read through your linker's documentation to find the right command line option to do so.
As long as you are keeping the previous values on the stack when making the C calls, you should be fine. Just make sure that you push onto the stack before the call and pop off the stack after returning.
It all depends on the calling convention the C code was compiled with. The calling convention is how the caller and callee communicate about passing data into the function and returning values afterwards. This includes who backs up registers onto the stack before/after the call, whether the registers must be prepared before calling the C function, whether you can rely on the registers coming back unchanged, etc.
You'll need to find out how the C code was compiled (with what Calling Convention setting). Note that this is also architecture specific. A summary of the different calling conventions and a description of what each entails can be found at Wikipedia here:
http://en.wikipedia.org/wiki/Calling_convention
http://en.wikipedia.org/wiki/X86_calling_conventions
On x86, cdecl and stdcall are the most popular conventions. With cdecl the caller (your ASM code) removes the arguments from the stack after the call, while with stdcall the called function is responsible for that. The choice is about stack cleanup, not register corruption: under both cdecl and stdcall the callee may overwrite EAX, ECX and EDX but must preserve EBX, ESI, EDI and EBP, so your ASM code still has to save any of EAX, ECX or EDX it needs across the call. If you have the source code for the C function, you can pass the compiler whatever flags or attributes select the convention your ASM code expects.
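One reason caller cleanup is the C default can be seen with a variadic function. The callee only learns the argument count at run time (here from `count`), so it cannot use stdcall's fixed-size `ret N` to pop its arguments; only the caller knows exactly what it pushed. `sum_ints` is a hypothetical example.

```c
#include <stdarg.h>

/* Variadic functions need caller cleanup (cdecl on 32-bit x86): the number
   of pushed arguments differs from call to call, and only the caller
   knows how many it pushed. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```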

Is a main() required for a C program?

Well the title says it all. Is a main() function absolutely essential for a C program?
I am asking this because I was looking at the Linux kernel code, and I didn't see a main() function.
No, the ISO C standard states that a main function is only required for a hosted environment (such as one with an underlying OS).
For a freestanding environment like an embedded system (or an operating system itself), it's implementation defined. From C99 5.1.2:
Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.
As to how Linux itself starts, the start point for the Linux kernel is start_kernel though, for a more complete picture of the entire boot process, you should start here.
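A program can ask at compile time which kind of environment it is being built for: C99 and later define the macro __STDC_HOSTED__ as 1 for a hosted implementation and 0 for a freestanding one (for example, gcc with -ffreestanding). The function name below is hypothetical.

```c
/* __STDC_HOSTED__ is predefined by the compiler (C99 and later). */
static int build_is_hosted(void)
{
#if __STDC_HOSTED__
    return 1;   /* hosted: main() is required and the full library exists */
#else
    return 0;   /* freestanding: entry point is implementation-defined */
#endif
}
```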
Well, no, but ...
C99 specifies that main() is called in the hosted environment "at program startup"; however, you don't have to use the C runtime support. Your operating system executes image files and starts a program at an address provided by the linker.
If you are willing to write your program to conform to the operating system's requirements rather than C99's, you can do it without main(). The more modern (and complex) the system, though, the more trouble you will have with the C library making assumptions that the standard runtime startup is used.
Here is an example for Linux...
$ cat > nomain.S
.text
_start:
call iamnotmain
movl $0xfc, %eax
xorl %ebx, %ebx
int $0x80
.globl _start
$ cat > demo.c
#include <unistd.h>
void iamnotmain(void) {
static char s[] = "hello, world\n";
write(1, s, sizeof s);
}
$ as -o nomain.o nomain.S
$ cc -c demo.c
$ ld -static nomain.o demo.o -lc
$ ./a.out
hello, world
It's arguably not "a C99 program" now, though, just a "Linux program" with an object module written in C.
The main() function is called by an object file included with the libc. Since the kernel doesn't link against the libc it has its own entry point, written in assembler.
Paxdiablo's answer covers two of the cases where you won't encounter a main. Let me add a couple more:
Many plug-in systems for other programs (like, say, browsers or text editors or the like) have no main().
Windows GUI programs written in C typically have no main(); they have a WinMain() instead.
The operating systems loader has to call a single entry point; in the GNU compiler, the entry point is defined in the crt0.o linked object file, the source for this is the assembler file crt0.s - that invokes main() after performing various run-time start-up tasks (such as establishing a stack, static initialisation). So when building an executable that links the default crt0.o, you must have a main(), otherwise you will get a linker error since in crt0.o main() is an unresolved symbol.
It would be possible (if somewhat perverse and unnecessary) to modify crt0.s to call a different entry point. Just make sure that you make such an object file specific to your project rather than modifying the default version, or you will break every build on that machine.
The OS itself has its own C runtime start-up (which will be called from the bootloader) so can call any entry point it wishes. I have not looked at the Linux source, but imagine that it has its own crt0.s that will call whatever the C code entry point is.
main() is called by glibc, which is part of the application (ring 3), not by the kernel (ring 0).
Drivers have yet another entry point; for example, a Windows driver based on WDM starts from DriverEntry.
In machine language things get executed sequentially: what comes first is executed first. So, by default, the toolchain places a call to your main function to fit the C standard.
Your program works like a library, which is a collection of compiled functions. The main difference between a library and a standard executable is that for the second one the compiler generates assembly code which calls one of the functions in your program.
But you could write assembly code that calls an arbitrary function of your C program (the same way calls to library functions work, actually), and this would work the same way other executables do. The catch is that you cannot do it in plain standard C; you have to resort to assembly or other compiler-specific tricks.
This was intended as a general and superficial explanation, there are some technical differences I avoided on purpose as they don't seem relevant.

LINUX: Is it possible to write a working program that does not rely on the libc library?

I wonder if I could write an executable program in the C programming language without using a single library call, e.g. not even exit()?
If so, it obviously wouldn't depend on libraries (libc, ld-linux) at all.
I suspect you could write such a thing, but it would need to have an endless loop at the end, because you can't ask the operating system to exit your process. And you couldn't do anything useful.
Well, start by compiling an ELF program, look into the ELF spec and put together the header, the program segments and the other parts you need for a program. The kernel would load your code and jump to some initial address. You could place an endless loop there. But without knowing some assembler, that's hopeless from the start anyway.
The start.S file used by glibc may be useful as a starting point. Try to change it so that you can assemble a stand-alone executable out of it. That start.S file is the entry point of all ELF applications, and is the one that calls __libc_start_main, which in turn calls main. You just change it so it fits your needs.
Ok, that was theoretical. But now, what practical use does that have?
Answer to the Updated Question
Well, there is a library called libgloss that provides a minimal interface for programs that are meant to run on embedded systems. The newlib C library uses it as its system-call interface. The general idea is that libgloss is the layer between the C library and the operating system; as such, it also contains the startup files that the operating system jumps into. Both libraries are distributed together as part of the newlib project. I've used them to write the interface for another OS and another processor, but there does not seem to be a libgloss port for Linux, so if you call system calls, you will have to do it on your own, as others already stated.
It is absolutely possible to write programs in C without libc. The Linux kernel is a good example of such a program, but user programs are possible too. What is minimally required is a runtime library (if you want to do any serious stuff) containing really basic functions like memcpy, basic macros and so on. The C Standard has a special conformance mode called freestanding, which requires only a very limited set of functionality, suitable also for kernels. Actually, I have no clue about x86 assembler, but I've tried my luck with a very simple C program:
/* gcc -nostdlib start.c */
int main(int, char**, char**);
void _start(int args)
{
/* we do not care about arguments for main. start.S in
* glibc documents how the kernel passes them though.
*/
int c = main(0,0,0);
/* do the system-call for exit. */
asm("movl %0,%%ebx\n" /* first argument */
"movl $1,%%eax\n" /* syscall 1 */
"int $0x80" /* fire interrupt */
: : "r"(c) :"%eax", "%ebx");
}
int main(int argc, char** argv, char** env) {
/* yeah here we can do some stuff */
return 42;
}
We're happy, it actually compiles and runs :)
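The `int $0x80` sequence above uses the 32-bit i386 system-call ABI. A native x86-64 Linux program normally uses the `syscall` instruction instead, with the call number in rax (write is 1, exit is 60 on x86-64), the first three arguments in rdi, rsi and rdx, and rcx and r11 clobbered by the kernel. This is a minimal sketch under those assumptions; `raw_syscall3` and `raw_write` are hypothetical names.

```c
#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux"
#endif

/* Raw three-argument Linux system call on x86-64, no libc involved.
   Returns the kernel's result; negative values are -errno. */
static long raw_syscall3(long number, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

/* write(2) built on the raw call; 1 is the x86-64 write syscall number. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}
```

Built with -nostdlib, a `_start` could end with `raw_syscall3(60, status, 0, 0)` to exit without libc.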
Yes, it is possible, however you will have to make system calls and set up your entry point manually.
Example of a minimal program with entry point:
.globl _start
.text
_start:
xorl %eax,%eax
incl %eax
movb $42, %bl
int $0x80
Or in plain C (no exit):
void __attribute__((noreturn)) _start() {
while(1);
}
Compiled with:
gcc -nostdlib -o example example.s
gcc -nostdlib -o example example.c
In pure C? As others have said you still need a way to make syscalls, so you might need to drop down to inline asm for that. That said, if using gcc check out -ffreestanding.
You'd need a way to prevent the C compiler from generating code that depends on libc, which with gcc can be done with -fno-hosted (equivalent to -ffreestanding). And you'd need one assembly language routine to implement syscall(2). They're not hard to write if you can get suitable OS documentation. After that you'd be off to the races.
Well, you would need to use some system calls to load all its information into memory, so I doubt it.
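One caveat: even with -ffreestanding, GCC requires the environment to provide memcpy, memmove, memset and memcmp, and may emit calls to them itself (for example, for struct copies). A no-libc program therefore usually ships its own versions. This is a sketch using the hypothetical names `fs_memcpy` and `fs_memset` so it does not clash with a host libc; in a real -nostdlib build they would be named memcpy and memset, and that file should be compiled with -fno-tree-loop-distribute-patterns so GCC does not turn the loops back into calls to themselves.

```c
#include <stddef.h>

/* Byte-by-byte copy; regions must not overlap (memmove handles that). */
void *fs_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Fill n bytes with the low byte of c, as memset does. */
void *fs_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}
```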
And you would almost have to use exit(), just because of the way that Linux works.
Yes you can, but it's pretty tricky.
There is essentially absolutely no point.
You can statically link a program, but then the appropriate pieces of the C library are included in its binary (so it doesn't have any dependencies).
You can completely do without the C library, in which case you need to make system calls using the appropriate low-level interface, which is architecture dependent, and not necessarily int 0x80.
If your goal is making a very small self-contained binary, you might be better off static-linking against something like uClibc.

Resources