What is the use of _start() in C?

I learned from my colleague that one can write and execute a C program without writing a main() function. It can be done like this:
my_main.c
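/* Compile this with gcc -nostartfiles */
#include <stdio.h>   /* puts */
#include <stdlib.h>  /* exit */
int my_main(void);   /* declared before _start() uses it */
void _start(void) {
    int ret = my_main();
    exit(ret);
}
int my_main(void) {
    puts("This is a program without a main() function!");
    return 0;
}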
Compile it with this command:
gcc -o my_main my_main.c -nostartfiles
Run it with this command:
./my_main
When would one need to do this kind of thing? Is there any real world scenario where this would be useful?

The symbol _start is the entry point of your program. That is, the address of that symbol is the address jumped to on program start. Normally, the function named _start is supplied by a file called crt0.o, which contains the startup code for the C runtime environment. It performs some setup, populates the argument array argv, counts how many arguments there are, and then calls main. After main returns, exit is called.
If a program does not want to use the C runtime environment, it needs to supply its own code for _start. For instance, the reference implementation of the Go programming language does so because they need a non-standard threading model which requires some magic with the stack. It's also useful to supply your own _start when you want to write really tiny programs or programs that do unconventional things.

While main is the entry point for your program from a programmer's perspective, _start is the usual entry point from the OS perspective (the first instruction that is executed after your program was started from the OS).
In a typical C and especially C++ program, a lot of work has been done before execution enters main, in particular the initialization of global variables. Here you can find a good explanation of everything that's going on between _start() and main() and also after main has exited again (see comment below).
The necessary code for that is usually provided by the compiler writers in a startup file, but with the flag -nostartfiles you essentially tell the compiler: "Don't bother giving me the standard startup file, give me full control over what is happening right from the start".
This is sometimes necessary and often used on embedded systems. E.g. if you don't have an OS and you have to manually enable certain parts of your memory system (e.g. caches) before the initialization of your global objects.
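To make the "work done before main" concrete, here is a minimal sketch that relies on the GCC/Clang constructor attribute (a compiler extension, not standard C). The names startup_events, record_event, before_main and constructor_ran_before_main are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical event log, used only for this illustration. */
#define MAX_EVENTS 8
static const char *startup_events[MAX_EVENTS];
static int startup_event_count = 0;

static void record_event(const char *name) {
    if (startup_event_count < MAX_EVENTS)
        startup_events[startup_event_count++] = name;
}

/* GCC/Clang extension: the startup code calls this before main(). */
__attribute__((constructor))
static void before_main(void) {
    record_event("constructor");
}

/* Returns 1 if the constructor has already run, 0 otherwise.
   Called from main(), it shows that startup code ran first. */
int constructor_ran_before_main(void) {
    return startup_event_count > 0 &&
           strcmp(startup_events[0], "constructor") == 0;
}
```

With the standard startup files linked in, calling constructor_ran_before_main() as the first statement of main should return 1. With -nostartfiles and a hand-written _start, nothing runs such constructors for you unless your own startup code does it.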

Here is a good overview of what happens during program startup before main. In particular, it shows that _start is the actual entry point to your program from the OS's viewpoint.
It is the very first address from which the instruction pointer will start counting in your program.
The code there invokes some C runtime library routines to do some housekeeping, then calls your main, and then tears things down and calls exit with whatever exit code main returned.
P.S: this answer is transplanted from another question which SO has helpfully closed as duplicate of this one.

When would one need to do this kind of thing?
When you want your own startup code for your program.
main is not the first entry point of a C program; behind the curtain, _start is.
Example in Linux:
_start: # _start is the entry point known to the linker
xor %ebp, %ebp # effectively RBP := 0, mark the end of stack frames
mov (%rsp), %edi # get argc from the stack (implicitly zero-extended to 64-bit)
lea 8(%rsp), %rsi # take the address of argv from the stack
lea 16(%rsp,%rdi,8), %rdx # take the address of envp from the stack
xor %eax, %eax # per ABI and compatibility with icc
call main # %edi, %rsi, %rdx are the three args (of which first two are C standard) to main
mov %eax, %edi # transfer the return of main to the first argument of _exit
xor %eax, %eax # per ABI and compatibility with icc
call _exit # terminate the program
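The pointer arithmetic in that listing can be written out in C. This is only a sketch, assuming the x86-64 System V initial stack layout (argc, then the argv pointers, a NULL, then the envp pointers, a NULL). The struct startup_args and the function read_initial_stack are hypothetical names; real startup code takes the pointer from %rsp rather than from a parameter.

```c
#include <stdint.h>

/* Hypothetical container for the values that _start hands to main(). */
struct startup_args {
    long argc;
    char **argv;
    char **envp;
};

/* 'sp' points at the first pointer-sized word of the initial stack:
   argc, argv[0..argc-1], NULL, envp[0..], NULL. */
struct startup_args read_initial_stack(char **sp) {
    struct startup_args args;
    args.argc = (long)(intptr_t)sp[0];      /* mov (%rsp), %edi          */
    args.argv = sp + 1;                     /* lea 8(%rsp), %rsi         */
    args.envp = args.argv + args.argc + 1;  /* lea 16(%rsp,%rdi,8), %rdx */
    return args;
}
```

The "+ 1" in the envp computation skips the NULL pointer that terminates argv, which is where the constant 16 (8 for argc plus 8 for that NULL) in the lea instruction comes from.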
Is there any real world scenario where this would be useful?
If you mean, implement our own _start:
Yes, in most of the commercial embedded software I have worked with, we need to implement our own _start to suit our specific memory and performance requirements.
If you mean, drop the main function and change it to something else:
No, I don't see any benefit doing that.

Related

Linking and calling printf from gas assembly

There are a few related questions to this which I've come across, such as Printf with gas assembly and Calling C printf from assembly but I'm hoping this is a bit different.
I have the following program:
.section .data
format:
.asciz "%d\n" # NUL-terminated, as printf expects
.section .text
.globl _start
_start:
// print "55"
mov $format, %rdi
mov $55, %rsi
mov $0, %eax
call printf # how to link?
// exit
mov $60, %eax
mov $0, %rdi
syscall
Two questions related to this:
Is it possible to use only as (gas) and ld to link this to the printf function, using _start as the entry point? If so, how could that be done?
If not, other than changing _start to main, what would be the gcc invocation to run things properly?
It is possible to use ld, but not recommended: if you use libc functions, you need to initialise the C runtime. That is done automatically if you let the C compiler provide _start and start your program as main. If you use the libc but not the C runtime initialisation code, it may seem to work, but it can also lead to strange spurious failures.
If you start your program from main (your second case) instead, it's as simple as doing gcc -o program program.s where program.s is your source file. On some Linux distributions you may also need to supply -no-pie as your program is not written in PIC style (don't worry about this for now).
Note also that I recommend not mixing libc calls with raw system calls. Instead of doing a raw exit system call, call the C library function exit. This lets the C runtime deinitialise itself correctly, including flushing any IO streams.
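The flushing point can be seen from C. The following is a small sketch, assuming a POSIX system: demo_flush and bytes_in_file are hypothetical helper names, and a temporary file stands in for stdout redirected to a file or pipe. The explicit fflush plays the role of the flush that exit performs for every open stream and that a raw exit system call skips.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Number of bytes the kernel has actually received for this stream. */
long bytes_in_file(FILE *f) {
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Writes a short message into a fully buffered temporary file and reports
   the file size before and after the stream is flushed. */
int demo_flush(long *before, long *after) {
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    setvbuf(f, NULL, _IOFBF, 4096);  /* full buffering */
    fputs("55\n", f);
    *before = bytes_in_file(f);      /* data still sits in the libc buffer */
    fflush(f);                       /* exit() does this for open streams */
    *after = bytes_in_file(f);
    fclose(f);
    return 0;
}
```

If the process ended with a raw exit system call at the point where *before is measured, the three bytes would never reach the file.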
Now if you assemble and link your program as I said in the first paragraph, you'll notice that it might crash. This is because the stack needs to be aligned to a multiple of 16 bytes on calls to functions. You can ensure this alignment by pushing a qword of data on the stack at the beginning of each of your functions (remember to pop it back off at the end).

Printing strings in Assembly, calling that function in C [duplicate]

This question already has an answer here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
I was trying to get into programming with NASM, and I also wanted to learn how to make those functions callable from C. I am fairly certain the code that I have so far is correct, in that I need to set up a stack frame, and then undo that stack frame before I return from the routine. I also know I need to return a zero to indicate that there were no errors. I am using Debian Linux, in case I need to adjust for my OS.
The code:
global hello
section .data
message: db "Hello, world!",0 ; A C string needs the null terminator.
section .text
hello:
push rbp ; C calling convention demands that a
mov rbp,rsp ; stack frame be set up like so.
;THIS IS WHERE THE MAGIC (DOESN'T) HAPPEN
pop rbp ; Restore the stack
mov rax,0 ; normal, no error, return value
ret ; return
I feel as if I should point out that I ask this because all of the programs I found made external calls to printf. I do not wish to do this, I would really like to learn how to print things in assembly. So I suppose my questions are: What are the calling conventions for C functions in NASM? How do I print a string in NASM 64bit assembly?
Also, to make sure I have this part right, is this the proper way to call the assembly function in C?
#include <stdio.h>
int hello(void); /* defined in the NASM file */
int main() {
    hello();
    return 0;
}
EDIT: Okay, I was able to work this out. Here's the assembly code. I assembled the .asm file along with the .c file using nasm -f elf64 -l hello.lst hello.asm && gcc -o hello hello.c hello.o
section .text
global hello
hello:
push rbp ; C calling convention demands that a
mov rbp,rsp ; stack frame be set up like so.
mov rdx,len ; Message length
mov rcx,message ; Message to be written
mov rax,4 ; System call number (sys_write)
int 0x80 ; Call kernel
pop rbp ; Restore the stack
mov rax,0 ; normal, no error, return value
ret
section .data
message: db "Hello, world!",0xa ; 0xa represents the newline char.
len: equ $ - message
The relevant C code (hello.c) looked like this:
int hello(void); /* defined in hello.asm */
int main(int argc, char **argv) {
    hello();
    return 0;
}
A few notes: there is no #include, because the I/O is done in the assembly file. Another thing that I needed to see to believe was that not all of the work had to be done in assembly, as I did not have a _start identifier, or whatever that's called. Definitely need to learn more about system calls. Thank you so much to everyone who pointed me in the right direction.
As was cleared up in comments, any interaction between the world outside and your code is done through system calls. C stdio functions format text into an output buffer, then write it with write(2). For input, they read(2) into an input buffer, and scanf or the line-reading functions consume data from that.
Writing in asm doesn't mean you should avoid libc functions when they're useful, e.g. printf/scanf. Usually it only makes sense to write small parts of a program in asm for speed. e.g. write one function that has a hot loop in asm, and call it from C or whatever other language. Doing the I/O with all the necessary error-checking of system call return values would not be very fun in asm. If you're curious what happens under the hood, read the compiler output and/or single-step the asm. You'll sometimes learn nice tricks from the compiler, and sometimes you'll see it generate less efficient code than you could have written by hand.
This is a problem:
mov rax,4 ; System call number (sys_write)
int 0x80 ; Call kernel
Although 64-bit processes can use the i386 int 0x80 system call ABI, it is the 32-bit ABI, with only 32-bit pointers and so on. You will have a problem as soon as you try to write(2) a char array that's on the stack, since amd64 Linux processes start with a stack pointer that has the high bits set. (Heap memory, and the .data and .rodata memory mapped from the executable, are typically mapped into the low 32 bits of address space in a non-PIE executable.)
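The truncation can be illustrated with plain integer arithmetic. This is only a sketch: pointer_seen_by_int80 and survives_int80 are hypothetical names, and the addresses in the usage note are made-up examples of a low .data address and a high stack address.

```c
#include <stdint.h>

/* The int 0x80 path only looks at the low 32 bits of each argument
   register (ebx, ecx, edx, ...), so a 64-bit pointer is cut down
   like this before the kernel uses it. */
uint64_t pointer_seen_by_int80(uint64_t address) {
    return address & 0xFFFFFFFFu;
}

/* Returns 1 if the address is unchanged by the truncation, 0 otherwise. */
int survives_int80(uint64_t address) {
    return pointer_seen_by_int80(address) == address;
}
```

An address such as 0x601040 passes through unchanged, while something like 0x7ffd12345678 turns into 0x12345678, which points somewhere else entirely.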
The native amd64 ABI uses syscall, and the system call numbers aren't the same as the i386 ABI. I found this table of syscalls listing the number and which parameter goes in which register. sys/syscall.h eventually includes /usr/include/x86_64-linux-gnu/asm/unistd_64.h to get the actual #define __NR_write 1 macros, and so on. There are standard rules for mapping arguments in order to registers. (Given in the ABI doc, IIRC).
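From C, one way to go through the native system call path without writing any assembly is the libc syscall() function together with the SYS_* constants from sys/syscall.h. This is a minimal sketch assuming Linux with glibc or musl; raw_write is a hypothetical name.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Invokes the write system call by number, bypassing the write() wrapper.
   SYS_write is defined in terms of __NR_write for the target architecture. */
long raw_write(int fd, const void *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) sends three bytes to stdout. Because the number comes from the headers, the same source keeps working on architectures where the system call numbers differ.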

int 80 doesn't appear in assembly code

Problem
Let's consider:
#include <unistd.h> /* write */
int main(){
    write(1, "hello", 5);
    return 0;
}
I am reading a book that suggests the assembly output for the above code should be:
main:
mov $4, %eax
mov $1, %ebx
mov $string, %ecx
mov $len, %edx
int $0x80
(The above code was compiled for a 32-bit architecture. The arguments are passed in registers not because of the 64-bit convention of passing arguments in registers, but because we are making a syscall.)
And the output on my 64 bit Ubuntu machine with: gcc -S main.c -m32
is:
pushl $4
pushl $string
pushl $1
call write
My doubts
So this confused me. Why did gcc compile it as a "normal" call and not as a syscall?
In this situation, how do I make the processor use a kernel function (like write)?
I am reading a book that suggests the assembly output for the above code should be ...
You shouldn't believe everything you read :-)
There is no requirement that C code be turned into specific assembly code, the only requirement that the C standard mandates is that the resulting code behave in a certain manner.
Whether that's done by directly calling the OS system call with int $0x80 (or sysenter), or whether it's done by calling a library routine write() which eventually calls the OS in a similar fashion, is largely irrelevant.
If you were to locate and disassemble the write() code, you may well find it simply reads those values off the stack into registers and then calls the OS in much the same way as the code you've shown containing int $0x80.
As an aside, what if you wanted to port gcc to a totally different architecture that uses call 5 to do OS-level system calls? If gcc is injecting specific int $0x80 calls into the assembly stream, that's not going to work too well.
But, if it's injecting a call to a write() function, all you have to do is make sure you link it with the correct library containing a modified write() function (one that does call 5 rather than int $0x80).

Is it possible to convert C to asm without link libc on Linux?

The test platform is 32-bit Linux. (A solution for 32-bit Windows is also welcome.)
Here is a C code snippet:
int a = 0;
printf("%d\n", a);
And if I use gcc to generate assembly code
gcc -S test.c
Then I will get:
movl $0, 28(%esp)
movl 28(%esp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
And this assembly code needs to be linked against libc to work (because of the call to printf).
My question is:
Is it possible to convert C to asm automatically so that it only makes explicit system calls, without using libc?
Like this:
pop ecx
add ecx,host_msg-host_reloc
mov eax,4
mov ebx,1
mov edx,host_msg_len
int 80h
mov eax,1
xor ebx,ebx
int 80h
Directly call the int 80h software interrupt.
Is it possible? If so, is there any tool on this issue?
Thank you!
Not from that source code. A call to printf() cannot be converted by the compiler to a call to the write system call - the printf() library function contains a significant amount of logic which is not present in the system call (such as processing the format string and converting integer and floating-point numbers to strings).
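To give a feel for the logic that lives in printf and not in the system call, here is a minimal sketch of just the "%d" conversion. format_decimal is a hypothetical helper, not part of any library, and it covers only a tiny fraction of what printf does.

```c
#include <stddef.h>

/* Writes the decimal form of 'value' plus a NUL terminator into 'out'
   (which needs room for at least 12 bytes) and returns the string length. */
size_t format_decimal(int value, char *out) {
    char digits[11];
    size_t n = 0, len = 0;
    /* Work on the magnitude as unsigned so that INT_MIN is handled too. */
    unsigned int magnitude =
        value < 0 ? 0u - (unsigned int)value : (unsigned int)value;

    /* Produce the digits in reverse order. */
    do {
        digits[n++] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    if (value < 0)
        out[len++] = '-';
    while (n > 0)
        out[len++] = digits[--n];
    out[len] = '\0';
    return len;
}
```

Only after a step like this is there a plain byte buffer that could be handed to the write system call.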
It is possible to generate system calls directly, but only by using inline assembly. For instance, to generate a call to _exit(0) (not quite the same as exit()!), you would write:
#include <asm/unistd.h>
...
int retval;
asm("int $0x80" : "=a" (retval) : "a" (__NR_exit_group), "b" (0) : "memory");
For more information on GCC inline assembly, particularly on the constraints I'm using here to map variables to registers, please read the GCC Inline Assembly HOWTO. It's rather old, but still perfectly relevant.
Note that doing this is not recommended. The exact calling conventions for system calls (e.g, which registers are used for the call number and arguments, how errors are returned, etc) are different on different architectures, operating systems, and even between 32-bit and 64-bit x86. Writing code this way will make it very difficult to maintain.
You can certainly compile C code to assembly without linking to libc, but you can't use the C library functions. Libc's entire purpose IS to provide the interface from C library functions to Linux system calls (or Windows, or whatever system you're on). So, if you didn't want to use libc, you would have to write your own wrappers to the system calls.
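One piece of such a wrapper is error reporting. On Linux the kernel signals failure by returning a small negative number, and the libc wrapper turns that into -1 plus errno. A minimal sketch of that translation step, with the hypothetical name syscall_result_to_libc:

```c
#include <errno.h>

/* Linux system calls report failure by returning a value in [-4095, -1].
   A libc-style wrapper converts that into -1 and stores the positive
   error code in errno; any other value is passed through unchanged. */
long syscall_result_to_libc(long raw) {
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

A hand-written wrapper would apply this to whatever the system call instruction leaves in the return register.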
If you compile some C code which does not use any function from the C library (e.g. no printf, malloc, etc.) in the freestanding mode of GCC (i.e. with the -ffreestanding flag), you'll need either to call some assembler function (from some other object or library) or to use asm statements, since you won't be able to do any kind of input or output without making a syscall.
Read also the Assembly HowTo, the x86 calling conventions, and the ABI relevant to your kernel (probably the x86-64 ABI), and make sure you understand what system calls are, starting with syscalls(2), and what the vDSO is (int 0x80 is not the best way to make syscalls these days; SYSENTER is often better). Study the source code of some libc, in particular musl libc (whose source code is very readable).
On Windows (which is not free software and which I don't know) the question could be much more difficult: I am not sure that the system call level is exactly and completely documented.
The libffi enables you to call arbitrary functions from C. You could also cast function pointers from dlsym(3). You could consider JIT techniques (e.g. libjit, GNU lightning, asmjit etc...).

Function Prologue and Epilogue removed by GCC Optimization

Taking an empty program
//demo.c
int main(void)
{
}
Compiling the program at default optimization.
gcc -S demo.c -o dasm.asm
I get the assembly output as
//Removed labels and directive which are not relevant
main:
pushl %ebp // prologue of main
movl %esp, %ebp // prologue of main
popl %ebp // epilogue of main
ret
Now Compiling the program at -O2 optimization.
gcc -O2 -S demo.c -o dasm.asm
I get the optimized assembly
main:
rep
ret
In my initial search, I found that the optimization flag -fomit-frame-pointer was responsible for removing the prologue and epilogue.
I found more information about the flag in the gcc compiler manual, but I could not understand the reason below, given by the manual, for removing the prologue and epilogue.
Don't keep the frame pointer in a register for functions that don't
need one.
Is there any other way of putting the above reason?
What is the reason for the "rep" instruction appearing at -O2 optimization?
Why does the main function not require a stack frame initialization?
If the frame pointer is not set up from within the main function, then who does this job?
Is it done by the OS, or is it a function of the hardware?
Compilers are getting smart: it knew you didn't need a frame pointer stored in a register because whatever you put into your main() function didn't use the stack.
As for rep ret:
Here's the principle. The processor tries to fetch the next few
instructions to be executed, so that it can start the process of
decoding and executing them. It even does this with jump and return
instructions, guessing where the program will head next.
What AMD says here is that, if a ret instruction immediately follows a
conditional jump instruction, their predictor cannot figure out where
the ret instruction is going. The pre-fetching has to stop until the
ret actually executes, and only then will it be able to start looking
ahead again.
The "rep ret" trick apparently works around the problem, and lets the
predictor do its job. The "rep" has no effect on the instruction.
Source: Some forum, google a sentence to find it.
One thing to note is that just because there is no prologue doesn't mean there is no stack. You can still push and pop with ease; it's just that complex stack manipulation will be difficult.
Functions that don't have a prologue/epilogue are usually dubbed naked. Hackers like to use them a lot because they don't contaminate the stack when you jmp to them. I must confess I know of no other use for them outside optimization. In Visual Studio it's done via:
__declspec(naked)
