gcc: segmentation fault when compiling with nostdlib - c

I'm experimenting with entry points and got a segfault.
prog.c:
int main() {
return 0;
}
compiling and linking with:
gcc -Wall prog.c -nostdlib -c -o prog.o
ld prog.o -e main -o prog.out
objdump:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000000b 00000000004000b0 00000000004000b0 000000b0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .eh_frame 00000038 00000000004000c0 00000000004000c0 000000c0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 0000001c 0000000000000000 0000000000000000 000000f8 2**0
CONTENTS, READONLY
Disassembly of section .text:
00000000004000b0 <main>:
4000b0: 55 push %rbp
4000b1: 48 89 e5 mov %rsp,%rbp
4000b4: b8 00 00 00 00 mov $0x0,%eax
4000b9: 5d pop %rbp
4000ba: c3 retq
Valgrind output:
Access not within mapped region at address 0x0

retq takes the return address from the top of the stack and executes from there... problem is that according to the way Linux executes binaries, the number of arguments is on the stack and execution is transferred to address 0x1 (if no arguments are given)
set some dummy arguments with gdb (set args x y z)
you can compile and link with debug info (-g) and then use gdb
set a breakpoint on the retq instruction (br *0x4000ba) and run the program
execute the final instruction and observe the SIGSEGV address correspond to the number of arguments + 1
the program should exit with a system call, not retq
see
http://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux/
for some useful background info

Related

Will an executable access shared-libraries' global variable via GOT?

I was learning dynamic linking recently and gave it a try:
dynamic.c
int global_variable = 10;
int XOR(int a) {
return global_variable;
}
test.c
#include <stdio.h>
extern int global_variable;
extern int XOR(int);
int main() {
global_variable = 3;
printf("%d\n", XOR(0x10));
}
The compiling commands are:
clang -shared -fPIC -o dynamic.so dynamic.c
clang -o test test.c dynamic.so
I was expecting that in executable test the main function will access global_variable via GOT. However, on the contrary, the global_variable is placed in test's data section and XOR in dynamic.so access the global_variable indirectly.
Could anyone tell me why the compiler didn't ask the test to access global_variable via GOT, but asked the shared object file to do so?
Part of the point of a shared library is that one copy gets loaded into memory, and multiple processes can access that one copy. But every program has its own copy of each of the library's variables. If they were accessed relative to the library's GOT then those would instead be shared among the processes using the library, just like the functions are.
There are other possibilities, but it is clean and consistent for each executable to provide for itself all the variables it needs. That requires the library functions to access all of its variables with static storage duration (not just external ones) indirectly, relative to the program. This is ordinary dynamic linking, just going the opposite direction from what you usually think of.
Turns out my clang produced PIC by default so it messed with results.
I will leave updated answer here, and the original can be read below it.
After digging a bit more into the topic i have noticed that compilation of test.c does not generate a .got section by itself. You can check it by compiling the executable into an object file and omitting the linking step for now (-c option):
clang -c -o test.o test.c
If you inspect the sections of resulting object file with readelf -S you will notice that there is no .got in there:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000035 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000210
0000000000000060 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000075
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000079
0000000000000013 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 0000008c
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.pr[...] NOTE 0000000000000000 00000090
0000000000000030 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000c0
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000270
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f8
00000000000000d8 0000000000000018 12 4 8
[12] .strtab STRTAB 0000000000000000 000001d0
000000000000003e 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000288
0000000000000074 0000000000000000 0 0 1
This means that the entirety of .got section present in the test executable actually comes from dynamic.so, as it is PIC and uses GOT.
Would it be possible to compile dynamic.so as non-PIC as well? Turns out it apparently used to be 10 years ago (the article compiles examples to 32-bits, they dont have to work on 64 bits!). Linked article describes how a non-PIC shared library was relocated at load time - basically, every time an address that needed to be relocated after loading was present in machine code, it was instead set to zeroes and a relocation of a certain type was set in the library. During loading of the library the loader filled the zeros with actual runtime address of data/code that was needed. It is important to note that it cannot be applied in your though as 64-bit shared libraries cannot be made out of non-PIC (Source).
If you compile dynamic.so as a shared 32-bit library instead and do not use the -fPIC option (you usually need special repositories enabled to compile 32-bit code and have 32-bit libc installed):
gcc -m32 dynamic.c -shared -o dynamic.so
You will notice that:
// readelf -s dynamic.so
(... lots of output)
27: 00004010 4 OBJECT GLOBAL DEFAULT 19 global_variable
// readelf -S dynamic.so
(... lots of output)
[17] .got PROGBITS 00003ff0 002ff0 000010 04 WA 0 0 4
[18] .got.plt PROGBITS 00004000 003000 00000c 04 WA 0 0 4
[19] .data PROGBITS 0000400c 00300c 000008 00 WA 0 0 4
[20] .bss NOBITS 00004014 003014 000004 00 WA 0 0 1
global_variable is at offset 0x4010 which is inside .data section. Also, while .got is present (at offset 0x3ff0), it only contains relocations coming from other sources than your code:
// readelf -r
Offset Info Type Sym.Value Sym. Name
00003f28 00000008 R_386_RELATIVE
00003f2c 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003ff0 00000106 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003ff4 00000206 R_386_GLOB_DAT 00000000 __cxa_finalize#GLIBC_2.1.3
00003ff8 00000306 R_386_GLOB_DAT 00000000 __gmon_start__
00003ffc 00000406 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
This article introduces GOT as part of introduction on PIC, and i have found that to be the case in plenty of places, which would suggest that indeed GOT is only used by PIC code although i am not 100% sure of it and i recommend researching the topic more.
What does this mean for you? A section in the first article i linked called "Extra credit #2" contains an explanation for a similar scenario. Although it is 10 years old, uses 32-bit code and the shared library is non-PIC it shares some similarities with your situation and might explain the problem you presented in your question.
Also keep in mind that (although similar) -fPIE and -fPIC are two separate options with slightly different effects and that if your executable during inspection is not loaded at 0x400000 then it probably is compiled as PIE without your knowledge which might also have impact on results. In the end it all boils down to what data is to be shared between processes, what data/code can be loaded at arbitrary address, what has to be loaded at fixed address etc. Hope this helps.
Also two other answers on Stack Overflow which seem relevant to me: here and here. Both the answers and comments.
Original answer:
I tried reproducing your problem with exactly the same code and compilation commands as the ones you provided, but it seems like both main and XOR use the GOT to access the global_variable. I will answer by providing example output of commands that i used to inspect the data flow. If your outputs differ from mine, it means there is some other difference between our environments (i mean a big difference, if only addresses/values are different then its ok). Best way to find that difference is for you to provide commands you originally used as well as their output.
First step is to check what address is accessed whenever a write or read to global_variable happens. For that we can use objdump -D -j .text test command to disassemble the code and look at the main function:
0000000000001150 <main>:
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 48 8b 05 8d 2e 00 00 mov 0x2e8d(%rip),%rax # 3fe8 <global_variable>
115b: c7 00 03 00 00 00 movl $0x3,(%rax)
1161: bf 10 00 00 00 mov $0x10,%edi
1166: e8 d5 fe ff ff call 1040 <XOR#plt>
116b: 89 c6 mov %eax,%esi
116d: 48 8d 3d 90 0e 00 00 lea 0xe90(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1174: b0 00 mov $0x0,%al
1176: e8 b5 fe ff ff call 1030 <printf#plt>
117b: 31 c0 xor %eax,%eax
117d: 5d pop %rbp
117e: c3 ret
117f: 90 nop
Numbers in the first column are not absolute addresses - instead they are offsets relative to the base address at which the executable will be loaded. For the sake of explanation i will refer to them as "offsets".
The assembly at offset 0x115b and 0x1161 comes directly from the line global_variable = 3; in your code. To confirm that, you could compile the program with -g for debug symbols and invoke objdump with -S. This will display source code above corresponding assembly.
We will focus on what these two instructions are doing. First instruction is a mov of 8 bytes from a location in memory to the rax register. The location in memory is given as relative to the current rip value, offset by a constant 0x2e8d. Objdump already calculated the value for us, and it is equal to 0x3fe8. So this will take 8 bytes present in memory at the 0x3fe8 offset and store them in the rax register.
Next instruction is again a mov, the suffix l tells us that data size is 4 bytes this time. It stores a 4 byte integer with value equal to 0x3 in the location pointed to by the current value of rax (not in the rax itself! brackets around a register such as those in (%rax) signify that the location in the instruction is not the register itself, but rather where its contents are pointing to!).
To summarize, we read a pointer to a 4 byte variable from a certain location at offset 0x3fe8 and later store an immediate value of 0x3 at the location specified by said pointer. Now the question is: where does that offset of 0x3fe8 come from?
It actually comes from GOT. To show the contents of the .got section we can use the objdump -s -j .got test command. -s means we want to focus on actual raw contents of the section, without any disassembling. The output in my case is:
test: file format elf64-x86-64
Contents of section .got:
3fd0 00000000 00000000 00000000 00000000 ................
3fe0 00000000 00000000 00000000 00000000 ................
3ff0 00000000 00000000 00000000 00000000 ................
The whole section is obviously set to zero, as GOT is populated with data after loading the program into memory, but what is important is the address range. We can see that .got starts at 0x3fd0 offset and ends at 0x3ff0. This means it also includes the 0x3fe8 offset - which means the location of global_variable is indeed stored in GOT.
Another way of finding this information is to use readelf -S test to show sections of the executable file and scroll down to the .got section:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
(...lots of sections...)
[22] .got PROGBITS 0000000000003fd0 00002fd0
0000000000000030 0000000000000008 WA 0 0 8
Looking at the Address and Size columns, we can see that the section is loaded at offset 0x3fd0 in memory and its size is 0x30 - which corresponds to what objdump displayed. Note that in readelf ouput "Offset" is actually the offset into the file form which the program is loaded - not the offset in memory that we are interested in.
by issuing the same commands on the dynamic.so library we get similar results:
00000000000010f0 <XOR>:
10f0: 55 push %rbp
10f1: 48 89 e5 mov %rsp,%rbp
10f4: 89 7d fc mov %edi,-0x4(%rbp)
10f7: 48 8b 05 ea 2e 00 00 mov 0x2eea(%rip),%rax # 3fe8 <global_variable##Base-0x38>
10fe: 8b 00 mov (%rax),%eax
1100: 5d pop %rbp
1101: c3 ret
So we see that both main and XOR use GOT to find the location of global_variable.
As for the location of global_variable we need to run the program to populate GOT. For that we can use GDB. We can run our program in GDB by invoking it this way:
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:." gdb ./test
LD_LIBRARY_PATH environment variable tells linker where to look for shared objects, so we extend it to include the current directory "." so that it may find dynamic.so.
After the GDB loads our code, we may invoke break main to set up a breakpoint at main and run to run the program. The program execution should pause at the beginning of the main function, giving us a view into our executable after it was fully loaded into memory, with GOT populated.
Running disassemble main in this state will show us the actual absolute offsets into memory:
Dump of assembler code for function main:
0x0000555555555150 <+0>: push %rbp
0x0000555555555151 <+1>: mov %rsp,%rbp
=> 0x0000555555555154 <+4>: mov 0x2e8d(%rip),%rax # 0x555555557fe8
0x000055555555515b <+11>: movl $0x3,(%rax)
0x0000555555555161 <+17>: mov $0x10,%edi
0x0000555555555166 <+22>: call 0x555555555040 <XOR#plt>
0x000055555555516b <+27>: mov %eax,%esi
0x000055555555516d <+29>: lea 0xe90(%rip),%rdi # 0x555555556004
0x0000555555555174 <+36>: mov $0x0,%al
0x0000555555555176 <+38>: call 0x555555555030 <printf#plt>
0x000055555555517b <+43>: xor %eax,%eax
0x000055555555517d <+45>: pop %rbp
0x000055555555517e <+46>: ret
End of assembler dump.
(gdb)
Our 0x3fe8 offset has turned into an absolute address of equal to 0x555555557fe8. We may again check that this location comes from the .got section by issuing maintenance info sections inside GDB, which will list a long list of sections and their memory mappings. For me .got is placed in this address range:
[21] 0x555555557fd0->0x555555558000 at 0x00002fd0: .got ALLOC LOAD DATA HAS_CONTENTS
Which contains 0x555555557fe8.
To finally inspect the address of global_variable itself we may examine the contents of that memory by issuing x/xag 0x555555557fe8. Arguments xag of the x command deal with the size, format and type of data being inspected - for explanation invoke help x in GDB. On my machine the command returns:
0x555555557fe8: 0x7ffff7fc4020 <global_variable>
On your machine it may only display the address and the data, without the "<global_variable>" helper, which probably comes from an extension i have installed called pwndbg. It is ok, because the value at that address is all we need. We now know that the global_variable is located in memory under the address 0x7ffff7fc4020. Now we may issue info proc mappings in GDB to find out what address range does this address belong to. My output is pretty long, but among all the ranges listed there is one of interest to us:
0x7ffff7fc4000 0x7ffff7fc5000 0x1000 0x3000 /home/user/test_got/dynamic.so
The address is inside of that memory area, and GDB tells us that it comes from the dynamic.so library.
In case any of the outputs of said commands are different for you (change in a value is ok - i mean a fundamental difference like addresses not belonging to certain address ranges etc.) please provide more information about what exactly did you do to come to the conclusion that global_variable is stored in the .data section - what commands did you invoke and what outputs they produced.

GCC + LD + NDISASM = huge amount of assembler instructions

I'm a newbie to C and GCC compilers and trying to study how C is compiled into machine code by disassembling binaries produced, but the result of compiling and then disassembling a very simple function seems overcomplicated.
I have basic.c file:
int my_function(){
int a = 0xbaba;
int b = 0xffaa;
return a + b;
}
Then I compile it using gcc -ffreestanding -c basic.c -o basic.o
And when I dissasemble basic.o object file I get quite an expected output:
0000000000000000 <my_function>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c7 45 fc ba ba 00 00 movl $0xbaba,-0x4(%rbp)
b: c7 45 f8 aa ff 00 00 movl $0xffaa,-0x8(%rbp)
12: 8b 55 fc mov -0x4(%rbp),%edx
15: 8b 45 f8 mov -0x8(%rbp),%eax
18: 01 d0 add %edx,%eax
1a: 5d pop %rbp
1b: c3 retq
Looks great. But then I use linker to produce raw binary: ld -o basic.bin -Ttext 0x0 --oformat binary basic.o
So after disassembling this basic.bin file with command ndisasm -b 32 basic.bin > basic.dis, I get something interesting here:
00000000 55 push ebp
00000001 48 dec eax
00000002 89E5 mov ebp,esp
00000004 C745FCBABA0000 mov dword [ebp-0x4],0xbaba
0000000B C745F8AAFF0000 mov dword [ebp-0x8],0xffaa
00000012 8B55FC mov edx,[ebp-0x4]
00000015 8B45F8 mov eax,[ebp-0x8]
00000018 01D0 add eax,edx
0000001A 5D pop ebp
0000001B C3 ret
0000001C 0000 add [eax],al
0000001E 0000 add [eax],al
00000020 1400 adc al,0x0
00000022 0000 add [eax],al
00000024 0000 add [eax],al
00000026 0000 add [eax],al
00000028 017A52 add [edx+0x52],edi
0000002B 0001 add [ecx],al
0000002D 7810 js 0x3f
0000002F 011B add [ebx],ebx
00000031 0C07 or al,0x7
00000033 08900100001C or [eax+0x1c000001],dl
00000039 0000 add [eax],al
0000003B 001C00 add [eax+eax],bl
0000003E 0000 add [eax],al
00000040 C0FFFF sar bh,byte 0xff
00000043 FF1C00 call far [eax+eax]
00000046 0000 add [eax],al
00000048 00410E add [ecx+0xe],al
0000004B 108602430D06 adc [esi+0x60d4302],al
00000051 57 push edi
00000052 0C07 or al,0x7
00000054 0800 or [eax],al
00000056 0000 add [eax],al
I don't really know where the commands like SAR, JS, DEC come from and why they are required. I guess, that's because I specify invalid arguments for compiler or linker.
As I concluded from #Michael Petch comments:
The binary representation of required function is represented by 00000000-0000001B lines of code snippet of the disassembled file and executes command ret at the end so the second part of the file (0000001B-00000056) is never executed - it's metadata.
As per #Michael Petch and #Jester comments:
I could figure out that the object file consists of many sections https://en.wikipedia.org/wiki/Object_file
The generated basic.o file originally had three sections:
.text (function itself)
.comment (not represented in the binary file)
.eh_frame
What is .eh_frame section and why GCC compiler creates it, is described here:
Why GCC compiled C program needs .eh_frame section?
By running gcc with argument -fno-asynchronous-unwind-tables I could get rid of .eh_frame section from object file.

Remove stack frame setup/initilization

I have the following program:
void main1() {
((void(*)(void)) (0xabcdefabcdef)) ();
}
I create it with the following commands:
clang -fno-stack-protector -c -static -nostdlib -fpic -fpie -O0 -fno-asynchronous-unwind-tables main.c -o shellcode.o
ld shellcode.o -o shellcode -S -static -dylib -e main1 -order_file order.txt
gobjcopy -O binary --only-section=.text shellcode shellcode.output
The assembly looks like the following:
//
// ram
// ram: 00000000-00000011
//
**************************************************************
* FUNCTION *
**************************************************************
undefined FUN_00000000()
undefined AL:1 <RETURN>
FUN_00000000
00000000 55 PUSH RBP
00000001 48 89 e5 MOV RBP,RSP
00000004 48 b8 ef MOV RAX,0xabcdefabcdef
cd ab ef
cd ab 00 00
0000000e ff d0 CALL RAX
00000010 5d POP RBP
00000011 c3 RET
How do I get clang to remove the PUSH RBP, MOV RBP,RSP and POP RBP instructions as they are unnecessary?
I can do this if I write the program in assembly with the following lines:
.globl start
start:
movq $0xabcdefabcdef, %rax
call *%rax
ret
and with the following build commands:
clang -static -nostdlib main.S -o crashme.o
gobjcopy -O binary --only-section=.text crashme.o crashme.output
and the resulting assembly:
//
// ram
// ram: 00000000-0000000c
//
assume DF = 0x0 (Default)
00000000 48 b8 ef MOV RAX,0xabcdefabcdef
cd ab ef
cd ab 00 00
0000000a ff d0 CALL RAX
0000000c c3 RET
but I would much rather write C code instead of assembly.
You forgot to enable optimization. Any optimization level like -O3 enables -fomit-frame-pointer.
It will also optimize the tailcall into a jmp instead of call/ret though. If you need to avoid that for some reason, maybe you can use -fomit-frame-pointer at the default -O0.
For shellcode you might want -Os to optimize for code size. Or even clang's -Oz; that will have a side-effect of avoiding some 0 bytes in the machine code by using push imm8 / pop reg to put small constants in registers, instead of mov reg, imm32.

gcc subtracting from esp before call

I am planning to use C to write a small kernel and I really don't want it to bloat with unnecessary instructions.
I have two C files which are called main.c and hello.c. I compile and link them using the following GCC command:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o
I am dumping .text section using following OBJDUMP command:
objdump -w -j .text -D -mi386 -Maddr16,data16,intel main.o
and get the following dump:
00001000 <main>:
1000: 67 66 8d 4c 24 04 lea ecx,[esp+0x4]
1006: 66 83 e4 f0 and esp,0xfffffff0
100a: 67 66 ff 71 fc push DWORD PTR [ecx-0x4]
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 51 push ecx
1016: 66 83 ec 04 sub esp,0x4
101a: 66 e8 10 00 00 00 call 1030 <hello>
1020: 90 nop
1021: 66 83 c4 04 add esp,0x4
1025: 66 59 pop ecx
1027: 66 5d pop ebp
1029: 67 66 8d 61 fc lea esp,[ecx-0x4]
102e: 66 c3 ret
00001030 <hello>:
1030: 66 55 push ebp
1032: 66 89 e5 mov ebp,esp
1035: 90 nop
1036: 66 5d pop ebp
1038: 66 c3 ret
My questions are: Why are machine codes at the following lines being generated?
I can see that subtraction and addition completes each other, but why are they generated? I don't have any variable to be allocated on stack. I'd appreciate a source to read about usage of ECX.
1016: 66 83 ec 04 sub esp,0x4
1021: 66 83 c4 04 add esp,0x4
main.c
extern void hello();
void main(){
hello();
}
hello.c
void hello(){}
lscript.ld
SECTIONS{
.text 0x1000 : {*(.text)}
}
As I mentioned in my comments:
The first few lines (plus the push ecx) are to ensure the stack is aligned on a 16-byte boundary which is required by the Linux System V i386 ABI. The pop ecx and lea before the ret in main is to undo that alignment work.
#RossRidge has provided a link to another Stackoverflow answer that details this quite well.
In this case you seem to be doing real mode development. GCC isn't well suited for this but it can work and I will assume you know what you are doing. I mention some of the pitfalls of using -m16 in this Stackoverflow answer. I put this warning in that answer regarding real mode development with GCC:
There are so many pitfalls in doing this that I recommend against it.
If you remain undeterred and wish to continue forward you can do a few things to minimize the code. The 16-byte alignment of the stack at the point a function call is made is part of the more recent Linux System V i386 ABIs. Since you are generating code for a non-Linux environment you can change the stack alignment to 4 using compiler option -mpreferred-stack-boundary=2 . The GCC manual says:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
If we add that to your GCC command we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2:
00001000 <main>:
1000: 66 55 push ebp
1002: 66 89 e5 mov ebp,esp
1005: 66 e8 04 00 00 00 call 100f <hello>
100b: 66 5d pop ebp
100d: 66 c3 ret
0000100f <hello>:
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 5d pop ebp
1016: 66 c3 ret
Now all the extra alignment code to get it on a 16-byte boundary has disappeared. We are left with typical function frame pointer prologue and epilogue code. This is often in the form of push ebp and mov ebp,esp pop ebp. we can remove these with the -fomit-frame-pointer define in the GCC manual as:
The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.
If we add that option we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2 -fomit-frame-pointer:
00001000 <main>:
1000: 66 e8 02 00 00 00 call 1008 <hello>
1006: 66 c3 ret
00001008 <hello>:
1008: 66 c3 ret
You can then optimize for size with -Os. The GCC manual says this:
-Os
Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
This has a side effect that main will be placed into a section called .text.startup. If we display both with objdump -w -j .text -j .text.startup -D -mi386 -Maddr16,data16,intel main.o we get:
Disassembly of section .text:
00001000 <hello>:
1000: 66 c3 ret
Disassembly of section .text.startup:
00001002 <main>:
1002: e9 fb ff jmp 1000 <hello>
If you have functions in separate objects you can alter the calling convention so the first 3 Integer class parameters are passed in registers rather than the stack. The Linux kernel uses this method as well. Information on this can be found in the GCC documentation:
regparm (number)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to number if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
I wrote a Stackoverflow answer with code that uses __attribute__((regparm(3))) that may be a useful source of further information.
Other Suggestions
I recommend you consider compiling each object individually rather than altogether. This is also advantageous since it can be more easily be done in a Makefile later on.
If we look at your command line with the extra options mentioned above you'd have:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o \
-mpreferred-stack-boundary=2 -fomit-frame-pointer -Os
I recommend you do it this way:
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer main.c -o main.o
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer hello.c -o hello.o
The -c option (I added it to the beginning) forces the compiler to just generate the object file from the source and not to perform linking. You will also notice the -T lscript.ld has been removed. We have created .o files above. We can now use GCC to link all of them together:
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m16 \
-Tlscript.ld main.o hello.o -o main.elf
The -ffreestanding will force the linker to not use the C runtime, the -Wl,--build-id=none will tell the compiler not to generate some noise in the executable for build notes. In order for this to really work you'll need a slightly more complex linker script that places the .text.startup before .text. This script also adds the .data section, the .rodata and .bss sections. The DISCARD option removes exception handling data and other unneeded information.
ENTRY(main)
SECTIONS{
.text 0x1000 : SUBALIGN(4) {
*(.text.startup);
*(.text);
}
.data : SUBALIGN(4) {
*(.data);
*(.rodata);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
*(.note.gnu.build-id);
}
}
If we look at a complete OBJDUMP with objdump -w -D -mi386 -Maddr16,data16,intel main.elf we would see:
Disassembly of section .text:
00001000 <main>:
1000: e9 01 00 jmp 1004 <hello>
1003: 90 nop
00001004 <hello>:
1004: 66 c3 ret
If you want to convert main.elf to a binary file that you can place in a disk image and read it (ie. via BIOS interrupt 0x13), you can create it this way:
objcopy -O binary main.elf main.bin
If you dump main.bin with NDISASM using ndisasm -b16 -o 0x1000 main.bin you'd see:
00001000 E90100 jmp word 0x1004
00001003 90 nop
00001004 66C3 o32 ret
Cross Compiler
I can't stress this enough but you should consider using a GCC cross compiler. The OSDev Wiki has information on building one. It also has this to say about why:
Why do I need a Cross Compiler?
You need to use a cross-compiler unless you are developing on your own operating system. The compiler must know the correct target platform (CPU, operating system), otherwise you will run into trouble. If you use the compiler that comes with your system, then the compiler won't know it is compiling something else entirely. Some tutorials suggest using your system compiler and passing a lot of problematic options to the compiler. This will certainly give you a lot of problems in the future and the solution is build a cross-compiler.

C Standard Library Functions vs. System Calls. Which is `open()`?

I know fopen() is in the C standard library, so that I can definitely call the fopen() function in a C program. What I am confused about is why I can call the open() function as well. open() should be a system call, so it is not a C function in the standard library. As I am successfully able to call the open() function, am I calling a C function or a system call?
EJP's comments to the question and Steve Summit's answer are exactly to the point: open() is both a syscall and a function in the standard C library; fopen() is a function in the standard C library, that sets up a file handle -- a data structure of type FILE that contains additional stuff like optional buffering --, and internally calls open() also.
In the hopes to further understanding, I shall show hello.c, an example Hello world -program written in C for Linux on 64-bit x86 (x86-64 AKA AMD64 architecture), which does not use the standard C library at all.
First, hello.c needs to define some macros with inline assembly for us to be able to call the syscalls. These are very architecture- and operating system dependent, which is why this only works in Linux on x86-64 architecture:
/* Freestanding Hello World example in Linux on x86_64/x86.
* Compile using
* gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
*/
#define STDOUT_FILENO 1
#define EXIT_SUCCESS 0
#ifndef __x86_64__
#error This program only works on x86_64 architecture!
#endif
#define SYS_write 1
#define SYS_exit 60
#define SYSCALL1_NORET(nr, arg1) \
__asm__ ( "syscall\n\t" \
: \
: "a" (nr), "D" (arg1) \
: "rcx", "r11" )
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ ( "syscall\n\t" \
: "=a" (retval) \
: "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) \
: "rcx", "r11" )
The Freestanding in the comment at the beginning of the file refers to "freestanding execution environment"; it is the case when there is no C library available at all. For example, the Linux kernel is written the same way. The normal environment we are familiar with is called "hosted execution environment", by the way.
Next, we can define two functions, or "wrappers", around the syscalls:
static inline void my_exit(int retval)
{
SYSCALL1_NORET(SYS_exit, retval);
}
static inline int my_write(int fd, const void *data, int len)
{
int retval;
if (fd == -1 || !data || len < 0)
return -1;
SYSCALL3(retval, SYS_write, fd, data, len);
if (retval < 0)
return -1;
return retval;
}
Above, my_exit() is roughly equivalent to C standard library exit() function, and my_write() to write().
The C language does not define any kind of a way to do a syscall, so that is why we always need a "wrapper" function of some sort. (The GNU C library does provide a syscall() function for us to do any syscall we wish -- but the point of this example is to not use the C library at all.)
The wrapper functions always involve a bit of (inline) assembly. Again, since C does not have a built-in way to do a syscall, we need to "extend" the language by adding some assembly code. This (inline) assembly, and the syscall numbers, is what makes this example, operating system and architecture dependent. And yes: the GNU C library, for example, contains the equivalent wrappers for quite a few architectures.
Some of the functions in the C library do not use any syscalls. We also need one, the equivalent of strlen():
static inline int my_strlen(const char *str)
{
int len = 0L;
if (!str)
return -1;
while (*str++)
len++;
return len;
}
Note that there is no NULL used anywhere in the above code. It is because it is a macro defined by the C library. Instead, I'm relying on "logical null": (!pointer) is true if and only if pointer is a zero pointer, which is what NULL is on all architectures in Linux. I could have defined NULL myself, but I didn't, in the hopes that somebody might notice the lack of it.
Finally, main() itself is something the GNU C library calls, as in Linux, the actual start point of the binary is called _start. The _start is provided by the hosted runtime environment, and initializes the C library data structures and does other similar preparations. Our example program is so simple we do not need it, so we can just put our simple main program part into _start instead:
void _start(void)
{
const char *msg = "Hello, world!\n";
my_write(STDOUT_FILENO, msg, my_strlen(msg));
my_exit(EXIT_SUCCESS);
}
If you put all of the above together, and compile it using
gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
per the comment at the start of the file, you will end up with a small (about two kilobytes) static binary, that when run,
./hello
outputs
Hello, world!
You can use file hello to examine the contents of the file. You could run strip hello to remove all (unneeded) symbols, reducing the file size further down to about one and a half kilobytes, if file size was really important. (It will make the object dump less interesting, however, so before you do that, check out the next step first.)
We can use objdump -x hello to examine the sections in the file:
hello: file format elf64-x86-64
hello
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004001e1
Program Header:
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000002f0 memsz 0x00000000000002f0 flags r-x
NOTE off 0x0000000000000120 vaddr 0x0000000000400120 paddr 0x0000000000400120 align 2**2
filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--
EH_FRAME off 0x000000000000022c vaddr 0x000000000040022c paddr 0x000000000040022c align 2**2
filesz 0x000000000000002c memsz 0x000000000000002c flags r--
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 0000000000400120 0000000000400120 00000120 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 000000d9 0000000000400144 0000000000400144 00000144 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .rodata 0000000f 000000000040021d 000000000040021d 0000021d 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .eh_frame_hdr 0000002c 000000000040022c 000000000040022c 0000022c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .eh_frame 00000098 0000000000400258 0000000000400258 00000258 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .comment 00000034 0000000000000000 0000000000000000 000002f0 2**0
CONTENTS, READONLY
SYMBOL TABLE:
0000000000400120 l d .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000400144 l d .text 0000000000000000 .text
000000000040021d l d .rodata 0000000000000000 .rodata
000000000040022c l d .eh_frame_hdr 0000000000000000 .eh_frame_hdr
0000000000400258 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l df *ABS* 0000000000000000 hello.c
0000000000400144 l F .text 0000000000000016 my_exit
000000000040015a l F .text 000000000000004e my_write
00000000004001a8 l F .text 0000000000000039 my_strlen
0000000000000000 l df *ABS* 0000000000000000
000000000040022c l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
00000000004001e1 g F .text 000000000000003c _start
0000000000601000 g .eh_frame 0000000000000000 __bss_start
0000000000601000 g .eh_frame 0000000000000000 _edata
0000000000601000 g .eh_frame 0000000000000000 _end
The .text section contains our code, and .rodata immutable constants; here, just the Hello, world! string literal. The rest of the sections are stuff the linker adds and the system uses. We can see that we have f(hex) = 15 bytes of read-only data, and d9(hex) = 217 bytes of code; the rest of the file (about a kilobyte or so) is ELF stuff added by the linker for the kernel to use when executing this binary.
We can even examine the actual assembly code contained in hello, by running objdump -d hello:
hello: file format elf64-x86-64
Disassembly of section .text:
0000000000400144 <my_exit>:
400144: 55 push %rbp
400145: 48 89 e5 mov %rsp,%rbp
400148: 89 7d fc mov %edi,-0x4(%rbp)
40014b: b8 3c 00 00 00 mov $0x3c,%eax
400150: 8b 55 fc mov -0x4(%rbp),%edx
400153: 89 d7 mov %edx,%edi
400155: 0f 05 syscall
400157: 90 nop
400158: 5d pop %rbp
400159: c3 retq
000000000040015a <my_write>:
40015a: 55 push %rbp
40015b: 48 89 e5 mov %rsp,%rbp
40015e: 89 7d ec mov %edi,-0x14(%rbp)
400161: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400165: 89 55 e8 mov %edx,-0x18(%rbp)
400168: 83 7d ec ff cmpl $0xffffffff,-0x14(%rbp)
40016c: 74 0d je 40017b <my_write+0x21>
40016e: 48 83 7d e0 00 cmpq $0x0,-0x20(%rbp)
400173: 74 06 je 40017b <my_write+0x21>
400175: 83 7d e8 00 cmpl $0x0,-0x18(%rbp)
400179: 79 07 jns 400182 <my_write+0x28>
40017b: b8 ff ff ff ff mov $0xffffffff,%eax
400180: eb 24 jmp 4001a6 <my_write+0x4c>
400182: b8 01 00 00 00 mov $0x1,%eax
400187: 8b 7d ec mov -0x14(%rbp),%edi
40018a: 48 8b 75 e0 mov -0x20(%rbp),%rsi
40018e: 8b 55 e8 mov -0x18(%rbp),%edx
400191: 0f 05 syscall
400193: 89 45 fc mov %eax,-0x4(%rbp)
400196: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
40019a: 79 07 jns 4001a3 <my_write+0x49>
40019c: b8 ff ff ff ff mov $0xffffffff,%eax
4001a1: eb 03 jmp 4001a6 <my_write+0x4c>
4001a3: 8b 45 fc mov -0x4(%rbp),%eax
4001a6: 5d pop %rbp
4001a7: c3 retq
00000000004001a8 <my_strlen>:
4001a8: 55 push %rbp
4001a9: 48 89 e5 mov %rsp,%rbp
4001ac: 48 89 7d e8 mov %rdi,-0x18(%rbp)
4001b0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4001b7: 48 83 7d e8 00 cmpq $0x0,-0x18(%rbp)
4001bc: 75 0b jne 4001c9 <my_strlen+0x21>
4001be: b8 ff ff ff ff mov $0xffffffff,%eax
4001c3: eb 1a jmp 4001df <my_strlen+0x37>
4001c5: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4001c9: 48 8b 45 e8 mov -0x18(%rbp),%rax
4001cd: 48 8d 50 01 lea 0x1(%rax),%rdx
4001d1: 48 89 55 e8 mov %rdx,-0x18(%rbp)
4001d5: 0f b6 00 movzbl (%rax),%eax
4001d8: 84 c0 test %al,%al
4001da: 75 e9 jne 4001c5 <my_strlen+0x1d>
4001dc: 8b 45 fc mov -0x4(%rbp),%eax
4001df: 5d pop %rbp
4001e0: c3 retq
00000000004001e1 <_start>:
4001e1: 55 push %rbp
4001e2: 48 89 e5 mov %rsp,%rbp
4001e5: 48 83 ec 10 sub $0x10,%rsp
4001e9: 48 c7 45 f8 1d 02 40 movq $0x40021d,-0x8(%rbp)
4001f0: 00
4001f1: 48 8b 45 f8 mov -0x8(%rbp),%rax
4001f5: 48 89 c7 mov %rax,%rdi
4001f8: e8 ab ff ff ff callq 4001a8 <my_strlen>
4001fd: 89 c2 mov %eax,%edx
4001ff: 48 8b 45 f8 mov -0x8(%rbp),%rax
400203: 48 89 c6 mov %rax,%rsi
400206: bf 01 00 00 00 mov $0x1,%edi
40020b: e8 4a ff ff ff callq 40015a <my_write>
400210: bf 00 00 00 00 mov $0x0,%edi
400215: e8 2a ff ff ff callq 400144 <my_exit>
40021a: 90 nop
40021b: c9 leaveq
40021c: c3 retq
The assembly itself is not really that interesting, except that in my_write and my_exit you can see how the inline assembly generated by the SYSCALL...() macro just loads the variables into specific registers, and does the "do syscall" -- which just happens to be an x86-64 assembly instruction also called syscall here; in 32-bit x86 architecture, it is int $80, and yet something else in other architectures.
There is a final wrinkle, related to the reason why I used the prefix my_ for the functions analog to the functions in the C library: the C compiler can provide optimized shortcuts for some C library functions. For GCC, these are listed here; the list includes strlen().
This means we do not actually need the my_strlen() function, because we can use the optimized __builtin_strlen() function GCC provides, even in freestanding environment. The built-ins are usually very optimized; in the case of __builtin_strlen() on x86-64 using GCC-5.4.0, it optimizes to just a couple of register loads and a repnz scasb %es:(%rdi),%al instruction (which looks long, but actually takes just two bytes).
In other words, the final wrinkle is that there is a third type of function, compiler built-ins, that are provided by the compiler (but otherwise just like the functions provided by the C library) in optimized form, depending on the compiler options and architecture used.
If we were to expand the above example so that we'd open a file and write the Hello, world! into it, and compare low-level unistd.h (open()/write()/close()) and standard I/O stdio.h (fopen()/puts()/fclose()) approaches, we'd find that the major difference is in that the FILE handle used by the standard I/O approach contains a lot of extra stuff (that makes the standard file handles quite versatile, just not useful in such a trivial example), most visible in the buffering approach it has. On the assembly level, we'd still see the same syscalls -- open, write, close -- used.
Even though at first glance the ELF format (used for binaries in Linux) contains a lot of "unneeded stuff" (about a kilobyte for our example program above), it is actually a very powerful format. It, and the dynamic loader in Linux, provides a way to auto-load libraries when a program starts (using LD_PRELOAD environment variable), and to interpose functions in other libraries -- essentially, replace them with new ones, but with a way to still be able to call the original interposed version of the function. There are lots of useful tricks, fixes, experiments, and debugging methods these allow.
Although the distinction between "system call" and "library function" can be a useful one to keep in mind, there's the issue that you have to be able to call system calls somehow. In general, then, every system call is present in the C library -- as a thin little library function that does nothing but make the transfer to the system call (however that's implemented).
So, yes, you can call open() from C code if you want to. (And somewhere, perhaps in a file called fopen.c, the author of your C library probably called it too, within the implementation of fopen().)
The starting point for answering your question is to ask another question: What is a system call?
Generally, one thinks of a system call as a procedure that executes at an elevated processor privilege level. Generally, this means switching from user mode to kernel mode (some systems use multiple modes).
The mechanism for and application to enter kernel mode depends upon the system (and one Intel there are multiple ways). The general sequence for invoking a system service is the process executes an instruction that triggers a change processor mode exception. The CPU responds to the exception by invoking the appropriate exception/interrupt handler then dispatches to the appropriate operating system service.
The problem for C programming is that invoking a system service requires executing a specific hardware instruction and setting hardware register values. Operating systems provide wrapper functions that that handle the packing of parameters into registers, triggering the exception, then unpacking the return values from registers.
The open() function usually be a wrapper for high level languages to invoke system services. If you think about, fopen() is generally a "wrapper" for open().
So what we normally think of as a system call is a function that does nothing other than invoke a system service.

Resources