Will an executable access shared-libraries' global variable via GOT?

Will an executable access shared-libraries' global variable via GOT? - c

I was learning dynamic linking recently and gave it a try:
dynamic.c
int global_variable = 10;
int XOR(int a) {
return global_variable;
}
test.c
#include <stdio.h>
extern int global_variable;
extern int XOR(int);
int main() {
global_variable = 3;
printf("%d\n", XOR(0x10));
}
The compiling commands are:
clang -shared -fPIC -o dynamic.so dynamic.c
clang -o test test.c dynamic.so
I was expecting that in executable test the main function will access global_variable via GOT. However, on the contrary, the global_variable is placed in test's data section and XOR in dynamic.so access the global_variable indirectly.
Could anyone tell me why the compiler didn't ask the test to access global_variable via GOT, but asked the shared object file to do so?

Part of the point of a shared library is that one copy gets loaded into memory, and multiple processes can access that one copy. But every program has its own copy of each of the library's variables. If they were accessed relative to the library's GOT then those would instead be shared among the processes using the library, just like the functions are.
There are other possibilities, but it is clean and consistent for each executable to provide for itself all the variables it needs. That requires the library functions to access all of its variables with static storage duration (not just external ones) indirectly, relative to the program. This is ordinary dynamic linking, just going the opposite direction from what you usually think of.

Turns out my clang produced PIC by default so it messed with results.
I will leave updated answer here, and the original can be read below it.
After digging a bit more into the topic i have noticed that compilation of test.c does not generate a .got section by itself. You can check it by compiling the executable into an object file and omitting the linking step for now (-c option):
clang -c -o test.o test.c
If you inspect the sections of resulting object file with readelf -S you will notice that there is no .got in there:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000035 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000210
0000000000000060 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000075
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000079
0000000000000013 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 0000008c
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.pr[...] NOTE 0000000000000000 00000090
0000000000000030 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000c0
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000270
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f8
00000000000000d8 0000000000000018 12 4 8
[12] .strtab STRTAB 0000000000000000 000001d0
000000000000003e 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000288
0000000000000074 0000000000000000 0 0 1
This means that the entirety of .got section present in the test executable actually comes from dynamic.so, as it is PIC and uses GOT.
Would it be possible to compile dynamic.so as non-PIC as well? Turns out it apparently used to be 10 years ago (the article compiles examples to 32-bits, they dont have to work on 64 bits!). Linked article describes how a non-PIC shared library was relocated at load time - basically, every time an address that needed to be relocated after loading was present in machine code, it was instead set to zeroes and a relocation of a certain type was set in the library. During loading of the library the loader filled the zeros with actual runtime address of data/code that was needed. It is important to note that it cannot be applied in your though as 64-bit shared libraries cannot be made out of non-PIC (Source).
If you compile dynamic.so as a shared 32-bit library instead and do not use the -fPIC option (you usually need special repositories enabled to compile 32-bit code and have 32-bit libc installed):
gcc -m32 dynamic.c -shared -o dynamic.so
You will notice that:
// readelf -s dynamic.so
(... lots of output)
27: 00004010 4 OBJECT GLOBAL DEFAULT 19 global_variable
// readelf -S dynamic.so
(... lots of output)
[17] .got PROGBITS 00003ff0 002ff0 000010 04 WA 0 0 4
[18] .got.plt PROGBITS 00004000 003000 00000c 04 WA 0 0 4
[19] .data PROGBITS 0000400c 00300c 000008 00 WA 0 0 4
[20] .bss NOBITS 00004014 003014 000004 00 WA 0 0 1
global_variable is at offset 0x4010 which is inside .data section. Also, while .got is present (at offset 0x3ff0), it only contains relocations coming from other sources than your code:
// readelf -r
Offset Info Type Sym.Value Sym. Name
00003f28 00000008 R_386_RELATIVE
00003f2c 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003ff0 00000106 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003ff4 00000206 R_386_GLOB_DAT 00000000 __cxa_finalize#GLIBC_2.1.3
00003ff8 00000306 R_386_GLOB_DAT 00000000 __gmon_start__
00003ffc 00000406 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
This article introduces GOT as part of introduction on PIC, and i have found that to be the case in plenty of places, which would suggest that indeed GOT is only used by PIC code although i am not 100% sure of it and i recommend researching the topic more.
What does this mean for you? A section in the first article i linked called "Extra credit #2" contains an explanation for a similar scenario. Although it is 10 years old, uses 32-bit code and the shared library is non-PIC it shares some similarities with your situation and might explain the problem you presented in your question.
Also keep in mind that (although similar) -fPIE and -fPIC are two separate options with slightly different effects and that if your executable during inspection is not loaded at 0x400000 then it probably is compiled as PIE without your knowledge which might also have impact on results. In the end it all boils down to what data is to be shared between processes, what data/code can be loaded at arbitrary address, what has to be loaded at fixed address etc. Hope this helps.
Also two other answers on Stack Overflow which seem relevant to me: here and here. Both the answers and comments.
Original answer:
I tried reproducing your problem with exactly the same code and compilation commands as the ones you provided, but it seems like both main and XOR use the GOT to access the global_variable. I will answer by providing example output of commands that i used to inspect the data flow. If your outputs differ from mine, it means there is some other difference between our environments (i mean a big difference, if only addresses/values are different then its ok). Best way to find that difference is for you to provide commands you originally used as well as their output.
First step is to check what address is accessed whenever a write or read to global_variable happens. For that we can use objdump -D -j .text test command to disassemble the code and look at the main function:
0000000000001150 <main>:
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 48 8b 05 8d 2e 00 00 mov 0x2e8d(%rip),%rax # 3fe8 <global_variable>
115b: c7 00 03 00 00 00 movl $0x3,(%rax)
1161: bf 10 00 00 00 mov $0x10,%edi
1166: e8 d5 fe ff ff call 1040 <XOR#plt>
116b: 89 c6 mov %eax,%esi
116d: 48 8d 3d 90 0e 00 00 lea 0xe90(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1174: b0 00 mov $0x0,%al
1176: e8 b5 fe ff ff call 1030 <printf#plt>
117b: 31 c0 xor %eax,%eax
117d: 5d pop %rbp
117e: c3 ret
117f: 90 nop
Numbers in the first column are not absolute addresses - instead they are offsets relative to the base address at which the executable will be loaded. For the sake of explanation i will refer to them as "offsets".
The assembly at offset 0x115b and 0x1161 comes directly from the line global_variable = 3; in your code. To confirm that, you could compile the program with -g for debug symbols and invoke objdump with -S. This will display source code above corresponding assembly.
We will focus on what these two instructions are doing. First instruction is a mov of 8 bytes from a location in memory to the rax register. The location in memory is given as relative to the current rip value, offset by a constant 0x2e8d. Objdump already calculated the value for us, and it is equal to 0x3fe8. So this will take 8 bytes present in memory at the 0x3fe8 offset and store them in the rax register.
Next instruction is again a mov, the suffix l tells us that data size is 4 bytes this time. It stores a 4 byte integer with value equal to 0x3 in the location pointed to by the current value of rax (not in the rax itself! brackets around a register such as those in (%rax) signify that the location in the instruction is not the register itself, but rather where its contents are pointing to!).
To summarize, we read a pointer to a 4 byte variable from a certain location at offset 0x3fe8 and later store an immediate value of 0x3 at the location specified by said pointer. Now the question is: where does that offset of 0x3fe8 come from?
It actually comes from GOT. To show the contents of the .got section we can use the objdump -s -j .got test command. -s means we want to focus on actual raw contents of the section, without any disassembling. The output in my case is:
test: file format elf64-x86-64
Contents of section .got:
3fd0 00000000 00000000 00000000 00000000 ................
3fe0 00000000 00000000 00000000 00000000 ................
3ff0 00000000 00000000 00000000 00000000 ................
The whole section is obviously set to zero, as GOT is populated with data after loading the program into memory, but what is important is the address range. We can see that .got starts at 0x3fd0 offset and ends at 0x3ff0. This means it also includes the 0x3fe8 offset - which means the location of global_variable is indeed stored in GOT.
Another way of finding this information is to use readelf -S test to show sections of the executable file and scroll down to the .got section:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
(...lots of sections...)
[22] .got PROGBITS 0000000000003fd0 00002fd0
0000000000000030 0000000000000008 WA 0 0 8
Looking at the Address and Size columns, we can see that the section is loaded at offset 0x3fd0 in memory and its size is 0x30 - which corresponds to what objdump displayed. Note that in readelf ouput "Offset" is actually the offset into the file form which the program is loaded - not the offset in memory that we are interested in.
by issuing the same commands on the dynamic.so library we get similar results:
00000000000010f0 <XOR>:
10f0: 55 push %rbp
10f1: 48 89 e5 mov %rsp,%rbp
10f4: 89 7d fc mov %edi,-0x4(%rbp)
10f7: 48 8b 05 ea 2e 00 00 mov 0x2eea(%rip),%rax # 3fe8 <global_variable##Base-0x38>
10fe: 8b 00 mov (%rax),%eax
1100: 5d pop %rbp
1101: c3 ret
So we see that both main and XOR use GOT to find the location of global_variable.
As for the location of global_variable we need to run the program to populate GOT. For that we can use GDB. We can run our program in GDB by invoking it this way:
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:." gdb ./test
LD_LIBRARY_PATH environment variable tells linker where to look for shared objects, so we extend it to include the current directory "." so that it may find dynamic.so.
After the GDB loads our code, we may invoke break main to set up a breakpoint at main and run to run the program. The program execution should pause at the beginning of the main function, giving us a view into our executable after it was fully loaded into memory, with GOT populated.
Running disassemble main in this state will show us the actual absolute offsets into memory:
Dump of assembler code for function main:
0x0000555555555150 <+0>: push %rbp
0x0000555555555151 <+1>: mov %rsp,%rbp
=> 0x0000555555555154 <+4>: mov 0x2e8d(%rip),%rax # 0x555555557fe8
0x000055555555515b <+11>: movl $0x3,(%rax)
0x0000555555555161 <+17>: mov $0x10,%edi
0x0000555555555166 <+22>: call 0x555555555040 <XOR#plt>
0x000055555555516b <+27>: mov %eax,%esi
0x000055555555516d <+29>: lea 0xe90(%rip),%rdi # 0x555555556004
0x0000555555555174 <+36>: mov $0x0,%al
0x0000555555555176 <+38>: call 0x555555555030 <printf#plt>
0x000055555555517b <+43>: xor %eax,%eax
0x000055555555517d <+45>: pop %rbp
0x000055555555517e <+46>: ret
End of assembler dump.
(gdb)
Our 0x3fe8 offset has turned into an absolute address of equal to 0x555555557fe8. We may again check that this location comes from the .got section by issuing maintenance info sections inside GDB, which will list a long list of sections and their memory mappings. For me .got is placed in this address range:
[21] 0x555555557fd0->0x555555558000 at 0x00002fd0: .got ALLOC LOAD DATA HAS_CONTENTS
Which contains 0x555555557fe8.
To finally inspect the address of global_variable itself we may examine the contents of that memory by issuing x/xag 0x555555557fe8. Arguments xag of the x command deal with the size, format and type of data being inspected - for explanation invoke help x in GDB. On my machine the command returns:
0x555555557fe8: 0x7ffff7fc4020 <global_variable>
On your machine it may only display the address and the data, without the "<global_variable>" helper, which probably comes from an extension i have installed called pwndbg. It is ok, because the value at that address is all we need. We now know that the global_variable is located in memory under the address 0x7ffff7fc4020. Now we may issue info proc mappings in GDB to find out what address range does this address belong to. My output is pretty long, but among all the ranges listed there is one of interest to us:
0x7ffff7fc4000 0x7ffff7fc5000 0x1000 0x3000 /home/user/test_got/dynamic.so
The address is inside of that memory area, and GDB tells us that it comes from the dynamic.so library.
In case any of the outputs of said commands are different for you (change in a value is ok - i mean a fundamental difference like addresses not belonging to certain address ranges etc.) please provide more information about what exactly did you do to come to the conclusion that global_variable is stored in the .data section - what commands did you invoke and what outputs they produced.

Related

How Does C Compiled to ASM Know Where to Branch to for External Functions?

How does C compiled to ARM ASM know where to branch to for external functions?
For example, here is a simple C program:
#include <stdio.h>
int main() {
printf("Hello World!");
return 0;
}
and its corresponding ARM ASM program:
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "Hello World!\000"
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.size main, .-main
.ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
.section .note.GNU-stack,"",%progbits
I dont see a "printf" tag anywhere so i am assuming that it links outside of the program. but how does it know where to search? it wouldnt look everywhere, because there might be duplicate tags, but there are also libraries that are placed (in the computers perspective) at random, though i also dont see anywhere where it defines a library location.
so where does it link, for more than just the standard C library?
and how can i compile it to not rely on those external dependencies?
or know where the libraries are so i know which files i can delete?
i am currently operating linux on a raspberry pi 400

My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.
I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c. It produced the file main.s with the following contents:
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello World!"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",#progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
There are a lot of assembler directives here that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it. Here is the output of the disassembly:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # f <main+0xf>
f: b8 00 00 00 00 mov $0x0,%eax
14: e8 00 00 00 00 callq 19 <main+0x19>
19: b8 00 00 00 00 mov $0x0,%eax
1e: 5d pop %rbp
1f: c3 retq
The first three instructions here are boilerplate, so we'll ignore them. The first interesting instruction is
lea 0x0(%rip),%rdi
This is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply be copying %rip into %rdi.
The next instruction puts a 0 into register %eax. I actually don't know why this is, but it's not really relevant to this discussion.
Then comes the actual call to printf:
callq 19 <main+0x19>
Once again, this uses an address that doesn't seem correct. You may notice that address 0x19 actually points to the next instruction.
The next 3 instructions basically perform the final return 0.
To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.
I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by this directive:
.section .rodata
whereas the main function is preceded by
.text
which is shorthand for
.section .text
These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file:
$ readelf -S main.o
There are 14 section headers, starting at offset 0x318:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000020 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000258
0000000000000030 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000060
000000000000000d 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 0000006d
000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 00000098
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.propert NOTE 0000000000000000 00000098
0000000000000020 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000b8
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000288
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f0
0000000000000138 0000000000000018 12 10 8
[12] .strtab STRTAB 0000000000000000 00000228
000000000000002a 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 000002a0
0000000000000074 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
If you can figure out how to read this output, you will see that the .text section is 0x20 bytes in size (which matches the above disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer your question, however, is in the relocation data:
$ readelf -r main.o
Relocation section '.rela.text' at offset 0x258 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000015 000c00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x288 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
This output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file, or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite part of the .text section with the missing addresses.
So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb references the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that that instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the next 4 bytes (currently all 0). The rightmost column tells us, in human readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that it will then subtract from that the final address of byte 0xb in the .text section, which will make that a valid PC-relative pointer to the "Hello World!" string data.
Similarly, the second entry tells the linker to copy the address of printf (minus 4) to offset 0x15 in the .text section, which would be the last 4 bytes of the callq instruction encoding. In this case, the type is R_X86_64_PLT32, which tells us that it's pointing to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.
As a note, that might answer some of your specific questions, your compiler automatically links all the runtime libraries needed to execute a program. This includes any standard library functions, which would be part of libc.so. The only way to run without "external dependencies" would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().

Which Clang/GCC linker flag should be used to produce offsets in code that stay within the binary range?

I'm trying to link my code with an external static library, that has this piece of code in the binary:
0000000000000000 <some_method>:
0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <some_method+0x7>
7: c3 retq
After linking with my code, the linker writes an actual offset instead of the zeroes:
00000000000175c0 <some_method>:
175c0: 48 8d 05 39 aa 20 00 lea 0x20aa39(%rip),%rax # 222000 <some_method.method>
175c7: c3 retq
Offset 222000 is supposed to be in the .data section according to the readelf output, which is supposed to be OK, but the problem is that I need to copy my binary code "as is" into some memory space and make it run from there, without using any OS loaders that know how to relocate different sections of the binary in the process address space. The memory address to which I load my binary can change too, so I can't use static non-relative offsets in my code either.
I want all my RIP-relative offsets in the code to be only within the binary file size range, so for example if my binary is 0x10000 bytes size, and I load it at address 0x200000, I don't want any offsets to go beyond the address 0x210000. Is there a way to tell the linker to do that somehow?

Why does register_tm_clones and deregister_tm_clones reference an address past the .bss section? Where is this memory allocated?

register_tm_clones and deregister_tm_clones are referencing memory addresses past the end of my RW sections. How is this memory tracked?
Example: In the example below deregister_tm_clones references memory address 0x601077, but the last RW section we allocated, .bss starts at 0x601069 and has size 0x7, adding we get 0x601070. So the reference is clearly past whats been allocated for the .bss section and should be in our heap space, but who's managing it.
objdump -d main
...
0000000000400540 <deregister_tm_clones>:
400540: b8 77 10 60 00 mov $0x601077,%eax
400545: 55 push %rbp
400546: 48 2d 70 10 60 00 sub $0x601070,%rax
40054c: 48 83 f8 0e cmp $0xe,%rax
...
readelf -S main
...
[25] .data PROGBITS 0000000000601040 00001040
0000000000000029 0000000000000000 WA 0 0 16
[26] .bss NOBITS 0000000000601069 00001069
0000000000000007 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00001069
0000000000000058 0000000000000001 MS 0 0 1
[28] .shstrtab STRTAB 0000000000000000 000019f2
000000000000010c 0000000000000000 0 0 1
[29] .symtab SYMTAB 0000000000000000 000010c8
00000000000006c0 0000000000000018 30 47 8
[30] .strtab STRTAB 0000000000000000 00001788
000000000000026a 0000000000000000 0 0 1
Note that the references start exactly at the end of the .bss section. When I examine the memory allocated using gdb, I see that there is plenty of space, so it works fine, but I don't see how this memory is managed.
Start Addr End Addr Size Offset objfile
0x400000 0x401000 0x1000 0x0 /home/nobody/main
0x600000 0x601000 0x1000 0x0 /home/nobody/main
0x601000 0x602000 0x1000 0x1000 /home/nobody/main
0x7ffff7a17000 0x7ffff7bd0000 0x1b9000 0x0 /usr/lib64/libc-2.23.so
I can find no other reference to it in any other sections. There is also no space reserved for it in by the segment loaded for .bss:
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000259 0x0000000000000260 RW 200000
Can anyone clarify these functions? Where is the source? I've read all the references on transactional memory, but they cover programming not implementation. I can not find a compiler option to remove this code, except of course -nostdlibs which leaves you with nothing.
Are these part of malloc perhaps? Still for code that's not using malloc, threading, or STM, I'm not sure I agree these should be linked into my code.
See also What functions does gcc add to the linux ELF?
More details:
$ make main
cc -c -o main.o main.c
cc -o main main.o
$ which cc
/usr/bin/cc
$ cc --version
cc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cc --verbose
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto
--prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array
--disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC)

It is just very silly pointer arithmetic code generated by gcc for deregister_tm_clones(). It does not actually access the memory at those addresses.
Summary
No accesses are done at those pointers; they just act as address labels, and GCC is being silly about how it compares the two (relocated) addresses.
The two functions are needed as part of transaction support in C and C++. For further details, see GNU libitm.
Background
I'm running Ubuntu 16.04.3 LTS (Xenial Xerus) on x86-64, with GCC versions 4.8.5, 4.9.4, 5.4.1, 6.3.0, and 7.1.0 installed. The register_tm_clones() and deregister_tm_clones() get compiled in from /usr/lib/gcc/x86-64/VERSION/crtbegin.o. For all versions, register_tm_clones() is okay (no odd addresses). For versions 4.9.4, 5.4.1, and 6.3.0, the code for deregister_tm_clones() is the same, and includes a very odd pointer comparison test. The code for deregister_tm_clones() is fixed in 7.1.0, where it is a straightforward address test.
The sources for the two functions are in libgcc/crtstuff.c in the GCC sources.
On this machine, objdump -t /usr/lib/gcc/ARCH/VERSION/crtbegin.o shows .tm_clone_table, __TMC_LIST__, and __TMC_END__, for all GCC versions I mentioned above, so in the GCC sources, both USE_TM_CLONE_REGISTRY and HAVE_GAS_HIDDEN are defined. Thus, we can describe the two functions in C as
typedef void (*func_ptr) (void);
extern void _ITM_registerTMCloneTable(void *, size_t);
extern void _ITM_deregisterTMCloneTable(void *);
static func_ptr __TMC_LIST__[] = { };
extern func_ptr __TMC_END__[];
void deregister_tm_clones(void)
{
void (*fn)(void);
if (__TMC_LIST__ != __TMC_END__) {
fn = _ITM_deregisterTMCloneTable;
if (fn != NULL)
fn(__TMC_LIST__);
}
}
void register_tm_clones(void)
{
void (*fn)(void);
size_t size;
size = (__TMC_END__ - __TMC_LIST__) / 2;
if (size > 0) {
fn = _ITM_registerTMCloneTable;
if (fn != NULL)
fn(__TMC_LIST__, size);
}
}
Essentially, __TMC_LIST__ is an array of function pointers, and size is the number of function pointer pairs in the array. If the array is not empty, a function called _ITM_registerTMCloneTable() or _ITM_deregisterTMCloneTable(), which are defined in libitm.a, GNU libitm. When the _ITM_registerTMCloneTable/_ITM_deregisterTMCloneTable symbols are not defined, the relocation code yields zero as their address.
So, when the array is empty, and/or _ITM_registerTMCloneTable/_ITMderegisterTMCloneTable symbols are not defined, the code does nothing: only some fancy pointer arithmetic.
Note that the code does not load the pointer values from any memory address. The addresses (of __TMC_LIST__, __TMC_END__, _ITM_registerTMCloneTable, and _ITM_deregisterTMCloneTable) are supplied by the linker/relocator, as immediate 32-bit literals in the code. (This is why, if you look at the disassembly of the object files, you see only zeros for the addresses.)
Investigation
The problematic code for deregister_tm_clones occurs at the very beginning:
004008c0 <deregister_tm_clones>:
4008c0: b8 57 bb 6c 00 mov $0x6cbb57,%eax
4008c5: 55 push %rbp
4008c6: 48 2d 50 bb 6c 00 sub $0x6cbb50,%rax
4008cc: 48 83 f8 0e cmp $0xe,%rax
4008d0: 48 89 e5 mov %rsp,%rbp
4008d3: 76 1b jbe 4008f0 <deregister_tm_clones+0x30>
4008d5: b8 00 00 00 00 mov $0x0,%eax
4008da: 48 85 c0 test %rax,%rax
4008dd: 74 11 je 4008f0 <deregister_tm_clones+0x30>
4008df: 5d pop %rbp
4008e0: bf 50 bb 6c 00 mov $0x6cbb50,%edi
4008e5: ff e0 jmpq *%rax
4008e7: (9-byte NOP)
4008f0: 5d pop %rbp
4008f1: c3 retq
4008f2: (14-byte NOP)
400900:
(This particular example comes from compiling a basic Hello, World! example in C using gcc-6.3.0 on x86-64 statically).
If we look at the section headers (objdump -h) for the same binary, we see that addresses 0x6cbb50 to 0x6cbb5f are actually not mapped to any segment; that
24 .data 00001ad0 00000000006ca080 00000000006ca080 000ca080 2**5
25 .bss 00001878 00000000006cbb60 00000000006cbb60 000cbb50 2**5
i.e. .data covers addresses 0x6ca080 to 0x6cbb4f, and .bss covers
0x6cbb60 to 0x6cd3d8.
It would seem like the assembly code is using invalid addresseses!
However, the 0x6cbb50 address is quite valid, because there is a zero-size hidden symbol at that address (objdump -t):
006cbb50 g O .data 0000000000000000 .hidden __TMC_END__
Because I compiled the binary statically, the __TMC_END__ symbol is part of the .data segment here; normally, it is in .bss. In any case, it does not matter, because __TMC_END__ symbol is of zero size: We can use its address as part of whatever calculations we want, we just cannot dereference it, because it contains no data, having zero size.
This leaves the very first relocated address in the deregister_tm_clones function, 0x0x6cbb57 in this case.
If we look at what the code actually does with that value, it turns out that for some braindead reason, the compiled binary code is essetially computing
long temporary = relocated__TMC_LIST__address + 7;
long difference = temporary - relocated__TMC_END__address;
if (difference <= 14)
return;
Because the comparison function used is a signed comparison, the above behaves exactly the same as
long temporary = relocated__TMC_LIST__address;
long difference = temporary - relocated__TMC_END__address;
if (difference <= 7)
return;
In any case, it is obvious that __TMC_LIST__ == __TMC_END__, and that the relocated addresses are the same, in both OP's binary, and the binary above.
Addendum
I do not know exactly why GCC generates
if ((__TMC_END__ + 7) - __TMC_LIST <= 14)
rather than
if (__TMC_END__ <= __TMC_LIST__)
but in GCC bug 77813 Marc Glisse does mention that it (the former above) is indeed what GCC ends up generating. (The bug itself is not directly related to this, as it is about GCC optimizing the expression to zero, affecting only libitm users, and easily fixed.)
Also, between gcc-6.3.0 and gcc-7.1.0, when the generated code dropped that inanity, the C sources for the functions did not change. What changed is how GCC generates code (in some situations) for such pointer comparisons.

C Standard Library Functions vs. System Calls. Which is `open()`?

I know fopen() is in the C standard library, so that I can definitely call the fopen() function in a C program. What I am confused about is why I can call the open() function as well. open() should be a system call, so it is not a C function in the standard library. As I am successfully able to call the open() function, am I calling a C function or a system call?

EJP's comments to the question and Steve Summit's answer are exactly to the point: open() is both a syscall and a function in the standard C library; fopen() is a function in the standard C library, that sets up a file handle -- a data structure of type FILE that contains additional stuff like optional buffering --, and internally calls open() also.
In the hopes to further understanding, I shall show hello.c, an example Hello world -program written in C for Linux on 64-bit x86 (x86-64 AKA AMD64 architecture), which does not use the standard C library at all.
First, hello.c needs to define some macros with inline assembly for us to be able to call the syscalls. These are very architecture- and operating system dependent, which is why this only works in Linux on x86-64 architecture:
/* Freestanding Hello World example in Linux on x86_64/x86.
* Compile using
* gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
*/
#define STDOUT_FILENO 1
#define EXIT_SUCCESS 0
#ifndef __x86_64__
#error This program only works on x86_64 architecture!
#endif
#define SYS_write 1
#define SYS_exit 60
#define SYSCALL1_NORET(nr, arg1) \
__asm__ ( "syscall\n\t" \
: \
: "a" (nr), "D" (arg1) \
: "rcx", "r11" )
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ ( "syscall\n\t" \
: "=a" (retval) \
: "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) \
: "rcx", "r11" )
The Freestanding in the comment at the beginning of the file refers to "freestanding execution environment"; it is the case when there is no C library available at all. For example, the Linux kernel is written the same way. The normal environment we are familiar with is called "hosted execution environment", by the way.
Next, we can define two functions, or "wrappers", around the syscalls:
static inline void my_exit(int retval)
{
SYSCALL1_NORET(SYS_exit, retval);
}
static inline int my_write(int fd, const void *data, int len)
{
int retval;
if (fd == -1 || !data || len < 0)
return -1;
SYSCALL3(retval, SYS_write, fd, data, len);
if (retval < 0)
return -1;
return retval;
}
Above, my_exit() is roughly equivalent to C standard library exit() function, and my_write() to write().
The C language does not define any kind of a way to do a syscall, so that is why we always need a "wrapper" function of some sort. (The GNU C library does provide a syscall() function for us to do any syscall we wish -- but the point of this example is to not use the C library at all.)
The wrapper functions always involve a bit of (inline) assembly. Again, since C does not have a built-in way to do a syscall, we need to "extend" the language by adding some assembly code. This (inline) assembly, and the syscall numbers, is what makes this example, operating system and architecture dependent. And yes: the GNU C library, for example, contains the equivalent wrappers for quite a few architectures.
Some of the functions in the C library do not use any syscalls. We also need one, the equivalent of strlen():
static inline int my_strlen(const char *str)
{
int len = 0L;
if (!str)
return -1;
while (*str++)
len++;
return len;
}
Note that there is no NULL used anywhere in the above code. It is because it is a macro defined by the C library. Instead, I'm relying on "logical null": (!pointer) is true if and only if pointer is a zero pointer, which is what NULL is on all architectures in Linux. I could have defined NULL myself, but I didn't, in the hopes that somebody might notice the lack of it.
Finally, main() itself is something the GNU C library calls, as in Linux, the actual start point of the binary is called _start. The _start is provided by the hosted runtime environment, and initializes the C library data structures and does other similar preparations. Our example program is so simple we do not need it, so we can just put our simple main program part into _start instead:
void _start(void)
{
const char *msg = "Hello, world!\n";
my_write(STDOUT_FILENO, msg, my_strlen(msg));
my_exit(EXIT_SUCCESS);
}
If you put all of the above together, and compile it using
gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
per the comment at the start of the file, you will end up with a small (about two kilobytes) static binary, that when run,
./hello
outputs
Hello, world!
You can use file hello to examine the contents of the file. You could run strip hello to remove all (unneeded) symbols, reducing the file size further down to about one and a half kilobytes, if file size was really important. (It will make the object dump less interesting, however, so before you do that, check out the next step first.)
We can use objdump -x hello to examine the sections in the file:
hello: file format elf64-x86-64
hello
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004001e1
Program Header:
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000002f0 memsz 0x00000000000002f0 flags r-x
NOTE off 0x0000000000000120 vaddr 0x0000000000400120 paddr 0x0000000000400120 align 2**2
filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--
EH_FRAME off 0x000000000000022c vaddr 0x000000000040022c paddr 0x000000000040022c align 2**2
filesz 0x000000000000002c memsz 0x000000000000002c flags r--
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 0000000000400120 0000000000400120 00000120 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 000000d9 0000000000400144 0000000000400144 00000144 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .rodata 0000000f 000000000040021d 000000000040021d 0000021d 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .eh_frame_hdr 0000002c 000000000040022c 000000000040022c 0000022c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .eh_frame 00000098 0000000000400258 0000000000400258 00000258 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .comment 00000034 0000000000000000 0000000000000000 000002f0 2**0
CONTENTS, READONLY
SYMBOL TABLE:
0000000000400120 l d .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000400144 l d .text 0000000000000000 .text
000000000040021d l d .rodata 0000000000000000 .rodata
000000000040022c l d .eh_frame_hdr 0000000000000000 .eh_frame_hdr
0000000000400258 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l df *ABS* 0000000000000000 hello.c
0000000000400144 l F .text 0000000000000016 my_exit
000000000040015a l F .text 000000000000004e my_write
00000000004001a8 l F .text 0000000000000039 my_strlen
0000000000000000 l df *ABS* 0000000000000000
000000000040022c l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
00000000004001e1 g F .text 000000000000003c _start
0000000000601000 g .eh_frame 0000000000000000 __bss_start
0000000000601000 g .eh_frame 0000000000000000 _edata
0000000000601000 g .eh_frame 0000000000000000 _end
The .text section contains our code, and .rodata immutable constants; here, just the Hello, world! string literal. The rest of the sections are stuff the linker adds and the system uses. We can see that we have f(hex) = 15 bytes of read-only data, and d9(hex) = 217 bytes of code; the rest of the file (about a kilobyte or so) is ELF stuff added by the linker for the kernel to use when executing this binary.
We can even examine the actual assembly code contained in hello, by running objdump -d hello:
hello: file format elf64-x86-64
Disassembly of section .text:
0000000000400144 <my_exit>:
400144: 55 push %rbp
400145: 48 89 e5 mov %rsp,%rbp
400148: 89 7d fc mov %edi,-0x4(%rbp)
40014b: b8 3c 00 00 00 mov $0x3c,%eax
400150: 8b 55 fc mov -0x4(%rbp),%edx
400153: 89 d7 mov %edx,%edi
400155: 0f 05 syscall
400157: 90 nop
400158: 5d pop %rbp
400159: c3 retq
000000000040015a <my_write>:
40015a: 55 push %rbp
40015b: 48 89 e5 mov %rsp,%rbp
40015e: 89 7d ec mov %edi,-0x14(%rbp)
400161: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400165: 89 55 e8 mov %edx,-0x18(%rbp)
400168: 83 7d ec ff cmpl $0xffffffff,-0x14(%rbp)
40016c: 74 0d je 40017b <my_write+0x21>
40016e: 48 83 7d e0 00 cmpq $0x0,-0x20(%rbp)
400173: 74 06 je 40017b <my_write+0x21>
400175: 83 7d e8 00 cmpl $0x0,-0x18(%rbp)
400179: 79 07 jns 400182 <my_write+0x28>
40017b: b8 ff ff ff ff mov $0xffffffff,%eax
400180: eb 24 jmp 4001a6 <my_write+0x4c>
400182: b8 01 00 00 00 mov $0x1,%eax
400187: 8b 7d ec mov -0x14(%rbp),%edi
40018a: 48 8b 75 e0 mov -0x20(%rbp),%rsi
40018e: 8b 55 e8 mov -0x18(%rbp),%edx
400191: 0f 05 syscall
400193: 89 45 fc mov %eax,-0x4(%rbp)
400196: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
40019a: 79 07 jns 4001a3 <my_write+0x49>
40019c: b8 ff ff ff ff mov $0xffffffff,%eax
4001a1: eb 03 jmp 4001a6 <my_write+0x4c>
4001a3: 8b 45 fc mov -0x4(%rbp),%eax
4001a6: 5d pop %rbp
4001a7: c3 retq
00000000004001a8 <my_strlen>:
4001a8: 55 push %rbp
4001a9: 48 89 e5 mov %rsp,%rbp
4001ac: 48 89 7d e8 mov %rdi,-0x18(%rbp)
4001b0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4001b7: 48 83 7d e8 00 cmpq $0x0,-0x18(%rbp)
4001bc: 75 0b jne 4001c9 <my_strlen+0x21>
4001be: b8 ff ff ff ff mov $0xffffffff,%eax
4001c3: eb 1a jmp 4001df <my_strlen+0x37>
4001c5: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4001c9: 48 8b 45 e8 mov -0x18(%rbp),%rax
4001cd: 48 8d 50 01 lea 0x1(%rax),%rdx
4001d1: 48 89 55 e8 mov %rdx,-0x18(%rbp)
4001d5: 0f b6 00 movzbl (%rax),%eax
4001d8: 84 c0 test %al,%al
4001da: 75 e9 jne 4001c5 <my_strlen+0x1d>
4001dc: 8b 45 fc mov -0x4(%rbp),%eax
4001df: 5d pop %rbp
4001e0: c3 retq
00000000004001e1 <_start>:
4001e1: 55 push %rbp
4001e2: 48 89 e5 mov %rsp,%rbp
4001e5: 48 83 ec 10 sub $0x10,%rsp
4001e9: 48 c7 45 f8 1d 02 40 movq $0x40021d,-0x8(%rbp)
4001f0: 00
4001f1: 48 8b 45 f8 mov -0x8(%rbp),%rax
4001f5: 48 89 c7 mov %rax,%rdi
4001f8: e8 ab ff ff ff callq 4001a8 <my_strlen>
4001fd: 89 c2 mov %eax,%edx
4001ff: 48 8b 45 f8 mov -0x8(%rbp),%rax
400203: 48 89 c6 mov %rax,%rsi
400206: bf 01 00 00 00 mov $0x1,%edi
40020b: e8 4a ff ff ff callq 40015a <my_write>
400210: bf 00 00 00 00 mov $0x0,%edi
400215: e8 2a ff ff ff callq 400144 <my_exit>
40021a: 90 nop
40021b: c9 leaveq
40021c: c3 retq
The assembly itself is not really that interesting, except that in my_write and my_exit you can see how the inline assembly generated by the SYSCALL...() macro just loads the variables into specific registers, and does the "do syscall" -- which just happens to be an x86-64 assembly instruction also called syscall here; in 32-bit x86 architecture, it is int $80, and yet something else in other architectures.
There is a final wrinkle, related to the reason why I used the prefix my_ for the functions analog to the functions in the C library: the C compiler can provide optimized shortcuts for some C library functions. For GCC, these are listed here; the list includes strlen().
This means we do not actually need the my_strlen() function, because we can use the optimized __builtin_strlen() function GCC provides, even in freestanding environment. The built-ins are usually very optimized; in the case of __builtin_strlen() on x86-64 using GCC-5.4.0, it optimizes to just a couple of register loads and a repnz scasb %es:(%rdi),%al instruction (which looks long, but actually takes just two bytes).
In other words, the final wrinkle is that there is a third type of function, compiler built-ins, that are provided by the compiler (but otherwise just like the functions provided by the C library) in optimized form, depending on the compiler options and architecture used.
If we were to expand the above example so that we'd open a file and write the Hello, world! into it, and compare low-level unistd.h (open()/write()/close()) and standard I/O stdio.h (fopen()/puts()/fclose()) approaches, we'd find that the major difference is in that the FILE handle used by the standard I/O approach contains a lot of extra stuff (that makes the standard file handles quite versatile, just not useful in such a trivial example), most visible in the buffering approach it has. On the assembly level, we'd still see the same syscalls -- open, write, close -- used.
Even though at first glance the ELF format (used for binaries in Linux) contains a lot of "unneeded stuff" (about a kilobyte for our example program above), it is actually a very powerful format. It, and the dynamic loader in Linux, provides a way to auto-load libraries when a program starts (using LD_PRELOAD environment variable), and to interpose functions in other libraries -- essentially, replace them with new ones, but with a way to still be able to call the original interposed version of the function. There are lots of useful tricks, fixes, experiments, and debugging methods these allow.

Although the distinction between "system call" and "library function" can be a useful one to keep in mind, there's the issue that you have to be able to call system calls somehow. In general, then, every system call is present in the C library -- as a thin little library function that does nothing but make the transfer to the system call (however that's implemented).
So, yes, you can call open() from C code if you want to. (And somewhere, perhaps in a file called fopen.c, the author of your C library probably called it too, within the implementation of fopen().)

The starting point for answering your question is to ask another question: What is a system call?
Generally, one thinks of a system call as a procedure that executes at an elevated processor privilege level. Generally, this means switching from user mode to kernel mode (some systems use multiple modes).
The mechanism for and application to enter kernel mode depends upon the system (and one Intel there are multiple ways). The general sequence for invoking a system service is the process executes an instruction that triggers a change processor mode exception. The CPU responds to the exception by invoking the appropriate exception/interrupt handler then dispatches to the appropriate operating system service.
The problem for C programming is that invoking a system service requires executing a specific hardware instruction and setting hardware register values. Operating systems provide wrapper functions that that handle the packing of parameters into registers, triggering the exception, then unpacking the return values from registers.
The open() function usually be a wrapper for high level languages to invoke system services. If you think about, fopen() is generally a "wrapper" for open().
So what we normally think of as a system call is a function that does nothing other than invoke a system service.

gcc: segmentation fault when compiling with nostdlib

I'm experimenting with entry points and got a segfault.
prog.c:
int main() {
return 0;
}
compiling and linking with:
gcc -Wall prog.c -nostdlib -c -o prog.o
ld prog.o -e main -o prog.out
objdump:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000000b 00000000004000b0 00000000004000b0 000000b0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .eh_frame 00000038 00000000004000c0 00000000004000c0 000000c0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 0000001c 0000000000000000 0000000000000000 000000f8 2**0
CONTENTS, READONLY
Disassembly of section .text:
00000000004000b0 <main>:
4000b0: 55 push %rbp
4000b1: 48 89 e5 mov %rsp,%rbp
4000b4: b8 00 00 00 00 mov $0x0,%eax
4000b9: 5d pop %rbp
4000ba: c3 retq
Valgrind output:
Access not within mapped region at address 0x0

retq takes the return address from the top of the stack and executes from there... problem is that according to the way Linux executes binaries, the number of arguments is on the stack and execution is transferred to address 0x1 (if no arguments are given)
set some dummy arguments with gdb (set args x y z)
you can compile and link with debug info (-g) and then use gdb
set a breakpoint on the retq instruction (br *0x4000ba) and run the program
execute the final instruction and observe the SIGSEGV address correspond to the number of arguments + 1
the program should exit with a system call, not retq
see
http://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux/
for some useful background info

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight