Loading a dynamic library at run-time yields inconsistent and unexpected results, missing symbols and empty PLT entries. Why?

I've been fighting with this problem for quite some time, and I've been unable to find a solution or even an explanation for it. So sorry if the question is long, but bear with me as I just want to make it 100% clear in the hopes that someone more experienced than me will be able to figure it out.
I'm keeping the C syntax highlight on for all snippets because it makes them a little bit clearer even if not really correct.
What I want to do
I have a C program which uses some functions from a dynamic library (libzip). Here it is boiled down to a minimal reproducible example (it basically does nothing, but it works just fine):
#include <zip.h>
int main(void) {
int err;
zip_t *myzip;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Normally, to compile it, I would simply do:
gcc -c prog.c
gcc -o prog prog.o -lzip
This creates, as expected, an ELF which requires libzip to run:
$ ldd prog
linux-vdso.so.1 (0x00007ffdafb53000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f81eedc7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f81ef780000)
libzip.so.4 => /usr/lib/x86_64-linux-gnu/libzip.so.4 (0x00007f81ef166000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f81eebad000)
(libz is just a dependency of libzip)
What I really want to do though, is to load the library myself using dlopen(). Pretty simple task, no? Well yes, or at least I thought.
To achieve this, I should just need to call dlopen and let the loader do its job:
#include <zip.h>
#include <dlfcn.h>
int main(void) {
void *lib;
int err;
zip_t *myzip;
lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
if (lib == NULL)
return 1;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Of course, since I want to manually load the library myself, I will not link it this time:
# Create prog.o
gcc -c prog.c
# Do a dry-run just to make sure all symbols are resolved
gcc -o /dev/null prog.o -ldl -lzip
# Now recompile only with libdl
gcc -o prog prog.o -ldl -Wl,--unresolved-symbols=ignore-in-object-files
The flag --unresolved-symbols=ignore-in-object-files tells ld to not worry about my prog.o having unresolved symbols at link time (I want to take care of that myself at runtime).
The problem
The above Should Just Work™, and indeed it does seem to... but I have two machines, and being the pedantic nerd I am I just thought "well, better make sure and compile it on both of them".
First machine
x86-64, Linux 4.9, Debian 9, gcc 6.3.0, ld 2.28. Here everything works as expected.
I can clearly see that the symbols are there:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
===> 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_close
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
===> 6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_open
7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
9: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
10: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 25 _edata
11: 0000000000201048 0 NOTYPE GLOBAL DEFAULT 26 _end
12: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
13: 00000000000006a0 0 FUNC GLOBAL DEFAULT 11 _init
14: 0000000000000924 0 FUNC GLOBAL DEFAULT 15 _fini
The PLT entries are also there as expected and look fine:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
00000000000006c0 <.plt>:
6c0: ff 35 42 09 20 00 push QWORD PTR [rip+0x200942] # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
6c6: ff 25 44 09 20 00 jmp QWORD PTR [rip+0x200944] # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
6cc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000006d0 <zip_close@plt>:
6d0: ff 25 42 09 20 00 jmp QWORD PTR [rip+0x200942] # 201018 <zip_close>
6d6: 68 00 00 00 00 push 0x0
6db: e9 e0 ff ff ff jmp 6c0 <.plt>
00000000000006e0 <dlopen@plt>:
6e0: ff 25 3a 09 20 00 jmp QWORD PTR [rip+0x20093a] # 201020 <dlopen@GLIBC_2.2.5>
6e6: 68 01 00 00 00 push 0x1
6eb: e9 d0 ff ff ff jmp 6c0 <.plt>
00000000000006f0 <zip_open@plt>:
6f0: ff 25 32 09 20 00 jmp QWORD PTR [rip+0x200932] # 201028 <zip_open>
6f6: 68 02 00 00 00 push 0x2
6fb: e9 c0 ff ff ff jmp 6c0 <.plt>
And the program runs without any problem:
$ ./prog
$ echo $?
0
Even looking inside it with a debugger I can clearly see the symbols getting correctly resolved like any normal dynamic symbol:
0x55555555479b <main+43> lea rax, [rbp - 0x14]
0x55555555479f <main+47> mov rdx, rax
0x5555555547a2 <main+50> mov esi, 9
0x5555555547a7 <main+55> lea rdi, [rip + 0xc0] <0x7ffff7ffd948>
0x5555555547ae <main+62> call zip_open@plt <0x555555554620>
|
v ### PLT entry:
0x555555554620 <zip_open@plt> jmp qword ptr [rip + 0x200a02] <0x555555755028>
|
v
0x555555554626 <zip_open@plt+6> push 2
0x55555555462b <zip_open@plt+11> jmp 0x5555555545f0
|
v ### PLT stub:
0x5555555545f0 push qword ptr [rip + 0x200a12] <0x555555755008>
0x5555555545f6 jmp qword ptr [rip + 0x200a14] <0x7ffff7def0d0>
|
v ### Symbol gets correctly resolved
0x7ffff7def0d0 <_dl_runtime_resolve_fxsave> push rbx
0x7ffff7def0d1 <_dl_runtime_resolve_fxsave+1> mov rbx, rsp
0x7ffff7def0d4 <_dl_runtime_resolve_fxsave+4> and rsp, 0xfffffffffffffff0
0x7ffff7def0d8 <_dl_runtime_resolve_fxsave+8> sub rsp, 0x240
Second machine
x86-64, Linux 4.15, Ubuntu 18.04, gcc 7.4, ld 2.30. Here, something really strange is going on.
Compilation doesn't yield any warning or error, but I do not see the symbols:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
The PLT entries are there, but they are filled with zeroes, and aren't even recognized by objdump:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
0000000000000560 <.plt>:
560: ff 35 4a 0a 20 00 push QWORD PTR [rip+0x200a4a] # 200fb0 <_GLOBAL_OFFSET_TABLE_+0x8>
566: ff 25 4c 0a 20 00 jmp QWORD PTR [rip+0x200a4c] # 200fb8 <_GLOBAL_OFFSET_TABLE_+0x10>
56c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled with 0x0
# zip_close@plt should be here instead...
0000000000000580 <dlopen@plt>:
580: ff 25 42 0a 20 00 jmp QWORD PTR [rip+0x200a42] # 200fc8 <dlopen@GLIBC_2.2.5>
586: 68 00 00 00 00 push 0x0
58b: e9 d0 ff ff ff jmp 560 <.plt>
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled with 0x0
# zip_open@plt should be here instead...
When the program is run, dlopen() works fine and loads libzip into memory, but then when zip_open() gets called, it just generates a segmentation fault:
$ ./prog
Segmentation fault (core dumped)
Taking a look with a debugger, the issue is even more obvious (in case it wasn't already obvious enough). The PLT entries filled with zeroes just end up decoding to a bunch of add instructions dereferencing rax, which contains an invalid address and makes the program segfault and die:
0x5555555546e5 <main+43> lea rax, [rbp - 0x14]
0x5555555546e9 <main+47> mov rdx, rax
0x5555555546ec <main+50> mov esi, 9
0x5555555546f1 <main+55> lea rdi, [rip + 0xc6]
0x5555555546f8 <main+62> call dlopen@plt+16 <0x555555554590>
|
v ### Broken PLT entry (all 0x0, will cause a segfault):
0x555555554590 <dlopen@plt+16> add byte ptr [rax], al
0x555555554592 <dlopen@plt+18> add byte ptr [rax], al
0x555555554594 <dlopen@plt+20> add byte ptr [rax], al
0x555555554596 <dlopen@plt+22> add byte ptr [rax], al
0x555555554598 <dlopen@plt+24> add byte ptr [rax], al
0x55555555459a <dlopen@plt+26> add byte ptr [rax], al
0x55555555459c <dlopen@plt+28> add byte ptr [rax], al
0x55555555459e <dlopen@plt+30> add byte ptr [rax], al
### Next PLT entry...
0x5555555545a0 <__cxa_finalize@plt> jmp qword ptr [rip + 0x200a52] <0x7ffff7823520>
|
v
0x7ffff7823520 <__cxa_finalize> push r15
0x7ffff7823522 <__cxa_finalize+2> push r14
Questions
So, first of all... why is this happening?
I thought that this was supposed to work. Isn't it? If not, why not? And why does it fail on only one of the two machines?
But most importantly: how can I fix this?
For question 3 I want to emphasize that the whole point of this is that I want to load the library myself, without linking it, so please refrain from just commenting that this is bad practice, or whatever else.

The above Should Just Work™, and indeed it does seem to...
No, it should not, and if it appears to, that's more of an accident. In general, using --unresolved-symbols=... is a really bad idea™, and will almost never do what you want.
The solution is trivial: you just need to look up zip_open and zip_close, like so:
#include <zip.h>
#include <dlfcn.h>
int main(void) {
void *lib;
/* Pointers to functions, matching the libzip prototypes. */
zip_t *(*p_open)(const char *, int, int *);
int (*p_close)(zip_t *);
int err;
zip_t *myzip;
lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
if (lib == NULL)
return 1;
p_open = (zip_t *(*)(const char *, int, int *))dlsym(lib, "zip_open");
if (p_open == NULL)
return 1;
p_close = (int (*)(zip_t *))dlsym(lib, "zip_close");
if (p_close == NULL)
return 1;
myzip = p_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
p_close(myzip);
return 0;
}

To add to EmployedRussian's answer, you can achieve what you need with the help of the Implib.so tool. It generates stubs for all library symbols (e.g. zip_open) that call dlopen/dlsym internally and forward calls from your program to the shared library:
$ gcc -c prog.c
$ implib-gen.py path/to/libzip.so
$ gcc -o prog prog.o libzip.tramp.S libzip.init.c -ldl
(note that you no longer need fancy linker flags and linker dry runs).
As a side note, what you are trying to do is called delayed loading and is a standard feature of Windows DLLs.

Related

Understanding RIP relative relocations

I am trying to understand the relocations for the object file generated by this simple program:
int answer = 42;
int compute()
{
return answer;
}
int main()
{
return compute();
}
I compile it with simply gcc -c main.cpp.
Then, examining objdump -DCrw -M intel main.o, we see among other things:
Disassembly of section .text:
0000000000000000 <compute()>:
0: f3 0f 1e fa endbr64
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0] # e <compute()+0xe> a: R_X86_64_PC32 answer-0x4
e: 5d pop rbp
f: c3 ret
(...)
Disassembly of section .data:
0000000000000000 <answer>:
0: 2a 00 sub al,BYTE PTR [rax]
...
We can also look at the relocations as reported by readelf -rW main.o:
Relocation section '.rela.text' at offset 0x250 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
(...)
Let's look at the relocation for the use of answer in compute():
To return answer from the function, we have this instruction to put the value of answer into eax:
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0]
I.e. "copy whatever is at rip + an offset we don't know yet so let's just put 0s for it for now into eax".
The relocation for that instruction is
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
The offset value 0xa means that there's a relocation to be done at 0xa, or the third byte of the mov instruction above which starts on 0x8. This is the placeholder 00 00 00 00 we saw above. The relocation type is R_X86_64_PC32, which according to the System V Application Binary Interface means "S+A-P", where:
S = the value of the symbol whose index resides in the relocation entry
A = the addend used to compute the value of the relocatable field
P = the place (section offset or address) of the storage unit being relocated (computed using r_offset).
The value of the symbol (the value at 0xa) is 0. A is the addend -4 on that line.
I am not certain what P is, is it the address of the .data section?
And where does the -4 come from?
This calculation seems to end up as "4 bytes less the address of the .data section". How does that represent the distance from RIP (0xe) to wherever answer ends up in the final executable?
If I look at the final executable though (gcc -o main main.o, objdump -DCrw -M intel main), I see our mov instruction has been relocated:
1131: 8b 05 d9 2e 00 00 mov eax,DWORD PTR [rip+0x2ed9]
The 00 00 00 00 placeholder has been replaced with d9 2e 00 00. rip is now at 0x1137, so rip+0x2ed9 is 0x4010. And sure enough, at 0x4010 we find the answer:
Disassembly of section .data:
(...)
0000000000004010 <answer>:
4010: 2a 00 sub al,BYTE PTR [rax]
So everything works out fine in the end, I just don't understand exactly how it happens.

Cannot call assembly function from C kernel

I am writing a 32-bit C kernel and I tried to call a function that I wrote in assembly.
So I wrote an assembly file containing the function myfunc and then I wrote the kernel which defined myfunc as a global variable and I linked them together so I can use it in my C code.
It worked fine, but when I tried to call it, it caused a triple fault.
However, if I use inline assembly instead,
asm("call myfunc");
It does the job.
Also notice that the disassembly for asm("call myfunc"); and for myfunc(); aren't similar:
The disassembly when using myfunc():
00000000 <kmain>:
0: f3 0f 1e fb endbr32
4: 53 push ebx
5: 83 ec 08 sub esp,0x8
8: e8 fc ff ff ff call 9 <kmain+0x9>
d: 81 c3 02 00 00 00 add ebx,0x2
13: e8 fc ff ff ff call 14 <kmain+0x14>
18: f4 hlt
19: 83 c4 08 add esp,0x8
1c: 5b pop ebx
1d: c3 ret
The disassembly when using asm("call myfunc");:
00000000 <kmain>:
0: f3 0f 1e fb endbr32
4: e8 fc ff ff ff call 5 <kmain+0x5>
9: f4 hlt
a: c3 ret
Here's how I build it:
nasm -f elf32 kernel/klink.asm -o klink.o
gcc -c kernel/main.c -m32 -nostdlib -nodefaultlibs -O1 -fno-builtin
ld -m elf_i386 -T link.ld -o KERNEL klink.o main.o
objcopy --dump-section .text=KERNEL.SYS KERNEL
link.ld:
OUTPUT_FORMAT(elf32-i386)
ENTRY(kmain)
SECTIONS
{
. = 0x100000;
.text : { *(.text) }
}
The kernel (main.c):
void kmain() {
extern void myfunc();
myfunc(); //Here it causes a triple fault.
//asm("call myfunc"); //But this works fine.
asm("hlt");
}
And this is the assembly file that contains the function (klink.asm):
[BITS 32]
SECTION .text
GLOBAL start ;define start as a global variable
GLOBAL myfunc ;also myfunc
jmp start ;jump straight into start
myfunc:
;idk what to write here... just some code :)
EXTERN kmain ;define kmain as an external variable
start:
jmp kmain ;jump into kmain

C Standard Library Functions vs. System Calls. Which is `open()`?

I know fopen() is in the C standard library, so that I can definitely call the fopen() function in a C program. What I am confused about is why I can call the open() function as well. open() should be a system call, so it is not a C function in the standard library. As I am successfully able to call the open() function, am I calling a C function or a system call?

EJP's comments to the question and Steve Summit's answer are exactly to the point: open() is both a syscall and a function in the C library (it is specified by POSIX rather than by ISO C); fopen() is a function in the standard C library that sets up a file handle, a data structure of type FILE that contains additional stuff like optional buffering, and internally calls open().
In the hope of furthering understanding, I shall show hello.c, an example Hello world program written in C for Linux on 64-bit x86 (the x86-64, AKA AMD64, architecture), which does not use the standard C library at all.
First, hello.c needs to define some macros with inline assembly for us to be able to call the syscalls. These are very architecture- and operating system dependent, which is why this only works in Linux on x86-64 architecture:
/* Freestanding Hello World example in Linux on x86_64/x86.
* Compile using
* gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
*/
#define STDOUT_FILENO 1
#define EXIT_SUCCESS 0
#ifndef __x86_64__
#error This program only works on x86_64 architecture!
#endif
#define SYS_write 1
#define SYS_exit 60
#define SYSCALL1_NORET(nr, arg1) \
__asm__ __volatile__ ( "syscall\n\t" \
: \
: "a" (nr), "D" (arg1) \
: "rcx", "r11", "memory" )
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ __volatile__ ( "syscall\n\t" \
: "=a" (retval) \
: "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) \
: "rcx", "r11", "memory" )
The Freestanding in the comment at the beginning of the file refers to "freestanding execution environment"; it is the case when there is no C library available at all. For example, the Linux kernel is written the same way. The normal environment we are familiar with is called "hosted execution environment", by the way.
Next, we can define two functions, or "wrappers", around the syscalls:
static inline void my_exit(int retval)
{
SYSCALL1_NORET(SYS_exit, retval);
}
static inline int my_write(int fd, const void *data, int len)
{
int retval;
if (fd == -1 || !data || len < 0)
return -1;
SYSCALL3(retval, SYS_write, fd, data, len);
if (retval < 0)
return -1;
return retval;
}
Above, my_exit() is roughly equivalent to C standard library exit() function, and my_write() to write().
The C language does not define any kind of a way to do a syscall, so that is why we always need a "wrapper" function of some sort. (The GNU C library does provide a syscall() function for us to do any syscall we wish -- but the point of this example is to not use the C library at all.)
The wrapper functions always involve a bit of (inline) assembly. Again, since C does not have a built-in way to do a syscall, we need to "extend" the language by adding some assembly code. This (inline) assembly, together with the syscall numbers, is what makes this example operating system and architecture dependent. And yes: the GNU C library, for example, contains the equivalent wrappers for quite a few architectures.
Some of the functions in the C library do not use any syscalls. We also need one, the equivalent of strlen():
static inline int my_strlen(const char *str)
{
int len = 0;
if (!str)
return -1;
while (*str++)
len++;
return len;
}
Note that there is no NULL used anywhere in the above code. It is because it is a macro defined by the C library. Instead, I'm relying on "logical null": (!pointer) is true if and only if pointer is a zero pointer, which is what NULL is on all architectures in Linux. I could have defined NULL myself, but I didn't, in the hopes that somebody might notice the lack of it.
Finally, main() itself is something the GNU C library calls, as in Linux, the actual start point of the binary is called _start. The _start is provided by the hosted runtime environment, and initializes the C library data structures and does other similar preparations. Our example program is so simple we do not need it, so we can just put our simple main program part into _start instead:
void _start(void)
{
const char *msg = "Hello, world!\n";
my_write(STDOUT_FILENO, msg, my_strlen(msg));
my_exit(EXIT_SUCCESS);
}
If you put all of the above together, and compile it using
gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
per the comment at the start of the file, you will end up with a small (about two kilobytes) static binary, that when run,
./hello
outputs
Hello, world!
You can use file hello to examine the contents of the file. You could run strip hello to remove all (unneeded) symbols, reducing the file size further down to about one and a half kilobytes, if file size was really important. (It will make the object dump less interesting, however, so before you do that, check out the next step first.)
We can use objdump -x hello to examine the sections in the file:
hello: file format elf64-x86-64
hello
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004001e1
Program Header:
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000002f0 memsz 0x00000000000002f0 flags r-x
NOTE off 0x0000000000000120 vaddr 0x0000000000400120 paddr 0x0000000000400120 align 2**2
filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--
EH_FRAME off 0x000000000000022c vaddr 0x000000000040022c paddr 0x000000000040022c align 2**2
filesz 0x000000000000002c memsz 0x000000000000002c flags r--
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 0000000000400120 0000000000400120 00000120 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 000000d9 0000000000400144 0000000000400144 00000144 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .rodata 0000000f 000000000040021d 000000000040021d 0000021d 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .eh_frame_hdr 0000002c 000000000040022c 000000000040022c 0000022c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .eh_frame 00000098 0000000000400258 0000000000400258 00000258 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .comment 00000034 0000000000000000 0000000000000000 000002f0 2**0
CONTENTS, READONLY
SYMBOL TABLE:
0000000000400120 l d .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000400144 l d .text 0000000000000000 .text
000000000040021d l d .rodata 0000000000000000 .rodata
000000000040022c l d .eh_frame_hdr 0000000000000000 .eh_frame_hdr
0000000000400258 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l df *ABS* 0000000000000000 hello.c
0000000000400144 l F .text 0000000000000016 my_exit
000000000040015a l F .text 000000000000004e my_write
00000000004001a8 l F .text 0000000000000039 my_strlen
0000000000000000 l df *ABS* 0000000000000000
000000000040022c l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
00000000004001e1 g F .text 000000000000003c _start
0000000000601000 g .eh_frame 0000000000000000 __bss_start
0000000000601000 g .eh_frame 0000000000000000 _edata
0000000000601000 g .eh_frame 0000000000000000 _end
The .text section contains our code, and .rodata immutable constants; here, just the Hello, world! string literal. The rest of the sections are stuff the linker adds and the system uses. We can see that we have f(hex) = 15 bytes of read-only data, and d9(hex) = 217 bytes of code; the rest of the file (about a kilobyte or so) is ELF stuff added by the linker for the kernel to use when executing this binary.
We can even examine the actual assembly code contained in hello, by running objdump -d hello:
hello: file format elf64-x86-64
Disassembly of section .text:
0000000000400144 <my_exit>:
400144: 55 push %rbp
400145: 48 89 e5 mov %rsp,%rbp
400148: 89 7d fc mov %edi,-0x4(%rbp)
40014b: b8 3c 00 00 00 mov $0x3c,%eax
400150: 8b 55 fc mov -0x4(%rbp),%edx
400153: 89 d7 mov %edx,%edi
400155: 0f 05 syscall
400157: 90 nop
400158: 5d pop %rbp
400159: c3 retq
000000000040015a <my_write>:
40015a: 55 push %rbp
40015b: 48 89 e5 mov %rsp,%rbp
40015e: 89 7d ec mov %edi,-0x14(%rbp)
400161: 48 89 75 e0 mov %rsi,-0x20(%rbp)
400165: 89 55 e8 mov %edx,-0x18(%rbp)
400168: 83 7d ec ff cmpl $0xffffffff,-0x14(%rbp)
40016c: 74 0d je 40017b <my_write+0x21>
40016e: 48 83 7d e0 00 cmpq $0x0,-0x20(%rbp)
400173: 74 06 je 40017b <my_write+0x21>
400175: 83 7d e8 00 cmpl $0x0,-0x18(%rbp)
400179: 79 07 jns 400182 <my_write+0x28>
40017b: b8 ff ff ff ff mov $0xffffffff,%eax
400180: eb 24 jmp 4001a6 <my_write+0x4c>
400182: b8 01 00 00 00 mov $0x1,%eax
400187: 8b 7d ec mov -0x14(%rbp),%edi
40018a: 48 8b 75 e0 mov -0x20(%rbp),%rsi
40018e: 8b 55 e8 mov -0x18(%rbp),%edx
400191: 0f 05 syscall
400193: 89 45 fc mov %eax,-0x4(%rbp)
400196: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
40019a: 79 07 jns 4001a3 <my_write+0x49>
40019c: b8 ff ff ff ff mov $0xffffffff,%eax
4001a1: eb 03 jmp 4001a6 <my_write+0x4c>
4001a3: 8b 45 fc mov -0x4(%rbp),%eax
4001a6: 5d pop %rbp
4001a7: c3 retq
00000000004001a8 <my_strlen>:
4001a8: 55 push %rbp
4001a9: 48 89 e5 mov %rsp,%rbp
4001ac: 48 89 7d e8 mov %rdi,-0x18(%rbp)
4001b0: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
4001b7: 48 83 7d e8 00 cmpq $0x0,-0x18(%rbp)
4001bc: 75 0b jne 4001c9 <my_strlen+0x21>
4001be: b8 ff ff ff ff mov $0xffffffff,%eax
4001c3: eb 1a jmp 4001df <my_strlen+0x37>
4001c5: 83 45 fc 01 addl $0x1,-0x4(%rbp)
4001c9: 48 8b 45 e8 mov -0x18(%rbp),%rax
4001cd: 48 8d 50 01 lea 0x1(%rax),%rdx
4001d1: 48 89 55 e8 mov %rdx,-0x18(%rbp)
4001d5: 0f b6 00 movzbl (%rax),%eax
4001d8: 84 c0 test %al,%al
4001da: 75 e9 jne 4001c5 <my_strlen+0x1d>
4001dc: 8b 45 fc mov -0x4(%rbp),%eax
4001df: 5d pop %rbp
4001e0: c3 retq
00000000004001e1 <_start>:
4001e1: 55 push %rbp
4001e2: 48 89 e5 mov %rsp,%rbp
4001e5: 48 83 ec 10 sub $0x10,%rsp
4001e9: 48 c7 45 f8 1d 02 40 movq $0x40021d,-0x8(%rbp)
4001f0: 00
4001f1: 48 8b 45 f8 mov -0x8(%rbp),%rax
4001f5: 48 89 c7 mov %rax,%rdi
4001f8: e8 ab ff ff ff callq 4001a8 <my_strlen>
4001fd: 89 c2 mov %eax,%edx
4001ff: 48 8b 45 f8 mov -0x8(%rbp),%rax
400203: 48 89 c6 mov %rax,%rsi
400206: bf 01 00 00 00 mov $0x1,%edi
40020b: e8 4a ff ff ff callq 40015a <my_write>
400210: bf 00 00 00 00 mov $0x0,%edi
400215: e8 2a ff ff ff callq 400144 <my_exit>
40021a: 90 nop
40021b: c9 leaveq
40021c: c3 retq
The assembly itself is not really that interesting, except that in my_write and my_exit you can see how the inline assembly generated by the SYSCALL...() macro just loads the variables into specific registers, and does the "do syscall" -- which just happens to be an x86-64 assembly instruction also called syscall here; in 32-bit x86 architecture, it is int $0x80, and yet something else in other architectures.
There is a final wrinkle, related to the reason why I used the prefix my_ for the functions analogous to the functions in the C library: the C compiler can provide optimized shortcuts for some C library functions. For GCC, these are listed in the GCC manual under "Other Built-in Functions Provided by GCC"; the list includes strlen().
This means we do not actually need the my_strlen() function, because we can use the optimized __builtin_strlen() function GCC provides, even in freestanding environment. The built-ins are usually very optimized; in the case of __builtin_strlen() on x86-64 using GCC-5.4.0, it optimizes to just a couple of register loads and a repnz scasb %es:(%rdi),%al instruction (which looks long, but actually takes just two bytes).
In other words, the final wrinkle is that there is a third type of function, compiler built-ins, that are provided by the compiler (but otherwise just like the functions provided by the C library) in optimized form, depending on the compiler options and architecture used.
If we were to expand the above example so that we'd open a file and write the Hello, world! into it, and compare low-level unistd.h (open()/write()/close()) and standard I/O stdio.h (fopen()/fputs()/fclose()) approaches, we'd find that the major difference is that the FILE handle used by the standard I/O approach contains a lot of extra stuff (that makes the standard file handles quite versatile, just not useful in such a trivial example), most visible in the buffering approach it has. On the assembly level, we'd still see the same syscalls -- open, write, close -- used.
Even though at first glance the ELF format (used for binaries in Linux) contains a lot of "unneeded stuff" (about a kilobyte for our example program above), it is actually a very powerful format. Together with the dynamic loader in Linux, it provides a way to auto-load libraries when a program starts (using the LD_PRELOAD environment variable), and to interpose functions in other libraries -- essentially, replace them with new ones, but with a way to still be able to call the original interposed version of the function. There are lots of useful tricks, fixes, experiments, and debugging methods these allow.

Although the distinction between "system call" and "library function" can be a useful one to keep in mind, there's the issue that you have to be able to call system calls somehow. In general, then, every system call is present in the C library -- as a thin little library function that does nothing but make the transfer to the system call (however that's implemented).
So, yes, you can call open() from C code if you want to. (And somewhere, perhaps in a file called fopen.c, the author of your C library probably called it too, within the implementation of fopen().)

The starting point for answering your question is to ask another question: What is a system call?
Generally, one thinks of a system call as a procedure that executes at an elevated processor privilege level. Generally, this means switching from user mode to kernel mode (some systems use multiple modes).
The mechanism for an application to enter kernel mode depends upon the system (and on Intel there are multiple ways). The general sequence for invoking a system service is that the process executes an instruction that triggers a change-processor-mode exception. The CPU responds to the exception by invoking the appropriate exception/interrupt handler, which then dispatches to the appropriate operating system service.
The problem for C programming is that invoking a system service requires executing a specific hardware instruction and setting hardware register values. Operating systems provide wrapper functions that handle the packing of parameters into registers, triggering the exception, then unpacking the return values from registers.
The open() function is usually a wrapper that lets high-level languages invoke system services. If you think about it, fopen() is generally a "wrapper" for open().
So what we normally think of as a system call is a function that does nothing other than invoke a system service.

Creating x86 bootloader

I am writing a bootloader as follows:
bits 16
[org 0x7c00]
KERN_OFFSET equ 0x1000
mov [BOOTDISK], dl
mov dl, 0x0 ;0 is for floppy-disk
mov ah, 0x2 ;Read function for the interrupt
mov al, 0x15 ;Read 0x15 (21) sectors containing the kernel
mov ch, 0x0 ;Use cylinder 0
mov cl, 0x2 ;Start from the second sector which contains kernel
mov dh, 0x0 ;Read head 0
mov bx, KERN_OFFSET
int 0x13
jc disk_error
cmp al, 0x15
jne disk_error
jmp KERN_OFFSET:0x0
jmp $
disk_error:
jmp $
BOOTDISK: db 0
times 510-($-$$) db 0
dw 0xaa55
The kernel is a simple C program which prints "e" on the VGA display (seen on QEmu):
void main()
{
extern void put_in_mem();
char c = 'e';
put_in_mem(c, 0xA0);
}
I am using this code in 16 bit (real mode) in QEmu so I am using the compiler bcc for this code using:
bcc -ansi -c -o kernel.o kernel.c
I have the following questions:
1. When I try to disassemble this code, using
objdump -D -b binary -mi386 kernel.o
I get an output like this (only initial portion of output):
kernel.o: file format binary
Disassembly of section .data:
00000000 <.data>:
0: a3 86 01 00 2a mov %eax,0x2a000186
5: 3e 00 00 add %al,%ds:(%eax)
8: 00 22 add %ah,(%edx)
a: 00 00 add %al,(%eax)
c: 00 19 add %bl,(%ecx)
e: 00 00 add %al,(%eax)
10: 00 55 55 add %dl,0x55(%ebp)
13: 55 push %ebp
14: 55 push %ebp
15: 00 00 add %al,(%eax)
17: 00 02 add %al,(%edx)
19: 22 00 and (%eax),%al
This output does not seem to correspond to the kernel.c file I made. For example, I could not see where 'e' is stored as ASCII 0x65 or where the call to put_in_mem is made. Is something wrong with the way I am disassembling the code?
To make the object file of the kernel for QEmu I used the following command:
ld86 -o kernel -d kernel.o put_in_mem.o
Here put_in_mem.o is the object file created after assembling the put_in_mem.asm file which contains the definition of the function put_in_mem() used in kernel.c.
Then floppy image for QEmu is made using:
cat boot.o kernel > floppy_img
But when I try to look at the address 0x10000 (using GDB), where the kernel was supposed to be present after loading (using the boot.asm program), it was not present.
Why is this happening?
Further, with ld we use the -Ttext option to specify the load address of the binary. Should we use a similar option here with ld86?

Your kernel.o is in an object file format that objdump does not understand, so it tries to disassemble everything in it, headers included. Try disassembling the linked output, kernel, instead. Also, objdump might not understand 16-bit code; better try objdump86 if you have it available.
As to why it's not present: you are looking at the wrong place. You are loading it to offset 0x1000 (3 zeroes) but you are looking at 0x10000 (4 zeroes). Also note that you don't set up ES which is bad practice. Maybe you intended to set ES to 0x1000 and BX to 0x0000 and then you would find your kernel at 0x10000 physical address.
The -Ttext doesn't influence loading, it only specifies where the code expects to find itself.

Where do static local variables go

Where are static local variables stored in memory? Local variables can be accessed only inside the function in which they are declared.
Global static variables go into the .data segment.
If a static global and a static local variable have the same name, how does the compiler distinguish them?

Static variables go into the same segment as global variables. The only thing that's different between the two is that the compiler "hides" all static variables from the linker: only the names of extern (global) variables get exposed. That is how compilers allow static variables with the same name to exist in different translation units. Names of static variables remain known during the compilation phase, but then their data is placed into the .data segment anonymously.

A static variable is stored much like a global variable: an uninitialized static variable goes in BSS and an initialized static variable goes in the data segment.

As mentioned by dasblinken, GCC 4.8 puts local statics in the same place as globals.
More precisely:
static int i = 0 goes in .bss
static int i = 1 goes in .data
Let's analyze one Linux x86-64 ELF example to see it ourselves:
#include <stdio.h>
int f() {
static int i = 1;
i++;
return i;
}
int main() {
printf("%d\n", f());
printf("%d\n", f());
return 0;
}
To reach conclusions, we need to understand the relocation information. If you've never touched that, consider reading this post first.
Compile it:
gcc -ggdb -c main.c
Disassemble the code with:
objdump -S main.o
f contains:
int f() {
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
static int i = 1;
i++;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
a: 83 c0 01 add $0x1,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <f+0x13>
return i;
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <f+0x19>
}
19: 5d pop %rbp
1a: c3 retq
This makes 3 accesses to i:
4 loads i into eax to prepare for the increment
d stores the incremented value back to memory
13 loads i into eax for the return value. It is obviously unnecessary since eax already contains it, and -O3 is able to remove that.
So let's focus just on 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
Let's look at the relocation data:
readelf -r main.o
which says how the text section addresses will be modified by the linker when it is making the executable.
It contains:
Relocation section '.rela.text' at offset 0x660 contains 9 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000300000002 R_X86_64_PC32 0000000000000000 .data - 4
We look at .rela.text and not the others because we are interested in relocations of .text.
Offset 6 falls right into the instruction that starts at byte 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
^^
This is offset 6
From our knowledge of x86-64 instruction encoding:
8b 05 is the mov part
00 00 00 00 is the address part, which starts at byte 6
AMD64 System V ABI Update tells us that R_X86_64_PC32 acts on 4 bytes (00 00 00 00) and calculates the address as:
S + A - P
which means:
S: the value of the symbol the relocation refers to: here the start of .data
A: the addend: -4
P: the address of byte 6 when loaded
-P is needed because GCC used RIP relative addressing, so we must discount the position in .text
-4 is needed because RIP points to the following instruction at byte 0xA but P is byte 0x6, so we need to discount 4.
Conclusion: after linking it will point to the first byte of the .data segment.