analyzing i386 assembled function... line by line - c

hi i'm such newb in assemble and OS world. and yes this is my homework which i'm in stuck in deep dark of i386 manual. please help me or give me some hint.. here's code i have to analyze ine by line. this function is part of EOS(educational OS), doing about interrupt request in hal(hardware abstraction layer). i did "objdump -d interrupt.o" and got this assemble code. of course in i386.
00000000 <eos_ack_irq>:
0: 55 push %ebp ; push %ebp to stack to save stack before
1: b8 fe ff ff ff mov $0xfffffffe,%eax ; what is this??
6: 89 e5 mov %esp,%ebp ; couple with "push %ebp". known as prolog assembly function.
8: 8b 4d 08 mov 0x8(%ebp),%ecx ; set %ecx as value of (%ebp+8)...and what is this do??
b: 5d pop %ebp ; pop the top of stack to %ebp. i know this is for getting back to callee..
c: d3 c0 rol %cl,%eax ; ????? what is this for???
e: 21 05 00 00 00 00 and %eax,0x0 ; make %eax as 0. for what??
14: c3 ret ; return what register??
00000015 <eos_get_irq>:
15: 8b 15 00 00 00 00 mov 0x0,%edx
1b: b8 1f 00 00 00 mov $0x1f,%eax
20: 55 push %ebp
21: 89 e5 mov %esp,%ebp
23: 56 push %esi
24: 53 push %ebx
25: bb 01 00 00 00 mov $0x1,%ebx
2a: 89 de mov %ebx,%esi
2c: 88 c1 mov %al,%cl
2e: d3 e6 shl %cl,%esi
30: 85 d6 test %edx,%esi
32: 75 06 jne 3a <eos_get_irq+0x25>
34: 48 dec %eax
35: 83 f8 ff cmp $0xffffffff,%eax
38: 75 f0 jne 2a <eos_get_irq+0x15>
3a: 5b pop %ebx
3b: 5e pop %esi
3c: 5d pop %ebp
3d: c3 ret
0000003e <eos_disable_irq_line>:
3e: 55 push %ebp
3f: b8 01 00 00 00 mov $0x1,%eax
44: 89 e5 mov %esp,%ebp
46: 8b 4d 08 mov 0x8(%ebp),%ecx
49: 5d pop %ebp
4a: d3 e0 shl %cl,%eax
4c: 09 05 00 00 00 00 or %eax,0x0
52: c3 ret
00000053 <eos_enable_irq_line>:
53: 55 push %ebp
54: b8 fe ff ff ff mov $0xfffffffe,%eax
59: 89 e5 mov %esp,%ebp
5b: 8b 4d 08 mov 0x8(%ebp),%ecx
5e: 5d pop %ebp
5f: d3 c0 rol %cl,%eax
61: 21 05 00 00 00 00 and %eax,0x0
67: c3 ret
and here's pre-assembled C code
/* ack the specified irq */
void eos_ack_irq(int32u_t irq) {
/* clear the corresponding bit in _irq_pending register */
_irq_pending &= ~(0x1<<irq);
}
/* get the irq number */
int32s_t eos_get_irq() {
/* get the highest bit position in the _irq_pending register */
int i = 31;
for(; i>=0; i--) {
if (_irq_pending & (0x1<<i)) {
return i;
}
}
return -1;
}
/* mask an irq */
void eos_disable_irq_line(int32u_t irq) {
/* turn on the corresponding bit */
_irq_mask |= (0x1<<irq);
}
/* unmask an irq */
void eos_enable_irq_line(int32u_t irq) {
/* turn off the corresponding bit */
_irq_mask &= ~(0x1<<irq);
}
so these functions do ack and get and mask and unmask an interrupt request. and i'm stuck at the first one. so if you are mercy enough, would you please get me some hint or answer to analyze the first function? i'll try to get the others... and i'm very sorry for another homework.. (my TA doesn't look email)

21 05 00 00 00 00 (that and) is actually an and with a memory operand (namely and [0], eax) which the AT&T syntax obscures (but technically it does say that, note the absence of a $ sign). It makes more sense that way (the offset of 0 suggests you didn't link the code before disassembling).
mov $0xfffffffe, %eax is doing exactly what it looks like it's doing (note that 0xfffffffe is all ones except the lowest bit), and that means the function has been implemented like this:
_irq_pending &= rotate_left(0xFFFFFFFE, irq);
Saving a not operation. It has to be a rotate there instead of a shift in order to make the low bits 1 if necessary.

Related

AFL-GCC compiles differently than GCC

I want to understand AFL's code instrumentation in detail.
Compiling a sample program sample.c
int main(int argc, char **argv) {
int ret = 0;
if(argc > 1) {
ret = 7;
} else {
ret = 12;
}
return ret;
}
with gcc -c -o obj/sample-gcc.o src/sample.c and afl-gcc -c -o obj/sample-afl-gcc.o src/sample.c and disassembling both with objdump -d leads to different Assembly code:
[GCC]
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 7d ec mov %edi,-0x14(%rbp)
b: 48 89 75 e0 mov %rsi,-0x20(%rbp)
f: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
16: 83 7d ec 01 cmpl $0x1,-0x14(%rbp)
1a: 7e 09 jle 25 <main+0x25>
1c: c7 45 fc 07 00 00 00 movl $0x7,-0x4(%rbp)
23: eb 07 jmp 2c <main+0x2c>
25: c7 45 fc 0c 00 00 00 movl $0xc,-0x4(%rbp)
2c: 8b 45 fc mov -0x4(%rbp),%eax
2f: 5d pop %rbp
30: c3 retq
[AFL-GCC]
0000000000000000 <main>:
0: 48 8d a4 24 68 ff ff lea -0x98(%rsp),%rsp
7: ff
8: 48 89 14 24 mov %rdx,(%rsp)
c: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
11: 48 89 44 24 10 mov %rax,0x10(%rsp)
16: 48 c7 c1 0e ff 00 00 mov $0xff0e,%rcx
1d: e8 00 00 00 00 callq 22 <main+0x22>
22: 48 8b 44 24 10 mov 0x10(%rsp),%rax
27: 48 8b 4c 24 08 mov 0x8(%rsp),%rcx
2c: 48 8b 14 24 mov (%rsp),%rdx
30: 48 8d a4 24 98 00 00 lea 0x98(%rsp),%rsp
37: 00
38: f3 0f 1e fa endbr64
3c: 31 c0 xor %eax,%eax
3e: 83 ff 01 cmp $0x1,%edi
41: 0f 9e c0 setle %al
44: 8d 44 80 07 lea 0x7(%rax,%rax,4),%eax
48: c3 retq
AFL (usually) adds a trampoline in front of every basic block to track executed paths [https://github.com/mirrorer/afl/blob/master/afl-as.h#L130]
-> Instruction 0x00 lea until 0x30 lea
AFL (usually) adds a main payload to the program (which I excluded due to simplicity) [https://github.com/mirrorer/afl/blob/master/afl-as.h#L381]
AFL claims to use a wrapper for GCC, so I expected everything else to be equal. Why is the if-else-condition still compiled differently?
Bonus question: If a binary without source code available should be instrumented manually without using AFL's QEMU-mode or Unicorn-mode, can this be achieved by (naively) adding the main payload and each trampoline manually to the binary file or are there better approaches?
Re: Why the compilation with gcc and with afl-gcc is different, a short look at the afl-gcc source shows that by default it modifies the compiler parameters, setting -O3 -funroll-loops (as well as defining __AFL_COMPILER and FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION).
According to the documentation (docs/env_variables.txt):
By default, the wrapper appends -O3 to optimize builds. Very rarely,
this will cause problems in programs built with -Werror, simply
because -O3 enables more thorough code analysis and can spew out
additional warnings. To disable optimizations, set AFL_DONT_OPTIMIZE.

GCC optimizer generating error in nostdlib code

I have the following code:
void cp(void *a, const void *b, int n) {
for (int i = 0; i < n; ++i) {
((char *) a)[i] = ((const char *) b)[i];
}
}
void _start(void) {
char buf[20];
const char m[] = "123456789012345";
cp(buf, m, 15);
register int rax __asm__ ("rax") = 60; // exit
register int rdi __asm__ ("rdi") = 0; // status
__asm__ volatile (
"syscall" :: "r" (rax), "r" (rdi) : "cc", "rcx", "r11"
);
__builtin_unreachable();
}
If I compile it with gcc -nostdlib -O1 "./a.c" -o "./a", I get a functioning program, but if I compile it with -O2, I get a program that generates a segmentation fault.
This is the generated code with -O1:
0000000000001000 <cp>:
1000: b8 00 00 00 00 mov $0x0,%eax
1005: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
1009: 88 14 07 mov %dl,(%rdi,%rax,1)
100c: 48 83 c0 01 add $0x1,%rax
1010: 48 83 f8 0f cmp $0xf,%rax
1014: 75 ef jne 1005 <cp+0x5>
1016: c3 retq
0000000000001017 <_start>:
1017: 48 83 ec 30 sub $0x30,%rsp
101b: 48 b8 31 32 33 34 35 movabs $0x3837363534333231,%rax
1022: 36 37 38
1025: 48 ba 39 30 31 32 33 movabs $0x35343332313039,%rdx
102c: 34 35 00
102f: 48 89 04 24 mov %rax,(%rsp)
1033: 48 89 54 24 08 mov %rdx,0x8(%rsp)
1038: 48 89 e6 mov %rsp,%rsi
103b: 48 8d 7c 24 10 lea 0x10(%rsp),%rdi
1040: ba 0f 00 00 00 mov $0xf,%edx
1045: e8 b6 ff ff ff callq 1000 <cp>
104a: b8 3c 00 00 00 mov $0x3c,%eax
104f: bf 00 00 00 00 mov $0x0,%edi
1054: 0f 05 syscall
And this is the generated code with -O2:
0000000000001000 <cp>:
1000: 31 c0 xor %eax,%eax
1002: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1008: 0f b6 14 06 movzbl (%rsi,%rax,1),%edx
100c: 88 14 07 mov %dl,(%rdi,%rax,1)
100f: 48 83 c0 01 add $0x1,%rax
1013: 48 83 f8 0f cmp $0xf,%rax
1017: 75 ef jne 1008 <cp+0x8>
1019: c3 retq
101a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000001020 <_start>:
1020: 48 8d 44 24 d8 lea -0x28(%rsp),%rax
1025: 48 8d 54 24 c9 lea -0x37(%rsp),%rdx
102a: b9 31 00 00 00 mov $0x31,%ecx
102f: 66 0f 6f 05 c9 0f 00 movdqa 0xfc9(%rip),%xmm0 # 2000 <_start+0xfe0>
1036: 00
1037: 48 8d 70 0f lea 0xf(%rax),%rsi
103b: 0f 29 44 24 c8 movaps %xmm0,-0x38(%rsp)
1040: eb 0d jmp 104f <_start+0x2f>
1042: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1048: 0f b6 0a movzbl (%rdx),%ecx
104b: 48 83 c2 01 add $0x1,%rdx
104f: 88 08 mov %cl,(%rax)
1051: 48 83 c0 01 add $0x1,%rax
1055: 48 39 f0 cmp %rsi,%rax
1058: 75 ee jne 1048 <_start+0x28>
105a: b8 3c 00 00 00 mov $0x3c,%eax
105f: 31 ff xor %edi,%edi
1061: 0f 05 syscall
The crash happens at 103b, instruction movaps %xmm0,-0x38(%rsp).
I noticed that if m contains less than 15 characters, then the generated code is different and the crash does not happen.
What am I doing wrong?
_start is not a function. It's not called by anything, and on entry the stack is 16-byte aligned, not (as the ABI requires) 8 bytes away from 16-byte alignment.
(The ABI requires 16-byte alignment before a call, and call pushes an 8-byte return address. So on function entry RSP-8 and RSP+8 are 16-byte aligned.)
At -O2 GCC uses alignment-required 16-byte instructions to implement the copy done by cp(), copying the "123456789012345" from static storage to the stack.
At -O1, GCC just uses two mov r64, imm64 instructions to get bytes into integer regs for 8-byte stores. These don't require alignment.
Workarounds
Just write a main in C like a normal person if you want everything to work.
Or if you're trying to microbenchmark something light-weight in asm, you can use gcc -nostdlib -O3 -mincoming-stack-boundary=3 (docs) to tell GCC that functions can't assume they're called with more than 8-byte alignment. Unlike -mpreferred-stack-boundary=3, this will still align by 16 before making further calls. So if you have other non-leaf functions, you might want to just use an attribute on your hacky C _start() instead of affecting the whole file.
A worse, more hacky way would be to try putting
asm("push %rax"); at the very top of _start to modify RSP by 8, where GCC hopefully runs it before doing anything else with the stack. GNU C Basic asm statements are implicitly volatile so you don't need asm volatile, although that wouldn't hurt.
You're 100% on your own and responsible for correctly tricking the compiler by using inline asm that works for whatever optimization level you're using.
Another safer way would be write your own light-weight _start that calls main:
// at global scope:
asm(
".globl _start \n"
"_start: \n"
" mov (%rsp), %rdi \n" // argc
" lea 8(%rsp), %rsi \n" // argv
" lea 8(%rsi, %rdi, 8), %rdx \n" // envp
" call main \n"
// NOT DONE: stdio cleanup or other atexit stuff
// DO NOT USE WITH GLIBC; use libc's CRT code if you use libc
" mov %eax, %edi \n"
" mov $231, %eax \n"
" syscall" // exit_group( main() )
);
int main(int argc, char**argv, char**envp) {
... your code here
return 0;
}
If you didn't want main to return, you could just pop %rdi; mov %rsp, %rsi ; jmp main to give it argc and argv without a return address.
Then main can exit via inline asm, or by calling exit() or _exit() if you link libc. (But if you link libc, you should usually use its _start.)
See also: How Get arguments value using inline assembly in C without Glibc? for other hand-rolled _start versions; this is pretty much like #zwol's there.

System V AMD64 ABI floating point varargs order

I compiled a call to printf with different kinds of args.
Here's the code + generated asm:
int main(int argc, char const *argv[]){
// 0: 55 push rbp
// 1: 48 89 e5 mov rbp,rsp
// 4: 48 83 ec 20 sub rsp,0x20
// 8: 89 7d fc mov DWORD PTR [rbp-0x4],edi
// b: 48 89 75 f0 mov QWORD PTR [rbp-0x10],rsi
printf("%s %f %d %f\n", "aye u gonna get some", 133.7f, 420, 69.f);
// f: f2 0f 10 05 00 00 00 00 movsd xmm0,QWORD PTR [rip+0x0] # 17 <main+0x17> 13: R_X86_64_PC32 .rodata+0x2c 69
// 17: 48 8b 05 00 00 00 00 mov rax,QWORD PTR [rip+0x0] # 1e <main+0x1e> 1a: R_X86_64_PC32 .rodata+0x34 133.7
// 1e: 66 0f 28 c8 movapd xmm1,xmm0
// 22: ba a4 01 00 00 mov edx,0x1a4 (420)
// 27: 48 89 45 e8 mov QWORD PTR [rbp-0x18],rax
// 2b: f2 0f 10 45 e8 movsd xmm0,QWORD PTR [rbp-0x18]
// 30: 48 8d 35 00 00 00 00 lea rsi,[rip+0x0] # 37 <main+0x37> 33: R_X86_64_PC32 .rodata-0x4 "aye u wanna get some"
// 37: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 3e <main+0x3e> 3a: R_X86_64_PC32 .rodata+0x18 "%s %f %d %f\n"
// 3e: b8 02 00 00 00 mov eax,0x2
// 43: e8 00 00 00 00 call 48 <main+0x48> 44: R_X86_64_PLT32 printf-0x4
return 0;
// 48: b8 00 00 00 00 mov eax,0x0
// 4d: c9 leave
// 4e: c3 ret
}
Most stuff here makes sense to me.
In fact everything here makes some level of sense to me.
"%s %f %d %f\n" -> rdi
"aye u gonna get some" -> rsi
133.7 -> xmm0
420 -> rdx
69 -> xmm1
2 -> rax (to indicate there are 2 floating point arguments)
Now what I don't understand is how printf (or any other varargs function) would figure out the position of these floating point arguments among the others.
It can't be compiler magic either since it's dynamically linked.
So the only thing I can think of is maybe it's just va_arg internals, and how when you provide a type, if it's floating point, it must get from the xmms (or stack) instead of otherwise.
Is that correct? If not, how does the other side know where to get 'em? Thanks in advance.
For printf the format string indicates the type of the remaining argments.
The implementation of va_arg knows the type as it is an argument of va_arg, and the correct register can be deduced from the types.

Declare variable first or directly, is there a difference?

Is there a difference between declaring a variable first and then assigning a value or directly declaring and assigning a value in the compiled function? Does the compiled function do the same work? e.g, does it still read the parameters, declare variables and then assign value or is there a difference between the two examples in the compiled versions?
example:
void foo(u32 value) {
u32 extvalue = NULL;
extvalue = value;
}
compared with
void foo(u32 value) {
u32 extvalue = value;
}
I am under the impression that there is no difference between those two functions if you look at the compiled code, e.g they will look the same and i will not be able to tell which is which.
it depends on the compiler & the optimization level of course.
A dumb compiler/low optimization level when it sees:
u32 extvalue = NULL;
extvalue = value;
could set to NULL then to value in the next line.
Since extvalue isn't used in-between, the NULL initialization is useless and most compilers directly set to value as an easy optimization
Note that declaring a variable isn't really an instruction per se. The compiler just allocates auto memory to store this variable.
I've tested a simple code with and without assignment and the result is diff
erent when using gcc compiler 6.2.1 with -O0 (don't optimize anything) flag:
#include <stdio.h>
void foo(int value) {
int extvalue = 0;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%ebp) <=== here we see the init
d: 8b 45 08 mov 0x8(%ebp),%eax
10: 89 45 f4 mov %eax,-0xc(%ebp)
13: 8b 45 f4 mov -0xc(%ebp),%eax
16: 89 44 24 04 mov %eax,0x4(%esp)
1a: c7 04 24 00 00 00 00 movl $0x0,(%esp)
21: e8 00 00 00 00 call 26 <_foo+0x26>
26: c9 leave
27: c3 ret
now:
void foo(int value) {
int extvalue;
extvalue = value;
printf("%d",extvalue);
}
disassembled:
Disassembly of section .text:
00000000 <_foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 28 sub $0x28,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 89 45 f4 mov %eax,-0xc(%ebp)
c: 8b 45 f4 mov -0xc(%ebp),%eax
f: 89 44 24 04 mov %eax,0x4(%esp)
13: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1a: e8 00 00 00 00 call 1f <_foo+0x1f>
1f: c9 leave
20: c3 ret
21: 90 nop
22: 90 nop
23: 90 nop
the 0 init has disappeared. The compiler didn't optimize the initialization in that case.
If I switch to -O2 (good optimization level) the code is then identical in both cases, compiler found that the initialization wasn't necessary (still, silent, no warnings):
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 18 sub $0x18,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: c7 04 24 00 00 00 00 movl $0x0,(%esp)
10: 89 44 24 04 mov %eax,0x4(%esp)
14: e8 00 00 00 00 call 19 <_foo+0x19>
19: c9 leave
1a: c3 ret
I tried these functions in godbolt:
void foo(uint32_t value)
{
uint32_t extvalue = NULL;
extvalue = value;
}
void bar(uint32_t value)
{
uint32_t extvalue = value;
}
I ported to the actual type uint32_t rather than u32 which is not standard. The resulting non-optimized assembly generated by x86-64 GCC 6.3 is:
foo(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
bar(unsigned int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-4], eax
nop
pop rbp
ret
So clearly the non-optimized code retains the (weird, as pointed out by others since it's not written to a pointer) NULL assignment, which is of course pointless.
I'd vote for the second one since it's shorter (less to hold in one's head when reading the code), and never allow/recommend the pointless setting to NULL before overwriting with the proper value. I would consider that a bug, since you're saying/doing something you don't mean.

Doubts in executable and relocatable object file

I have written a simple Hello World program.
#include <stdio.h>
int main() {
printf("Hello World");
return 0;
}
I wanted to understand how the relocatable object file and executable file look like.
The object file corresponding to the main function is
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <main+0x13>
13: b8 00 00 00 00 mov $0x0,%eax
18: c9 leaveq
19: c3 retq
Here the function call for printf is callq 13. One thing i don't understand is why is it 13. That means call the function at adresss 13, right??. 13 has the next instruction, right?? Please explain me what does this mean??
The executable code corresponding to main is
00000000004004cc <main>:
4004cc: 55 push %rbp
4004cd: 48 89 e5 mov %rsp,%rbp
4004d0: bf dc 05 40 00 mov $0x4005dc,%edi
4004d5: b8 00 00 00 00 mov $0x0,%eax
4004da: e8 e1 fe ff ff callq 4003c0 <printf#plt>
4004df: b8 00 00 00 00 mov $0x0,%eax
4004e4: c9 leaveq
4004e5: c3 retq
Here it is callq 4003c0. But the binary instruction is e8 e1 fe ff ff. There is nothing that corresponds to 4003c0. What is that i am getting wrong?
Thanks.
Bala
In the first case, take a look at the instruction encoding - it's all zeroes where the function address would go. That's because the object hasn't been linked yet, so the addresses for external symbols haven't been hooked up yet. When you do the final link into the executable format, the system sticks another placeholder in there, and then the dynamic linker will finally add the correct address for printf() at runtime. Here's a quick example for a "Hello, world" program I wrote.
First, the disassembly of the object file:
00000000 <_main>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
11: e8 00 00 00 00 call 16 <_main+0x16>
16: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1d: e8 00 00 00 00 call 22 <_main+0x22>
22: b8 00 00 00 00 mov $0x0,%eax
27: 83 c4 04 add $0x4,%esp
2a: 59 pop %ecx
2b: 5d pop %ebp
2c: 8d 61 fc lea -0x4(%ecx),%esp
2f: c3 ret
Then the relocations:
main.o: file format pe-i386
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000012 DISP32 ___main
00000019 dir32 .rdata
0000001e DISP32 _puts
As you can see there's a relocation there for _puts, which is what the call to printf turned into. That relocation will get noticed at link time and fixed up. In the case of dynamic library linking, the relocations and fixups might not get fully resolved until the program is running, but you'll get the idea from this example, I hope.
The target of the call in the E8 instruction (call) is specified as relative offset from the current instruction pointer (IP) value.
In your first code sample the offset is obviously 0x00000000. It basically says
call +0
The actual address of printf is not known yet, so the compiler just put the 32-bit value 0x00000000 there as a placeholder.
Such incomplete call with zero offset will naturally be interpreted as the call to the current IP value. On your platform, the IP is pre-incremented, meaning that when some instruction is executed, the IP contains the address of the next instruction. I.e. when instruction at the address 0xE is executed the IP contains value 0x13. And the call +0 is naturally interpreted as the call to instruction 0x13. This is why you see that 0x13 in the disassembly of the incomplete code.
Once the code is complete, the placeholder 0x00000000 offset is replaced with the actual offset of printf function in the code. The offset can be positive (forward) or negative (backward). In your case the IP at the moment of the call is 0x4004DF, while the address of printf function is 0x4003C0. For this reason, the machine instruction will contain a 32-bit offset value equal to 0x4003C0 - 0x4004DF, which is negative value -287. So what you see in the code is actually
call -287
-287 is 0xFFFFFEE1 in binary. This is exactly what you see in your machine code. It is just that the tool you are using displayed it backwards.
Calls are relative in x86, IIRC if you have e8 , the call location is addr+5.
e1 fe ff ff a is little endian encoded relative jump. It really means fffffee1.
Now add this to the address of the call instruction + 5:
(0xfffffee1 + 0x4004da + 5) % 2**32 = 0x4003c0

Resources