I wonder how to switch the call to a function for another inside an executable (.exe in my case)
Here is the code I try to play with
#include <stdio.h>
void hello()
{
printf("Hello world!");
}
void investigate()
{
printf("Investigate all the things!");
}
main()
{
hello();
}
Once I compiled the above code (with gcc) and got an executable (.exe) out of it, I want to switch the "hello" call with "investigate".
--Edit--
My environment: Windows 10 (64bit), mingw with gcc/g++ 4.8.1
--Edit 2--
I'm fine with Linux answer (any Ubuntu or any OpenSuse and any architecture) too as for me it's very important to have a proof-of-concept.
Assuming the compiler is not omitting dead functions entirely, that it is not inlining the function and that the call won't go through the PLT, once you have compiled the executable, you can simply edit the call instruction.
Note that the two functions must be "compatible", where the notion of compatibility is fuzzy, it means "the new must satisfy at least the same assumptions the compiler made when calling the old one".
The ABI is of course one such assumption but it may not be the only one.
If your compiler omitted dead function, you can't switch the function (one is missing).
If your compiler inlined the call, you can't switch the function (there is no call). You can work against the compiler and rewrite the code at the call-site (in the C source), this is called patching.
If your compiler used the PLT, you need to change the GOT entry used by the PLT stub. You may need to document your self a bit but changing the linked procedure is actually a feature of PLT machinery.
If your compiler did nothing of that, this should be case for such a simple source when no optimisations are enabled, you can use objdump -d <file> to find the call-site and the address of the new function:
000000000040051d <hello>:
40051d: 55 push %rbp
40051e: 48 89 e5 mov %rsp,%rbp
400521: bf f0 05 40 00 mov $0x4005f0,%edi
400526: b8 00 00 00 00 mov $0x0,%eax
40052b: e8 d0 fe ff ff callq 400400 <printf#plt>
400530: 5d pop %rbp
400531: c3 retq
0000000000400532 <investigate>:
400532: 55 push %rbp
400533: 48 89 e5 mov %rsp,%rbp
400536: bf fd 05 40 00 mov $0x4005fd,%edi
40053b: b8 00 00 00 00 mov $0x0,%eax
400540: e8 bb fe ff ff callq 400400 <printf#plt>
400545: 5d pop %rbp
400546: c3 retq
0000000000400547 <main>:
400547: 55 push %rbp
400548: 48 89 e5 mov %rsp,%rbp
40054b: b8 00 00 00 00 mov $0x0,%eax
400550: e8 c8 ff ff ff callq 40051d <hello>
400555: b8 00 00 00 00 mov $0x0,%eax
40055a: 5d pop %rbp
40055b: c3 retq
40055c: 0f 1f 40 00 nopl 0x0(%rax)
Then change the immediate value of the call instruction with the difference between the target address and the address after the end of the call instruction (it doesn't matter where the origin is as long as it's the same for both addresses).
Target = 400532
After the end of call = 400555
Difference = 400532 - 400555 = -23 = 0xFFFFFFDD
Change from:
400550: e8 c8 ff ff ff
to:
400550: e8 dd ff ff ff
Note that immediates are little-endiands.
You can use an hexeditor to edit the code, to find the offset into the file you can either use an elf reader and do a bit of math your self or you can simply search for the bytes of the call instruction (check also the bytes around the call to be sure).
After the edit, the binary has been patched:
0000000000400532 <investigate>:
400532: 55 push %rbp
400533: 48 89 e5 mov %rsp,%rbp
400536: bf fd 05 40 00 mov $0x4005fd,%edi
40053b: b8 00 00 00 00 mov $0x0,%eax
400540: e8 bb fe ff ff callq 400400 <printf#plt>
400545: 5d pop %rbp
400546: c3 retq
0000000000400547 <main>:
400547: 55 push %rbp
400548: 48 89 e5 mov %rsp,%rbp
40054b: b8 00 00 00 00 mov $0x0,%eax
400550: e8 dd ff ff ff callq 400532 <investigate>
400555: b8 00 00 00 00 mov $0x0,%eax
40055a: 5d pop %rbp
40055b: c3 retq
Related
When I compile my code below it prints
I am running :)
forever(Until I send KeyboardInterrupt signal to the program),
but when I uncomment // printf("done:%d\n", done);, recompile and run it, it will print only two times, prints done: 1 and then returns.
I'm new to ucontext.h and I'm very confused about how this code is working and
why a single printf is changing whole behavior of the code, if you replace printf with done++; it would do the same but if you replace it with done = 2; it does not affect anything and works as we had the printf commented at first place.
Can anyone explain:
Why is this code acting like this and what's the logic behind it?
Sorry for my bad English,
Thanks a lot.
#include <ucontext.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
register int done = 0;
ucontext_t one;
ucontext_t two;
getcontext(&one);
printf("I am running :)\n");
sleep(1);
if (!done)
{
done = 1;
swapcontext(&two, &one);
}
// printf("done:%d\n", done);
return 0;
}
This is a compiler optimization "problem". When the "printf()" is commented, the compiler deduces that "done" will not be used after the "if (!done)", so it does not set it to 1 as it is not worth. But when the "printf()" is present, "done" is used after "if (!done)", so the compiler sets it.
Assembly code with the "printf()":
$ gcc ctx.c -o ctx -g
$ objdump -S ctx
[...]
int main(void)
{
11e9: f3 0f 1e fa endbr64
11ed: 55 push %rbp
11ee: 48 89 e5 mov %rsp,%rbp
11f1: 48 81 ec b0 07 00 00 sub $0x7b0,%rsp
11f8: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
11ff: 00 00
1201: 48 89 45 f8 mov %rax,-0x8(%rbp)
1205: 31 c0 xor %eax,%eax
register int done = 0;
1207: c7 85 5c f8 ff ff 00 movl $0x0,-0x7a4(%rbp) <------- done set to 0
120e: 00 00 00
ucontext_t one;
ucontext_t two;
getcontext(&one);
1211: 48 8d 85 60 f8 ff ff lea -0x7a0(%rbp),%rax
1218: 48 89 c7 mov %rax,%rdi
121b: e8 c0 fe ff ff callq 10e0 <getcontext#plt>
1220: f3 0f 1e fa endbr64
printf("I am running :)\n");
1224: 48 8d 3d d9 0d 00 00 lea 0xdd9(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
122b: e8 70 fe ff ff callq 10a0 <puts#plt>
sleep(1);
1230: bf 01 00 00 00 mov $0x1,%edi
1235: e8 b6 fe ff ff callq 10f0 <sleep#plt>
if (!done)
123a: 83 bd 5c f8 ff ff 00 cmpl $0x0,-0x7a4(%rbp)
1241: 75 27 jne 126a <main+0x81>
{
done = 1;
1243: c7 85 5c f8 ff ff 01 movl $0x1,-0x7a4(%rbp) <----- done set to 1
124a: 00 00 00
swapcontext(&two, &one);
124d: 48 8d 95 60 f8 ff ff lea -0x7a0(%rbp),%rdx
1254: 48 8d 85 30 fc ff ff lea -0x3d0(%rbp),%rax
125b: 48 89 d6 mov %rdx,%rsi
125e: 48 89 c7 mov %rax,%rdi
1261: e8 6a fe ff ff callq 10d0 <swapcontext#plt>
1266: f3 0f 1e fa endbr64
}
printf("done:%d\n", done);
126a: 8b b5 5c f8 ff ff mov -0x7a4(%rbp),%esi
1270: 48 8d 3d 9d 0d 00 00 lea 0xd9d(%rip),%rdi # 2014 <_IO_stdin_used+0x14>
1277: b8 00 00 00 00 mov $0x0,%eax
127c: e8 3f fe ff ff callq 10c0 <printf#plt>
return 0;
Assembly code without the "printf()":
$ gcc ctx.c -o ctx -g
$ objdump -S ctx
[...]
int main(void)
{
11c9: f3 0f 1e fa endbr64
11cd: 55 push %rbp
11ce: 48 89 e5 mov %rsp,%rbp
11d1: 48 81 ec b0 07 00 00 sub $0x7b0,%rsp
11d8: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
11df: 00 00
11e1: 48 89 45 f8 mov %rax,-0x8(%rbp)
11e5: 31 c0 xor %eax,%eax
register int done = 0;
11e7: c7 85 5c f8 ff ff 00 movl $0x0,-0x7a4(%rbp) <------ done set to 0
11ee: 00 00 00
ucontext_t one;
ucontext_t two;
getcontext(&one);
11f1: 48 8d 85 60 f8 ff ff lea -0x7a0(%rbp),%rax
11f8: 48 89 c7 mov %rax,%rdi
11fb: e8 c0 fe ff ff callq 10c0 <getcontext#plt>
1200: f3 0f 1e fa endbr64
printf("I am running :)\n");
1204: 48 8d 3d f9 0d 00 00 lea 0xdf9(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
120b: e8 80 fe ff ff callq 1090 <puts#plt>
sleep(1);
1210: bf 01 00 00 00 mov $0x1,%edi
1215: e8 b6 fe ff ff callq 10d0 <sleep#plt>
if (!done)
121a: 83 bd 5c f8 ff ff 00 cmpl $0x0,-0x7a4(%rbp)
1221: 75 1d jne 1240 <main+0x77>
{
done = 1; <------------- done is no set here (it is optimized by the compiler)
swapcontext(&two, &one);
1223: 48 8d 95 60 f8 ff ff lea -0x7a0(%rbp),%rdx
122a: 48 8d 85 30 fc ff ff lea -0x3d0(%rbp),%rax
1231: 48 89 d6 mov %rdx,%rsi
1234: 48 89 c7 mov %rax,%rdi
1237: e8 74 fe ff ff callq 10b0 <swapcontext#plt>
123c: f3 0f 1e fa endbr64
}
//printf("done:%d\n", done);
return 0;
1240: b8 00 00 00 00 mov $0x0,%eax
}
1245: 48 8b 4d f8 mov -0x8(%rbp),%rcx
1249: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
1250: 00 00
1252: 74 05 je 1259 <main+0x90>
1254: e8 47 fe ff ff callq 10a0 <__stack_chk_fail#plt>
1259: c9 leaveq
125a: c3 retq
125b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
To disable the optimization on "done", add the "volatile" keyword in its definition:
volatile register int done = 0;
This makes the program work in both cases.
(There is some overlap with Rachid K's answer as it was posted while I was writing this.)
I am guessing you are declaring done as register in hopes that it will actually be put in a register, so that its value will be saved and restored by the context switch. But the compiler is never obliged to honor this; most modern compilers ignore register declarations completely and make their own decisions about register usage. And in particular, gcc without optimizations will nearly always put local variables in memory on the stack.
As such, in your test case, the value of done is not restored by the context switch. So when getcontext returns for the second time, done has the same value as when swapcontext was called.
When the printf is present, as Rachid also points out, the done = 1 is actually stored before the swapcontext, so on the second return of getcontext, done has the value 1, the if block is skipped, and the program prints done:1 and exits.
However, when the printf is absent, the compiler notices that the value of done is never used after its assignment (since it assumes swapcontext is a normal function and doesn't know that it will actually return somewhere else), so it optimizes out the dead store (yes, even though optimizations are off). Thus we have done == 0 when getcontext returns the second time, and you get an infinite loop. This is maybe what you were expecting if you thought done would be placed in a register, but if so, you got the "right" behavior for the wrong reason.
If you enable optimizations, you'll see something else again: the compiler notices that done can't be affected by the call to getcontext (again assuming it's a normal function call) and therefore it is guaranteed to be 0 at the if. So the test need not be done at all, because it will always be true. The swapcontext is then executed unconditionally, and as for done, it's optimized completely out of existence, because it no longer has any effect on the code. You'll again see an infinite loop.
Because of this issue, you really can't make any safe assumptions about local variables that have been modified in between the getcontext and swapcontext. When getcontext returns for the second time, you might or might not see the changes. There are further issues if the compiler chose to reorder some of your code around the function call (which it knows no reason not to do, since again it thinks these are ordinary function calls that can't see your local variables).
The only way to get any certainty is to declare a variable volatile. Then you can be sure that intermediate changes will be seen, and the compiler will not assume that getcontext can't change it. The value seen at the second return of getcontext will be the same as at the call to swapcontext. If you write volatile int done = 0; you ought to see just two "I am running" messages, regardless of other code or optimization settings.
I know and understand the purpose of volatile variables and optimisation in general (well, I think I do!). This question relates specifically to what happens if a variable is accessed outside the module it is declared in.
In the following scenario, if funcThatWaits was called inside bar.c, it could be optimised and not fetch the value of sTheVar each loop iteration.
However, when GetTheVar is called externally could the same optimisation apply or does the function call ensure sTheVar will always be read each loop iteration?
I am not suggesting this is good code or practice, but an example for the sake of the question.
bar.h
int GetTheVar(void);
bar.c
static /*volatile*/ int sTheVar;
int GetTheVar(void)
{
return sTheVar;
}
static void someISROrFuncCalledFromAnotherThread(void)
{
sTheVar = 1;
}
foo.c
#include "bar.h"
void funcThatWaits(void)
{
while(GetTheVar() != 1) {}
}
when GetTheVar is called externally could the same optimisation apply or does the function call ensure sTheVar will always be read each loop iteration?
The same optimization may apply. For instance, if you are using LTO (Link-Time Optimization), then the compiler knows everything about GetTheVar and will likely decide funcThatWaits is an infinite loop (which, by the way, would be UB).
Function calls are not going to be optimized away since, for all the caller knows, the function being called could depend on some exogenous state.
I compiled the following three files using gcc:
foo.c
#include "bar.h"
void funcThatWaits(void) {
while ( getVar() != 1 );
}
bar.c
#include "foo.h"
static int theVar;
int getTheVar(void) {
return theVar;
}
void theFunc(void) {
funcThatWaits();
}
test.c
#include "bar.h"
int main() {
theFunc();
return 0;
}
Compiling those three into a.out and running objdump -d a.out, the following comes out:
00000000000005fa <main>:
5fa: 55 push %rbp
5fb: 48 89 e5 mov %rsp,%rbp
5fe: e8 25 00 00 00 callq 628 <theFunc>
603: b8 00 00 00 00 mov $0x0,%eax
608: 5d pop %rbp
609: c3 retq
000000000000060a <funcThatWaits>:
60a: 55 push %rbp
60b: 48 89 e5 mov %rsp,%rbp
60e: 90 nop
60f: e8 08 00 00 00 callq 61c <getTheVar>
614: 83 f8 01 cmp $0x1,%eax
617: 75 f6 jne 60f <funcThatWaits+0x5>
619: 90 nop
61a: 5d pop %rbp
61b: c3 retq
000000000000061c <getTheVar>:
61c: 55 push %rbp
61d: 48 89 e5 mov %rsp,%rbp
620: 8b 05 ee 09 20 00 mov 0x2009ee(%rip),%eax # 201014 <theVar>
626: 5d pop %rbp
627: c3 retq
0000000000000628 <theFunc>:
628: 55 push %rbp
629: 48 89 e5 mov %rsp,%rbp
62c: e8 d9 ff ff ff callq 60a <funcThatWaits>
631: 90 nop
632: 5d pop %rbp
633: c3 retq
634: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
63b: 00 00 00
63e: 66 90 xchg %ax,%ax
Background
I'm making an app that needs to run several tasks concurrently. I can't use threads and such because the app should work without any OS (i.e. straight from the bootsector). Using x86 tasks looks like an overkill (both logically and performance-wise). Thus, I decided to implement a task-switching utility myself. I would save processor state, make a call to the task code and then restore the previous state. So I have to make the call from inline assembly.
Problem
Here's some example code:
#include <stdio.h>
void func() {
printf("Hello, world!\n");
}
void (*funcptr)();
int main() {
funcptr = func;
asm(
"call *%0;"
:
:"r"(funcptr)
);
return 0;
}
It compiles perfectly under icc with no options, gcc and clang and yields "Hello, world!" when run. However, if I compile it with icc main.c -ipo, it segfaults.
I disassembled the code that was generated by icc main.c and got the following:
0000000000401220 <main>:
401220: 55 push %rbp
401221: 48 89 e5 mov %rsp,%rbp
401224: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401228: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40122f: bf 03 00 00 00 mov $0x3,%edi
401234: 33 f6 xor %esi,%esi
401236: e8 45 00 00 00 callq 401280 <__intel_new_feature_proc_init>
40123b: 0f ae 1c 24 stmxcsr (%rsp)
40123f: 48 c7 05 f6 78 00 00 movq $0x401270,0x78f6(%rip) # 408b40 <funcptr>
401246: 70 12 40 00
40124a: b8 70 12 40 00 mov $0x401270,%eax
40124f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401256: 0f ae 14 24 ldmxcsr (%rsp)
40125a: ff d0 callq *%rax
40125c: 33 c0 xor %eax,%eax
40125e: 48 89 ec mov %rbp,%rsp
401261: 5d pop %rbp
401262: c3 retq
401263: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401268: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40126f: 00
0000000000401270 <func>:
401270: bf 04 40 40 00 mov $0x404004,%edi
401275: e9 e6 fd ff ff jmpq 401060 <puts#plt>
40127a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
On the other hand, icc main.c -ipo yields:
0000000000401210 <main>:
401210: 55 push %rbp
401211: 48 89 e5 mov %rsp,%rbp
401214: 48 83 e4 80 and $0xffffffffffffff80,%rsp
401218: 48 81 ec 80 00 00 00 sub $0x80,%rsp
40121f: bf 03 00 00 00 mov $0x3,%edi
401224: 33 f6 xor %esi,%esi
401226: e8 25 00 00 00 callq 401250 <__intel_new_feature_proc_init>
40122b: 0f ae 1c 24 stmxcsr (%rsp)
40122f: 81 0c 24 40 80 00 00 orl $0x8040,(%rsp)
401236: 48 8b 05 cb 2d 00 00 mov 0x2dcb(%rip),%rax # 404008 <funcptr_2.dp.0>
40123d: 0f ae 14 24 ldmxcsr (%rsp)
401241: ff d0 callq *%rax
401243: 33 c0 xor %eax,%eax
401245: 48 89 ec mov %rbp,%rsp
401248: 5d pop %rbp
401249: c3 retq
40124a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
So, while -ipo didn't remove funcptr variable (see address 401236), it did remove assignment. I guess that icc noticed that func is not called from C code so it can be safely removed, so funcptr is allowed to contain garbage. However, it didn't notice that I'm calling func indirectly via assembly.
What I tried
Replacing "r"(funcptr) with "r"(func) works but I can't hardcode a specific function (see background).
Calling funcptr and/or func before and/or after inline assembly block don't help because icc just inlines printf("Hello, world!\n");.
I can't get rid of inline assembly because I have to do low-level register, flags and stack manipulation before and after call.
Making funcptr volatile yields the following warning but still segfaults:
a value of type "void (*)()" cannot be assigned to an entity of type "volatile void (*)()"
Adding volatile to almost every other word doesn't help either.
Moving func and/or funcptr to other source files and then linking them together doesn't help.
Moving inline assembly to a separate function doesn't work.
Am I doing something wrong or is it an icc bug? If the former, how do I fix the code? If the latter, is there any workaround and should I report the bug?
$ icc --version
icc (ICC) 19.1.0.166 20191121
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
I have tried the following code on gcc 4.4.5 on Linux and gcc-llvm on Mac OSX(Xcode 4.2.1) and this. The below are the source and the generated disassembly of the relevant functions. (Added: compiled with gcc -O2 main.c)
#include <stdio.h>
__attribute__((noinline))
static void g(long num)
{
long m, n;
printf("%p %ld\n", &m, n);
return g(num-1);
}
__attribute__((noinline))
static void h(long num)
{
long m, n;
printf("%ld %ld\n", m, n);
return h(num-1);
}
__attribute__((noinline))
static void f(long * num)
{
scanf("%ld", num);
g(*num);
h(*num);
return f(num);
}
int main(void)
{
printf("int:%lu long:%lu unsigned:%lu\n", sizeof(int), sizeof(long), sizeof(unsigned));
long num;
f(&num);
return 0;
}
08048430 <g>:
8048430: 55 push %ebp
8048431: 89 e5 mov %esp,%ebp
8048433: 53 push %ebx
8048434: 89 c3 mov %eax,%ebx
8048436: 83 ec 24 sub $0x24,%esp
8048439: 8d 45 f4 lea -0xc(%ebp),%eax
804843c: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048443: 00
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 d0 85 04 08 movl $0x80485d0,(%esp)
804844f: e8 f0 fe ff ff call 8048344 <printf#plt>
8048454: 8d 43 ff lea -0x1(%ebx),%eax
8048457: e8 d4 ff ff ff call 8048430 <g>
804845c: 83 c4 24 add $0x24,%esp
804845f: 5b pop %ebx
8048460: 5d pop %ebp
8048461: c3 ret
8048462: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
8048469: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
08048470 <h>:
8048470: 55 push %ebp
8048471: 89 e5 mov %esp,%ebp
8048473: 83 ec 18 sub $0x18,%esp
8048476: 66 90 xchg %ax,%ax
8048478: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
804847f: 00
8048480: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
8048487: 00
8048488: c7 04 24 d8 85 04 08 movl $0x80485d8,(%esp)
804848f: e8 b0 fe ff ff call 8048344 <printf#plt>
8048494: eb e2 jmp 8048478 <h+0x8>
8048496: 8d 76 00 lea 0x0(%esi),%esi
8048499: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
080484a0 <f>:
80484a0: 55 push %ebp
80484a1: 89 e5 mov %esp,%ebp
80484a3: 53 push %ebx
80484a4: 89 c3 mov %eax,%ebx
80484a6: 83 ec 14 sub $0x14,%esp
80484a9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
80484b0: 89 5c 24 04 mov %ebx,0x4(%esp)
80484b4: c7 04 24 e1 85 04 08 movl $0x80485e1,(%esp)
80484bb: e8 94 fe ff ff call 8048354 <__isoc99_scanf#plt>
80484c0: 8b 03 mov (%ebx),%eax
80484c2: e8 69 ff ff ff call 8048430 <g>
80484c7: 8b 03 mov (%ebx),%eax
80484c9: e8 a2 ff ff ff call 8048470 <h>
80484ce: eb e0 jmp 80484b0 <f+0x10>
We can see that g() and h() are mostly identical except the & (address of) operator beside the argument m of printf()(and the irrelevant %ld and %p).
However, h() is tail-call optimized and g() is not. Why?
In g(), you're taking the address of a local variable and passing it to a function. A "sufficiently smart compiler" should realize that printf does not store that pointer. Instead, gcc and llvm assume that printf might store the pointer somewhere, so the call frame containing m might need to be "live" further down in the recursion. Therefore, no TCO.
It's the & that does it. It tells the compiler that m should be stored on the stack. Even though it is passed to printf, the compiler has to assume that it might be accessed by somebody else and thus must the cleaned from the stack after the call to g.
In this particular case, as printf is known by the compiler (and it knows that it does not save pointers), it could probably be taught to perform this optimization.
For more info on this, look up 'escape anlysis'.
I have written a simple Hello World program.
#include <stdio.h>
int main() {
printf("Hello World");
return 0;
}
I wanted to understand how the relocatable object file and executable file look like.
The object file corresponding to the main function is
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <main+0x13>
13: b8 00 00 00 00 mov $0x0,%eax
18: c9 leaveq
19: c3 retq
Here the function call for printf is callq 13. One thing i don't understand is why is it 13. That means call the function at adresss 13, right??. 13 has the next instruction, right?? Please explain me what does this mean??
The executable code corresponding to main is
00000000004004cc <main>:
4004cc: 55 push %rbp
4004cd: 48 89 e5 mov %rsp,%rbp
4004d0: bf dc 05 40 00 mov $0x4005dc,%edi
4004d5: b8 00 00 00 00 mov $0x0,%eax
4004da: e8 e1 fe ff ff callq 4003c0 <printf#plt>
4004df: b8 00 00 00 00 mov $0x0,%eax
4004e4: c9 leaveq
4004e5: c3 retq
Here it is callq 4003c0. But the binary instruction is e8 e1 fe ff ff. There is nothing that corresponds to 4003c0. What is that i am getting wrong?
Thanks.
Bala
In the first case, take a look at the instruction encoding - it's all zeroes where the function address would go. That's because the object hasn't been linked yet, so the addresses for external symbols haven't been hooked up yet. When you do the final link into the executable format, the system sticks another placeholder in there, and then the dynamic linker will finally add the correct address for printf() at runtime. Here's a quick example for a "Hello, world" program I wrote.
First, the disassembly of the object file:
00000000 <_main>:
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
11: e8 00 00 00 00 call 16 <_main+0x16>
16: c7 04 24 00 00 00 00 movl $0x0,(%esp)
1d: e8 00 00 00 00 call 22 <_main+0x22>
22: b8 00 00 00 00 mov $0x0,%eax
27: 83 c4 04 add $0x4,%esp
2a: 59 pop %ecx
2b: 5d pop %ebp
2c: 8d 61 fc lea -0x4(%ecx),%esp
2f: c3 ret
Then the relocations:
main.o: file format pe-i386
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000012 DISP32 ___main
00000019 dir32 .rdata
0000001e DISP32 _puts
As you can see there's a relocation there for _puts, which is what the call to printf turned into. That relocation will get noticed at link time and fixed up. In the case of dynamic library linking, the relocations and fixups might not get fully resolved until the program is running, but you'll get the idea from this example, I hope.
The target of the call in the E8 instruction (call) is specified as relative offset from the current instruction pointer (IP) value.
In your first code sample the offset is obviously 0x00000000. It basically says
call +0
The actual address of printf is not known yet, so the compiler just put the 32-bit value 0x00000000 there as a placeholder.
Such incomplete call with zero offset will naturally be interpreted as the call to the current IP value. On your platform, the IP is pre-incremented, meaning that when some instruction is executed, the IP contains the address of the next instruction. I.e. when instruction at the address 0xE is executed the IP contains value 0x13. And the call +0 is naturally interpreted as the call to instruction 0x13. This is why you see that 0x13 in the disassembly of the incomplete code.
Once the code is complete, the placeholder 0x00000000 offset is replaced with the actual offset of printf function in the code. The offset can be positive (forward) or negative (backward). In your case the IP at the moment of the call is 0x4004DF, while the address of printf function is 0x4003C0. For this reason, the machine instruction will contain a 32-bit offset value equal to 0x4003C0 - 0x4004DF, which is negative value -287. So what you see in the code is actually
call -287
-287 is 0xFFFFFEE1 in binary. This is exactly what you see in your machine code. It is just that the tool you are using displayed it backwards.
Calls are relative in x86, IIRC if you have e8 , the call location is addr+5.
e1 fe ff ff a is little endian encoded relative jump. It really means fffffee1.
Now add this to the address of the call instruction + 5:
(0xfffffee1 + 0x4004da + 5) % 2**32 = 0x4003c0