This question already has answers here:
Segmentation fault in C and infinite loop - self calling main function
(4 answers)
Is there a limit of stack size of a process in linux
(4 answers)
GCC Inline Assembly: Jump to label outside block
(2 answers)
How does linux know when to allocate more pages to a call stack?
(1 answer)
How is Stack memory allocated when using 'push' or 'sub' x86 instructions?
(2 answers)
Closed 1 year ago.
I was playing with inline assembly, and I've noticed something strange. I've written a program which calls a wrapper function of jmp and executes in loop:
#include <stdint.h>
void asm_jmp(void* address)
{
__asm__("jmp\t*%%rax"
:
:"a" (address)
:);
}
int main()
{
asm_jmp(&main + 4);
}
I've opened it with gdb and I noticed it iterates hundreds and hundreds of times before giving segmentation fault. Maybe I'm missing something, but I don't see where there could be a problem in this program which causes it to segfault.
Initially I thought that calling asm_jmp in loop saturated the stack, since each call adds an address onto the stack, but there is no return to free the space occupied by that address. Is this the problem? Or there's something else?
Here is the assembly obtained with objdump:
0000000000001119 <asm_jmp>:
1119: 55 push %rbp
111a: 48 89 e5 mov %rsp,%rbp
111d: 48 89 7d f8 mov %rdi,-0x8(%rbp)
1121: 48 8b 45 f8 mov -0x8(%rbp),%rax
1125: ff e0 jmp *%rax
1127: 90 nop
1128: 5d pop %rbp
1129: c3 ret
000000000000112a <main>:
112a: 55 push %rbp
112b: 48 89 e5 mov %rsp,%rbp
112e: 48 8d 05 0d 00 00 00 lea 0xd(%rip),%rax # 1142 <main+0x18>
1135: 48 89 c7 mov %rax,%rdi
1138: e8 dc ff ff ff call 1119 <asm_jmp>
113d: b8 00 00 00 00 mov $0x0,%eax
1142: 5d pop %rbp
1143: c3 ret
1144: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
114b: 00 00 00
114e: 66 90 xchg %ax,%ax
I've, for a few hours, been trying to enlarge my understanding of Assembly Language, by trying to read and understand the instructions of a very simple program I wrote in C to initiate myself to how arguments were handled in ASM.
#include <stdio.h>
int say_hello();
int main(void) {
printf("say_hello() -> %d\n", say_hello(10, 20, 30, 40, 50, 60, 70, 80, 90, 100));
}
int say_hello(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j) {
printf("a:b:c:d:e:f:g:h:i:j -> %d:%d:%d:%d:%d:%d:%d:%d:%d:%d\n", a, b, c, d, e, f, g, h, i, j);
return 1000;
}
The program is as I said, very basic and contains two functions, the main and another one called say_hello which takes 10 arguments, from a to j and print each one of them in a printf call. I've tried doing the same process (So trying to understand the instructions and what's happening), with the same program and less arguments, I think I was able to understand most of it, but then I was wondering, "ok but what's happening if I have so many arguments, there isn't any more register available to store the value in"
So I went to look for how many registers were available and usable in my case, and I found out from this website that "only" (not sure, correct me if I'm wrong) the following registers could be used in my case to store argument values in them edi, esi, r8d, r9d, r10d, r11d, edx, ecx, which is 8, so I went to modify my C program and I added a few more arguments, so that I reach the 8 limit, I even added one more, I don't really know why, let's say just in case.
So when I compiled my program using gcc with no optimization related option whatsoever, I was expecting the main() function to push the values that were left after all the 8 registers have been used, but I wasn't expecting anything from the say_hello() method, that's pretty much why I tried this out in the first place.
So I went to compile my program, then disassembled it using the objdump command (More specifically, this is the full command I used: objdump -d -M intel helloworld) and I started looking for my main method, which was doing pretty much as I expected
000000000000064a <main>:
64a: 55 push rbp
64b: 48 89 e5 mov rbp,rsp
64e: 6a 64 push 0x64
650: 6a 5a push 0x5a
652: 6a 50 push 0x50
654: 6a 46 push 0x46
656: 41 b9 3c 00 00 00 mov r9d,0x3c
65c: 41 b8 32 00 00 00 mov r8d,0x32
662: b9 28 00 00 00 mov ecx,0x28
667: ba 1e 00 00 00 mov edx,0x1e
66c: be 14 00 00 00 mov esi,0x14
671: bf 0a 00 00 00 mov edi,0xa
676: b8 00 00 00 00 mov eax,0x0
67b: e8 1e 00 00 00 call 69e <say_hello>
680: 48 83 c4 20 add rsp,0x20
684: 89 c6 mov esi,eax
686: 48 8d 3d 0b 01 00 00 lea rdi,[rip+0x10b] # 798 <_IO_stdin_used+0x8>
68d: b8 00 00 00 00 mov eax,0x0
692: e8 89 fe ff ff call 520 <printf#plt>
697: b8 00 00 00 00 mov eax,0x0
69c: c9 leave
69d: c3 ret
So it, as I expected pushed the values that were left after all the registers had been used into the stack, and then just did the usual work to pass values from one method to another. But then I went to look for the say_hello method, and it got me really confused.
000000000000069e <say_hello>:
69e: 55 push rbp
69f: 48 89 e5 mov rbp,rsp
6a2: 48 83 ec 20 sub rsp,0x20
6a6: 89 7d fc mov DWORD PTR [rbp-0x4],edi
6a9: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
6ac: 89 55 f4 mov DWORD PTR [rbp-0xc],edx
6af: 89 4d f0 mov DWORD PTR [rbp-0x10],ecx
6b2: 44 89 45 ec mov DWORD PTR [rbp-0x14],r8d
6b6: 44 89 4d e8 mov DWORD PTR [rbp-0x18],r9d
6ba: 44 8b 45 ec mov r8d,DWORD PTR [rbp-0x14]
6be: 8b 7d f0 mov edi,DWORD PTR [rbp-0x10]
6c1: 8b 4d f4 mov ecx,DWORD PTR [rbp-0xc]
6c4: 8b 55 f8 mov edx,DWORD PTR [rbp-0x8]
6c7: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
6ca: 48 83 ec 08 sub rsp,0x8
6ce: 8b 75 28 mov esi,DWORD PTR [rbp+0x28]
6d1: 56 push rsi
6d2: 8b 75 20 mov esi,DWORD PTR [rbp+0x20]
6d5: 56 push rsi
6d6: 8b 75 18 mov esi,DWORD PTR [rbp+0x18]
6d9: 56 push rsi
6da: 8b 75 10 mov esi,DWORD PTR [rbp+0x10]
6dd: 56 push rsi
6de: 8b 75 e8 mov esi,DWORD PTR [rbp-0x18]
6e1: 56 push rsi
6e2: 45 89 c1 mov r9d,r8d
6e5: 41 89 f8 mov r8d,edi
6e8: 89 c6 mov esi,eax
6ea: 48 8d 3d bf 00 00 00 lea rdi,[rip+0xbf] # 7b0 <_IO_stdin_used+0x20>
6f1: b8 00 00 00 00 mov eax,0x0
6f6: e8 25 fe ff ff call 520 <printf#plt>
6fb: 48 83 c4 30 add rsp,0x30
6ff: b8 e8 03 00 00 mov eax,0x3e8
704: c9 leave
705: c3 ret
706: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
70d: 00 00 00
I'm really sorry in advance, I'm not exactly sure I really understand well what the square brackets do, but from what I've read and understand it's a way to "point" to the address containing the value I want (please correct me if I'm wrong), so for example mov DWORD PTR [rbp-0x4],edi moves the value in edi to the value at the address rsp-0x4, right?
I'm also not actually not sure why this process is required, can't the say_hello method just read edi for example and that's it? Why does the program have to move it into [rbp-0x4] and then re-reading it back from [rbp-0x4] to eax ?
So the program just goes on and reads every value it needs and put them into an available register, and when it reaches the point when there's no register left, it just starts moving all of them into esi and then pushing them onto the stack, then repeating the process until all the 10 arguments have been stored somewhere.
So that makes sense, I was satisfied and then just went to double check if I really had got it well, so I started reading from bottom to top, starting from 0x6ea to 0x6e2 so the sample I'm working on is
6e2: 45 89 c1 mov r9d,r8d
6e5: 41 89 f8 mov r8d,edi
6e8: 89 c6 mov esi,eax
6ea: 48 8d 3d bf 00 00 00 lea rdi,[rip+0xbf] # 7b0 <_IO_stdin_used+0x20>
So just like on all my previous tests, I was expecting the arguments to go in "reverse" like the first argument is the last instruction executed, and the last one the first instruction executed, so I started double checking every field.
So the first one, rdi was [rip+0x10b] which I thought for sure was pointing to my string.
So then I moved to 0x6e8, which moves eax which is currently equal to the value stored in [rbp-0x4], which is equal to edi as stated at 0x6a6, and edi is equal to 0xa (10) as stated on 0x671, so my first argument is my string, and the second one is 10, which is exactly what I expected.
But then when I jumped on the instruction executed right before 0x6e8, so 0x6e5 I was expecting it to be 20, so I did the same process. edi is moved to r8d and is currently equal to the value stored in [rbp-0x10] which is equal to ecx which is equal to, as stated at 0x662.. 40? What the heck? I'm confused, why would it be 40? Then I tried looking up the instruction right above that one, and found 50, and did the same for the next one, and again I found 60!! Why? Is the way I get those values wrong? Am I missing something in the instructions? Or did I just assume something by looking at my previous programs (which all had way less arguments, and were all in "reverse" like I said earlier) that I should not have?
I'm sorry if this is a dumb post, I'm very new to ASM (few hours of experience!) and just trying to get my mind cleared on that one, as I really can't figure it out alone. I'm also sorry if this post is too long, I was trying to include a lot of informations so that what I'm trying to do is clear, the result I get is clear, and what my problem is is clear aswell. Thanks a lot for reading and even a bigger thanks to anyone who will help!
I am new to Buffer Overflow exploits and I started with a simple C program.
Code
#include <stdio.h>
#include <strings.h>
void execs(void){
printf("yay!!");
}
void return_input (void)
{
char array[30];
gets(array);
}
int main()
{
return_input();
return 0;
}
Compilation stage
I compiled the above program with cc by disabling stack protector as:
cc test.c -o test -fno-stack-protector
The dump of the elf file using objdump is as follows :
0804843b <execs>:
804843b: 55 push %ebp
804843c: 89 e5 mov %esp,%ebp
804843e: 83 ec 08 sub $0x8,%esp
8048441: 83 ec 0c sub $0xc,%esp
8048444: 68 10 85 04 08 push $0x8048510
8048449: e8 b2 fe ff ff call 8048300 <printf#plt>
804844e: 83 c4 10 add $0x10,%esp
8048451: 90 nop
8048452: c9 leave
8048453: c3 ret
08048454 <return_input>:
8048454: 55 push %ebp
8048455: 89 e5 mov %esp,%ebp
8048457: 83 ec 28 sub $0x28,%esp
804845a: 83 ec 0c sub $0xc,%esp
804845d: 8d 45 da lea -0x26(%ebp),%eax
8048460: 50 push %eax
8048461: e8 aa fe ff ff call 8048310 <gets#plt>
8048466: 83 c4 10 add $0x10,%esp
8048469: 90 nop
804846a: c9 leave
804846b: c3 ret
0804846c <main>:
804846c: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048470: 83 e4 f0 and $0xfffffff0,%esp
8048473: ff 71 fc pushl -0x4(%ecx)
8048476: 55 push %ebp
8048477: 89 e5 mov %esp,%ebp
8048479: 51 push %ecx
804847a: 83 ec 04 sub $0x4,%esp
804847d: e8 d2 ff ff ff call 8048454 <return_input>
8048482: b8 00 00 00 00 mov $0x0,%eax
8048487: 83 c4 04 add $0x4,%esp
804848a: 59 pop %ecx
804848b: 5d pop %ebp
804848c: 8d 61 fc lea -0x4(%ecx),%esp
804848f: c3 ret
So, In order to exploit the buffer(array), we need to find the number of bytes allocated in the return_input stack frame which by looking at the dump,
lea -0x26(%ebp),%eax
is 0x26 in hex or roughly 38 in decimal. So, giving input as :
38+4(random chars)+(return addr of execs)
would execute the execs function. I used the following:
python -c 'print "a"*42+"\x3b\x84\x04\x08"' | ./test
But output Error was:
Segmentation fault(core dumped)
When I opened the core(core dumped file) using gdb, I could find that the segmentation fault was experienced when executing on the following address :
0xb76f2300
System information:
Ubuntu version : 16.10
Kernel version : 4.8.0-46-generic
Question?
What was I doing wrong in code?
I guess the reason is simple: you didn't halt/abort your program in the execs. That address 0xb76f2300 is on stack, so I suspect it is the return from the execs that fails when it tries to return to the value of the stored stack pointer.
That you don't see any message is because the stdout is line-buffered, and your message didn't have a new-line character, nor did you flush it explicitly; thus the yay!! will still be in the buffers.
Also, use a debugger.
I am reading this writeup on how to perform a ret2libc exploit. It states that the arguments passed to a function are stored at ebp+8.
Now, if I take a simple C program
#include <stdlib.h>
int main() {
system("/bin/sh");
}
And compile it:
gcc -m32 -o test_sh test_sh.c
and look at the disassembly of it with
objdump -d -M intel test_sh
0804840b <main>:
804840b: 8d 4c 24 04 lea ecx,[esp+0x4]
804840f: 83 e4 f0 and esp,0xfffffff0
8048412: ff 71 fc push DWORD PTR [ecx-0x4]
8048415: 55 push ebp
8048416: 89 e5 mov ebp,esp
8048418: 51 push ecx
8048419: 83 ec 04 sub esp,0x4
804841c: 83 ec 0c sub esp,0xc
804841f: 68 c4 84 04 08 push 0x80484c4
8048424: e8 b7 fe ff ff call 80482e0 <system#plt>
8048429: 83 c4 10 add esp,0x10
804842c: b8 00 00 00 00 mov eax,0x0
8048431: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]
8048434: c9 leave
8048435: 8d 61 fc lea esp,[ecx-0x4]
8048438: c3 ret
8048439: 66 90 xchg ax,ax
804843b: 66 90 xchg ax,ax
804843d: 66 90 xchg ax,ax
804843f: 90 nop
The line
804841f: 68 c4 84 04 08 push 0x80484c4
Pushes the address of the string "/bin/sh" onto the stack. Immediately afterwards the system#plt function is called. So how does one arrive to ebp+8 from the above output?
Help would be appreciated!
The arguments passed to a function are stored at ebp+8.
That's from the called function's point of view, not from the calling function's point of view. The calling function has its own arguments at ebp+8, and your main() does not use any of its arguments, hence, you don't see any use of ebp+8 in your main().
You can see ebp+8 being used as follows:
Try writing a second function, with arguments, and calling it from main(), instead of invoking system(). You still won't see any use of ebp+8 within main(), but you will see it being used in your second function.
Try declaring your main() to accept its char** argv argument, then try sending argv[0] to printf().
You don't because EBP + 8 is only relevant after the prologue in system#plt after creating a new procedure frame.
push ebp
mov ebp, esp
at this point in system#plt the contents of memory location pointed to by EBP + 8 will equal 0x80484C4.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I have heard that it is possible to obtain the source code from an executable if it was compiled with debugging (-g) enabled. Is this true? If so, how would one go about doing it?
You can't restore source code from binary executable.
You can use decompiler like REC Studio or Boomerang to convert dissassembled binary into c code, but this code won't be anything like initial code you compiled. It would be more like assembly written in C syntax. If your application is complicated, it probably won't be able to compile. Debugging symbols can help, but not a lot. Many information is lost during compilation and can't be restored.
You can use objdump utility.
objdump has option '-S' which also provides source code and respective assembly code. This also helps in debugging crashes (segmentation faults).
e.g. I have copied the code snippet and its objdump (just for example).
Only the main() of objdump is copied, it contains other info too.
Steps to do :
Suppose you wrote code in file temp.c .
gcc -g temp.c -o temp
objdump -DS temp > temp.dump
#include <stdio.h>
int main()
{
int a, *b;
a = 0;
b = 0;
printf("a=%d *b=%d", a, *b);
return 0;
}
0804841d <main>:
#include <stdio.h>
int main()
{
804841d: 55 push %ebp
804841e: 89 e5 mov %esp,%ebp
8048420: 83 e4 f0 and $0xfffffff0,%esp
8048423: 83 ec 20 sub $0x20,%esp
int a, *b;
a = 0;
8048426: c7 44 24 18 00 00 00 movl $0x0,0x18(%esp)
804842d: 00
b = 0;
804842e: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
8048435: 00
printf("a=%d *b=%d", a, *b);
8048436: 8b 44 24 1c mov 0x1c(%esp),%eax
804843a: 8b 00 mov (%eax),%eax
804843c: 89 44 24 08 mov %eax,0x8(%esp)
8048440: 8b 44 24 18 mov 0x18(%esp),%eax
8048444: 89 44 24 04 mov %eax,0x4(%esp)
8048448: c7 04 24 f0 84 04 08 movl $0x80484f0,(%esp)
804844f: e8 9c fe ff ff call 80482f0 <printf#plt>
return 0;
8048454: b8 00 00 00 00 mov $0x0,%eax
}
8048459: c9 leave
804845a: c3 ret
804845b: 66 90 xchg %ax,%ax
804845d: 66 90 xchg %ax,%ax
804845f: 90 nop