When I run the following C code, I get different output depending on whether or not the code was run with optimization turned on (gcc -O) or not.
#include <stdio.h>
int main()
{
int b = 55;
int a[2] = {4, 5};
int index;
printf(" index a[index]\n ");
printf("==================\n ");
for(index = 0; index < 6; index++)
{
printf("%2d%12d\n", index, a[index]);
}
return 0;
}
I understand that accessing an index out-of-bounds in C will simply access the stack memory further down from the array (assuming there is enough stack space allocated for that index, otherwise it segfaults) because arrays are just pointers in C. But how does the optimization affect this?
Accessing out-of-bounds is undefined behavior. So the compiler is allowed to do anything it wants and anything is allowed to happen. So there isn't much of a point in trying to "guess" what will happen.
In your case, optimization is probably affecting the ordering and contents of the stack beyond the array. This would give you the varying results.
You're causing undefined behaviour, plain and simple. You can't really say "why does this undefined behaviour cause this result under this circumstance, but a different result under another?"
Use objdump or gdb if you want to see what instructions are causing the stack to be different under optimization.
EDIT: For example, when compiling with the -O flag, there's quite a few differences to the stack at the beginning of main alone before the first printf (compiled as 32-bit for clarity):
Unoptimized:
0x080483c4 <+0>: push ebp
0x080483c5 <+1>: mov ebp,esp
0x080483c7 <+3>: and esp,0xfffffff0
0x080483ca <+6>: sub esp,0x20
0x080483cd <+9>: mov DWORD PTR [esp+0x18],0x37
0x080483d5 <+17>: mov DWORD PTR [esp+0x10],0x4
0x080483dd <+25>: mov DWORD PTR [esp+0x14],0x5
0x080483e5 <+33>: mov eax,0x8048514
0x080483ea <+38>: mov DWORD PTR [esp],eax
0x080483ed <+41>: call 0x80482e0 <printf#plt>
Optimized:
0x080483c4 <+0>: push ebp
0x080483c5 <+1>: mov ebp,esp
0x080483c7 <+3>: push ebx
0x080483c8 <+4>: and esp,0xfffffff0
0x080483cb <+7>: sub esp,0x20
0x080483ce <+10>: mov DWORD PTR [esp+0x18],0x4
0x080483d6 <+18>: mov DWORD PTR [esp+0x1c],0x5
0x080483de <+26>: mov DWORD PTR [esp],0x8048504
0x080483e5 <+33>: call 0x80482e0 <printf#plt>
Related
I'm learning about basic buffer overflows, and I have the following C code:
int your_fcn()
{
char buffer[4];
int *ret;
ret = buffer + 8;
(*ret) += 16;
return 1;
}
int main()
{
int mine = 0;
int yours = 0;
yours = your_fcn();
mine = yours + 1;
if(mine > yours)
printf("You lost!\n");
else
printf("You won!\n");
return EXIT_SUCCESS;
}
My goal is to bypass the line mine = yours + 1;, skip straight to the if statement comparison, so I can "win". main() cannot be touched, only your_fcn() can.
My approach is to override the return address with a buffer overflow. So in this case, I identified that the return address should be 8 bytes away from buffer, since buffer is 4 bytes and EBP is 4 bytes. I then used gdb to identify that the line I want to jump to is 16 bytes away from the function call. Here is the result from gdb:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000054a <+0>: lea 0x4(%esp),%ecx
0x0000054e <+4>: and $0xfffffff0,%esp
0x00000551 <+7>: pushl -0x4(%ecx)
0x00000554 <+10>: push %ebp
0x00000555 <+11>: mov %esp,%ebp
0x00000557 <+13>: push %ebx
0x00000558 <+14>: push %ecx
0x00000559 <+15>: sub $0x10,%esp
0x0000055c <+18>: call 0x420 <__x86.get_pc_thunk.bx>
0x00000561 <+23>: add $0x1a77,%ebx
0x00000567 <+29>: movl $0x0,-0xc(%ebp)
0x0000056e <+36>: movl $0x0,-0x10(%ebp)
0x00000575 <+43>: call 0x51d <your_fcn>
0x0000057a <+48>: mov %eax,-0x10(%ebp)
0x0000057d <+51>: mov -0x10(%ebp),%eax
0x00000580 <+54>: add $0x1,%eax
0x00000583 <+57>: mov %eax,-0xc(%ebp)
0x00000586 <+60>: mov -0xc(%ebp),%eax
0x00000589 <+63>: cmp -0x10(%ebp),%eax
0x0000058c <+66>: jle 0x5a2 <main+88>
0x0000058e <+68>: sub $0xc,%esp
0x00000591 <+71>: lea -0x1988(%ebx),%eax
I see the line 0x00000575 <+43>: call 0x51d <your_fcn> and 0x00000583 <+57>: mov %eax,-0xc(%ebp) are four lines away from each other, which tells me I should offset ret by 16 bytes. But the address from gdb says something different. That is, the function call starts on 0x00000575 and the line I want to jump to is on 0x00000583, which means that they are 15 bytes away?
Either way, whether I use 16 bytes or 15 bytes, I get a segmentation fault error and I still "lose".
Question: What am I doing wrong? Why don't the address given in gdb go by 4 bytes at a time and what's actually going on here. How can I correctly jump to the line I want?
Clarification: This is being done on a x32 machine on a VM running linux Ubuntu. I'm compiling with the command gcc -fno-stack-protector -z execstack -m32 -g guesser.c -o guesser.o, which turns stack protector off and forces x32 compilation.
gdb of your_fcn() as requested:
(gdb) disassemble your_fcn
Dump of assembler code for function your_fcn:
0x0000051d <+0>: push %ebp
0x0000051e <+1>: mov %esp,%ebp
0x00000520 <+3>: sub $0x10,%esp
0x00000523 <+6>: call 0x5c3 <__x86.get_pc_thunk.ax>
0x00000528 <+11>: add $0x1ab0,%eax
0x0000052d <+16>: lea -0x8(%ebp),%eax
0x00000530 <+19>: add $0x8,%eax
0x00000533 <+22>: mov %eax,-0x4(%ebp)
0x00000536 <+25>: mov -0x4(%ebp),%eax
0x00000539 <+28>: mov (%eax),%eax
0x0000053b <+30>: lea 0xc(%eax),%edx
0x0000053e <+33>: mov -0x4(%ebp),%eax
0x00000541 <+36>: mov %edx,(%eax)
0x00000543 <+38>: mov $0x1,%eax
0x00000548 <+43>: leave
0x00000549 <+44>: ret
x86 has variable length instructions, so you cannot simply count instructions and multiply by 4. Since you have the output from gdb, trust it to determine the address of each instruction.
The return address from the function is the address after the call instruction. In the code shown, this would be main+48.
The if statement starts at main+60, not main+57. The instruction at main+57 stores yours+1 into mine. So to adjust the return address to return to the if statement, you should add 12 (that is, 60 - 48).
Doing that skips the assignments to both yours and mine. Since they are both initialized to 0, it will print "You won".
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I'm doing a security project for my school.
For this project I have a binary, and I have to do 2 things, make a pseudo code of this binary and do the exploit.
To get better in ASM I'm trying to do exactly the same source code in c. I have a problem with edx in the main. I have no idee how to do this in c:
0x080484a5 <+41>: mov edx,0x8048468
This is the full main code:
Dump of assembler code for function main:
0x0804847c <+0>: push ebp
0x0804847d <+1>: mov ebp,esp
0x0804847f <+3>: and esp,0xfffffff0
0x08048482 <+6>: sub esp,0x20
0x08048485 <+9>: mov DWORD PTR [esp],0x40
0x0804848c <+16>: call 0x8048350 <malloc#plt>
0x08048491 <+21>: mov DWORD PTR [esp+0x1c],eax
0x08048495 <+25>: mov DWORD PTR [esp],0x4
0x0804849c <+32>: call 0x8048350 <malloc#plt>
0x080484a1 <+37>: mov DWORD PTR [esp+0x18],eax
0x080484a5 <+41>: mov edx,0x8048468
0x080484aa <+46>: mov eax,DWORD PTR [esp+0x18]
0x080484ae <+50>: mov DWORD PTR [eax],edx
0x080484b0 <+52>: mov eax,DWORD PTR [ebp+0xc]
0x080484b3 <+55>: add eax,0x4
0x080484b6 <+58>: mov eax,DWORD PTR [eax]
0x080484b8 <+60>: mov edx,eax
0x080484ba <+62>: mov eax,DWORD PTR [esp+0x1c]
0x080484be <+66>: mov DWORD PTR [esp+0x4],edx
0x080484c2 <+70>: mov DWORD PTR [esp],eax
0x080484c5 <+73>: call 0x8048340 <strcpy#plt>
0x080484ca <+78>: mov eax,DWORD PTR [esp+0x18]
0x080484ce <+82>: mov eax,DWORD PTR [eax]
0x080484d0 <+84>: call eax
0x080484d2 <+86>: leave
0x080484d3 <+87>: ret
Can you help me to find how to do the line main + 41 please :) ?
Thank you
0x8048468 is probably a pointer value based on the number. Maybe a function pointer, because it's in the page above 0x8048350 (the PLT entry for malloc). But maybe just a pointer to a static buffer (maybe a read-only buffer, like a string literal).
So perhaps void *edx = "hello world"; or void *edx = &some_function, and then use it somehow. C statements don't map to single asm instructions, but with un-optimized output (gcc -O0), each C statement does map to a contiguous block of instructions that finishes with all values in memory. (This means you can modify C variables with a debugger and still have it "work" in un-optimized code.)
I didn't trace through the mess of store/reload that looks like un-optimized code, so I'm not sure what exactly is being done with that value after it's stored to memory in the next instruction after the mov-immediate.
Look at your compiler's asm output if you have source (gcc -S instead of compiling all the way to a binary and then disassemblign), or use objdump -drwC -Mintel to get relocation info for that value if there is any. Or use nm to look for it in the symbol table.
If it is a function pointer, the disassembly for that address should make some sense.
This is my C function:
int reg(struct my_callback_struct *p, int data)
{
return p->data = data;
}
This is the same in assembly:
0x000000000040057d <+0>: push %rbp
0x000000000040057e <+1>: mov %rsp,%rbp
0x0000000000400581 <+4>: mov %rdi,-0x8(%rbp)
0x0000000000400585 <+8>: mov %esi,-0xc(%rbp)
0x0000000000400588 <+11>: mov -0x8(%rbp),%rax
0x000000000040058c <+15>: mov -0xc(%rbp),%edx
0x000000000040058f <+18>: mov %edx,(%rax)
0x0000000000400591 <+20>: mov -0x8(%rbp),%rax
0x0000000000400595 <+24>: mov (%rax),%eax
0x0000000000400597 <+26>: pop %rbp
0x0000000000400598 <+27>: retq
I think I understand what's going on. $rdi holds the pointer (address) and $esi the number 12.
This is how I called the function:
p->callback_func(p,12);
What I don't understand is:
0x0000000000400591 <+20>: mov -0x8(%rbp),%rax
Because on <+11> we have already filled $rax with the pointer address. Why load it twice?
Indeed, the code is correct in that the instructions perform the functions called for by the C code. But not even the most trivial of optimizations have been performed.
This is easily corrected by turning on some level of compiler optimization. Probably the first level will clean up redundant loads no matter which compiler is used.
Note that extreme levels of optimization can result in correct code which is very difficult to follow.
When I have a loop like:
for (int i = 0; i < SlowVariable; i++)
{
//
}
I know that in VB6 the SlowVariable is accessed every iteration of the loop, making the following much more efficient:
int cnt = SlowVariable;
for (int i = 0; i < cnt; i++)
{
//
}
Do I need the make the same optimizations in GCC? Or does it evaluate SlowVariable only once?
This is called "hoisting" SlowVariabele out of the loop.
The compiler can do it only if it can prove that the value of SlowVariabele is the same every time, and that evaluating SlowVariabele has no side-effects.
So for example consider the following code (I assume for the sake of example that accessing through a pointer is "slow" for some reason):
void foo1(int *SlowVariabele, int *result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
The compiler cannot (in general) hoist, because for all it knows it will be called with result == SlowVariabele, and so the value of *SlowVariabele is changing during the loop.
On the other hand:
void foo2(int *result) {
int val = 12;
int *SlowVariabele = &val;
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
Now at least in principle, the compiler can know that val never changes in the loop, and so it can hoist. Whether it actually does so is a matter of how aggressive the optimizer is and how good its analysis of the function is, but I'd expect any serious compiler to be capable of it.
Similarly, if foo1 was called with pointers that the compiler can determine (at the call site) are non-equal, and if the call is inlined, then the compiler could hoist. That's what restrict is for:
void foo3(int *restrict SlowVariabele, int *restrict result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
restrict (introduced in C99) means "you must not call this function with result == SlowVariabele", and allows the compiler to hoist.
Similarly:
void foo4(int *SlowVariabele, float *result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
The strict aliasing rules mean that SlowVariable and result must not refer to the same location (or the program has undefined behaviour anyway), and so again the compiler can hoist.
Generally, variables can't be slow (or fast) unless they are mapped to some weird kind of memory (you usually want to declare them volatile in this case).
But indeed, using a local variable creates more opportunities for optimization, and the effect may be very visible. The compiler can "cache" a global variable by itself only if it's able to prove that no function called within a loop can read or write that global variable. When you call an external function within a loop, the compiler probably won't be able to prove such a thing.
This depends on how the compiler optimize, for example here:
#include <stdio.h>
int main(int argc, char **argv)
{
unsigned int i;
unsigned int z = 10;
for( i = 0 ; i < z ; i++ )
printf("%d\n", i);
return 0;
}
If you compiled it using gcc example.c -o example, the result code will be:
0x0040138c <+0>: push ebp
0x0040138d <+1>: mov ebp,esp
0x0040138f <+3>: and esp,0xfffffff0
0x00401392 <+6>: sub esp,0x20
0x00401395 <+9>: call 0x4018f4 <__main>
0x0040139a <+14>: mov DWORD PTR [esp+0x18],0xa
0x004013a2 <+22>: mov DWORD PTR [esp+0x1c],0x0
0x004013aa <+30>: jmp 0x4013c4 <main+56>
0x004013ac <+32>: mov eax,DWORD PTR [esp+0x1c]
0x004013b0 <+36>: mov DWORD PTR [esp+0x4],eax
0x004013b4 <+40>: mov DWORD PTR [esp],0x403064
0x004013bb <+47>: call 0x401b2c <printf>
0x004013c0 <+52>: inc DWORD PTR [esp+0x1c]
0x004013c4 <+56>: mov eax,DWORD PTR [esp+0x1c] ; (1)
0x004013c8 <+60>: cmp eax,DWORD PTR [esp+0x18] ; (2)
0x004013cc <+64>: jb 0x4013ac <main+32>
0x004013ce <+66>: mov eax,0x0
0x004013d3 <+71>: leave
0x004013d4 <+72>: ret
0x004013d5 <+73>: nop
0x004013d6 <+74>: nop
0x004013d7 <+75>: nop
The value of i will be movied from the stack into eax.
Then the CPU will compare eax or i, with the value of z, which is in the stack.
All of this happen on every round.
If you optimized the code using gcc -O2 example.c -o example, the result will be:
0x00401b70 <+0>: push ebp
0x00401b71 <+1>: mov ebp,esp
0x00401b73 <+3>: push ebx
0x00401b74 <+4>: and esp,0xfffffff0
0x00401b77 <+7>: sub esp,0x10
0x00401b7a <+10>: call 0x4018a8 <__main>
0x00401b7f <+15>: xor ebx,ebx
0x00401b81 <+17>: lea esi,[esi+0x0]
0x00401b84 <+20>: mov DWORD PTR [esp+0x4],ebx
0x00401b88 <+24>: mov DWORD PTR [esp],0x403064
0x00401b8f <+31>: call 0x401ae0 <printf>
0x00401b94 <+36>: inc ebx
0x00401b95 <+37>: cmp ebx,0xa ; (1)
0x00401b98 <+40>: jne 0x401b84 <main+20>
0x00401b9a <+42>: xor eax,eax
0x00401b9c <+44>: mov ebx,DWORD PTR [ebp-0x4]
0x00401b9f <+47>: leave
0x00401ba0 <+48>: ret
0x00401ba1 <+49>: nop
0x00401ba2 <+50>: nop
0x00401ba3 <+51>: nop
The compiler knows that there is no point of checking the value of z, so it modifies the code to something like for( i = 0 ; i < 10 ; i++ ).
In case the compiler doesn't konw the value of z like in this code:
#include <stdio.h>
void loop(unsigned int z) {
unsigned int i;
for( i = 0 ; i < z ; i++ )
printf("%d\n", i);
}
int main(int argc, char **argv)
{
unsigned int z = 10;
loop(z);
return 0;
}
The result will be:
0x0040138c <+0>: push esi
0x0040138d <+1>: push ebx
0x0040138e <+2>: sub esp,0x14
0x00401391 <+5>: mov esi,DWORD PTR [esp+0x20] ; (1)
0x00401395 <+9>: test esi,esi
0x00401397 <+11>: je 0x4013b1 <loop+37>
0x00401399 <+13>: xor ebx,ebx ; (2)
0x0040139b <+15>: nop
0x0040139c <+16>: mov DWORD PTR [esp+0x4],ebx
0x004013a0 <+20>: mov DWORD PTR [esp],0x403064
0x004013a7 <+27>: call 0x401b0c <printf>
0x004013ac <+32>: inc ebx
0x004013ad <+33>: cmp ebx,esi
0x004013af <+35>: jne 0x40139c <loop+16>
0x004013b1 <+37>: add esp,0x14
0x004013b4 <+40>: pop ebx
0x004013b5 <+41>: pop esi
0x004013b6 <+42>: ret
0x004013b7 <+43>: nop
z will endup in some unused register esi, registers are the fastest storage classed.
There is no local variable i, on the stack, the compiler used ebx to store the value of i, also register.
After all, it depends on the compiler and the optimization options you use, but, in all cases, C still faster, much faster, than VB.
It depends on your compiler, but I believe most of the contemporary compilers will optimize that for you if the value of SlowVariable is constant.
"It" (the language) doesn't say. It must behave as if the variable is evaluated every time, of course.
An optimizing compiler can do a lot of clever things, so it's always best to leave these sorts of micro-optimizations to the compiler.
If you're going down the optimization by hand route, be sure to profile (=measure) and read the generated code.
Actually it depends on "SlowVariable" and on the behavior of your compiler. If your slow variable is e.g. volatile the compiler won't do any effort to cache it as the keyword volatile won't permit it. If it's not "volatile" there is a good chance that the compiler optimizes consecutive accesses to this variable by loading it once into the register.
The code below is from the well-known article Smashing The Stack For Fun And Profit.
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret)+=8;
}
void main() {
int x;
x=0;
function(1,2,3);
x=1;
printf("%d\n",x);
}
I think I must explain my target of this code.
The stack model is below. The number below the word is the number of bytes of the variable in the stack. So, if I want to rewrite RET to skip the statement I want, I calculate the offset from buffer1 to RET is 8+4=12. Since the architecture is x86 Linux.
buffer2 buffer1 BSP RET a b c
(12) (8) (4) (4) (4) (4) (4)
I want to skip the statement x=1; and let printf() output 0 on the screen.
I compile the code with:
gcc stack2.c -g
and run it in gdb:
gdb ./a.out
gdb gives me the result like this:
Program received signal SIGSEGV, Segmentation fault.
main () at stack2.c:17
17 x = 1;
I think Linux uses some mechanism to protect against stack overflow. Maybe Linux stores the RET address in another place and compares the RET address in the stack before functions return.
And what is the detail about the mechanism? How should I rewrite the code to make the program output 0?
OK,the disassemble code is below.It comes form the output of gdb since I think is more easy to read for you.And anybody can tell me how to paste a long code sequence?Copy and paste one by one makes me too tired...
Dump of assembler code for function main:
0x08048402 <+0>: push %ebp
0x08048403 <+1>: mov %esp,%ebp
0x08048405 <+3>: sub $0x10,%esp
0x08048408 <+6>: movl $0x0,-0x4(%ebp)
0x0804840f <+13>: movl $0x3,0x8(%esp)
0x08048417 <+21>: movl $0x2,0x4(%esp)
0x0804841f <+29>: movl $0x1,(%esp)
0x08048426 <+36>: call 0x80483e4 <function>
0x0804842b <+41>: movl $0x1,-0x4(%ebp)
0x08048432 <+48>: mov $0x8048520,%eax
0x08048437 <+53>: mov -0x4(%ebp),%edx
0x0804843a <+56>: mov %edx,0x4(%esp)
0x0804843e <+60>: mov %eax,(%esp)
0x08048441 <+63>: call 0x804831c <printf#plt>
0x08048446 <+68>: mov $0x0,%eax
0x0804844b <+73>: leave
0x0804844c <+74>: ret
Dump of assembler code for function function:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: sub $0x14,%esp
0x080483ea <+6>: lea -0x9(%ebp),%eax
0x080483ed <+9>: add $0x3,%eax
0x080483f0 <+12>: mov %eax,-0x4(%ebp)
0x080483f3 <+15>: mov -0x4(%ebp),%eax
0x080483f6 <+18>: mov (%eax),%eax
0x080483f8 <+20>: lea 0x8(%eax),%edx
0x080483fb <+23>: mov -0x4(%ebp),%eax
0x080483fe <+26>: mov %edx,(%eax)
0x08048400 <+28>: leave
0x08048401 <+29>: ret
I check the assemble code and find some mistake about my program,and I have rewrite (*ret)+=8 to (*ret)+=7,since 0x08048432 <+48>minus0x0804842b <+41> is 7.
Because that article is from 1996 and the assumptions are incorrect.
Refer to "Smashing The Modern Stack For Fun And Profit"
http://www.ethicalhacker.net/content/view/122/24/
From the above link:
However, the GNU C Compiler (gcc) has evolved since 1998, and as a result, many people are left wondering why they can't get the examples to work for them, or if they do get the code to work, why they had to make the changes that they did.
The function function overwrites some place of the stack outside of its own, which is this case is the stack of main. What it overwrites I don't know, but it causes the segmentation fault you see. It might be some protection employed by the operating system, but it might as well be the generated code just does something wrong when wrong value is at that position on the stack.
This is a really good example of what may happen when you write outside of your allocated memory. It might crash directly, it might crash somewhere completely different, or if might not crash at all but instead just do some calculation wrong.
Try ret = buffer1 + 3;
Explanation: ret is an integer pointer; incrementing it by 1 adds 4 bytes to the address on 32bit machines.