The title is actually my second problem.
2 problems arose when I was learning the CSAPP 2nd edition, chapter 3. There're 2 relative simple files. Here's the first one:
// code.c
int accum = 0;
int sum(int x, int y)
{
int t = x + y;
accum += t;
return t;
}
The second one:
// main.c
int main() {
return sum(1, 3);
}
I used gcc to compile them, following the book. In order to get a 32-bit program, I added an -m32 option(Mine is 64-bit Ubuntu):
$ gcc -m32 -O1 -O prog code.c main.c
Things were good up to this far. But when I disassembled it using GDB, it really confused me. I got the following result that conflicted with what the book says:
(gdb) disas sum
Dump of assembler code for function sum:
0x080483ed <+0>: mov 0x8(%esp),%eax
0x080483f1 <+4>: add 0x4(%esp),%eax
0x080483f5 <+8>: add %eax,0x804a020
0x080483fb <+14>: ret
End of assembler dump.
(gdb) disas main
Dump of assembler code for function main:
0x080483fc <+0>: push %ebp
0x080483fd <+1>: mov %esp,%ebp
0x080483ff <+3>: and $0xfffffff0,%esp
0x08048402 <+6>: sub $0x10,%esp
0x08048405 <+9>: movl $0x3,0x4(%esp)
0x0804840d <+17>: movl $0x1,(%esp)
0x08048414 <+24>: call 0x80483ed <sum>
0x08048419 <+29>: leave
0x0804841a <+30>: ret
End of assembler dump.
Now here are my problems:
Why isn't there any saving %ebp or moving %esp when calling the function sum as the book describes? Or is sum inlined by the compiler?
The value 3 & 1 were already stored in the M[%esp + 4] & M[%esp], respectively. And after calling the sum, there's no instruction that alters the value stored in %esp. But inside the sum, the first instruction retrieves M[%esp + 8], which is actually 3(I set the breakpoint using GDB &checked the value), while the M[%esp + 4] stores the value 1. How come? Later I set 2 breakpoints:
(gdb) break *0x08048414
(gdb) break sum
Then I found the value stored in %esp was different at these 2 breakpoints:
Breakpoint 6, 0x08048414 in main ()
(gdb) print $esp
$8 = (void *) 0xffffd020
(gdb) continue
Continuing.
Breakpoint 5, 0x080483ed in sum ()
(gdb) print $esp
$9 = (void *) 0xffffd01c
Why would this occur?
Why isn't there any saving %ebp or moving %esp when calling the function sum as the book describes?
You might have enabled option to omit frame pointer, most probably with -Ox compiler option. You can force GCC to still keep it with -fno-omit-frame-pointer GCC command line argument:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-O
But inside the sum, the first instruction retrieves M[%esp + 8], which is actually 3, while the M[%esp + 4] stores the value 1. How come?
Call instruction pushes eip register to the stack and moves esp. You compiled it in 32 bit mode, so the offset is 4 bytes.
Related
this is my gdb output. How can I make it write the line numbers, instead of ...227 to be main+1, as it shows when I disassemble it?
It is not clear exactly what you are asking since machine instruction address and source-code line number are not directly related. Possibly suited to your need is to use mixed source/disassembly. For example:
(gdb) disassemble /m main
Dump of assembler code for function main:
5 {
0x08048330 <+0>: push %ebp
0x08048331 <+1>: mov %esp,%ebp
0x08048333 <+3>: sub $0x8,%esp
0x08048336 <+6>: and $0xfffffff0,%esp
0x08048339 <+9>: sub $0x10,%esp
6 printf ("Hello.\n");
0x0804833c <+12>: movl $0x8048440,(%esp)
0x08048343 <+19>: call 0x8048284 <puts#plt>
7 return 0;
8 }
0x08048348 <+24>: mov $0x0,%eax
0x0804834d <+29>: leave
0x0804834e <+30>: ret
End of assembler dump.
This shows each line of source code ahead of the machine code disassembly associates with it. Both the source line numbers and instruction addresses and offsets are shown. Note that it is likely to be far less comprehensible if you apply optimisation as often code is eliminated or re-ordered such that it no longer has a direct correspondence to the source code order.
If rather you want to show the current program counter address/offset as you step, then that can be done with the display /i $pc command:
(gdb) display /i $pc
(gdb) run
Starting program: /home/a.out
Breakpoint 2, main () at main.c:13
13 printf("Hello World");
1: x/i $pc
=> 0x40053a <main+4>: mov $0x4005d4,%edi
(gdb) step
__printf (format=0x4005d4 "Hello World") at printf.c:28
28 printf.c: No such file or directory.
1: x/i $pc
=> 0x7ffff7a686b0 <__printf>: sub $0xd8,%rsp
(gdb)
For example in the following code
"justatest" and the format "%s" is defined in heap:
char str[15]="justatest";
int main(){
printf("%s",str);
return 0;
}
in GDB,i got the assembly code before call to printf as:
=> 0x0804841f <+14>: movl $0x804a020,0x4(%esp)
0x08048427 <+22>: movl $0x80484d8,(%esp)
0x0804842e <+29>: call 0x80482f0 <printf#plt>
Do i have to examine the parameter 1by1 using "x/s 0x804a020" and "x/s 0x80484d8"
or is there a Table of constants defined in heap that i can directly refer to?
thanks!
Your understanding about str reside on heap is not correct. Its global variable which gets stored into the data segment. Regarding your print global variable, you can do as follows on my GNU/Linux terminal.
$ gcc -g -Wall hello.c
$ gdb -q ./a.out
Reading symbols from /home/mantosh/practice/a.out...done.
(gdb) break main
Breakpoint 1 at 0x400524: file hello.c, line 6.
(gdb) run
Starting program: /home/mantosh/practice/a.out
Breakpoint 1, main () at bakwas.c:6
6 printf("%s",str);
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400520 <+0>: push %rbp
0x0000000000400521 <+1>: mov %rsp,%rbp
=> 0x0000000000400524 <+4>: mov $0x601020,%esi
0x0000000000400529 <+9>: mov $0x4005e4,%edi
0x000000000040052e <+14>: mov $0x0,%eax
0x0000000000400533 <+19>: callq 0x4003f0 <printf#plt>
0x0000000000400538 <+24>: mov $0x0,%eax
0x000000000040053d <+29>: pop %rbp
0x000000000040053e <+30>: retq
End of assembler dump.
(gdb) p str
$1 = "justatest\000\000\000\000\000"
(gdb) p &str
$2 = (char (*)[15]) 0x601020
// These are addresses of two arguments which would be passed in printf.
// From assembly instruction we can verify that before calling the printf
// these are getting stored into the registers.
(gdb) x/s 0x4005e4
0x4005e4: "%s"
(gdb) x/s 0x601020
0x601020 <str>: "justatest
later i found that for object files without a debugging symbols table
objdump -t obj
would contains most of the symbols of global variables/functions and their address
,and
objdump -D obj instead of -d
would include all sections such as .text/.data/.rodata instead of .text only
these two combined provided sufficient access to what i mentioned aboved, such as switch tables/const strings/global variables
i am currently working on gdb disassembly to help me understand more detail about the c program so i write a c program:
#include <stdio.h>
void swap(int a, int b){
int temp = a;
a = b;
b = temp;
}
void main(){
int a = 1,b = 2;
swap(a, b);
}
I use gdb and run disass /m main to get those:
(gdb) disass /m main
Dump of assembler code for function main:
8 void main(){
0x0000000000400492 <+0>: push %rbp
0x0000000000400493 <+1>: mov %rsp,%rbp
0x0000000000400496 <+4>: sub $0x10,%rsp
9 int a = 1,b = 2;
0x000000000040049a <+8>: movl $0x1,-0x8(%rbp)
0x00000000004004a1 <+15>: movl $0x2,-0x4(%rbp)
10 swap(a, b);
0x00000000004004a8 <+22>: mov -0x4(%rbp),%edx
0x00000000004004ab <+25>: mov -0x8(%rbp),%eax
0x00000000004004ae <+28>: mov %edx,%esi
0x00000000004004b0 <+30>: mov %eax,%edi
0x00000000004004b2 <+32>: callq 0x400474 <swap>
11 }
0x00000000004004b7 <+37>: leaveq
0x00000000004004b8 <+38>: retq
End of assembler dump.
My question is those -0x8(%rbp) means what?
A memory or a register?
I do know that 1 is store in -0x8(%rbp) and 2 is in -0x4(%rbp), How can i show the value in
thoes kind of 'place' ?
I try to use (gdb) p -0x8(%rbp) but get this:
A syntax error in expression, near `%rbp)'.
Registers in gdb can be referred with the prefix '$'
p *(int *)($rbp - 8)
RBP and RSP most likely refer to memory locations, specifically to stack. Other registers are more or less generic purpose registers and can point to memory too.
It means "the data stored when you subtract eight from the address stored in rbp". Try looking at the stack commands available in gdb: http://www.delorie.com/gnu/docs/gdb/gdb_41.html
The actually meaning of those structures such as -0x8(%rbp) depends on the architecture (or the assembly language). But in this case, -0x8(%rbp) is a memory address, probably value of %rbp minus 8.
In gdb, you can print the value of those memory address by doing something like
info r rbp
p *(int *)(value_of_rbp - 8)
How can I make GDB do extra dereferences in a printing function like
x/s?
When I try explicit dereferences in x/ I get the error "Attempt to
dereference a generic pointer". Using x/ multiple times works, since
each use includes an implicit dereference, but this is annoying since
I have to copy and paste each intermediate result.
Example
Consider the very useful C program, example.c:
#include <stdio.h>
int main(int argc, char **argv) {
printf("argv[0] = %s\n", argv[0]);
}
If I build it and load it into GDB, I see that argv is stored at
0xc(%ebp), since a double dererence of that is passed as the second
argument to printf (i.e. in 0x4(%esp)) on line 26:
$ gcc -o example example.c
$ gdb example
(gdb) disass main
Dump of assembler code for function main:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: and $0xfffffff0,%esp
0x080483ea <+6>: sub $0x10,%esp
0x080483ed <+9>: mov 0xc(%ebp),%eax
0x080483f0 <+12>: mov (%eax),%edx
0x080483f2 <+14>: mov $0x80484e0,%eax
0x080483f7 <+19>: mov %edx,0x4(%esp)
0x080483fb <+23>: mov %eax,(%esp)
0x080483fe <+26>: call 0x8048300 <printf#plt>
0x08048403 <+31>: leave
0x08048404 <+32>: ret
End of assembler dump.
I break at printf and run the program with arguments first and
second:
(gdb) break *main + 26
Breakpoint 1 at 0x80483fe
(gdb) run first second
Starting program: /var/tmp/SO-attempt-to-dereference-generic-pointer/example first second
I attempt to print argv[0] in GDB, but I get the "generic pointer"
error:
Breakpoint 1, 0x080483e5 in main ()
(gdb) x/s **(0xc + $ebp)
Attempt to dereference a generic pointer.
However, by using 'x/xw' to manually dereference a few times, I'm
eventually able to print argv[0] (and argv[1]):
(gdb) x/xw 0xc + $ebp
0xbfffeba4: 0xbfffec34
(gdb) x/xw 0xbfffec34
0xbfffec34: 0xbfffedc8
(gdb) x/s 0xbfffedc8
0xbfffedc8: "/var/tmp/SO-attempt-to-dereference-generic-pointer/example"
(gdb) x/xw 0xbfffec34 + 4
0xbfffec38: 0xbfffee03
(gdb) x/s 0xbfffee03
0xbfffee03: "first"
(gdb)
But this is annoying and indirect (as pointer programming is wont to be?)
The solution is to cast the pointers before dereferencing them.
For example, picking up where we left off above:
(gdb) x/s **((char ***) (0xc + $ebp))
0xbfffedc8: "/var/tmp/SO-attempt-to-dereference-generic-pointer/example"
(gdb) x/s *(*((char ***) (0xc + $ebp)) + 1)
0xbfffee03: "first"
(gdb) x/s *(*((char ***) (0xc + $ebp)) + 2)
0xbfffee09: "second"
Note that the stack address 0xc + $ebp is itself a pointer to the
contents of that stack location, and so we need char *** and not
char **.
I am trying to make the buffer exploitation example (example3.c from http://insecure.org/stf/smashstack.html) work on Debian Lenny 2.6 version. I know the gcc version and the OS version is different than the one used by Aleph One. I have disabled any stack protection mechanisms using -fno-stack-protector and sysctl -w kernel.randomize_va_space=0 arguments. To account for the differences in my setup and Aleph One's I introduced two parameters : offset1 -> Offset from buffer1 variable to the return address and offset2 -> how many bytes to jump to skip a statement. I tried to figure out these parameters by analyzing assembly code but was not successful. So, I wrote a shell script that basically runs the buffer overflow program with simultaneous values of offset1 and offset2 from (1-60). But much to my surprise I am still not able to break this program. It would be great if someone can guide me for the same. I have attached the code and assembly output for consideration. Sorry for the really long post :)
Thanks.
// Modified example3.c from Aleph One paper - Smashing the stack
void function(int a, int b, int c, int offset1, int offset2) {
char buffer1[5];
char buffer2[10];
int *ret;
ret = (int *)buffer1 + offset1;// how far is return address from buffer ?
(*ret) += offset2; // modify the value of return address
}
int main(int argc, char* argv[]) {
int x;
x = 0;
int offset1 = atoi(argv[1]);
int offset2 = atoi(argv[2]);
function(1,2,3, offset1, offset2);
x = 1; // Goal is to skip this statement using buffer overflow
printf("X : %d\n",x);
return 0;
}
-----------------
// Execute the buffer overflow program with varying offsets
#!/bin/bash
for ((i=1; i<=60; i++))
do
for ((j=1; j<=60; j++))
do
echo "`./test $i $j`"
done
done
-- Assembler output
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c2 <main+0>: lea 0x4(%esp),%ecx
0x080483c6 <main+4>: and $0xfffffff0,%esp
0x080483c9 <main+7>: pushl -0x4(%ecx)
0x080483cc <main+10>: push %ebp
0x080483cd <main+11>: mov %esp,%ebp
0x080483cf <main+13>: push %ecx
0x080483d0 <main+14>: sub $0x24,%esp
0x080483d3 <main+17>: movl $0x0,-0x8(%ebp)
0x080483da <main+24>: movl $0x3,0x8(%esp)
0x080483e2 <main+32>: movl $0x2,0x4(%esp)
0x080483ea <main+40>: movl $0x1,(%esp)
0x080483f1 <main+47>: call 0x80483a4 <function>
0x080483f6 <main+52>: movl $0x1,-0x8(%ebp)
0x080483fd <main+59>: mov -0x8(%ebp),%eax
0x08048400 <main+62>: mov %eax,0x4(%esp)
0x08048404 <main+66>: movl $0x80484e0,(%esp)
0x0804840b <main+73>: call 0x80482d8 <printf#plt>
0x08048410 <main+78>: mov $0x0,%eax
0x08048415 <main+83>: add $0x24,%esp
0x08048418 <main+86>: pop %ecx
0x08048419 <main+87>: pop %ebp
0x0804841a <main+88>: lea -0x4(%ecx),%esp
0x0804841d <main+91>: ret
End of assembler dump.
(gdb) disassemble function
Dump of assembler code for function function:
0x080483a4 <function+0>: push %ebp
0x080483a5 <function+1>: mov %esp,%ebp
0x080483a7 <function+3>: sub $0x20,%esp
0x080483aa <function+6>: lea -0x9(%ebp),%eax
0x080483ad <function+9>: add $0x30,%eax
0x080483b0 <function+12>: mov %eax,-0x4(%ebp)
0x080483b3 <function+15>: mov -0x4(%ebp),%eax
0x080483b6 <function+18>: mov (%eax),%eax
0x080483b8 <function+20>: lea 0x7(%eax),%edx
0x080483bb <function+23>: mov -0x4(%ebp),%eax
0x080483be <function+26>: mov %edx,(%eax)
0x080483c0 <function+28>: leave
0x080483c1 <function+29>: ret
End of assembler dump.
The disassembly for function you provided seems to use hardcoded values of offset1 and offset2, contrary to your C code.
The address for ret should be calculated using byte/char offsets: ret = (int *)(buffer1 + offset1), otherwise you'll get hit by pointer math (especially in this case, when your buffer1 is not at a nice aligned offset from the return address).
offset1 should be equal to 0x9 + 0x4 (the offset used in lea + 4 bytes for the push %ebp). However, this can change unpredictably each time you compile - the stack layout might be different, the compiler might create some additional stack alignment, etc.
offset2 should be equal to 7 (the length of the instruction you're trying to skip).
Note that you're getting a little lucky here - the function uses the cdecl calling convention, which means the caller is responsible for removing arguments off the stack after returning from the function, which normally looks like this:
push arg3
push arg2
push arg1
call func
add esp, 0Ch ; remove as many bytes as were used by the pushed arguments
Your compiler chose to combine this correction with the one after printf, but it could also decide to do this after your function call. In this case the add esp, <number> instruction would be present between your return address and the instruction you want to skip - you can probably imagine that this would not end well.