Here is my disas code:
Dump of assembler code for function main:
0x00000000000006b0 <+0>: push %rbp
0x00000000000006b1 <+1>: mov %rsp,%rbp
0x00000000000006b4 <+4>: sub $0x10,%rsp
0x00000000000006b8 <+8>: movl $0xa,-0xc(%rbp)
0x00000000000006bf <+15>: lea -0xc(%rbp),%rax
0x00000000000006c3 <+19>: mov %rax,-0x8(%rbp)
0x00000000000006c7 <+23>: lea 0x96(%rip),%rdi # 0x764
0x00000000000006ce <+30>: mov $0x0,%eax
0x00000000000006d3 <+35>: callq 0x560 <printf#plt>
0x00000000000006d8 <+40>: mov $0x0,%eax
0x00000000000006dd <+45>: leaveq
0x00000000000006de <+46>: retq
when I set the breakpoint at 0x06b4 by b *0x00000000000006b4 and run the code it is giving an error
Starting program: /root/print.out
Warning:
Cannot insert breakpoint 4.
Cannot access memory at address 0x6b4
but when I do it with b 4 and run the code ,it is working normal. so what am I doing wrong in the first case.
Dump of assembler code for function main:
0x00000000000006b0 <+0>: push %rbp
0x00000000000006b1 <+1>: mov %rsp,%rbp
You are looking at position-independent executable (a special kind of shared library). The code for main gets relocated to a different address when the executable starts running.
Because there is no code at 0x6b4 once the executable is relocated, GDB complains that it can't set a breakpoint there.
but when I do it with b 4 and run the code ,it is working normal.
In this case, GDB understands that you want to set breakpoint on line 4, and inserts appropriate breakpoint after the executable has been relocated.
Use info break to see what the relocated address is.
Related
This is the disassembly of a very simple C program (strcpy() a constant string and print it):
No symbol table is loaded. Use the "file" command.
Reading symbols from string...done.
(gdb) break 6
Breakpoint 1 at 0x6b8: file string.c, line 6.
(gdb) break 7
Breakpoint 2 at 0x6f2: file string.c, line 7.
(gdb) r
Starting program: /home/wsllnx/Detached/string
Breakpoint 1, main () at string.c:6
6 strcpy(buf, "Memento Mori\n\tInjected_string");
(gdb) disass main
Dump of assembler code for function main:
0x00005555554006b0 <+0>: push %rbp
0x00005555554006b1 <+1>: mov %rsp,%rbp
0x00005555554006b4 <+4>: sub $0x70,%rsp
0x00005555554006b8 <+8>: lea -0x70(%rbp),%rax
0x00005555554006bc <+12>: movabs $0x206f746e656d654d,%rdx
0x00005555554006c6 <+22>: mov %rdx,(%rax)
0x00005555554006c9 <+25>: movabs $0x6e49090a69726f4d,%rcx
0x00005555554006d3 <+35>: mov %rcx,0x8(%rax)
0x00005555554006d7 <+39>: movabs $0x735f64657463656a,%rsi
0x00005555554006e1 <+49>: mov %rsi,0x10(%rax)
0x00005555554006e5 <+53>: movl $0x6e697274,0x18(%rax)
0x00005555554006ec <+60>: movw $0x67,0x1c(%rax)
0x00005555554006f2 <+66>: lea -0x70(%rbp),%rax
0x00005555554006f6 <+70>: mov %rax,%rdi
0x00005555554006f9 <+73>: mov $0x0,%eax
0x00005555554006fe <+78>: callq 0x555555400560 <printf#plt>
0x0000555555400703 <+83>: mov $0x0,%eax
0x0000555555400708 <+88>: leaveq
0x0000555555400709 <+89>: retq
End of assembler dump.
(gdb)
I am currently learning how to fully use GBD and I was wondering:
How can I access particular address like '0x206f746e656d654d'? When I try to do so with x/x or x/s GDB says:
'0x206f746e656d654d: Cannot access memory at address 0x206f746e656d654d'
Same goes for 0x6e49090a69726f4d, 0x735f64657463656a and so on...
Thanks in advance to all the useful answers.
Those aren't actually memory addresses. It's a compiler optimization to represent ASCII values using 64-bit constants. Instead of actually calling strcpy() the compiler is moving the string constant values through registers.
0x206f746e656d654d is the ASCII values for the string 'Memento ' (with a space) in x86 little-endian format.
Okay, so in C code, I have it looping through the command line arguments and printing each one out. I compiled it and opened it in GDB to see what the main function looks like because I was attempting to do the same thing in assembly. I ended up figuring out what my problem was - that my print function was using the same registers as the main function was. I ended up just pushing each onto the stack before the function call and popping them back off after. Only thing I don't understand is why this code doesn't seem to do that and why it doesn't run into the same problem as I was.
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x20,%rsp
0x0000000000400535 <+8>: mov %edi,-0x14(%rbp)
0x0000000000400538 <+11>: mov %rsi,-0x20(%rbp)
0x000000000040053c <+15>: jmp 0x400561 <main+52>
0x000000000040053e <+17>: mov -0x4(%rbp),%eax
0x0000000000400541 <+20>: cltq
0x0000000000400543 <+22>: lea 0x0(,%rax,8),%rdx
0x000000000040054b <+30>: mov -0x20(%rbp),%rax
0x000000000040054f <+34>: add %rdx,%rax
0x0000000000400552 <+37>: mov (%rax),%rax
0x0000000000400555 <+40>: mov %rax,%rdi
0x0000000000400558 <+43>: callq 0x400410 <puts#plt>
0x000000000040055d <+48>: addl $0x1,-0x4(%rbp)
0x0000000000400561 <+52>: mov -0x4(%rbp),%eax
0x0000000000400564 <+55>: cmp -0x14(%rbp),%eax
0x0000000000400567 <+58>: jl 0x40053e <main+17>
0x0000000000400569 <+60>: leaveq
0x000000000040056a <+61>: retq
Any input is appreciated, thanks.
(gdb) disass 0x400410
Dump of assembler code for function puts#plt:
0x0000000000400410 <+0>: jmpq *0x200c02(%rip) # 0x601018 <puts#got.plt>
0x0000000000400416 <+6>: pushq $0x0
0x000000000040041b <+11>: jmpq 0x400400
End of assembler dump.
(gdb) disass 0x601018
Dump of assembler code for function puts#got.plt:
0x0000000000601018 <+0>: (bad)
0x0000000000601019 <+1>: add $0x40,%al
0x000000000060101b <+3>: add %al,(%rax)
0x000000000060101d <+5>: add %al,(%rax)
0x000000000060101f <+7>: add %ah,(%rsi)
End of assembler dump.
In fact I can't even seem to find where it's printing out in puts. I must be missing something, just don't know what.
The disassembly you show for puts is not correct. Library symbols for dynamically loaded libraries are resolved, well, dynamically. The compiler generates a call to a stub (procedure linkage table or PLT), the loader resolves that at runtime, the second time you call that function the address has been resolved and it runs faster. Disassemble it on the 2nd iteration and you will see the actual puts code being run, and you will see the registers being pushed.
More info here.
This instruction:
jmpq *0x200c02(%rip) # 0x601018 <puts#got.plt>
reads the quadword (8 bytes) from the address given by the offset from the instruction pointer and jumps there. So to see where this is going, you don't want to use disas 0x601018, you want to use x /1xg 0x601018 to see what is in those bytes (read the pointer), and then call disas on that value to see the actual code for puts
This stuff is all dynamic linkage stuff that is set up to call functions in dynamic libraries. plt is an abbreviation for "program linkage table" and is a set of trampolines created by the linker whenever an object calls a function in some other dynamic library. got is an abbreviation for "global object table" and is a table of function pointers built by the linker and filled in by the dynamic linker when the program is loaded.
I'm trying to find out why in windows has so much more instructions for the same program than linux.
So I just used int a=0xbeef; and printf("test\n"); in C and compiled in Linux and Windows. When I debug and disassembly the main frame I got this:
On linux:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: and $0xfffffff0,%esp
0x080483ea <+6>: sub $0x20,%esp
0x080483ed <+9>: movl $0xbeef,0x1c(%esp)
0x080483f5 <+17>: movl $0x80484d0,(%esp)
0x080483fc <+24>: call 0x8048318 <puts#plt>
0x08048401 <+29>: leave
0x08048402 <+30>: ret
Ok that's nice. I see the movl 0x1c offset of esp to put the value there.
But in windows I got this:
0x401290 <main>: push %ebp
0x401291 <main+1>: mov %esp,%ebp
0x401293 <main+3>: sub $0x18,%esp
0x401296 <main+6>: and $0xfffffff0,%esp
0x401299 <main+9>: mov $0x0,%eax
0x40129e <main+14>: add $0xf,%eax
0x4012a1 <main+17>: add $0xf,%eax
0x4012a4 <main+20>: shr $0x4,%eax
0x4012a7 <main+23>: shl $0x4,%eax
0x4012aa <main+26>: mov %eax,0xfffffff8(%ebp)
0x4012ad <main+29>: mov 0xfffffff8(%ebp),%eax
0x4012b0 <main+32>: call 0x401720 <_alloca>
0x4012b5 <main+37>: call 0x4013c0 <__main>
0x4012ba <main+42>: movl $0xbeef,0xfffffffc(%ebp)
0x4012c1 <main+49>: movl $0x403000,(%esp,1)
0x4012c8 <main+56>: call 0x401810 <printf>
0x4012cd <main+61>: mov $0x0,%eax
0x4012d2 <main+66>: leave
0x4012d3 <main+67>: ret
First of all, I don't know why the windows compiler (mingw) generate so much code. Is 2x more than Linux... this makes me thinking. And another thing: from main+9 to main+37 I can't see the point of that code.
I would thank if someone answer to this, I'm just curious :)
edit:
With -O3 argument on linux I got the same and in windows something like magic happends:
0x401290 <main>: push %ebp
0x401291 <main+1>: mov $0x10,%eax
0x401296 <main+6>: mov %esp,%ebp
0x401298 <main+8>: sub $0x8,%esp
0x40129b <main+11>: and $0xfffffff0,%esp
0x40129e <main+14>: call 0x401700 <_alloca>
0x4012a3 <main+19>: call 0x4013a0 <__main>
0x4012a8 <main+24>: movl $0x403000,(%esp,1)
0x4012af <main+31>: call 0x4017f0 <puts>
0x4012b4 <main+36>: leave
0x4012b5 <main+37>: xor %eax,%eax
0x4012b7 <main+39>: ret
leave then xor then ret. ok :D the call _alloca and call __main still there. and I don't know what is 0x401291 <main+1>: mov $0x10,%eax doing here :D
You seem to be compiling with old 3.x series gcc, you'd better upgrade. Newer versions don't invoke alloca. I suspect the particular version of alloca might use register convention so the mov 0x10,%eax is probably setting up the argument for that call.
__main is a startup function defined in crtbegin.o that executes global constructors and registers a function using atexit that will run the global destructors.
Also note that main gets special treatment, like the stack alignment code and the above-mentioned initialization. It may be a good idea to instead compare a "plain" function if you are just interested in code generation issues.
One notes that in the upper example the compiler is calling puts, while in the lower it is printf. I would suspect you have optimization on in the top example, but not on the lower example.
It also might be a debug vs non-debug build.
I've been working on the bufbomb lab from CSAPPS and I've gotten stuck on one of the phases.
I won't get into the gore-y details of the project since I just need a nudge in the right direction. I'm having a hard time finding the starting address of the array called "buf" in the given assembly.
We're given a function called getbuf:
#define NORMAL_BUFFER_SIZE 32
int getbuf()
{
char buf[NORMAL_BUFFER_SIZE];
Gets(buf);
return 1;
}
And the assembly dumps:
Dump of assembler code for function getbuf:
0x08048d92 <+0>: sub $0x3c,%esp
0x08048d95 <+3>: lea 0x10(%esp),%eax
0x08048d99 <+7>: mov %eax,(%esp)
0x08048d9c <+10>: call 0x8048c66 <Gets>
0x08048da1 <+15>: mov $0x1,%eax
0x08048da6 <+20>: add $0x3c,%esp
0x08048da9 <+23>: ret
End of assembler dump.
Dump of assembler code for function Gets:
0x08048c66 <+0>: push %ebp
0x08048c67 <+1>: push %edi
0x08048c68 <+2>: push %esi
0x08048c69 <+3>: push %ebx
0x08048c6a <+4>: sub $0x1c,%esp
0x08048c6d <+7>: mov 0x30(%esp),%esi
0x08048c71 <+11>: movl $0x0,0x804e100
0x08048c7b <+21>: mov %esi,%ebx
0x08048c7d <+23>: jmp 0x8048ccf <Gets+105>
0x08048c7f <+25>: mov %eax,%ebp
0x08048c81 <+27>: mov %al,(%ebx)
0x08048c83 <+29>: add $0x1,%ebx
0x08048c86 <+32>: mov 0x804e100,%eax
0x08048c8b <+37>: cmp $0x3ff,%eax
0x08048c90 <+42>: jg 0x8048ccf <Gets+105>
0x08048c92 <+44>: lea (%eax,%eax,2),%edx
0x08048c95 <+47>: mov %ebp,%ecx
0x08048c97 <+49>: sar $0x4,%cl
0x08048c9a <+52>: mov %ecx,%edi
0x08048c9c <+54>: and $0xf,%edi
0x08048c9f <+57>: movzbl 0x804a478(%edi),%edi
0x08048ca6 <+64>: mov %edi,%ecx
---Type <return> to continue, or q <return> to quit---
0x08048ca8 <+66>: mov %cl,0x804e140(%edx)
0x08048cae <+72>: mov %ebp,%ecx
0x08048cb0 <+74>: and $0xf,%ecx
0x08048cb3 <+77>: movzbl 0x804a478(%ecx),%ecx
0x08048cba <+84>: mov %cl,0x804e141(%edx)
0x08048cc0 <+90>: movb $0x20,0x804e142(%edx)
0x08048cc7 <+97>: add $0x1,%eax
0x08048cca <+100>: mov %eax,0x804e100
0x08048ccf <+105>: mov 0x804e110,%eax
0x08048cd4 <+110>: mov %eax,(%esp)
0x08048cd7 <+113>: call 0x8048820 <_IO_getc#plt>
0x08048cdc <+118>: cmp $0xffffffff,%eax
0x08048cdf <+121>: je 0x8048ce6 <Gets+128>
0x08048ce1 <+123>: cmp $0xa,%eax
0x08048ce4 <+126>: jne 0x8048c7f <Gets+25>
0x08048ce6 <+128>: movb $0x0,(%ebx)
0x08048ce9 <+131>: mov 0x804e100,%eax
0x08048cee <+136>: movb $0x0,0x804e140(%eax,%eax,2)
0x08048cf6 <+144>: mov %esi,%eax
0x08048cf8 <+146>: add $0x1c,%esp
0x08048cfb <+149>: pop %ebx
0x08048cfc <+150>: pop %esi
0x08048cfd <+151>: pop %edi
---Type <return> to continue, or q <return> to quit---
0x08048cfe <+152>: pop %ebp
0x08048cff <+153>: ret
End of assembler dump.
I'm having a difficult time locating where the starting address of buf is (or where buf is at all in this mess!). If someone could point that out to me, I'd greatly appreciate it.
Attempt at a solution
Reading symbols from /home/user/CS247/buflab/buflab-handout/bufbomb...(no debugging symbols found)...done.
(gdb) break getbuf
Breakpoint 1 at 0x8048d92
(gdb) run -u user < firecracker-exploit.bin
Starting program: /home/user/CS247/buflab/buflab-handout/bufbomb -u user < firecracker-exploit.bin
Userid: ...
Cookie: ...
Breakpoint 1, 0x08048d92 in getbuf ()
(gdb) print buf
No symbol table is loaded. Use the "file" command.
(gdb)
As has been pointed out by some other people, buf is allocated on the stack at run time. See these lines in the getbuf() function:
0x08048d92 <+0>: sub $0x3c,%esp
0x08048d95 <+3>: lea 0x10(%esp),%eax
0x08048d99 <+7>: mov %eax,(%esp)
The first line subtracts 0x3c (60) bytes from the stack pointer, effectively allocating that much space. The extra bytes beyond 32 are probably for parameters for Gets (Its hard to tell what the calling convention is for Gets is precisely, so its hard to say) The second line gets the address of the 16 bytes up. This leaves 44 bytes above it that are unallocated. The third line puts that address onto the stack for probably for the gets function call. (remember the stack grows down, so the stack pointer will be pointing at the last item on the stack). I am not sure why the compiler generated such strange offsets (60 bytes and then 44) but there is probably a good reason. If I figure it out I will update here.
Inside the gets function we have the following lines:
0x08048c66 <+0>: push %ebp
0x08048c67 <+1>: push %edi
0x08048c68 <+2>: push %esi
0x08048c69 <+3>: push %ebx
0x08048c6a <+4>: sub $0x1c,%esp
0x08048c6d <+7>: mov 0x30(%esp),%esi
Here we see that we save the state of some of the registers, which add up to 16-bytes, and then Gets reserves 28 (0x1c) bytes on the stack. The last line is key: It grabs the value at 0x30 bytes up the stack and loads it into %esi. This value is the address of buf put on the stack by getbuf. Why? 4 for the return addres plus 16 for the registers+28 reserved = 48. 0x30 = 48, so it is grabbing the last item placed on the stack by getbuf() before calling gets.
To get the address of buf you have to actually run the program in the debugger because the address will probably be different everytime you run the program, or even call the function for that matter. You can set a break point at any of these lines above and either dump the %eax register when the it contains the address to be placed on the stack on the second line of getbuf, or dump the %esi register when it is pulled off of the stack. This will be the pointer to your buffer.
to be able to see debugging info while using gdb,you must use the -g3 switch with gcc when you compile.see man gcc for more details on the -g switch.
Only then, gcc will add debugging info (symbol table) into the executable.
0x08048cd4 <+110>: mov %eax,(%esp)
0x08048cd7 <+113>: **call 0x8048820 <_IO_getc#plt>**
0x08048cdc <+118>: cmp $0xffffffff,%eax
0x0848cdf <+121>: je 0x8048ce6 <Gets+128>
0x08048ce1 <+123>: cmp $0xa,%eax
0x08048ce4 <+126>: jne 0x8048c7f <Gets+25>
0x08048ce6 <+128>: movb $0x0,(%ebx)
0x08048ce9 <+131>: mov 0x804e100,%eax
0x08048cee <+136>: movb $0x0,0x804e140(%eax,%eax,2)
0x08048cf6 <+144>: mov %esi,%eax
0x08048cf8 <+146>: add $0x1c,%esp
0x08048cfb <+149>: **pop %ebx**
0x08048cfc <+150>: **pop %esi**
0x08048cfd <+151>: **pop %edi**
---Type <return> to continue, or q <return> to quit---
0x08048cfe <+152>: **pop %ebp**
0x08048cff <+153>: ret
End of assembler dump.
I Don't know your flavour of asm but there's a call in there which may use the start address
The end of the program pops various pointers
That's where I'd start looking
If you can tweak the asm for these functions you can input your own routines to dump data as the function runs and before those pointers get popped
buf is allocated on the stack. Therefore, you will not be able to spot its address from an assembly listing. In other words, buf is allocated (and its address therefore known) only when you enter the function getbuf() at runtime.
If you must know the address, one option would be to use gbd (but make sure you compile with the -g flag to enable debugging support) and then:
gdb a.out # I'm assuming your binary is a.out
break getbuf # Set a breakpoint where you want gdb to stop
run # Run the program. Supply args if you need to
# WAIT FOR your program to reach getbuf and stop
print buf
If you want to go this route, a good gdb tutorial (example) is essential.
You could also place a printf inside getbuf and debug that way - it depends on what you are trying to do.
One other point leaps out from your code. Upon return from getbuf, the result of Gets will be trashed. This is because Gets is presumably writing its results into the stack-allocated buf. When you return from getbuf, your stack is blown and you cannot reliably access buf.
i have written a c program for printing the arguments
#include<stdio.h>
void main( int argc, char *argv[])
{
int i=0;
for(i=0;i<argc;i++)
printf("argument %d=%s\n",i,argv[i]);
}
The assembly dump for the above program using gdb i got is.
Dump of assembler code for function main:
0x00000000004004f4 <+0>: push %rbp
0x00000000004004f5 <+1>: mov %rsp,%rbp
0x00000000004004f8 <+4>: sub $0x20,%rsp
0x00000000004004fc <+8>: mov %edi,-0x14(%rbp)
0x00000000004004ff <+11>: mov %rsi,-0x20(%rbp)
0x0000000000400503 <+15>: movl $0x0,-0x4(%rbp)
0x000000000040050a <+22>: movl $0x0,-0x4(%rbp)
0x0000000000400511 <+29>: jmp 0x40053e <main+74>
0x0000000000400513 <+31>: mov -0x4(%rbp),%eax
0x0000000000400516 <+34>: cltq
0x0000000000400518 <+36>: shl $0x3,%rax
0x000000000040051c <+40>: add -0x20(%rbp),%rax
0x0000000000400520 <+44>: mov (%rax),%rdx
0x0000000000400523 <+47>: mov $0x40063c,%eax
0x0000000000400528 <+52>: mov -0x4(%rbp),%ecx
0x000000000040052b <+55>: mov %ecx,%esi
0x000000000040052d <+57>: mov %rax,%rdi
0x0000000000400530 <+60>: mov $0x0,%eax
0x0000000000400535 <+65>: callq 0x4003f0 <printf#plt>
0x000000000040053a <+70>: addl $0x1,-0x4(%rbp)
0x000000000040053e <+74>: mov -0x4(%rbp),%eax
0x0000000000400541 <+77>: cmp -0x14(%rbp),%eax
0x0000000000400544 <+80>: jl 0x400513 <main+31>
0x0000000000400546 <+82>: leaveq
0x0000000000400547 <+83>: retq
End of assembler dump.
Now what i want is " The memory locations (addresses) at which the argument is passing.
for eg if i run the program as " arg 1"
it prints the arguments.
Now I want to know on which instruction it is extracting this argument ,or which register holds the arguments (also take the case if i pass more than 1 argument.. (will this tell me the memory address on which it resides?)
I want to know on which instruction it is extracting this argument ,or
which register holds the arguments
This sort of question is best answered by obtaining an assembly language reference manual for the target machine and studying it. However, your specific questions are easily answered by examination without being very familiar with the specifics of the assembly language:
0x0000000000400503 <+15>: movl $0x0,-0x4(%rbp)
0x000000000040050a <+22>: movl $0x0,-0x4(%rbp)
This is the code generated for the redundant i = 0 statements.
0x0000000000400513 <+31>: mov -0x4(%rbp),%eax
The value of i is now in the %eax register.
0x0000000000400518 <+36>: shl $0x3,%rax
0x000000000040051c <+40>: add -0x20(%rbp),%rax
This calculates the address of argv[i] and puts it in the %rax register.
0x0000000000400520 <+44>: mov (%rax),%rdx
This loads the value of argv[i] into the %rdx register. Code following calls printf with i and argv[i] as arguments.
Now what i want is " The memory locations (addresses) at which the
argument is passing.
You can no more determine that by looking at the asm than you can determine the value of argc by looking at the asm ... these are values that vary and are only determined when actually running the program. If you want to determine the address at runtime, you could use
printf("address of argv = %p\n", (void*)argv);
If that's what you want, then dumping asm and learning what it means is unnecessary and not relevant to your goal.
Since you are using gdb to debug, I guess you might use Linux. Then, use gcc -S -Wall -fverbose-asm foo.c (possibly with optimization flags like -O2) to get a more understandable foo.s assembly file, assuming your file is foo.c (if it is hello.c replace foo by hello). Then look inside foo.s with an editor (like gedit or emacs).
And main is almost an ordinary function (except its signature has to be what is permitted by the standard, e.g. int main(int argc, char**argv) ....) and it is called by crt0.o (compile with gcc -v to learn which one).
You might want to read the Linux Assembly Howto and e.g. the x86-64 ABI and the x86 calling convention wikipage.
Notice that without any optimization flags (e.g. -O1 or -O2) the gcc compiler generates very naive code.