Understanding the assembly dump of the following program - c

I have written a C program that prints its arguments:
#include<stdio.h>
void main( int argc, char *argv[])
{
int i=0;
for(i=0;i<argc;i++)
printf("argument %d=%s\n",i,argv[i]);
}
The assembly dump I got from gdb for this program is:
Dump of assembler code for function main:
0x00000000004004f4 <+0>: push %rbp
0x00000000004004f5 <+1>: mov %rsp,%rbp
0x00000000004004f8 <+4>: sub $0x20,%rsp
0x00000000004004fc <+8>: mov %edi,-0x14(%rbp)
0x00000000004004ff <+11>: mov %rsi,-0x20(%rbp)
0x0000000000400503 <+15>: movl $0x0,-0x4(%rbp)
0x000000000040050a <+22>: movl $0x0,-0x4(%rbp)
0x0000000000400511 <+29>: jmp 0x40053e <main+74>
0x0000000000400513 <+31>: mov -0x4(%rbp),%eax
0x0000000000400516 <+34>: cltq
0x0000000000400518 <+36>: shl $0x3,%rax
0x000000000040051c <+40>: add -0x20(%rbp),%rax
0x0000000000400520 <+44>: mov (%rax),%rdx
0x0000000000400523 <+47>: mov $0x40063c,%eax
0x0000000000400528 <+52>: mov -0x4(%rbp),%ecx
0x000000000040052b <+55>: mov %ecx,%esi
0x000000000040052d <+57>: mov %rax,%rdi
0x0000000000400530 <+60>: mov $0x0,%eax
0x0000000000400535 <+65>: callq 0x4003f0 <printf@plt>
0x000000000040053a <+70>: addl $0x1,-0x4(%rbp)
0x000000000040053e <+74>: mov -0x4(%rbp),%eax
0x0000000000400541 <+77>: cmp -0x14(%rbp),%eax
0x0000000000400544 <+80>: jl 0x400513 <main+31>
0x0000000000400546 <+82>: leaveq
0x0000000000400547 <+83>: retq
End of assembler dump.
What I want to know is the memory locations (addresses) at which the arguments are passed.
For example, if I run the program as "arg 1",
it prints the arguments.
I want to know which instruction extracts this argument, or which register holds the arguments (also consider the case where I pass more than one argument). Will this tell me the memory address at which the argument resides?

I want to know on which instruction it is extracting this argument ,or
which register holds the arguments
This sort of question is best answered by obtaining an assembly language reference manual for the target machine and studying it. However, your specific questions are easily answered by examination without being very familiar with the specifics of the assembly language:
0x0000000000400503 <+15>: movl $0x0,-0x4(%rbp)
0x000000000040050a <+22>: movl $0x0,-0x4(%rbp)
This is the code generated for the redundant i = 0 statements.
0x0000000000400513 <+31>: mov -0x4(%rbp),%eax
The value of i is now in the %eax register.
0x0000000000400518 <+36>: shl $0x3,%rax
0x000000000040051c <+40>: add -0x20(%rbp),%rax
This calculates the address of argv[i] and puts it in the %rax register.
0x0000000000400520 <+44>: mov (%rax),%rdx
This loads the value of argv[i] into the %rdx register. Code following calls printf with i and argv[i] as arguments.
Now what I want is the memory locations (addresses) at which the
argument is passed.
You can no more determine that by looking at the asm than you can determine the value of argc by looking at the asm ... these are values that vary and are only determined when actually running the program. If you want to determine the address at runtime, you could use
printf("address of argv = %p\n", (void*)argv);
If that's what you want, then dumping asm and learning what it means is unnecessary and not relevant to your goal.

Since you are using gdb to debug, I guess you might use Linux. Then, use gcc -S -Wall -fverbose-asm foo.c (possibly with optimization flags like -O2) to get a more understandable foo.s assembly file, assuming your file is foo.c (if it is hello.c replace foo by hello). Then look inside foo.s with an editor (like gedit or emacs).
And main is almost an ordinary function (except its signature has to be what is permitted by the standard, e.g. int main(int argc, char**argv) ....) and it is called by crt0.o (compile with gcc -v to learn which one).
You might want to read the Linux Assembly Howto and e.g. the x86-64 ABI and the x86 calling convention wikipage.
Notice that without any optimization flags (e.g. -O1 or -O2) the gcc compiler generates very naive code.

Related

How can I access particular memory address during a GDB session?

This is the disassembly of a very simple C program (strcpy() a constant string and print it):
No symbol table is loaded. Use the "file" command.
Reading symbols from string...done.
(gdb) break 6
Breakpoint 1 at 0x6b8: file string.c, line 6.
(gdb) break 7
Breakpoint 2 at 0x6f2: file string.c, line 7.
(gdb) r
Starting program: /home/wsllnx/Detached/string
Breakpoint 1, main () at string.c:6
6 strcpy(buf, "Memento Mori\n\tInjected_string");
(gdb) disass main
Dump of assembler code for function main:
0x00005555554006b0 <+0>: push %rbp
0x00005555554006b1 <+1>: mov %rsp,%rbp
0x00005555554006b4 <+4>: sub $0x70,%rsp
0x00005555554006b8 <+8>: lea -0x70(%rbp),%rax
0x00005555554006bc <+12>: movabs $0x206f746e656d654d,%rdx
0x00005555554006c6 <+22>: mov %rdx,(%rax)
0x00005555554006c9 <+25>: movabs $0x6e49090a69726f4d,%rcx
0x00005555554006d3 <+35>: mov %rcx,0x8(%rax)
0x00005555554006d7 <+39>: movabs $0x735f64657463656a,%rsi
0x00005555554006e1 <+49>: mov %rsi,0x10(%rax)
0x00005555554006e5 <+53>: movl $0x6e697274,0x18(%rax)
0x00005555554006ec <+60>: movw $0x67,0x1c(%rax)
0x00005555554006f2 <+66>: lea -0x70(%rbp),%rax
0x00005555554006f6 <+70>: mov %rax,%rdi
0x00005555554006f9 <+73>: mov $0x0,%eax
0x00005555554006fe <+78>: callq 0x555555400560 <printf@plt>
0x0000555555400703 <+83>: mov $0x0,%eax
0x0000555555400708 <+88>: leaveq
0x0000555555400709 <+89>: retq
End of assembler dump.
(gdb)
I am currently learning how to fully use GDB and I was wondering:
How can I access particular address like '0x206f746e656d654d'? When I try to do so with x/x or x/s GDB says:
'0x206f746e656d654d: Cannot access memory at address 0x206f746e656d654d'
Same goes for 0x6e49090a69726f4d, 0x735f64657463656a and so on...
Thanks in advance to all the useful answers.
Those aren't actually memory addresses. It's a compiler optimization to represent ASCII values using 64-bit constants. Instead of actually calling strcpy() the compiler is moving the string constant values through registers.
0x206f746e656d654d is the ASCII values for the string 'Memento ' (with a space) in x86 little-endian format.

Confused as to how printf and strings are handled when disassembling a C program

I'm writing small C programs and disassembling them using objdump and gdb to see what the assembly looks like. I don't understand how functions like printf work on an assembly level. The C program I last compiled is:
#include <stdio.h>
int main(){
printf("test");
return 0;
}
A gdb disassembly of the program produces the following:
Dump of assembler code for function main:
0x00401460 <+0>: push ebp
0x00401461 <+1>: mov ebp,esp
0x00401463 <+3>: and esp,0xfffffff0
0x00401466 <+6>: sub esp,0x10
0x00401469 <+9>: call 0x4019c0 <__main>
0x0040146e <+14>: mov DWORD PTR [esp],0x405064
0x00401475 <+21>: call 0x403a60 <printf>
0x0040147a <+26>: mov eax,0x0
0x0040147f <+31>: leave
0x00401480 <+32>: ret
0x00401481 <+33>: nop
0x00401482 <+34>: nop
0x00401483 <+35>: nop
0x00401484 <+36>: xchg ax,ax
0x00401486 <+38>: xchg ax,ax
0x00401488 <+40>: xchg ax,ax
0x0040148a <+42>: xchg ax,ax
0x0040148c <+44>: xchg ax,ax
0x0040148e <+46>: xchg ax,ax
End of assembler dump.
I understand that the printf function is part of the standard C library, which is a precompiled DLL. However the string I'm passing to printf is not precompiled and should be found somewhere in my program. Where is it?
The only place I can think of is main+14. I ran 405064 through a hexadecimal-to-ASCII and UTF converter, and neither result was "test". Does that mean 0x405064 is the address where the "test" string is stored? If so, why is it at a seemingly random address rather than one relative to the main function? Or is text encoded differently in memory, making my hex-to-ASCII converter pointless?
Also, if the stack is allocated 16 bytes, why is the string stored in [esp]? Aren't characters 1 byte long, and therefore the string should be stored in [ebp-4]?

Preserving Registers?

Okay, so in C code, I have it looping through the command line arguments and printing each one out. I compiled it and opened it in GDB to see what the main function looks like because I was attempting to do the same thing in assembly. I ended up figuring out what my problem was - that my print function was using the same registers as the main function was. I ended up just pushing each onto the stack before the function call and popping them back off after. Only thing I don't understand is why this code doesn't seem to do that and why it doesn't run into the same problem as I was.
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: sub $0x20,%rsp
0x0000000000400535 <+8>: mov %edi,-0x14(%rbp)
0x0000000000400538 <+11>: mov %rsi,-0x20(%rbp)
0x000000000040053c <+15>: jmp 0x400561 <main+52>
0x000000000040053e <+17>: mov -0x4(%rbp),%eax
0x0000000000400541 <+20>: cltq
0x0000000000400543 <+22>: lea 0x0(,%rax,8),%rdx
0x000000000040054b <+30>: mov -0x20(%rbp),%rax
0x000000000040054f <+34>: add %rdx,%rax
0x0000000000400552 <+37>: mov (%rax),%rax
0x0000000000400555 <+40>: mov %rax,%rdi
0x0000000000400558 <+43>: callq 0x400410 <puts@plt>
0x000000000040055d <+48>: addl $0x1,-0x4(%rbp)
0x0000000000400561 <+52>: mov -0x4(%rbp),%eax
0x0000000000400564 <+55>: cmp -0x14(%rbp),%eax
0x0000000000400567 <+58>: jl 0x40053e <main+17>
0x0000000000400569 <+60>: leaveq
0x000000000040056a <+61>: retq
Any input is appreciated, thanks.
(gdb) disass 0x400410
Dump of assembler code for function puts@plt:
0x0000000000400410 <+0>: jmpq *0x200c02(%rip) # 0x601018 <puts@got.plt>
0x0000000000400416 <+6>: pushq $0x0
0x000000000040041b <+11>: jmpq 0x400400
End of assembler dump.
(gdb) disass 0x601018
Dump of assembler code for function puts@got.plt:
0x0000000000601018 <+0>: (bad)
0x0000000000601019 <+1>: add $0x40,%al
0x000000000060101b <+3>: add %al,(%rax)
0x000000000060101d <+5>: add %al,(%rax)
0x000000000060101f <+7>: add %ah,(%rsi)
End of assembler dump.
In fact I can't even seem to find where it's printing out in puts. I must be missing something, just don't know what.
The disassembly you show for puts is not correct. Library symbols for dynamically loaded libraries are resolved, well, dynamically. The compiler generates a call to a stub (in the procedure linkage table, or PLT), and the loader resolves the real address at run time. By the second call to that function the address has already been resolved, so it runs faster. Disassemble it on the second iteration and you will see the actual puts code being run, and you will see the registers being pushed.
More info here.
This instruction:
jmpq *0x200c02(%rip) # 0x601018 <puts@got.plt>
reads the quadword (8 bytes) from the address given by the offset from the instruction pointer and jumps there. So to see where this is going, you don't want to use disas 0x601018, you want to use x /1xg 0x601018 to see what is in those bytes (read the pointer), and then call disas on that value to see the actual code for puts
This is all dynamic linkage machinery that is set up to call functions in dynamic libraries. plt is an abbreviation for "procedure linkage table" and is a set of trampolines created by the linker whenever an object calls a function in some other dynamic library. got is an abbreviation for "global offset table"; the part used here is a table of function pointers built by the linker and filled in by the dynamic linker when the program is loaded or, with lazy binding, when each function is first called.

Overflow buffer in C on x86_64 to call function

Hello, I have the following code:
#include <stdio.h>
#include <string.h>
#define SECRET "1234567890AZXCVBNFRT"
int checksecret(){
char buf[32];
gets(buf);
if(strcmp(SECRET,buf)==0) return 1;
else return 0;
}
void outsecret(){
printf("%s\n",SECRET);
}
int main(int argc, char** argv){
if (checksecret()){
outsecret();
};
}
Disassembly of outsecret:
(gdb) disassemble outsecret
Dump of assembler code for function outsecret:
0x00000000004005f4 <+0>: push %rbp
0x00000000004005f5 <+1>: mov %rsp,%rbp
0x00000000004005f8 <+4>: mov $0x4006b4,%edi
0x00000000004005fd <+9>: callq 0x400480 <puts@plt>
0x0000000000400602 <+14>: pop %rbp
0x0000000000400603 <+15>: retq
I assume that I don't know SECRET, so I try to run my program with a string such as python -c 'print "A" * 32 + "\x40\x05\xf4"[::-1]'. But it fails with a segmentation fault. What am I doing wrong? Thank you for any help.
PS
I want to call function outsecret by overwriting return code in checksecret
You have to remember that all strings have an extra character that terminates the string, so if you input 32 characters then gets will write 33 characters to the buffer. Writing beyond the limits of an array leads to undefined behavior, which often leads to crashes.
The gets function has no bounds checking and is very dangerous to use. It has long been deprecated, and in the C11 standard it has been removed altogether.
$ python -c 'print "A" * 32 + "\x40\x05\xf4"[::1]'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA@
$ perl -le 'print length("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA@")'
33
Your input string is too long for a buffer of 32 characters (one extra character is needed for the terminating '\0'). You have fallen victim to a buffer overflow, also known as an array overflow or array overrun.
Note that gets() is deprecated in C99 and eventually it has been dropped in C11 Standard for security reasons.
I want to call function outsecret by overwriting return code in
checksecret
Beware, you are about to leave the relatively safe regions of the C Standard. This means that the behaviour depends on the compiler, the compiler version, optimization settings, the ABI and so on (maybe including the current phase of the moon).
Under the x86 calling conventions, an integer return value is stored directly in the %eax register (assuming that you have an x86 or x86-64 CPU). The array buf, most likely located on the stack, is addressed by offsets from %rbp within the current stack frame. Let's consult the gdb disassemble command:
$ gcc -O0 test.c
$ gdb -q a.out
(gdb) b checksecret
(gdb) r
Breakpoint 1, 0x0000000000400631 in checksecret ()
(gdb) disas
Dump of assembler code for function checksecret:
0x000000000040062d <+0>: push %rbp
0x000000000040062e <+1>: mov %rsp,%rbp
=> 0x0000000000400631 <+4>: sub $0x30,%rsp
0x0000000000400635 <+8>: mov %fs:0x28,%rax
0x000000000040063e <+17>: mov %rax,-0x8(%rbp)
0x0000000000400642 <+21>: xor %eax,%eax
0x0000000000400644 <+23>: lea -0x30(%rbp),%rax
0x0000000000400648 <+27>: mov %rax,%rdi
0x000000000040064b <+30>: callq 0x400530 <gets@plt>
0x0000000000400650 <+35>: lea -0x30(%rbp),%rax
0x0000000000400654 <+39>: mov %rax,%rsi
0x0000000000400657 <+42>: mov $0x400744,%edi
0x000000000040065c <+47>: callq 0x400510 <strcmp@plt>
0x0000000000400661 <+52>: test %eax,%eax
0x0000000000400663 <+54>: jne 0x40066c <checksecret+63>
0x0000000000400665 <+56>: mov $0x1,%eax
0x000000000040066a <+61>: jmp 0x400671 <checksecret+68>
0x000000000040066c <+63>: mov $0x0,%eax
0x0000000000400671 <+68>: mov -0x8(%rbp),%rdx
0x0000000000400675 <+72>: xor %fs:0x28,%rdx
0x000000000040067e <+81>: je 0x400685 <checksecret+88>
0x0000000000400680 <+83>: callq 0x4004f0 <__stack_chk_fail@plt>
0x0000000000400685 <+88>: leaveq
0x0000000000400686 <+89>: retq
There is no way to overwrite %eax directly from C code, but what you could do is overwrite a selected fragment of the code section. In your case, what you want is to replace:
0x000000000040066c <+63>: mov $0x0,%eax
with
0x000000000040066c <+63>: mov $0x1,%eax
It's easy to accomplish by gdb itself:
(gdb) x/2bx 0x40066c
0x40066c <checksecret+63>: 0xb8 0x00
(gdb) set {unsigned char}0x40066d = 1
Now let's confirm it:
(gdb) x/i 0x40066c
0x40066c <checksecret+63>: mov $0x1,%eax
From that point on, checksecret() returns 1 even if SECRET does not match. However, it wouldn't be so easy to do this through buf itself, as you need to know (guess somehow?) the correct offset of the particular instruction in the code section.
The answers above are pretty clear and correct about the buffer overflow vulnerability. But there is a different way to get the same result without exploiting the vulnerability.
mince@rootlab tmp $ gcc test.c -o test
mince@rootlab tmp $ strings test
/lib64/ld-linux-x86-64.so.2
libc.so.6
gets
puts
__stack_chk_fail
strcmp
__libc_start_main
__gmon_start__
GLIBC_2.4
GLIBC_2.2.5
UH-X
UH-X
[]A\A]A^A_
1234567890AZXCVBNFRT
;*3$
Please look at the last two rows. You will see your secret key there.

gcc on windows generating garbage? windows vs linux

I'm trying to find out why Windows produces so many more instructions than Linux for the same program.
So I just used int a=0xbeef; and printf("test\n"); in C and compiled on Linux and Windows. When I debug and disassemble main I get this:
On linux:
0x080483e4 <+0>: push %ebp
0x080483e5 <+1>: mov %esp,%ebp
0x080483e7 <+3>: and $0xfffffff0,%esp
0x080483ea <+6>: sub $0x20,%esp
0x080483ed <+9>: movl $0xbeef,0x1c(%esp)
0x080483f5 <+17>: movl $0x80484d0,(%esp)
0x080483fc <+24>: call 0x8048318 <puts@plt>
0x08048401 <+29>: leave
0x08048402 <+30>: ret
Ok that's nice. I see the movl 0x1c offset of esp to put the value there.
But in windows I got this:
0x401290 <main>: push %ebp
0x401291 <main+1>: mov %esp,%ebp
0x401293 <main+3>: sub $0x18,%esp
0x401296 <main+6>: and $0xfffffff0,%esp
0x401299 <main+9>: mov $0x0,%eax
0x40129e <main+14>: add $0xf,%eax
0x4012a1 <main+17>: add $0xf,%eax
0x4012a4 <main+20>: shr $0x4,%eax
0x4012a7 <main+23>: shl $0x4,%eax
0x4012aa <main+26>: mov %eax,0xfffffff8(%ebp)
0x4012ad <main+29>: mov 0xfffffff8(%ebp),%eax
0x4012b0 <main+32>: call 0x401720 <_alloca>
0x4012b5 <main+37>: call 0x4013c0 <__main>
0x4012ba <main+42>: movl $0xbeef,0xfffffffc(%ebp)
0x4012c1 <main+49>: movl $0x403000,(%esp,1)
0x4012c8 <main+56>: call 0x401810 <printf>
0x4012cd <main+61>: mov $0x0,%eax
0x4012d2 <main+66>: leave
0x4012d3 <main+67>: ret
First of all, I don't know why the Windows compiler (mingw) generates so much code. It is twice as much as on Linux, which makes me wonder. And another thing: I can't see the point of the code from main+9 to main+37.
I would be grateful if someone could answer this, I'm just curious :)
edit:
With the -O3 argument I get the same code on Linux, and on Windows something like magic happens:
0x401290 <main>: push %ebp
0x401291 <main+1>: mov $0x10,%eax
0x401296 <main+6>: mov %esp,%ebp
0x401298 <main+8>: sub $0x8,%esp
0x40129b <main+11>: and $0xfffffff0,%esp
0x40129e <main+14>: call 0x401700 <_alloca>
0x4012a3 <main+19>: call 0x4013a0 <__main>
0x4012a8 <main+24>: movl $0x403000,(%esp,1)
0x4012af <main+31>: call 0x4017f0 <puts>
0x4012b4 <main+36>: leave
0x4012b5 <main+37>: xor %eax,%eax
0x4012b7 <main+39>: ret
leave, then xor, then ret. OK :D The call _alloca and call __main are still there, and I don't know what 0x401291 <main+1>: mov $0x10,%eax is doing here :D
You seem to be compiling with an old 3.x series gcc; you'd better upgrade. Newer versions don't invoke alloca. I suspect this particular version of alloca uses a register calling convention, so the mov $0x10,%eax is probably setting up the argument for that call.
__main is a startup function defined in crtbegin.o that executes global constructors and registers a function using atexit that will run the global destructors.
Also note that main gets special treatment, like the stack alignment code and the above-mentioned initialization. It may be a good idea to instead compare a "plain" function if you are just interested in code generation issues.
One notes that in the upper example the compiler is calling puts, while in the lower it is printf. I would suspect you have optimization on in the top example, but not on the lower example.
It also might be a debug vs non-debug build.

Resources