Multi dimensional Arrays - c

#include <stdio.h>
int multi[2][3] = {{17, 23, 19}, {72, 34, 44}};
int main()
{
printf("%p\n", multi); //line 1
printf("%p\n", *multi); //line 2
if(*multi == multi)
puts("They are equal!");
return 0;
}
How are line 1 and line 2 different?
I'm getting this output:
They are equal!
Can somebody also recommend a good tutorial on pointers and their use with multidimensional arrays?

The value is the same but the type is different.
multi is of type int [2][3] and when evaluated it is converted to the type int (*)[3].
*multi is of type int [3] and when evaluated it is converted to the type int *.
Actually:
*multi == multi
is accepted by your compiler, but the expression is not valid C because the two operands of the == operator are pointers to incompatible types. To perform the comparison you would need to cast one of the two operands.

The answer to your question is given by gcc when you compile your code:
ml.c:10: warning: comparison of distinct pointer types lacks a cast
multi is a two-dimensional int array, that is, int[2][3].
*multi is a one-dimensional int array, that is, int[3].
That's why they are not the same object. One has to be cast to be eligible for comparison. Let's see how this wrong code works under the hood.
Surprisingly, there's no cmp instruction at all (compiled with -g -O0)! Actually, you don't need a cmp here, because multi decays to &multi[0] and *multi decays to &multi[0][0]. Both are the same constant address, so from the memory's point of view they are the same, and the C compiler happily folds the comparison away (even with -O0 :)).
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400504 <+0>: push rbp
0x0000000000400505 <+1>: mov rbp,rsp
=> 0x0000000000400508 <+4>: mov eax,0x400648
0x000000000040050d <+9>: mov esi,0x600900
0x0000000000400512 <+14>: mov rdi,rax
0x0000000000400515 <+17>: mov eax,0x0
0x000000000040051a <+22>: call 0x4003f0 <printf@plt>
0x000000000040051f <+27>: mov edx,0x600900
0x0000000000400524 <+32>: mov eax,0x400648
0x0000000000400529 <+37>: mov rsi,rdx
0x000000000040052c <+40>: mov rdi,rax
0x000000000040052f <+43>: mov eax,0x0
0x0000000000400534 <+48>: call 0x4003f0 <printf@plt>
0x0000000000400539 <+53>: mov edi,0x40064c
0x000000000040053e <+58>: call 0x400400 <puts@plt>
0x0000000000400543 <+63>: mov eax,0x0
0x0000000000400548 <+68>: leave
0x0000000000400549 <+69>: ret
The only thing it does before calling puts() is move the address of the string to print into the argument register.
(gdb) x/10cb 0x40064c
0x40064c <__dso_handle+12>: 84 'T' 104 'h' 101 'e' 121 'y' 32 ' ' 97 'a' 114 'r' 101 'e'
0x400654 <__dso_handle+20>: 32 ' ' 101 'e'
There you go: the comparison was folded into an always-true result, so the compiler stripped the cmp away. :)
Expert C Programming has a chapter named (surprise! surprise!)
Chapter 4. The Shocking Truth: C Arrays and Pointers Are NOT the Same!
Highly recommended.

When you compile with -Wall, the compiler warns you about this line (cause: comparison of distinct pointer types):
if(*multi == multi)
multi is an array, and in C its address is the address of its first element, multi[0].
*multi is multi[0] itself, which in turn decays to a pointer to its own first element, multi[0][0].
You are comparing two addresses that refer to the same location, the start of {17, 23, 19}, which explains why you get this output.
Hope this helps.
Regards.

In brief, in C a matrix is stored as a series of consecutive arrays, which are the rows of the matrix:
m (once it has decayed) has type pointer to array of 3 ints
and is the address of the first array / row of the matrix
*m (once it has decayed) has type pointer to int
and is the address of the first element of the first row
The same applies to m+1, which is the address of the second array,
and to *( m + 1 ), which is the address of the first element of the second array/row.
Hope this helps.

Related

Why char and short data are stored in 4 byte registers? [duplicate]

I'm learning how C is converted to assembly, and I found that char and short data are stored in 4-byte registers.
Note: I compile the C code with -Og -g and use disas main in gdb. My computer is 64-bit.
Below is the char code and the corresponding assembly (I think short has the same problem, so I only show one of the two):
#include <stdio.h>
int main(void) {
const int LEN = 3;
char c[LEN];
c[0] = 1;
c[1] = 2;
c[2] = 3;
for(int i = 0; i < LEN; i ++) {
printf("%d\n", c[i]);
}
return 0;
}
Part of the disassembly:
0x000000000000116e <+5>: sub $0x10,%rsp
0x0000000000001172 <+9>: mov %fs:0x28,%rax
0x000000000000117b <+18>: mov %rax,0x8(%rsp)
0x0000000000001180 <+23>: xor %eax,%eax
0x0000000000001182 <+25>: movb $0x1,0x5(%rsp)
0x0000000000001187 <+30>: movb $0x2,0x6(%rsp)
0x000000000000118c <+35>: movb $0x3,0x7(%rsp)
0x0000000000001191 <+40>: mov $0x0,%ebx
0x0000000000001196 <+45>: jmp 0x11b9 <main+80>
0x0000000000001198 <+47>: movslq %ebx,%rax
# why %edx?
0x000000000000119b <+50>: movsbl 0x5(%rsp,%rax,1),%edx
0x00000000000011a0 <+55>: lea 0xe5d(%rip),%rsi # 0x2004
0x00000000000011a7 <+62>: mov $0x1,%edi
0x00000000000011ac <+67>: mov $0x0,%eax
0x00000000000011b1 <+72>: callq 0x1070 <__printf_chk@plt>
0x00000000000011b6 <+77>: add $0x1,%ebx
0x00000000000011b9 <+80>: cmp $0x2,%ebx
0x00000000000011bc <+83>: jle 0x1198 <main+47>
I have learnt a little about Java data types, where byte, char, or short is promoted to int. I'm not sure whether the two are related.
With the %d format you specify that an "int" is to be printed, thus the value needs to get loaded to (at least) an int-sized register.
When you merely reference a char or short variable in an expression, the language rules say that it is immediately promoted to int.  So, given char c, d; if we say c + d this is the same as saying (int)c + (int)d by the rules of the language.  And also within the expression context printf("%d\n", c); is the same as printf("%d\n", (int)c);
Even if you cast a char variable to char it will still immediately be promoted to int, so if you say (char)c that's the same as saying (int)(char)(int)c.  This is the reason that we can cast int i; to a shorter type (unsigned short)i and get a zero extended full sized int (from the lower 16 bits of i) as a result, or (short)i and get a sign extended full sized int (also from the lower 16 bits) as a result.
This automatic and immediate promotion to int for the shorter data types happens independently of function calling and parameter passing.  So, in printf("%d\n", c); we are passing an int (that happens to be widened from a char) and that's what printf sees.
But why do char and short need the promotion?
This is by the definition of the language.  We can guess at rationale, namely that it simplifies the arithmetic operators, and also that we need some rules to rely upon even if they were different from that.
From ISO/IEC 9899:201x Committee Draft — April 12, 2011 N1570
EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ‘‘integer promotions’’ require that the abstract machine promote the value of each variable to int size
and then add the two ints and truncate the sum. Provided the addition of two chars can be done without
overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only
produce the same result, possibly omitting the promotions.

Assembly Code to C, what are the arguments in the C code that will make the Assembly code

I just signed up for this online course, and this is part of my first assignment. I have already found the missing pieces in the assembly code and have gotten this far.
This is the assembly code:
0x08048394 <call1+0>: push %ebp
0x08048395 <call1+1>: mov %esp,%ebp
0x08048397 <call1+3>: sub $0x10,%esp
0x0804839a <call1+6>: mov %ebx,(%esp)
0x0804839d <call1+9>: mov %esi,0x4(%esp)
0x080483a1 <call1+13>: mov 0x8(%ebp),%edx
0x080483a4 <call1+16>: mov 0xc(%ebp),%ecx
0x080483a7 <call1+19>: mov (%ecx,%edx,4),%eax
0x080483aa <call1+22>: mov 0x10(%ebp),%ebx
0x080483ad <call1+25>: mov (%ebx,%edx,4),%esi
0x080483b0 <call1+28>: cmp %esi,%eax
0x080483b2 <call1+30>: jle 0x80483b9 <call1+37>
0x080483b4 <call1+32>: mov %eax,(%ebx,%edx,4)
0x080483b7 <call1+35>: jmp 0x80483be <call1+42>
0x080483b9 <call1+37>: mov %esi,(%ecx,%edx,4)
0x080483bc <call1+40>: mov %esi,%eax
0x080483be <call1+42>: pop %ebx
0x080483bf <call1+43>: pop %esi
0x080483c0 <call1+44>: add $0x8,%esp
0x080483c3 <call1+47>: leave
0x080483c4 <call1+48>: ret
My question is, what arguments in the following C code snippet will lead to the above assembly code :
int main(){
int a1[] = {10, 12, 3, 4, 25};
int a2[] = {9, 28, 7, 16, 5};
call1(_________________________________);
}
I think it's just a1 and a2, but I am not sure, which is why I need some help.
This assembly code to me looks like it may just be swapping the values of the two arrays...
Am I right, or completely off?
As I said in my comment above, this question is ill-formed: "what arguments in the following C code snippet will lead to the above assembly code?" - any argument will lead to that code. That is the code of the function itself, it will always be the same no matter what arguments you pass to it. If you want to figure out which arguments are passed, you need to look at the assembly code of the caller (main).
However, even without the full code of main, with the part of C source that you have and the assembly of the function we can infer the following:
The function is passed 3 arguments, as we can see it referencing 0x8(%ebp), 0xc(%ebp) and 0x10(%ebp). These arguments are in order first, second and third.
The first argument (offset 0x8 from ebp) is used as an index, as we can see from:
mov 0x8(%ebp),%edx
mov 0xc(%ebp),%ecx
mov (%ecx,%edx,4),%eax
The other two arguments (offsets 0xc and 0x10) are treated as pointers to arrays and indexed with the first.
Given the above, a fair reconstruction of the code would be the following:
int call1(int index, int *a1, int *a2) {
int eax, esi;
eax = a1[index];
esi = a2[index];
if (eax <= esi) {
a1[index] = esi;
eax = esi;
} else {
a2[index] = eax;
}
return eax;
}
This assembly code to me looks like it may just be swapping the values of the two arrays... Am I right, or completely off?
Close, but it is not quite a swap. Of course, in reality we do not know what is actually passed to the function, but if the call made in main is the following:
call1(some_index, a1, a2);
Then the function takes a1, a2 and some index, and it checks if the element of a1 at the given index is lower than or equal to the element of a2 at the same index. If so, the element of a1 is overwritten by the element of a2; otherwise the element of a2 is overwritten by the element of a1. Either way, both slots end up holding the larger value, and the function returns that value.
Note that we actually have no idea if the first argument passed is a1 or a2. It could be either way, or it could even be a1 + something and a2 + something_else. What the exact parameters are can only be determined by looking at the full code (C or assembly) of main!

Why does this program need more than 45 bytes of input to cause a buffer overflow (segmentation fault)?

#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
char whatever[20];
strcpy(whatever, argv[1]);
return 0;
}
I mean, input of more than 24 chars should be enough. By the way, there is no grsecurity enabled on my system, and I'm using Ubuntu 7.04 32-bit on VirtualBox.
Ok, what's interesting here is the disassembly of main:
push %ebp
mov %esp,%ebp
sub $0x38,%esp
and $0xfffffff0,%esp
mov $0x0,%eax
sub %eax,%esp
mov 0xc(%ebp),%eax
add $0x4,%eax
mov (%eax),%eax
mov %eax,0x4(%esp)
lea 0xffffffd8(%ebp),%eax
mov %eax,(%esp)
call 80482a0 <strcpy@plt>
mov $0x0,%eax
leave
ret
Before entering main, the stack pointer esp points to the return address pushed by call. Let's call that &ret.
The first opcode in the function pushes the base pointer of the previous frame, and then sets the current base pointer to the stack pointer. So ebp = &ret - 4.
When setting up the call to strcpy, the value right at esp is the first parameter. Here:
mov %eax,(%esp)
call 80482a0 <strcpy@plt>
So the value in eax is the first parameter. If we look at the previous instruction, we can see what that value is:
lea 0xffffffd8(%ebp),%eax
Ok, this notation basically means: eax = ebp + 0xffffffd8, which is equivalent to eax = ebp - 40 (see Two's Complement). Basically, you flip all the bits (and get 0x27=39), stick a minus sign (-39), and subtract 1 (-40).
And in relation to the frame's return address: eax = &ret - 44
So it would take at least 45 bytes to overrun the return address.
But you say 47. This is interesting, and it might have to do with the specific input you supplied.
You see, x86 is a little-endian machine, which means that in memory, integers are stored LSB-first. So, when overwriting the stored return address, you first overwrite its LSB.
If the byte you write there leaves the return address pointing at some other legitimate address, you might cause a faulty termination, but not a segmentation fault, as you will branch to a legitimate address.
If you'll share your input, it might help shed some light on those two missing bytes :)

Assembler debug of undefined expression

I'm trying to get a better understanding of how compilers produce code for undefined expressions e.g. for the following code:
int main()
{
int i = 5;
i = i++;
return 0;
}
This is the assembler code generated by gcc 4.8.2 (Optimisation is off -O0 and I’ve inserted my own line numbers for reference purposes):
(gdb) disassemble main
Dump of assembler code for function main:
(1) 0x0000000000000000 <+0>: push %rbp
(2) 0x0000000000000001 <+1>: mov %rsp,%rbp
(3) 0x0000000000000004 <+4>: movl $0x5,-0x4(%rbp)
(4) 0x000000000000000b <+11>: mov -0x4(%rbp),%eax
(5) 0x000000000000000e <+14>: lea 0x1(%rax),%edx
(6) 0x0000000000000011 <+17>: mov %edx,-0x4(%rbp)
(7) 0x0000000000000014 <+20>: mov %eax,-0x4(%rbp)
(8) 0x0000000000000017 <+23>: mov $0x0,%eax
(9) 0x000000000000001c <+28>: pop %rbp
(10) 0x000000000000001d <+29>: retq
End of assembler dump.
Execution of this code results in the value of i remaining at 5 (verified with a printf() statement), i.e. i doesn't appear to ever be incremented. I understand that different compilers will evaluate/compile undefined expressions in different ways and this may just be the way that gcc does it, i.e. I could get a different result with a different compiler.
With respect to the assembler code, as I understand:
Lines 1-2: ignoring these, they set up the stack/base pointers etc.
Lines 3-4: this is how the value of 5 is assigned to i.
Can anyone explain what is happening on lines 5-6? It looks as if i will ultimately be reassigned the value of 5 (line 7), but is the increment operation (required for the post-increment operation i++) simply abandoned/skipped by the compiler in this case?
These three lines contain your answer:
lea 0x1(%rax),%edx
mov %edx,-0x4(%rbp)
mov %eax,-0x4(%rbp)
The increment operation isn't skipped. lea is the increment, taking the value from %rax and storing the incremented value in %edx. %edx is stored but then overwritten by the next line which uses the original value from %eax.
The key to understanding this code is to know how lea works. It stands for load effective address, so while it looks like a pointer dereference, it actually just does the math needed to get the final address of [whatever], and then keeps the address, instead of the value at that address. This means it can be used for any mathematical expression that can be expressed efficiently using addressing modes, as an alternative to mathematical opcodes. It's frequently used as a way to get a multiply and add into a single instruction for this reason. In particular, in this case it's used to increment the value and move the result to a different register in one instruction, where inc would instead overwrite it in place.
Lines 5-6 are the i++. The lea 0x1(%rax),%edx is i + 1 and mov %edx,-0x4(%rbp) writes that back to i. However, line 7, the mov %eax,-0x4(%rbp), writes the original value back into i. The code looks like:
(4) eax = i
(5) edx = i + 1
(6) i = edx
(7) i = eax

Arrays pointers on 32bit and 64bit systems

The following code prints different results on 32bit and 64bit systems:
#include <stdio.h>
void swapArray(int **a, int **b)
{
int *temp = *a;
*a = *b;
*b = temp;
}
int main()
{
int a[2] = {1, 3};
int b[2] = {2, 4};
swapArray(&a, &b);
printf("%d\n", a[0]);
printf("%d\n", a[1]);
return 0;
}
After compiling it in 32bit system, the output is:
2
3
On 64bit the output is:
2
4
As I understand it, the function swapArray just swaps the pointers to the first elements of a and b. So after calling swapArray, a should point to 2 and b should point to 1.
For this reason a[0] should yield 2, and a[1] should reference the next int in memory after the location of 2, which contains 4.
Can anyone please explain?
Edit:
Thanks to the comments and answers, I now notice that &a and &b are of type int (*)[] and not int **. This obviously makes the code incorrect (and indeed I get a compiler warning). It is intriguing, though, why the compiler (gcc) just gives a warning and not an error.
I am still left with the question what causes different results on different systems, but since the code is incorrect, it is less relevant.
Edit 2:
As for the different results on different systems, I suggest reading AndreyT's comment.
swapArray(&a, &b);
&a and &b are not of type int ** but of type int (*)[2]. BTW, your compiler is kind enough to accept your program, but a compiler has the right to refuse to translate it.
Before answering your question, let's see what happens under the hood during a pointer operation. I'm using very simple code to demonstrate this:
#include <stdio.h>
int main() {
int *p;
int **p2;
int x = 3;
p = &x;
p2 = &p;
return 0;
}
Now look at the disassembly :
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
0x0000000000400478 <+4>: mov DWORD PTR [rbp-0x14],0x3
0x000000000040047f <+11>: lea rax,[rbp-0x14]
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
=> 0x000000000040048f <+27>: mov eax,0x0
0x0000000000400494 <+32>: leave
0x0000000000400495 <+33>: ret
The disassembly is pretty self-evident, but a few notes need to be added here.
My function's stack frame starts from here:
0x0000000000400474 <+0>: push rbp
0x0000000000400475 <+1>: mov rbp,rsp
So let's see what rbp holds now:
(gdb) info registers $rbp
rbp 0x7fffffffe110 0x7fffffffe110
Here we store the value 3 at address [rbp - 0x14]. Let's look at the memory:
(gdb) x/1xw $rbp - 0x14
0x7fffffffe0fc: 0x00000003
It's important to notice that the DWORD data type is used, which is 32 bits wide. So, as a side note, an int such as x (initialized here with the literal 3) is treated as a 4-byte unit.
The next instruction uses lea to load the effective address of the value stored by the previous instruction.
0x000000000040047f <+11>: lea rax,[rbp-0x14]
It means that now $rax will have the value 0x7fffffffe0fc.
(gdb) p/x $rax
$4 = 0x7fffffffe0fc
Next we will save this address into memory using
0x0000000000400483 <+15>: mov QWORD PTR [rbp-0x10],rax
The important thing to note is that a QWORD is used here, because 64-bit systems have an 8-byte native pointer size. The 0x14 - 0x10 = 4 bytes in between were used by the earlier mov instruction.
Next we have :
0x0000000000400487 <+19>: lea rax,[rbp-0x10]
0x000000000040048b <+23>: mov QWORD PTR [rbp-0x8],rax
This is the second level of indirection. Every value that holds an address is a QWORD. This is important to keep in mind.
Now let's come to your code.
Before the call to swapArray you have:
=> 0x00000000004004fe <+8>: mov DWORD PTR [rbp-0x10],0x1
0x0000000000400505 <+15>: mov DWORD PTR [rbp-0xc],0x3
0x000000000040050c <+22>: mov DWORD PTR [rbp-0x20],0x2
0x0000000000400513 <+29>: mov DWORD PTR [rbp-0x1c],0x4
0x000000000040051a <+36>: lea rdx,[rbp-0x20]
0x000000000040051e <+40>: lea rax,[rbp-0x10]
0x0000000000400522 <+44>: mov rsi,rdx
0x0000000000400525 <+47>: mov rdi,rax
This is straightforward. Your arrays are initialized, and the effect of the & operator is visible where the effective address of the start of each array is loaded into $rdi and $rsi.
Now let's see what happens inside swapArray().
The start addresses of your arrays are in $rdi and $rsi. So let's see their contents:
(gdb) p/x $rdi
$2 = 0x7fffffffe100
(gdb) p/x $rsi
$3 = 0x7fffffffe0f0
0x00000000004004c8 <+4>: mov QWORD PTR [rbp-0x18],rdi
0x00000000004004cc <+8>: mov QWORD PTR [rbp-0x20],rsi
Now the first statement, int *temp = *a, is performed by the following instructions.
0x00000000004004d0 <+12>: mov rax,QWORD PTR [rbp-0x18]
0x00000000004004d4 <+16>: mov rax,QWORD PTR [rax]
0x00000000004004d7 <+19>: mov QWORD PTR [rbp-0x8],rax
Now comes the defining moment: what's happening with your *a?
It loads into $rax the value stored at [rbp - 0x18], where the value of $rdi was saved, which in turn holds the address of the first element of the first array.
It then performs another indirection, using the address in $rax to fetch a QWORD and load it into $rax. So what will it return? It will return a QWORD from 0x7fffffffe100, which in effect forms an 8-byte quantity from the two 4-byte quantities stored there. To elaborate,
the memory there looks like this:
(gdb) x/2xw $rdi
0x7fffffffe100: 0x00000001 0x00000003
Now if you fetch a QWORD:
(gdb) x/1xg $rdi
0x7fffffffe100: 0x0000000300000001
So you are already in trouble, because you are fetching with the wrong width.
The rest of the code can be explained in a similar manner.
Now why is it different on a 32-bit platform? Because on a 32-bit platform the native pointer width is 4 bytes, so the fetch covers only the first int of each array. The main problem with your semantically incorrect code originates from the difference between the width of int and the width of a native pointer. If both have the same width, you may still get away with your code.
But you should never write code that assumes the size of native types. That's what standards are for, and that's why your compiler is giving you a warning.
From the language point of view it's a type mismatch, which has already been pointed out in the earlier answers, so I'm not going into that.
You can't swap arrays using the pointer trick (they are not pointers!). You would either have to create pointers to those arrays and use the pointers or dynamically allocate the arrays using malloc etc.
The results I get on a 64-bit system are different from yours. For example, I get:
2
3
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
And with clang on my Mac (compiling the file as C++, test.cpp) I get an error:
test.cpp: In function ‘int main()’:
test.cpp:13: error: cannot convert ‘int (*)[2]’ to ‘int**’ for argument ‘1’ to ‘void swapArray(int**, int**)’
I assume that this is undefined behavior and you are trying to interpret what is probably junk output.

Resources