Multidimensional array and addressing - c

I have an issue with the multidimensional arrays. Maybe the solution is much easier.
int arr[2][2]; //multidimensional array
My simple question is: why the
arr[0][2] and arr[1][0]
or
arr[1][2] and arr[2][0]
are on the same address in my case?
I checked this problem in Linux and Windows environment. And the issue is the same. I have checked tutorials and other sources, but no answer.

The pointer &arr[0][2] is the one-past-the-end pointer of the array arr[0]. This is the same address as that of the first element of the next array, arr[1], which is &arr[1][0], because arrays are laid out contiguously in memory.
arr[2][0] is a bit trickier: arr[2] is not a valid access, but &arr[2] is the one-past-the-end pointer of the array arr. But since that pointer cannot be dereferenced, it doesn't make sense to talk about arr[2][0]. arr doesn't have a third element.

C stores multi-dimensional arrays in what is called row-major order. In that configuration, all the data for a single row is stored in consecutive memory:
arr[2][2] -> r0c0, r0c1, r1c0, r1c1
The alternative would be column-major order, which places the columns consecutively.
Since you have specified the length of the row (number of cols) as 2, it follows that accessing column 2 (the third column) will compute an address that "wraps around" to the next row.
The math looks like:
&(arr[row][col])
= arr # base address
+ row * ncols * sizeof(element)
+ col * sizeof(element)
= arr + sizeof(element) * (row * ncols + col)
In your case, arr[0][2] is arr + (0*2 + 2) * sizeof(int), while arr[1][0] is arr + (1*2 + 0)*sizeof(int).
You can do similar math for the other variations.

Array indexing is identical to pointer arithmetic (actually, the array name first is converted ("decays") to a pointer to the first element before the []-operator is applied):
arr[r][c] <=> *(*(arr + r) + c) <=> *((int *)arr + r * INNER_LENGTH + c)
Your array has two entries per dimension. In C, indexes start from 0, so for each dimension the valid indexes are 0 and 1 (i.e. up to total_entries - 1). That makes three of your expressions suspect in the first place:
arr[0][2] // [outer dimension/index][inner dimension/index]
arr[1][2]
arr[2][0]
We have these cases:
Both indexes are valid: no problem.
Only the address is taken, the element is not accessed, and either
  the outer index is valid and the inner (see below) index equals the length of the inner dimension: comparison and certain address arithmetic are allowed (other constraints apply!), or
  the outer index equals the length of the outer dimension, and the inner index is 0: the same.
Anything else: the address is invalid and any usage (take address, dereference, etc.) invokes undefined behaviour.
What exactly goes on in memory might become a bit more clear if we use different lengths for the dimensions and have a look how the data is stored:
int arr[3][2];
This is an "array of 3 arrays of 2 int elements". The leftmost dimension is called the "outer", the rightmost the "inner" dimension, because of the memory layout:
arr[0][0] // row 0, column 0
arr[0][1] // row 0, column 1
arr[1][0] // ...
arr[1][1]
arr[2][0]
arr[2][1]
Using the formula above, &arr[0][2] (arr + 0 * 2 + 2) will yield the same as &arr[1][0] (arr + 1 * 2 + 0), etc. Note, however, while the addresses are identical, the first version must not be dereferenced and the compiler may generate incorrect code, etc.

Array indexing in C is similar to adding the value of the index to the address of the first element.
In the multidimensional array that you describe, you have 2 elements on each dimension: 0 and 1. When you introduce a number larger than that, you're referencing an element outside that dimension. Technically, this is an array out of bounds error.
The addresses break down like this:
arr[0][0] - &arr[0][0] + 0
arr[0][1] - &arr[0][0] + 1
arr[1][0] - &arr[0][0] + 2
arr[1][1] - &arr[0][0] + 3
When you write arr[0][2], you're referencing address &arr[0][0] + 2, which is the same as arr[1][0]. It's all just pointer math, so you can work it out pretty easily once you know how it works.

You can look at your two-dimensional array as one long one-dimensional array:
[00][01][10][11]
With pointer arithmetic, another representation of this long one-dimensional array is:
[00][01][02][03]
So looking in cell [10] is exactly the same as looking into cell [02] from a pointer-arithmetic point of view.

Related

Storage order for multidimensional arrays in C

With a C compiler, are array elements stored in column-major order or row-major order, or is it compiler dependent?
int arr[2][3]={1,2,3,4,5,6};
int array[3][2]={1,2,3,4,5,6};
on printing arr and array output:
arr:
1 2 3
4 5 6
array:
1 2
3 4
5 6
It seems it always prefers row-major order?
Row major order is mandated by the standard.
6.5.2.1p3:
Successive subscript operators designate an element of a
multidimensional array object. If E is an n-dimensional array (n >= 2)
with dimensions i x j x . . . x k, then E (used as other than an
lvalue) is converted to a pointer to an (n - 1)-dimensional array with
dimensions j x . . . x k. If the unary * operator is applied to this
pointer explicitly, or implicitly as a result of subscripting, the
result is the referenced (n - 1)-dimensional array, which itself is
converted into a pointer if used as other than an lvalue. It follows
from this that arrays are stored in row-major order (last subscript
varies fastest).
(Emphasis mine)
You printed the array. The output is in whatever order that you printed the array elements. So what you see has nothing to do with the order in which array elements are stored in memory.
int arr[2][3] means that you have an array of two elements, and each element is an int[3]. Objects are always stored consecutively, so the first int[3] is stored in consecutive memory, followed by the second int[3]. And that is the same for any C implementation.

Subtracting addresses in a 3d array

I created a 3d array
a[2][3][2]
Now when I try to print
printf("%d",a[1] - a[0]);
I get 3 as the output.
What I understand is that a[1] gives me the address of a[1][0][0] element and a[0] the address of a[0][0][0].
Let Address of a[0][0][0] is BA then Address of a[1][0][0] is BA + 4*2*3 where 4 byte is the memory space of an integer datatype
I was expecting the result to be 6.
Similarly I tried
printf("%d",(&a + 1) - &a);
and the output received was 1.
Where am I going wrong?
Edit 1: Entire Code
#include<stdio.h>
int main(){
int a[2][3][2] = {{{1,2},{3,4},{5,6}},{{7,8},{9,10},{11,12}}};
printf("%d",a[1]-a[0]);
return 0;
}
What I understand is that a[1] gives me the address of a[1][0][0] element and a[0] the address of a[0][0][0].
This is wrong. a[0] designates the first [3][2] array, not the element a[0][0][0]. The two start at the same address, but they have different types, so arithmetic on pointers to them uses different step sizes.
Specifically &a +1 is not equal to &a[0][0][0] +1
Let's break the expression a[1] - a[0] apart:
a[1] - refers to the second [3][2] array.
a[0] - refers to the first [3][2] array.
Now, when arrays are used in most contexts, they decay into pointers to the first element. So a[i], which is an int[3][2], will decay into a pointer to its first element, an int[2]; the pointer type is int(*)[2].
The difference is calculated in sizeof(int[2]) as dictated by pointer arithmetic. And you can see that there are 3 units of int[2] in the range [a[0], a[1]).

c - array memory storage

I am relatively new to C and am just learning about ways that memory is stored during a program. Can someone please explain why the following code:
#include <stdio.h>
int main(int argc, char** argv){
float x[3][4];
printf("%p\n%p\n%p\n%p\n", &(x[0][0]), &(x[2][0]), &(x[2][4]), &(x[3][0]));
return 0;
}
outputs this:
0x7fff5386fc40
0x7fff5386fc60
0x7fff5386fc70
0x7fff5386fc70
Why would the first 3 be different places in memory but the last be the same as the third?
Why is there a gap the size of 20 between the first two, but a gap the size of 10 between the second and third? The distance between &(x[2][0]) and &(x[2][4]) doesn't seem like half the distance between &(x[0][0])and &(x[2][0]).
Thanks in advance.
When you declare an array of size n, the indices range from 0 to n - 1. So x[2][4] and x[3][0] are actually stepping outside the bounds of your arrays.
If you weren't already aware, the multidimensional array you declared is actually an array of arrays.
Your compiler is laying out each array one after the other in memory. So, in memory, your elements are laid out in this order: x[0][0], x[0][1], x[0][2], x[0][3], x[1][0], x[1][1], and so on.
It looks like you already understand how pointers work, so I'll gloss over that. The reason the last two elements are the same is because x[2][4] is out of bounds, so it's referring to the next slot in memory after the end of the x[2] array. That would be the first element of the x[3] array, if there was one, which would be x[3][0].
Now, since x[3][0] refers to an address that you don't have a variable mapping to, it's entirely possible that dereferencing it could cause a segmentation fault. In the context of your program, there just happens to be something stored at 0x7fff5386fc70; in other words, you got lucky.
This is due to pointer arithmetic.
Your array is flat, which means that data are stored in a linear way, each one after the other in memory. First [0][0] then [0][1], etc.
The address of [x][y] is calculated as (x*4+y)*float_size+starting_address.
So the gap between the first two, [0][0] and [2][0], is 8*float_size. The difference is 20 in hexadecimal, which is 32 in decimal, so float_size is 4.
Between the second and third you have ((2*4+4)-(2*4))*float_size, which is 16 in decimal, so 10 in hexadecimal. This is exactly half the previous gap because it is the size of one row (the 4 elements of the third row), while the previous gap is the size of two rows (the 8 elements of the first and second rows).
Arrays are linear data structures. Irrespective of their dimension, say 1-dimensional or 2-dimensional or 3-dimensional, they are linearly arranged.
Your x[3][4] will be stored in memory as consecutive fixed sized cells like :
| (0,0) | (0, 1) | (0,2) | (0,3) | (1,0) | (1,1) | (1,2) | (1,3) | (2,0) | (2,1) | (2,2) | (2,3) |
This x[0][0] notation is matrix notation. At compile time, it is converted to pointer notation. The calculation is like:
offset of x[i][j] (in elements) = y * i + j, where y in your case is 4.
Calculated this way, the addresses you printed are exactly as expected.
Array elements in C are stored contiguously, in row-major order. So, in your example, &x[row][column] is exactly equal to &x[0][0] + ((row*4)+column)*sizeof(float) (when those addresses are converted to a number of bytes, which is what you're outputting).
The third address you're printing has the second index out of bounds (valid values 0 to 3), and the fourth has the first index out of bounds (valid values 0 to 2). It just happens that the values you've chosen work out to the same location in memory, because the rows are laid out in memory end-to-end.
There are 8 elements between &(x[0][0]) and &(x[2][0]). The actual difference in memory is multiplied by sizeof(float) which, for your compiler, is 4. 4*8 is 32 which, when printed as hex, is 0x20, the difference you're seeing.
If you picked values of row and column where ((row*4)+column) was more than 12 (=3*4), your code would be computing the address of something outside the array. Attempting to use such a pointer (e.g. setting the value at that address) would give undefined behaviour. The indices you picked both work out to 12, which is exactly one past the last element; nothing was dereferenced, so printing the address appeared to work.

2-D array in C, address generation

How do addresses get generated in arrays in C? Say, how does a[x][y] get to a particular value? I know it's not that big a question, but I'm just about to actually start coding.
It depends on the data type of the array's elements.
Say for an Integer array, each value holds 4 bytes, thus a row X long will take 4X bytes.
Thus a 2-D matrix of X*Y will be of 4*X*Y Bytes.
Any address, say that of Arry[X][Y], would be
calculated as : (Base Address of Arry)
+ ((X * No. of columns) + Y) * (size of one element) // Y is the offset in the current row
2-dimensional arrays in C are rectangular. For example:
int matrix[2][3];
allocates a single block of memory 2*3*sizeof(int) bytes in size. Addressing matrix[0][1] is just a matter of adding 0 * (3 * sizeof(int)) for the row to 1 * sizeof(int) for the column. Then add that sum to the address at which matrix starts.
A nested array is an array of arrays.
For example, an int[][6] is an array of int[6].
Assuming a 4-byte int, each element in the outer array is 6 * 4 = 24 bytes wide.
Therefore, arr[4] gets the fifth array in the outer array, which starts 4 * 24 bytes past the start of arr.
arr[4] is a normal int[6]. arr[4][2] gets the third int in this inner array, at byte offset 4 * 24 + 2 * 4 from the start of arr.
E.g.
char anArray[][13]={"Hello World!","February","John"};
You can visualize it as:
anArray:
H|e|l|l|o| |W|o|r|l|d|!|\0|F|e|b|r|u|a|r|y|\0|\0|\0|\0|\0|J|o|h|n|\0|\0|\0|\0|\0|\0|\0|\0|\0
^                          ^                              ^
0                          13                             26

pointer indirection confusion

I have an array as:
int x[3][5]={
{1,2,3,4,5},
{6,7,8,9,10},
{11,12,13,14,15}
};
What does *x refer to?
*(*x+2)+5 refers to "8". How does that happen?
Is *(*x+2) the same as *(*x)+2?
What if I do:
*n=&x;
Where is the pointer n pointing to? If it had been only x and not &x, it would have been the base address. What is it now?
1. *x is a dereference operation. In other words, "give me what x is pointing at". Since this is an array (of arrays), dereferencing x will give you the first array. This is equivalent to the array access syntax of x[0].
2. *(*x+2)+5 is equivalent to x[0][2] + 5, which gives you 8. This is because *x is the same as x[0] (see #1) and *(p + 2) is the same as p[2] for any pointer p. Once you've done two dereferences, you've gone from an array of arrays to an array to an actual number (the third item in the first array). Then, it's just 3 + 5 = 8.
3. *(*x+2) is equivalent to x[0][2] (see #2), which is 3 (third element in array). However, *(*x) + 2 gives you x[0][0] + 2 (first element in array plus 2), which is 1 + 2 = 3. Same answer, but very different way of getting it.
*x refers to the first array ({1,2,3,4,5}), and is equivalent to x[0]. Adding one to x moves to the next array, so *(x+1) would refer to the second array, and would be equivalent to x[1].
*(*x + 2) is therefore the third element in the first array, which is 3. This means that *(*x + 2) + 5 is equal to 8.
The parentheses matter a lot, for example *(*(x+2)) would be the first element in the third array.
*(*x + 2) results in the same value as *(*x) + 2, but does not use the same element of the array.
x is not an int**. It is an array of 3 arrays of 5 int, and in most expressions it decays to a pointer to its first row, of type int (*)[5].
When you write *x you obtain the first row itself (x[0]), which in turn decays to a pointer to its first element, &x[0][0].
So if you take (*x + 2), it's like referencing the first row of your array and then adding 2 to that address: you obtain the address of the third element of the first row. But since this is still a pointer, you add an outer dereference, *(*x+2), to obtain the third element of the first row.
Think of it this way:
typedef int Int5[5];
Int5 x[3];
x is an array with 3 elements. Each of those three elements is a array of 5 ints.
What does *x refer to?
x is the same as &x[0], so *x is the same as x[0], which is the first 5-element array.
*(*x+2)+5 refer to "8". How does that happen?
*x is x[0], and x+2 is &x[2] so *x+2 is &x[0][2] and *(*x + 2) is x[0][2] which happens to be 3. Add five to that for 8.
Is *(*x+2) same as *(*x)+2?
*(*x+2) is x[0][2] as we've seen. *(*x) would be x[0][0], so *(*x)+2 is x[0][0]+2. So both *(*x+2) and *(*x)+2 end up equaling 3, but that is merely a coincidence.
All the answers are 100% correct; I will just explain the *n=&x part in general terms.
&x generates a pointer (a value holding the address of another object), which is stored in n. Here &x has the type int (*)[3][5], a pointer to the whole array. To get the object pointed to by n, you write *n, which is called dereferencing or indirection.
To really understand this pointer business, you need to study how computers store values in memory.

Resources