When we're defining a 2D array as in:
int *a[5];
Which dimension does the "5" define? The first or the second?
It's not a "2D" array. It's a 1-dimensional array of pointers to int. As such, the array size designates that it has space for 5 pointers. Each individual pointer can point to the first element of a buffer of a different size.
A "true 2D array" is the colloquial "array of arrays" int a[M][N]. Here the expression a[i] evaluates to the array of N integers, at position i.
Each a[i] points to a single int, which may be the first element in a sequence of int objects, like so:
   a[0] a[1] a[2] a[3] a[4]
  +----+----+----+----+----+
  |    |    |    |    |    |
  +----+----+----+----+----+
    |    |    |    |    |
    |    |    |   ...  ...
    |    |    +-------------------------------+
    |    +-------------------+                |
    +-------+                |                |
            |                |                |
            v                v                v
          +---+            +---+            +---+
  a[0][0] |   |    a[1][0] |   |    a[2][0] |   |
          +---+            +---+            +---+
  a[0][1] |   |    a[1][1] |   |    a[2][1] |   |
          +---+            +---+            +---+
           ...              ...              ...
Thus, each a[i] can represent a "row" in your structure. You can dynamically allocate each "row" as
a[i] = malloc( sizeof *a[i] * row_length_for_i );
or you can set it to point to an existing array:
int foo[] = { 1, 2, 3 };
int bar[] = { 5, 6, 7, 8, 9 };
...
a[0] = foo;
a[1] = bar;
As shown in the example above, each "row" may have a different length.
I keep putting scare quotes around "row" because what you have is not a true 2D array - it's not a contiguous sequence of elements. The object immediately following a[0][N-1] will most likely not be a[1][0]. What you have is a sequence of pointers, each of which may point to the first element of a sequence of int, or to a single int, or to nothing at all.
Related
Could you explain how the output is -4? I think ++pp; is UB, but I'm not sure. Your explanation will really help my understanding. Could there be any difference in output between big-endian and little-endian machines?
#include <stdio.h>
int a[] = { -1, -2, -3, -4 };
int b[] = { 0, 1, 2, 3 };
int main(void)
{
    int *p[] = { a, b };
    int **pp = p;
    printf("a=%p, b=%p, p=%p, pp=%p\n", (void*)a, (void*)b, (void*)p, (void*)pp);
    ++pp;
    printf("p=%p, pp=%p *pp=%p\n", (void*)p, (void*)pp, (void*)*pp);
    ++*pp;
    printf("p=%p, pp=%p *pp=%p\n", (void*)p, (void*)pp, (void*)*pp);
    ++**pp;
    printf("%d\n", (++**pp)[a]);
}
My output:
a=0x107121040, b=0x107121050, p=0x7ffee8adfad0, pp=0x7ffee8adfad0
p=0x7ffee8adfad0, pp=0x7ffee8adfad8 *pp=0x107121050
p=0x7ffee8adfad0, pp=0x7ffee8adfad8 *pp=0x107121054
-4
When you use the name of an array (in most contexts), it decays to a pointer to its first element. That means that int* p = a; and int* p = &a[0]; are exactly the same.
So to understand what happens in this case, just walk through step by step. At the point of your first printf call, things look like this:
   pp            p                   a
+-------+     +------+     +----+----+----+----+
|   +--------->   +--------> -1 | -2 | -3 | -4 |
+-------+     |      |     +----+----+----+----+
              |      |
              +------+               b
              |      |     +----+----+----+----+
              |  +--------->  0 |  1 |  2 |  3 |
              |      |     +----+----+----+----+
              +------+
pp points to the first element of p, which is a pointer to the first element of a.
Now, when you increment pp, it changes to point to the second element of p, which is a pointer to the first element of b:
   pp            p                   a
+-------+     +------+     +----+----+----+----+
|   +   |     |   +--------> -1 | -2 | -3 | -4 |
+---|---+     |      |     +----+----+----+----+
    |         |      |
    |         +------+               b
    |         |      |     +----+----+----+----+
    +--------->  +--------->  0 |  1 |  2 |  3 |
              |      |     +----+----+----+----+
              +------+
You then increment *pp. Since *pp is a pointer to the first element of b, that pointer is incremented to point to the second element of b:
   pp            p                   a
+-------+     +------+     +----+----+----+----+
|   +   |     |   +--------> -1 | -2 | -3 | -4 |
+---|---+     |      |     +----+----+----+----+
    |         |      |
    |         +------+               b
    |         |      |     +----+----+----+----+
    +--------->      |     |  0 |  1 |  2 |  3 |
              |   +  |     +----+-^--+----+----+
              +---|--+            |
                  +---------------+
Then you increment **pp. At this point pp is a pointer to the second element of p, so *pp is a pointer to the second element of b. That means **pp names the second element of b. You increment that from 1 to 2:
   pp            p                   a
+-------+     +------+     +----+----+----+----+
|   +   |     |   +--------> -1 | -2 | -3 | -4 |
+---|---+     |      |     +----+----+----+----+
    |         |      |
    |         +------+               b
    |         |      |     +----+----+----+----+
    +--------->      |     |  0 |  2 |  2 |  3 |
              |   +  |     +----+-^--+----+----+
              +---|--+            |
                  +---------------+
Now, let's dissect (++**pp)[a]. ++**pp is the same as before, so the second element of b gets incremented to 3.
Now, for any pointer ptr and integer n, ptr[n] is the same as *(ptr + n). Since addition is commutative, ptr + n is the same as n + ptr. That means ptr[n] is the same as n[ptr].
Putting these together, that means that (++**pp)[a] is the same as 3[a], which is the same as a[3]. a[3] is -4, hence your result.
Remember the definition of the subscript operator [], e.g. as defined in this online C standard draft:
6.5.2.1 Array subscripting
2) ... The definition of the subscript operator [] is that E1[E2] is
identical to (*((E1)+(E2))). ...
It says that E1[E2] is identical to (*((E1)+(E2))). Then it becomes clear that (++**pp)[a] is the same as *((++**pp)+(a)), which again is the same as *((a)+(++**pp)), which consequently reads as a[(++**pp)]. The value of ++**pp is 3 at that point, and a[3] is -4.
It's easiest to understand this if you express all the array names in expressions as their decayed values. arrayName as a pointer becomes &arrayName[0]. So after all the initializations, you have:
a[0] = -1, a[1] = -2, a[2] = -3, a[3] = -4
b[0] = 0, b[1] = 1, b[2] = 2, b[3] = 3
p[0] = &a[0], p[1] = &b[0]
pp = &p[0]
Incrementing a pointer makes it point to the next array element, so after ++pp we now have
pp = &p[1]
++*pp dereferences pp, so it's equivalent to ++p[1], so now we have
p[1] = &b[1]
++**pp dereferences this twice, so it's equivalent to ++b[1], so now we have
b[1] = 2
Finally, we have the really confusing expression (++**pp)[a]. ++**pp again increments b[1], so its value is now 3, and that value replaces that expression, so it's equivalent to 3[a]. This might look like nonsense (3 isn't an array, how can you index it?), but it turns out that in C, x[y] == y[x] because of the way indexing is defined in terms of pointer arithmetic. So 3[a] is the same as a[3], and the last line prints -4.
Does this statement make sense, from the book C Programming: A Modern Approach, 2nd Edition, on page 269?
Just as the name of a one-dimensional array can be used as a pointer, so can the name of any array, regardless of how many dimensions it has. Some care is required, though. Consider the following array:
int a[NUM_ROWS][NUM_COLS];
a is not a pointer to a[0][0]; instead, it's a pointer to a[0]. This makes more sense if we look at it from the standpoint of C, which regards a not as a two-dimensional array but as a one-dimensional array whose elements are one-dimensional arrays. When used as a pointer, a has type int (*) [NUM_COLS] (pointer to an integer array of length NUM_COLS).
I'm confused because when I think "array whose elements are one-dimensional arrays" I think of a jagged array, but that's not what's going on here. Is this more like a macro with pointer arithmetic?
Is this in reference to the type system and how it treats multidimensional arrays? Could anyone explain this?
Yes, it makes sense, and no, it's not even talking about "ragged" or "jagged" arrays. It's simply that when we say
int a[NUM_ROWS][NUM_COLS];
what we're creating is an array a, and what it's an array of is... other arrays. You could think of it like this:
        +---------------------------------------+
        | +--------+--------+--------+--------+ |
a: [0]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +                                       +
        | +--------+--------+--------+--------+ |
   [1]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +                                       +
        | +--------+--------+--------+--------+ |
   [2]: | |        |        |        |        | |
        | +--------+--------+--------+--------+ |
        +---------------------------------------+
(Here NUM_COLS is evidently 4, and NUM_ROWS is 3.)
A two- (or more) dimensional array is 100% analogous to a simple, single-dimensional array -- you just have to be careful thinking about the analogies. If a is an array, then any mention of a in an expression where its value is needed results in a pointer to the array's first element, &a[0]. So given the two-dimensional array a we're talking about, a's value is &a[0] and is a pointer to an array of NUM_COLS integers.
It has to work this way, if multidimensional array subscripts are to work correctly. If we write a[i][j], that's interpreted as (a[i])[j]. a turns into a pointer to the array's first element, as usual, but a[i] is equivalent to *(a + i), where the pointer arithmetic ends up being scaled by the size of the pointed-to element -- that is, under the hood, in terms of byte addresses, it's more like *(a + i * sizeof(*a)) (a description of the address computation, not C you would actually write). So sizeof(*a) has to be sizeof(int [NUM_COLS]), or NUM_COLS * sizeof(int). That way a[i] gets you the i'th subarray, and then j can select one of the cells -- the int-sized cells -- of the subarray.
One final note: I've talked colloquially about "multi-dimensional arrays", but strictly speaking, and as many of the regulars here are fond of pointing out, C has no multidimensional arrays; it has only single-dimensional arrays, and what we think of as a two-dimensional array is actually, as we've seen here, a single-dimensional array whose elements happen to be other single-dimensional arrays. (If C had true multi-dimensional arrays, the subscripts would probably look like a[i,j] instead of a[i][j].)
Addendum: Despite your mention of pointer arithmetic, and my mention of pointer arithmetic, it's important to realize that there are no pointers involved in a's definition. Pointers arise only when we try to "take the value of" a, or explain how a[i] is equivalent to *(a + i).
For a data structure that does involve pointers, we could contrast the situation described by the code
int *a2[NUM_ROWS];
for(i = 0; i < NUM_ROWS; i++)
    a2[i] = malloc(NUM_COLS * sizeof(int));
This gives us a very different memory layout:
    +-----+
a2: |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
And this is what's usually called a "ragged" or "jagged" array, since it's obviously not necessary that all the rows in this case be the same length. Nevertheless, almost magically, the cells in the "ragged" array can also be accessed using the a2[i][j] notation. And for full dynamism, we could use
int **a3 = malloc(NUM_ROWS * sizeof(int *));
for(i = 0; i < NUM_ROWS; i++)
    a3[i] = malloc(NUM_COLS * sizeof(int));
resulting in this memory layout:
    +-----+
a3: |     |
    |  *  |
    |  |  |
    +--|--+
       |
       |
       V
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
    |     |     +--------+--------+--------+--------+
    |  *------->|        |        |        |        |
    |     |     +--------+--------+--------+--------+
    +-----+
And a3[i][j] works here, too.
(Of course, in real code constructing "dynamic arrays" like a2 and a3, we'd have to check to make sure that malloc didn't return NULL.)
Another way to look at it...
For any type T, we create an array as
T arr[N];
where T can be int, char, double, struct foo, whatever, and reads as “N-element array of T”. It can also be another array type. So, instead of just int, suppose T is an M-element array of int, which we’d write as
int arr[N][M];
This reads as “arr is an N-element array of M-element arrays of int”. This isn’t a jagged array - all the “rows” are the same size. But it’s not exactly a 2-dimensional array, either - it is an array of arrays. The expression arr[i] has an array type (int [M]).
This view helps us figure out pointer to array types as well. Except when it is the operand of the sizeof or unary & operator, or is a string literal used to initialize a character array in a declaration, an expression of type “N-element array of T” (T [N]) will be converted (“decay”) to an expression of type “pointer to T” (T *). Again, if you replace T with an array type int [M], then you have an expression of type “N-element array of M-element arrays of int” (int [N][M]), which “decays” to type “pointer to M-element array of int” (int (*)[M]).
Excuse the amateurism but I'm really struggling to understand the basic incrementing mechanisms. Are the comments correct?
#include <stdio.h>
int main(void)
{
    int a[5]={1,2,3,4,5};
    int i,j,m;
    i = ++a[1]; // the value of a[1] is 3. i=3
    j = ++a[1]; /* because of the previous line a[1]=3
                   and now a[1]=4? but not in the line defining i? */
    m = a[i++]; /* i retained the value of 3 even though the value of a[1] has changed
                   so finally i++ which is incremented in printf()? */
    printf("%d, %d, %d", i,j,m);
}
I could be answering my own question but I have fooled myself quite a few times learning C so far.
i = ++a[1] will increment the value of a[1] to 3 and the result of ++a[1] which is 3 will be assigned to i.
j = ++a[1]; will increment the value of a[1] to 4 and the result of ++a[1] which is 4 will be assigned to j.
m = a[i++]; will assign the value of a[3] (as i is 3 by now), which is 4, to m, and i will be incremented by 1. Now i becomes 4.
The thing to remember with the ++ and -- operators is that the expression has a result and a side effect. The result of ++i is the original value of i plus 1. The side effect of ++i is to add 1 to the value stored in i.
So, if i is originally 0, then in the expression
j = ++i
j gets the result of 0 + 1 (the original value of i plus 1). As a side effect, 1 is added to the value currently stored in i. So after this expression is evaluated, both i and j contain 1.
The postfix version of ++ is slightly different; the result of i++ is the original value of i, but the side effect is the same - 1 is added to the value stored in i. So, if i is originally 0, then
j = i++;
j gets the original value of i (0), and 1 is added to the value stored in i. After this expression, j is 0 and i is 1.
Important - the exact order in which the assignment to j and the side effect to i are executed is not specified. i does not have to be updated before j is assigned, and vice versa. Because of this, certain combinations of ++ and -- (including but not limited to i = i++, i++ * i++, a[i++] = i, and a[i] = i++) will result in undefined behavior; the result will vary, unpredictably, depending on platform, optimization, and surrounding code.
So, let's imagine your objects are laid out in memory like so:
   +---+
a: | 1 | a[0]
   +---+
   | 2 | a[1]
   +---+
   | 3 | a[2]
   +---+
   | 4 | a[3]
   +---+
   | 5 | a[4]
   +---+
i: | ? |
   +---+
j: | ? |
   +---+
m: | ? |
   +---+
First we evaluate
i = ++a[1];
The result of ++a[1] is the original value of a[1] plus 1 - in this case, 3. The side effect is to update the value in a[1]. After this statement, your objects now look like this:
   +---+
a: | 1 | a[0]
   +---+
   | 3 | a[1]
   +---+
   | 3 | a[2]
   +---+
   | 4 | a[3]
   +---+
   | 5 | a[4]
   +---+
i: | 3 |
   +---+
j: | ? |
   +---+
m: | ? |
   +---+
Now we execute
j = ++a[1];
Same deal - j gets the value of a[1] plus 1, and the side effect is to update a[1]. After evaluation, we have
   +---+
a: | 1 | a[0]
   +---+
   | 4 | a[1]
   +---+
   | 3 | a[2]
   +---+
   | 4 | a[3]
   +---+
   | 5 | a[4]
   +---+
i: | 3 |
   +---+
j: | 4 |
   +---+
m: | ? |
   +---+
Finally, we have
m = a[i++];
The result of i++ is 3, so m gets the value stored in a[3]. The side effect is to add 1 to the value stored in i. Now, our objects look like
   +---+
a: | 1 | a[0]
   +---+
   | 4 | a[1]
   +---+
   | 3 | a[2]
   +---+
   | 4 | a[3]
   +---+
   | 5 | a[4]
   +---+
i: | 4 |
   +---+
j: | 4 |
   +---+
m: | 4 |
   +---+
I want to create an integer array[5][10] using malloc(). The difference between memory address of array[0] and array[1] is showing 8. Why?
#include <stdio.h>
#include <stdlib.h>
int main() {
    int *b[5];
    for (int loop = 0; loop < 5; loop++)
        b[loop] = (int*)malloc(10 * sizeof(int));
    printf("b=%u \n", b);
    printf("(b+1)=%u \n", (b + 1));
    printf("(b+2)=%u \n", (b + 2));
}
The output is:
b=2151122304
(b+1)=2151122312
(b+2)=2151122320
The difference between memory address of array[0] and array[1] is showing 8. Why?
That's because sizeof of a pointer on your platform is 8.
BTW, use of %u to print a pointer leads to undefined behavior. Use %p instead, and cast the argument to void *.
printf("(b+1)=%p \n", (void *)(b+1));
printf("(b+2)=%p \n", (void *)(b+2));
Difference between array of pointers and a 2D array
When you use:
int *b[5];
The memory used for b is:
&b[0]    &b[1]    &b[2]
|        |        |
v        v        v
+--------+--------+--------+
|  b[0]  |  b[1]  |  b[2]  |
+--------+--------+--------+
(b+1) is the same as &b[1]
(b+2) is the same as &b[2]
Hence, the difference between (b+2) and (b+1) is the size of a pointer.
When you use:
int b[5][10];
The memory used for b is:
&b[0][0]                                          &b[1][0]                                          &b[2][0]
|                                                 |                                                 |
v                                                 v                                                 v
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+ ...
|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    | ...
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+ ...
(b+1) is the same as &b[1]. The value of that pointer is the same as the value of &b[1][0], even though they are pointers to different types.
(b+2) is the same as &b[2]. The value of that pointer is the same as the value of &b[2][0].
Hence, the difference between (b+2) and (b+1) is the size of 10 ints.
First, with int *b[5] you are not creating a two dimensional array, but an array of pointers.
The elements of the array b are pointers. Each occupies the size of a pointer, which depends on your architecture. On a 64-bit architecture it will probably occupy 64 bits (8 bytes). You can check that by printing sizeof(int*) or sizeof(b[0]).
Memory allocation will look like
   b
+-----+
|     |                +------+------+-----------+-----+-----+-----+-----+
| b[0]+--------------> |      |      |           |     |     |     |     |
|     |                +------+------+-----------+-----+-----+-----+-----+
+-----+
|     |                +------+------+-----------+-----+-----+-----+-----+
| b[1]+--------------> |      |      |  .......  |     |     |     |     |
|     |                +------+------+-----------+-----+-----+-----+-----+
+-----+
|     |                +------+------+-----------+-----+-----+-----+-----+
| b[2]+--------------> |      |      |  ......   |     |     |     |     |
|     |                +------+------+-----------+-----+-----+-----+-----+
+-----+
|     |                +------+------+-----------+-----+-----+-----+-----+
| b[3]+--------------> |      |      |  ......   |     |     |     |     |
|     |                +------+------+-----------+-----+-----+-----+-----+
+-----+
|     |                +------+------+-----------+-----+-----+-----+-----+
| b[4]+--------------> |      |      |  ......   |     |     |     |     |
|     |                +------+------+-----------+-----+-----+-----+-----+
+-----+
b will point to b[0], after decay, and b + 1 will give the address of b[1]. Size of pointer on your machine is 8 bytes, therefore you are getting a difference of 8 in the address.
Besides this:
Do not cast the return value of malloc:
b[loop]=malloc(10*sizeof(int));
and use %p, with a cast to void *, to print pointers:
printf("b=%p \n",(void *)b);
printf("(b+1)=%p \n",(void *)(b+1));
printf("(b+2)=%p \n",(void *)(b+2));
What you've declared is not technically a two dimensional array but an array of pointers to int, each of which points to an array of int. The reason array[0] and array[1] are 8 bytes apart is because you have an array of pointers, and pointers on your system are 8 bytes.
When you allocate each individual 1 dimensional array, they don't necessarily exist next to each other in memory. If on the other hand you declared int b[5][10], you would have 10 * 5 = 50 contiguous integers arranged in 5 rows of 10.
#include <stdio.h>
int main()
{
    int a [2][3][2]={{{1,2},{3,4},{5,6}},{{5,8},{9,10},{11,12}}};
    printf("%d\n%d\n%d\n",a[1]-a[0],a[1][0]-a[0][0],a[1][0][0]-a[0][0][0]);
    return 0;
}
The output is 3 6 4. Can anyone explain to me the reason for this? How come a[1]-a[0]=3 and a[1][0]-a[0][0]=6 and how a[] and a[][] interprets in a 3-dimensional array?
It might help if you understand how an array like yours is laid out in memory:
+------------+ Low address  +---------+ Low address  +------+
| a[0][0][0] |              | a[0][0] |              | a[0] |
| a[0][0][1] |              |         |              |      |
| a[0][1][0] |              | a[0][1] |              |      |
| a[0][1][1] |              |         |              |      |
| a[0][2][0] |              | a[0][2] |              |      |
| a[0][2][1] |              |         |              |      |
| a[1][0][0] |              | a[1][0] |              | a[1] |
| a[1][0][1] |              |         |              |      |
| a[1][1][0] |              | a[1][1] |              |      |
| a[1][1][1] |              |         |              |      |
| a[1][2][0] |              | a[1][2] |              |      |
| a[1][2][1] |              |         |              |      |
+------------+ High address +---------+ High address +------+
Then it helps to know that a pointer difference is counted in units of the pointed-to type. In the subtraction, a[0] and a[1] (each of type int[3][2]) decay to pointers to int[2], and there are three int[2] units between a[0] and a[1].
Same for a[0][0] and a[1][0]: each has type int[2] and decays to a pointer to int, and the difference is six int units between a[0][0] and a[1][0].
To elaborate a little: Between a[0] and a[1] you have a[0][0], a[0][1] and a[0][2]. Three entries.
Between a[0][0] and a[1][0] you have a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1], a[0][2][0] and a[0][2][1]. Six entries.
As addresses, a[1] and a[1][0] have the same value, and so do a[0] and a[0][0].
But the types are different.
a[1][0] and a[0][0] decay to int *; from a[0][0] to a[1][0] there are 6 ints.
a[1] and a[0] decay to int (*)[2]; from a[0] to a[1] there are 3 pairs {x, y}.
a[1][0][0] and a[0][0][0] are int, so a[1][0][0]-a[0][0][0] = 5 - 1 = 4.
In C, a multi-dimensional array is conceptually an array whose elements are also arrays. So if you do:
int array[2][3]; Conceptually you end up with:
array[0] => [0, 1, 2]
array[1] => [0, 1, 2]
int array[2][3][2]; ...will give you a structure like:
array[0] => [0] => [1, 2]
            [1] => [3, 4]
            [2] => [5, 6]
array[1] => [0] => [5, 8]
            [1] => [9, 10]
            [2] => [11, 12]
a[1]-a[0] gives a difference counted in units of the pointed-to type. a[0] and a[1] decay to pointers to int[2], and there are three such units between them. Similarly for the second part:
a[1][0]-a[0][0]=6
The number of int elements between a[0][0] and a[1][0] is 6.