In this code:
#include<stdio.h>
int main()
{
int num[2] = {20, 30};
printf("%d", num);
printf("%d", &num[0]);
return 0;
}
As far as I know, both printf statements will print the address of the first element of num, because in the first statement num is a pointer to an int.
But if num is a pointer, then it should also have an address of its own; yet printing its address (with printf("%d", &num)) shows the address of the first element.
In a 2-D array the whole thing becomes confusing too:
#include<stdio.h>
int main(void)
{
int num[ ] [2]={20,30,40,50};
printf("%d",*num);
return 0;
}
This program prints the address of the zeroth element, that is, the address of num[0][0]. But why does it do this? Why isn't it printing the value stored there, since num, num[0] and num[0][0] all have the same address?
First things first; array variables are not pointers; they do not store an address to anything.
For a declaration such as
T a[N];
memory will be laid out as
        +---+
  a[0]: |   |
        +---+
  a[1]: |   |
        +---+
         ...
        +---+
a[N-1]: |   |
        +---+
For a 2D MxN array, it will look like
             +---+
    a[0][0]: |   |
             +---+
    a[0][1]: |   |
             +---+
              ...
             +---+
  a[0][N-1]: |   |
             +---+
    a[1][0]: |   |
             +---+
    a[1][1]: |   |
             +---+
              ...
             +---+
a[M-1][N-1]: |   |
             +---+
The pattern should be obvious for 3D and higher arrays.
As you can see, no storage is set aside for a separate variable a that contains the address of the first element; instead, there is a rule in the C language that an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array, except when the array expression is one of the following:
an operand of the sizeof operator
an operand of the unary & operator
an operand of the _Alignof operator (C11 and later)
a string literal used to initialize an array in a declaration
So given the declaration
T a[N];
all of the following are true:
Expression   Type       Decays to   Value
----------   ----       ---------   -----
a            T [N]      T *         address of first element, &a[0]
*a           T          n/a         value stored in first element
&a           T (*)[N]   n/a         address of the array, which is
                                    the same as the address of the
                                    first element of the array
a[i]         T          n/a         value stored in the i'th element
&a[i]        T *        n/a         address of the i'th element
sizeof a     size_t     n/a         total number of bytes used by the
                                    array
sizeof *a    size_t     n/a         total number of bytes used by the
                                    first element of the array
sizeof &a    size_t     n/a         total number of bytes used by a
                                    pointer to the array
The expression a has type "N-element array of T"; it is not the operand of the unary & or sizeof operators, so it is converted to a pointer to the first element of the array, and its value is the address of that element.
The expression &a has type "pointer to N-element array of T"; since a is an operand of the unary & operator, the conversion rule above isn't applied (which is why the expression has type T (*)[N] instead of T **). However, since the address of the array is the same as the address of the first element of the array, it yields the same value as the expression a.
The expression &a[0] has type "pointer to T", and explicitly points to the first element of the array. Again, this value will be the same as the previous two expressions.
For a 2D array
T a[M][N];
all of the following are true:
Expression   Type          Decays to   Value
----------   ----          ---------   -----
a            T [M][N]      T (*)[N]    address of first subarray, &a[0]
*a           T [N]         T *         address of first element of the
                                       first subarray, &a[0][0]
&a           T (*)[M][N]   n/a         address of the array, which is
                                       the same as the address of the
                                       first subarray, which is the same
                                       as the address of the first element
                                       of the first subarray
a[i]         T [N]         T *         address of first element of i'th
                                       subarray
*a[i]        T             n/a         value of first element of i'th subarray
&a[i]        T (*)[N]      n/a         address of the i'th subarray
sizeof a     size_t        n/a         total number of bytes used by the
                                       array
sizeof *a    size_t        n/a         total number of bytes used by the
                                       first subarray
sizeof &a    size_t        n/a         total number of bytes used by a
                                       pointer to the array
Final note: to print out pointer values, use the %p conversion specifier and cast the argument to (void *) (this is pretty much the only time it's considered proper to explicitly cast a pointer to void *):
printf( " &a yields %p\n", (void *) &a );
printf( " a yields %p\n", (void *) a );
printf( "&a[0] yields %p\n", (void *) &a[0] );
Edit
To answer a question in the comments:
num, num[] and num[][] are all different things. Their types are different. Here num decays and becomes a pointer to a pointer, num[] decays and becomes a pointer to int, and num[][] is an int. Right?
Not quite.
Assuming a declaration like
int arr[10][10];
then the expression arr will decay to type int (*)[10] (pointer to 10-element array of int), not int **; refer to the table above again. Otherwise you're right; arr[i] will decay to type int *, and arr[i][j] will have type int.
An expression of type "N-element array of T" decays to type "pointer to T"; if T is an array type, then the result is "pointer to array", not "pointer to pointer".
In the second example, num is a 2-dimensional array, or say an array of arrays. It's true that *num is its first element, but this first element is an array itself.
To get num[0][0], you need **num.
printf("%d\n", **num);
Look at what the array looks like:
int num[ ] [2]={20,30,40,50};
is better written as
int num[][2]={{20,30},{40,50}};
It is an array with 2 elements. Those 2 elements are, again, arrays with 2 ints.
In memory, they look like
20 30 40 50
but the difference is that num refers to the whole array, num[0] to the first sub-array, and num[0][0] to the first element of the first sub-array.
They have the same address (because they start at the same place), but they have a different type.
That is, the address is not the only important thing with a pointer, the type is important as well.
Arrays are not actually pointers, though they often behave in a similar way.
Say you have this array and a pointer:
int a[] = {1, 2, 3};
int i = 19;
int *ptr = &i;
Now here a is equal to &a, but the same is not true for pointers (ptr is not equal to &ptr).
Now coming to the question:
Consider a single dimensional array:
int arr[] = {11, 19, 5, 9};
Here, the array elements are stored in contiguous memory locations. Say, with starting address 0:
---------------------
| 11 | 19 | 5 | 9 |
---------------------
0 4 8 12 16
Now when you write the name of the array, arr (for this example), you get the starting address of the 1st element. If you write &arr, you get the starting address of the whole block (this includes all the elements of the array). And when you write *arr, you get the value inside the 1st element of this array.
Now consider this 2-dimensional array arr[][4] = {{11, 19, 5, 9}, {5, 9, 11, 19}}:
0 4 8 12 16 -> These are memory addresses
---------------------
| 11 | 19 | 5 | 9 | ----> These values represent the values inside each index
---------------------
| 5 | 9 | 11 | 19 |
---------------------
16 20 24 28 32
Here, when you write the name of the array, as arr, what you get is the address of the 1st element of this array, which in this case will be address of this 0th index:
0                     16                    32
---------------------------------------------
|      0th index      |      1st index      |
---------------------------------------------
Now when you do &arr, here what you get is the base address for whole of the block, i.e. base address of this:
0 4 8 12 16
---------------------
| 11 | 19 | 5 | 9 |
---------------------
| 5 | 9 | 11 | 19 |
---------------------
16 20 24 28 32
Now, if you do *arr, in 1-dimensional array it gives you the value inside the 1st element, though in 2-dimensional array, the value inside each index is actually one 1-dimensional array, hence you will get the address of this array:
0 4 8 12 16
---------------------
| 11 | 19 | 5 | 9 |
---------------------
Now if you do **arr, that is when you will actually get the value inside the 1st element, which is 11.
I hope it clears some doubts :-)
EDIT 1:
As brought to my attention by a fellow user, there seems to be some confusion somewhere, even though I have explained in detail what each expression means. So, to justify this statement:
Now here a is equal to &a, but the same is not true for pointers (ptr is not equal to &ptr).
The values of a and &a are the same, but their types are different, as already stated in the answer. Pointer arithmetic makes that visible: try evaluating a + 1 and &a + 1, and how each of them reacts will give a good idea of the difference.
Considering a 1-dimensional array:
int arr[] = {11, 19, 5, 9};
---------------------
| 11 | 19 | 5 | 9 |
---------------------
0 4 8 12 16
We cannot do arr++, though for a pointer:
int i = 4;
int *ptr = &i;
we can perform ptr++, this will make ptr point to the next memory location.
I think this result means that the array is not really a pointer, but it is converted to a pointer in contexts where a pointer is expected, such as when it is passed to a function that expects a pointer argument.
See this code:
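#include <stdio.h>

void test(int *num) {
    printf("test\n");
    printf("%p\n", (void *)num);
    printf("%p\n", (void *)&num);    // address of the local pointer parameter
    printf("%p\n", (void *)&num[0]);
}

int main(void) {
    int num[2] = {20, 30};
    test(num);
    printf("main\n");
    printf("%p\n", (void *)num);
    printf("%p\n", (void *)&num);    // address of the array itself
    printf("%p\n", (void *)&num[0]);
    return 0;
}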
The output is:
test
0x7fff7a422300
0x7fff7a4222e8 // Look at this! It is different from main!
0x7fff7a422300
main
0x7fff7a422300
0x7fff7a422300
0x7fff7a422300
Related
Say I have the following code:
// x = whatever number
int *arr_of_ptr[x];
int (*ptr_to_arr)[x];
int **p1 = arr_of_ptr;
int **p2 = ptr_to_arr;
My understanding of arr_of_ptr is that "dereferencing an element of arr_of_ptr results in an int" - therefore the elements of arr_of_ptr are pointers to integers. On the other hand, dereferencing ptr_to_arr results in an array that I can then nab integers from, hence ptr_to_arr points to an array.
I also have a rough understanding that arrays themselves are pointers, and that arr[p] evaluates to (arr + p * sizeof(data_type_of_arr)) where the name arr decays to the pointer to the first element of arr.
So that's all well and good, but is there any way for me to tell whether p1 and p2 are pointers to arrays or arrays of pointers without prior information?
My confusion mostly stems from the fact that (I think) we can evaluate int **p two ways:
*(p + n * size) is what's giving me an int
(*p + n * size) is what's giving me an int
In hindsight this question might be poorly worded because I'm confusing myself a bit just looking back on it, but I really don't know how to articulate myself better. Sorry.
The main difference is that this is legal:
int **p1 = arr_of_ptr;
While this is not:
int **p2 = ptr_to_arr;
Because arr_of_ptr is an array, it can (in most contexts) decay to a pointer to its first element. So because the elements of arr_of_ptr are of type int *, a pointer to an element has type int ** so you can assign it to p1.
ptr_to_arr however is not an array but a pointer, so there's no decaying happening. You're attempting to assign an expression of type int (*)[x] to an expression of type int **. Those types are incompatible, and if you attempt to use p2 you won't get what you expect.
First,
I also have a rough understanding that arrays themselves are pointers, and that arr[p] evaluates to (arr + p * sizeof(data_type_of_arr)) where the name arr decays to the pointer to the first element of arr.
This isn't strictly correct. Arrays are not pointers. Under most circumstances, expressions of array type will be converted ("decay") to expressions of pointer type and the value of the expression will be the address of the first element of the array. That pointer value is computed as necessary and isn't stored anywhere.
Exceptions to the decay rule occur when the array expression is the operand of the sizeof, _Alignof, or unary & operators, or is a string literal used to initialize a character array in a declaration.
Having said all that, ptr_to_arr has pointer type, not array type - it will not "decay" to int **.
Given the declaration
T arr[N];
the following are true:
Expression   Type       Decays to   Equivalent expression
----------   ----       ---------   ---------------------
arr          T [N]      T *         &arr[0]
*arr         T          n/a         arr[0]
arr[i]       T          n/a         n/a
&arr         T (*)[N]   n/a         n/a
The expressions arr, &arr[0], and &arr all yield the same value (modulo any differences in representation between types). arr and &arr[0] have the same type, "pointer to T" (T *), while &arr has type "pointer to N-element array of T" (T (*)[N]).
If you replace T with pointer type P *, such that the declaration is now
P *arr[N];
you get the following:
Expression   Type        Decays to   Equivalent expression
----------   ----        ---------   ---------------------
arr          P *[N]      P **        &arr[0]
*arr         P *         n/a         arr[0]
arr[i]       P *         n/a         n/a
&arr         P *(*)[N]   n/a         n/a
So given your declarations, it would be more correct to write something like this:
int arr[x];
int *p1 = arr; // the expression arr "decays" to int *
int *arr_of_ptr[x];
int **p2 = arr_of_ptr; // the expression arr_of_ptr "decays" to int **
/**
* In the following declarations, the array expressions are operands
* of the unary & operator, so the decay rule doesn't apply.
*/
int (*ptr_to_arr)[x] = &arr;
int *(*ptr_to_arr_of_ptr)[x] = &arr_of_ptr;
Again, ptr_to_arr and ptr_to_arr_of_ptr are pointers, not arrays, and do not decay to a different pointer type.
EDIT
From the comments:
Can I just hand-wavily explain it as: an array of pointers has a name that can decay to a pointer,
Yeah, -ish, just be aware that it is hand-wavey and not really accurate (which is shown by example below). If you are a first-year student, your institution isn't doing you any favors by making you deal with C this early. While it is the substrate upon which most of the modern computing ecosystem is built, it is an awful teaching language. Awful. Yes, it's a small language, but aspects of it are deeply unintuitive and confusing, and the interplay between arrays and pointers is one of those aspects.
an array of pointers has a name that can decay to a pointer, but a pointer to an array, even when dereferenced, does not give me something that decays to a pointer?
Actually...
If ptr_to_arr has type int (*)[x], then the expression *ptr_to_arr would have type int [x], which would decay to int *. The expression *ptr_to_arr_of_ptr would have type int *[x], which would decay to int **. This is why I keep using the term "expression of array type" when talking about the decay rule, rather than just the name of the array.
Something I have left out of my explanations until now - why do array expressions decay to pointers? What's the reason for this incredibly confusing behavior?
C didn't spring fully-formed from the brain of Dennis Ritchie - it was derived from an earlier language named B (which was derived from BCPL, which was derived from CPL, etc.) [1]. B was a "typeless" language, where data was simply a sequence of words or "cells". Memory was modeled as a linear array of "cells". When you declared an N-element array in B, such as
auto arr[N];
the compiler would set aside all the cells necessary for the array elements, plus an extra cell that would store the numerical offset (basically, a pointer) to the first element of the array, and that cell would be bound to the variable arr:
     +---+
arr: | +-+-----------+
     +---+           |
      ...            |
     +---+           |
     |   | arr[0] <--+
     +---+
     |   | arr[1]
     +---+
      ...
     +---+
     |   | arr[N-1]
     +---+
To index into the array, you'd offset i cells from the location stored in arr and dereference the result. IOW, a[i] was exactly equivalent to *(a + i).
When Ritchie was developing the C language, he wanted to keep B's array semantics (a[i] is still exactly equivalent to *(a + i)), but for various reasons he didn't want to store that pointer to the first element. So, he got rid of it entirely. Now, when you declare an array in C, such as
int arr[N];
the only storage set aside is for the array elements themselves:
+---+
|   | arr[0]
+---+
|   | arr[1]
+---+
 ...
+---+
|   | arr[N-1]
+---+
There is no separate object arr which stores a pointer to the first element (which is part of why array expressions cannot be the target of an assignment - there's nothing to assign to). Instead, that pointer value is computed as necessary when you need to subscript into the array.
This same principle holds for multi-dimensional arrays as well. Assume the following:
int a[2][2] = { { 1, 2 }, { 3, 4 } };
What you get in memory is the following:
   Viewed as int          Viewed as int [2]
   +---+                  +---+
a: | 1 | a[0][0]       a: | 1 | a[0]
   +---+                  + - +
   | 2 | a[0][1]          | 2 |
   +---+                  +---+
   | 3 | a[1][0]          | 3 | a[1]
   +---+                  + - +
   | 4 | a[1][1]          | 4 |
   +---+                  +---+
On the left we view it as a sequence of int, while on the right we view it as a sequence of int [2].
Each a[i] has type int [2], which decays to int *. The expression a itself decays from type int [2][2] to int (*)[2] (not int **).
The expression a[i][j] is exactly equivalent to *(a[i] + j), which is equivalent to *( *(a + i) + j ).
[1] As detailed in The Development of the C Language
#include <stdio.h>

int main(void) {
    int arr[] = {1, 2, 3};
    int *p1 = &arr[0];
    int *p2 = &arr[1];
    int *p3 = &arr[2];

    int *arr2[3];             // array of pointers
    arr2[0] = p1;
    arr2[1] = p2;
    arr2[2] = p3;

    int (*p4)[3] = &arr;      // pointer to array

    printf("%zu\n", sizeof(p4));
    printf("%zu\n", sizeof(arr2));
    printf("%d\n", **p4);     // *p4 is the array arr; **p4 is arr[0]
    printf("%d\n", **arr2);   // *arr2 is arr2[0] (p1); **arr2 is arr[0]
    return 0;
}
In the above code arr is a normal integer array with 3 elements.
p1, p2, and p3 are normal pointers to these elements.
arr2 is an array of pointers storing p1, p2, and p3.
p4 is a pointer to an array, pointing to the array arr. Note that it has to be declared as int (*p4)[3]; writing int *p4 = &arr; assigns between incompatible pointer types.
According to your question, you need to differentiate between p4 and arr2.
Since p4 is a pointer, its size is fixed (8 bytes on a typical 64-bit platform), while the size of arr2 varies with how many elements it contains (8x3=24).
Both need double dereferencing to reach an int: *p4 yields the array arr (which then decays to int *), and *arr2 yields the pointer arr2[0], so **p4 and **arr2 both print 1.
The output of the above code (on a typical 64-bit platform) is:
8
24
1
1
I am looking for an explanation of how incrementing an address affects a pointer.
I learned how C pointers work and how incrementing a pointer takes the pointer type into account. Still, I don't understand the following case:
int main()
{
int a[] = {1,2,3,4,5};
int *p = (int*)(&a+1);
printf("%d\n%d\n", *(a+1), *(p-1));
return 0;
}
I expected this line
int *p = (int*)(&a+1);
to make p point to the address that follows array a, therefore I expected the output:
2
as it is simply a[1]
followed by unknown_number, as I don't know which int sits 4 bytes before (&a+1).
And the actual result is:
2
5
Why does it seem that p points directly to the memory sitting after a?
What is the source for my confusion?
So in this example &a is of type int(*)[5]. When you add 1 to it it actually adds sizeof(int[5]) - because that is how pointer arithmetic works, adding an offset adds the size of the type being pointed to times the offset. That is how you get p to be one past last element of a, after which you cast it to int* so now you have a pointer pointing to an integer at an address one past the last element of a. So effectively, subtracting 1 from it gives you the last element of a.
Two basic concepts:
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
Adding 1 to an expression of type "pointer to T" yields the address of the object of type T immediately following the current object. IOW, if p points to a 4-byte int, p+1 points to the int immediately following it. If p points to a 5-element array of int, then p+1 points to the next 5-element array of int immediately following it. This is how array indexing works - the subscript operation a[i] is defined as *(a + i). Given a starting address a (either a pointer expression or an array expression that decays to a pointer), find the address of the i’th object following that address and dereference the result.
So, if you have the declaration
int a[] = {1, 2, 3, 4, 5};
then the following are true:
the expression a has type "5-element array of int" (int [5]) - if the expression is not the operand of the sizeof or unary & operators, it "decays" to type "pointer to int" (int *) and its value is the address of the first element of the array (&a[0]).
the expression *(a + 1) is identical to a[1], and evaluates to the second object in the array (2).
the expression &a + 1 has type int (*)[5] and yields the starting address of the 5-element array of int after a. The type of this expression is converted to int * and assigned to p.
the expression p has type int * - subtracting 1 from this yields the address of the int object immediately preceding p, which happens to be the last element of a.
Graphically:
   +---+
a: | 1 |
   +---+
   | 2 | <-- a + 1
   +---+
   | 3 |
   +---+
   | 4 |
   +---+
   | 5 | <-- p - 1
   +---+
   | ? | <-- p (&a + 1)
   +---+
You can manipulate the array a as a pointer to int (int *). But it is not the same for &a, which is a pointer to an array of 5 ints: &a + 1 will add the size of the 5 ints to the pointer.
Just remove the & before adding 1 to a, and it'll work as you expected:
#include <stdio.h>
int main()
{
int a[] = {1,2,3,4,5};
int *p = (int*)(a+1); // & removed
printf("%d %d\n", *(a+1), *(p-1));
return 0;
}
Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>
int main() {
int a[] = {100, 200, 300, 400, 500};
int size = 0;
size = *(&a + 1) - a;
printf("%d\n", size);
return 0;
}
As expected, it returns 5.
edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
int [5]   int (*)[5]       int     int *
+---+                      +---+
|   | <- &a                |   | <- a
| - |                      +---+
|   |                      |   | <- a + 1
| - |                      +---+
|   |                      |   |
| - |                      +---+
|   |                      |   |
| - |                      +---+
|   |                      |   |
+---+                      +---+
|   | <- &a + 1            |   | <- *(&a + 1)
| - |                      +---+
|   |                      |   |
| - |                      +---+
|   |                      |   |
| - |                      +---+
|   |                      |   |
| - |                      +---+
|   |                      |   |
+---+                      +---+
This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/9
This line is of most importance:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a gives a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from the address is that number times the size of the type it points to. For example, if you subtracted 5 from xyz, the resulting address would be 160 - (sizeof(*xyz) * 5) in plain integer arithmetic, i.e. 140 if int is 4 bytes.
As a is an array of 5 int types, the resulting value will be 5. However, this will not work with a pointer, only with an array. If you try this with a pointer p, &p + 1 points just past the pointer variable itself rather than past the data it points to, so the result is not the element count.
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
&a gets a pointer to an object of type int[5]
+1 gets the next such object assuming there is an array of those
* effectively converts that address into type pointer to int
-a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (in this I mean language-lawyer legal - not will it work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]), when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.
#include <stdio.h>
int main()
{
int a[5] = {1,2,3,4,5};
int *ptr = (int*)(&a+1);
printf("%d %d", *(a+1), *(ptr-1));
return 0;
}
the output is 2 5. &a means the address of a[0] so &a+1 should be the address of a[1]. So ptr should hold the address of a[1]. *(a+1) will be 2 but *(ptr-1) should also be 2. I can't understand how is it printing 5.
This expression is the important thing: &a+1. That is actually (&a)+1, which is equal to &(&a)[1], which will be a pointer to one element past the end of the array.
If we look at it more "graphically" it looks like this, with relevant pointers added:
+------+------+------+------+------+
| a[0] | a[1] | a[2] | a[3] | a[4] |
+------+------+------+------+------+
^      ^                           ^
|      |                           |
|      &a[1] (equal to a + 1)      |
|                                  |
&a[0] (equal to a)                 |
|                                  |
&a                                 &a+1
First of all, the type of &a is int (*)[5], so your cast to int * will break strict aliasing (which leads to undefined behavior).
Second of all, since ptr is pointing, effectively, to what would be a[5] then ptr - 1 will point to a[4].
&a is not the address of a[0] but the address of a. The values may be the same but the types are different. That is important when it comes to pointer arithmetic.
In the expression &a + 1, you first have &a which has type int (*)[5], i.e. a pointer to an array of size 5. When you add 1 to that it actually adds sizeof(a) bytes to the pointer value. So &a + 1 actually points to one byte past the end of the array. You then cast this expression from int (*)[5] to int * and assign it to ptr.
When you then evaluate *(ptr - 1), the - operator subtracts 1 * sizeof(int) from the byte value of ptr so it now points to the last element of the array, i.e. 5, and that is what is printed.
&a gives the address of the array as an array pointer, int (*)[5]. It is a pointer type that points at the array as whole, so if you do pointer arithmetic with it, +1 will mean +sizeof(int[5]) which is not what you intended.
Correct code:
int *ptr = a+1;
Notably, the cast (int*) was hiding this bug. Don't use casts to silence compiler errors you don't understand!
Firstly, you said: &a means the address of a[0], so &a+1 should be the address of a[1]? No, that is wrong. &a means the address of a, not of a[0]. &a+1 increments by the whole array size, not just one element's size, whereas a+1 is the address of a[1].
Here
int a[5] = {1,2,3,4,5};
let's assume the base address of a is 0x100 and that an int is 4 bytes
----------------------------------------------
|   1    |   2    |   3    |   4    |   5    |
----------------------------------------------
0x100    0x104    0x108    0x10c    0x110    0x114
|
a
When you do
int *ptr = (int*)(&a+1);
where does ptr point? First (&a+1) is evaluated, and it increments by the whole array size, i.e.
(&a+1) == (0x100 + 1*20) /* &a+1 increments by the array size: 20 bytes = 0x14 */
       == 0x114
So now ptr points to
----------------------------------------------
|   1    |   2    |   3    |   4    |   5    |
----------------------------------------------
0x100    0x104    0x108    0x10c    0x110    0x114
a                                            |
                                      ptr points here
Now when you print like
printf("%d %d", *(a+1), *(ptr-1));
Here
*(a+1) == *(0x100 + 1*4) /* multiplied by 4 because the elements are of type int */
       == *(0x104) /* value at location 0x104 */
       == 2 (it prints 2)
And
*(ptr-1) == *(0x114 - 1*4)
         == *(0x110) /* value at location 0x110 */
         == 5
Note :- Here
int *ptr = (int*)(&a+1);
the type of &a is int (*)[5], i.e. pointer to an array of 5 elements, but you are casting it to int *; as pointed out by #someprogrammerdude, this breaks strict aliasing and leads to undefined behavior.
Correct one is
int *ptr = a+1;
I know array in C does essentially behaves like a pointer except at some places like (sizeof()). Apart from that pointer and array variables dont differ except in their declaration.
For example consider the two declarations:
int arr[] = {11,22,33};
int *arrptr = arr;
Here is how they behave the same:
printf("%d %d", arrptr[0], arr[0]); //11 11
printf("%d %d", *arrptr, *arr); //11 11
But here is one more place I found they differ:
//the outputs will be different on your machine
printf("%d %d", &arrptr, &arr); //2686688 2686692 (obviously different address)
printf("%d %d", arrptr, arr); //2686692 2686692 (both same)
The issue is with the last line. I understand that arrptr contains the address of arr; that's why the first address printed in the last line is 2686692. I also understand that logically the address (&arr) and the contents (arr) of arr should be the same, unlike for arrptr. But what exactly (internally, at the implementation level) makes this happen?
When the unary & operator is applied to an array, it returns a pointer to an array. When applied to a pointer, it returns a pointer to a pointer. This operator together with sizeof represent the few contexts where arrays do not decay to pointers.
In other words, &arrptr is of type int **, whereas &arr is of type int (*)[3]. &arrptr is the address of the pointer itself and &arr is the beginning of the array (like arrptr).
The subtle part: arrptr and &arr have the same value (both point to the beginning of the array), but are of a different type. This difference will show if you do any pointer arithmetic to them – with arrptr the implied offset will be sizeof(int), whereas with &arr it will be sizeof(int) * 3.
Also, you should be using the %p format specifier to print pointers, after casting to void *.
I know that an array in C essentially behaves like a pointer, except in a few places such as sizeof(). Apart from that, pointer and array variables don't differ except in their declaration.
This is not quite true. Array expressions are treated as pointer expressions in most circumstances, but arrays and pointers are completely different animals.
When you declare an array as
T a[N];
it's laid out in memory as
   +---+
a: |   | a[0]
   +---+
   |   | a[1]
   +---+
   |   | a[2]
   +---+
    ...
   +---+
   |   | a[N-1]
   +---+
One thing immediately becomes obvious - the address of the first element of the array is the same as the address of the array itself. Thus, &a[0] and &a will yield the same address value, although the types of the two expressions are different (T * vs. T (*)[N]), and the value may possibly be adjusted based on type.
Here's where things get a little confusing - except when it is the operand of the sizeof or unary & operator, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
This means the expression a also yields the same address value as &a[0] and &a, and it has the same type as &a[0]. Putting this all together:
Expression   Type       Decays to   Value
----------   ----       ---------   -----
a            T [N]      T *         Address of a[0]
&a           T (*)[N]   n/a         Address of a
*a           T          n/a         Value of a[0]
a[i]         T          n/a         Value of a[i]
&a[i]        T *        n/a         Address of a[i]
sizeof a     size_t     n/a         Number of bytes in a
So why does this conversion rule exist in the first place?
C was derived from an earlier language called B (go figure). B was a typeless
language - everything was treated as basically an unsigned integer. Memory
was seen as a linear array of fixed-length "cells". When you declared an
array in B, an extra cell was set aside to store the offset to the first
element of the array:
   +---+
a: |   | ----+
   +---+     |
    ...      |
     +-------+
     |
     V
   +---+
   |   | a[0]
   +---+
   |   | a[1]
   +---+
    ...
   +---+
   |   | a[N-1]
   +---+
The array subscript operation a[i] was defined as *(a + i); that is, take the offset value stored in a, add i, and dereference the result.
When Ritchie was designing C, he wanted to keep B's array semantics, but couldn't figure out what to do with the explicit pointer to the first element, so he got rid of it. Thus, C keeps the array subscripting definition a[i] == *(a + i) (given the address a, offset i elements from that address and dereference the result), but doesn't set aside space for a separate pointer to the first element of the array - instead, it converts the array expression a to a pointer value.
This is why you see the same output when you print the values of arr and arrptr. Note that you should print out pointer values using the %p conversion specifier and cast the argument to void *:
printf( "arr = %p, arrptr = %p\n", (void *) arr, (void *) arrptr );
This is pretty much the only place you need to explicitly cast a pointer value to void * in C.