Different Pointer Arithmetic Results when Taking Address of Array - c

Program:
#include<stdio.h>
int main(void) {
int x[4];
printf("%p\n", x);
printf("%p\n", x + 1);
printf("%p\n", &x);
printf("%p\n", &x + 1);
}
Output:
$ ./a.out
0xbff93510
0xbff93514
0xbff93510
0xbff93520
$
I expected the output of the above program to look like this (with example addresses):
x // 0x100
x+1 // 0x104 Because x is an integer array
&x // 0x100 Address of array
&x+1 // 0x104
But the output of the last statement is different from what I expected. &x is also the address of the array, so incrementing it by 1 should print the address incremented by 4.
But &x+1 gives the address incremented by 0x10 (16 in decimal). Why?

x -> Points to the first element of the array.
&x ->Points to the entire array.
Stumbled upon a descriptive explanation here: http://arjunsreedharan.org/post/69303442896/the-difference-between-arr-and-arr-how-to-find
SO link: Why is arr and &arr the same?

In the fourth case you get 0x100 + sizeof x, and sizeof x is 4 * sizeof(int) = 4 * 4 = 16 = 0x10.
(On your system, sizeof(int) is 4.)

An easy rule of thumb for evaluating this is:
Incrementing any pointer makes it point to the next object of its base (pointed-to) type.
The type of &x here is int (*)[4], which is a pointer to an array of 4 ints, so its base type is int[4].
So the next pointer of this type will point 16 bytes past the start of the original array (assuming int is 4 bytes).

Even though x and &x evaluate to the same address, they have different types. The type of x after it decays to a pointer is int *, whereas the type of &x is int (*)[4].
sizeof(x) is sizeof(int)*4.
Hence the numerical difference between &x and &x + 1 is sizeof(int)*4.
It can be better visualized using a 2D array. Let's say you have:
int array[2][4];
The memory layout for array is:
array
|
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
array[0]        array[1]
|               |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
If you use a pointer to such an array,
int (*ptr)[4] = array;
and look at the memory through the pointer, it looks like:
ptr             ptr+1
|               |
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
As you can see, the difference between ptr and ptr+1 is sizeof(int)*4. That analogy applies to the difference between &x and &x + 1 in your code.

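Believe it or not, the behaviour of your program is undefined, though not because of &x + 1: %p expects a void * argument, so each pointer should be cast to (void *) before it is printed.
&x + 1 is pointing just beyond the array, as #i486's answer cleverly points out, and you don't own that memory. Even so, C11 6.5.6/8 allows you to compute, compare and print such a one-past-the-end pointer; only dereferencing it is undefined behaviour.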

Related

How can an address be same as a value?

Here is some sample code. I believe array is an address while *array is the value:
int array[7][7];
array == *array
But I found out that array is the same as *array. How can that be?
Here
int arr[7][7];
arr is a two-dimensional array containing 7 one-dimensional arrays, and each one-dimensional array contains 7 elements. It looks like this:
arr[0][0] arr[0][1] ... arr[0][6] arr[1][0] ... arr[1][6] ... arr[6][0] ... arr[6][6]
^                                 ^                           ^
|                                 |                           |
arr[0] (0x100)                    arr[1]                      arr[6]
^
|
arr (0x100)   <- assume the base address is 0x100
arr, arr[0] and the address of arr[0][0] all yield the same address; that is, arr and *arr evaluate to the same address.
You can't have a value without a type.
123456789 just like that is nothing; you need to know if it's an integer, a double, a pointer, ...
So, I like to think of it as the pair (value, type).
And that value could be (123456789, int), or (123456789, double), or (123456789, char*), ... which are all different pairs (with the same value).
In your case you have (<address>, int(*)[7]), which is not the same as (<address>, int*).
An element of a multidimensional array is an array, which in many cases decays to a pointer to its first element.
So, in both cases you have a pointer to the same location in memory.
As far as C is concerned, they aren't technically the same, because they point to different objects; but because those objects have the same address, and because most implementations store all pointers in the same format, they happen to compare equal.

How does this piece of code determine array size without using sizeof( )?

Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>
int main() {
int a[] = {100, 200, 300, 400, 500};
int size = 0;
size = *(&a + 1) - a;
printf("%d\n", size);
return 0;
}
As expected, it prints 5.
Edit: people pointed out this answer, but the syntax differs a bit, i.e. it uses indexing:
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
  int [5]  int (*)[5]          int  int *
    +---+                     +---+
    |   | <- &a               |   | <- a
    | - |                     +---+
    |   |                     |   | <- a + 1
    | - |                     +---+
    |   |                     |   |
    | - |                     +---+
    |   |                     |   |
    | - |                     +---+
    |   |                     |   |
    +---+                     +---+
    |   | <- &a + 1           |   | <- *(&a + 1)
    | - |                     +---+
    |   |                     |   |
    | - |                     +---+
    |   |                     |   |
    | - |                     +---+
    |   |                     |   |
    | - |                     +---+
    |   |                     |   |
    +---+                     +---+
This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/8
The key line is:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a yields a pointer to the next array of 5 ints after a. The code then dereferences the resulting pointer and subtracts a (an array that has decayed to a pointer) from it, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int and contains the value (int *)160. When you subtract a number from xyz, C specifies that the amount actually subtracted from the address is that number times the size of the type it points to. For example, xyz - 5 yields the address 160 - 5 * sizeof(*xyz), which is 140 if int is 4 bytes. Conversely, subtracting two int pointers divides the byte distance between them by sizeof(int).
As a is an array of 5 ints, the resulting value will be 5. However, this only works with an array, not with a pointer: for a pointer p, &p + 1 points just past the pointer variable p itself, so *(&p + 1) reads unrelated memory and the result is meaningless.
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
&a gets a pointer to an object of type int[5]
+1 gets the next such object assuming there is an array of those
* effectively converts that address into type pointer to int
-a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (in this I mean language-lawyer legal - not will it work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint-type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]): when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would then report the array size as 8, not 5. (In practice the standard rules this out: array elements are allocated contiguously and sizeof (char[5]) is 5, so consecutive char[5] elements are exactly 5 bytes apart and the code still reports 5.)

Difference between `*&p` vs `&*p`?

int a=10;
int *p=&a;
Now, looking at &*p, we first look at *p, which is 10, and then at &10,
which is the address of 10, or the address of a.
In the case of *&p, we first look at the address of p and then at the value at this address, which is 10.
But I understand that *&p and &*p are the same. Why?
Let's draw your variables:
+---+ +---+
| p | --> | a |
+---+ +---+
That is, p is pointing to a.
Now if you do &*p then you first dereference p to get a, then you get the address of a, which leaves you with a pointer to a.
If we take *&p then you get the address of p to get a pointer to p, then you dereference that pointer to get p. Which is a pointer to a.
So while the expressions do different things, the end result is the same: A pointer to a.
And a decent compiler would probably generate no code at all for either expression, since here the dereference operator * and the address-of operator & cancel each other out, whichever order they come in.
Consider the following example:
int a=10;
int *p=&a;
this
*&p
means that here the * and & cancel each other out, and the result is p, which is nothing but &a.
And this
&*p
means first dereference p, which gives a, and then take its address with &, i.e. the address of a, which is nothing but p, the same as in the first case.
By the clockwise / spiral rule:
For *&p:
    +-----+
    | +-+ |
    | ^ | |
  * & p ; |
  ^ ^   | |
  | +---+ |
  +-------+
We first take the address of p, which is at this point the address of the address of a.
Then we dereference that, which gives the address of a.
For &*p:
    +-----+
    | +-+ |
    | ^ | |
  & * p ; |
  ^ ^   | |
  | +---+ |
  +-------+
We first dereference p, which gives us a.
We then take the address of that, which gives us the address of a, just like before.
In this context, & takes the address (i.e., informally "adds *" to the type of the expression). Meanwhile * dereferences a pointer (i.e., "removes a *" from the type of the expression). Therefore:
int *p = …;
p; // int *
*p; // int
&*p; // int *
&p; // int **
*&p; // int *
So, yes, in this context the result is the same: a pointer to int, because the & and the * cancel out. However, this is also why the combinations are pointless: the result is the same as p by itself.
*&p == *(&p). &p is a pointer to the pointer to int, and *(&p) is the value this pointer-to-pointer points to, which is the value of p. Going one step further, **&p evaluates to 10.
&*p == &(*p), where *p is the value the pointer points to (the value of a). Applying & then gives the address of a, which is p again. Going further, *&*p evaluates to the value of a (10).

strange output issue in c

#include <stdio.h>
int main()
{
int a[5] = {1,2,3,4,5};
int *ptr = (int*)(&a+1);
printf("%d %d", *(a+1), *(ptr-1));
return 0;
}
The output is 2 5. I thought &a means the address of a[0], so &a+1 should be the address of a[1], and ptr should hold the address of a[1]. *(a+1) will be 2, but *(ptr-1) should also be 2. I can't understand why it prints 5.
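This expression is the important thing: &a+1. That is actually (&a)+1, which points one whole int[5] past &a; (&a)[1], i.e. *(&a + 1), designates that same location as an int[5] that decays to int *. Either way, the address is one element past the end of the array.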
If we look at it more "graphically" it looks like this, with relevant pointers added:
+------+------+------+------+------+
| a[0] | a[1] | a[2] | a[3] | a[4] |
+------+------+------+------+------+
^      ^                           ^
|      |                           |
|      &a[1] (equal to a + 1)      |
|                                  |
&a[0] (equal to a)                 |
|                                  |
&a                                 &a+1
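First of all, the type of &a is int (*)[5], not int *, so your cast to int * only hides a type mismatch. (Reading a[4] through the resulting int * is not itself a strict aliasing violation, since the int object is accessed through an int lvalue.)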
Second of all, since ptr is pointing, effectively, to what would be a[5] then ptr - 1 will point to a[4].
&a is not the address of a[0] but the address of a. The values may be the same but the types are different. That is important when it comes to pointer arithmetic.
In the expression &a + 1, you first have &a, which has type int (*)[5], i.e. a pointer to an array of size 5. When you add 1 to that, it actually adds sizeof(a) bytes to the pointer value. So &a + 1 points just past the end of the array. You then cast this expression from int (*)[5] to int * and assign it to ptr.
When you then evaluate *(ptr - 1), the - operator subtracts 1 * sizeof(int) from the byte value of ptr so it now points to the last element of the array, i.e. 5, and that is what is printed.
&a gives the address of the array as an array pointer, int (*)[5]. It is a pointer type that points at the array as whole, so if you do pointer arithmetic with it, +1 will mean +sizeof(int[5]) which is not what you intended.
Correct code:
int *ptr = a+1;
Notably, the cast (int*) was hiding this bug. Don't use casts to silence compiler errors you don't understand!
Firstly, you said: "&a means the address of a[0], so &a+1 should be the address of a[1]". No, that's wrong. &a means the address of a, not of a[0]. &a+1 increments by the whole array size, not just one element's size, while a+1 is the address of a[1].
Here
int a[5] = {1,2,3,4,5};
let's assume the base address of a is 0x100 (and sizeof(int) is 4):
-----------------------------------------
|   1   |   2   |   3   |   4   |   5   |
-----------------------------------------
0x100   0x104   0x108   0x10C   0x110
LSB
|
a
When you do
int *ptr = (int*)(&a+1);
where does ptr point? First (&a+1) is evaluated, and it is incremented by the whole array size, i.e.
(&a+1) == (0x100 + 1*20) /* &a+1 increments by the array size: 20 bytes, i.e. 0x14 */
       == 0x114
So now ptr points to
-----------------------------------------
|   1   |   2   |   3   |   4   |   5   |
-----------------------------------------
0x100   0x104   0x108   0x10C   0x110   0x114
^                                       ^
a                                       ptr points here
Now when you print
printf("%d %d", *(a+1), *(ptr-1));
here
*(a+1) == *(0x100 + 1*4) /* multiplied by 4 because the elements are of type int */
       == *(0x104)       /* value at location 0x104 */
       == 2              (it prints 2)
and
*(ptr-1) == *(0x114 - 1*4)
         == *(0x110)     /* prints the value at memory location 0x110 */
         == 5
Note: here, in
int *ptr = (int*)(&a+1);
the type of &a is int(*)[5], i.e. a pointer to an array of 5 elements, but you are casting it to int *. As pointed out by #someprogrammerdude, that cast is a warning sign: it silences the compiler about the type mismatch instead of fixing it.
The correct version is
int *ptr = a+1;

Why won't *num show the zeroth element value?

In this code:
#include<stdio.h>
int main()
{
int num[2] = {20, 30};
printf("%d", num);
printf("%d", &num[0]);
return 0;
}
As far as I know, both printf statements will print the address of the first element in num, because in the first statement num is a pointer to an int.
But if num is a pointer, it should have an address of its own; yet when I print its address (with printf("%d", &num)), it shows the address of the first element.
With a 2-D array, the whole thing becomes even more confusing:
#include<stdio.h>
int main(void)
{
int num[ ] [2]={20,30,40,50};
printf("%d",*num);
return 0;
}
This program prints the address of the zeroth element, i.e. the address of num[0][0]. But why? Why isn't it printing the value stored there, since they all have the same address (num, num[0] and num[0][0])?
First things first; array variables are not pointers; they do not store an address to anything.
For a declaration such as
T a[N];
memory will be laid out as
        +---+
  a[0]: |   |
        +---+
  a[1]: |   |
        +---+
         ...
        +---+
a[N-1]: |   |
        +---+
For a 2D MxN array, it will look like
             +---+
    a[0][0]: |   |
             +---+
    a[0][1]: |   |
             +---+
              ...
             +---+
  a[0][N-1]: |   |
             +---+
    a[1][0]: |   |
             +---+
    a[1][1]: |   |
             +---+
              ...
             +---+
a[M-1][N-1]: |   |
             +---+
The pattern should be obvious for 3D and higher arrays.
As you can see, no storage is set aside for a separate variable a that contains the address of the first element; instead, there is a rule in the C language that an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array, except when the array expression is one of the following:
an operand of the sizeof operator
an operand of the unary & operator
an operand of the _Alignof operator (C99 and later)
a string literal used to initialize an array in a declaration
So given the declaration
T a[N];
all of the following are true:
Expression   Type       Decays to   Value
----------   ----       ---------   -----
         a   T [N]      T *         address of first element, &a[0]
        *a   T          n/a         value stored in first element
        &a   T (*)[N]   n/a         address of the array, which is
                                    the same as the address of the
                                    first element of the array
      a[i]   T          n/a         value stored in the i'th element
     &a[i]   T *        n/a         address of the i'th element
  sizeof a   size_t     n/a         total number of bytes used by the
                                    array
 sizeof *a   size_t     n/a         total number of bytes used by the
                                    first element of the array
 sizeof &a   size_t     n/a         total number of bytes used by a
                                    pointer to the array
The expression a has type "N-element array of T"; it is not the operand of the unary & or sizeof operators, so it is converted to a pointer to the first element of the array, and its value is the address of that element.
The expression &a has type "pointer to N-element array of T"; since a is an operand of the unary & operator, the conversion rule above isn't applied (which is why the expression has type T (*)[N] instead of T **). However, since the address of the array is the same as the address of the first element of the array, it yields the same value as the expression a.
The expression &a[0] has type "pointer to T", and explicitly points to the first element of the array. Again, this value will be the same as the previous two expressions.
For a 2D array
T a[M][N];
all of the following are true:
Expression   Type          Decays to   Value
----------   ----          ---------   -----
         a   T [M][N]      T (*)[N]    address of first subarray, &a[0]
        *a   T [N]         T *         address of first element of first
                                       subarray, &a[0][0]
        &a   T (*)[M][N]   n/a         address of the array, which is
                                       the same as the address of the
                                       first subarray, which is the same
                                       as the address of the first element
                                       of the first subarray.
      a[i]   T [N]         T *         address of first element of i'th
                                       subarray
     *a[i]   T             n/a         value of first element of i'th subarray
     &a[i]   T (*)[N]      n/a         address of the i'th subarray
  sizeof a   size_t        n/a         total number of bytes used by the
                                       array
 sizeof *a   size_t        n/a         total number of bytes used by the
                                       first subarray
 sizeof &a   size_t        n/a         total number of bytes used by a
                                       pointer to the array
Final note: to print out pointer values, use the %p conversion specifier and cast the argument to (void *) (this is pretty much the only time it's considered proper to explicitly cast a pointer to void *):
printf( " &a yields %p\n", (void *) &a );
printf( " a yields %p\n", (void *) a );
printf( "&a[0] yields %p\n", (void *) &a[0] );
Edit
To answer a question in the comments:
num, num[] and num[][] are all different things. Their types are different. Here num decays and becomes a pointer to a pointer, num[] decays and becomes a pointer to int, and num[][] is an int. Right?
Not quite.
Assuming a declaration like
int arr[10][10];
then the expression arr will decay to type int (*)[10] (pointer to 10-element array of int), not int **; refer to the table above again. Otherwise you're right; arr[i] will decay to type int *, and arr[i][j] will have type int.
An expression of type "N-element array of T" decays to type "pointer to T"; if T is an array type, then the result is "pointer to array", not "pointer to pointer".
In the second example, num is a two-dimensional array, in other words an array of arrays. It's true that *num is its first element, but this first element is itself an array.
To get num[0][0], you need **num.
printf("%d\n", **num);
Look at what the array looks like:
int num[ ] [2]={20,30,40,50};
is better written as
int num[][2]={{20,30},{40,50}};
It is an array with 2 elements. Those 2 elements are, again, arrays with 2 ints.
In memory, they look like
20 30 40 50
but the difference is that num refers to the whole array, num[0] to the first sub-array, and num[0][0] to the first element of the first sub-array.
They have the same address (because they start at the same place), but they have a different type.
That is, the address is not the only important thing with a pointer, the type is important as well.
Arrays are not actually pointers, though they often behave in a similar way, but not always.
Say you have this array and a pointer:
int a[] = {1, 2, 3};
int i = 19;
int *ptr = &i;
Now here a is equal to &a, but the same is not true for pointers (ptr is not equal to &ptr).
Now coming to the question:
Consider a single dimensional array:
int arr[] = {11, 19, 5, 9};
Here, the array's elements are stored in contiguous memory locations. Say, with starting address 0:
---------------------
| 11 | 19 | 5  | 9  |
---------------------
0    4    8    12   16
Now when you write the name of the array, arr (in this example), you get the starting address of the 1st element. If you write &arr instead, you get the starting address of the whole block (which includes all the elements of the array). And when you write *arr, you actually get the value inside the 1st element of this array.
Now consider this 2-dimensional array arr[][4] = {{11, 19, 5, 9}, {5, 9, 11, 19}}:
0    4    8    12   16   -> These are memory addresses
---------------------
| 11 | 19 | 5  | 9  |   ----> These values represent the values inside each index
---------------------
| 5  | 9  | 11 | 19 |
---------------------
16   20   24   28   32
Here, when you write the name of the array, as arr, what you get is the address of the 1st element of this array, which in this case will be address of this 0th index:
0                   16                  32
-----------------------------------------
|     0th index     |     1st index     |
-----------------------------------------
Now when you do &arr, here what you get is the base address for whole of the block, i.e. base address of this:
0    4    8    12   16
---------------------
| 11 | 19 | 5  | 9  |
---------------------
| 5  | 9  | 11 | 19 |
---------------------
16   20   24   28   32
Now, if you do *arr, in 1-dimensional array it gives you the value inside the 1st element, though in 2-dimensional array, the value inside each index is actually one 1-dimensional array, hence you will get the address of this array:
0    4    8    12   16
---------------------
| 11 | 19 | 5  | 9  |
---------------------
Now if you do **arr, that is when you will actually get the value inside the 1st element, which is 11.
I hope it clears some doubts :-)
EDIT 1:
As a fellow user brought to my attention, there seems to be some confusion somewhere, even though I have explained in detail what each expression means. So, to clarify this statement:
Now here a is equal to &a, but the same is not true for pointers (ptr is not equal to &ptr).
The types of a and &a are different, as already stated in the answer. If you perform pointer arithmetic, you will see that: try a + 1 and &a + 1; how each of them responds to pointer arithmetic should give you a good idea.
Considering a 1-dimensional array:
int arr[] = {11, 19, 5, 9};
---------------------
| 11 | 19 | 5  | 9  |
---------------------
0    4    8    12   16
We cannot do arr++, though for a pointer:
int i = 4;
int *ptr = &i;
we can perform ptr++, this will make ptr point to the next memory location.
I think the result means that the array is not really a pointer, but it is converted to a pointer in some contexts where a pointer is expected, such as when it is passed to a function that expects a pointer argument.
See this code:
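#include <stdio.h>

void test(int *num) {
    /* num is a parameter: a separate pointer variable holding a copy of the address */
    printf("test\n");
    printf("%p\n", (void *)num);
    printf("%p\n", (void *)&num);     /* address of the parameter itself */
    printf("%p\n", (void *)&num[0]);
}
int main(void) {
    int num[2] = {20, 30};
    test(num);                        /* num decays to &num[0] */
    printf("main\n");
    printf("%p\n", (void *)num);
    printf("%p\n", (void *)&num);     /* address of the array: same as &num[0] */
    printf("%p\n", (void *)&num[0]);
    return 0;
}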
The output is:
test
0x7fff7a422300
0x7fff7a4222e8 // LOOK AT THIS! It is different from main!
0x7fff7a422300
main
0x7fff7a422300
0x7fff7a422300
0x7fff7a422300

Resources