starting address of array a and &a - c

In the below two lines,
char a[5]={1,2,3,4,5};
char *ptr=(char *)(&a+1);
printf("%d",*(ptr-1));
This prints 5 on screen. Whereas when I use a instead of &a,
char a[5]={1,2,3,4,5};
char *ptr=(char *)(a+1);
printf("%d",*(ptr-1));
This prints 1.
Both a and &a are the starting address of the array. So why is there this difference?
Also
char *ptr=&a+1;
shows a warning.

Arrays aren't pointers! Read section 6 of the comp.lang.c FAQ for more information.
Let's look at your second case first, since it's the more "normal" and idiomatic looking. Line by line:
You declare an array a containing 5 char elements.
The name of the array (a) decays into a pointer to its first element in this context. You add 1 to that and assign the result to ptr. ptr points to the 2. No cast is necessary, though you have one.
You subtract 1 from ptr and then dereference and print - hence you get the 1.
Now, let's address the first case, again line by line:
You declare an array a containing 5 char elements.
You take the address of a, yielding a char (*)[5] type pointer. You then add 1 to this pointer - because of pointer arithmetic this new pointer points to the byte just after 5 in memory. Then you typecast (required, this time) and assign this value to ptr.
You subtract 1 from ptr and then dereference and print. ptr is a char *, so this subtraction simply moves the pointer back by one from "one past the end of a" to point to the last element of a. Hence you get the 5.
Finally, the reason char *ptr=&a+1; gives a warning is that C requires an explicit cast for conversions between incompatible object pointer types (conversions to and from void * being the exception). As mentioned above, &a is of type char (*)[5], not char *, so to assign that value to a char * variable, you'll need the explicit cast.

Since you seem totally new to this, let me explain it in simple terms instead of going for the rigorous explanation.
For your program above, a and &a have the same numerical value, and I believe that's where your whole confusion lies. You may wonder why, if they are the same, the following do not both give the next address after a, going by pointer arithmetic:
(&a+1) and (a+1)
But it's not so! The base address of an array (a here) and the address of an array are not the same thing. a and &a may be the same numerically, but they are not the same type. a has type char[5], which in most expressions is converted to a char * pointing at the first element, while &a has type char (*)[5], i.e. &a is a pointer to (the address of) an array of 5 chars. Numerically they are the same, as you can see from the illustration using ^ below.
But when you increment these two pointers, i.e. (a+1) and (&a+1), the arithmetic is totally different. In the first case it "jumps" to the address of the next element in the array; in the latter case it jumps by 5 elements, as that's the size of an array of 5 elements. Got it now?
1 2 3 4 5
^            // ^ stands at &a
1 2 3 4 5
          ^  // ^ stands at (&a+1)
1 2 3 4 5
^            // ^ stands at a
1 2 3 4 5
  ^          // ^ stands at (a+1)
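Pointer arithmetic such as (&a+1) needs a complete array type, because otherwise the compiler cannot know how many elements to "jump". Note that an initializer completes the type: in the declaration below the size is deduced as 5, so (&a+1) works just as before. An error about an unspecified array bound appears only when the size is genuinely unknown at that point, e.g. for a declaration like extern char a[]; with no size and no initializer.
char a[]={1,2,3,4,5};
char *ptr=(char *)(&a+1); // fine: the initializer fixes the type of a as char[5]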
Now to the part where you decrement the pointers as (ptr-1). In the first case, before you come to the decrement, you should know what happens in the statement above it, where the value is cast to type char *:
char *ptr=(char *)(&a+1);
What happens here is that you "strip off" the original type of (&a+1), which was char (*)[5], and cast it to char *, the same type that a decays to, i.e. the base address of the array. (Note again the difference between the base address of an array and the address of an array.) After the cast and assignment, ptr holds the memory location right after the last element of the array. The decrement in printf() then moves it back by one char, to the last element, which is 5.
1 2 3 4 5
        ^    // ^ stands at (ptr-1), the location of 5, so *(ptr-1) gives 5
So when you dereference the decremented pointer as *(ptr-1), it prints the value 5 as expected.
Now finally, contrast it with the second case where 1 is printed. Look at the illustration I have given using the symbol ^. When you incremented a as a+1, it pointed to the second element of the array, i.e. 2, and you assigned this address to ptr. So when you decrement ptr as (ptr-1), it jumps back one element and now points to the first element of the array, i.e. 1. So dereferencing (ptr-1) in the second case gives 1.
1 2 3 4 5
^            // ^ stands at (ptr-1), the address of 1, so *(ptr-1) gives 1
Hope this made it all clear.

The difference is in the type of the pointer that you get:
The array name a by itself is converted to a pointer to the initial element of the array. When interpreted in that way, e.g. in an expression a+1, the pointer is considered to point to a single character.
When you take &a, on the other hand, the pointer points to an array of five characters.
When you add an integer to a pointer, the number of bytes the pointer is moved is determined by the type of the object pointed to by the pointer. If the pointer points to char, adding N advances the pointer by N bytes. If the pointer points to an array of five chars, adding N advances the pointer by 5*N bytes.
That's precisely the difference that you are getting: your first example advances the pointer to one past the end of the array (which is legal), and then moves it back to the last element. Your second example, on the other hand, advances the pointer to the second element, and then moves it back to point to the initial element of the array.

What you are running into is a subtlety of pointer arithmetic.
The compiler treats "a" as a pointer to char - an entity that is 1 byte in size. Adding 1 to this yields a pointer that is incremented by the size of the entity (i.e. 1).
The compiler treats "&a" as a pointer to an array of chars - an entity that is 5 bytes in size. Adding 1 to this yields a pointer that is incremented by the size of the entity (i.e. 5).
This is how pointer arithmetic works. Adding one to a pointer increments it by the size of the type that it is a pointer to.
The funny thing, of course, is that "a" and "&a" themselves both evaluate to the same address; only their types differ. Which is why you see the values that you do.

Arrays "decay" into pointers to the first element. So taking the address of a gives you a pointer to an array of 5 chars, which is like declaring a char[][5]. And incrementing this pointer advances to the next element of the char[][5] array - that is 5 characters at a time. This is different from incrementing the pointer that decays from the char[5] array - that is, one character at a time.

Related

Why is this code involving arrays and pointers behaving as it does?

I was asked what the output of the following code is:
int a[5] = { 1, 3, 5, 7, 9 };
int *p = (int *)(&a + 1);
printf("%d, %d", *(a + 1), *(p - 1));
1) 3, 9
2) Error
3) 3, 1
4) 2, 1
The answer is No. 1.
It is easy to get *(a+1) is 3.
But how about int *p = (int *)(&a + 1); and *(p - 1) ?
The answer to this could be either "1) 3,9" or "2) Error" (or more specifically undefined behavior) depending on how you read the C standard.
First, let's take this:
&a + 1
The & operator takes the address of the array a giving us an expression of type int(*)[5] i.e. a pointer to an array of int of size 5. Adding 1 to this treats the pointer as pointing to the first element of an array of int [5], with the resulting pointer pointing to just after a.
Also, even though &a points to a singular object (in this case an array of type int [5]) we can still add 1 to this address. This is valid because 1) a pointer to a singular object can be treated as a pointer to the first element of an array of size 1, and 2) a pointer may point to one element past the end of an array.
Section 6.5.6p7 of the C standard states the following regarding treating a pointer to an object as a pointer to the first element of an array of size 1:
For the purposes of these operators, a pointer to an object
that is not an element of an array behaves the same as a pointer
to the first element of an array of length one with the type of the
object as its element type.
And section 6.5.6p8 says the following regarding allowing a pointer to point to just past the end of an array:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n) point to,
respectively, the i+n-th and i−n-th elements of the array object,
provided they exist. Moreover, if the expression P points to the
last element of an array object, the expression (P)+1 points one past
the last element of the array object, and if the expression Q
points one past the last element of an array object, the
expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to
elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
Now comes the questionable part, which is the cast:
(int *)(&a + 1)
This converts the pointer of type int(*)[5] to type int *. The intent here is to change the pointer which points to the end of the 1-element array of int [5] to the end of the 5-element array of int.
However the C standard isn't clear on whether this conversion and the subsequent operation on the result is allowed. It does allow conversion from one object type to another and back, assuming the pointer is properly aligned. While the alignment shouldn't be an issue, using this pointer is iffy.
So this pointer is assigned to p:
int *p = (int *)(&a + 1)
Which is then used as follows:
*(p - 1)
If we assume that p validly points to one element past the end of the array a, subtracting 1 from it results in a pointer to the last element of the array. The * operator then dereferences this pointer to the last element, yielding the value 9.
So if we assume that (int *)(&a + 1) results in a valid pointer, then the answer is 1) 3,9 otherwise the answer is 2) Error.
In the line
int *p = (int *)(&a + 1);
note that &a is being written, not a. This is important.
If simply a had been written, then the array would have decayed to a pointer to the first element, i.e. to &a[0]. However, since the expression &a was used instead, the result of this expression has the same value as if a or &a[0] had been used, but the type is different: The type is a pointer to an array of 5 int elements, instead of a pointer to a single int element.
According to the rules on pointer arithmetic, incrementing a pointer by 1 will increase the memory address by the size of the object that it is pointing to. Since the pointer is not pointing to a single element, but to an array of 5 elements, the memory address will be incremented by 5 * sizeof(int). Therefore, after incrementing the pointer, the value of (but not type of) the pointer will be equivalent to &a[5], i.e. one past the end of the array.
After casting this pointer to int * and assigning the result to p, the expression p is fully equivalent to &a[5] (both in value and in type).
Therefore, the expression *(p - 1) is equivalent to *(&a[5] - 1), which is equivalent to *(&a[4]), or simply a[4].
This:
&a + 1;
is taking the address of a, an array, and adding 1, which adds the size of one a, i.e. 5 integers. Then the indexing "backs down", one integer, ending up in the final element of a.
Normally whenever arrays are used in expressions, they "decay" into a pointer to the first element. There are a few exceptions to this rule and one such exception is the & operator.
&a therefore yields a pointer to the array of type int (*)[5]. Then &a + 1 is pointer arithmetic on such a type, meaning the pointer address is increased by the size of one int [5]. We end up pointing just beyond the array, but C actually allows us to do that as long as we don't de-reference that location.
Then the pointer is converted to int * with a cast, which we can do too - C allows pretty much any manner of wild pointer conversions as long as we don't de-reference or cause misalignment etc.
p - 1 does pointer arithmetic on type int and the actual type of data in the array is also int, so we are allowed to de-reference that location. We end up at the last item of the array.

Printf and Array

I was asked this question as a class exercise:
int A[] = {1,3,5,7,9,0,2,4,6};
printf("%d\n", *(A+A[1]-*A));
I couldn't figure it out on paper, so went ahead to compiling a simple program and tested it and found that printf("%d",*A) always gives me 1 for the output.
But I still do not understand why this is the case, hence it would be great if someone can explain this.
A is treated like a pointer to the first element of array of integers.
A[1] is the value of the element at index 1 (the second element, since indexes are 0-based), which is 3.
*A is the value to which A points, which is the zeroth element of the array, so 1.
So
A[1] - *A == 3 - 1 == 2
Now we have
*(A + 2)
That's where pointer arithmetic kicks in. Since A is a pointer to integer, A+2 points to the item at index 2 (the third element) of that array and *(A+2) gets its value.
So answer is 5.
Also please note for future reference that pointer to an integer and array of integers are somewhat different things in C, but for the purposes of this discussion they are the same thing.
Break it down into its constituent parts:
A by itself is the memory address of the array, which is also equivalent to &A[0], the memory address of the first element of the array.
A[1] is the value stored in the second element of the array, which is 3.
*A dereferences the memory address of the array, which is equivalent to A[0], the value stored in the first element of the array, which is 1.
So, do some substitutions:
*(A+A[1]-*A)
= *(A+(A[1])-(A[0]))
= *(A+3-1)
= *(A+2)
The notation *(Array+index) is the same as the notation Array[index]. Under the hood, they both take the starting address of the array, increment it by the number of bytes of the array element type (in this case, int) multiplied by the index, and then dereference the resulting address. So *(A+2) is the same as A[2], which is 5.
Arrays used in expressions are automatically converted into pointers pointing at the first elements of the arrays except for some exceptions such as operands of sizeof or unary & operators.
E1[E2] is defined to be equivalent to *((E1) + (E2))
The + and - operators applied to pointers move the pointer forward and backward.
In this case, *A is equivalent to *(A + 0), which is equivalent to A[0] and it will give you the first element of the array.
The expression *(A+A[1]-*A) will
Get the pointer to the first element, which points at 1, via A
Move the pointer to A[1] (3) elements ahead via +A[1], so the pointer now points at 7
Move the pointer to *A (1) element before what is pointed via -*A, so the pointer now points at 5
Dereference the pointer via the unary * operator, so the expression is evaluated to 5
In most expressions, an array name in C is converted to a pointer to the first element of the array. So if you dereference the array name, you will always get the value at the first position.
If you add 1 to that pointer, as in *(A+1), you will get the second position.
You can get any position from the array using the same method:
*(A) is the first position
*(A+1) is the second position
*(A+2) is the third position
and so on...
If you declare A as an int * instead, allocate the memory and assign the values yourself, it is usually easier to visualize how this works.

Array to pointer decay and passing multidimensional arrays to functions

I know that an array decays to a pointer, such that if one declared
char things[8];
and then later on used things somewhere else, things is a pointer to the first element in the array.
Also, from my understanding, if one declares
char moreThings[8][8];
then moreThings is not of type pointer to char but of type "array of pointers to char," because the decay only occurs once.
When moreThings is passed to a function (say with prototype void doThings(char thingsGoHere[8][8]) what is actually going on with the stack?
If moreThings is not of pointer type, then is this really still a pass-by-reference? I guess I always thought that moreThings still represented the base address of the multidimensional array. What if doThings took input thingsGoHere and itself passed it to another function?
Is the rule pretty much that unless one specifies an array input as const then the array will always be modifiable?
I know that the type checking stuff only happens at compile time, but I'm still confused about what technically counts as a pass by reference (i.e. is it only when arguments of type pointer are passed, or would array of pointers be a pass-by-reference as well?)
Sorry to be a little all over the place with this question, but because of my difficulty in understanding this it is hard to articulate a precise inquiry.
You got it slightly wrong: moreThings also decays to a pointer to the first element, but since it is an array of an array of chars, the first element is an "array of 8 chars". So the decayed pointer is of this type:
char (*p)[8] = moreThings;
The value of the pointer is of course the same as the value of &moreThings[0][0], i.e. the address of the first element of the first element, and also the same as the value of &moreThings, but the type is different in each case.
Here's an example of char a[N][3]:
+===========================+===========================+====
|+--------+--------+-------+|+--------+--------+-------+|
|| a[0,0] | a[0,1] | a[0,2]||| a[1,0] | a[1,1] | a[1,2]|| ...
|+--------+--------+-------+|+--------+--------+-------+| ...
|           a[0]            |           a[1]            |
+===========================+===========================+====
                              a
^^^
||+-- &a[0,0]
|+-----&a[0]
+-------&a
&a: address of the entire array of arrays of chars, which is a char[N][3]
&a[0], same as a: address of the first element, which is itself a char[3]
&a[0][0]: address of the first element of the first element, which is a char
This demonstrates that different objects may have the same address, but if two objects have the same address and the same type, then they are the same object.
"ARRAY ADDRESS AND POINTERS TO MULTIDIMENSIONAL ARRAYS"
Let's start with a 1-D array first:
The declaration char a[8]; creates an array of 8 elements.
Here a evaluates to the address of the first element, not the address of the array.
char* ptr = a; is a correct expression, as ptr is a pointer to char and can hold the address of the first element.
But the expression ptr = &a is wrong, because ptr can't hold the address of an array.
&a means the address of the array. The values of a and &a are the same, but semantically they are different: one is the address of a char, the other is the address of an array of 8 chars.
char (*ptr2)[8]; Here ptr2 is a pointer to an array of 8 chars, and this time
ptr2=&a is a valid expression.
The data type of &a is char(*)[8], and the type of a is char[8], which simply decays into char* in most operations, e.g. char* ptr = a;
To understand better read: Difference between char *str and char str[] and how both stores in memory?
Second case,
The declaration char aa[8][8]; creates a 2-D array of size 8x8.
Any 2-D array can also be viewed as a 1-D array in which each element is itself a 1-D array.
aa evaluates to the address of the first element, which is an array of 8 chars. The expression ptr2 = aa is valid and correct.
If we declare as follows:
char (*ptr3)[8][8];
ptr3 = &aa; //is a correct expression
Similarly,
moreThings in your declaration char moreThings[8][8]; evaluates to the address of the first element, which is a char array of 8 elements.
To understand better read: Difference between char* str[] and char str[][] and how both stores in memory?
It would be interesting to know:
moreThings is the address of an array of 8 chars.
*moreThings is the address of the first element, that is &moreThings[0][0].
&moreThings is the address of the 2-D array of 8 x 8.
The address values of all three are the same, but semantically they are all different.
**moreThings is the value of the first element, that is moreThings[0][0].
To understand better read: Difference between &str and str, when str is declared as char str[10]?
Further more,
void doThings(char thingsGoHere[8][8]) is nothing but void doThings(char (*thingsGoHere)[8]) and thus accepts any array that is two dimensional with the second dimension being 8.
About types of variables in C and C++ (I would like to add this to the answer):
Nothing is passed by reference in C; that is a C++ concept. If the term is used for C, the author is talking about a pointer variable.
C supports pass by address and pass by value.
C++ supports pass by address, pass by value and also pass by reference.
Read: pointer variables and reference variables
At the end,
The name of an array is a constant identifier, not a variable: it cannot be assigned to.
Nicely explained by Kerrek,
In addition to that, we can prove it by the following example:
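#include <stdio.h>
int main (void)
{
    int a[10][10];
    /* %p expects a void *, so each pointer is cast explicitly */
    printf (".. %p %p\n", (void *)&a, (void *)(&a+1));
    printf (".. %p %p\n", (void *)&a[0], (void *)(&a[0]+1));
    printf (".. %p %p\n", (void *)&a[0][0], (void *)(&a[0][0]+1));
    return 0;
}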
The Output is :
.. 0x7fff6ae2ca5c 0x7fff6ae2cbec = 400 bytes difference
.. 0x7fff6ae2ca5c 0x7fff6ae2ca84 = 40 bytes difference
.. 0x7fff6ae2ca5c 0x7fff6ae2ca60 = 4 bytes difference.
&a +1 -> Moves the pointer by the size of the whole array, i.e. 400 bytes.
&a[0] + 1 -> Moves the pointer by the size of one row, i.e. 40 bytes.
&a[0][0] +1 -> Moves the pointer by the size of one element, i.e. 4 bytes.
[ int size is 4 bytes here ]
Hope this helps. :)

why don't i get the value of first element of the array?

In this program all three addresses which I mention refer to the first element of the array but why don't I get the value of the first element of the array when I dereference them?
int main()
{
int a[5] = {1,2,3,4,5};
printf("address a = %d\n",a);
printf("address of a[0] = %d\n",&a[0]);
printf("address of first element = %d\n",&a);
printf("value of first element of the array a =%d\n",*(a));
printf("first element =%d\n",*(&a[0]));
printf("a[0] = %d\n",*(&a));//this print statement again prints the address of a[0]
return 0;
}
I get address of the first element of the array a for the first 3 print statements and when I dereference all the 3 I get values only for the fourth and fifth print statements and not for the sixth print statement (which is accompanied with a comment).
Things to remember:
1. Name of the array is the address of its first element
So, as the array name is a, printing a gives you the address of a[0] (which is also the address of the array), i.e. the values of &a[0] (same as a) and &a are the same.
Now that you are aware that a and &a[0] refer to the first element, you can dereference the first element in 3 ways:
*a
*(&a[0])
a[0] - Note that internally, this gets transformed into: *(a+0)
Things to remember:
2. Adding an integer to a pointer takes the pointer to the next element
Here, &a is the address of the whole array. Although the value of &a is the same as that of &a[0] and a, it is a pointer to the array, not a pointer to the first element.
So, if you add 1 to &a, i.e. &a + 1, you'll go beyond this array.
Similarly, as &a[0] and a are pointers to the first element, adding 1 to them gives you the next element of the array (if the array has more than one item), i.e. (a+1) and &a[0] + 1 point to the element after the first one. Now, for dereferencing them, you can use:
*(a+1)
*(&a[0] +1)
a[1] - Note that internally, this gets transformed into: *(a+1)
Adding more information to remove the following doubt:
If, as this answer states, the name of the array were the address of its first element, then &a would be the address of the address of the first element.
The answer to this doubt is both No and Yes.
No because there is nothing like address of the address.
For understanding yes, consider the following situation:
Imagine that you have 10 boxes of chocolates and each box contains 5 chocolates (fitted in a line inside the box) and that the boxes are lined up.
Ok, enough chocolates to explain.
Here, the boxes represent arrays of chocolates: each box is an array of 5 chocolates.
Translating it to C, just assume that a is one such box, i.e. an array with 5 numbers.
Now, if I ask you to tell me the location of the first box, you will refer to it as &a. If I ask you for the location of the second box, you'll refer to it as &a + 1.
If I ask you for the location of the first chocolate in the first box, you'll refer to it as &a[0] or (a+0) or a.
If I ask you for the location of the second chocolate in the first box, you'll refer to it as &a[1] or (a+1). Note: In (a+1), as a is the name of the array, it is the address of the first element, which is an integer. So, increasing a by 1 gives the address of the second element.
If I ask you for the location of the first chocolate in the second box of chocolates, you'll refer to it as *(&a+1) or *((&a+1) + 0)
If I ask you for the location of the third chocolate in the second box of chocolates, you'll refer to it as (*(&a+1))+2
To answer just your question: by definition the operators * and & are such that they cancel out. Taking the address of a variable and then dereferencing gives you back the variable. Here this is an array, a. In most contexts arrays "decay" to pointers, so what you then see again is the address of the first element.
The C standard specifies that an expression that has type “array of type” is converted to type “pointer to type” and points to the initial element of the array, except when the expression is the operand of & (or three other exceptions, noted below but not relevant to this question). Here is how this applies to your examples:
a is an array of int. It is converted to pointer to int. The value is the address of the initial element of a.
In &a[0], a[0] is processed first. a is again converted to pointer to int. Then the subscript operator is applied, which produces an lvalue for the initial element of the array. Finally, & takes the address of this lvalue, so the value is the address of the initial element of a.
In &a, a is the operand of &, so a is not converted to a pointer to int. It remains an array, and &a takes its address. The result has type “pointer to array of int”, and its value is the address of the array, which equals the address of the initial element.
For completeness: The relevant rule in the C standard is 6.3.2.1 paragraph 3. The exceptions are:
The array expression is the operand of &.
The array expression is the operand of sizeof.
The array expression is the operand of _Alignof.
The array expression is a string literal used to initialize an array.
The last means that in char s[] = "abc";, "abc" is not converted to a pointer to the initial element; it remains an array, which is used to initialize s.
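a is an array, not a pointer, but in most expressions it decays to a pointer to its first element, and a[0] is *(a+0). When you write *(&a), the * and & cancel out and give back the array a itself, which then decays again, so you get the address of the first element rather than its value.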

Dereferencing multi-dimensional array name and pointer arithmetic

I have this multi-dimensional array:
char marr[][3] = {{"abc"},{"def"}};
Now if we encounter the expression *marr by definition (ISO/IEC 9899:1999) it says (and I quote)
If the operand has type 'pointer to type', the result has type 'type'
and in that expression marr decays to a pointer to its first element, which in this case is a pointer to an array, so we get back the 'type' array of size 3 when we have the expression *marr. So my question is: why, when we do (*marr) + 1, do we add only 1 byte to the address instead of 3, which is the size of the array?
Excuse my ignorance I am not a very bright person I get stuck sometimes on trivial things like this.
Thank you for your time.
The reason why incrementing (*marr) moves forward 1 byte is because *marr refers to a char[3], {"abc"}. If you don't already know:
*marr == marr[0] == &marr[0][0]
(*marr) + 1 == &marr[0][1]
If you had just char single_array[3] = {"abc"};, how far would you expect single_array + 1 to move forward in memory? 1 byte, right? Not 3, since the element type of this array is char and sizeof(char) is 1.
If you did *(marr + 1), then you would be referring to marr[1], which you can then expect to be 3 bytes away. In marr + 1, marr decays to type char (*)[3], so the increment size is sizeof(char[3]).
The key difference between the two examples above is that:
The first is dereferenced to a char[3] (which decays to char *) and then incremented, therefore the increment size is sizeof(char).
The second increments a char (*)[3], therefore the increment size is sizeof(char[3]), and then dereferences.
It adds one because the type is char (1 byte). Just like:
char *p = 0x00;
++p; /* is now 0x01 */
When you dereference marr (a char [][3]), the result is a char[3], which is used as a char * in an expression.
To add 3, you need to do the arithmetic first and then dereference:
*(marr+1)
You were doing:
(*marr)+1
which dereferences first.

Resources