Dereferencing multi-dimensional array name and pointer arithmetic - c

I have this multi-dimensional array:
char marr[][3] = {{"abc"},{"def"}};
Now if we encounter the expression *marr, the standard (ISO/IEC 9899:1999) says, and I quote:
If the operand has type 'pointer to type', the result has type 'type'
In that expression marr decays to a pointer to its first element, which in this case is a pointer to an array, so *marr gives back the 'type' array of size 3. So my question is: why does (*marr) + 1 add only 1 byte to the address instead of 3, which is the size of the array?
Excuse my ignorance I am not a very bright person I get stuck sometimes on trivial things like this.
Thank you for your time.

Adding 1 to (*marr) moves forward 1 byte because *marr is a char[3], {"abc"}, which in an expression decays to a pointer to its first char. If you don't already know, these all refer to the same address:
*marr == marr[0] == &marr[0][0]
(*marr) + 1 == &marr[0][1]
If you had just char single_array[3] = {"abc"};, how far would you expect single_array + 1 to move forward in memory? 1 byte, right? Not 3, since the element type of this array is char and sizeof(char) is 1.
If you did *(marr + 1), then you would be referring to marr[1], which you can expect to be 3 bytes away. In marr + 1, marr decays to a pointer of type char (*)[3], so the increment size is sizeof(char[3]).
The key difference between the two examples above is this:
The first dereferences to a char[3] and then increments, therefore the increment size is sizeof(char).
The second increments a char (*)[3], therefore the increment size is sizeof(char[3]), and then dereferences.

It adds one because the type is char (1 byte). Just like:
char *p = 0x00;
++p; /* is now 0x01 */
When you dereference marr you get a char[3], which is then used as a char * in an expression.
To add 3, you need to do the arithmetic first and then dereference:
*(marr+1)
You were doing:
(*marr)+1
which dereferences first.

Related

Why is this code involving arrays and pointers behaving as it does?

I was asked what the output of the following code is:
int a[5] = { 1, 3, 5, 7, 9 };
int *p = (int *)(&a + 1);
printf("%d, %d", *(a + 1), *(p - 1));
1) 3, 9
2) Error
3) 3, 1
4) 2, 1
The answer is No. 1.
It is easy to get *(a+1) is 3.
But how about int *p = (int *)(&a + 1); and *(p - 1) ?
The answer to this could be either "1) 3,9" or "2) Error" (or more specifically undefined behavior) depending on how you read the C standard.
First, let's take this:
&a + 1
The & operator takes the address of the array a giving us an expression of type int(*)[5] i.e. a pointer to an array of int of size 5. Adding 1 to this treats the pointer as pointing to the first element of an array of int [5], with the resulting pointer pointing to just after a.
Also, even though &a points to a singular object (in this case an array of type int [5]) we can still add 1 to this address. This is valid because 1) a pointer to a singular object can be treated as a pointer to the first element of an array of size 1, and 2) a pointer may point to one element past the end of an array.
Section 6.5.6p7 of the C standard states the following regarding treating a pointer to an object as a pointer to the first element of an array of size 1:
For the purposes of these operators, a pointer to an object
that is not an element of an array behaves the same as a pointer
to the first element of an array of length one with the type of the
object as its element type.
And section 6.5.6p8 says the following regarding allowing a pointer to point to just past the end of an array:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n) point to,
respectively, the i+n-th and i−n-th elements of the array object,
provided they exist. Moreover, if the expression P points to the
last element of an array object, the expression (P)+1 points one past
the last element of the array object, and if the expression Q
points one past the last element of an array object, the
expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to
elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined. If the result points
one past the last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.
Now comes the questionable part, which is the cast:
(int *)(&a + 1)
This converts the pointer of type int(*)[5] to type int *. The intent here is to change the pointer which points to the end of the 1-element array of int [5] to the end of the 5-element array of int.
However, the C standard isn't clear on whether this conversion and the subsequent operation on the result are allowed. It does allow conversion from one object pointer type to another and back, assuming the pointer is properly aligned. While the alignment shouldn't be an issue, using this pointer is iffy.
So this pointer is assigned to p:
int *p = (int *)(&a + 1)
Which is then used as follows:
*(p - 1)
If we assume that p validly points to one element past the end of the array a, subtracting 1 from it results in a pointer to the last element of the array. The * operator then dereferences this pointer to the last element, yielding the value 9.
So if we assume that (int *)(&a + 1) results in a valid pointer, then the answer is 1) 3,9 otherwise the answer is 2) Error.
In the line
int *p = (int *)(&a + 1);
note that &a is being written, not a. This is important.
If simply a had been written, then the array would have decayed to a pointer to the first element, i.e. to &a[0]. However, since the expression &a was used instead, the result of this expression has the same value as if a or &a[0] had been used, but the type is different: The type is a pointer to an array of 5 int elements, instead of a pointer to a single int element.
According to the rules on pointer arithmetic, incrementing a pointer by 1 will increase the memory address by the size of the object that it is pointing to. Since the pointer is not pointing to a single element, but to an array of 5 elements, the memory address will be incremented by 5 * sizeof(int). Therefore, after incrementing the pointer, the value of (but not type of) the pointer will be equivalent to &a[5], i.e. one past the end of the array.
After casting this pointer to int * and assigning the result to p, the expression p is fully equivalent to &a[5] (both in value and in type).
Therefore, the expression *(p - 1) is equivalent to *(&a[5] - 1), which is equivalent to *(&a[4]), or simply a[4].
This:
&a + 1;
is taking the address of a, an array, and adding 1, which adds the size of one a, i.e. 5 integers. Then the indexing "backs down", one integer, ending up in the final element of a.
Normally whenever arrays are used in expressions, they "decay" into a pointer to the first element. There are a few exceptions to this rule and one such exception is the & operator.
&a therefore yields a pointer to the array of type int (*)[5]. Then &a + 1 is pointer arithmetic on such a type, meaning the pointer address is increased by the size of one int [5]. We end up pointing just beyond the array, but C actually allows us to do that as long as we don't de-reference that location.
Then the pointer is converted to type (int *) with a cast, which we can do too: C allows pretty much any manner of wild pointer conversions as long as we don't de-reference the result or cause misalignment etc.
p - 1 does pointer arithmetic on type int and the actual type of data in the array is also int, so we are allowed to de-reference that location. We end up at the last item of the array.

Pointers to arrays in C

I just saw this code snippet Q4 here and was wondering if I understood this correctly.
#include <stdio.h>
int main(void)
{
    int a[5] = { 1, 2, 3, 4, 5 };
    int *ptr = (int*)(&a + 1);
    printf("%d %d\n", *(a + 1), *(ptr - 1));
    return 0;
}
Here's my explanation:
int a[5] = { 1, 2, 3, 4, 5 }; => a points to the first element of the array. In other words: a contains the address of the first element of the array.
int *ptr = (int*)(&a + 1); => Here &a will be a double pointer and point to the whole array. I visualize it like this: int b[1][5] = {1, 2, 3, 4, 5};, here b points to a row of a 2D array. &a + 1 should point to the next array of integers in the memory (non-existent) [kind of like, b + 1 points to the second (non-existent) row of a 2D array with 1 row]. We cast it as int *, so this should probably point to the first element of the next array (non-existent) in memory.
*(a + 1) => This one's easy. It just points to the second element of the array.
*(ptr - 1) => This one's tricky, and my explanation is probably flawed for this one. As ptr is an int *, this should point to int previous to that pointed by ptr. ptr points to the non-existent second array in memory. So, ptr - 1 should probably point to the last element of the first array (a[4]).
Here &a will be a double pointer.
No. It is a pointer to an array; in this example, int (*)[5]. Refer to C pointer to array/array of pointers disambiguation.
So when you increment a pointer to an array, it steps over the whole array and points just past its end.
In this example, that pointer is assigned to an int pointer. So when the int pointer is decremented, it points sizeof(int) bytes earlier, at the last element, and 5 is printed.
Your statement is essentially correct, and you probably understand it better than most professionals. But since you are seeking a critique, here is the long answer. Arrays and pointers in C are different types; this is one of the most subtle details in C. I remember one of my favorite professors saying once that the people who made the language later regretted making this so subtle and often confusing.
It is true in many cases an array of a type, and a pointer to a type can be treated the same way. They both have a value equal to their address, but they are truly different types.
When you take the address of an array, &a, you have a pointer to an array. When you say (a + 1) you have a pointer to an int; when you just say a you have an array (not a pointer). a[1] is exactly the same as typing *(a + 1); in fact you could type 1[a] and it would be exactly the same as the previous two. When you pass an array to a function, you are not really passing an array, you are passing a pointer: void Fn(int b[]) and void Fn(int *b) are the exact same function signature, and if you take sizeof b within the function, in both cases you will get the size of a pointer.
Pointer arithmetic is tricky, it always offsets by the size of the object it's pointing to in bytes. Whenever you use the address of operator you get a pointer to the type you applied it to.
So for what's going on in your example above:
&a is a pointer to an array, and so when you add one to it, it is offset by the sizeof that array (5 * sizeof(int)).
When you cast to int*, the cast retains the value of the pointer, but now its type is pointer to int, you then store it in ptr, a variable of type pointer to int.
a is an array, not a pointer. When you say a + 1, the array is first converted to a pointer to its first element, and the addition yields a pointer to the second element, of the type stored in the array, int. Dereferencing it with * gives you the int pointed to.
ptr is a pointer to int, and it points one past the end of the array. (It is legal, by the way, to point one past the end of an array; it's just not legal to dereference this pointer.) When you subtract 1 from it, you end up with a pointer to an int that is the last in the array, which you can dereference. (Your explanation of visualizing int b[1][5] = {1, 2, 3, 4, 5}; is something I've not heard before, and while I can't honestly say if this is technically correct, I will say this is how it works and I think this is a great way to think of it; I will likely do so in the back of my mind from now on.)
Types will get very tricky in C, and also in C++. The best is yet to come.
As per your explanation, you understand pointers to arrays correctly. With the statement
int *ptr = (int*)(&a + 1);
you point to the address just past the memory occupied by the whole array a, so you can access the array elements through ptr by decrementing it.

starting address of array a and &a

In the lines below,
char a[5]={1,2,3,4,5};
char *ptr=(char *)(&a+1);
printf("%d",*(ptr-1));
This prints 5 on screen. Whereas when I use a instead of &a,
char a[5]={1,2,3,4,5};
char *ptr=(char *)(a+1);
printf("%d",*(ptr-1));
This prints 1.
Both a and &a are the starting address of the array. So why is there this difference?
Also
char *ptr=&a+1;
shows a warning.
Arrays aren't pointers! Read section 6 of the comp.lang.c FAQ for more information.
Let's look at your second case first, since it's the more "normal" and idiomatic looking. Line by line:
You declare an array a containing 5 char elements.
The name of the array (a) decays into a pointer to its first element in this context. You add 1 to that and assign the result to ptr. ptr points to the 2. No cast is necessary, though you have one.
You subtract 1 from ptr and then dereference and print - hence you get the 1.
Now, let's address the first case, again line by line:
You declare an array a containing 5 char elements.
You take the address of a, yielding a pointer of type char (*)[5]. You then add 1 to this pointer; because of pointer arithmetic, this new pointer points to the byte just after the 5 in memory. Then you typecast (required, this time) and assign this value to ptr.
You subtract 1 from ptr and then dereference and print. ptr is a char *, so this subtraction simply moves the pointer back by one from "one past the end of a" to point to the last element of a. Hence you get the 5.
Finally, the reason char *ptr=&a+1; gives a warning is that C requires conversions between incompatible pointer types (other than to or from void *) to have an explicit cast. As mentioned above, &a is of type char (*)[5], not char *, so to assign that value to a char * variable, you'll need the explicit cast.
Since you seem totally new to it, let me explain it in simple terms instead of going for the rigorous explanation.
You see, for your program above, a and &a will have the same numerical value, and I believe that's where your whole confusion lies. You may wonder: if they are the same, shouldn't the following give the next address after a in both cases, going by pointer arithmetic?
(&a+1) and (a+1)
But it's not so! The base address of an array (a here) and the address of an array are not the same! a and &a might be the same numerically, but they are not the same type. a decays to type char*, while &a is of type char (*)[5], i.e. &a is a pointer to (the address of) an array of size 5. But a, as you know, is the address of the first element of the array. Numerically they are the same, as you can see from the illustration using ^ below.
But when you increment these two pointers/addresses, i.e. as (a+1) and (&a+1), the arithmetic is totally different. In the first case it "jumps" to the address of the next element in the array; in the latter case it jumps by 5 elements, as that's the size of an array of 5 elements! Got it now?
1 2 3 4 5
^           // ^ stands at &a
1 2 3 4 5
          ^ // ^ stands at (&a+1)
1 2 3 4 5
^           // ^ stands at a
1 2 3 4 5
  ^         // ^ stands at (a+1)
Pointer arithmetic on &a needs to know the size of the array. If the array type is incomplete, for example extern char a[]; with no size visible, the program won't know how many elements to "jump" when something like (&a+1) is encountered, and you get an error. Leaving out the size as below is fine, though, because the initializer gives the array exactly 5 elements:
char a[]={1,2,3,4,5};
char *ptr=(char *)(&a+1); // OK: a has type char[5], taken from the initializer
Now to the part where you decrement the pointers/addresses as (ptr-1). In the first case, before you come to the decrement part, you should know what happens in the statement above it, where it is cast to type char*:
char *ptr=(char *)(&a+1);
What happens here is that you "strip off" the original type of (&a+1), which was char (*)[5], and cast it to type char*, which is the type a decays to, i.e. that of the base address of the array. (Note again the difference between base address of an array and address of an array.) After the cast and assignment in the above statement, ptr holds the memory location right after the last element of the array; after the decrement in printf(), ptr-1 gives the location of the last element, which is 5.
1 2 3 4 5
        ^   // ^ stands at location of 5, so *(ptr-1) gives 5
So when you dereference the pointer after decrementing it, as *(ptr-1), it prints the value 5 as expected.
Now finally, contrast it with the second case where 1 is printed. Look at the illustration I have given using the symbol ^. When you incremented a as a+1, it pointed to the second element of the array, i.e. 2, and you assigned this address to ptr. So when you decrement ptr as (ptr-1), it jumps back one element and now points to the first element of the array, i.e. 1. So dereferencing (ptr-1) in the second case gives 1.
1 2 3 4 5
^           // ^ stands at address of 1, so *(ptr-1) gives 1
Hope this made it all clear.
The difference is in the type of the pointer that you get:
Array name a by itself represents a pointer to the initial element of the array. When interpreted in that way, e.g. in an expression a+1, the pointer is considered to point to a single character.
When you take &a, on the other hand, the pointer points to an array of five characters.
When you add an integer to a pointer, the number of bytes the pointer is moved is determined by the type of the object pointed to by the pointer. In case the pointer points to char, adding N advances the pointer by N bytes. In case the pointer points to an array of five chars, adding N advances the pointer by 5*N bytes.
That's precisely the difference that you are getting: your first example advances the pointer to the element one past the end of the array (which is legal), and then moves it back to the last element. Your second example, on the other hand, advances the pointer to the second element, and then moves it back to point to the initial element of the array.
What you are running into is a subtlety of pointer arithmetic.
The compiler treats "a" as a pointer to char - an entity that is 1 byte in size. Adding 1 to this yields a pointer that is incremented by the size of the entity (i.e. 1).
The compiler treats "&a" as a pointer to an array of chars - an entity that is 5 bytes in size. Adding 1 to this yields a pointer that is incremented by the size of the entity (i.e. 5).
This is how pointer arithmetic works. Adding one to a pointer increments it by the size of the type that it is a pointer to.
The funny thing, of course, is that "a" and "&a" both evaluate to the same address, even though their types differ. That is why you see the values that you do.
Arrays "decay" into pointers to the first element. So taking the address of a gives you a pointer to an array of 5 chars, which is like declaring a char[][5]. And incrementing this pointer advances to the next element of the char[][5] array - that is 5 characters at a time. This is different from incrementing the pointer that decays from the char[5] array - that is, one character at a time.

How does C retrieve the address of a row for a 2d array

Can someone explain to me how C retrieves the correct memory address for a row when you only use one subscript to access a 2d array?
Example -
int array2D[2][2] = {1,2,3,4};
printf ( "starting address of row2 = %p" , array2D[1]);
I understand that when subscripting in C, what is actually going on is pointer addition, so for a 1d array the array name points to element 0. In this case, if I had wanted element 1, the compiler would take the starting address (say 4000) and add 4 to it (assuming a 4-byte int) so that what is returned is the item at memory address 4004.
My understanding is that when you populate a 2d array, as in my example, they are allocated sequentially so I would have
1 2
3 4
at addresses
4000 4004
4008 4012
So how does C work out that in this case array2D[1] should point to 4008 and not 4004? Does it run a sizeof() operator or have I misunderstood a fundamental here?
Thanks in advance
C knows how long each row is, so it does the multiplication to find the row.
int x[][3] = {{1,2,3},{4,5,6}};
then &x[1][0] is &x[0][0] plus 3 * sizeof(int).
That's why in a multidimensional C array declaration, all but the first dimension must be specified.
Pointer arithmetic depends on the type of the element being pointed to. Given a pointer p to type T, p + 1 points to the next element of type T, not necessarily the next byte following p. If T is char, then p + 1 points to the next char object after p, which starts at the byte immediately following p; if T is char [10], then p + 1 points to the next 10-element array of char after p, which starts at the 10th byte following p.
The type of the expression array2D is "2-element array of 2-element array of int", which "decays" to type "pointer to 2-element array of int", or int (*)[2] (see note 1 below). Thus the expression array2D[1] is interpreted as *(array2D + 1). Since array2D points to an object of type int [2], array2D + 1 points to the next 2-element array of int following array2D, which is 2 * sizeof (int) bytes away from array2D.
1. Except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" and its value will be the address of the first element in the array.
This is going to be a bit long-winded, but bear with me still.
Array subscripting is just a shorthand: (p[N]) equals (*(p + N)) in all contexts for pointer types (both are invalid expressions for void*, though).
Now, if p is an array type, it would decay to a pointer type in an expression like (*(p + N)); an int[2][2] would decay into a pointer of type int (*)[2] (i.e. a pointer to an int[2]).
Pointer arithmetic takes types into account; we need to convert things to char* to visualize what the compiler does to us:
T *p;
p[N] equals *(p + N) equals *(T*)((unsigned char*)p + N * sizeof *p)
Now, if T were an int[2] (to equal the situation we described above), then sizeof *p would be sizeof(int[2]), i.e. 2 * sizeof(int).
This is how subscripting works in so-called multidimensional arrays.
Assuming a 4-byte int:
sizeof(array2D[1]) == 8
If the address of array2D is 4000,
then the address of array2D[1] is 4000 + sizeof(array2D[1]) == 4000 + 8.

Will the prototype of a[1][2] be this: int **a?

a[1][2] is expanded by the compiler like this: *( *(a+1) +2 ). So if a has a declaration like int **a, the foregoing expression should be explained like this:
Get the address of a from the symbol table. Note it is a pointer to a pointer.
Now we add 1 to it, so it points somewhere next to where a points.
Then we dereference it. I think this is undefined behavior, for we don't know if a+1 is valid and we access it arbitrarily.
OK, if we are lucky enough that we successfully get the value *(a+1), we add 2 to it.
At this step, we dereference (*(a+1) +2 ). Will we be lucky now?
I read this in Expert C Programming in Chapter 10. Is this correct?
New answer, after edited question:
For a[1][2] to be valid, given that a is defined as int **a;, both of these must be true:
a must point at the first of two sequential int * objects;
The second of those int * objects must point at the first of three sequential int objects.
The simplest way to arrange this is:
int x[3];
int *y[2] = { 0, x };
int **a = y;
Original answer:
If the expression a[1][2] is valid, then there are many distinct possibilities for the type of a (even neglecting qualifiers like const):
type **a; (pointer to pointer to type)
type *a[n]; (array of n pointers to type)
type (*a)[n]; (pointer to array of n type)
type a[m][n]; (array of m arrays of n type)
Precisely how the expression is evaluated depends on which of these types a actually has.
First a + 1 is calculated. If a is itself a pointer (either case 1 or case 3), then the value of a is directly loaded. If a is an array (case 2 or case 4), then the address of the first element of a is loaded (which is identical to the address of a itself).
This pointer is now offset by 1 object of the type that it points to. In case 1 and case 2, it would be offset by 1 "pointer to type" object; in case 3 and case 4, it would be offset by 1 "array of n type" object, which is the same as offsetting by n type objects.
The calculated (offset) pointer is now dereferenced. In cases 1 and 2, the result has type "pointer to type", in cases 3 and 4 the result has type "array of n type".
Next *(a + 1) + 2 is calculated. As in the first case, if *(a + 1) is a pointer, then the value is used directly (this time, cases 1 and 2). If *(a + 1) is an array (cases 3 and 4), then the address of the first element of that array is taken.
The resulting pointer (which, at this point, always has type "pointer to type") is now offset by 2 type objects. The final offset pointer is now dereferenced, and the type object is retrieved.
Let's say the definition of a looks something like this:
int a[2][2] = {{1, 2}, {3, 4}};
Here's what the storage that the symbol a names looks like:
[ 1 ][ 2 ][ 3 ][ 4 ]
In C, when you perform arithmetic on a pointer, the actual amount by which the pointer value is incremented or decremented is based on the size of the type it points to. The type contained in the first dimension of a is int[2], so when we ask C to calculate the pointer value (a + 1), it takes the location named by a and increments it by the size of int[2], which results in a pointer referring to the row that starts at the integer value [3]. When you dereference this pointer you get that int[2] row, which decays to a pointer to the 3. Adding 2 to it moves two ints forward, which is already one past the end of the row (and of the whole array), so dereferencing the result as a[1][2] is out of bounds and the behavior is undefined.
So now let's say the array contains pointers:
char const *one = "one",
*two = "two",
*three = "three",
*four = "four";
char const * a[2][2] = {{one, two}, {three, four}};
Add 1 to a and then dereference it, and you get the second row, {three, four}, which decays to a pointer to its first element, the char pointer referring to the string "three". Add 2 to this and you are once more one past the end of the row, so a[1][2] is still out of bounds; dereferencing it is undefined behavior, and only by sheer luck would you avoid a memory protection fault. A valid access such as a[1][0][2] yields the char value 'r' from "three".
