I just read a question about pointers. Here is the code:
int a[5]={1, 2, 3, 4, 5};
int *p = (int*)(&a + 1);//second line
cout<<(*p)<<endl;
My compiler outputs 0. What is *p? Is p a pointer to the array a? And what does &a+1 mean?
This is what your declaration statement means:
                           p
                           |
                           v
a[0] a[1] a[2] a[3] a[4] | a[5]
---------(Length)------->
But you're trying to get this (I assume):
     p
     |
     v
a[0] a[1] a[2] a[3] a[4] |
---------(Length)------->
You need to remove the second set of parentheses from your declaration statement to get a[1]:
int * p = (int *)(&a + 1);
// change to this:
int * p = (int *) &a + 1;
The reason you're getting the wrong value has to do with the sizeof operator, operator precedence, and pointer arithmetic. Let me explain those before I explain the error.
Sizeof Operator
The sizeof operator evaluates the size (in bytes) of a datatype. Immediate example:
sizeof (char) // always returns 1
sizeof (int) // usually returns 4
sizeof (int *) // usually returns 4 or 8
And note this interesting example:
int a[5];
sizeof (a) // returns sizeof (int) * 5
Operator Precedence
Operator precedence is the order in which a series of operators are evaluated. Anything in parentheses will be evaluated first. After the parentheses are evaluated, it's up to the operator precedence to determine in what order the expression will be solved.
Here are the relevant operators and their order in the precedence table (the cast and unary & share one precedence level; binary + and - rank below them):
Operator   Description              Associativity
()         Parentheses (grouping)   Left-to-right
(cast)     Cast                     Right-to-left
&          Address of               Right-to-left
+, -       Plus, Minus (binary)     Left-to-right
Pointer Arithmetic
Pointer arithmetic is mostly about adding integers to pointers and subtracting integers (or other pointers) from them; pointers cannot be multiplied. What you need to know (for the scope of this question) is this:
int * p;
p = p + 1;
Even though it says + 1, the address actually moves forward by sizeof (int) bytes.
This was a design choice by the designers of C because it is much more common that programmers want to move to the next integer in an array than to add 1 byte (which would leave the pointer in between the first and second element in the array).
More generally put:
datatype *p;
p = p + 1;
// actually moves the address forward by
// sizeof (datatype) bytes
Here's a relevant example:
int a[5];
int (*pa)[5] = &a; // pointer to the whole array
pa = pa + 1;
// actually moves the address forward by sizeof (a) bytes
// remember that sizeof (a) is going to be sizeof (int) * 5?
Declaration Statement
Back to your declaration statement:
int * p = (int *)(&a + 1);
You may already see what's wrong with it: &a + 1 really moves the address forward by sizeof (a) bytes, which brings you just past the end of the array. Reading that location is undefined behavior: the value may be 0 (often it is) or it may be some other random value.
What you might not have noticed is that this:
int * p = (int *) &a + 1;
Actually gives you the second element in the array. This has to do with operator precedence. The cast and the unary & operator have the same precedence (they associate right-to-left), and both bind more tightly than the binary + operator. So &a is taken first, the result is cast from int (*)[5] to int *, and only then is 1 added. The expression then becomes equivalent to this:
int * a;
a = a + 1; /* and this would
give you the second element
in the array */
So simply put, if you want to access the second element in your array, change:
int * p = (int *)(&a + 1);
// to this:
int * p = (int *) &a + 1;
&a is the address of the array.
&a + 1 is also an address, but what does the + 1 do? It yields a pointer that points sizeof a bytes past the start of a, which is the same address as &a[5]. You are then casting that pointer to int *.
Now why 0? Because a[5] happens to be 0 - Note that it's out of bounds and could be anything else (Undefined behavior).
Note that arrays are zero-based, meaning that the indexes are from 0. So actually a[4] is the last element, a[5] is out of bounds.
The operator & is used to take the address of a variable. Thus, &a is of type pointer to array of 5 ints: int (*)[5].
Pointer arithmetic means that when you have a pointer p, then p+1 will point to the next element, which is sizeof(*p) bytes away. This means that &a+1 points 5*sizeof(int) bytes away, namely, to the location just after the last element in the array.
Casting &a+1 to int * means that now you want this new pointer to be interpreted as a pointer to int instead of pointer to array of 5 ints. You're saying that, from now on, this pointer references something that is sizeof(int) bytes long, so if you increment it, it will move forward sizeof(int) units.
Therefore, *p is accessing a[5], which is an out of bounds position (one beyond the last), so the program has undefined behavior. It printed 0, but it could have crashed or printed something else. Anything can happen when undefined behavior occurs.
int *p = (int*)(&a+1);
what is *p? :
In your code, p is a pointer to an unknown element of type int, and *p is that element.
Let's reason it out:
&a is a valid pointer expression. An integer can be added to a pointer expression.
Result of &a : points to the whole array a[5]. Its type is int (*)[5]
Result of sizeof *(&a) is 5 * sizeof(int), which is the size of the whole array in bytes.
Result of &a +1 is a pointer expression which holds the address just past the array that &a points to.
Hence, p points one object past the array, and *p refers to an unknown element of type int. Accessing that unknown element is undefined behaviour, which also explains why your output happens to be zero.
Side Note:
If you really wanted to access the second element in the array, you should add 1 to a pointer which points to the first element of the array. In an expression, a is converted to a pointer to the first element of the array (type int *), so adding 1 advances it by sizeof(int) bytes.
Result of a+1 is the address of the second element in the array.
Result of *(a+1) is the value of the second element in the array, which is 2.
Related
I am currently trying to understand pointers in C but I am having a hard time understanding this code:
int a[10];
int *p = a+9;
while ( p > a )
*p-- = (int)(p-a);
I understand the code to some degree. I can see that an array with 10 integer elements is created then a pointer variable to type int is declared. (But I don't understand what a+9 means: does this change the value of the array?).
It would be very helpful if someone could explain this step by step, since I am new to pointers in C.
When used in an expression [1], the name of an array in C 'decays' to a pointer to its first element. Thus, in the expression a + 9, the a is equivalent to an int* variable that has the value of &a[0].
Also, pointer arithmetic works in units of the pointed-to type; so, adding 9 to &a[0] means that you get the address of a[9], the last element of the array. So, overall, the p = a + 9 expression assigns the address of the array's last element to the p pointer (but it does not change anything in that array).
The subsequent while loop, however, does change the values of the array's elements, setting each to the value of its position (the result of the p - a expression) and decrementing the address in p by the size of an int. (Well, that's what it's probably intended to do; but, as mentioned in the comments, the use of such "unsequenced operations", i.e. the use of p-- and p - a in the same statement, is actually undefined behaviour because, in this case, the C Standard does not dictate which of those two expressions should be evaluated first.)
To avoid that undefined behaviour, the code should be written to use an explicit intermediate, like this:
int main()
{
int a[10];
int* p = a + 9;
while (p > a) {
int n = (int)(p - a); // Get the value FIRST ...
*p-- = n; // ... only THEN assign it
}
return 0;
}
[1] There are two exceptions: when that array name is used as the operand of a sizeof operator or of the unary & (address of) operator.
int a[10];
This declares an array on e.g. the stack. a represents the starting address of the array. The declaration tells the compiler that a will hold 10 integers. C assumes you know what you are doing so it is up to you to keep yourself in that range.
int *p = a+9;
p is declared as a pointer, which holds an address (think of a real-life street address). When you add an offset to a, the offset is added to the address of a. The compiler scales the offset by the element size, so + 5 becomes + 5*sizeof(int) bytes and you don't need to think about that. Your pointer p is now pointing inside the array at offset 9, which is the last int in the array a, since indexing starts at 0 in C.
while( p > a )
The condition says: keep doing this while the address p points to is greater than the address where a starts.
*p-- = (int)(p-a);
Here the value that p points to is overwritten with the result of a crude (1) subtraction of the starting address a from the current p, and then the pointer p is decremented.
(1) Undefined Behavior
I understand that when we use sizeof operator on an array name, it gives the total size of the array in bytes. For example
int main(int argc, const char * argv[]) {
int a[][5] = {
{1,2,3,4,5},
{10,20,30,40,50},
{100,200,300,400,500}
};
int n=sizeof(a);
printf("%d\n",n);
}
It gives 60 as output for 15 elements of the array. But when I write
int n=sizeof(*a);
It gives 20 as the output, that is, the size of the first row, although I thought *a is the base address of the 0th element of the 0th row and its type is a pointer to an integer, while a points to the first row itself. Why is this happening?
*a is row 0 of a, and that row is an array of five int.
In most expressions, an array is automatically converted to a pointer to its first element. Thus, when you use *a in a statement such as int *x = *a;, *a is converted to a pointer to its first element. That results in a pointer to int, which may be assigned to x.
However, when an array is the operand of a sizeof operator, a unary & operator, or an _Alignof operator, it is not converted to a pointer to its first element. Also, an array that is a string literal being used to initialize an array is not converted to a pointer (so, in char foo[] = "abc";, "abc" is used as an array to initialize foo; it is not converted to a pointer).
*a is not a pointer, it's an int[5], which is consistent with your reading of 20, assuming a 4-byte int.
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
The expression a has type "3-element array of 5-element array of int"; thus, sizeof a should yield 3 * 5 * sizeof (int).
The expression *a is the same as the expression a[0] (a[i] is defined as *(a + i) - *a is the same as *(a + 0), which is the same as a[0]). Both *a and a[0] have type "5-element array of int"; thus sizeof *a and sizeof a[0] should both yield 5 * sizeof (int).
However...
If you pass a to a function, such as
foo( a );
then a is not the operand of the sizeof or unary & operators, and the expression will be converted from type "3-element array of 5-element array of int" to "pointer to 5-element array of int":
void foo( int (*a)[5] ) { ... }
If you computed sizeof a in function foo, you would not get 3 * 5 * sizeof (int); you would get sizeof (int (*)[5]), which, depending on the platform, is typically 4 or 8 bytes.
Similarly, if you passed *a or a[i] to a function, what the function actually receives is a pointer to int, not an array of int, and sizeof will reflect that.
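In this 2D array, *a is row 0, an array of 5 ints. When you pass it to printf it is converted to a pointer to its first element, so what gets printed is an address (the address of the 1st column of row 0):
printf("%p\n", (void *)*a);
Output: an implementation-specific address (9435248 when it was printed as a decimal number)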
So :
for(int i = 0;i < 3;i++)
printf("%d\n", *a[i]);
The output is :
1
10
100
When you write *a[i], it means *(a[i]), i.e. row i and the 1st column (valid rows here are 0 to 2).
*a is row 0, which has 5 columns, so when you try this:
sizeof(*a);
the output will be 20 => (5 columns) * (sizeof(int), which is 4 bytes here).
A 2D array is viewed as an array of 1D arrays. That is, each row in a 2D array is a 1D array. For a given 2D array A, int A[m][n]
you can think of
A[0] as the address of row 0
A[1] as the address of row 1 etc & so on.
Dereferencing can be thought of as below,
A[i][j] = *(A[i] + j) = *(*(A+i) + j)
So when you say *A, it means A[0], which gives you the 1st row (which converts to the address of its first element) and not the 1st element of the matrix.
Dereferencing A, i.e. *A, gives row 0, or A[0].
Dereferencing A[0] gives the first entry of A, or A[0][0], that is
**A = A[0][0].
And since sizeof does not convert the row to a pointer and you have 5 elements in the 1st row, the size is 20 bytes.
sizeof yields the size of the variable in memory expressed in bytes. This includes padding (unused bytes added by a compiler to a structure to improve performance). Your array has 15 elements of size 4. The size of an int in memory is 4 in your case. You can easily verify this by running:
printf("sizeof an integer: %zu\n", sizeof(int));
It is often a good idea to use fixed-width integer types when the size matters.
#include <stdint.h>
uint32_t a[][5] = {
{1,2,3,4,5},
{10,20,30,40,50},
{100,200,300,400,500}
};
On platforms with a 4-byte int this uses the same amount of memory, but it clearly shows the size of the (32-bit) integer.
In your case uint16_t (an unsigned 16-bit integer) might be more appropriate, since the largest value is 500. To get the number of elements you should divide the total size of the array by the size of one element.
sizeof(a)/sizeof(a[0][0])
You can also use a macro to do this:
#define ARRAY_LENGTH(array) (sizeof(array)/sizeof((array)[0]))
/* ARRAY_LENGTH(a) yields 3 (the number of rows); ARRAY_LENGTH(a) * ARRAY_LENGTH(a[0]) yields 15. */
You can also look for an answer here: How can I find the number of elements in an array?
Is the following well defined, for different values of REF?
#include <stdio.h>
#include <string.h>
#define REF 1
#define S 1
int main(void) {
int a[2][S] = {{1},{2}};
int *q = REF ? a[1] : 0;
int *p = a[0] + S;
memcpy (&q, &p, sizeof q);
printf ("q[0] = %d\n", q[0]);
return 0;
}
Note that p points one past the last element of a[0], not to an element in the array a[0], hence it is not dereferenceable. But the address stored in p is the address of a[1][0]. p semantically (intentionally?) points "to" (well, out of) a[0] but physically points into a[1].
Can a copy of the bit pattern of a pointer point semantically to an object when the original only physically does?
SEE ALSO
I have asked essentially the same C/C++ question with a different "angle":
Are pointer variables just integers with some operators or are they "mystical"?
Is memcpy of a pointer the same as assignment?
Overwriting an object with an object of same type (C++ only)
Given
int blah(int x, int y)
{
int a[2][5];
a[1][0] = x;
a[0][y] = 9;
return a[1][0];
}
nothing in the Standard would forbid a compiler from recoding that as int blah(int x, int y) { return x; }, nor trapping (or doing anything whatsoever) when y>=5, since a[0] and a[1] are distinct arrays of five elements each. In cases where the last element of an indirectly-accessed structure is a single-element array, compilers have generally included code to allow pointer arithmetic on that array to yield a pointer to storage outside the structure. While such pointer arithmetic would be forbidden by the Standard, it enables useful constructs which could not be practically implemented in any standard-compliant fashion prior to C99.
Note that adding 5 to a[0] would yield an int* that compares equal to a[1], but the fact that a "one-past" pointer compares equal to a pointer which identifies the next object in memory does not imply that it may be safely used to access the latter object. Such accesses may often work, but that doesn't mean compilers are required to have them do so.
Can a copy of the bit pattern of a pointer point semantically to an
object when the original only physically does?
There is effectively no such distinction, because a[0] + S is the same as a[1], assuming that the inner array is declared with size S.
The following:
int a[2][S];
declares a two-element array, where each element is an array of S elements of type int. Arrays are stored contiguously and there is no padding before, between, or after their elements.
We will prove that a[0] + S == a[1] holds. It can be rewritten as:
*(a + 0) + S == *(a + 1)
By pointer arithmetic, the RHS adds 1 * sizeof(*a) bytes to a, which is the size of the inner array. The LHS is a little more complex, as the addition is performed after dereferencing a, thus it adds:
S * sizeof(**a) bytes,
Both sides are guaranteed to be equal when they point to the same object (the same memory location), of the same type. Hence you could rewrite it into "absolute" one-byte form as:
(char *)a + S * sizeof(**a) == (char *)a + sizeof(*a)
This reduces into:
S * sizeof(**a) == sizeof(*a)
We know, that sub-array *a has S elements of type of **a (i.e. int), so both offsets are the same. Q.E.D.
If we have,
int p;
int res;
res = (char*)(&p+1) - (char*)(&p);
printf("size of p= %d",res);
So, the size of p will print 4. Which is the correct answer.
But, if I don't use (char*), for example,
res = (&p+1) - (&p);
I then get res=1 as output. So why is this (char*) cast needed to get the size of the variable p?
When I print the value of (&p+1) and (&p), the difference is 4, but when I print the difference it outputs 1.
Arithmetic with pointers always operates in multiples of the size of the element, not bytes. This is because expressions like array[2] act the same as *(array + 2). If array is of type int *, then both array[2] and *(array + 2) should refer to the same element. If array + 2 pointed two bytes after array, it would point in the middle of the first element. So instead, the compiler does something like *((int *)((char *)array + 2 * sizeof(int))).
The same thing happens for pointer subtraction, to maintain the basic properties of addition and subtraction. Given int *p = array + 2, then p points to the third element. Then, if you compute p - array, then logically, it should equal 2 (because if z = x + y, then z - x = y).
Further, because the C standard requires the size of char to be 1, casting the pointers to char * will give you the difference in bytes (technically, in chars, but the two terms are usually interchangeable).
Please help me understand the reason the following code works the way it does:
#include <stdio.h>
int main(){
int a = 10;
void *b = &a;
int *p = b;
printf("%u",*p++);
return 0;
}
I know the output of printf will be 10, but I'm not quite following why *p++ is 10
Here are my steps:
1) void *b = &a; stores the address of a in pointer b
2) int *p = b; pointer p now points to the same data item as pointer b
3) printf("%u",*p++); is where I get confused... the dereference of pointer p is a, which is 10... isn't *p++ basically the same as 10+1 which will be 11?
*p++ is essentially *(p++). The subexpression p++ evaluates to the value of p before it is incremented, which is the address of a. Dereferencing that yields the value 10.
The post-increment operator in the expression *p++ applies to the pointer, not the value stored at that location, so the result is never 11, before or after it is evaluated. The expression *p++ means: dereference p (get its value), then increment p by one location. Since p points to an int, incrementing it moves it forward sizeof(int) bytes. The addition never applies to the value that p points to, which is 10.
However, the expression (*p)++ is different. It dereferences p (gets its value) and then increments the value in that memory location. The expression evaluates to the original value. So after executing the statement
int c = (*p)++;
the variable c would equal 10, while a would equal 11.
*p++ is parsed as *(p++). p++ evaluates to p, and then increments p, so the change won't be seen until the next reference to p. So *p is 10, *p++ is 10 (but p now points to &a+1), *++p is undefined behavior (because *(&a+1) is not a valid value), (*p)++ is 10 but changes a to 11, and ++*p (or ++(*p)) is 11 (as is a).
Variable p is a pointer to an int (pointing to a)
The expression *p dereferences the pointer, hence it's like accessing the int a directly.
Operator postfix ++ on pointer p takes precedence over the dereferencing. Therefore *p++ increments the pointer p (to one past int a) AFTER the expression is evaluated, so the dereferencing still resolves to a and that's why 10 is printed. But after the statement is run the value of p is changed. So if you do printf("%u ",*p) after that statement, the behavior is undefined and you will likely get a garbage value.
If you do ++*p however, the expression is evaluated as a ++ operation on the dereferenced int variable pointed to by p. If you want to avoid trouble like this, when not sure, use parentheses:
(*p)++
++(*p)
That way you make sure you are dereferencing the pointer and acting on the value. Incrementing a pointer is easy to get wrong in languages like C and C++, so take care that it is never dereferenced outside the object it points into.
Why *p++ is 10 ?
[C11: §6.5.2.4/2]: The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it).
The below statement
printf("%u",*p++);
is equivalent to
printf("%u",*p); /* p points to 'a' and value of `a` is 10. Hence, 10 is printed */
p = p + 1;
p is of type pointer-to-int. Hence, 1 is scaled to sizeof (int).
As a result, the address in p advances by sizeof (int) bytes: p now points one past a, where there is no int that you are allowed to read.
I just want to add my five cents.
For incrementing value indirected via pointer you can use ++*ip or (*ip)++. There is a nice explanation about parentheses in K&R book:
The parentheses are necessary in this last example, (*ip)++; without them, the expression would increment ip instead of what it points to, because unary operators like * and ++ associate right to left.
And in your piece of code you got 10 because printf prints the original value of the variable; only afterwards is the pointer (not the value) incremented by one, because of the postfix ++ operator.