The questions are based on the following code:
struct t
{
int * arr;
};
int main()
{
struct t *a = malloc(5*sizeof(struct t));
a[2].arr = malloc(sizeof(int));//line 1
a[2].arr[1] = 3; //line 2
}
In line 2 I'm accessing the array arr using the . (dot) operator and not the -> operator. Why does this work?
When I rewrite line 2 as (a+2)->arr[1] = 3, this works. But if I write it as (a+2)->(*(arr+1)) = 3, I get the message "expected identifier before '(' token". Why is this happening?
For line 1, the dot operator works in this case, because the array access dereferences the pointer for you. *(a+2) == a[2]. These two are equivalent in both value and type.
The "->" operator expects an identifier after it; specifically, the right operand must be the name of a member of the type the left operand points to. Read the message carefully: it really is just complaining about your use of parentheses. (Example using the . operator instead: a[2].(arr) is invalid, a[2].arr is just dandy.)
Also, if we can extrapolate meaning from your code, despite its compilation errors, there is the potential for memory related run time issues as well.
-> dereferences a pointer and accesses its pointee. As you seem to know a[1] is equivalent to *(a + 1), where the dereference already takes place.
The expression (a+2)->arr[1] is equivalent to *((a+2)->arr + 1).
You allocated space for one single int for a[2].arr, then wrote to the second one. Oops.
a[2] is not a pointer. The indexing operator ([]) dereferences the pointer (a[2] is equivalent to *(a+2)).
(*(arr+1)) is an expression, not a member name, so it cannot follow ->. If you want to do it that way, you want to get the pointer (a+2)->arr + 1, then dereference it: *((a+2)->arr+1). Of course, since you've only malloced enough memory for one int, this will attempt to access unallocated memory. If you malloc(sizeof(int)*2), it should work.
This is purely theory based, but I have this code:
int i = 3, k[] = {2, 4, 6, 8, 10, 12}, *x = &i, *y = k;
double d = 1.5;
struct point_tag {
int x, y;
char *name;
} pt[] = {{200, 40, "begin"}, {300, 100, "end"}}, *pp = pt;
and these two expressions:
pt[i--].y+50 which causes undefined behavior at run time
*(*pp.name+2) which does not compile
I would just like to know why the top one can not run and why the bottom one does not compile, even though *((*pp).name+2) does.
pt[i--].y+50 which causes undefined behavior at run time
The array pt has the size of 2, because you added two initializer items. This results in valid index values as 0 and 1. i has an initial value of 3. You just access the array out of bounds.
*(*pp.name+2) which does not compile
Operator precedence favors . over *.
You declared the array pt as having 2 elements. The variable i is initialized by the value 3
int i = 3,...;
So this expression with the subscript operator
pt[i--].y+50
accesses memory beyond the array because the valid range of indices for the array is [0, 2).
As for this expression
*(*pp.name+2)
then it is the same as the expression
*(*( pp.name ) + 2 )
As the variable pp is a pointer you may not apply the dot operator.
You need to write at least
*( ( *pp ).name+2)
pt[i--].y+50 which causes undefined behavior at run time
i holds the value 3 and you have only a two-slot array. As you have not specified a length in the brackets in the definition of the array, the initializer determines the size of the array, and the initializer has only two cells. You can only access indices 0 and 1; anything else is undefined behaviour. (Index 3 is two slots past the last element of the array.)
*(*pp.name+2) which does not compile
This is normal. Field selection has higher precedence than pointer dereference, so it is interpreted as *((*(pp.name))+2), in which pp.name is invalid, as pp is a pointer, not a struct. By the way, (*pp) is a struct, so (*pp).name will compile fine.
I would just like to know why the top one can not run and why the bottom one does not compile, even though *((*pp).name+2) does.
This compiles because *pp is a struct, so the field accessor works. (*pp).name is a pointer to char, so (*pp).name + 2 (and the equivalent pp->name + 2) is also a pointer, pointing to the third character of pp->name, and *((*pp).name + 2) is the character pointed to (the one in the third position of the field name).
Finally, if you want to end up understanding pointers rather than hating them, I'd recommend that you start with simpler expressions, use the -> and [] operators (which are there to simplify pointer expressions that otherwise look very unnatural due to operator precedence), and add complexity only once you understand the basics.
Suppose I want to get the last element of an automatic array whose size is unknown. I know that I can make use of the sizeof operator to get the size of the array and get the last element accordingly.
Is using *((*(&array + 1)) - 1) safe?
Like:
char array[SOME_SIZE] = { ... };
printf("Last element = %c", *((*(&array + 1)) - 1));
int array[SOME_SIZE] = { ... };
printf("Last element = %d", *((*(&array + 1)) - 1));
etc
No, it is not.
&array is of type pointer to char[SOME_SIZE] (in the first example given). This means &array + 1 points to memory immediately past the end of array. Dereferencing that, as in *(&array + 1), gives undefined behaviour.
No need to analyse further. Once there is any part of an expression that gives undefined behaviour, the whole expression does.
I don't think it is safe.
From the standard, as @dasblinkenlight quoted in his answer (now removed), there is also something I would like to add:
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
So, as it says, we should not write *(&array + 1): the pointer is one past the last element of the array, so * should not be applied to it.
It is also well known that dereferencing a pointer to a memory location you do not own leads to undefined behaviour.
I believe it's undefined behavior for the reasons Peter mentions in his answer.
There is a huge debate going on about *(&array + 1). On the one hand, dereferencing &array + 1 seems to be legal because it's only changing the type from T (*)[] back to T [], but on the other hand, it's still a pointer to uninitialized, unused and unallocated memory.
My answer relies on the following:
C99 6.5.6.7 (Semantics of additive operators)
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Since &array is not a pointer to an object that is an element of an array, then according to this, it means that the code is equivalent to:
char array_equiv[1][SOME_SIZE] = { ... };
/* ... */
printf("Last element = %c", *((*(&array_equiv[0] + 1)) - 1));
That is, &array is a pointer to an array of SOME_SIZE chars, so it behaves the same as a pointer to the first element of an array of length 1 where each element is an array of SOME_SIZE chars.
Now, that together with the clause that follows (already mentioned in other answers; this exact excerpt is blatantly stolen from ameyCU's answer):
C99 Section 6.5.6.8 -
[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Makes it pretty clear that it is UB: it's equivalent to dereferencing a pointer that points one past the last element of array_equiv.
Yes, in real world, it probably works, as in reality the original code doesn't really dereference a memory location, it's mostly a type conversion from T (*)[] to T [], but I'm pretty sure that from a strict standard-compliance point of view, it is undefined behavior.
It is probably safe, but there are some caveats.
Suppose we have
T array[LEN];
Then &array is of type T(*)[LEN].
Next, &array + 1 is again of type T(*)[LEN], pointing just past the end of the original array.
Next, *(&array + 1) is of type T[LEN], which may be implicitly converted to T*, still pointing just past the end of the original array. (So we did NOT dereference an invalid memory location: the * operator is not evaluated).
Next, *(&array + 1) - 1 is of type T*, pointing at the last array location.
Finally, we dereference this (which is legitimate if the array length is not zero): *(*(&array + 1) - 1) gives the last array element, a value of type T.
Note that the only time we actually dereference a pointer is in this last step.
Now, the potential caveats.
First, *(&array + 1) formally appears like an attempt to dereference a pointer that points to an invalid memory location. But it really isn't. That's the nature of array pointers: this formal dereference only changes the type of the pointer, and does not actually result in an attempt to retrieve a value from the referenced location. That is, array is of type T[LEN] but it may be implicitly converted to type T*, pointing to the first element of the array; &array is a pointer to type T[LEN], pointing at the beginning of the array; *(&array+1) is again of type T[LEN], which may be implicitly converted to type T*. At no point is a pointer actually dereferenced.
Second, &array + 1 may in fact be an invalid address, but it really isn't: My C++11 reference manual tells me explicitly that "Taking a pointer to the element one beyond the end of an array is guaranteed to work", and a similar statement is also made in K&R, so I believe it has always been standard behavior.
Finally, in case of a zero-length array, the expression dereferences the memory location just before the array, which may be unallocated/invalid. But this issue would also arise if one used a more conventional approach using sizeof() without testing for nonzero length first.
In short, I do not believe there is anything undefined or implementation-dependent about this expression's behavior.
IMHO that might work but is probably unwise. You should carefully review your software design and ask yourself why you want the last entry of the array. Is the content of the array completely unknown to you, or is it possible to define the structure in terms of C structs and unions? If that is the case, stay away from complex pointer operations on, for example, a char array, and define the data properly in your C code, in structs and unions wherever possible.
So instead of :
printf("Last element = %c", *((*(&array + 1)) - 1));
It could be :
printf("Checksum = %c", myStruct.MyUnion.Checksum);
This clarifies your code. The last element of your array means nothing to a person not familiar with what's in the array. myStruct.MyUnion.Checksum makes sense to anyone. Studying the myStruct structure could explain the whole data structure to anyone. Please use something like that if the data can be declared in such a way. If you are in the rare situation where you cannot, study the other answers; they make good sense, I think.
a)
If both the pointer operand and the result [of P + N] point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
[...]
if the expression P points either to an element of an array
object or one past the last element of an array object, and the
expression Q points to the last element of the same array object, the
expression ((Q)+1)−(P) has the same value as ((Q)−(P))+1 and as
−((P)−((Q)+1)), and has the value zero if the expression P points one
past the last element of the array object, even though the expression
(Q)+1 does not point to an element of the array object.
This states that computations using a pointer one past the last element are actually completely fine. As some people here have written that using non-existent objects for computations is already illegal, I thought I would include that part.
Then we need to take care about this part:
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
There is one important part that the other answers omitted and that is:
If the pointer operand points to an element of an array object
This is not the fact. The pointer operand we dereference is not a pointer to an element of an array object, it is a pointer to a pointer. So this whole clause is completely irrelevant. But, there is also stated:
For the purposes of these [additive] operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
What does this mean?
It means our pointer to a pointer is actually again a pointer to an array - of length[1]. And now we can close the loop, because as the first paragraph states, we are allowed to make calculations with one past the array, so we are allowed to make calculations with the array as if it would be an array of length[2]!
In a more graphical way:
ptr -> (ptr to int[10])[0] -> int[10]
-> (ptr to int[10])[1]
So, we are allowed to make calculations with (ptr to int[10])[1], even though it is technically outside the array of length[1].
b)
The steps that happen are:
array: ptr of type int[SOME_SIZE] to the first element of array
&array: ptr to a ptr of type int[SOME_SIZE] to the first element of array
+ 1: ptr, one more than the ptr of type int[SOME_SIZE] to the first element of array, to a ptr of type int
    This is NOT yet a pointer to int[SOME_SIZE+1], according to C99 Section 6.5.6.8. This is NOT yet ptr + SOME_SIZE + 1
*: We dereference the pointer to the pointer. NOW, after the dereferencing, we have a pointer according to C99 Section 6.5.6.8, which is past the last element of the array and which is not allowed to be dereferenced. This pointer is allowed to exist and we are allowed to use operators on it, except the unary * operator. But we don't use that one on that pointer yet.
- 1: Now we subtract one from the ptr of type int to one after the last element of the array, letting ptr point to the last element of the array.
*: Dereferencing a ptr to int to the last element of the array, which is legal.
c)
And last, but not least:
If it would be illegal, then the offsetof macro would be illegal, too, which is defined as:
((size_t)(&((st *)0)->m))
I'm new to C programming and currently learning about arrays and strings. I'm quite confused about this topic. Coming to my question:
Given an array (for example a[]={20,44,4,8}), its name in an expression decays into a pointer constant, so whenever I try to do pointer arithmetic such as a=a+1 the compiler shows an error. But when I write the same thing in a printf() call, it shows the address of the first element rather than an error. Why?
In an expression such as *(a+1)=2, first (a+1) is evaluated and then * dereferences it. My question is: if a is a pointer constant, how can it point to any other memory location in the array, and how is this expression perfectly legal?
I tried to search about this but couldn't get the accurate result.
Although an array name evaluates to a pointer in some expressions, your a = a+1 assignment tries to assign to an array, which is not allowed.
On the other hand, a+1 expression is allowed, and it evaluates to another pointer. When you pass this value to printf, the function happily prints it. Do not forget to cast the result to void* when you print:
printf("%p\n", (void*)(a+1));
if a is a pointer constant then how it can point to any other memory location in an array and how is *(a+1) expression perfectly legal?
For the same reason that 2+3, a combination of two constants, produces a value that is neither a 2 nor a 3. In your example, a+1 expression does not modify a. Instead, the expression uses it as a "starting point", computes a different value (which happens to be of type pointer), and leaves a unchanged.
The name of the array a is not quite the same as a pointer constant. It merely
acts like a pointer constant in some circumstances. In other circumstances it will
act quite differently; for example, sizeof(a) may have a much larger value
than sizeof(b) where b is truly a pointer.
This code is legal:
int a[] = {20,44,4,8};
int *b;
b = a;
b = b + 1;
because a is enough like a pointer that you can set b to point to the same
address but, unlike a, b really is a pointer and it can be modified.
The last line of code could just as well be:
b = a + 1;
because the right-hand side here is not trying to modify a; it is merely using
the address of the first element of a to compute a new address.
The expression *(a + 1) is effectively another way of writing a[1].
You know what will happen when you write a[1] = 2, right?
It will change what is stored in the second element of a.
(The first element is always a[0] whether you do anything with it or not.)
Storing a new value in a[1] doesn't change the location of the array a.
When an array decays into a pointer, the resulting value is an rvalue: a value that cannot be assigned to.
So int[4] will become int*const, a constant pointer to int.
Q1:
Types in expression a = a + 1 are:
int[4] = int[4] + int
If we focus on addition first, array decays to pointer:
int[4] = int*const + int
int[4] = int*const // After addition
But now there is a problem:
int*const = int*const
In memory, a is an array of 4 ints and nothing more. There is no place where you could possibly store an address of type int*. The compiler will show an error.
Q2:
Types in expression *(a+1)=2 are:
*(int[4] + int) = int
Again, array decays to pointer and addition happens:
*(int*const + int) = int
*(int*const) = int // int* is now equal to &a[1]
Dereferencing int*const is legal. While pointer is constant, value it points to is not:
int = int // Ok, equal types
Types are now perfectly compatible.
This is my scenario.
struct X {
char A[10];
int foo;
};
struct X *x;
char B[10]; // some content in it.
x = malloc(sizeof(struct X));
To copy contents from B to A, why is the following syntax correct:
memcpy(x->A, B, sizeof(x->A));
Here x->A is treated as a pointer, so we don't need &. But then we should need sizeof(*x->A) right? But with * it is not working, while without * it's working fine.
Is it like sizeof operator does not treat A like a pointer?
A is NOT a pointer, it's an array. So sizeof(x->A) is the correct syntax, it's the size of the whole array, i.e, 10.
It's true that in many situations, an array name is converted to a pointer to the first element. But sizeof(arrayname) is NOT one of them.
sizeof(*x->A) gives you the size of a char (1 byte), while sizeof(x->A) gives you the size of the entire array (10 bytes).
sizeof(*x->A) is equivalent to sizeof(x->A[0]).
sizeof(*x->A) is 1 byte here, so memcpy would copy only one byte.
Thus sizeof(x->A) is the correct procedure.
Though in many cases array name decay to a pointer (like the first argument to memcpy() in your example), there are a few that don't and sizeof operator argument is one of them. Other examples are unary & operator argument, etc. C++ has more scenarios (e.g. initializer for array reference).
Just to add on to previous comments
sizeof(x->A) is correct; sizeof(*x->A) is not, because -> has higher precedence than *, so first the address of A is obtained (x->A), then * dereferences it to the first byte (one char).
Note that sizeof, unlike strlen, does count the '\0' character: sizeof(x->A) is 10 no matter what string the array holds, whereas strlen("Hello") is 5 and the string needs 6 bytes including '\0'.
So as long as B holds a terminated string, copying sizeof(x->A) bytes copies the terminator as well, and no extra byte is needed. Adding one to the count would write past the end of A.
memcpy(x->A, B, sizeof(x->A));
In the following example, I expect that foo((&i)++) will evaluate to foo(4 + address of (i)), assuming that the size of int is 4 bytes; however, it gives a compilation error at this line.
Does anyone have an explanation?
void foo(int*);
int main()
{
int i = 10;
foo((&i)++);
}
void foo(int *p)
{
printf("%d\n", *p);
}
The error message is "lvalue required for increment operator". The problem is that ++ needs to operate on a variable - you increment, AND STORE THE RESULT.
You cannot store the result of the increment operation in (&i).
To get foo to operate on an integer that is stored at the address you appear to want, you can do one of the following (I'm sure you can think of others):
foo(&i);
int *p = &i; foo(p++);
The second option will correctly call foo with a pointer to i, but will increment that pointer for the next time, which seems to be what you were trying to do with your code, except you had nowhere to put that value. By declaring a separate pointer p, I created that storage space. But realize that p now points one past i, where there is no int object: if you dereference it, you will get undefined behavior.
If you wanted to point to the next location after the address of i you would have to do
foo(++p);
but that would be undefined behavior (since there is no way of knowing what is stored in the next location after i; most likely it will be p but that is not guaranteed.)
Pointers. Powerful, dangerous, and slightly mysterious.
The operand of ++ must be an lvalue -- a variable or other location in memory that can be modified -- since it adds 1 to it and replaces it with the result. &i is not an lvalue, it's just an expression that yields the address of i.
Also,
foo(4 + address of (i))
is wrong since you're using postfix ++ rather than prefix ++. The value of EXPR++ is EXPR (with the side effect of changing the variable that expr refers to).
To get the value you want, just use
foo(&i + 1)
Note, however, that this is likely to result in undefined behavior ... depending on just what foo does with its argument.
You're incrementing the address of i and trying to store the result back. How and where does the result get stored back? It can't be, because &i is just a value, not an object stored in memory.
I think you want to do this instead:
foo((&i)+1);