Difference between ++*p++ and *++p++? - c

What is the difference between ++*p++ and *++p++ (where p is a pointer) in C?
I keep getting an error when I do the second one; can somebody explain, as it's been in my head for days. I checked on Quora and other websites but I couldn't find anything useful.
I want to know why the first one is acceptable but not the latter.
#include <stdio.h>
int main()
{
int arr[]={3,9,0,4,5};
int *ptr=arr;
printf("%d ",*++ptr++);
printf("%d ",*ptr);
return 0;
}

The issue here is a question of operator precedence and the nature of the result of the postfix increment operator (i.e. p++).
In C, that postfix increment operator has the highest priority of all; then, the prefix increment and indirection (*) operator have equal priority, and have right-to-left associativity.
So, adding parentheses to your expressions, to clarify the order of evaluation, we get the following:
*++ptr++ becomes *( ++(ptr++) )
++*ptr++ becomes ++( *(ptr++) )
Now remember that the result of the postfix operation is a so-called "rvalue" §: that is, something that can be used on the right-hand side of an assignment but not on the left-hand side. (For example, the constant 3 is an rvalue: x = 3 is a valid operation but 3 = x is clearly not.)
We can now see that, in your first expression, inside the outer brackets that I have added, we are trying to increment the result of the ptr++ operation, and that is not allowed. However, in the second case, we are only dereferencing that result (which is a pointer) and then (outside the outer brackets) incrementing the pointed-to variable, which is allowed.
When I compile your code with clang-cl, the error is:
error : expression is not assignable
As (hopefully) explained above, the "expression" referred to is ptr++.
§ Formally, the result of the postfix increment operator is (a copy of) the value of its operand; that value is not modifiable or assignable.

This is definitely a strange and surprising result. To understand what's going on, it will help to take a closer look at what the "autoincrement" operators ++x and x++ really do.
Most expressions simply compute new values. If I say
a = b * 3;
that means, "take the value of the variable b, multiply it by 3, and that's the value we'll assign to a". Similarly, if I say
a = a + 1;
that means, "take the (old) value of the variable a, add 1 to it, and that's the new value we'll assign back to a".
But ++ is special, because it has the "assign the value back" part built in. Any time you use ++ (or --), two things are happening: we're computing a new value, but we're also modifying the variable whose value we just fetched.
To make this very clear, if I say
a = ++b;
that means, "take the value of the variable b, add 1 to it, assign that new value back to b, and that's the value we'll assign to a". That's for the "prefix" form ++b. For b++, it's a little different:
a = b++;
That means, "take the value of the variable b, add 1 to it, assign that new value back to b, but the value we'll assign to a is the old value of b, before we added 1 to it." In other words, the value of the subexpression b++, the value that "pops out" to participate in the larger expression, is the old value of b.
The other thing to keep in mind here is that when it comes to assigning values, we obviously need a variable to assign the value to. We can't say
3 = b * 3; /* WRONG */
On the right-hand side of the = sign, we fetch b's value and multiply it by 3, but then where do we store the new value? On the left-hand side of the = sign, 3 is not the name of a variable, nor is it any kind of a location where we can store a value. So an assignment like this is illegal.
(Formally, what we've been talking about here is the difference between an rvalue and an lvalue. Those are interesting and useful terms that you might want to learn about some day, perhaps even today, but I'm not going to say anything more about them for now.)
But now we have almost enough information to answer your original question. Let's look at the expression that worked:
++*ptr++
What the heck does that mean?
In one sense, it's kind of meaningless, because it's not something that you would probably ever write in a real program. It has very little practical value, so it doesn't matter much that, at first glance, it's pretty cryptic and not obvious what it should do.
To understand what it does, we have to be clear about the precedence. Which operators bind more tightly to their operands? Precedence is what tells us that if we write
1 + 2 * 3
the multiplication operator * binds more tightly, meaning that the expression is evaluated as if we had written
1 + (2 * 3)
Now, it happens that the postfix autoincrement operator ++ binds more tightly than the unary contents-of operator *. That is, when we write
++*ptr++
the expression is evaluated as if we had written
++ *(ptr++)
So the first thing we're going to do, inside the parentheses, is ptr++. This means, as we saw before, "take the value of the variable ptr, add 1 to it, assign that new value back to ptr, but the value that pops out to the larger expression is the old value of ptr, before we added 1 to it."
And then the next thing that happens in the "larger expression" is the * or contents-of operator. * works on a pointer, and accesses the object pointed to by the pointer. In your original program, the object pointed to by the pointer ptr was the first cell of the array arr, that is, arr[0]. So * is going to operate on whatever the old value of ptr was, whatever ptr used to point to. And that's arr[0]. So what we end up doing is the equivalent of
++(arr[0])
We're going to take arr[0]'s old value, add 1 to it, store it back in arr[0], and (since this is prefix ++ we're talking about), the value that will "pop out" to the larger expression would be the new value of arr[0]. So, if you had written
printf("%d\n", ++*ptr++);
it would have printed the new value of arr[0], or 4.
The bottom line is that although ++*ptr++ is a complicated-looking expression that's hard to understand and does something so obscure that it might not even be useful, it does do something, and is legal.
So now, finally, it's time to look at
*++ptr++
What does that do?
The first thing we have to know is whether prefix ++ or postfix ++ binds more tightly. It's a question that hardly ever comes up (and we're about to see why), but the answer is that postfix ++ binds more tightly. So this expression is interpreted as if you had written
* ++(ptr++)
So, once again, the first thing we're going to do is take ptr's value, add 1 to it, store that new value back into ptr, and then the value that's going to "pop out" to the larger expression is going to be the old value — but only the old value — of ptr.
Let me say that again. The value that "pops out" to the larger expression is just the old value of ptr. By that time we no longer know or care that it was the variable ptr that we got this value from.
So then we come to the prefix ++. And now we have a serious problem. Remember, ++ wants to fetch a value from an object, add 1 to it, and store the new value back into an object. But at this point we don't have an object to fetch from or store to, we just have a value — remember, the old value of the variable ptr.
This will be easier to understand if we think about integer variables, instead of pointer-to-integer. Suppose I said
int a;
int b = 5;
a = ++(b++);
So we fetch b's value, which is 5, and add 1 to it, and store the new value (which is 6) back in b, and the value that "pops out" to the larger expression is the old value of b. So now it's as if we had written
a = ++5; /* WRONG */
And this makes no sense. We can't "fetch the old value from the variable 5", because 5 isn't a variable. It's just as wrong as when we said 3 = b * 3;, and for the same reason.
You might also be interested in Question 4.3 in the C FAQ list.

Related

Confusion over pointer index operator

I am a little bit confused about pointer index operator in C. I will try to explain my question with an example:
int array[5] = {1,2,3,4,5};
int *p;
p = array;
p[2]++;
In the fourth line, I know that it increments the second index of array. However, when I see an index operator, I convert it.
For instance, I converted p[2]++ to *(p+2)++. According to the operator precedence table, in the statement of *(p+2)++, the increment and dereferencing operators have the same precedence, but increment takes precedence due to right associativity. Therefore, it becomes *(p+3). Then, this statement cannot change any value and just points third index of array.
Why does p[2]++ increment the second index of the array? What is wrong in my perspective?
p[2]++ is equivalent to (*(p+2))++, not *(p+2)++. You need an extra set of parentheses to maintain the precedence from the original expression.
Without them you've got *(p+2)++ which, as you've noted, is equivalent to *((p+2)++). This has a different meaning from the original expression since it splits up the +2 and the *. They need to be done in the same step since [2] is a single atomic operation.
As already commented p[2]++ can be converted to (*(p+2))++, because p[2] is the element you want to increment.
Note that incrementing the index, as opposed to the element, is usually written as p[i++].
In math, if a=b+c, then a*d is not b+c*d but (b+c)*d. Likewise, p[2] should be replaced not by *(p+2) but by (*(p+2)), to avoid any change in precedence.

Why is the printf statement in the code below printing a value rather than a garbage value?

#include <stdio.h>
int main(){
int array[] = {10,20,30,40,50} ;
printf("%d\n",-2[array -2]);
return 0 ;
}
Can anyone explain how -2[array-2] is working and Why are [ ] used here?
This was a question in my assignment it gives the output " -10 " but I don't understand why?
Technically speaking, this invokes undefined behaviour. Quoting C11, chapter §6.5.6
If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. [....]
So, (array-2) is undefined behavior.
However, on most compilers the +2 and -2 will simply cancel out [2[a] is the same as a[2], which is the same as *(a+2); thus 2[a-2] is *((2)+(a-2))], leaving only the remaining expression to be evaluated, which is *(a), or a[0].
Then, check the operator precedence
-2[array -2] is effectively the same as -(array[0]). So, the result is the value of array[0], negated.
This is an unfortunate example for instruction, because it implies it's okay to do some incorrect things that often work in practice.
The technically correct answer is that the program has Undefined Behavior, so any result is possible, including printing -10, printing a different number, printing something different or nothing at all, failing to run, crashing, and/or doing something entirely unrelated.
The undefined behavior comes up from evaluating the subexpression array -2. array decays from its array type to a pointer to the first element. array -2 would point at the element which comes two positions before that, but there is no such element (and it's not the "one-past-the-end" special rule), so evaluating that is a problem no matter what context it appears in.
(C11 6.5.6/8 says)
When an expression that has integer type is added to or subtracted from a pointer, .... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Now the technically incorrect answer the instructor is probably looking for is what actually happens on most implementations:
Even though array -2 is outside the actual array, it evaluates to some address which is 2*sizeof(int) bytes before the address where the array's data starts. It's invalid to dereference that address since we don't know that there actually is any int there, but we're not going to.
Looking at the larger expression -2[array -2], the [] operator has higher precedence than the unary - operator, so it means -(2[array -2]) and not (-2)[array -2]. A[B] is defined to mean the same as *((A)+(B)). It's customary to have A be a pointer value and B be an integer value, but it's also legal to use them reversed like we're doing here. So these are equivalent:
-2[array -2]
-(2[array -2])
-(*(2 + (array - 2)))
-(*(array))
The last step acts like we would expect: Adding two to the address value of array - 2 is 2*sizeof(int) bytes after that value, which gets us back to the address of the first array element. So *(array) dereferences that address, giving 10, and -(*(array)) negates that value, giving -10. The program prints -10.
You should never count on things like this, even if you observe it "works" on your system and compiler. Since the language guarantees nothing about what will happen, the code might not work if you make slight changes which seem they shouldn't be related, or on a different system, a different compiler, a different version of the same compiler, or using the same system and compiler on a different day.
Here is how -2[array-2] is evaluated:
First, note that -2[array-2] is parsed as - (2[array-2]). The subscript operator, [...] has higher precedence than the unary - operator. We often think of constants like -2 as single numbers, but it is in fact a - operator applied to a 2.
In array-2, array is automatically converted to a pointer to its first element, so it points to array[0].
Then array-2 attempts to calculate a pointer to two elements before the first element of the array. The resulting behavior is not defined by the C standard because C 2018 6.5.6 8 says that only arithmetic that points to array members and the end of the array is defined.
For illustration only, suppose we are using a C implementation that extends the C standard by defining pointers to use a flat address space and permit arbitrary pointer arithmetic. Then array-2 points two elements before the array.
Then 2[array-2] uses the fact that the C standard defines E1[E2] to be *((E1)+(E2)). That is, the subscript operator is implemented by adding the two things and applying *. Thus, it does not matter which expression is E1 and which is E2. E1+E2 is the same as E2+E1. So 2[array-2] is *(2 + (array-2)). Adding 2 moves the pointer from two elements before the array back to the start of the array. Then applying * produces the element at that location, which is 10.
Finally, applying - gives −10. (Recall that this conclusion is only achieved using our supposition that the C implementation supports a flat address space. You cannot use this in general C code.)
This code invokes undefined behavior and can print anything, including -10.
C17 6.5.2.1 Array subscripting states:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))
Meaning array[n] is equivalent to *((array) + (n)) and that's how the compiler evaluates subscripting. This allows us to write silly obfuscation like n[array] as 100% equivalent to array[n]. Because *((n) + (array)) is equivalent to *((array) + (n)). As explained here:
With arrays, why is it the case that a[5] == 5[a]?
Looking at the expression -2[array -2] specifically:
[array -2] and [array - 2] are naturally equivalent. In this case the former is just sloppy style purposely used for the sake of obfuscating the code.
Operator precedence tells us to first consider [].
Thus the expression is equivalent to -*( (2) + (array - 2) )
Note that the first - is not part of the integer constant 2. C does not support negative integer constants (see footnote 1); the - is actually the unary minus operator.
Unary minus has lower precedence than [], so the 2 in -2[ "binds" to the [.
The sub-expression (array - 2) is evaluated individually and invokes undefined behavior, as per C17 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. /--/ If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Speculatively, one potential form of undefined behavior could be that a compiler decides to replace the whole expression (2) + (array - 2) with array, in which case the whole expression would end up as -*array and prints -10.
There are no guarantees of this, and therefore the code is bad. If you were given the assignment to explain why the code prints -10, your teacher is incompetent. Not only is it meaningless/harmful to study obfuscation as part of C studies, it is harmful to rely on undefined behavior or expect it to give a certain result.
1) C rather supports negative integer constant expressions. -2 is an integer constant expression, where 2 is an integer constant of type int.

What happens when you dereference a postincrement C

I am receiving a lot of conflicting answers about this. But as I always understood it.
When we have a pointer in C and use it in a post increment statement, the post increment will always happen after the line of code resolves.
int array[6] = {0,1,2,3,4,5};
int* p = array;
printf("%d", *p++); // This will output 0 then increment pointer to 1
output :
0
Very simple stuff. Now here's where I am receiving a bit of dissonance in the information people are telling me and my own experience.
// Same code as Before
int array[6] = {0,1,2,3,4,5};
int* p = array;
printf("%d", *(p++)); // Issue with this line
output :
0
Now when I run that second version of the code, the result is that it will output 0 THEN increment the pointer. The order of operations implied by the parentheses seems to be violated. However, some other answers on this site tell me that the proper thing that should happen is that the increment should happen before the dereference. So I guess my question is this: Is my understanding correct? Do post increment statements always execute at the end of the line?
Additional Info:
I am compiling with gcc on linux mint with gcc version ubuntu 4.8.4
I have also tested this on gcc on debian with version debian 4.7.2
OP's "The result is that it will output 0 THEN increments the pointer." is not correct.
The postfix increment returns the value of the pointer. Consider this value as a copy of the original pointer's value. The pointer is incremented which does not affect the copy.
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented. ... C11dr 6.5.2.4 2
Then the copy of the pointer is de-referenced and returns the 0. That is the functional sequence of events.
Since the side effect of incrementing the pointer and the dereferencing of that copy of the pointer do not affect each other, which one occurs first is irrelevant. The compiler may optimize as it likes.
"the end of the line" is not involved in code. It is the end of the expression that is important.
There is no difference in meaning between *p++ and *(p++).
This is because postfix operators have a higher precedence than unary operators.
Both these expressions mean "p is incremented, and its previous value is dereferenced".
If you want to increment the object being referenced by the pointer, then you need to override precedence by writing (*p)++.
No version of your code can produce output and then increment p. The reason is that p is incremented in an argument expression which produces a value that is passed into printf. In C, a sequence point occurs just before a function is called. So the new value of p must settle in place before printf executes. And the output cannot take place until printf is called.
Now, you have to take the above with a slight grain of salt. Since p is a local variable, modifying it isn't an externally visible effect. If the new value of p isn't used anywhere, the increment can be entirely optimized away. But suppose we had an int * volatile p; at file scope, and used that instead. Then the expression printf("...", *p++) has to increment p before printf is called.
The expression p++ has a result (the value of p before the increment) and a side effect (the value of p is updated to point to the next object of type int).
Postfix ++ has higher precedence than unary *, so *p++ is already parsed as *(p++); you will see no difference in behavior between those two forms. IOW, the dereference operator is applied to the result of p++; the line
printf("%d", *p++);
is roughly equivalent to
printf("%d", *p);
p++;
with the caveat that p will actually be updated before the call to printf [1].
However, (*p)++ will be different; instead of incrementing the pointer, you are incrementing the thing p points to.
1. The side effect of a ++ or -- operator must be applied before the next sequence point, which in this particular case occurs between the time the function arguments are evaluated and the function itself is called.
Here is my take on this. Let's ignore the printf function altogether and make things simpler.
If we said
int i;
int p=0;
i = p++;
Then i would be equal to zero because p was equal to zero but now p has been incremented by one; so now i still equals zero and p is equal to 1.
Ignoring the declarations of i and p as integers, if we wrap this as in the example, i = *(p++), then the same action occurs but i now contains the value pointed at by p which had the value of zero. However, the value of p, now, has been incremented by one.

Confusing answers : One says *myptr++ increments pointer first,other says *p++ dereferences old pointer value

I would appreciate it if you could clarify this for me. Here are two recent questions with their accepted answers:
1) What is the difference between *myptr++ and *(myptr++) in C
2) Yet another sequence point query: how does *p++ = getchar() work?
The accepted answer for the first question, concise and easy to understand, states that since ++ has higher precedence than *, the increment to the pointer myptr is done first and then it is dereferenced. I even checked that out on the compiler and verified it.
But the accepted answer to the second question posted minutes before has left me confused.
It says in clear terms that in *p++ strictly the old address of p is dereferenced. I have little reason to question the correctness of a top-rated answer of the second question, but frankly I feel it contradicts the first question's answer by user H2CO3. So can anyone explain in plain and simple English what the second question's answer means, and how come *p++ dereferences the old value of p in the second question? Isn't p supposed to be incremented first, as ++ has higher precedence? How on earth can the older address be dereferenced in *p++? Thanks.
The postfix increment operator does have higher precedence than the dereference operator, but postfix increment on a variable returns the value of that variable prior to incrementing.
*myptr++
Thus the increment operation has higher precedence, but the dereferencing is done on the value returned by the increment, which is the previous value of myptr.
The answer in the first question you've linked to is not wrong, he's answering a different question.
There is no difference between *myptr++ and *(myptr++) because in both cases the increment is done first, and then the previous value of myptr is dereferenced.
The accepted answer for the first question, concise and easy to understand, states that since ++ has higher precedence than *,
Right. That is correct.
the increment to the pointer myptr is done first and then it is dereferenced.
It doesn't say that. Precedence determines the grouping of the subexpressions, but not the order of evaluation.
That the precedence of ++ is higher than the precedence of the indirection * says that
*myptr++
is exactly the same (not on the source code level, of course) as
*(myptr++)
and that means that the indirection is applied to the result of the
myptr++
subexpression, the old value of myptr, whereas (*myptr)++ would apply the increment operator to what myptr points to.
The result of a postfix increment is the old value of the operand, so
*myptr++ = something;
has the same effect as
*myptr = something;
myptr++;
When the side-effect of storing the incremented value of myptr happens is unspecified. It may happen before the indirection is evaluated, or after that, that is up to the compiler.
Section 6.5.2.4 of the C specification discusses the postfix increment and decrement operators. And the second paragraph there pretty much answers your question:
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it).
...
The value computation of the result is sequenced before the side effect of
updating the stored value of the operand.
So given *myptr++, yes, it's true that the ++ part has higher precedence, but precedence does not exclusively determine your result. The language defines that in the specification. In this case the value of myptr is returned, then the "side effect" of myptr being incremented is executed.

Question about C programming

int a, b;
a = 1;
a = a + a++;
a = 1;
b = a + a++;
printf("%d %d", a, b);
output : 3,2
What's the difference between line 3 and 5?
What you are doing is undefined.
You can't change the value of a variable you are about to assign to.
You also can't change the value of a variable with a side effect and also try to use that same variable elsewhere in the same expression (unless there is a sequence point, but in this case there isn't). The order of evaluation for the two arguments for + is unspecified.
So if there is a difference between the two lines, it is that the first is undefined for two reasons, and line 5 is only undefined for one reason. But the point is both line 3 and line 5 are undefined and doing either is wrong.
What you're doing on line 3 is undefined. C has the concept of "sequence points" (a common one is the end of a full expression, usually marked by a semicolon). If you modify an object more than once between two sequence points, the behavior is undefined, and that's what you've done in line 3. As section 6.5 of C99 says:
(2) Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
Line 5 is also undefined because of the second sentence. You read a to get its value, which you then use in another assignment in a++.
a++ is a post-fix operator, it gets the value of a then increments it.
So, for lines 2,3:
a = 1
a = 1 + 1, a is incremented.
a becomes 3 (Note, the order these operations are performed may vary between compilers, and a can easily also become 2)
for lines 4,5:
a = 1
b = 1 + 1, a is incremented.
b becomes 2, a becomes 2. (Due to undefined behaviour, b could also become 3 if a++ is processed before a)
Note that, other than for understanding how postfix operators work, I really wouldn't recommend using this trick. It's undefined behavior and will get different results when compiled using different compilers
As such, it is not only a needlessly confusing way to do things, but an unreliable, and worst-practice way of doing it.
EDIT: And as others have pointed out, this is actually undefined behavior.
Line 3 is undefined, line 5 is not.
EDIT:
As Prasoon correctly points out, both are UB.
The simple expression a + a++ is undefined because of the following:
The operator + is not a sequence point, so the side effects of each operand may happen in either order.
a is initially 1.
One of two possible [sensible] scenarios may occur:
1. The first operand, a, is evaluated first.
a) Its value, 1, will be stored in a register, R. No side effects occur.
b) The second operand, a++, is evaluated. It evaluates to 1 also, and is added to the same register R. As a side effect, the stored value of a is set to 2.
c) The result of the addition, currently in R, is written back to a. The final value of a is 2.
2. The second operand, a++, is evaluated first.
a) It is evaluated to 1 and stored in register R. The stored value of a is incremented to 2.
b) The first operand, a, is read. It now contains the value 2, not 1! It is added to R.
c) R contains 3, and this result is written back to a. The result of the addition is now 3, not 2, like in our first case!
In short, you mustn't rely on such code to work at all.
