order of evaluation for multiple increment operator on pointer - c

I'm having trouble understanding how the following statements would be evaluated:
++*++ptr and *ptr++++
As per my understanding, the first would give me "lvalue required", because after * is applied it would give a value, which cannot be used as the operand of ++. But the result is the opposite. Please explain.
The second statement gives me this error: batch3.c:6:21: error: lvalue required as increment operand
printf("%d", *ptr++++);

First, some standardese:
6.5.2.4 Postfix increment and decrement operators
Constraints
1 The operand of the postfix increment or decrement operator shall have atomic, qualified,
or unqualified real or pointer type, and shall be a modifiable lvalue.
Semantics
2 The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). See the discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of operations on
pointers. The value computation of the result is sequenced before the side effect of
updating the stored value of the operand. With respect to an indeterminately-sequenced
function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object
with atomic type is a read-modify-write operation with memory_order_seq_cst
memory order semantics.98)
...
6.5.16 Assignment operators
...
3 An assignment operator stores a value in the object designated by the left operand. An
assignment expression has the value of the left operand after the assignment,111) but is not
an lvalue. The type of an assignment expression is the type the left operand would have
after lvalue conversion. The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.
Emphasis mine.
The upshot of that wall of text is that the results of the expressions ptr++ and ++ptr are not lvalues. However, both expressions result in pointer values, so they may be the operands of the unary * operator, and the results of *ptr++ and *++ptr may be lvalues.
This is why ++*++ptr works; you're incrementing the result of *++ptr, which may be an lvalue. However, *ptr++++ is parsed as *(((ptr)++)++) (postfix ++ has higher precedence than unary *); the result of ptr++ is the operand to the second ++, but since the result of ptr++ is not an lvalue, the compiler complains. If you had written it as (*ptr++)++, then the expression would be valid.
In short:
++*++ptr - valid, equivalent to ++(*(++ptr))
*++++ptr - invalid, equivalent to *(++(++ptr)), result of ++ptr is not an lvalue
++++*ptr - invalid, equivalent to ++(++(*ptr)), result of ++*ptr is not an lvalue
*ptr++++ - invalid, equivalent to *((ptr++)++), result of ptr++ is not an lvalue
(*ptr)++++ - invalid, equivalent to ((*ptr)++)++, result of (*ptr)++ is not an lvalue
(*ptr++)++ - valid
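A minimal sketch of the two valid forms from the list, run on a small array. The function names pre_star_pre and star_post_then_post are hypothetical, chosen only for this illustration; the invalid *ptr++++ is kept in a comment because it would not compile.

```c
/* ++*++ptr parses as ++(*(++ptr)): ptr advances to arr[1], then the
   int it now designates is incremented. *end receives the final ptr. */
void pre_star_pre(int arr[2], int **end)
{
    int *ptr = arr;
    ++*++ptr;
    *end = ptr;
}

/* (*ptr++)++: *ptr++ is an lvalue designating the element ptr pointed
   to before it advanced, so the outer postfix ++ has a valid operand. */
void star_post_then_post(int arr[2], int **end)
{
    int *ptr = arr;
    (*ptr++)++;
    /* *ptr++++;   does not compile: the second ++ would be applied to
       the result of ptr++, which is not an lvalue */
    *end = ptr;
}
```

Starting from {10, 20}, pre_star_pre leaves {10, 21} and star_post_then_post leaves {11, 20}; in both cases the pointer ends at the second element.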

Postfix ++ has higher precedence than *.
So ++ is applied to the pointer first, and the result of p++ (the pointer's original value) is then dereferenced.
In *p++,
first p++ is evaluated,
then * is applied to its result: *(p++).
For ++ there has to be an object that can be incremented, but the expression below doesn't provide an lvalue for the second ++: p++ is not a modifiable lvalue.
*ptr++++;

Considering that these two expressions are separate (note that "and" is a macro defined in <iso646.h>), here is what happens:
The first one, ++*++ptr, is equivalent to ++(*(++ptr)), as prefix ++ and unary * have the same precedence and both associate from right to left. See the following example as an illustration:
#include <stdio.h>
int main(void)
{
    int a[] = {1, 2};
    int *ptr = a;
    ++(*(++ptr));    /* ptr now points to a[1], which becomes 3 */
    printf("%d\n", a[0]);
    printf("%d\n", a[1]);
    return 0;
}
Result:
1
3
The latter expression does not compile, as the ptr++ subexpression is not a modifiable lvalue. Note that postfix ++ has higher precedence than * (the indirection operator), and its associativity is left to right.

Related

How does C compiler determine a valid lvalue?

I'm trying to figure out how C determines if an expression is a valid LVALUE.
I know declaring a variable gives it a named memory space, named by the variable name. The variable name can be an RVALUE or an LVALUE. If it is used to represent a value, its content is used, but if it is used as an LVALUE, its address is used to say that the value of the right-hand expression is stored at this address. The picture I have of this operation is ADDRESS=VALUE: that's how I think the left and right expressions of the assignment operator are evaluated.
So why can't I define a variable like int a;, and then use the address-of operator to store a value at that address, like &a = 5;?
I know &a returns a constant pointer, but does that mean I can't change the address, or that I can't change the value stored at the address? If its content can't be changed, then why does *&a=5 work?
Why can't I assign a value this way, even though, as I understand it, the left-hand expression is always evaluated to an address? Maybe something is wrong in my understanding?
Automatic lvalue conversion
This is covered by C 2018 6.3.2.1 2, which says:
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion.…
Consider the expression x = y + z:
y is an operand of +. The + operator is not in the list of exceptions above. So y is converted to its value.
z is an operand of +. The + operator is not in the list of exceptions above. So z is converted to its value.
x is the left operand of =, which is the assignment operator. That is in the list of exceptions above. So x remains an lvalue.
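A small sketch of the rule quoted above (the function name lvalue_conversion_demo is only illustrative). The operands of + lose their lvalue-ness and contribute only their values, while x, as the left operand of = and as the operand of & and ++, keeps designating the object.

```c
/* Operands of + are lvalue-converted, so only their values matter.
   The left operand of =, and the operands of & and ++, are among the
   exceptions: they must still designate the object x. */
int lvalue_conversion_demo(void)
{
    int x = 0, y = 2, z = 3;
    x = y + z;        /* y and z are read (2 and 3); x stays an lvalue: x == 5 */
    int *px = &x;     /* & takes the object x, not its value */
    x++;              /* ++ also needs the object: x == 6 */
    *px += 1;         /* *px designates x again: x == 7 */
    /* (y + z) = 1;   error: y + z is a value, not an lvalue */
    return x;
}
```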
About &a = 5
In regard to int a; followed by &a = 5;:
The result of the & operator is merely an address—it is just a value; there is no object holding this value, so it is not an lvalue.
The assignment operator must have an lvalue as its left operand. C 2018 6.5.16 2 is a constraint that says “An assignment operator shall have a modifiable lvalue as its left operand.”
Therefore &a = 5; violates a constraint, and a C compiler is required to produce a diagnostic message for it. The = operator cannot have a plain value as its left operand.
It is possible to design a programming language so that the assignment operator accepts &a = 5; and uses it to store the value on the right in the location given on the left. The BLISS language does this. In BLISS, the name of a variable always provides its address. To get the value, you must prefix the variable with a period (which acts like C’s unary * operator). So you would write z = .x + .y. So the fact that C does not do this is a choice about aesthetics and convenience, not about logical necessity. In C, lvalues are automatically converted to values in most places, and the exceptions are for operators that act on objects instead of values. In BLISS, you must explicitly designate each lvalue-to-value conversion.
About *&a = 5
In *&a=5:
The * operator produces an lvalue, per C 2018 6.5.3.2 4: “The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object.…”
Thus *&a provides the lvalue that the assignment operator requires.
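To make the contrast concrete, here is a minimal sketch (assign_through_star_amp is a hypothetical name): &a alone is only an address value, so assigning to it is a constraint violation, while *&a designates the object a and can be assigned through.

```c
/* &a is an address value, not an object, so "&a = 5;" must be diagnosed.
   *&a designates the object a itself, so assigning to it changes a. */
int assign_through_star_amp(void)
{
    int a = 0;
    /* &a = 5;    error: lvalue required as left operand of assignment */
    *&a = 5;      /* same object as a */
    return a;
}
```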
First of all, C does not use the term rvalue, preferring the term "value of an expression". The term lvalue is used, and it means (C11 6.3.2.1p1)
[...] an expression (with an object type other than void) that potentially designates an object
It does not mean the address of the object; it means that the lvalue is the object itself.
The operand of & is, more often than not, an lvalue too:
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
The result is a value of an expression of a pointer type, an address. Even though an address points to an object, it is not the object. Just like 1600 Pennsylvania Avenue NW in Washington, D.C. is an address, but it is not the building found at that address.
So if you have a house:
house my_house;
you can ask for its address
&my_house;
which is the address of your house, but it is not a house, i.e. not an lvalue, but the house located at the address of your house is a house, i.e. an lvalue:
*&my_house;
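The analogy can be turned into compilable C under one assumption: house is a hypothetical struct type with a single rooms member, invented here only so the example builds.

```c
/* Hypothetical type standing in for the analogy's house. */
typedef struct {
    int rooms;
} house;

/* &my_house is an address: a value, not a house, so it cannot be assigned
   to. *&my_house is the house at that address: an lvalue that can be. */
int house_demo(void)
{
    house my_house = { 3 };
    house *address = &my_house;      /* the address of the house */
    /* &my_house = address;   error: &my_house is not an lvalue */
    *&my_house = (house){ 4 };       /* replaces the house itself */
    return address->rooms;           /* same object, so 4 */
}
```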

Is (*&a) a lvalue or a rvalue?

At first I thought it was an rvalue, but the following fact changed my mind.
I tried the expression &(*&a) and it works fine, but the & operator only works with an lvalue, so (*&a) must be an lvalue. Why?
Per C 2018 6.5.3.2 4 (discussing the unary * operator), the result of unary * is an lvalue:
… If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object.…
This tells us that *&a is an lvalue. However, the expression asked about in the question is (*&a), so we must consider the effect of the parentheses.
6.3.2.1 2 (discussing automatic conversions) seems to tell us that (*&a) is converted to the value in *&a and is not an lvalue:
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion.
However, 6.5.1 5 (discussing parenthesized expressions) contradicts this:
A parenthesized expression is a primary expression. Its type and value are identical to those of the unparenthesized expression. It is an lvalue, a function designator, or a void expression if the unparenthesized expression is, respectively, an lvalue, a function designator, or a void expression.
This is a defect in the C standard; 6.5.1 5 and 6.3.2.1 2 contradict each other. It is left to us to understand that 6.5.1 5, which is specifically about parenthesized expressions, takes precedence over the more general 6.3.2.1 2, and this is how all C implementations behave.
Thus (*&a) is an lvalue.
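A short sketch of that conclusion (paren_lvalue_demo is a made-up name): because the parenthesized (*&a) is still an lvalue, it is accepted both as the operand of & and as the left operand of =.

```c
/* (*&a) keeps its lvalue-ness inside parentheses (6.5.1 5), so both uses
   below are accepted. Stores the final a in *out; returns 1 when &(*&a)
   is the address of a. */
int paren_lvalue_demo(int *out)
{
    int a = 1;
    int *p = &(*&a);   /* accepted: (*&a) is an lvalue */
    (*&a) = 7;         /* also accepted as the left operand of = */
    *out = a;
    return p == &a;
}
```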
This expression &(&a) is invalid and will not work.
According to the C Standard:
1 The operand of the unary & operator shall be either a function
designator, the result of a [] or unary * operator, or an lvalue
that designates an object that is not a bit-field and is not declared
with the register storage-class specifier.
and
3 The unary & operator yields the address of its operand. If the
operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If
the operand is the result of a unary * operator, neither that operator
nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply
and the result is not an lvalue.
So the result of the expression &a is not an lvalue, and you may not apply the & operator to it, as in &(&a). (Written without the parentheses, &&a would be lexed as the && token, not as two & operators.)
Here is a demonstrative program.
#include <stdio.h>
int main(void)
{
int x = 10;
&( &x );
return 0;
}
The compiler gcc 8.3 issues an error
prog.c: In function ‘main’:
prog.c:7:2: error: lvalue required as unary ‘&’ operand
&( &x );
^
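If an address of an address is really what you need, a minimal sketch (address_of_pointer_demo is a hypothetical name) is to store &x in a pointer object first; that object is an lvalue and so has an address of its own.

```c
/* &(&x) is rejected because &x is a value, not an object. Once the address
   is stored in a pointer object, that object can have its address taken. */
int address_of_pointer_demo(void)
{
    int x = 10;
    int *px = &x;       /* px is an lvalue holding the value &x */
    int **ppx = &px;    /* fine: px designates an object */
    return **ppx;       /* reads x through both levels: 10 */
}
```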
This expression *&a is valid and the result is an lvalue.
4 The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the
operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If
an invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined.
Bear in mind that parentheses do not affect whether the enclosed expression is an lvalue or not.

Understanding indirection through pointers and taking address

In the Standard N1570, Section 6.5.3.2#3, the following is specified (emphasis mine):
If the operand is the result of a unary * operator, neither that
operator nor the & operator is evaluated and the result is as if both
were omitted, except that the constraints on the operators still apply
and the result is not an lvalue.
Later on the section 6.5.3.2#4 specifies:
If the operand points to a function, the result is a function
designator; if it points to an object, the result is an lvalue
designating the object.
These two sections look contradictory to me. The first one I cited specifies that the result is not an lvalue, but the second one specifies that the result of the indirection operator is an lvalue.
Can you please explain this? Does it mean that, in the case of an object, the operators * and & do not cancel each other out?
Section 6.5.3.2#3 talks about the unary & operator and the 6.5.3.2#4 talks about the unary * operator. They have different behaviors.
Elaboration (from comment):
The point is that unary & does not result in an lvalue, even in the case where it is considered omitted because it immediately precedes unary * in a dereference context. Just because both operators are considered omitted doesn't change the fact the resulting expression is not an lvalue; the same way it would not be if a solo unary & were applied.
int a;
&a = ...;
is not legal (obviously). But neither is
int *a;
&*a = ...;
Just because they are considered omitted doesn't mean &* is lvalue-equivalent to solo a.
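A sketch of what 6.5.3.2#3 permits and forbids (amp_star_demo is a hypothetical name). Because neither operator in &*p is evaluated, the result is just the value of p; the standard's footnote to that paragraph notes this holds even when p is a null pointer. The result is still not an lvalue, so assigning to it stays in a comment.

```c
#include <stddef.h>

/* In &*p neither operator is evaluated: the result is the value of p
   (even for a null pointer), and it is not an lvalue. */
int amp_star_demo(void)
{
    int a = 42;
    int *p = &a;
    int *q = &*p;          /* same value as p */
    int *null = NULL;
    int *r = &*null;       /* no dereference actually happens */
    /* &*p = q;   error: &*p is not an lvalue */
    return q == p && r == NULL;
}
```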

Why can't you increment/decrement a variable twice in the same expression?

When I try to compile this code:
int main() {
int i = 0;
++(++i);
}
I get this error message:
test.c:3:5: error: lvalue required as increment operand
++(++i);
^
What is the error message saying? Is this something that gets picked up by the parser, or is it only discovered during semantic analysis?
++i will give an rvalue[1] after the evaluation, and you can't apply ++ to an rvalue.
§6.5.3.1 (p1):
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
[1] What is sometimes called "rvalue" is in this International Standard described as the "value of an expression". (§6.3.2.1, footnote 64)
An lvalue is (loosely) something you can write to / assign to.
You can apply ++ to i (i is modified), but you cannot apply ++ to the result of the previous ++ operator. It wouldn't have any effect anyway.
Aside: C++ allows this, because in C++ the result of the built-in prefix ++ operator is an lvalue referring to the modified object.
The issue is that (++i) yields the new integer value, and ++ needs an object to assign to, not a value (you are trying to increment a value, not a variable), so you can use this instead:
i += 2;
or
i = i + 2;
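A minimal sketch of the working alternatives (add_two and add_two_compound are illustrative names): two separate increments, each with an lvalue operand, or a single compound assignment.

```c
/* ++(++i) is rejected because ++i is not an lvalue. Two separate full
   expressions each apply ++ to the lvalue i, which is fine. */
int add_two(int i)
{
    /* ++(++i);   error: lvalue required as increment operand */
    ++i;
    ++i;
    return i;
}

/* The compound-assignment spelling suggested above. */
int add_two_compound(int i)
{
    i += 2;
    return i;
}
```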

Operator precedence in the given expressions

Expression 1: *p++; where p is a pointer to integer.
p will be incremented first, and then the value it points to is taken, due to associativity (right to left). Is that right?
Expression 2: a=*p++; where p is a pointer to integer.
The value of p is taken first and assigned to a, and then p is incremented due to the post-increment. Is that right?
First of all, neither associativity nor order of evaluation is actually relevant here. It is all about operator precedence. Let's see the definitions first (emphasis mine):
Precedence : In mathematics and computer programming, the order of operations (or operator precedence) is a collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression.
Associativity: In programming languages, the associativity (or fixity) of an operator is a property that determines how operators of the same precedence are grouped in the absence of parentheses.
Order of evaluation: The order of evaluation of the operands of any C operator, including the order of evaluation of function arguments in a function-call expression, and the order of evaluation of the subexpressions within any expression, is unspecified, except in a few cases. There are mainly two kinds of evaluation: a) value computation and b) side effect.
Post-increment has higher precedence, so it will be evaluated first.
Now, the increment of the value is a side effect of the operation, and it is sequenced after the value computation. So the result of the value computation is the unchanged value of the operand p (which, here, then gets dereferenced by the * operator), and the increment takes place afterward.
Quoting C11, chapter §6.5.2.4,
The result of the postfix ++ operator is the value of the operand. As a side effect, the
value of the operand object is incremented (that is, the value 1 of the appropriate type is
added to it). See the discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of operations on
pointers. The value computation of the result is sequenced before the side effect of
updating the stored value of the operand. [.....]
The order of evaluation is the same in both cases; the only difference is that in the first case the final value is discarded.
If you use the first expression "as-is", your compiler should produce a warning about unused value.
Postfix operators have higher precedence than unary operators.
Thus this expression
*p++
is equivalent to the expression
*( p++ )
According to the C Standard (6.5.2.4 Postfix increment and decrement operators)
2 The result of the postfix ++ operator is the value of the
operand. As a side effect, the value of the operand object is
incremented (that is, the value 1 of the appropriate type is added to
it). See the discussions of additive operators and compound assignment
for information on constraints, types, and conversions and the effects
of operations on pointers. The value computation of the result is
sequenced before the side effect of updating the stored value of the
operand.
So p++ yields the original value of the pointer p as the result of the operation, and also has the side effect of incrementing the operand itself.
As for the unary * operator (6.5.3.2 Address and indirection operators):
4 The unary * operator denotes indirection. If the operand points to a
function, the result is a function designator; if it points to an
object, the result is an lvalue designating the object. If the operand
has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the
unary * operator is undefined
So the final result of the expression
*( p++ )
is the value of the object that p pointed to before the increment, while p itself is incremented as a side effect. This value is assigned to the variable a in the statement
a=*p++;
For example if there are the following declarations
char s[] = "Hello";
char *p = s;
char a;
then after this statement
a = *p++;
the object a will have the character 'H' and the pointer p will point to the second character of the array s that is to the character 'e'.
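The example can be checked with a small sketch (first_char_demo is a hypothetical name; it reports a and the character p ends up pointing to through out-parameters).

```c
/* a = *p++ stores the character p pointed to before the increment;
   afterwards p points to the second character of s. */
void first_char_demo(char *a_out, char *next_out)
{
    char s[] = "Hello";
    char *p = s;
    char a;
    a = *p++;          /* a == 'H', p == s + 1 */
    *a_out = a;
    *next_out = *p;    /* the character p now points to: 'e' */
}
```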
Associativity is not relevant here. Associativity only matters when you have adjacent operators with the same precedence. But in this case, ++ has higher precedence than *, so only precedence matters. Because of precedence, the expression is equivalent to:
*(p++)
Since it uses post-increment, p++ increments the pointer, but the expression returns the value of the pointer before it was incremented. The indirection then uses that original pointer to fetch the value. It's effectively equivalent to:
int *temp = p;
p = p + 1;
*temp;
The second expression is the same, except it assigns the value to another variable, so that last statement becomes:
a = *temp;
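As a sanity check of that expansion, this sketch (temp_expansion_matches is an illustrative name) runs a = *p++ and the temporary-based version side by side on the same array and compares what each reads and where each pointer ends up.

```c
/* Compares a = *p++ with the expansion using a temporary pointer.
   Stores the value read in *value; returns 1 if both forms agree. */
int temp_expansion_matches(int *value)
{
    int arr[] = {7, 8, 9};

    int *p = arr;
    int a = *p++;            /* reads arr[0]; p ends at arr[1] */

    int *q = arr;
    int *temp = q;           /* the spelled-out version */
    q = q + 1;
    int b = *temp;

    *value = a;
    return a == b && p == q;
}
```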
The expression
*p++
is equivalent to
*(p++)
This is due to precedence (i.e., the postfix increment operator has higher precedence than the indirection operator)
and the expression
a=*p++
is for the same reason equivalent to
a=*(p++)
In both cases, the expression p++ evaluates to the original value of p.
v = i++;: the value of i is given to the assignment operation and assigned to v. Subsequently, i is incremented (EDIT: technically it's not necessarily executed in this order). Thus v has the old value of i. I remember it like this: ++ is written last and therefore happens last.
v = ++i;: i is incremented, and then its new value is assigned to v. v and i have the same value.
When you don't use the returned value, they do the same (although different implementations may yield different performance in some cases). E.g. in for loops, for(int i=0; i<n; i++) is the same as for(int i=0; i<n; ++i). The latter is sometimes automatically preferred because it tends to be faster for some objects.
* has lower precedence than postfix ++, so *p++ is the same as *(p++). Thus in this case the original value of p is given to *, which dereferences it, and the address in p is incremented by one element. *++p increments the address in p first, then dereferences it.
v = (*p)++; sets v equal to the old value pointed to by p and then increments it, while v = ++(*p); increments the value pointed to by p and then sets v equal to it. The address in p is unchanged.
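Example: If
int a[] = {1,2};
int *p = a;
then
int v = *p++;
and
int v = *++p;
will both leave p incremented (each starting from p == a), but in the first case v will be 1 and in the latter it'll be 2. Writing *a++ or *++a directly would not compile, because an array is not a modifiable lvalue.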
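A sketch of both pairs discussed in this answer, under the assumption that the pointer p starts at the first element (incrementing the array name itself would not compile). The names post_vs_pre and pointee_increment are hypothetical.

```c
/* *p++ vs *++p on a pointer into {1, 2}, each starting from a[0]. */
void post_vs_pre(int *v_post, int *v_pre)
{
    int a[] = {1, 2};
    int *p = a;
    *v_post = *p++;          /* reads a[0] == 1, then p points to a[1] */

    p = a;                   /* start over from a[0] */
    *v_pre = *++p;           /* p points to a[1] first, then reads 2 */
}

/* (*p)++ vs ++(*p): the pointee changes, p does not. */
void pointee_increment(int out[4])
{
    int x = 1;
    int *p = &x;
    out[0] = (*p)++;         /* old value 1; x becomes 2 */
    out[1] = x;              /* 2 */
    out[2] = ++(*p);         /* x becomes 3; yields the new value 3 */
    out[3] = (p == &x);      /* 1: p still points to x */
}
```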
*p++; where p is a pointer to integer.
p will be incremented first and then the value to which it is pointing to is taken due to associativity (right to left). Is it right?
No. In a post-increment, the value is copied to a temporary (an rvalue), then the lvalue is incremented as a side effect.
a=*p++; where p is a pointer to integer.
Value of p is taken first and then assigned to a first then p is incremented due to post increment. Is it right?
No, that's not correct either. The increment of p might happen before the write to a. What's important is that the value being stored in a was loaded using the temporary copy of the prior value of p.
Whether that memory fetch occurs before the memory write with the new value of p isn't specified, and any code that relies on the order is undefined behavior.
Any of these sequences are allowed:
Copy p into temporary THEN increment p, THEN load value at address indicated in temporary THEN store loaded value to a
Copy p into temporary THEN load value at address indicated in temporary (this value itself is placed in a temporary) THEN increment p THEN store loaded value to a
Copy p into temporary THEN load value at address indicated in temporary THEN store loaded value to a THEN increment p
Here are two code examples that are undefined behavior because they rely on the order of side effects:
int a = 7;
int *p = &a;
a = (*p)++; // undefined behavior, do not do this!!
void **pv;
pv = (void **)&pv;   // pv points at itself
void *pv2;
pv2 = *(pv++);       // undefined behavior, do not do this!!!
The parentheses do not create a sequence point (or sequenced before relationship, in the new wording). The version of the code with parentheses is just as undefined as the version without.
