Why compiler treats i+++++i and i+++i differently [duplicate] - c

This question already has answers here:
Why doesn't a+++++b work?
(9 answers)
Closed 9 years ago.
int i=5;
printf("%d",i+++++i);
This gives error, but:
printf("%d",i+++i);
gives the output 11. In this case, the compiler read it as:
printf("%d",i+ ++i);
Why is this not done in first expression? i.e :
printf("%d",i+++++i);

Because of operator precedence i++++++i is treated as (i++)++ + i). This gives a compiler error because (i++) is not an lvalue.

Modifying the same variable multiple times between two sequence points is an Undefined Behavior according to §6.5 of language specifications
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.(71)
71) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;

i+++++i is parsed as i ++ ++ + i. It contains an invalid subexpression i ++ ++. Speaking formally, this expression contains a constraint violation, which is why it does not compile.
Meanwhile i+++i is parsed as i ++ + i (not as i + ++ i as you incorrectly believe). It does not contain any constraint violations. It produces undefined behavior, but is otherwise well-formed.
Also, it is rather naive to believe that printf("%d",i+++i) will print 11. The behavior of i+++i is undefined, meaning that there's no point in trying to predict the output.

In printf("%d",i+++++i);, the source text i+++++i is first processed according to this rule from C 2011 (N1570) 6.4 4:
If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token…
This causes the lexical analysis to proceed in this way:
i can be a token, but i+ cannot, so i is the next token. This leaves +++++i.
+ and ++ can each be a token, but +++ cannot. Since ++ is the longest sequence that could be a token, it is the next token. This leaves +++i.
For the same reason, ++ is the next token. This leaves +i.
+ can be a token, but +i cannot, so + is the next token. This leaves i.
i can be a token, but i) cannot, so i is the next token.
Thus, the expression is i ++ ++ + i.
Then the grammar rules structure this expression as ((i ++) ++) + i.
When i++ is evaluated, the result is just a value, not an lvalue. Since ++ cannot be applied to a value that is not an lvalue, (i ++) ++ is not allowed.
After the compiler recognizes that the expression is semantically incorrect, it cannot go back and change the lexical analysis. Th C standard specifies that the rules must be followed as described above.
In i+++i, the code violates a separate rule. This is parsed as (i ++) + i. This expression both modifies i (in i ++) and separately accesses it (in the i of + i). This violates C 2011 (1570) 6.5 2:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
This rule uses some technical terms: In i ++, the effect of changing i is a side effect of ++. (The main effect is to produce the value of i.) The use of i in + i is a value computation of the scalar object i. And these two things are unsequenced, because the C standard does not specify whether producing the value of i for + i comes before or after changing i in i ++.

Related

Increment madness [duplicate]

int main ()
{
int a = 5,b = 2;
printf("%d",a+++++b);
return 0;
}
This code gives the following error:
error: lvalue required as increment operand
But if I put spaces throughout a++ + and ++b, then it works fine.
int main ()
{
int a = 5,b = 2;
printf("%d",a++ + ++b);
return 0;
}
What does the error mean in the first example?
Compilers are written in stages. The first stage is called the lexer and turns characters into a symbolic structure. So "++" becomes something like an enum SYMBOL_PLUSPLUS. Later, the parser stage turns this into an abstract syntax tree, but it can't change the symbols. You can affect the lexer by inserting spaces (which end symbols unless they are in quotes).
Normal lexers are greedy (with some exceptions), so your code is being interpreted as
a++ ++ +b
The input to the parser is a stream of symbols, so your code would be something like:
[ SYMBOL_NAME(name = "a"),
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS,
SYMBOL_NAME(name = "b")
]
Which the parser thinks is syntactically incorrect. (EDIT based on comments: Semantically incorrect because you cannot apply ++ to an r-value, which a++ results in)
a+++b
is
a++ +b
Which is ok. So are your other examples.
printf("%d",a+++++b); is interpreted as (a++)++ + b according to the Maximal Munch Rule!.
++ (postfix) doesn't evaluate to an lvalue but it requires its operand to be an lvalue.
!
6.4/4 says
the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token"
The lexer uses what's generally called a "maximum munch" algorithm to create tokens. That means as it's reading characters in, it keeps reading characters until it encounters something that can't be part of the same token as what it already has (e.g., if it's been reading digits so what it has is a number, if it encounters an A, it knows that can't be part of the number. so it stops and leaves the A in the input buffer to use as the beginning of the next token). It then returns that token to the parser.
In this case, that means +++++ gets lexed as a ++ ++ + b. Since the first post-increment yields an rvalue, the second can't be applied to it, and the compiler gives an error.
Just FWIW, in C++ you can overload operator++ to yield an lvalue, which allows this to work. For example:
struct bad_code {
bad_code &operator++(int) {
return *this;
}
int operator+(bad_code const &other) {
return 1;
}
};
int main() {
bad_code a, b;
int c = a+++++b;
return 0;
}
The compiles and runs (though it does nothing) with the C++ compilers I have handy (VC++, g++, Comeau).
This exact example is covered in the draft C99 standard(same details in C11) section 6.4 Lexical elements paragraph 4 which in says:
If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token. [...]
which is also known as the maximal munch rule which is used in in lexical analysis to avoid ambiguities and works by taking as many elements as it can to form a valid token.
the paragraph also has two examples the second one is an exact match for you question and is as follows:
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which
violates a constraint on increment operators, even though the parse x
++ + ++ y might yield a correct expression.
which tells us that:
a+++++b
will be parsed as:
a ++ ++ + b
which violates the constraints on post increment since the result of the first post increment is an rvalue and post increment requires an lvalue. This is covered in section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine):
The operand of the postfix increment or decrement operator shall have
qualified or unqualified real or pointer type and shall be a
modifiable lvalue.
and
The result of the postfix ++ operator is the value of the operand.
The book C++ Gotchas also covers this case in Gotcha #17 Maximal Munch Problems it is the same problem in C++ as well and it also gives some examples. It explains that when dealing with the following set of characters:
->*
the lexical analyzer can do one of three things:
Treat it as three tokens: -, > and *
Treat it as two tokens: -> and *
Treat it as one token: ->*
The maximal munch rule allows it to avoid these ambiguities. The author points out that it (In the C++ context):
solves many more problems than it causes, but in two common
situations, it’s an annoyance.
The first example would be templates whose template arguments are also templates (which was solved in C++11), for example:
list<vector<string>> lovos; // error!
^^
Which interprets the closing angle brackets as the shift operator, and so a space is required to disambiguate:
list< vector<string> > lovos;
^
The second case involves default arguments for pointers, for example:
void process( const char *= 0 ); // error!
^^
would be interpreted as *= assignment operator, the solution in this case is to name the parameters in the declaration.
Your compiler desperately tries to parse a+++++b, and interprets it as (a++)++ +b. Now, the result of the post-increment (a++) is not an lvalue, i.e. it can't be post-incremented again.
Please don't ever write such code in production quality programs. Think about the poor fellow coming after you who needs to interpret your code.
(a++)++ +b
a++ returns the previous value, a rvalue. You can't increment this.
Because it causes undefined behaviour.
Which one is it?
c = (a++)++ + b
c = (a) + ++(++b)
c = (a++) + (++b)
Yeah, neither you nor the compiler know it.
EDIT:
The real reason is the one as said by the others:
It gets interpreted as (a++)++ + b.
but post increment requires a lvalue (which is a variable with a name) but (a++) returns a rvalue which cannot be incremented thus leading to the error message you get.
Thx to the others to pointing this out.
I think the compiler sees it as
c = ((a++)++)+b
++ has to have as an operand a value that can be modified. a is a value that can be modified. a++ however is an 'rvalue', it cannot be modified.
By the way the error I see on GCC C is the same, but differently-worded: lvalue required as increment operand.
Follow this precesion order
1.++ (pre increment)
2.+ -(addition or subtraction)
3."x"+ "y"add both the sequence
int a = 5,b = 2;
printf("%d",a++ + ++b); //a is 5 since it is post increment b is 3 pre increment
return 0; //it is 5+3=8

Unary operator behaviour [duplicate]

What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow up to my previous answer and contains C++11 related material..
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]: // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. Unspecified order of elaboration is merely selection of one of several possible serial orderings, this is quite different to before and after orderings, because if there is no specified ordering, concurrent evaluation is possible: not so with the old rules. For example in:
f (a,b)
previously either a then b, or, b then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99(ISO/IEC 9899:TC3) which seems absent from this discussion thus far the following steteents are made regarding order of evaluaiton.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)

Why doesn't c = a+++++b work in C? [duplicate]

int main ()
{
int a = 5,b = 2;
printf("%d",a+++++b);
return 0;
}
This code gives the following error:
error: lvalue required as increment operand
But if I put spaces throughout a++ + and ++b, then it works fine.
int main ()
{
int a = 5,b = 2;
printf("%d",a++ + ++b);
return 0;
}
What does the error mean in the first example?
Compilers are written in stages. The first stage is called the lexer and turns characters into a symbolic structure. So "++" becomes something like an enum SYMBOL_PLUSPLUS. Later, the parser stage turns this into an abstract syntax tree, but it can't change the symbols. You can affect the lexer by inserting spaces (which end symbols unless they are in quotes).
Normal lexers are greedy (with some exceptions), so your code is being interpreted as
a++ ++ +b
The input to the parser is a stream of symbols, so your code would be something like:
[ SYMBOL_NAME(name = "a"),
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS,
SYMBOL_NAME(name = "b")
]
Which the parser thinks is syntactically incorrect. (EDIT based on comments: Semantically incorrect because you cannot apply ++ to an r-value, which a++ results in)
a+++b
is
a++ +b
Which is ok. So are your other examples.
printf("%d",a+++++b); is interpreted as (a++)++ + b according to the Maximal Munch Rule!.
++ (postfix) doesn't evaluate to an lvalue but it requires its operand to be an lvalue.
!
6.4/4 says
the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token"
The lexer uses what's generally called a "maximum munch" algorithm to create tokens. That means as it's reading characters in, it keeps reading characters until it encounters something that can't be part of the same token as what it already has (e.g., if it's been reading digits so what it has is a number, if it encounters an A, it knows that can't be part of the number. so it stops and leaves the A in the input buffer to use as the beginning of the next token). It then returns that token to the parser.
In this case, that means +++++ gets lexed as a ++ ++ + b. Since the first post-increment yields an rvalue, the second can't be applied to it, and the compiler gives an error.
Just FWIW, in C++ you can overload operator++ to yield an lvalue, which allows this to work. For example:
struct bad_code {
bad_code &operator++(int) {
return *this;
}
int operator+(bad_code const &other) {
return 1;
}
};
int main() {
bad_code a, b;
int c = a+++++b;
return 0;
}
The compiles and runs (though it does nothing) with the C++ compilers I have handy (VC++, g++, Comeau).
This exact example is covered in the draft C99 standard(same details in C11) section 6.4 Lexical elements paragraph 4 which in says:
If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token. [...]
which is also known as the maximal munch rule which is used in in lexical analysis to avoid ambiguities and works by taking as many elements as it can to form a valid token.
the paragraph also has two examples the second one is an exact match for you question and is as follows:
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which
violates a constraint on increment operators, even though the parse x
++ + ++ y might yield a correct expression.
which tells us that:
a+++++b
will be parsed as:
a ++ ++ + b
which violates the constraints on post increment since the result of the first post increment is an rvalue and post increment requires an lvalue. This is covered in section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine):
The operand of the postfix increment or decrement operator shall have
qualified or unqualified real or pointer type and shall be a
modifiable lvalue.
and
The result of the postfix ++ operator is the value of the operand.
The book C++ Gotchas also covers this case in Gotcha #17 Maximal Munch Problems it is the same problem in C++ as well and it also gives some examples. It explains that when dealing with the following set of characters:
->*
the lexical analyzer can do one of three things:
Treat it as three tokens: -, > and *
Treat it as two tokens: -> and *
Treat it as one token: ->*
The maximal munch rule allows it to avoid these ambiguities. The author points out that it (In the C++ context):
solves many more problems than it causes, but in two common
situations, it’s an annoyance.
The first example would be templates whose template arguments are also templates (which was solved in C++11), for example:
list<vector<string>> lovos; // error!
^^
Which interprets the closing angle brackets as the shift operator, and so a space is required to disambiguate:
list< vector<string> > lovos;
^
The second case involves default arguments for pointers, for example:
void process( const char *= 0 ); // error!
^^
would be interpreted as *= assignment operator, the solution in this case is to name the parameters in the declaration.
Your compiler desperately tries to parse a+++++b, and interprets it as (a++)++ +b. Now, the result of the post-increment (a++) is not an lvalue, i.e. it can't be post-incremented again.
Please don't ever write such code in production quality programs. Think about the poor fellow coming after you who needs to interpret your code.
(a++)++ +b
a++ returns the previous value, a rvalue. You can't increment this.
Because it causes undefined behaviour.
Which one is it?
c = (a++)++ + b
c = (a) + ++(++b)
c = (a++) + (++b)
Yeah, neither you nor the compiler know it.
EDIT:
The real reason is the one as said by the others:
It gets interpreted as (a++)++ + b.
but post increment requires a lvalue (which is a variable with a name) but (a++) returns a rvalue which cannot be incremented thus leading to the error message you get.
Thx to the others to pointing this out.
I think the compiler sees it as
c = ((a++)++)+b
++ has to have as an operand a value that can be modified. a is a value that can be modified. a++ however is an 'rvalue', it cannot be modified.
By the way the error I see on GCC C is the same, but differently-worded: lvalue required as increment operand.
Follow this precesion order
1.++ (pre increment)
2.+ -(addition or subtraction)
3."x"+ "y"add both the sequence
int a = 5,b = 2;
printf("%d",a++ + ++b); //a is 5 since it is post increment b is 3 pre increment
return 0; //it is 5+3=8

Why doesn't a+++++b work?

int main ()
{
int a = 5,b = 2;
printf("%d",a+++++b);
return 0;
}
This code gives the following error:
error: lvalue required as increment operand
But if I put spaces throughout a++ + and ++b, then it works fine.
int main ()
{
int a = 5,b = 2;
printf("%d",a++ + ++b);
return 0;
}
What does the error mean in the first example?
Compilers are written in stages. The first stage is called the lexer and turns characters into a symbolic structure. So "++" becomes something like an enum SYMBOL_PLUSPLUS. Later, the parser stage turns this into an abstract syntax tree, but it can't change the symbols. You can affect the lexer by inserting spaces (which end symbols unless they are in quotes).
Normal lexers are greedy (with some exceptions), so your code is being interpreted as
a++ ++ +b
The input to the parser is a stream of symbols, so your code would be something like:
[ SYMBOL_NAME(name = "a"),
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS_PLUS,
SYMBOL_PLUS,
SYMBOL_NAME(name = "b")
]
Which the parser thinks is syntactically incorrect. (EDIT based on comments: Semantically incorrect because you cannot apply ++ to an r-value, which a++ results in)
a+++b
is
a++ +b
Which is ok. So are your other examples.
printf("%d",a+++++b); is interpreted as (a++)++ + b according to the Maximal Munch Rule!.
++ (postfix) doesn't evaluate to an lvalue but it requires its operand to be an lvalue.
!
6.4/4 says
the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token"
The lexer uses what's generally called a "maximum munch" algorithm to create tokens. That means as it's reading characters in, it keeps reading characters until it encounters something that can't be part of the same token as what it already has (e.g., if it's been reading digits so what it has is a number, if it encounters an A, it knows that can't be part of the number. so it stops and leaves the A in the input buffer to use as the beginning of the next token). It then returns that token to the parser.
In this case, that means +++++ gets lexed as a ++ ++ + b. Since the first post-increment yields an rvalue, the second can't be applied to it, and the compiler gives an error.
Just FWIW, in C++ you can overload operator++ to yield an lvalue, which allows this to work. For example:
struct bad_code {
bad_code &operator++(int) {
return *this;
}
int operator+(bad_code const &other) {
return 1;
}
};
int main() {
bad_code a, b;
int c = a+++++b;
return 0;
}
The compiles and runs (though it does nothing) with the C++ compilers I have handy (VC++, g++, Comeau).
This exact example is covered in the draft C99 standard(same details in C11) section 6.4 Lexical elements paragraph 4 which in says:
If the input stream has been parsed into preprocessing tokens up to a
given character, the next preprocessing token is the longest sequence
of characters that could constitute a preprocessing token. [...]
which is also known as the maximal munch rule which is used in in lexical analysis to avoid ambiguities and works by taking as many elements as it can to form a valid token.
the paragraph also has two examples the second one is an exact match for you question and is as follows:
EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y, which
violates a constraint on increment operators, even though the parse x
++ + ++ y might yield a correct expression.
which tells us that:
a+++++b
will be parsed as:
a ++ ++ + b
which violates the constraints on post increment since the result of the first post increment is an rvalue and post increment requires an lvalue. This is covered in section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine):
The operand of the postfix increment or decrement operator shall have
qualified or unqualified real or pointer type and shall be a
modifiable lvalue.
and
The result of the postfix ++ operator is the value of the operand.
The book C++ Gotchas also covers this case in Gotcha #17 Maximal Munch Problems it is the same problem in C++ as well and it also gives some examples. It explains that when dealing with the following set of characters:
->*
the lexical analyzer can do one of three things:
Treat it as three tokens: -, > and *
Treat it as two tokens: -> and *
Treat it as one token: ->*
The maximal munch rule allows it to avoid these ambiguities. The author points out that it (In the C++ context):
solves many more problems than it causes, but in two common
situations, it’s an annoyance.
The first example would be templates whose template arguments are also templates (which was solved in C++11), for example:
list<vector<string>> lovos; // error!
^^
Which interprets the closing angle brackets as the shift operator, and so a space is required to disambiguate:
list< vector<string> > lovos;
^
The second case involves default arguments for pointers, for example:
void process( const char *= 0 ); // error!
^^
would be interpreted as *= assignment operator, the solution in this case is to name the parameters in the declaration.
Your compiler desperately tries to parse a+++++b, and interprets it as (a++)++ +b. Now, the result of the post-increment (a++) is not an lvalue, i.e. it can't be post-incremented again.
Please don't ever write such code in production quality programs. Think about the poor fellow coming after you who needs to interpret your code.
(a++)++ +b
a++ returns the previous value, a rvalue. You can't increment this.
Because it causes undefined behaviour.
Which one is it?
c = (a++)++ + b
c = (a) + ++(++b)
c = (a++) + (++b)
Yeah, neither you nor the compiler know it.
EDIT:
The real reason is the one as said by the others:
It gets interpreted as (a++)++ + b.
but post increment requires a lvalue (which is a variable with a name) but (a++) returns a rvalue which cannot be incremented thus leading to the error message you get.
Thx to the others to pointing this out.
I think the compiler sees it as
c = ((a++)++)+b
++ has to have as an operand a value that can be modified. a is a value that can be modified. a++ however is an 'rvalue', it cannot be modified.
By the way the error I see on GCC C is the same, but differently-worded: lvalue required as increment operand.
Follow this precesion order
1.++ (pre increment)
2.+ -(addition or subtraction)
3."x"+ "y"add both the sequence
int a = 5,b = 2;
printf("%d",a++ + ++b); //a is 5 since it is post increment b is 3 pre increment
return 0; //it is 5+3=8

Using postfix increment in an L-value [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
In what order does evaluation of post-increment operator happen?
Consider the following snippet(in C):
uint8_t index = 10;
uint8_t arr[20];
arr[index++] = index;
When I compile this with gcc, it sets arr[10] to 10, which means that the postfix increment isn't being applied until after the entire assignment expression. I found this somewhat surprising, as I was expecting the increment to return the original value(10) and then increment to 11, thereby setting arr[10] to 11.
I've seen lots of other posts about increment operators in RValues, but not in LValue expressions.
Thanks.
Some standard language:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
2 Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)
3 The grouping of operators and operands is indicated by the syntax.74) Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
Paragraph 2 explicitly renders expressions of the form a[i++] = i undefined; the prior value of i isn't just being read to determine the result of i++. Thus, any result is allowed.
Beyond that, you cannot rely on the side effect of the ++ operator to be applied immediately after the expression is evaluated. For an expression like
a[i++] = j++ * ++k
the only guarantee is that the result of the expression j++ * ++k is assigned to the result of the expression a[i++]; however, each of the subexpressions a[i++], j++, and ++k may be evaluated in any order, and the side effects (assigning to a[i], updating i, updating j, and updating k) may be applied in any order.
The line
arr[index++] = index;
causes undefined behaviour. Read the C standard for more details.
The essence you should know is: You should never read and change a variable in the same statement.
Assignment operation works from right to left, i.e. right part of expression calculated first and then it is assigned to the left part.

Resources