I have a simple piece of C code and am quite confused about expressions containing the comma (,) operator.
#include <stdio.h>
int main(void){
    int i=0,j=11,c;
    c=i=j,++i;
    printf("c=%d i=%d\n",c,i);
    c=(i=j,++i);
    printf("c=%d i=%d\n",c,i);
    return 0;
}
The above code prints:
c=11 i=12
c=12 i=12
My questions are:
What is the actual work of comma(,) as an operator?
++ has higher precedence than , and =, so why is the expression on the left of the comma evaluated first?
What will be the order if an expression contains operators with different priority, will it depend on comma(,)?
Is it behaving like a substitute of semicolon(;)?
The assignment operator has a higher priority than the comma operator.
Thus the expression
c = i = j, ++i;
is equivalent to
( c = i = j ), ++i;
According to the C Standard (6.5.17 Comma operator)
2 The left operand of a comma operator is evaluated as a void
expression; there is a sequence point between its evaluation and that
of the right operand. Then the right operand is evaluated; the result
has its type and value.
In the expression above the result of the comma operator is discarded but it has a side effect of increasing i.
In this expression
c = ( i = j, ++i );
the parentheses change how the operands are grouped, so the whole comma expression becomes the right operand of the assignment. It is now equivalent to
c = ( ( i = j ), ++i );
and variable c gets the value of expression ++i according to the quote from the C Standard listed above.
The comma operator evaluates several expressions in order and yields only the result of the last one.
So for c=i=j,++i; : c=i=j is evaluated first, then ++i, and the result of ++i is the value of the whole expression (which is not used).
And for c=(i=j,++i);, the parentheses group i=j,++i together: i=j is evaluated, then ++i, and then the result of (i=j, ++i), which is the result of the last expression, i.e. ++i, is assigned to c.
So the behavior of the comma is not really the same as that of the semicolon, although you can use it as a substitute, as in c=i=j,++i;.
Personally, I do not encourage using this operator, since it makes code less readable and less maintainable.
What is the actual work of comma(,) as an operator?
The comma operator is mainly a superfluous feature. See this for a description of how it works.
++ has higher precedence than , and =, so why is the expression on the left of the comma evaluated first?
What will be the order if an expression contains operators with different priority, will it depend on comma(,)?
The left operand is evaluated for side effects. The result of the comma operator is the result of the evaluated right operand. Note that the comma operator has the lowest precedence of all operators in C.
Is it behaving like a substitute of semicolon(;)?
Kind of, yeah. Both a semicolon and the comma operator include a sequence point. The difference is that the comma operator isn't the end of a statement, so it can be squeezed in with other operators on the same line, and it also returns a result.
There is really no reason why you ever would want to do this though. The main use of the comma operator is to obfuscate code, it should be avoided. The only reason why you need to learn how it works, is because you might encounter crap code containing it.
For example, your nonsense code should be rewritten into something more readable and safe:
#include <stdio.h>
int main(void){
int i=0;
int j=11;
int c;
i=j;
c=j;
i++;
printf("c=%d i=%d\n",c,i);
i=j;
i++;
c=i;
printf("c=%d i=%d\n",c,i);
return 0;
}
Well let's split it. In the first case c and i take the value of j => c=i=j=11; then you increment i => i=12; So the code is equivalent to this
c = j;
i = j;
++i;
For the second case i takes the value of j => i=j=11 and then you increment i => i=12 and then c takes the value of i => c = 12;
So the code is equivalent to this:
i = j;
++i;
c = i;
The comma operator evaluates each operand in turn and discards every result except that of the final operand. This allows any number of operations that are needed only for their side effects to be written together on a single line where only the last result is of interest.
Think of it this way: if you have several loop variables to update at one point in a loop, you could put each addition, subtraction, etc. on its own line, but why? Their order relative to each other does not (within reason) affect the operation of the code, so they can be written on a single line with no adverse effect.
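As a rough illustration of the loop case described above, here is a minimal sketch. The function name reverse_in_place is made up for this example. Note that only the comma in the update clause (++lo, --hi) is the comma operator; the comma in the init clause separates two declarators.

```c
#include <string.h>

/* Hypothetical helper (not from the original post): reverses s in place. */
void reverse_in_place(char *s)
{
    size_t len = strlen(s);
    if (len < 2)
        return;
    /* "size_t lo = 0, hi = len - 1" is a declaration, so that comma is a
       separator. "++lo, --hi" is a comma expression: both updates run,
       left to right, and the resulting value is simply ignored. */
    for (size_t lo = 0, hi = len - 1; lo < hi; ++lo, --hi) {
        char tmp = s[lo];
        s[lo] = s[hi];
        s[hi] = tmp;
    }
}
```

Calling it on a writable buffer such as char buf[] = "abcd"; leaves "dcba" in buf. The same loop could be written with the two updates as separate statements at the end of the body; the comma form just keeps them in the loop header.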
I just wonder if, for the following code, the compiler uses associativity/precedence alone or some other logic to evaluate.
int i = 0, k = 0;
i = k++;
If we evaluate based on associativity and precedence, postfix ++ has higher precedence than =, so k++(which becomes 1) is evaluated first and then comes =, now the value of k which is 1 is assigned to i.
So the value of i and k would be 1. However, the value of i is 0 and k is 1.
So I think that the compiler splits this i = k++; into two (i = k; k++;). So here compiler is not going for the statements associativity/precedence, it splits the line as well. Can someone explain how the compiler resolves these kinds of statements?
++ does two separate things.
k++ does two things:
It has the value of k before any increment is performed.
It increments k.
These are separate:
Producing the value of k occurs as part of the main evaluation of i = k++;.
Incrementing k is a side effect. It is not part of the main evaluation. The program may increment the value of k after evaluating the rest of the expression or during it. It may even increment the value before the rest of the expression, as long as it “remembers” the pre-increment value to use for the expression.
Precedence and associativity are not involved.
This effectively has nothing to do with precedence or associativity. The increment part of a ++ operator is always separate from the main evaluation of an expression. The value used for k++ is always the value of k before the increment regardless of what other operators are present.
Supplement
It is important to understand that the increment part of ++ is detached from the main evaluation and is sort of “floating around” in time–it is not anchored to a certain spot in the code, and you do not control when it occurs. This is important because if there is another use or modification of the operand, such as in k * k++, the increment can occur before, during, or after the main evaluation of the other occurrence. When this happens, the C standard does not define the behavior of the program.
Postfix operators have higher precedence than assignment operators.
This expression with the assignment operator
i = k++
contains two operands.
It can equivalently be rewritten as
i = ( k++ );
The value of the expression k++ is 0. So the variable i will get the value 0.
The operands of the assignment operator can be evaluated in any order.
According to the C Standard (6.5.2.4 Postfix increment and decrement operators)
2 The result of the postfix ++ operator is the value of the operand.
As a side effect, the value of the operand object is incremented (that
is, the value 1 of the appropriate type is added to it).
And (6.5.16 Assignment operators)
3 An assignment operator stores a value in the object designated by
the left operand. An assignment expression has the value of the left
operand after the assignment, but is not an lvalue. The type of an
assignment expression is the type the left operand would have after
lvalue conversion. The side effect of updating the stored value of
the left operand is sequenced after the value computations of the left
and right operands. The evaluations of the operands are unsequenced.
Unlike C++, C does not have "pass by reference". Only "pass by value". I'm going to borrow some C++ to explain. Let's implement the functionality of ++ for both postfix and prefix as regular functions:
// Same as ++x
int inc_prefix(int &x) { // & is for pass by reference
x += 1;
return x;
}
// Same as x++
int inc_postfix(int &x) {
int tmp = x;
x += 1;
return tmp;
}
So your code is now equivalent to:
i = inc_postfix(k);
EDIT:
It's not completely equivalent for more complex things. Function calls introduce sequence points, for instance. But the above is enough to explain what happens for OP.
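Since the question is about C, the same idea can be sketched in plain C by passing a pointer instead of a reference. This is only an illustration under that assumption; the function names mirror the C++ sketch and are not part of any library.

```c
/* Roughly what x++ does: remember the old value, update the object,
   and hand back the old value. */
int inc_postfix(int *x)
{
    int old = *x;
    *x += 1;
    return old;
}

/* Roughly what ++x does: update the object and hand back the new value. */
int inc_prefix(int *x)
{
    *x += 1;
    return *x;
}
```

With int k = 0;, the call i = inc_postfix(&k); leaves i at 0 and k at 1, which matches what i = k++; does. As noted, the function call adds sequence points that the built-in operator does not have.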
It's similar to (only with an additional sequence point for illustration):
i = k; // i = 0
k = k + 1; // k = 1
Operator associativity doesn't apply here. Operator precedence merely states which operands bind to which operator. It's not particularly relevant in this case; it just says that the expression should be parsed as i = (k++); and not as (i = k)++; which wouldn't make any sense.
From there on, how this expression is evaluated/executed is specified by specific rules for each operator. The postfix operator is specified to behave as (6.5.2.4):
The value computation of the result is sequenced before the side effect of
updating the stored value of the operand.
That is, k++ is guaranteed to evaluate to 0, and at some later point k is increased by 1. We don't really know when, only that it happens after k++ is evaluated but before the next sequence point, in this case the ; at the end of the line.
The assignment operator behaves as (6.5.16):
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.
In this case, the right operand of = has its value computed before updating the left operand.
In practice, this means that the executable can look as either this:
k is evaluated to 0
set i to 0
increase k by 1
semicolon/sequence point
Or this:
k is evaluated to 0
increase k by 1
set i to 0
semicolon/sequence point
Precedence and associativity only affect how operators and operands are associated with each other - they do not affect the order in which expressions are evaluated. Precedence rules dictate that
i = k++
is parsed as
i = (k++)
instead of something like
(i = k)++
The postfix ++ operator has a result and a side effect. In the expression
i = k++
the result of k++ is the current value of k, which gets assigned to i. The side effect is to increment k.
It's logically equivalent to writing
tmp = k
i = tmp
k = k + 1
with the caveat that the assignment to i and the update to k can happen in any order - the operations can even be interleaved with each other. What matters is that i gets the value of k before the increment and that k gets incremented, not necessarily the order in which those operations occur.
The fundamental issue here is that precedence is not the right way to think about what
i = k++;
means.
Let's talk about what k++ actually means. The definition of k++ is that it gives you the old value of k, and then adds 1 to the stored value of k. (Or, stated another way, it takes the old value of k, plus 1, and stores it back into k, while giving you the old value of k.)
As far as the rest of the expression is concerned, the important thing is what the value of k++ is. So when you say
i = k++;
the answer to the question of "What gets stored in i?" is, "The old value of k".
When we answer the question of "What gets stored in i?", we don't think about precedence at all. We think about the meaning of the postfix ++ operator.
See also this older question.
Postscript: The other thing you have to be really careful about is when you think about the side question, "When does it store the new value into k?" It turns out that's a really hard question to answer, because the answer is not as well defined as you might like. The new value gets stored back into k sometime before the end of the larger expression it's in (formally, "before the next sequence point"), but we don't know whether it happens before or after, say, the point at which the thing gets stored into i, or before or after other interesting points in the expression.
Ahh, this is quite an interesting question. To help you understand better, this is what actually happens.
I'm going to try to explain using a bit of operator overloading concepts from C++, so bear with me if you do not know C++.
This is how you would overload the postfix-increment operator:
int operator++(int) // Note that the 'int' parameter is just a C++ way of saying that this is the postfix and not prefix operator
{
int copy = *this; // *this just means the current object which is calling the function
*this += 1;
return copy;
}
Essentially what the postfix-increment operator does is that it creates a copy of the operand, increases the original variable, and then returns the copy.
In your case of i = k++, k++ is evaluated first, but the value it returns is the old value of k (think of it like a function call). This then gets assigned to i.
The order of evaluation in the expression puts("a") + puts("b") is unspecified.
This is because the C Standard does not specify whether the operands are evaluated left to right or right to left, so you could get
a
b
or
b
a
Is there a clean way to dictate the order of operations in an expression?
The only thing I can think of is to use a compound statement such as
({
int temp = puts("a");
temp += puts("b");
temp;
})
though this is non-portable and a little longer than I was hoping.
How could this best be achieved?
If you declare an int variable before the expression, you can force order portably with the comma operator while computing the sum inside an expression:
int temp;
...
(temp = puts("a"), temp + puts("b"))
As specified in the C Standard:
6.5.17 Comma operator
Syntax
expression:
assignment-expression
expression , assignment-expression
Semantics
The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand. Then the right operand is evaluated; the result has its type and value.
Note however that the value of the expression will not be very useful given the semantics of puts(), as commented by Jonathan Leffler.
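To make the ordering visible, here is a small sketch that swaps puts() for a hypothetical step() function which records the order of its calls in a buffer and returns a value worth summing. The names order_log, order_len, step and ordered_sum are all invented for this illustration.

```c
#include <stddef.h>

/* Records the order in which step() is called (zero-initialized). */
char order_log[8];
size_t order_len = 0;

/* Hypothetical stand-in for puts(): logs its tag and returns value. */
int step(char tag, int value)
{
    if (order_len < sizeof order_log - 1)
        order_log[order_len++] = tag;
    return value;
}

int ordered_sum(void)
{
    int temp;
    order_len = 0;
    /* The comma operator puts a sequence point between its operands,
       so step('a', ...) is always called before step('b', ...). */
    return (temp = step('a', 1), temp + step('b', 2));
}
```

After ordered_sum() returns 3, order_log starts with 'a' followed by 'b'. Writing step('a', 1) + step('b', 2) directly would give the same sum, but the call order would be up to the compiler.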
The most straightforward way to force the order of evaluation is to use separate statements.
Compilers can evaluate the operands in whatever order they see fit. So for f1() + f2() + f3(); any one of those functions could be called before the others. The only influence you have within that statement is over what is done with the values the functions return. So in short, just use separate statements. Most likely, for whatever you're doing, putting the calls in a loop should do fine.
Decent reference:
http://en.cppreference.com/w/c/language/eval_order
Why is the postfix increment operator (++) executed after the assignment (=) operator in the following example? According to the precedence/priority lists for operators ++ has higher priority than = and should therefore be executed first.
int a,b;
b = 2;
a = b++;
printf("%d\n",a);
will output 2, i.e. a = 2.
PS: I know the difference between ++b and b++ in principle, but just looking at the operator priorities, the precedence list tells us something different, namely that ++ should be executed before =.
++ is evaluated first. It is post-increment, meaning it evaluates to the value stored and then increments. Any operator on the right side of an assignment expression (except for the comma operator) is evaluated before the assignment itself.
It is. It's just that, conceptually at least, ++ happens after the entire expression a = b++ (which is an expression with value a) is evaluated.
Operator precedence and order of evaluation of operands are rather advanced topics in C, because there exist many operators that have their own special cases specified.
Postfix ++ is one such special case, specified by the standard in the following manner (6.5.2.4):
The value computation of the result is sequenced before the side
effect of updating the stored value of the operand.
It means that the compiler will translate the line a = b++; into something like this:
Read the value of b into a CPU register. ("value computation of the result")
Increase b. ("updating the stored value")
Store the CPU register value in a.
This is what makes postfix ++ different from prefix ++.
The increment operators do two things: add +1 to a number and return a value. The difference between post-increment and pre-increment is the order of these two steps. So the increment actually is executed first and the assignment later in any case.
I saw this in an exam and when I tried it out I was surprised. I tried it online and it works too. So I think it is the C language.
Why is that working? What is the use case for such an assignment syntax?
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
int i = (1,2,3,4,5);
printf("%d", i);
return 0;
}
These are not "multiple integers", but a comma operator. The whole parenthesised part is a single expression with each sub-expression (separated by commas) evaluated strictly from left to right. The results of all but the rightmost subexpression are ignored. The result of the whole expression is that of the last (rightmost) expression. Here it is the integer value 5.
Note that this operator is mostly used where only a single expression is allowed to add further side-effects. E.g. in a loop like:
int cnt = 0;
for ( const char *cp = "Hello" ; *cp != '\0' ; cp++, cnt++ ) ;
This counts the number of characters in a C-string, incrementing the pointer and cnt after each iteration. The results are ignored here.
So, this is in no way related to tuples or the like in Python. There are actually no cases where the use of this operator is unavoidable, so it should be used with caution, and some coding standards forbid it.
That's the comma operator at work. It evaluates the expression on its left-hand side, creates a sequence point, discards the value of the expression, then evaluates the expression on the right-hand side, and returns that as the value. When there are multiple expressions as in the example, then each is evaluated in turn and only the last is kept. With the sample, the compiler does the evaluation because every value is known at compile time. Note that the argument list to a function is not a use of the comma operator.
That isn't a valid use-case for the comma operator. What might be a more nearly valid use-case would be some operations with side-effects (such as function calls) that need to be sequenced and the final value assigned:
int i = (getchar(), getchar(), getchar());
This sets i to the third character in standard input, or EOF if there are not three characters left in the standard input to be read. Still not a realistic use-case, but better than assigning a list of constants.
In addition to the other answers, you need to watch for instances where a , is the comma operator as opposed to when it is a separator. For example, the following is invalid:
int i = 1,2,3,4,5;
In this case, the , is a separator between variable declarations. It declares i as an int and initializes it to 1, then it attempts to parse 2 as a variable name, which fails.
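In contrast to the declaration case, in an ordinary expression statement every comma is the comma operator, and the parentheses only decide what gets assigned. A minimal sketch, using a hypothetical next_value() counter so that each operand has a visible side effect:

```c
int counter = 0;

/* Hypothetical helper: returns 1, 2, 3, ... on successive calls. */
int next_value(void)
{
    return ++counter;
}

int first_of_three(void)
{
    int i;
    /* Groups as (i = next_value()), next_value(), next_value();
       so i receives the result of the FIRST call. */
    i = next_value(), next_value(), next_value();
    return i;
}

int last_of_three(void)
{
    int i;
    /* The parenthesized comma expression yields its rightmost operand,
       so i receives the result of the THIRD call. */
    i = (next_value(), next_value(), next_value());
    return i;
}
```

Starting from counter == 0, first_of_three() returns 1 and last_of_three() (after resetting counter to 0) returns 3; in both cases all three calls are made.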
It works because you're using the "comma operator", which evaluates the subexpressions on the left and right, and has the value from the right-hand expression.
So in (1,2,3,4,5), 1 is evaluated and the result is discarded, then 2,3,4,5... in which (because of the next comma) 2 is evaluated and the result discarded, then 3,4,5... in which 3 is evaluated and discarded, then 4,5... in which 4 is evaluated and discarded, then 5 which becomes the result of the expression.
As for when it's useful, mainly as a shortcut when you need to evaluate several (sub)expressions for their side effects but aren't interested in their values (except maybe the last one). It's sometimes convenient in for loop expressions, such as when incrementing two variables:
for (i=0,j=1; j < len; i++,j++) {
..where it appears in both the initialization expression and the loop expression.
Why is that working?
Because it's valid C syntax. The commas in (1,2,3,4,5) are comma operators.
C11: 6.5.17 Comma operator
Syntax
1 expression:
assignment-expression
expression , assignment-expression
Semantics
2 The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand. Then the right operand is evaluated; the result has its type and value.
What is the use case for such an assignment syntax?
See the example below
3 EXAMPLE As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call
f(a, (t=3, t+2), c)
the function has three arguments, the second of which has the value 5.
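The Standard's example can be tried out with a small sketch. The function second_of is a hypothetical stand-in for f that simply returns its second argument, and the values 10 and 20 stand in for a and c.

```c
/* Hypothetical stand-in for f: returns its second argument. */
int second_of(int a, int b, int c)
{
    (void)a;   /* unused on purpose */
    (void)c;   /* unused on purpose */
    return b;
}

int call_with_comma_argument(void)
{
    int t;
    /* Three arguments are passed: 10, the value of (t = 3, t + 2), and 20.
       The two outer commas are argument separators; only the comma inside
       the parentheses is the comma operator. */
    return second_of(10, (t = 3, t + 2), 20);
}
```

call_with_comma_argument() returns 5, the value of the parenthesized comma expression.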
After reading this answer about undefined behavior and sequence points, I wrote a small program:
#include <stdio.h>
int main(void) {
int i = 5;
i = (i, ++i, 1) + 1;
printf("%d\n", i);
return 0;
}
The output is 2. Oh God, I didn't see the decrement coming! What is happening here?
Also, while compiling the above code, I got a warning saying:
px.c:5:8: warning: left-hand operand of comma expression has no effect
[-Wunused-value] i = (i, ++i, 1) + 1;
^
Why? But probably it will be automatically answered by the answer of my first question.
In the expression (i, ++i, 1), the comma used is the comma operator
the comma operator (represented by the token ,) is a binary operator that evaluates its first operand and discards the result, and then evaluates the second operand and returns this value (and type).
Because it discards its first operand, it is generally only useful where the first operand has desirable side effects. If the first operand has no side effect, the compiler may generate a warning about an expression with no effect.
So, in the above expression, the leftmost i will be evaluated and its value will be discarded. Then ++i will be evaluated and will increment i by 1 and again the value of the expression ++i will be discarded, but the side effect to i is permanent. Then 1 will be evaluated and the value of the expression will be 1.
It is equivalent to
i; // Evaluate i and discard its value. This has no effect.
++i; // Evaluate i and increment it by 1 and discard the value of expression ++i
i = 1 + 1;
Note that the above expression is perfectly valid and does not invoke undefined behavior because there is a sequence point between the evaluation of the left and right operands of the comma operator.
Quoting from C11, chapter 6.5.17, Comma operator
The left operand of a comma operator is evaluated as a void expression; there is a
sequence point between its evaluation and that of the right operand. Then the right
operand is evaluated; the result has its type and value.
So, in your case,
(i, ++i, 1)
is evaluated as
i, gets evaluated as a void expression, value discarded
++i, gets evaluated as a void expression, value discarded
finally, 1, value returned.
So, the final statement looks like
i = 1 + 1;
and i gets to 2. I guess this answers both of your questions,
How i gets a value 2?
Why there is a warning message?
Note: FWIW, as there is a sequence point present after the evaluation of the left hand operand, an expression like (i, ++i, 1) won't invoke UB, as one may generally think by mistake.
i = (i, ++i, 1) + 1;
Let's analyse it step by step.
(i, // is evaluated but ignored, there are other expressions after comma
++i, // i is updated but the resulting value is ignored too
1) // this value is finally used
+ 1 // 1 is added to the previous value 1
So we obtain 2. And the final assignment now:
i = 2;
Whatever was in i before it's overwritten now.
The outcome of
(i, ++i, 1)
is
1
For
(i,++i,1)
the evaluation happens such that the , operator discards the evaluated value and will retain just the right most value which is 1
So
i = 1 + 1 = 2
You'll find some good reading on the wiki page for the Comma operator.
Basically, it
... evaluates its first operand and discards the result, and then evaluates the second operand and returns this value (and type).
This means that
(i, ++i, 1)
will, in turn, evaluate i, discard the result, evaluate ++i, discard the result, and then evaluate and return 1.
You need to know what the comma operator is doing here:
Your expression:
(i, ++i, 1)
The first expression, i, is evaluated, the second expression, ++i, is evaluated, and the third expression, 1, is returned for the whole expression.
So the result is: i = 1 + 1.
For your bonus question, as you see, the first expression i has no effect at all, so the compiler complains.
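The comma has a kind of 'inverse' precedence: it binds more loosely than every other operator, and the value of a comma expression is that of its last operand. This is what you will find in old books and C manuals from IBM (70s/80s). So the last 'command' is what is used in the parent expression.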
In modern C its use is strange but is very interesting in old C (ANSI):
do {
/* bla bla bla, consider conditional flow with several continue's */
} while ( prepAnything(), doSomethingElse(), logic_operation);
While all the operations (functions) are called from left to right, only the last expression is used as the result for the 'while' condition.
This avoids having to use 'goto's to keep a single block of commands that runs before the condition check.
EDIT: This also avoids a call to a helper function that would handle all the logic of the left operands and return the logical result. Remember that C did not have inline functions in the past, so this could avoid a call overhead.
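A compact, compilable version of that do/while pattern might look like the sketch below. The names note_iteration, log_calls and count_down are hypothetical and stand in for prepAnything(), doSomethingElse() and the logic operation.

```c
int log_calls = 0;

/* Hypothetical "prepare" step that must run before every condition check. */
void note_iteration(void)
{
    ++log_calls;
}

/* Runs the body at least once and returns how many times it ran. */
int count_down(int n)
{
    int iterations = 0;
    do {
        ++iterations;
        /* A 'continue' here would still reach the condition below. */
    } while (note_iteration(), --n, n > 0);  /* only n > 0 decides the loop */
    return iterations;
}
```

For count_down(3) the body runs three times and note_iteration() is called three times, once before each condition check. In modern C the same effect is usually clearer with a small inline function in the condition.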