Why are the results of GCC and Clang different for the following code? - c

I got different results for the following code with gcc and clang. I believe it is not a serious bug, but I wonder which result is more consistent with the standard? Thanks a lot for your reply.
I use gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0 and clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
#include <stdio.h>
int get_1(){
    printf("get_1\n");
    return 1;
}
int get_2(){
    printf("get_2\n");
    return 2;
}
int get_3(){
    printf("get_3\n");
    return 3;
}
int get_4(){
    printf("get_4\n");
    return 4;
}
int main(int argc, char *argv[])
{
    printf("%d\n",get_1() + get_2() - (get_3(), get_4()));
    return 0;
}
the result of gcc is
get_3
get_1
get_2
get_4
-1
and the result of clang is
get_1
get_2
get_3
get_4
-1

C does not impose an order on the evaluation of the operands of most operators. The C standard imposes order through sequence points. Where a sequence point is present, a conforming implementation must finish evaluating everything to the left of the sequence point before it starts evaluating what is on the right. The + and - operators do not introduce a sequence point. Here is the definition from 5.1.2.3 p2:
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
In your expression
get_1() + get_2() - (get_3(), get_4())
you have the +, - and the comma , operator. Only the comma imposes an order of evaluation; the + and - do not.

The , between get_3() and get_4() is the only sequence point that orders the get_x calls relative to one another in printf("%d\n",get_1() + get_2() - (get_3(), get_4()));. The get_x calls can happen in any order the compiler chooses, as long as get_3() happens before get_4().
You're seeing the result of unspecified behaviour.

There are two different but related terms at play: operator precedence and order of evaluation.
Operator precedence dictates parsing order:
In your expression, the parentheses have highest precedence, so what's inside them belongs together.
Next we have the function call operators (). Nothing strange there, they are postfix and belong to their operand, the function name.
Next up we have binary + and - operators. They belong to the same operator group "additive operators" and have the same precedence. When this happens, operator associativity for operators of that group decides in which order they should be parsed.
For additive operators, the operator associativity is left-to-right. Meaning that the expression is guaranteed to be parsed as (get_1() + get_2()) - ....
And finally we have the oddball comma operator, with lowest precedence of all.
Once the operator precedence is sorted out as per above, we know which operands belong to which operators. But this says nothing about the order in which the expression will get executed. That is where order of evaluation comes in.
Generally C says, in dry standard terms:
Except as specified later, side effects and value computations of subexpressions are unsequenced.
In plain English this means that the order of evaluation of operands is unspecified, for the most part, with some special exceptions.
For the additive operators + and -, this is true. Given a + b we cannot know if a or b will get executed first. The order of evaluation is unspecified - the compiler may execute it in any order it pleases, need not document how, and need not even behave consistently from case to case.
This is intentionally left unspecified by the C standard, to allow different compilers to parse and evaluate expressions differently, essentially allowing them to keep their expression tree algorithm a compiler trade secret, so that some compilers can produce more efficient code than others on a free market.
And this is why gcc and clang give different results. You have written code that relies on the order of evaluation. This is no fault of either compiler - we should simply not write programs that rely on poorly-specified behavior. If you have to execute those functions in a certain order, you should split them up over several lines/expressions.
As for the comma operator, it is one of the rare special cases. It comes with a built-in "sequence point" which guarantees that the left operand is always evaluated (executed) before the right. Other such special cases are && || operators and the ?: operator.

Related

Operators in the C language with the same precedence level?

C programming language documentation Precedence and order of evaluation states:
The direction of evaluation does not affect the results of expressions that include more than one multiplication (*), addition (+), or binary-bitwise (&, |, or ^) operator at the same level. Order of operations is not defined by the language.
What exactly does the above mean (perhaps a code example will help)?
That page is not particularly well-written.
Precedence determines which operators are grouped with which operands in an expression - it does not dictate the order in which subexpressions are evaluated. For example, in the expression a + b * c, the * operator has higher precedence than the + operator, so the expression is parsed as a + (b * c) - the result of a is added to the result of b * c.
However, each of the expressions a, b, and c may be evaluated in any order, even simultaneously (interleaved or in parallel). There’s no requirement that b be evaluated before c or that either must be evaluated before a.
Associativity also affects grouping of operators and operands when you have multiple operators of the same precedence - the expression a + b + c is parsed as (a + b) + c because the + operator (along with the other arithmetic operators) is left-associative. The result of a + b is added to the result of c.
But like with precedence above, this does not control order of evaluation. Again, each of a, b, and c may be evaluated in any order.
The only operators which force left-to-right evaluation of their operands are the &&, ||, ?:, and comma operators.
From the comments:
Because the operands b and c accompany the * (multiplication) operator which has higher precedence than the + (addition operator), then isn't it required (always) that both b and c be evaluated first before a?
The only requirement is that the result of b * c be known before it can be added to the result of a. It doesn’t mean that b * c must be evaluated before a:
t0 <- a
t1 <- b * c
t2 <- t1 + t0
Again, precedence only controls the grouping of operators and operands, not the order in which subexpressions are evaluated.
I assume that what the cited documentation is trying to say is that given the code
a = f1() + f2() + f3();
or
b = f1() * f2() * f3();
we do not know which of the functions f1, f2, or f3 will be called first.
However, it is guaranteed that the result of calling f1 will be added to the result of calling f2, and that this intermediate sum will then be added to the result of calling f3. Similarly for the multiplications involved in computing b. These aspects of the evaluation order are guaranteed due to the left-associativity of addition and multiplication. That is, the results (both the defined and the unspecified aspects) are the same as if the expressions had been written
a = (f1() + f2()) + f3();
and
b = (f1() * f2()) * f3();
Upon reading the cited documentation, however, I fear that I may be wrong. It's possible that the cited documentation is simply wrong, in that it seems to be suggesting that the +, *, &, |, and ^ are somehow an exception to the associativity rules, and that the defined left-associativity is somehow not applicable. That's nonsense, of course: left-associativity is just as real when applied to these operators as it is when applied to, say, - and /.
To explain: If we write
10 - 5 - 2
it is unquestionably equivalent to
(10 - 5) - 2
and therefore results in 3. It is not equivalent to
10 - (5 - 2)
and the result is therefore not 7. Subtraction is not commutative and not associative, so the order you do things in almost always matters.
In real mathematics, of course, addition and multiplication are fully commutative and associative, meaning that you can mix things up almost any which way and still get the same result. But what's not as well known is that computer arithmetic is different enough from "real" mathematics that not all of the rules actually apply; associativity, in particular, can fail.
Consider the operation
-100000000 + 2000000000 + 200000000
If it's evaluated the way I've said it has to be, it's
(-100000000 + 2000000000) + 200000000
which is
1900000000 + 200000000
which is 2100000000, which is fine.
If someone (or some compiler) chose to evaluate it the way I've said it couldn't be evaluated, on the other hand, it might come out as
-100000000 + (2000000000 + 200000000) /* WRONG */
which is
-100000000 + 2200000000
which is... wait a minute. We're in trouble already. 2200000000 needs 32 bits just for its magnitude, which means it can't be represented as a 32-bit signed integer (the maximum is 2147483647).
In other words, this is an example of an expression which, if you evaluate it in the wrong order, can overflow, and theoretically become undefined.
Similar things can happen with floating-point arithmetic. The expression
1.2e-50 * 3.4e300 * 5.6e20
will overflow (exceed the maximum value of a double, which is about 1.8e308 for an IEEE-754 double) if the second multiplication wrongly happens first. The expression
2.3e100 * 4.5e-200 * 6.7e-200
will underflow (to zero, falling below the smallest positive value a double can represent) if the second multiplication happens first.
The point I'm trying to make here is that computer addition and multiplication are not quite associative, meaning that a compiler should not regroup them. If a compiler does (as the cited documentation seems to, wrongly, claim is possible), you, the programmer, can see results which are significantly and wrongly different from what the C Standard said you were allowed to expect.
I hope this all makes some kind of sense, although in closing, I should perhaps suggest that it's not necessarily as unambiguous and clear-cut as I've made it sound. I believe what I've described (that is, the strict left-to-right grouping, and the non-associativity, of multiplication and addition) is what's formally required by the current C standards, and by IEEE-754. However, I'm not sure they've been required by all versions of the C Standard, and I don't believe they were clearly guaranteed by Ritchie's original definition of C, either. They're not guaranteed by all C compilers, they're not depended upon or cared about by all C programmers, and they're not appreciated by people who write documentation like that cited in this thread.
(Also, for those really interested in fine points, regrouping integer addition as if it were associative is not quite so wrong, or at least not visibly/detectably wrong, if you know you're compiling for a processor that quietly wraps around on signed integer overflow.)

Using several increment/decrement operators in the same statement

I know that the order of evaluation in C is not strict, so the value of the expression --a + ++a is undefined because it's unknown which part of the expression runs first.
But what if I know that the order of evaluation is irrelevant in a particular case? For example:
All modifications correspond to different variables (like in a[p1++] = b[p2++])
Order does not matter, like in a++ + ++a - the result is two no matter which side of + is calculated first. Is it guaranteed that one of the parts will be calculated fully before the other runs? I.e., is the compiler forbidden from remembering the result of a++ and the result of ++a separately and then applying a++ first, getting one instead of two? For example, by caching the initial value of a and passing it as an argument to the two operators independently.
I'm interested in answers about C, C99, C11, C++03 and C++11, if there is any difference between all of them.
The standard says:
Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be accessed only to
determine the value to be stored. /26/
Except as indicated by the syntax /27/ or otherwise specified later
(for the function-call operator () , && , || , ?: , and comma
operators), the order of evaluation of subexpressions and the order in
which side effects take place are both unspecified.
So:
1.) a[p1++] = b[p2++]: It is guaranteed that the statement is evaluated correctly and gives the expected result. This is because each variable is modified only once and the result does not depend on when the increments of the two variables actually take place.
2.) a++ + ++a: The behavior is undefined, because a is modified twice without an intervening sequence point. In particular, it is not guaranteed that one side effect (increment) is performed before the other use of a. Hence this expression can give the value a + (a+1) or (a+1) + (a+1) or a + (a+2), or anything else, depending on when your compiler performs the side effect increments of the original variable.
Online C 2011 standard:
6.5 Expressions
...
3 The grouping of operators and operands is indicated by the syntax.85) Except as specified
later, side effects and value computations of subexpressions are unsequenced.86)
85) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ? : (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
86) In an expression that is evaluated more than once during the execution of a program, unsequenced and
indeterminately sequenced evaluations of its subexpressions need not be performed consistently in
different evaluations.
Emphasis added.
There's no guarantee that the side effect of either a++ or ++a is applied before the other expression is evaluated, so you can get different results depending on the sequence of operations.
Here are several cases, assuming a starts out at 1:
Left to right evaluation, side effects applied immediately: (1) + (2+1) == 4
Left to right evaluation, side effects deferred: (1) + (1+1) == 3
Right to left evaluation, side effects applied immediately: (2) + (1+1) == 4
Right to left evaluation, side effects deferred: (1) + (1+1) == 3
Or any other combination.

C Programming : Confusion between operator precedence

I am confused between precedence of operators and want to know how this statement would be evaluated.
#include <stdio.h>
int main()
{
    int k=35;
    printf("%d %d %d",k==35,k=50,k>40);
    return 0;
}
Here k initially has the value 35. When I test k in printf, I expect:
k>40 should be checked which should result in 0
k==35 should be checked and which should result in 1
Lastly, 50 should be assigned to k, which should output 50
So final output should be 1 50 0, but output is 0 50 1.
You cannot rely on the output of this program, since it has undefined behavior. The evaluation order is not specified in C, which allows the compiler to optimize better. From the C99 draft standard, section 6.5 paragraph 3:
The grouping of operators and operands is indicated by the syntax.74) Except as specified
later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
It is also undefined because you are reading the value of k and assigning to it without an intervening sequence point. From the draft standard, section 6.5 paragraph 2:
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
It cites the following code examples as being undefined:
i = ++i + 1;
a[i++] = i;
Update
There was a comment asking whether the commas in the function call act as sequence points. Section 6.5.17 Comma operator, paragraph 2, says:
The left operand of a comma operator is evaluated as a void expression; there is a
sequence point after its evaluation.
but paragraph 3 says:
EXAMPLE As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers).
So in this case the comma does not introduce a sequence point.
The order in which function arguments are evaluated is not specified. They can be evaluated in any order. The compiler decides.
This is undefined behaviour.
You may get any value, because there is no sequence point between the evaluations of the arguments. Increase the warning level (for example, -Wall in GCC) and you will get warning: operation on ‘k’ may be undefined.

C order of operations -- foo() + bar() -- must foo be called before bar?

In the following code:
int foo();
int bar();
int i;
i = foo() + bar();
Is it guaranteed by the C standard that foo is called before bar is called?
No, there's no sequence point with +. There's actually a quote on the Wikipedia page about it that answers your question:
Consider two functions f() and g(). In C and C++, the + operator is not associated with a sequence point, and therefore in the expression f()+g() it is possible that either f() or g() will be executed first.
http://en.wikipedia.org/wiki/Sequence_points
It's unspecified, and in the case of C99 the relevant quotation is 6.5/3:
Except as specified later (for the function-call (), &&, ||, ?:, and
comma operators), the order of evaluation of subexpressions and the
order in which side effects take place are both unspecified.
In your example, foo() and bar() are subexpressions of the full expression i = foo() + bar().
The "later" for function calls isn't directly relevant here, but for reference it is 6.5.2.2/10:
The order of evaluation of the function designator, the actual
arguments, and subexpressions within the actual arguments is
unspecified, but there is a sequence point before the actual call.
For && it's 6.5.13/4:
Unlike the bitwise binary & operator, the && operator guarantees
left-to-right evaluation; there is a sequence point after the
evaluation of the first operand.
Since + is not in the list of operators at the top, && and + are "unlike" in the same way that && and & are "unlike", and this is precisely the thing you're asking about. Unlike &&, + does not guarantee left-to-right evaluation.
No, it is not. The evaluation order of function arguments and operator operands is unspecified.
The standard only guarantees that the executions of foo and bar do not interleave; interleaving can happen when evaluating subexpressions that contain no function calls.
No, this is not defined. From K & R page 200:
the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation.
Page 205 of K & R describes the additive operators, and doesn't define the order of evaluation of the two operands.
The correct answer where I work is "If the order is important, the code is unmaintainable regardless of what the standard says will happen". If you must have foo() evaluated before bar(), explicitly evaluate foo() before bar(). The basis for this is not every programmer knows the standards, and those that do don't know if the original author did.

Will a+b+c be evaluated like this: a+c+b?

As we all know, the sequence of evaluation is determined by precedence and associativity.
For this example, associativity determines that a+b is computed first, and then the result is added to c. This is what an ANSI C compliant compiler does (leaving optimization aside). But could it be evaluated in the manner shown in the title? In which compiler? In K&R C?
Let me throw this at you:
Operator Precedence vs Order of Evaluation
The compiler is free to rearrange things, as long as the end result is the same.
For example:
1 + b + 1
Can easily be transformed to:
b + 2
The structure of the equation in mathematical terms (in a+(b*c) we talk about b*c being evaluated "first") is not necessarily related to the order in which the compiler will evaluate the operands.
The actual order of execution in this instance is unspecified IIRC. C only guarantees that the order of expressions separated by sequence points remains unchanged, and the + operator does not introduce a sequence point.
Most compilers will do what you expect - generating code that will evaluate a then b then c.
n1256:
6.5 Expressions
...
3 The grouping of operators and operands is indicated by the syntax.74) Except as specified
later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation
of subexpressions and the order in which side effects take place are both unspecified.
...
74) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ?: (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
Emphasis mine. The expression a + b + c will be evaluated as (a + b) + c; that is, the result of c will be added to the result of a + b. Both a and b must be evaluated before a + b can be evaluated, but a, b, and c can be evaluated in any order.
a + b + c == c + b + a
The order does not matter.
It is called operator precedence
I'll try and highlight the difference between what you consider to be the order of evaluation and what the compiler considers it to be.
Mathematically we say that in the expression a + b * c, the multiplication is evaluated before the addition. Of course it must be because we need to know what to add to a.
However, the compiler doesn't necessarily have to consider evaluating the expression b * c before it evaluates a. You might think that because multiplication has a higher precedence, then the compiler will look at that part of the expression first. Actually, there is no guarantee about what the compiler will decide to do first. It may evaluate a first, or b, or c. This behaviour is unspecified by the standard.
To demonstrate, let's look at the following code:
#include <iostream>
int f() { std::cout << "f\n"; return 1; }
int g() { std::cout << "g\n"; return 2; }
int h() { std::cout << "h\n"; return 3; }
int main(int argc, const char* argv[])
{
    int x = f() + g() * h();
    std::cout << x << std::endl;
    return 0;
}
Each function, f(), g() and h(), simply outputs the name of the function to the standard output and then returns 1, 2 or 3 respectively.
When the program starts, we initialise a variable x to be f() + g() * h(). This is exactly the expression we looked at earlier. The answer will of course be 7. Now, naively you might assume that multiplication happens first, so it'll go there and it'll do g(), then multiply it by h(), then it'll do f() and add it to the previous result.
Actually, compiling this with GCC 4.4.5 shows me that the functions are executed in the order that they appear in the expression: f(), then g(), then h(). This isn't something that will necessarily happen the same in all compilers. It completely depends on how the compiler wants to do it.
If you're performing operations that are associative or commutative then the compiler is also free to swap around the mathematical groupings in the expression, but only if the result will be exactly the same. The compiler must be careful not to do any regroupings that may cause overflows which wouldn't have happened otherwise. As long as the result is as defined by the standard, the compiler is free to do what it likes.

Resources