Clarification of C Operator Associativity

I'm reading "The C Programming Language" book, and I came across the following line (paraphrasing):
"...in expressions like x = f() + g();
f() may be called before g() or vice versa. C doesn't specify the order
in which the operands will be evaluated..."
But, according to precedence and associativity rules, don't function calls have the highest precedence? And, since the associativity of the function call operator () is from "left to right", shouldn't f() be called before g() (definitely)?

But, according to precedence and associativity rules, don't function
calls have the highest precedence?
Function calls do have the highest precedence. That means that both f() and g() will be evaluated before + is applied. It does not mean that f() will be evaluated before or after g().
And, since the associativity of the function call operator () is from
"left to right", shouldn't f() be called before g() (definitely) ?
No, it shouldn't. The operator () has left-to-right associativity. This means that in the expression f(x)(y) (which you can write in C if f returns a pointer to another function), f(x) is evaluated first and the resulting function is then applied to y. It does not mean that f() will be evaluated before or after g().

Associativity and operator precedence only matter when you can write a construct that is ambiguous without them. In a + b * c, operator precedence requires b * c to be evaluated first. In a + b + c, associativity requires a + b to be evaluated first, when the program can tell the difference (such as when a, b, and c are floating point values).
In f() + g(), operator precedence does come into it a little; it says that f and g are the functions being called, as opposed to f and f() + g, which is what would happen if addition had higher precedence, and it also says that both function calls happen before the addition. Associativity, though, is irrelevant, because the addition is in the middle. I don't think it's even possible to write the equivalent of a + b + c with function-call operators, because the function-call operator is asymmetric.
Neither associativity nor operator precedence controls which function call is evaluated first. The only guarantee you have is that one of them is evaluated first — if there's any way for you to observe this, their execution can't be interleaved.

This is not really a question of either precedence or associativity in the way you refer to them.
Simply, f() and g() will be evaluated in whatever order before + is applied and then their results will be added to produce x.

Related

Operators in the C language with the same precedence level?

C programming language documentation Precedence and order of evaluation states:
The direction of evaluation does not affect the results of expressions that include more than one multiplication (*), addition (+), or binary-bitwise (&, |, or ^) operator at the same level. Order of operations is not defined by the language.
What exactly does the above mean (perhaps a code example will help)?
That page is not particularly well-written.
Precedence determines which operators are grouped with which operands in an expression - it does not dictate the order in which subexpressions are evaluated. For example, in the expression a + b * c, the * operator has higher precedence than the + operator, so the expression is parsed as a + (b * c) - the result of a is added to the result of b * c.
However, each of the expressions a, b, and c may be evaluated in any order, even simultaneously (interleaved or in parallel). There’s no requirement that b be evaluated before c or that either must be evaluated before a.
Associativity also affects grouping of operators and operands when you have multiple operators of the same precedence - the expression a + b + c is parsed as (a + b) + c because the + operator (along with the other arithmetic operators) is left-associative. The result of a + b is added to the result of c.
But like with precedence above, this does not control order of evaluation. Again, each of a, b, and c may be evaluated in any order.
The only operators which force left-to-right evaluation of their operands are the &&, ||, ?:, and comma operators.
From the comments:
Because the operands b and c accompany the * (multiplication) operator which has higher precedence than the + (addition operator), then isn't it required (always) that both b and c be evaluated first before a?
The only requirement is that the result of b * c be known before it can be added to the result of a. It doesn’t mean that b * c must be evaluated before a:
t0 <- a
t1 <- b * c
t2 <- t1 + t0
Again, precedence only controls the grouping of operators and operands, not the order in which subexpressions are evaluated.
I assume that what the cited documentation is trying to say is that given the code
a = f1() + f2() + f3();
or
b = f1() * f2() * f3();
we do not know which of the functions f1, f2, or f3 will be called first.
However, it is guaranteed that the result of calling f1 will be added to the result of calling f2, and that this intermediate sum will then be added to the result of calling f3. Similarly for the multiplications involved in computing b. These aspects of the evaluation order are guaranteed due to the left-associativity of addition and multiplication. That is, the results (both the defined and the unspecified aspects) are the same as if the expressions had been written
a = (f1() + f2()) + f3();
and
b = (f1() * f2()) * f3();
Upon reading the cited documentation, however, I fear that I may be wrong. It's possible that the cited documentation is simply wrong, in that it seems to be suggesting that the +, *, &, |, and ^ are somehow an exception to the associativity rules, and that the defined left-associativity is somehow not applicable. That's nonsense, of course: left-associativity is just as real when applied to these operators as it is when applied to, say, - and /.
To explain: If we write
10 - 5 - 2
it is unquestionably equivalent to
(10 - 5) - 2
and therefore results in 3. It is not equivalent to
10 - (5 - 2)
and the result is therefore not 7. Subtraction is not commutative and not associative, so the order you do things in almost always matters.
In real mathematics, of course, addition and multiplication are fully commutative and associative, meaning that you can mix things up almost any which way and still get the same result. But what's not as well known is that computer arithmetic differs enough from "real" mathematics that not all of the rules (in particular, associativity) actually apply.
Consider the operation
-100000000 + 2000000000 + 200000000
If it's evaluated the way I've said it has to be, it's
(-100000000 + 2000000000) + 200000000
which is
1900000000 + 200000000
which is 2100000000, which is fine.
If someone (or some compiler) chose to evaluate it the way I've said it couldn't be evaluated, on the other hand, it might come out as
-100000000 + (2000000000 + 200000000) /* WRONG */
which is
-100000000 + 2200000000
which is... wait a minute. We're in trouble already. 2200000000 needs 32 bits for its magnitude alone, which means it can't be represented as a positive 32-bit signed integer (the maximum is 2147483647).
In other words, this is an example of an expression which, if you evaluate it in the wrong order, can overflow, and signed integer overflow is formally undefined behavior.
Similar things can happen with floating-point arithmetic. The expression
1.2e-50 * 3.4e300 * 5.6e20
will overflow (exceed the maximum value of a double, which is about 1.8e308) if the second multiplication wrongly happens first. The expression
2.3e100 * 4.5e-200 * 6.7e-200
will underflow to zero (the product is smaller in magnitude than the smallest positive double) if the second multiplication happens first.
The point I'm trying to make here is that computer addition and multiplication are not quite associative, meaning that a compiler should not regroup them. If a compiler does (as the cited documentation seems to, wrongly, claim is possible), you, the programmer, can see results which are significantly and wrongly different from what the C Standard said you were allowed to expect.
I hope this all makes some kind of sense, although in closing, I should perhaps suggest that it's not necessarily as unambiguous and clear-cut as I've made it sound. I believe what I've described (that is, the strict left-to-right grouping of multiplication and addition, and the fact that on a computer they are not truly associative) is what's formally required by the current C standards, and by IEEE-754. However, I'm not sure they've been required by all versions of the C Standard, and I don't believe they were clearly guaranteed by Ritchie's original definition of C, either. They're not guaranteed by all C compilers, they're not depended upon or cared about by all C programmers, and they're not appreciated by people who write documentation like that cited in this thread.
(Also, for those really interested in fine points, regrouping integer addition as if it were associative is not quite so wrong, or at least not visibly or detectably wrong, if you know you're compiling for a processor that quietly wraps around on signed integer overflow.)

Beginner in need of a simple explanation of the difference between order of evaluation and precedence/associativity

I am reading the end of the 2nd chapter of K&R and I'm having some difficulty understanding two specific unrelated example lines of code (which follow) along with commentary of them in the book:
x = f() + g();
a[i] = i++;
FIRST LINE - I have no trouble understanding that the standard does not specify the order of evaluation for the + operator, and that therefore it is unspecified whether f() or g() evaluates first (and that is why I think the question isn't a duplicate). My confusion stems from the fact that if we look up the C operator precedence chart it cites function calls as of highest precedence with left-to-right associativity. Now doesn't that mean that f() has to be called/evaluated before g()? Obviously not, but I don't know what I am missing.
SECOND LINE - Again a similar conundrum regarding whether the array is indexed with the initial value of i or the incremented value. However, again the operator precedence chart cites array subscripting as of highest precedence with left-to-right associativity. Therefore wouldn't array subscripting be the first thing to be evaluated, causing the array to be subscripted with the initial value of i and removing any ambiguity? Obviously not, and I'm missing something.
I do understand that compilers have the freedom to decide when side effects happen in an expression (between sequence points of course) and that that may cause undefined behaviour if the variable in question is used again in the same expression, however in the examples above it seems that any ambiguity is cleared by function calls and array subscripting having highest precedence and defined left-to-right associativity, so I fail to see the ambiguity.
I have a feeling that I have some fundamental misconception about the concepts of associativity, operator precedence and order of evaluation, but I can't put my finger on what it is, and similar questions/answers on this topic were out of my league to understand thoroughly at this point.
FIRST LINE
The left-to-right associativity means that an expression such as f()()() is evaluated as ((f())())(). The associativity of the function call operator () says nothing about its relationship with other operators such as +.
(Note that associativity only really makes sense for nestable infix operators such as binary +, %, or ,. For operators such as function call or the unary ones, associativity is rather pointless in general.)
SECOND LINE
Operator precedence affects parsing, not order of evaluation. The fact that [] has higher precedence than = means that the expression is parsed as (a[i]) = (i++). It says very little about evaluation order; a[i] and i++ must both be evaluated before the assignment, but nothing is said about their order with respect to each other.
To hopefully clear up confusion:
Associativity controls parsing and tells you whether a + b + c is parsed as (a + b) + c (left-to-right) or as a + (b + c) (right-to-left).
Precedence also controls parsing and tells you whether a + b * c is parsed as (a + b) * c (+ has higher precedence than *) or as a + (b * c) (* has higher precedence than +).
Order of evaluation controls which values need to be evaluated in which order. Parts of it can follow from associativity or precedence (an operand must be evaluated before it's used), but it's seldom fully defined by them.
It's not really meaningful to say that function calls have left-to-right associativity, and even if it were meaningful, this would only apply to exotic combinations where two function-call operators were being applied right next to each other. It wouldn't say anything about two separate function calls on either side of a + operator.
Precedence and associativity don't help us at all in the expression a[i] = i++. There simply is no rule that says precisely when within an expression i++ stores the new result back into i, meaning that there is no rule to tell us whether the a[i] part uses the old or the new value. That's why this expression is undefined.
Precedence tells you what happens when you have two different operators that might apply. In a + b * c, does the + or the * apply first? In *p++, does the * or the ++ apply first? Precedence answers these questions.
Associativity tells you what happens when you have two of the same operators that might apply (generally, a string of the same operators in a row). In a + b + c, which + applies first? That's what associativity answers.
But the answers to these questions (that is, the answers supplied by the precedence and associativity rules) apply rather narrowly. They tell you which of the two operators you were wondering about apply first, but they do not tell you much of anything about the bigger expression, or about the smaller subexpressions "underneath" the operators you were wondering about. (For example, if I wrote (a - b) + (c - d) * (e - f), there's no rule to say which of the subtractions happens first.)
The bottom line is that precedence and associativity do not fully determine order of evaluation. Let's say that again in a slightly different way: precedence and associativity partially determine the order of evaluation in certain expressions, but they do not fully determine the order of evaluation in all expressions.
In C, some aspects of the order of evaluation are unspecified, and some are undefined. (This is by contrast to, as I understand it, Java, where all aspects of evaluation order are defined.)
See also this answer which, although it's about a different question, explains the same points in more detail.
Precedence and associativity matter when an expression has more than one operator.
Associativity doesn't matter with addition, because as you may remember from grade school math, addition is commutative and associative -- there's no difference between (a + b) + c, a + (b + c), or (b + c) + a (but see the Note at the end of my answer).
But consider subtraction. If you write
100 - 50 - 5
it matters whether you treat this as
(100 - 50) - 5 = 45
or
100 - (50 - 5) = 55
Left associativity means that the first interpretation will be used.
Precedence comes into play when you have different operators, e.g.
10 * 20 + 5
Since * has higher precedence than +, this is treated like
(10 * 20) + 5 = 205
rather than
10 * (20 + 5) = 250
Finally, order of evaluation is only noticeable when there are side effects or other dependencies between the sub-expressions. If you write
x = f() - g() - h()
and these functions each print something, the language doesn't specify the order in which the output will occur. Associativity doesn't change this. Even though the results will be subtracted in left-to-right order, it could call them in a different order, save the results somewhere, and then subtract them in the correct order. E.g. it could act as if you'd written:
temp_h = h();
temp_f = f();
temp_g = g();
x = (temp_f - temp_g) - temp_h;
Any reordering of the first 3 lines would be allowed as an interpretation.
Note
Note that in some cases, computer arithmetic is not exactly like real arithmetic. Numbers in computers generally have limited range or precision, so there can be anomalous results (e.g. overflow if the result of addition is too large). This could cause different results depending on the order of operations even with operators that are theoretically associative, e.g. mathematically the following two expressions are equivalent:
x + y - z = (x + y) - z
y - z + x = (y - z) + x
But if x + y overflows, the results can be different. Use explicit parentheses to override the default associativity if necessary to avoid a problem like this.
Regarding your first question:
x = f() + g();
The left-to-right associativity relates to operators at the same level that are directly grouped together. For example:
x = a + b - c;
Here the + and - operators have the same precedence level, so the operands are grouped left to right: a + b is computed first, and c is then subtracted from that result.
For an example more related to yours, imagine a function that returns a function pointer. You could then do something like this:
x()();
In this case, the function x must be called first, then the function pointer returned by x is called.
For the second:
a[i] = i++;
The side effect of the postincrement operator is not guaranteed to occur until the next sequence point. Because there are no sequence points in this expression, the i on the left side may be evaluated before or after the side effect of ++. This invokes undefined behavior due to both reading and writing a variable without a sequence point.
FIRST LINE - Associativity is not relevant here. Associativity only really comes into play when you have a sequence of operators with the same precedence. Let's take the expression x + y - z. The additive operators + and - are left-associative, so that sequence is parsed as (x + y) - z - IOW, the result of z is subtracted from the result of x + y.
THIS DOES NOT MEAN that any of x, y, or z have to be evaluated in any particular order. It does not mean that x + y must be evaluated before z. It only means that the result of x + y must be known before the result of z is subtracted from it.
Regarding x = f() + g(), all that matters is that the results of f() and g() are known before they can be added together - it does not mean that f() must be evaluated before g(). And again, associativity has no effect here.
SECOND LINE - This statement invokes undefined behavior precisely because the order of operations is unspecified (strictly speaking, the expressions a[i] and i++ are unsequenced with respect to each other). You cannot both update an object (i++) and use its value in a computation (a[i]) in the same expression without an intervening sequence point. The result will not be consistent or predictable from build to build (it doesn't even have to be consistent from run to run of the same build). Expressions like a[i] = i++ (or a[i++] = i) and x = x++ all have undefined behavior, and the result can be quite literally anything.
Note that the &&, ||, ?:, and comma operators do force left-to-right evaluation and introduce sequence points, so an expression like
i++ && a[i]
is well-defined - i++ will be evaluated first and its side effect will be applied before a[i] is evaluated.
Precedence and associativity fall out of the language grammar - for example, the grammar for the additive operators + and - is
additive-expression:
    multiplicative-expression
    additive-expression + multiplicative-expression
    additive-expression - multiplicative-expression
IOW, an additive-expression can produce a single multiplicative-expression, or it can produce another additive-expression followed by an additive operator followed by a multiplicative-expression. Let's see how this plays out with x + y - z:
x -- additive-expression ---------+
                                  |
+                                 +-- additive-expression --+
                                  |                         |
y -- multiplicative-expression ---+                         |
                                                            +-- additive-expression
-                                                           |
                                                            |
z -- multiplicative-expression -----------------------------+
You can see that x + y is grouped together into an additive-expression first, and then that expression is grouped with z to form another additive-expression.

C operator order

Why is the postfix increment operator (++) executed after the assignment (=) operator in the following example? According to the precedence/priority lists for operators ++ has higher priority than = and should therefore be executed first.
int a,b;
b = 2;
a = b++;
printf("%d\n",a);
will output 2.
PS: I know the difference between ++b and b++ in principle, but just looking at the operator priorities, the precedence list tells us something different, namely that ++ should be executed before =.
++ is evaluated first. It is post-increment, meaning it evaluates to the value stored and then increments. Any operator on the right side of an assignment expression (except for the comma operator) is evaluated before the assignment itself.
It is. It's just that, conceptually at least, ++ happens after the entire expression a = b++ (which is an expression with value a) is evaluated.
Operator precedence and order of evaluation of operands are rather advanced topics in C, because there exist many operators that have their own special cases specified.
Postfix ++ is one such special case, specified by the standard in the following manner (6.5.2.4):
The value computation of the result is sequenced before the side
effect of updating the stored value of the operand.
It means that the compiler will translate the line a = b++; into something like this:
1. Read the value of b into a CPU register. ("value computation of the result")
2. Increase b. ("updating the stored value")
3. Store the CPU register value in a.
This is what makes postfix ++ different from prefix ++.
The increment operators do two things: add +1 to a number and return a value. The difference between post-increment and pre-increment is the order of these two steps. So the increment actually is executed first and the assignment later in any case.

C order of operations -- foo() + bar() -- must foo be called before bar?

In the following code:
int foo();
int bar();
int i;
i = foo() + bar();
Is it guaranteed by the C standard that foo is called before bar is called?
No, there's no sequence point with +. There's actually a quote on the Wikipedia page about it that answers your question:
Consider two functions f() and g(). In C and C++, the + operator is not associated with a sequence point, and therefore in the expression f()+g() it is possible that either f() or g() will be executed first.
http://en.wikipedia.org/wiki/Sequence_points
It's unspecified, and in the case of C99 the relevant quotation is 6.5/3:
Except as specified later (for the function-call (), &&, ||, ?:, and
comma operators), the order of evaluation of subexpressions and the
order in which side effects take place are both unspecified.
In your example, foo() and bar() are subexpressions of the full expression i = foo() + bar().
The "later" for function calls isn't directly relevant here, but for reference it is 6.5.2.2/10:
The order of evaluation of the function designator, the actual
arguments, and subexpressions within the actual arguments is
unspecified, but there is a sequence point before the actual call.
For && it's 6.5.13/4:
Unlike the bitwise binary & operator, the && operator guarantees
left-to-right evaluation; there is a sequence point after the
evaluation of the first operand.
Since + is not in the list of operators at the top, && and + are "unlike" in the same way that && and & are "unlike", and this is precisely the thing you're asking about. Unlike &&, + does not guarantee left-to-right evaluation.
No, it is not. The evaluation order of function arguments and of operator operands is unspecified.
The standard says only that the calls to foo and bar cannot be interleaved with each other, which is something that can happen when evaluating subexpressions that do not involve function calls.
No, this is not defined. From K & R page 200:
the order of evaluation of expressions is, with certain exceptions, undefined, even if the subexpressions involve side effects. That is, unless the definition of the operator guarantees that its operands are evaluated in a particular order, the implementation is free to evaluate operands in any order, or even to interleave their evaluation.
Page 205 of K & R describes the additive operators, and doesn't define the order of evaluation of the two operands.
The correct answer where I work is "If the order is important, the code is unmaintainable regardless of what the standard says will happen". If you must have foo() evaluated before bar(), explicitly evaluate foo() before bar(). The basis for this is not every programmer knows the standards, and those that do don't know if the original author did.

Will a+b+c be operated like this: a+c+b?

As we all know, the sequence of evaluation is determined by priority and associativity.
For this example, associativity determines that a+b is computed first, and then the result is added to c. This is what an ANSI C compliant compiler does (leaving optimization aside). But could it be evaluated in the manner given in the title? In which compiler? In K&R C?
Let me throw this at you:
Operator Precedence vs Order of Evaluation
The compiler is free to rearrange things, as long as the end result is the same.
For example:
1 + b + 1
Can easily be transformed to:
b + 2
The structure of the equation in mathematical terms (in a+(b*c) we talk about b*c being evaluated "first") is not necessarily related to the order the compiler will evaluate the arguments
The actual order of evaluation in this instance is unspecified, IIRC. C only guarantees that the order of expressions separated by sequence points remains unchanged, and the + operator does not introduce a sequence point.
Most compilers will do what you expect - generating code that will evaluate a then b then c
n1256:
6.5 Expressions
...
3 The grouping of operators and operands is indicated by the syntax.74) Except as specified
later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation
of subexpressions and the order in which side effects take place are both unspecified.
...
74) The syntax specifies the precedence of operators in the evaluation of an expression, which is the same
as the order of the major subclauses of this subclause, highest precedence first. Thus, for example, the
expressions allowed as the operands of the binary + operator (6.5.6) are those expressions defined in
6.5.1 through 6.5.6. The exceptions are cast expressions (6.5.4) as operands of unary operators
(6.5.3), and an operand contained between any of the following pairs of operators: grouping
parentheses () (6.5.1), subscripting brackets [] (6.5.2.1), function-call parentheses () (6.5.2.2), and
the conditional operator ?: (6.5.15).
Within each major subclause, the operators have the same precedence. Left- or right-associativity is
indicated in each subclause by the syntax for the expressions discussed therein.
Emphasis mine. The expression a + b + c will be evaluated as (a + b) + c; that is, the result of c will be added to the result of a + b. Both a and b must be evaluated before a + b can be evaluated, but a, b, and c can be evaluated in any order.
a + b + c == c + b + a
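Mathematically, the order does not matter, because addition is commutative and associative. In C that holds only as long as no intermediate result overflows or rounds differently.
Which operator is applied first is decided by operator precedence and associativity.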
I'll try and highlight the difference between what you consider to be the order of evaluation and what the compiler considers it to be.
Mathematically we say that in the expression a + b * c, the multiplication is evaluated before the addition. Of course it must be because we need to know what to add to a.
However, the compiler doesn't necessarily have to consider evaluating the expression b * c before it evaluates a. You might think that because multiplication has a higher precedence, then the compiler will look at that part of the expression first. Actually, there is no guarantee about what the compiler will decide to do first. It may evaluate a first, or b, or c. This behaviour is unspecified by the standard.
To demonstrate, let's look at the following code:
#include <iostream>
int f() { std::cout << "f\n"; return 1; }
int g() { std::cout << "g\n"; return 2; }
int h() { std::cout << "h\n"; return 3; }
int main(int argc, const char* argv[])
{
    int x = f() + g() * h();
    std::cout << x << std::endl;
    return 0;
}
Each function, f(), g() and h(), simply outputs the name of the function to the standard output and then returns 1, 2 or 3 respectively.
When the program starts, we initialise a variable x to be f() + g() * h(). This is exactly the expression we looked at earlier. The answer will of course be 7. Now, naively you might assume that multiplication happens first, so it'll go there and it'll do g(), then multiply it by h(), then it'll do f() and add it to the previous result.
Actually, compiling this with GCC 4.4.5 shows me that the functions are executed in the order that they appear in the expression: f(), then g(), then h(). This isn't something that will necessarily happen the same in all compilers. It completely depends on how the compiler wants to do it.
If you're performing operations that are associative or commutative then the compiler is also free to swap around the mathematical groupings in the expression, but only if the result will be exactly the same. The compiler must be careful not to do any regroupings that may cause overflows to happen which wouldn't have happened anyway. As long as the result is as defined by the standard, the compiler is free to do what it likes.
