Consider the question below regarding this expression: (a + b) + (c + (d + e)) + (f + g)
In C we use precedence and associativity to decide which operator should be evaluated first and, if there is more than one operator with the same precedence, in what order they should be associated.
My question is:
Is (d + e) evaluated first by the compiler, as it is the innermost nested parenthesized expression?
If so, do associativity and precedence somehow dictate that the innermost parentheses should be evaluated first?
I strongly feel that they don't, and if they don't, then why would the compiler even decide to evaluate the innermost parentheses first? Going from left to right and evaluating parentheses at the same level seems much more logical to me.
You are confusing operator precedence with order of evaluation. These are different, though related, concepts. See What is the difference between operator precedence and order of evaluation?
Specifically:
Is (d + e) evaluated first by the compiler, as it is the innermost nested parenthesized expression?
This depends on the order of evaluation of the + (additive) operators, nothing else. The order is unspecified for + (as it is for most operators) and we can't know it, nor should we rely on it. Compilers are allowed to do as they please and don't need to tell you (document) how and when they pick a certain order.
If so, do associativity and precedence somehow dictate that the innermost parentheses should be evaluated first?
No, this has nothing to do with operator precedence.
I strongly feel that they don't, and if they don't, then why would the compiler even decide to evaluate the innermost parentheses first? Going from left to right and evaluating parentheses at the same level seems much more logical to me.
See the answer to the first question above.
Consider instead
(a() + b()) + (c() + (d() + e())) + (f() + g())
where each of the functions prints its letter and, of course, returns a value:
int a(void) { putchar('a'); return 42; }
When that expression is evaluated, some permutation of "abcdefg" will be printed according to the order of evaluation, regardless of operator precedence.
The order of evaluation here is left unspecified by the Standard. Each compiler implementation can do the evaluation as it likes best, even changing the order between compilations or runs.
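To make this concrete, here is a minimal sketch of a complete program built around that idea (the return values are arbitrary); the order in which the letters appear is entirely up to the compiler:

#include <stdio.h>

/* Each function reports when it is evaluated and returns a dummy value. */
static int a(void) { putchar('a'); return 1; }
static int b(void) { putchar('b'); return 2; }
static int c(void) { putchar('c'); return 3; }
static int d(void) { putchar('d'); return 4; }
static int e(void) { putchar('e'); return 5; }
static int f(void) { putchar('f'); return 6; }
static int g(void) { putchar('g'); return 7; }

int main(void)
{
    /* The sum is always 28, but the order in which the letters are
       printed is unspecified: "abcdefg", "gfedcba", or any other
       permutation is a conforming output. */
    int sum = (a() + b()) + (c() + (d() + e())) + (f() + g());
    printf("\nsum = %d\n", sum);
    return 0;
}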
Parentheses and the pointer member access symbol have the same priority, and they are dealt with from left to right. Why does the following code get the member nfct from skb first, and then do the type conversion? It seems that the associativity is from right to left.
(struct nf_conn *) skb->nfct
I believe the point you're missing here is operator precedence.
The pointer member access operator (->) has higher precedence than the cast.
To elaborate (with borrowed wording):
Operator precedence determines which operator is performed first in an expression with more than one operator of different precedence.
Associativity is used (or comes into play) when two operators of the same precedence appear in an expression.
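To see the precedence rule at work on the expression from the question, here is a minimal sketch; the struct definitions are made up for illustration (stand-ins for the kernel's struct sk_buff and struct nf_conn):

#include <stdio.h>

struct conn { int id; };          /* stand-in for struct nf_conn */
struct buff { void *nfct; };      /* stand-in for struct sk_buff */

int main(void)
{
    struct conn c = { 42 };
    struct buff b = { &c };
    struct buff *skb = &b;

    /* -> binds tighter than the cast, so this parses as
       (struct conn *)(skb->nfct): fetch the member first,
       then convert the resulting pointer. */
    struct conn *ct = (struct conn *) skb->nfct;
    printf("%d\n", ct->id);   /* prints 42 */
    return 0;
}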
In C (and some other C-like languages) we have two unary operators for working with pointers: the dereference operator (*) and the 'address of' operator (&). They are left (prefix) unary operators, which introduces uncertainty in the order of operations, for example:
*ptr->field
or
*arr[id]
The order of operations is strictly defined by the standard, but from a human perspective it is confusing. If the * operator were a right (postfix) unary operator, the order would be obvious and wouldn't require extra parentheses:
ptr*->field vs ptr->field*
and
arr*[id] vs arr[id]*
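For reference, here is how the two real expressions above actually parse under the standard's rules; a small sketch with made-up types:

#include <stdio.h>

struct node { int *field; };

int main(void)
{
    int seven = 7, three = 3;
    struct node n = { &seven };
    struct node *ptr = &n;

    int *arr[1] = { &three };
    int id = 0;

    /* Postfix -> and [] bind tighter than prefix *, so both
       expressions are read "right first, then left": */
    printf("%d\n", *ptr->field);  /* means *(ptr->field)  -> 7 */
    printf("%d\n", *arr[id]);     /* means *(arr[id])     -> 3 */
    return 0;
}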
So is there a good reason why the operators are left (prefix) unary instead of right (postfix)? One thing that comes to mind is the declaration of types: left operators stay near the type name (char *a vs char a*). But there are type declarations that already break this rule, so why bother (char a[num], char (*a)(char), etc.)?
Obviously, there are some problems with this approach too, like the
val*=2
which would be either the *= shorthand for val = val * 2 or a dereference-and-assign val* = 2.
However, this could easily be solved by requiring whitespace between the * and = tokens in the case of dereferencing. Once again, nothing groundbreaking, since there is a precedent for such a rule (- -a vs --a).
So why are they left instead of right operators?
Edit:
I want to point out that I asked this question because many of the weirder aspects of C have interesting explanations for why they are the way they are, like the existence of the -> operator, the type declaration syntax, or indexing starting from 0. The reasons may no longer be valid, but they are still interesting in my opinion.
There indeed is an authoritative source: "The Development of the C Language" by the creator of the language, Dennis M. Ritchie:
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively. The style used in expressions carries through to declarations, so the names might be declared
int *fp();
int (*pf)();
In more ornate but still realistic cases, things become worse:
int *(*pfp)();
is a pointer to a function returning a pointer to an integer. There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C—Algol 68, for example—describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an `inside-out' style that many find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of prefix, but by then it was too late to change.
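As a small runnable sketch of the two declarations Ritchie contrasts (the function bodies are, of course, invented for the demonstration):

#include <stdio.h>

static int value = 5;

/* fp: a function returning a pointer to int */
int *fp(void) { return &value; }

/* a helper for pf to point at */
int forty_two(void) { return 42; }

int main(void)
{
    /* pf: a pointer to a function returning int */
    int (*pf)(void) = forty_two;

    printf("%d\n", *fp());    /* call fp, then dereference its result */
    printf("%d\n", (*pf)());  /* dereference pf, then call the function */
    return 0;
}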
Thus the reason why * is on the left in C is because it was on the left in B.
B was partially based on BCPL, where the dereferencing operator was !.
This was on the left; the binary ! was an array indexing operator:
a!b
is equivalent to !(a+b).
!a
is the content of the cell whose address is given by a; it can appear on the left of an assignment.
Yet the 50-year-old BCPL manual doesn't even mention the ! operator; instead, the operators were words: unary lv and rv. Since these were understood as if they were functions, it was natural that they preceded the operand; later the longish rv a could be replaced with the syntactic sugar !a.
Many of the current C operator practices can be traced via this route. Likewise, B had a[b] equivalent to *(a + b), hence to *(b + a) and thus to b[a], just as in BCPL one could use a!b <=> b!a.
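That commutativity is still visible in modern C; a quick sketch:

#include <stdio.h>

int main(void)
{
    int arr[] = { 10, 20, 30 };

    /* a[b] is defined as *(a + b), and addition commutes, so
       arr[2], *(arr + 2), *(2 + arr) and 2[arr] are all the same. */
    printf("%d %d %d %d\n", arr[2], *(arr + 2), *(2 + arr), 2[arr]);
    return 0;
}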
Notice that in B variables were untyped, so similarity with declarations certainly could not have been the reason to use * on the left there.
So the reason for unary * being on the left in C is as boring as this: there wasn't any problem in simpler programs with unary * on the left, the position everyone was accustomed to from the dereferencing operators of other languages, so no one really thought some other way would have been better until it was too late to change.
This question already has answers here: Undefined behavior and sequence points.
For my compiler class, we are gradually creating a pseudo-PASCAL compiler. It does, however, follow the same precedence rules as C. That being said, in the section where we implement prefix and postfix operators, I get 0 for
int a = 1;
int b = 2;
++a - b++ - --b + a--
whereas C returns 1. What I don't understand is how you can even get 1. Doing straight prefix first, the answer should be 2; doing postfix first, the answer should be -2. Doing everything left to right, I get zero.
My question is, what should the precedence of my operators be to return 1?
Operator precedence tells you, for example, whether ++a - b means (++a) - b or ++(a - b). Clearly it should be the former, since the latter isn't even valid. In your implementation it is indeed the former (or you wouldn't be getting a result at all), so you implemented operator precedence correctly.
Operator precedence has nothing to do with the order in which subexpressions are evaluated. In fact, the order in which the operands of + and - are evaluated is unspecified in C, and any code that modifies the same variable twice without a sequence point in between invokes undefined behavior. So whichever order you choose is fine, and 0 is as valid a result as any other value.
It is illegal to change a variable several times in a row like that (roughly, between assignments; the standard talks about sequence points). Technically, this is what the C standard calls undefined behaviour. The compiler has no obligation to detect that you are writing nonsense, and can assume you never will. Anything whatsoever can happen when you run the program (or even while compiling). Also check out nasal demons in the Jargon File.
The ++ increment and -- decrement operators can be placed before or after a value, with different effects. If placed before the operand (prefix), its value is changed immediately; if placed after the operand (postfix), its value is noted first, then the value is changed.
McGrath, Mike (2006). C Programming in Easy Steps, 2nd Edition. United Kingdom: Computer Step.
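A well-defined sketch of that difference (unlike the expression in the question, each modification here is separated by a sequence point):

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2;

    /* Prefix: increment first, then use the new value. */
    printf("%d\n", ++a);   /* prints 2; a is now 2 */

    /* Postfix: use the old value, then increment. */
    printf("%d\n", b++);   /* prints 2; b is now 3 */

    printf("a=%d b=%d\n", a, b);   /* a=2 b=3 */
    return 0;
}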
As far as I know, C uses lazy evaluation for logical expressions, e.g. in the expression
f(x) && g(x)
g(x) will not be called if f(x) is false.
But what about arithmetic expressions like
f(x)*g(x)
Will g(x) be called if f(x) is zero?
Yes, arithmetic operations are eager, not lazy.
So in f(x)*g(x) both f and g are always called (pedantically, the compiler transforms this into some A-normal form and could even avoid some calls if the difference is not observable), but there is no guarantee about the order of calling f before or after g. And evaluating x*1/x or y*1/x is undefined behavior when x is 0.
This is not true in Haskell AFAIU
Yes, g(x) will still be called.
Generally, it would be quite slow to conditionally elide the evaluation of the right-hand side just because the left-hand side is zero. Perhaps not in the case where the right-hand side is an expensive function call, but the compiler wouldn't presume to know that.
It's called "short-circuit" evaluation rather than lazy. And, at least as far as the standard cares, yes -- i.e., it doesn't specify short-circuit evaluation for *.
A compiler might be able to do short-circuit evaluation if it can be certain g() has no side effects, but only under the as-if rule (i.e., it can do so only by finding that there's no externally observable difference, not because the standard gives it any direct permission to do so).
In the case of the logical operators && and ||, the order of evaluation is bound to take place from left to right, and short-circuiting takes place.
There is a sequence point between the evaluation of the left and right operands of && (logical AND) and || (logical OR), as part of short-circuit evaluation. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the subexpression *p++ != 0 are completed before any attempt to access q. No such guarantee exists for the arithmetic operators.
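Putting the two behaviors side by side, a minimal sketch with hypothetical f and g that report when they run:

#include <stdio.h>

static int f(int x) { puts("f called"); return x; }
static int g(int x) { puts("g called"); return x + 1; }

int main(void)
{
    /* && short-circuits: g is never called when f returns 0. */
    if (f(0) && g(0))
        puts("not reached");

    /* * is eager: both f and g are called (in an unspecified
       order), even though the product is known to be 0. */
    int p = f(0) * g(0);
    printf("p = %d\n", p);
    return 0;
}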
While that optimization would be possible, there are a few arguments against it:
You might pay more for the optimization than you get back from it: Unlike with logical operators, the optimization is likely to be beneficial in only a small percentage of all cases with arithmetic operators, but at the same time requires an additional check for 0 for every operation.
Because boolean truth values only have two possible values, there is a theoretical 50 % chance (1 ÷ 2) with short-circuiting boolean expressions that the second operand will not have to be evaluated. (This assumes uniform distribution, which is perhaps not realistic, but bear with me.) That is, you are likely to profit from the optimization in a relatively large percentage of cases.
Contrast this with integral numbers, where 0 is only one out of billions of possible values. The probability that the first operand is 0 is much lower: 1 ÷ 2³² (for 32-bit integers, again assuming uniform distribution). Even if 0 were in fact somewhat more probable to occur than that (i.e. with a non-uniform distribution), it's still unlikely that we're dealing with the same order of magnitude as with truth values.
Floating point math further aggravates that issue. Here you need to deal with the possibility of rounding errors and denormalization. The probability that some calculation yields exactly 0 is likely to be even lower than with integral numbers.
Therefore the optimization is relatively unlikely to result in the remaining operand not being evaluated. But it will result in an added check for zero, 100 % of the time!
If you want evaluation rules to remain reasonably consistent, you would have to redefine the short-circuit evaluation order of && and ||. Division has one important corner case, namely division by 0: even if the first operand is 0, the quotient is not necessarily 0, because 0 ÷ 0 is an error. Division by 0 is to be treated as an error (except perhaps in IEEE floating-point math); therefore, you always have to evaluate the second operand in order to determine whether the calculation is valid.
There is one alternative optimization for /: division by 1. In that case, you wouldn't have to divide at all, but simply return the first operand. / would therefore be better optimised by starting with the second operand (divisor).
Now, unless you want &&, ||, and * to start evaluation with the first operand, but / to start with the second (which might seem unintuitive), you would have to generally re-define short-circuiting behavior such that the second operand always gets evaluated first, which would be a departure from the status quo.
This is not per se a problem, but might break a lot of existing code if the C language were thus changed.
The optimization might break "compatibility" with C++ code where operators can be overloaded. Would the optimizations still apply to overloaded * and / operators? Or would there have to be two different forms of these operators, one short-circuiting, and one with eager evaluation?
Again, this is not a deficiency inherent in short-circuit arithmetic operators, but an issue that would arise if such short-circuiting were introduced into the C (and C++) language as a breaking change.
If I do *ptr[x], is that equivalent to *(ptr[x]), or (*ptr)[x]?
*(ptr[x])
See the Wikipedia operator precedence table, or, for a more detailed table, this C/C++ specific table.
In C, all postfix operators have higher precedence than prefix operators, and prefix operators have higher precedence than infix operators. So it's *(ptr[x]).
Using the clockwise/spiral method of analyzing and parsing that simple example:
1. starting with ptr, move clockwise (rightwards first) until you hit the subscript operator [x]
2. apply the subscript, then continue until you hit the asterisk operator
3. apply the dereference last
Since [] has higher precedence than the asterisk as per this table, that makes it *(ptr[x]).
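A quick sketch confirming the parse:

#include <stdio.h>

int main(void)
{
    int nine = 9;
    int *ptr[1] = { &nine };   /* an array of pointers to int */
    int x = 0;

    /* [] binds tighter than prefix *, so *ptr[x] is *(ptr[x]):
       index first, then dereference. */
    printf("%d\n", *ptr[x]);    /* 9 */
    printf("%d\n", *(ptr[x]));  /* identical parse */

    /* (*ptr)[x] would parse the other way around:
       dereference first, then index. */
    return 0;
}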