I'm reading precedence and associativity. In the table, I observed these two things -
(i) precedence of postfix increment(or decrement) operator is greater than precedence of prefix increment(or dec.) operator.
(ii) associativity of postfix inc.(or dec.) operator is left-to-right but that of prefix increment(or dec.) operator is right-to-left.
I'm not sure why this is needed. Can anyone help me by showing code (separately for each case) that demonstrates the need for these two facts? Thanks.
I tried to think of such cases but couldn't come up with any (I'm very new to programming).
This is needed for an expression like
data = *pointer++;
You want to obtain the value at pointer in data, and then advance pointer to the next element. If the precedence of the postfix operator weren't greater, the expression would mean (*pointer)++, which increments the pointed-to value and leaves pointer where it was.
And the associativity of the prefix operator is right-to-left because you want an expression like
data = **pointer_to_pointer;
to be evaluated from right to left as if you'd write
data = *(*pointer_to_pointer);
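Here is a minimal sketch of both groupings that you can experiment with. The function names (sum_elements, bump_pointee, deref_twice) are made up for illustration and do not come from the question.

```c
#include <stddef.h>

/* *p++ parses as *(p++): read the current element, then step the pointer. */
int sum_elements(const int *p, size_t n)
{
    int total = 0;
    while (n-- > 0)
        total += *p++;
    return total;
}

/* (*p)++ increments the pointed-to int instead; the result is its old value. */
int bump_pointee(int *p)
{
    return (*p)++;
}

/* **pp can only group as *(*pp): the inner * is applied first. */
int deref_twice(int **pp)
{
    return **pp;
}
```

With int a[] = {1, 2, 3}, sum_elements(a, 3) walks the whole array, while bump_pointee(a) changes a[0] and never moves a pointer.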
In C, as in most languages, postfix operators bind more tightly than prefix operators, and prefix operators bind more tightly than binary operators. (There are exceptions to the second part, but not in C.) That generally corresponds to intuitions about the meaning of expressions.
For example, almost everyone would expect
-a[0]
to mean "the negative of element 0 of the array a", rather than "element 0 of the array -a", particularly in languages like C where "the array -a" is not meaningful. Similarly,
-a++
has the expected meaning "the negative of the current value of a, which is subsequently incremented." Again, incrementing -a is meaningless since -a is not a variable.
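As a small sketch of those two readings (the function names are hypothetical, chosen only for this example):

```c
/* -a[0] is -(a[0]): negate element 0; "the array -a" would not be valid C. */
int negated_first(const int *a)
{
    return -a[0];
}

/* -a++ is -(a++): the result is the negative of the old value of a,
   and a itself ends up one larger. */
int negate_then_increment(int *counter)
{
    int a = *counter;
    int result = -a++;
    *counter = a;   /* hand the incremented value back to the caller */
    return result;
}
```

Starting from a counter of 5, negate_then_increment yields -5 and leaves the counter at 6.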
Naive intuitions might not work as well with more obscure operators, so it is useful to maintain consistency. While it is imaginable that there might exist a prefix operator which almost always needs to be surrounded by parentheses because it doesn't bind tightly enough, making that operator an exception to the general rule is likely to create more surprises than it removes, and few languages take this path.
So prefix and postfix uses of ++ and -- have syntax defined by this common rule. Nonetheless, in C at least, it is an error to apply both to the same operand (++a--) because the value returned, both by the pre- and post-fix versions, is not an lvalue, while the operand is required to be an lvalue. In that sense, the particular case of comparing precedences of prefix and postfix ++ and -- never shows up in a correct program. But other combinations of prefix and postfix operators do, and precedence levels need to apply homogeneously.
There is another sense in which the precedence and associativity declarations are redundant. It is syntactically meaningless to talk about the associativity between two prefix operators, or between two postfix operators. And if prefix and postfix operators have different precedence levels, it is also meaningless to talk about associativity between a prefix and a postfix operator. So the associativity is irrelevant.
However, you could stretch the concept of associativity and say that all unary operators have the same precedence and all of them associate to the right. That will actually produce a correct parser, and was used in the definition of B (C's predecessor). But it really is too confusing for most people not accustomed to grammatical analysis.
(i) precedence of postfix operator is greater than precedence of
prefix operator.
Consider the code:
int x[] = {1,2,3};
int *p = x;
int y;
y = ++p[0];
This will increment the first element of x and assign it to y.
versus this one, where we explicitly change the grouping so the prefix ++ is applied first:
y = (++p)[0];
This will not increment any element of x; it will move p to the second element of x and assign that element's value to y.
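A compact way to try both forms side by side is sketched below. The function names are invented for this example, and the test values are chosen to differ from {1,2,3} on purpose: with that array both expressions happen to yield 2, which hides the difference.

```c
/* ++p[0] groups as ++(p[0]): the element is incremented, p is untouched. */
int increment_element(int *p)
{
    return ++p[0];
}

/* (++p)[0] moves the (local) pointer first, then reads the element it now
   points at; no array element is modified. */
int step_then_read(int *p)
{
    return (++p)[0];
}
```

With int x[] = {10, 20, 30}, increment_element(x) gives 11 and changes x[0], while step_then_read(x) gives 20 and changes nothing in x.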
(ii) associativity of postfix operator is left-to-right but that of prefix operator is right-to-left.
This means that:
p->x->y should be read as (p->x)->y, and not p->(x->y) which does not even make any sense.
The same goes for the other operators of the postfix group - the RTL associativity just doesn't make sense for them:
p[x][y][z] is the same as ((p[x])[y])[z], but p[x]([y]([z])) is meaningless (and illegal).
From a historical point of view, the precedence of the operators was influenced by the B and BCPL programming languages. In the article The Development of the C Language, Dennis Ritchie explains how he chose the precedence.
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively
There is no clear logic about the precedence -- you need to memorize it or use parentheses when you are not sure about the precedence.
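To make the two forms from the quote concrete, here is a small sketch. Everything in it (stored_value, address_of_stored, forty_two and the two wrapper functions) is hypothetical and exists only to give the expressions something to operate on.

```c
int stored_value = 7;

/* A function that returns a pointer. */
int *address_of_stored(void)
{
    return &stored_value;
}

/* An ordinary function that will be called through a pointer. */
int forty_two(void)
{
    return 42;
}

/* *f() groups as *(f()): call first, then dereference the returned pointer. */
int through_returned_pointer(void)
{
    return *address_of_stored();
}

/* (*pf)() needs the parentheses: dereference the function pointer, then call. */
int through_function_pointer(void)
{
    int (*pf)(void) = forty_two;
    return (*pf)();
}
```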
The precedence and associativity of the operators fall out of the language grammar. For postfix and unary operators, that's as follows:
postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( argument-expression-listopt )
postfix-expression . identifier
postfix-expression -> identifier
postfix-expression ++
postfix-expression --
( type-name ) { initializer-list }
( type-name ) { initializer-list , }
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
_Alignof ( type-name )
unary-operator: one of
& * + - ~ !
cast-expression:
unary-expression
( type-name ) cast-expression
C 2011 Online Draft, Appendix A.2 Phrase Structure Grammar
So, how does that determine precedence and associativity, and why does it matter? Let's start with an expression like *p++ - are we dereferencing p++, or are we incrementing *p? Those are two very different operations, so it matters how the grammar is structured. Let's trace through it:
*               p               ++
|               |               |
|            primary            |
|           expression          |
|               |               |
|            postfix            |
|           expression          |
|               |               |
|               +-------+-------+
|                       |
|                    postfix
|                   expression
|                       |
|                     unary
|                   expression
|                       |
unary                 cast
operator           expression
|                       |
+-----------+-----------+
            |
          unary
       expression
In English:
unary-expression produces unary-operator cast-expression
unary-operator produces *
cast-expression produces unary-expression
unary-expression produces postfix-expression
postfix-expression produces postfix-expression ++
postfix-expression produces primary-expression
primary-expression produces p
This means the expression *p++ is parsed as *(p++) - the * operator will be applied to the result of p++.
Same kind of thing for *p[i] - we will wind up dereferencing the pointer at p[i], rather than subscripting *p.
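A short sketch of the *p[i] case, next to the parenthesized alternative. The function names and parameter types are my own choice for illustration:

```c
#include <stddef.h>

/* *table[i] is *(table[i]): pick the i-th pointer, then dereference it. */
int deref_element(int *table[], size_t i)
{
    return *table[i];
}

/* (*row)[i] forces the other grouping: dereference a pointer to an array,
   then subscript the array it points to. */
int subscript_pointee(int (*row)[3], size_t i)
{
    return (*row)[i];
}
```

The first function wants an array of pointers to int; the second wants a pointer to an array of three ints, which is a different type altogether.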
For a slightly more complicated example that gets into associativity, let's look at the member selection operator ->, as in the expression foo->bar->bletch->blurga. The grammar for the -> member selection operator is
postfix-expression -> identifier
This tells us that foo->bar->bletch reduces to postfix-expression and blurga reduces to identifier. Hence, the associativity of the operator is left-to-right, and the expression parses as ((foo->bar)->bletch)->blurga, rather than foo->(bar->(bletch->blurga)).
Those tables you're looking at are summaries of how the grammar is set up. The grammar is set up the way it is so that operator and operand groupings are somewhat intuitive. You expect an expression like ++foo.bar[i] to increment foo.bar[i], you expect *f() to dereference the pointer value returned from a function, etc.
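For the member-selection chain, a sketch with three made-up struct types (leaf, branch, trunk are assumptions; only the member names bar, bletch and blurga come from the expression above) shows that the unparenthesized chain and the fully left-grouped one are the same expression:

```c
struct leaf   { int blurga; };
struct branch { struct leaf *bletch; };
struct trunk  { struct branch *bar; };

/* The chain as normally written. */
int implicit_grouping(const struct trunk *foo)
{
    return foo->bar->bletch->blurga;
}

/* The grouping the grammar gives it, spelled out with parentheses. */
int explicit_grouping(const struct trunk *foo)
{
    return ((foo->bar)->bletch)->blurga;
}
```

The right-grouped form cannot even be written, because bar->bletch on its own is not an expression: bar is a member name, not a variable.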
My question is about the following line of code, taken from "The C Programming Language" 2nd Edition:
*p++->str;
The book says that this line of code increments p after accessing whatever str points to.
My understanding is as follows:
Precedence and associativity say that the order in which the operators will be evaluated is
->
++
*
The postfix increment operator ++ yields a value (i.e. value of its operand), and has the side effect of incrementing this operand before the next sequence point (i.e. the following ;)
Precedence and associativity describe the order in which operators are evaluated and not the order in which the operands of the operators are evaluated.
My Question:
My question is around the evaluation of the highest precedence operator (->) in this expression. I believe that to evaluate this operator means to evaluate both of the operands, and then apply the operator.
From the perspective of the -> operator, is the left operand p or p++? I understand that both return the same value.
However, if the first option is correct, I would ask "how is it possible for the evaluation of the -> operator to ignore the presence of the ++".
If the second option is correct, I would ask "doesn't the evaluation of -> in this case then require the evaluation of a lower precedence operator ++ here (and the evaluation of ++ completes before that of ->)"?
To understand the expression *p++->str you need to understand how *p++ works, or in general how postfix increment works on pointers.
In case of *p++, the value at the location p points to is dereferenced before the increment of the pointer p.
n1570 - §6.5.2.4/2:
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]. The value computation of the result is sequenced before the side effect of updating the stored value of the operand.
In case of *p++->str, ++ and -> have equal precedence and higher than * operator. This expression will be parenthesised as *((p++)->str) as per the operator precedence and associativity rule.
One important note here is that precedence and associativity have nothing to do with the order of evaluation. So, though ++ has higher precedence, it is not guaranteed that p++ will be evaluated first. This means the expression p++ (in the expression *p++->str) will be evaluated as per the rule quoted above from the standard. (p++)->str will access the str member of what p points to, then its value is dereferenced, and the value of p is incremented at any time between the previous and the next sequence point.
Postfix ++ and -> have the same precedence. a++->b parses as (a++)->b, i.e. ++ is done first.
*p++->str; executes as follows:
The expression parses as *((p++)->str). -> is a meta-postfix operator, i.e. ->foo is a postfix operator for all identifiers foo. Postfix operators have the highest precedence, followed by prefix operators (such as *). Associativity doesn't really apply: There is only one operand and only one way to "associate" it with a given operator.
p++ is evaluated. This yields the (old) value of p and schedules an update, setting p to p+1, which will happen at some point before the next sequence point. Call the result of this expression tmp0.
tmp0->str is evaluated. This is equivalent to (*tmp0).str: It dereferences tmp0, which must be a pointer to a struct or union, and gets the str member. Call the result of this expression tmp1.
*tmp1 is evaluated. This dereferences tmp1, which must be a pointer (to a complete type). Call the result of this expression tmp2.
tmp2 is ignored (the expression is in void context). We reach ; and p must have been incremented before this point.
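The steps above can be sketched in C as follows. struct entry and both function names are assumptions made for this example; K&R's own struct has more members. The stepwise version performs the increment in its own statement, which is one of the orderings the standard allows, so the observable result is the same.

```c
struct entry { const char *str; };

/* The expression as written in the question: *((p++)->str). */
char take_first_char(struct entry **cursor)
{
    struct entry *p = *cursor;
    char c = *p++->str;
    *cursor = p;            /* report the advanced pointer to the caller */
    return c;
}

/* The same thing with the temporaries named as in the walkthrough. */
char take_first_char_stepwise(struct entry **cursor)
{
    struct entry *tmp0 = (*cursor)++;  /* old pointer value; cursor advanced */
    const char *tmp1 = tmp0->str;      /* (*tmp0).str */
    char tmp2 = *tmp1;                 /* first character of the string */
    return tmp2;
}
```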
I'm confused about the precedence and associativity of postfix/prefix operators.
On one hand, as I'm reading K&R book, it states that:
(*ip)++
The parentheses are necessary in this last example; without them, the expression would increment ip instead of what it points to, because unary operators like * and ++ associate right to left.
No mention whatsoever of a difference of associativity between postfix/prefix operators. Both are treated equally. The book also states that * and ++ have the same precedence.
On the other hand, this page states that:
1) Precedence of prefix ++ and * is same. Associativity of both is right to left.
2) Precedence of postfix ++ is higher than both * and prefix ++. Associativity of postfix ++ is left to right.
Which one should I trust? Is it something that changed with the C revisions over the years?
TL;DR: the two descriptions are saying the same thing, using the same words and symbols with slightly different meaning.
On one hand, as I'm reading K&R book, it states that:
(*ip)++
The parentheses are necessary in this last example; without them, the expression would increment ip instead of what it points to,
because unary operators like * and ++ associate right to left.
No mention whatsoever of a difference of associativity between
postfix/prefix operators. Both are treated equally. The book also
states that * and ++ have the same precedence.
It's unclear which edition of K&R you're reading, but the first, at least, does treat the prefix and postfix versions of the increment and decrement operators as a single operator each, with effects depending on whether their operand precedes or follows them.
On the other hand, this page states that:
1) Precedence of prefix ++ and * is same. Associativity of both is
right to left.
2) Precedence of postfix ++ is higher than both * and prefix ++.
Associativity of postfix ++ is left to right.
The language standard and most modern treatments describe the prefix and postfix versions as different operators, disambiguated by their position relative to their operand. The rest of this answer explains how this is an alternative description of the same thing.
Observe that when only unary operators are involved, associativity questions arise only between one prefix and one postfix operator of the same precedence. Among a chain of only prefix or only postfix operations, there is no ambiguity with respect to how they associate. For example, given - - x, you cannot meaningfully group it as (- -) x. The only alternative is - (- x).
Next, observe that all the highest-precedence operators are postfix unary operators, and that in K&R, all the second-precedence operators are prefix unary operators except ambi-fix ++ and --. Applying right-to-left associativity to the second-precedence operators, then, disambiguates only expressions involving postfix ++ or -- and a prefix unary operator, and does so in favor of the postfix operator. This is equivalent to the modern approach of distinguishing the postfix and prefix versions of those operators and assigning higher precedence to the postfix versions.
To get the rest of the way to the modern description, consider the observations I already made that associativity questions arise for unary operators only when prefix and postfix operators are chained, and that all the highest-precedence operators are postfix unary operators. Having distinguished postfix ++ and -- as separate, higher-precedence operators than their prefix versions, one could put them in their own tier between the other postfix operators and all the prefix operators, but putting them instead in the same tier with all the other postfix operators changes nothing about how any expression is interpreted, and is simpler. That's how it is usually represented these days, including in your second resource.
As for left-to-right vs. right-to-left associativity, the question is, again, moot for a precedence tier containing only prefix or only postfix operators. However, describing postfix operators as associating left-to-right and prefix operators as associating right-to-left is consistent with their semantic order of operations.
You can refer to the C11 standard although its section on precedence is a little hard to follow. See sec. 6.5.1. (footnote 85 says "The syntax specifies the precedence of operators in the evaluation of an expression, which is the same as the order of the major subclauses of this subclause, highest precedence first.")
Basically, postfix operators are higher precedence than prefix because they come earlier in that section, 6.5.2.4 vs. 6.5.3.1. So K&R is correct (no surprise there!) that *ip++ means *(ip++), which is different from (*ip)++; however, its point about this being due to associativity is a bit misleading, I'd say. And the geeksforgeeks site's point #2 is also correct.
@GaryO's answer is spot on! Postfix has higher precedence because it comes earlier.
Here's a small sanity check to convince yourself.
I made two integer arrays and a pointer to the start of each array, then ran (*p)++ and *p++ on the two pointers. I printed out the pointer and array state before and after for reference.
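#include <stdio.h>

/* Print both arrays, then the element each pointer currently refers to. */
#define PRINT_ARRS printf("a = {%d, %d, %d}\n", a[0], a[1], a[2]); \
                   printf("b = {%d, %d, %d}\n\n", b[0], b[1], b[2]);
/* Pointer differences have type ptrdiff_t, so they are printed with %td. */
#define PRINT_PTRS printf("*p1 = a[%td] = %d\n", p1 - a, *p1); \
                   printf("*p2 = b[%td] = %d\n\n", p2 - b, *p2);

int main(void)
{
    int a[3] = {1, 1, 1};
    int b[3] = {10, 10, 10};
    int *p1 = a;
    int *p2 = b;

    PRINT_ARRS
    PRINT_PTRS

    printf("(*p1)++: %d\n", (*p1)++);   /* increments a[0]; p1 stays put */
    printf("*p2++ : %d\n\n", *p2++);    /* advances p2; b is untouched */

    PRINT_ARRS
    PRINT_PTRS
}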
Compiling with gcc and running on my machine produces:
a = {1, 1, 1}
b = {10, 10, 10}
*p1 = a[0] = 1
*p2 = b[0] = 10
(*p1)++: 1
*p2++ : 10
a = {2, 1, 1}
b = {10, 10, 10}
*p1 = a[0] = 2
*p2 = b[1] = 10
You can see that (*p1)++ increments the array value while *p2++ increments the pointer.
While calculating a postfix expression in C, if our token is an operator we have to place it on the stack in such a way that it has the highest priority.
My question is among the operators *, /, %, which has the highest priority.
Do we need to consider associativity as well ? Since all these operators have LEFT-TO-RIGHT associativity, will / get higher preference over * ?
Precedence usually only applies to infix notations. Postfix (and Prefix) notations are usually considered to explicitly specify which operands are associated with which operator. Precedence only comes into play when there is ambiguity in the parsing, which is not the case in postfix notation.
The precedence question that arises in an infix expression
4 * 5 + 3 / 12
simply doesn't exist after conversion to an RPN form
4 5 * 3 12 / +
or a prefix form
(+ (* 4 5) (/ 3 12))
There is some possibility for confusion when considering something like the shunting-yard algorithm, which can be used to generate an RPN representation from an infix expression, or to evaluate one directly. It deals with operator precedence by deferring each operator onto a secondary stack until an incoming operator of lower precedence (or of equal precedence, for left-associative operators) forces it to be popped and evaluated (or output).
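As a rough sketch of that idea, the function below converts infix to space-separated RPN. It is a simplified shunting-yard under stated assumptions, not a full implementation: operands are runs of letters and digits, only the left-associative binary operators + - * / % and parentheses are supported, the operator stack is fixed at 64 entries, and it does not check that operands and operators alternate. The names infix_to_rpn and precedence are my own.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Binding strength of a binary operator; 0 means "not an operator". */
static int precedence(char op)
{
    if (op != '\0' && strchr("*/%", op) != NULL) return 2;
    if (op == '+' || op == '-') return 1;
    return 0;
}

/* Returns 0 on success, -1 on malformed input or insufficient space. */
int infix_to_rpn(const char *in, char *out, size_t outsize)
{
    char ops[64];               /* deferred operators and '(' markers */
    size_t top = 0, n = 0;

    if (outsize == 0) return -1;
    while (*in != '\0') {
        char c = *in;
        if (isspace((unsigned char)c)) {
            ++in;
        } else if (isalnum((unsigned char)c)) {
            /* Operands go straight to the output, a whole token at a time. */
            while (isalnum((unsigned char)*in)) {
                if (n + 2 >= outsize) return -1;
                out[n++] = *in++;
            }
            out[n++] = ' ';
        } else if (c == '(') {
            if (top == sizeof ops) return -1;
            ops[top++] = c;
            ++in;
        } else if (c == ')') {
            while (top > 0 && ops[top - 1] != '(') {
                if (n + 2 >= outsize) return -1;
                out[n++] = ops[--top];
                out[n++] = ' ';
            }
            if (top == 0) return -1;    /* no matching '(' */
            --top;                      /* discard the '(' */
            ++in;
        } else if (precedence(c) > 0) {
            /* Using >= rather than > is what makes equal-precedence
               operators group left to right. */
            while (top > 0 && precedence(ops[top - 1]) >= precedence(c)) {
                if (n + 2 >= outsize) return -1;
                out[n++] = ops[--top];
                out[n++] = ' ';
            }
            if (top == sizeof ops) return -1;
            ops[top++] = c;
            ++in;
        } else {
            return -1;                  /* unknown character */
        }
    }
    while (top > 0) {                   /* flush what is still deferred */
        if (ops[top - 1] == '(' || n + 2 >= outsize) return -1;
        out[n++] = ops[--top];
        out[n++] = ' ';
    }
    if (n > 0) --n;                     /* drop the trailing space */
    out[n] = '\0';
    return 0;
}
```

Feeding it a / b * c and a / (b * c) gives two different operator orders in the output, which is all that precedence, associativity and parentheses ever decided.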
Operators *, /, % have the same precedence and their associativity is left to right. So an expression like:
a * b / c /* both operators have same precedence */
is same as:
(a * b) / c
Similarly an expression like:
a / b * c /* both operators have same precedence */
is same as:
( a / b ) * c
So even though these operators have the same precedence, when they appear together in an expression (without parentheses), the leftmost operator is applied first because of left-to-right associativity.
Note: conceptually, we use parentheses in an expression to override the precedence of operators. So although the expression a / b * c is the same as (a / b) * c, we can force * to be evaluated first by writing the expression as a / (b * c). What I mean to say is: if you are unsure about operator precedence while writing code, use parentheses.
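With integer operands the grouping changes the result, so it is easy to check. A minimal sketch (function names are made up for this example):

```c
/* a / b * c groups as (a / b) * c because / and * associate left to right. */
int divide_then_multiply(int a, int b, int c)
{
    return a / b * c;
}

/* Parentheses force the multiplication to happen first. */
int divide_by_product(int a, int b, int c)
{
    return a / (b * c);
}

/* a * b / c groups as (a * b) / c. */
int multiply_then_divide(int a, int b, int c)
{
    return a * b / c;
}

/* Here integer division truncates b / c before the multiplication. */
int multiply_by_quotient(int a, int b, int c)
{
    return a * (b / c);
}
```

For example, 12 / 4 * 3 is 9 but 12 / (4 * 3) is 1, and 5 * 3 / 2 is 7 but 5 * (3 / 2) is 5.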
EDIT:
In POSTFIX and PREFIX form, parentheses ( ) are not used. The order in which operators are applied is decided by the order of their appearance in the expression, so while evaluating an expression there is no need to search for the next operation to perform, and evaluation is fast.
In an INFIX expression, the precedence of operators can be overridden by brackets ( ). Hence brackets appear in infix expressions, and the evaluator needs to work out which operation to perform next (e.g. in a + b % d), so evaluating the expression directly is slower.
That is the reason these conversions are useful in computer science.
So a compiler first translates an infix expression into an equivalent postfix form or syntax tree (using grammar rules), then generates target code to evaluate the expression's value. That is the reason why we study postfix and prefix form.
And according to precedence and associativity rules the following expression:
a * b / c /* both operators have same precedence */
will be translated into:
a b * c /
And expression
a / b * c /* both operators have same precedence */
will be translated into
a b / c *
My question is among the operators *, /, %, which has the highest priority.
They are equal, just as + and - (binary) are equal.
Do we need to consider associativity as well?
Yes, for example 1 + 2 + 3 needs to become (1 + 2) + 3, i.e. 1, 2, ADD, 3, ADD, as opposed to 1, 2, 3, ADD, ADD.
Since all these operators have LEFT-TO-RIGHT associativity, will / get higher preference over * ?
Associativity doesn't have anything to do with precedence. The question doesn't make sense.
But if you're just calculating an existing RPN expression, as your title says, I don't know why you're asking any of this. You just push the operands and evaluate the operators as they occur. Are you really asking about translation into RPN?
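For the evaluation case, a minimal sketch of "push the operands and evaluate the operators as they occur" might look like this. It assumes space-separated tokens, non-negative integer literals, a fixed stack of 64 entries, and no overflow checking; eval_rpn is a name I made up.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluates an RPN string such as "1 2 + 3 +".
   Returns 0 and stores the value in *result, or -1 on malformed input,
   stack overflow, or division by zero. */
int eval_rpn(const char *s, long *result)
{
    long stack[64];
    size_t top = 0;

    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            ++s;
            continue;
        }
        if (isdigit((unsigned char)*s)) {
            char *end;
            long v = strtol(s, &end, 10);   /* operand: just push it */
            if (top == sizeof stack / sizeof stack[0]) return -1;
            stack[top++] = v;
            s = end;
            continue;
        }
        /* Anything else must be an operator with two operands waiting. */
        if (top < 2) return -1;
        long rhs = stack[--top];
        long lhs = stack[--top];
        long v;
        switch (*s) {
        case '+': v = lhs + rhs; break;
        case '-': v = lhs - rhs; break;
        case '*': v = lhs * rhs; break;
        case '/': if (rhs == 0) return -1; v = lhs / rhs; break;
        case '%': if (rhs == 0) return -1; v = lhs % rhs; break;
        default:  return -1;
        }
        stack[top++] = v;
        ++s;
    }
    if (top != 1) return -1;    /* leftovers mean the input was malformed */
    *result = stack[0];
    return 0;
}
```

Note that no precedence table appears anywhere in it: "8 4 / 2 *" and "8 4 2 * /" differ only in token order, and that order alone determines the result.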
From page 123 of The C Programming Language by K&R:
(p++)->x increments p after accessing x. (This last set of parentheses is unnecessary. Why?)
Why is it unnecessary considering that -> binds stronger than ++?
EDIT: Contrast the given expression with ++p->x, the latter is evaluated as ++(p->x) which would increment x, not p. So in this case parentheses are necessary and we must write (++p)->x if we want to increment p.
The only other possible interpretation is:
p++(->x)
and that doesn't mean anything. It's not even valid. The only possible way to interpret this in a valid way is (p++)->x.
Exactly because -> binds stronger than ++. (it doesn't, thanks @KerrekSB.)
increments p after accessing x.
So first you access x of p, then you increment p. That perfectly matches the order of evaluation of the -> and the ++ operators.
Edit: aww, these edit's...
So what happens when you write ++p->x is that it could be interpreted either as ++(p->x) or as (++p)->x (which one is actually chosen is just a matter of language design, K&R thought it would be a good idea to make it evaluate as in the first case). The thing is that this ambiguity doesn't exist in the case of p++->x, since it can only be interpreted as (p++)->x. The other alternatives, p(++->x), p(++->)x and p++(->x) are really just syntactically malformed "expressions".
The maximal munch strategy says that p++->x is divided into the following preprocessing tokens:
p then ++ then -> then x
In the expression p++->x there are two operators, the postfix ++ operator and the postfix -> operator. Both operators being postfix operators, they have the same precedence and there is no ambiguity in parsing the expression. p++->x is equivalent to (p++)->x.
For ++p->x expression, the situation is different.
In ++p->x, the ++ is not a postfix operator, it is the ++ unary operator. C gives postfix operators higher precedence over all unary operators and this is why ++p->x is actually equivalent to ++(p->x).
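A sketch of the three forms, using a hypothetical struct node with a single int member x (the struct and function names are assumptions for illustration):

```c
struct node { int x; };

/* ++p->x is ++(p->x): the member is incremented, the pointer is not moved. */
int increment_member(struct node *p)
{
    return ++p->x;
}

/* p++->x is (p++)->x: read x through the old pointer, then p has advanced. */
int read_then_advance(struct node **cursor)
{
    struct node *p = *cursor;
    int value = p++->x;
    *cursor = p;
    return value;
}

/* (++p)->x needs the parentheses: advance first, read x from the next node. */
int advance_then_read(struct node **cursor)
{
    struct node *p = *cursor;
    int value = (++p)->x;
    *cursor = p;
    return value;
}
```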
EDIT: I changed the first part of the answer as a result of Steve's comment.
Both post-increment and member access operator are postfix expressions and bind the same. Considering that they apply to the primary or postfix expression to the left, there can't be ambiguity.
In
p++->x
The postfix-++ operator can apply only to the expression to the left of it (i.e. to p).
Similarly ->x can only be an access to the expression to its left, which is p++. Writing that expression as (p++) is not needed, but also does no harm.
The "after" in your description of the effects, does not express temporal order of increment and member access. It only expresses that the result of p++ is the value p had before the increment and that that value is the value used for the member access.
The expression p++ results in a pointer with the value of p. Later on, the ++ part is performed, but for the purposes of interpreting the expression, it may just as well not be there. ->x makes the compiler add the offset for the member x to the original address in p and access that value.
If you change the statement to :
p->x; p++;
it would do exactly the same thing.
The order of precedence is actually exactly the same, as can be seen here - but it doesn't really matter.
Talking about the associativity of operators in C, I was wondering why there are different associativities among operators that have the same precedence. For example, postfix increment and postfix decrement have left associativity, while prefix increment and prefix decrement have right associativity. Wouldn't it be simpler to have just left or right associativity for all operators of the same precedence?
Are there any reasons behind that?
Isn't it simple to have just left or right associativity for all the
same precedence operators?
Yes, and that is the case in C. Maybe you assumed that prefix and postfix have the same precedence, which is wrong. Postfix has a higher precedence than prefix!
Also there is another curious case to consider as to why certain operators have certain associativity. From Wiki,
For example, in C, the assignment a = b is an expression that returns
a value (namely, b converted to the type of a) with the side effect of
setting a to this value. An assignment can be performed in the middle
of an expression. (An expression can be made into a statement by
following it with a semicolon; i.e. a = b is an expression but a = b;
is a statement). The right-associativity of the = operator allows
expressions such as a = b = c to be interpreted as a = (b = c),
thereby setting both a and b to the value of c. The alternative (a =
b) = c does not make sense because a = b is not an lvalue.
Binary operators are all left-associative except the assignment operators, which are right-associative.
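Both directions can be observed with a tiny sketch (function names are invented for this example). The second function uses an int in the middle of the chain so that the right-to-left grouping leaves a visible trace:

```c
/* a - b - c groups as (a - b) - c: left-associative. */
int subtract_chain(int a, int b, int c)
{
    return a - b - c;
}

/* a = b = c groups as a = (b = c): right-associative. The value of (b = c)
   is c converted to int, so a receives the truncated value. */
double assign_chain(double c)
{
    double a;
    int b;
    a = b = c;
    return a;
}
```

subtract_chain(10, 4, 3) is 3 (not 9), and assign_chain(3.7) is 3.0 because the value passes through the int b on its way to a.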
Postfix operators are sometimes (for example in K&R 2nd) said to be right-associative, but this is to express the idea that they have higher precedence than unary operators.