Unary Minus vs Binop Minus - syntactic-sugar

My question is: when writing a compiler, is it valid in all cases to desugar unary minus into binary minus with a first operand of 0? That is, can I change:
-x
to:
0-x
or will desugaring like this cause issues down the line?
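One case worth checking before adopting the rewrite is floating point, where zero carries a sign. The following is a minimal sketch in C, assuming the target follows IEEE 754 with the default round-to-nearest mode and that the compiler is not run with value-changing options such as -ffast-math. The function names are made up for illustration.

```c
#include <assert.h>
#include <math.h>   /* signbit, isnan */

/* Unary minus flips the sign bit, so -(+0.0) is -0.0. */
double negate_unary(double x) { return -x; }

/* Desugared form: (+0.0) - (+0.0) is +0.0 under round-to-nearest. */
double negate_desugared(double x) { return 0.0 - x; }

/* Returns 1 when both forms agree in value and in sign bit, else 0. */
int forms_agree(double x)
{
    assert(!isnan(x));  /* NaN never compares equal; out of scope here */
    double a = negate_unary(x);
    double b = negate_desugared(x);
    return a == b && !signbit(a) == !signbit(b);
}
```

Under those assumptions forms_agree(1.5) returns 1 but forms_agree(0.0) returns 0, because the unary form yields -0.0 and the desugared form yields +0.0. For integer operands in C the two forms generally behave alike (both overflow for the most negative value of a signed type), so the caveat mainly concerns floating-point types.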

Related

How are operators with the same precedence in C evaluated?

This is in the context of C operator precedence. In reality, the following code is evaluated as follows:
int32_t * result = (int32_t *) addr_base + offset;
|
|
↓
int32_t * result = ((int32_t *) addr_base) + offset;
However, what I see in the table below is that cast and + should be evaluated from right to left. In other words, in my example + is to the right of the cast, so I would expect the first statement above to be evaluated like this:
int32_t * result = (int32_t *) (addr_base + offset);
As + is on the right side of cast (so more priority according to the table).
Why is that actually happening?
The + in row 2 is the unary plus operator. This is similar to the unary negation operator (e.g. x = -y).
The operator for addition is in row 4 and has a lower precedence than the cast.
If two operators have the same precedence, the order is given by the associativity (left to right or right to left).
The cast operator has higher precedence than addition, so that's why the operand is grouped with the cast operator.
Unary plus has the same precedence as cast. If you were to use those in the same expression, the associativity would go from right to left.
A cast operator has the same precedence as a unary + operator, but the + operator in (int32_t) addr_base + offset is a binary + operator, which has lower precedence.
A unary operator is one with only one operand. In x = -y;, the - is a unary operator. We can also write x = +y;, where the + is a unary operator, but this is rarely done since it is largely superfluous.
A binary operator is one with two operands. In x = y + z, the + is a binary operator.
Although unary plus has the same precedence as casts, the expression in question involves binary +, which is two rows lower in the precedence table. So
(int32_t *) addr_base + offset
is unambiguously parsed as
((int32_t *) addr_base) + offset
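The practical difference between the two groupings can be sketched as follows. The question does not say what type addr_base has, so this assumes it is an unsigned char * pointing into suitably aligned storage. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cast binds first: ((int32_t *) addr_base) + offset.
   The offset is scaled by sizeof(int32_t). */
ptrdiff_t bytes_cast_then_add(unsigned char *addr_base, ptrdiff_t offset)
{
    int32_t *result = (int32_t *) addr_base + offset;
    return (unsigned char *) result - addr_base;
}

/* Explicit parentheses: (int32_t *) (addr_base + offset).
   The offset is counted in bytes; it must keep the pointer aligned. */
ptrdiff_t bytes_add_then_cast(unsigned char *addr_base, ptrdiff_t offset)
{
    int32_t *result = (int32_t *) (addr_base + offset);
    return (unsigned char *) result - addr_base;
}

/* Small self-check; call it from main() to run the example. */
void cast_grouping_demo(void)
{
    int32_t buf[8] = {0};
    unsigned char *base = (unsigned char *) buf;
    assert(bytes_cast_then_add(base, 4) == 16);  /* 4 elements of 4 bytes */
    assert(bytes_add_then_cast(base, 4) == 4);   /* 4 bytes */
}
```

With an offset of 4, the unparenthesized form lands 16 bytes past the base while the parenthesized form lands 4 bytes past it, which is why the grouping matters here.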

Postfix/Prefix operator precedence and associativity

I'm confused about the precedence and associativity of postfix/prefix operators.
On one hand, as I'm reading K&R book, it states that:
(*ip)++
The parentheses are necessary in this last example; without them, the expression would increment ip instead of what it points to, because unary operators like * and ++ associate right to left.
No mention whatsoever of a difference of associativity between postfix/prefix operators. Both are treated equally. The book also states that * and ++ have the same precedence.
On the other hand, this page states that:
1) Precedence of prefix ++ and * is same. Associativity of both is right to left.
2) Precedence of postfix ++ is higher than both * and prefix ++. Associativity of postfix ++ is left to right.
Which one should I trust? Is it something that changed with the C revisions over the years?
TL;DR: the two descriptions are saying the same thing, using the same words and symbols with slightly different meaning.
On one hand, as I'm reading K&R book, it states that:
(*ip)++
The parentheses are necessary in this last example; without them, the expression would increment ip instead of what it points to,
because unary operators like * and ++ associate right to left.
No mention whatsoever of a difference of associativity between
postfix/prefix operators. Both are treated equally. The book also
states that * and ++ have the same precedence.
It's unclear which edition of K&R you're reading, but the first, at least, does treat the prefix and postfix versions of the increment and decrement operators as a single operator each, with effects depending on whether their operand precedes or follows them.
On the other hand, this page states that:
1) Precedence of prefix ++ and * is same. Associativity of both is
right to left.
2) Precedence of postfix ++ is higher than both * and prefix ++.
Associativity of postfix ++ is left to right.
The language standard and most modern treatments describe the prefix and postfix versions as different operators, disambiguated by their position relative to their operand. The rest of this answer explains how this is an alternative description of the same thing.
Observe that when only unary operators are involved, associativity questions arise only between one prefix and one postfix operator of the same precedence. Among a chain of only prefix or only postfix operations, there is no ambiguity with respect to how they associate. For example, given - - x, you cannot meaningfully group it as (- -) x. The only alternative is - (- x).
Next, observe that all the highest-precedence operators are postfix unary operators, and that in K&R, all the second-precedence operators are prefix unary operators except ambi-fix ++ and --. Applying right-to-left associativity to the second-precedence operators, then, disambiguates only expressions involving postfix ++ or -- and a prefix unary operator, and does so in favor of the postfix operator. This is equivalent to the modern approach of distinguishing the postfix and prefix versions of those operators and assigning higher precedence to the postfix versions.
To get the rest of the way to the modern description, consider the observations I already made that associativity questions arise for unary operators only when prefix and postfix operators are chained, and that all the highest-precedence operators are postfix unary operators. Having distinguished postfix ++ and -- as separate, higher-precedence operators than their prefix versions, one could put them in their own tier between the other postfix operators and all the prefix operators, but putting them instead in the same tier with all the other postfix operators changes nothing about how any expression is interpreted, and is simpler. That's how it is usually represented these days, including in your second resource.
As for left-to-right vs. right-to-left associativity, the question is, again, moot for a precedence tier containing only prefix or only postfix operators. However, describing postfix operators as associating left-to-right and prefix operators as associating right-to-left is consistent with their semantic order of operations.
You can refer to the C11 standard although its section on precedence is a little hard to follow. See sec. 6.5.1. (footnote 85 says "The syntax specifies the precedence of operators in the evaluation of an expression, which is the same as the order of the major subclauses of this subclause, highest precedence first.")
Basically, postfix operators have higher precedence than prefix operators because they come earlier in that section: 6.5.2.4 vs. 6.5.3.1. So K&R is correct (no surprise there!) that *ip++ means *(ip++), which is different from (*ip)++. However, its point about this being due to associativity is a bit misleading, I'd say. And the GeeksforGeeks site's point #2 is also correct.
@GaryO's answer is spot on! Postfix operators have higher precedence because they come earlier.
Here's a small sanity check you can run to convince yourself.
I made two integer arrays and a pointer to the start of each array, then ran (*p)++ and *p++ on the two pointers. I printed out the pointer and array state before and after for reference.
#include <stdio.h>

/* Print both arrays; a and b must be in scope. */
#define PRINT_ARRS printf("a = {%d, %d, %d}\n", a[0], a[1], a[2]); \
                   printf("b = {%d, %d, %d}\n\n", b[0], b[1], b[2]);
/* Print where each pointer points; %td is the conversion for ptrdiff_t. */
#define PRINT_PTRS printf("*p1 = a[%td] = %d\n", p1 - a, *p1); \
                   printf("*p2 = b[%td] = %d\n\n", p2 - b, *p2);

int main(void)
{
    int a[3] = {1, 1, 1};
    int b[3] = {10, 10, 10};
    int *p1 = a;
    int *p2 = b;

    PRINT_ARRS
    PRINT_PTRS
    printf("(*p1)++: %d\n", (*p1)++);   /* increments a[0]; p1 is unchanged */
    printf("*p2++ : %d\n\n", *p2++);    /* reads b[0], then advances p2 */
    PRINT_ARRS
    PRINT_PTRS
    return 0;
}
Compiling with gcc and running on my machine produces:
a = {1, 1, 1}
b = {10, 10, 10}
*p1 = a[0] = 1
*p2 = b[0] = 10
(*p1)++: 1
*p2++ : 10
a = {2, 1, 1}
b = {10, 10, 10}
*p1 = a[0] = 2
*p2 = b[1] = 10
You can see that (*p1)++ increments the array value while *p2++ increments the pointer.

Is there any operator in c which is both unary and binary?

Is there any operator in C which is both unary and binary? This question was asked in an interview.
The asterisk (*) can be used for dereferencing (unary) or multiplication (binary).
The ampersand (&) can be used for referencing (unary) or bitwise AND (binary).
The plus/minus signs (+/-) can be used for identity/negation (unary) or addition/subtraction (binary).
But, as others pointed out, those are symbols shared by different operators. Each of those operators has only one arity.
No, there isn't. Every operator is either unary, binary, or ternary.
Some unary and binary operators happen to use the same symbol:
* for dereference and multiplication
- for negation and subtraction
+ for identity and addition
& for address-of and bitwise "and"
But unary and binary * are still distinct operators that happen to be spelled the same way.
As far as I can tell, only the . operator is both unary and binary in C (this is not specified in the standard):
.: Unary: in designators of structures, {.member1 = x, .member3 = z} (C99 and later). Binary: accessing structure members.
There is no operator in C which is unary and binary as well.
Symbols, like +, -, * and &, are used as unary and binary operators but then these symbols are treated as different operators:
+, - Unary: i = -1, j = +1. Binary: i = i+1, j = j-1
* Unary: Dereference operator. Binary: Multiplication operator.
& Unary: Reference operator. Binary: Bitwise AND operator.
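To see the unary and binary spellings side by side, here is a short sketch in C. The function names are invented for illustration; the comments show how each expression groups.

```c
#include <assert.h>

/* a * *p - -b uses binary *, unary * (dereference),
   binary - and unary - (negation) in one expression. */
int mixed_star_minus(int a, int b)
{
    int *p = &b;           /* unary &: address-of */
    return a * *p - -b;    /* groups as (a * (*p)) - (-b) */
}

/* a & *&b uses binary & (bitwise AND), then unary * applied to unary &. */
int mixed_ampersand(int a, int b)
{
    return a & *&b;        /* groups as a & (*(&b)) */
}

/* Small self-check; call it from main() to run the examples. */
void shared_symbols_demo(void)
{
    assert(mixed_star_minus(3, 4) == 16);  /* 3 * 4 - (-4) */
    assert(mixed_ampersand(6, 3) == 2);    /* 110 & 011 = 010 */
}
```

Note that the space in a - -b matters: written as a--b, the lexer would read -- as the decrement operator rather than as two minus signs.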

Precedence of operators in RPN

While calculating a postfix expression in C, if our token is an operator we have to place it on the stack in such a way that it has the highest priority.
My question is among the operators *, /, %, which has the highest priority.
Do we need to consider associativity as well ? Since all these operators have LEFT-TO-RIGHT associativity, will / get higher preference over * ?
Precedence usually only applies to infix notations. Postfix (and Prefix) notations are usually considered to explicitly specify which operands are associated with which operator. Precedence only comes into play when there is ambiguity in the parsing, which is not the case in postfix notation.
The precedence question that arises in an infix expression
4 * 5 + 3 / 12
simply doesn't exist after conversion to an RPN form
4 5 * 3 12 / +
or a prefix form
(+ (* 4 5) (/ 3 12))
There is some possibility for confusion when considering something like the Shunting-Yard Algorithm which can be used to generate an RPN representation from -- or directly evaluate -- an infix expression. It deals with operator precedence by deferring operators onto a secondary stack until a lower precedence operator forces it to be popped and evaluated (or output).
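The deferral described above can be sketched in C roughly as follows. This is a simplified illustration, not a full implementation: it assumes only the binary operators + - * / %, parentheses, and alphanumeric operands, and it pops operators of higher or equal precedence because all of these operators are left-associative. The names to_rpn, prec, emit and emit_op are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Precedence of a binary operator; 0 means "not an operator". */
static int prec(char op)
{
    switch (op) {
    case '*': case '/': case '%': return 2;
    case '+': case '-':           return 1;
    default:                      return 0;
    }
}

/* Append one character to out, keeping room for the terminator. */
static int emit(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 >= cap) return -1;
    out[(*len)++] = c;
    out[*len] = '\0';
    return 0;
}

/* Append a separator followed by an operator. */
static int emit_op(char *out, size_t cap, size_t *len, char op)
{
    return (emit(out, cap, len, ' ') || emit(out, cap, len, op)) ? -1 : 0;
}

/* Convert infix to space-separated RPN. Returns 0 on success, -1 on error. */
int to_rpn(const char *in, char *out, size_t cap)
{
    char stack[64];            /* deferred operators and '(' */
    size_t top = 0, len = 0;

    if (cap == 0) return -1;
    out[0] = '\0';
    while (*in) {
        char c = *in;
        if (isspace((unsigned char) c)) { in++; continue; }
        if (isalnum((unsigned char) c)) {          /* operand: copy it whole */
            if (len > 0 && emit(out, cap, &len, ' ')) return -1;
            while (isalnum((unsigned char) *in))
                if (emit(out, cap, &len, *in++)) return -1;
            continue;
        }
        in++;
        if (c == '(') {
            if (top == sizeof stack) return -1;
            stack[top++] = c;
        } else if (c == ')') {
            while (top > 0 && stack[top - 1] != '(')
                if (emit_op(out, cap, &len, stack[--top])) return -1;
            if (top == 0) return -1;               /* unmatched ')' */
            top--;                                 /* discard the '(' */
        } else if (prec(c) > 0) {
            /* Left-associative: pop higher OR equal precedence first. */
            while (top > 0 && prec(stack[top - 1]) >= prec(c))
                if (emit_op(out, cap, &len, stack[--top])) return -1;
            if (top == sizeof stack) return -1;
            stack[top++] = c;
        } else {
            return -1;                             /* unknown character */
        }
    }
    while (top > 0) {
        if (stack[top - 1] == '(') return -1;      /* unmatched '(' */
        if (emit_op(out, cap, &len, stack[--top])) return -1;
    }
    return 0;
}

/* Small self-check; call it from main() to run the examples. */
void to_rpn_demo(void)
{
    char buf[64];
    assert(to_rpn("4 * 5 + 3 / 12", buf, sizeof buf) == 0);
    assert(strcmp(buf, "4 5 * 3 12 / +") == 0);
    assert(to_rpn("a / b * c", buf, sizeof buf) == 0);
    assert(strcmp(buf, "a b / c *") == 0);
}
```

The sketch does not validate that operands and operators alternate correctly, and it does not handle unary operators; both would need extra bookkeeping.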
The operators *, /, % have the same precedence, and their associativity is left to right. So an expression like:
a * b / c /* both operators have same precedence */
is same as:
(a * b) / c
Similarly an expression like:
a / b * c /* both operators have same precedence */
is same as:
( a / b ) * c
So even though the operators have the same precedence, when they appear together in an expression (without parentheses) the leftmost operator is applied first because of left-to-right associativity.
Note: conceptually, we use parentheses in an expression to override the precedence of operators. So although the expression a / b * c is the same as (a / b) * c, we can force * to be evaluated first by writing the expression as a / (b * c). What I mean to say is: if you are unsure about operator precedence while writing code, use parentheses.
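A quick way to see the difference in C, using integer arithmetic. The function names are made up for this illustration.

```c
#include <assert.h>

/* No parentheses: equal precedence, left to right, so this is (a / b) * c. */
int div_then_mul(int a, int b, int c)
{
    assert(b != 0);
    return a / b * c;
}

/* Parentheses force the multiplication to happen first. */
int div_by_product(int a, int b, int c)
{
    assert(b * c != 0);
    return a / (b * c);
}
```

For example, div_then_mul(12, 3, 2) computes (12 / 3) * 2 = 8, while div_by_product(12, 3, 2) computes 12 / 6 = 2.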
EDIT:
In postfix and prefix forms, parentheses ( ) are not used. The order of operations is fixed by the order in which the operators appear in the expression, so an evaluator never needs to search for the next operation to perform, and evaluation is fast.
In an infix expression, by contrast, the precedence of operators can be overridden by parentheses ( ). An evaluator has to work out which operation to perform next, e.g. in a + b % d, so evaluating the expression directly is slower.
That is why these conversions are useful in computer science.
Conceptually, a compiler first translates an infix expression into an equivalent postfix form (using grammar rules) and then generates target code to evaluate the expression. That is the reason why we study postfix and prefix forms.
And according to precedence and associativity rules the following expression:
a * b / c /* both operators have same precedence */
will be translated into:
a b * c /
And expression
a / b * c /* both operators have same precedence */
will be translated into
a b / c *
My question is among the operators *, /, %, which has the highest priority.
They are equal, just as + and - (binary) are equal.
Do we need to consider associativity as well?
Yes, for example 1 + 2 + 3 needs to become (1 + 2) + 3, i.e. 1, 2, ADD, 3, ADD, as opposed to 1, 2, 3, ADD, ADD.
Since all these operators have LEFT-TO-RIGHT associativity, will / get higher preference over * ?
Associativity doesn't have anything to do with precedence. The question doesn't make sense.
But if you're just calculating an existing RPN expression, as your title says, I don't know why you're asking any of this. You just push the operands and evaluate the operators as they occur. Are you really asking about translation into RPN?
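Evaluating an existing RPN expression really is just "push the operands, apply each operator as it occurs". Here is a minimal sketch in C, assuming space-separated tokens, non-negative integer literals and the binary operators + - * / %. The name eval_rpn and its error convention are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>   /* strtol, size_t */

/* Evaluate RPN; stores the value in *result.
   Returns 0 on success, -1 on malformed input or division by zero. */
int eval_rpn(const char *s, long *result)
{
    long stack[64];
    size_t top = 0;

    while (*s) {
        if (isspace((unsigned char) *s)) { s++; continue; }
        if (isdigit((unsigned char) *s)) {     /* operand: push it */
            char *end;
            if (top == 64) return -1;
            stack[top++] = strtol(s, &end, 10);
            s = end;
            continue;
        }
        if (top < 2) return -1;                /* operator needs two operands */
        long rhs = stack[--top];
        long lhs = stack[--top];
        switch (*s++) {
        case '+': stack[top++] = lhs + rhs; break;
        case '-': stack[top++] = lhs - rhs; break;
        case '*': stack[top++] = lhs * rhs; break;
        case '/': if (rhs == 0) return -1; stack[top++] = lhs / rhs; break;
        case '%': if (rhs == 0) return -1; stack[top++] = lhs % rhs; break;
        default:  return -1;                   /* unknown token */
        }
    }
    if (top != 1) return -1;                   /* leftover or missing operands */
    *result = stack[0];
    return 0;
}

/* Small self-check; call it from main() to run the examples. */
void eval_rpn_demo(void)
{
    long v;
    assert(eval_rpn("1 2 + 3 +", &v) == 0 && v == 6);
    assert(eval_rpn("12 3 / 2 *", &v) == 0 && v == 8);
}
```

Nothing in the evaluator consults precedence or associativity: the order of the tokens already encodes both.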

Infix to postfix algorithm that takes care of unary operators

The input to the algorithm will be an expression like this:
a+(-b)
a*-b+c
i.e. any expression that a standard C compiler would support.
I already have the input formatted as a stream of tokens; each token carries information about whether it is an operator or an operand.
The algorithm should take this in and give me a postfix expression that I can evaluate.
If I use the standard conversion algorithm, I can't differentiate between a unary and a binary operator.
For example, a*(-b) would give me ab-*, which would evaluate incorrectly.
If an operator is the first thing in your expression, or comes after another operator, or comes after a left parenthesis, then it's a unary operator.
You have to use other symbols for unary operators in your output string, because otherwise it is not possible to distinguish between binary and unary variants in the postfix notation.
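The rule above can be applied as a small pre-pass before the usual conversion. The sketch below is one possible way to do it in C, working on a plain string rather than a token stream for brevity. It rewrites unary - as ~ (an arbitrary choice of replacement symbol) and drops unary +. The name mark_unary is hypothetical, and whitespace is discarded, so operands are assumed not to be separated by spaces alone.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy in to out, replacing unary '-' with '~' and dropping unary '+'.
   out must have room for strlen(in) + 1 bytes. */
void mark_unary(const char *in, char *out)
{
    /* True at the start, after an operator, and after '('. */
    int expect_operand = 1;

    for (; *in; in++) {
        char c = *in;
        if (isspace((unsigned char) c)) continue;
        if (expect_operand && c == '-') {
            *out++ = '~';          /* unary minus gets its own symbol */
        } else if (expect_operand && c == '+') {
            /* unary plus has no effect; drop it */
        } else {
            *out++ = c;
            /* After an operand or ')', the next + or - is binary. */
            expect_operand = !(isalnum((unsigned char) c) || c == ')');
        }
    }
    *out = '\0';
}

/* Small self-check; call it from main() to run the examples. */
void mark_unary_demo(void)
{
    char buf[32];
    mark_unary("a*-b+c", buf);
    assert(strcmp(buf, "a*~b+c") == 0);
    mark_unary("4*(-(5))", buf);
    assert(strcmp(buf, "4*(~(5))") == 0);
}
```

The converter that runs afterwards can then treat ~ as a prefix operator with higher precedence than the binary operators, and the postfix output stays unambiguous.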
In your input, when you have two consecutive operators, the second operator will be unary.
If you have more consecutive operators, all but the first will be unary operators.
Transform all your unary - operators to an operand -1 and an operator *, and remove all unary + operators. (Take care when the unary - follows / or %: a / -b would become a / -1 * b, which groups as (a / -1) * b, so the rewritten operand may need its own parentheses.)
If the first element is an operator, it is a unary operator.
Parentheses are a special case, but you can do a first pass in which you ignore them. In the following example - is consecutive to *.
4*(-(5))
and your tokens would become:
4
*
(
-1
*
(
5
)
)
You could simply convert -6 to 0 6 - to eliminate unary operators completely. I like this approach since it is more orthogonal and you do not need to take care of special cases when processing.
An alternative approach is to use different symbols for the unary and the binary versions of operators using the same symbol, eg. - remains binary minus and ~ becomes negation sign.

Resources