Determining k of LR(k) from this example? - c

I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:
Expression:
LogicalOrExpression
LogicalOrExpression ? Expression : LogicalOrExpression
LogicalOrExpression:
LogicalAndExpression
LogicalOrExpression || LogicalAndExpression
LogicalAndExpression:
EqualityExpression
LogicalAndExpression && RelationalExpression
EqualityExpression:
RelationalExpression
EqualityExpression EqualityOperator RelationalExpression
EqualityOperator:
==
!=
RelationalExpression:
AdditiveExpression
RelationalExpression RelationalOperator AdditiveExpression
RelationalOperator:
<
>
<=
>=
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression AdditiveOperator MultiplicativeExpression
AdditiveOperator:
+
-
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression MultiplicativeOperator UnaryExpression
MultiplicativeOperator:
*
/
%
UnaryExpression:
PrimaryExpression
UnaryOperator UnaryExpression
UnaryOperator:
+
-
!
PrimaryExpression:
BoolLiteral // TERMINAL
IntegerLiteral // TERMINAL
Identifier // TERMINAL
( Expression )
I want to try using shift/reduce parsing and so would like to know what is the smallest k (if any) for which this grammar is LR(k)? (and more generally how to determine the k from an arbitrary grammar if possible?)

The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.
All operator-precedence grammars are LR(1). It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1).
FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG), for which it is easy to answer the question about them being LR(k) for some k, since the answer is always "yes, with k = 1". Just for precision, here are the properties. We use the normal convention where a grammar G is a 4-tuple (N, Σ, P, S) where N is a set of non-terminals, Σ is a set of terminals, P is a set of productions, and S is the start symbol. We write V for N ∪ Σ. In any grammar, we have:
N ∩ Σ = ∅
S ∈ N
P ⊂ V⁺ × V*
The "context-free" requirement restricts P so every left-hand side is a single non-terminal:
P ⊂ N × V*
In an operator grammar, P is further restricted: no right-hand side is empty, and no right-hand side has two consecutive non-terminals:
P ⊂ N × (V⁺ − V*NNV*)
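The operator-grammar restriction is easy to test mechanically. The following is a minimal sketch under an assumed encoding that is not part of the answer: each right-hand side is a string in which upper-case letters stand for non-terminals and every other character is a terminal. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed encoding: upper-case letters are non-terminals, any other
   character is a terminal. Returns 1 if the right-hand side is allowed
   in an operator grammar, 0 otherwise. */
int is_operator_rhs(const char *rhs) {
    assert(rhs != NULL);
    if (rhs[0] == '\0')
        return 0;                       /* empty right-hand side */
    for (size_t i = 0; rhs[i + 1] != '\0'; i++) {
        if (isupper((unsigned char)rhs[i]) &&
            isupper((unsigned char)rhs[i + 1]))
            return 0;                   /* two consecutive non-terminals */
    }
    return 1;
}
```

Under this encoding, a right-hand side shaped like `EqualityExpression EqualityOperator RelationalExpression` (say `"EOR"`) is rejected, which is why the single-terminal non-terminals have to be macro-expanded first; `"E+T"` is accepted.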
In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads and Trails [2], where
T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V
Then:
t1 ⋖ t2 iff ∃v ∍ t2 Leads v ∧ N→V*t1vV* ∈ P
t1 ⋗ t2 iff ∃v ∍ t1 Trails v ∧ N→V*vt2V* ∈ P
t1 ≐ t2 iff N→V*t1t2V* ∈ P ∨ N→V*t1V't2V* ∈ P
An intuitive way of thinking about those relations is this: Normally when we do the derivations, we just substitute RHS for LHS, but suppose we substitute ⋖ RHS ⋗ instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.
Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.
Verifying that an operator grammar has mutually exclusive precedence relations is straightforward; Leads and Trails require a transitive closure over First and Last, which is roughly O(|G|²) (it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which is O(|G|).
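The verification described above can be sketched in C roughly as follows. This is an illustration under assumptions that are not part of the answer: productions are encoded as strings of the form "A=rhs", upper-case letters are non-terminals, every other character is a single-character terminal, and the input is already an operator grammar. The names `close_edges` and `has_exclusive_precedence` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { REL_LT = 1, REL_GT = 2, REL_EQ = 4 };  /* bit flags for the three relations */

/* Assumed encoding: a production is "A=rhs"; upper-case means non-terminal. */
static int is_nt(unsigned char c) { return isupper(c) != 0; }

/* Fixpoint for Leads (from_left = 1) or Trails (from_left = 0):
   set[A][t] becomes 1 when terminal t Leads/Trails non-terminal A. */
static void close_edges(const char **prods, int n, int from_left,
                        unsigned char set[256][256]) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned char lhs = (unsigned char)prods[i][0];
            const char *rhs = prods[i] + 2;
            size_t len = strlen(rhs);
            assert(len > 0);  /* operator grammars have no empty RHS */
            unsigned char edge = (unsigned char)(from_left ? rhs[0] : rhs[len - 1]);
            unsigned char term = edge;
            if (is_nt(edge)) {
                /* inherit everything known for the edge non-terminal */
                for (int c = 0; c < 256; c++)
                    if (set[edge][c] && !set[lhs][c]) { set[lhs][c] = 1; changed = 1; }
                if (len < 2) continue;
                term = (unsigned char)(from_left ? rhs[1] : rhs[len - 2]);
            }
            if (!is_nt(term) && !set[lhs][term]) { set[lhs][term] = 1; changed = 1; }
        }
    }
}

/* Returns 1 if no pair of terminals is related by more than one relation. */
int has_exclusive_precedence(const char **prods, int n) {
    static unsigned char leads[256][256], trails[256][256], rel[256][256];
    memset(leads, 0, sizeof leads);
    memset(trails, 0, sizeof trails);
    memset(rel, 0, sizeof rel);
    close_edges(prods, n, 1, leads);
    close_edges(prods, n, 0, trails);
    for (int i = 0; i < n; i++) {  /* single scan over all productions */
        const unsigned char *rhs = (const unsigned char *)prods[i] + 2;
        for (size_t j = 0; rhs[j] != 0 && rhs[j + 1] != 0; j++) {
            unsigned char x = rhs[j], y = rhs[j + 1], z = rhs[j + 2];
            if (!is_nt(x) && !is_nt(y)) rel[x][y] |= REL_EQ;
            if (!is_nt(x) && is_nt(y)) {
                if (z != 0 && !is_nt(z)) rel[x][z] |= REL_EQ;
                for (int c = 0; c < 256; c++)
                    if (leads[y][c]) rel[x][c] |= REL_LT;
            }
            if (is_nt(x) && !is_nt(y))
                for (int c = 0; c < 256; c++)
                    if (trails[x][c]) rel[c][y] |= REL_GT;
        }
    }
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (rel[a][b] & (rel[a][b] - 1)) return 0;  /* two or more bits set */
    return 1;
}
```

With this encoding, the usual layered grammar `E=E+T`, `E=T`, `T=T*F`, `T=F`, `F=(E)`, `F=i` passes the check, while the ambiguous `E=E+E`, `E=i` fails because `+` ends up both yielding to and taking precedence over `+`.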

From Donald Knuth's On the Translation of Languages from Left to Right, in the abstract:
It is shown that the problem of whether or not a grammar is LR(k) for some k is undecidable,
In other words,
Given a grammar G, "∃k. G ∊ LR(k)" is undecidable.
Therefore, the best we can do in general is try constructing a parser for LR(0), then LR(1), LR(2), etc. At some point you will succeed, or you may at some point give up when k becomes large.
This specific grammar
In this specific case, I happen to know that the grammar you give is LALR(1), which means it must therefore be LR(1). I know this because I have written LALR parsers for similar languages. It can't be LR(0) for obvious reasons (the grammar {A -> x, A -> A + x} is not LR(0)).

Related

In C programming Language: What order would 3 (Number 3) be assigned to the variables? As in which variable would receive 3 first, second and third?

Question Continued: In C programming Language: For the Question below: What order would 3 (Number 3) be assigned to the variables? As in which variable would receive 3 first, second and third? And which variable would have 3 in the end?
Question: A = B = C = 3
Further explanation of What I'm asking/My attempts to understand this concept:
According to the image I've attached stating the Associativity of Operators, the Assignment operators should be from left to right no?
So should 3 be assigned to A, then B, and then C?
According to a practice question solution it is the opposite, 3 being assigned to C, then B, then A, so am very confused why it's right to left? When the Associativity of Operators say it's left to right!
The expression A = B = C = 3 is parsed in C as A = (B = (C = 3)). The assignment operator associates right-to-left.
However, the actual assignment is specified as a side effect of the expression, and the order in which these side effects occur is not specified by the C standard.
The image in the question is wrong to show the order of assignment operators as left to right, and the source of the image should be regarded with suspicion. The association of assignment operators arises out of the grammar rules in the C standard, where 6.5.16 shows one rule as:
assignment-expression: unary-expression assignment-operator assignment-expression
The fact that the right operand is an assignment expression means that in X = Y, Y can be another assignment expression, such as Z = 4, but X cannot be. So A = B = C = 3 must be parsed as C = 3 being an assignment expression inside of B = …, and B = C = 3 must be an assignment expression inside of A = …. Contrast this with a rule for one of the additive operators in C 6.5.6:
additive-expression: additive-expression - multiplicative-expression
In that rule, the additive-expression is on the left, so A - B - C necessarily groups as (A - B) - C.
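A small C sketch can make the two groupings visible. The function names are hypothetical; the second function relies on the fact that the value of an assignment expression is the value stored in the left operand, so the conversion to int shows which assignment is the inner one.

```c
#include <assert.h>

/* a = b = c = 3 groups as a = (b = (c = 3)); all three end up holding 3. */
int chained_assignment_sum(void) {
    int a = 0, b = 0, c = 0;
    a = b = c = 3;
    assert(a == 3 && b == 3 && c == 3);
    return a + b + c;
}

/* d = i = 3.7 groups as d = (i = 3.7). The inner assignment has the
   value stored in i (3 after conversion to int), so d receives 3.0. */
double assignment_through_int(void) {
    int i;
    double d;
    d = i = 3.7;
    return d;
}

/* a - b - c groups as (a - b) - c, not a - (b - c). */
int chained_subtraction(int a, int b, int c) {
    return a - b - c;
}
```

For example, `chained_subtraction(10, 4, 3)` yields 3, whereas `10 - (4 - 3)` would be 9.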

Checking order of operations in C 'if' statement

The following snippet of C code (where a and b are both type double) is what my question is about:
if(1.0-a < b && b <= 1.0)
Based on the order of operations shown in Wikipedia I understand this as evaluating the same as the following code snippet with parentheses:
if( ( (1.0-a) < b ) && ( b <= 1.0) )
which is what I want. I just want to double check my understanding that the two code snippets are indeed equivalent by the order of operations in C.
Note: obviously I could just use the second code snippet and make explicit what I want if() to evaluate; I ask because I've used the first snippet in my code for a while and I want to make sure my previous results from the code are okay.
Quick answer: yes, it is equivalent.
This means that the result of both code snippets is the same; the meaning is the same, but be careful when you talk about order of operations. It looks to me like your question here is about precedence and associativity. These tell you what an expression means, not the order of evaluation of its operands. To learn about order of evaluation, read about sequence points: Undefined behavior and sequence points
You ask about "order of operations", but I don't think that's what you really want to know.
The phrase "order of operations" refers to the time order in which operations are performed. In most cases, the order in which operations are performed within an expression is unspecified. The && operator is one of the few exceptions to this; it guarantees that its left operand is evaluated before its right operand (and the right operand might not be evaluated at all).
The parentheses you added can affect which operands are associated with which operators -- and yes, the two expressions
1.0-a < b && b <= 1.0
and
( (1.0-a) < b ) && ( b <= 1.0)
are equivalent.
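As a sanity check, the two spellings can be compared on sample inputs. This is only a sketch: the function names and the grid of sample values are arbitrary choices, and agreement on samples illustrates the grammar-based equivalence rather than proving it.

```c
#include <assert.h>

/* The condition as originally written. */
int check_implicit(double a, double b) {
    return 1.0 - a < b && b <= 1.0;
}

/* The same condition with the grouping spelled out. */
int check_explicit(double a, double b) {
    return ((1.0 - a) < b) && (b <= 1.0);
}

/* Asserts that both spellings agree on a small grid of sample values. */
void assert_forms_agree(void) {
    for (int i = -4; i <= 8; i++) {
        for (int j = -4; j <= 8; j++) {
            double a = i * 0.25, b = j * 0.25;
            assert(check_implicit(a, b) == check_explicit(a, b));
        }
    }
}
```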
Parentheses can be used to override operator precedence. They do not generally affect the order in which the operators are evaluated.
An example: this:
x + y * z
is equivalent to this:
x + (y * z)
because multiplication has a higher precedence than addition. But the three operands x, y, and z may be evaluated in any of the 6 possible orders:
x, y, z
x, z, y
y, x, z
y, z, x
z, x, y
z, y, x
The order makes no difference in this case (unless some of them are volatile), but it can matter if they're subexpressions with side effects.
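The one ordering guarantee mentioned above, that && evaluates its left operand first and may skip the right one, can be observed by counting evaluations. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>

static int calls;   /* how many operands have been evaluated so far */

/* An operand with a visible side effect: it bumps the counter. */
static int operand(int value) {
    calls++;
    return value;
}

/* Returns the number of operands evaluated for `left && right`. */
int evaluated_by_and(int left, int right) {
    calls = 0;
    (void)(operand(left) && operand(right));
    assert(calls == 1 || calls == 2);
    return calls;
}
```

`evaluated_by_and(0, 1)` reports a single evaluation because the right operand is skipped; `evaluated_by_and(1, 0)` reports two.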

Implications of operator precedence in C

I understand that this topic has come up umpteen times but I request a moment.
I have tried understanding this many times, also in context of order of evaluation. I was looking for some explicit examples to understand op. precedence and I found one here: http://docs.roxen.com/pike/7.0/tutorial/expressions/operator_tables.xml What I would like to know is if the examples given there (I have cut-pasted them below) are correct.
1+2*2 => 1+(2*2)
1+2*2*4 => 1+((2*2)*4)
(1+2)*2*4 => ((1+2)*2)*4
1+4,c=2|3+5 => (1+4),(c=(2|(3+5)))
1 + 5&4 == 3 => (1 + 5) & (4 == 3)
c=1,99 => (c=1),99
!a++ + ~f() => (!(a++)) + (~(f()))
s == "klas" || i < 9 => (s == "klas") || (i < 9)
r = s == "sten" => r = (s == "sten")
For instance, is 1+2*2*4 really 1+((2*2)*4), or could it as well have been 1+(2*(2*4)), according to the C specification? Any help or further reference to examples would be useful. Thanks again.
Although those examples come from a different language, I think they are the same as operator precedence in C. In general, you'd be better off using a reference for the C language, such as the C standard, or a summary such as the one in Wikipedia.
However, I don't believe that is actually what you are asking. Operator precedence has no implications for order of evaluation. All operator precedence does is show you how to parenthesize the expression. A C compiler is allowed to evaluate the operations in just about any order it wishes to. It is also allowed to use algebraic identities if it is provable that they will have the same result for all valid inputs (this is not usually the case for floating point calculations, but it is usually true for unsigned integer calculations).
The only cases where the compiler is required to produce code with a specific evaluation order are:
Short-circuit boolean operators && and ||: the left argument must be evaluated first, and in some cases the right argument may not be evaluated;
The so-called ternary operator ?:: the left argument (before the ?) must be evaluated first; subsequently, exactly one of the other two operands will be evaluated. Note that this operator groups to the right, demonstrating that there is no relationship between grouping and evaluation order. That is, pred_1 ? action_1() : pred_2 ? action_2() : pred_3 ? action_3() is the same as pred_1 ? action_1() : (pred_2 ? action_2() : pred_3 ? action_3()), but it's pred_1 which must be evaluated first.
The comma operator ,: the left argument must be evaluated first. This is not the same as the use of the comma in function calls.
Function arguments must be evaluated before the function is called, although the order of evaluation of the arguments is not specified, and neither is the order of evaluation of the expression which produces the function.
The last phrase refers to examples such as this:
// This code has Undefined Behaviour. DO NOT USE
typedef void(*takes_int_returns_void)(int);
takes_int_returns_void fvector[3] = {...};
//...
//...
(*fvector[i++])(i);
Here, a compiler may choose to increment i before or after it evaluates the argument to the function (or other less pleasant possibilities), so you don't actually know what value the function will be called with.
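By contrast, the orderings that are guaranteed (the conditional operator and the comma operator) can be traced safely. The sketch below uses a hypothetical `note` helper that records a one-letter tag each time an operand is evaluated.

```c
#include <assert.h>
#include <string.h>

static char log_buf[8];   /* tags of the operands evaluated so far */
static size_t log_len;

/* Records `tag` and passes `value` through. */
static int note(char tag, int value) {
    assert(log_len + 1 < sizeof log_buf);
    log_buf[log_len++] = tag;
    return value;
}

static void reset_log(void) {
    memset(log_buf, 0, sizeof log_buf);
    log_len = 0;
}

/* Trace of `pred ? t : f`: the predicate, then exactly one branch. */
const char *trace_conditional(int pred) {
    reset_log();
    (void)(note('p', pred) ? note('t', 1) : note('f', 2));
    return log_buf;
}

/* Trace of the comma operator: left operand, then right operand. */
const char *trace_comma(void) {
    reset_log();
    (void)(note('l', 0), note('r', 0));
    return log_buf;
}
```

`trace_conditional(1)` yields "pt", `trace_conditional(0)` yields "pf", and `trace_comma()` yields "lr".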
In the case of 1+2*2*4, the compiler must generate code which will produce 17. How it does that is completely up to the compiler. Furthermore, if x, y and z are all unsigned integers, a compiler may compile 1 + x*y*z with any order of multiplications it wants to, even reordering to y*(x*z).
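The reason such regrouping is safe for unsigned operands is that unsigned arithmetic wraps modulo a power of two, so multiplication stays associative and commutative even on overflow. A small sketch (function names are hypothetical) that compares both groupings:

```c
#include <assert.h>

/* The expression as written: groups as 1 + ((x * y) * z). */
unsigned as_written(unsigned x, unsigned y, unsigned z) {
    return 1 + x * y * z;
}

/* One regrouping a compiler would be free to use. */
unsigned regrouped(unsigned x, unsigned y, unsigned z) {
    return 1 + y * (x * z);
}

/* Asserts that both groupings agree, including when the product wraps. */
void assert_regrouping_agrees(unsigned x, unsigned y, unsigned z) {
    assert(as_written(x, y, z) == regrouped(x, y, z));
}
```

Calling `assert_regrouping_agrees(~0u, 3u, 7u)` exercises the wrap-around case; `as_written(2u, 2u, 4u)` gives 17.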
Most operators associate from left to right. This will give a detailed idea about operator precedence:
Click here!
Binary operators, other than assignment operators, go from left to right when they are of equal precedence, so 1 + 2 * 2 * 4 is equivalent to 1 + ((2 * 2) * 4). Obviously in this particular case 1 + (2 * (2 * 4)) gives the same answer, but it won't always. For instance, 1 + 2 / 2.0 * 4 evaluates to 1 + ((2 / 2.0) * 4) == 5.0 and not to 1 + (2 / (2.0 * 4)) == 1.25.
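The two groupings from the example above can be checked directly. The function names are hypothetical; the values involved are exactly representable as doubles, so the comparisons are exact.

```c
#include <assert.h>

/* How C groups the expression: 1 + ((2 / 2.0) * 4). */
double grouped_left_to_right(void) {
    return 1 + 2 / 2.0 * 4;
}

/* The grouping C does not use: 1 + (2 / (2.0 * 4)). */
double grouped_right_to_left(void) {
    return 1 + (2 / (2.0 * 4));
}

/* Asserts that the two groupings really give different results. */
void assert_groupings_differ(void) {
    assert(grouped_left_to_right() != grouped_right_to_left());
}
```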
Order of evaluation is a completely different thing from operator precedence. For one thing, operator precedence is always well-defined, order of evaluation sometimes is not (e.g. the order in which function arguments are evaluated).
This is a perfect tutorial about operator precedence and order of evaluation. Enjoy!

Assignment statement used in conditional operators

Can anybody please tell me why this statement is giving an error - Lvalue Required
(a>b?g=a:g=b);
but this one is correct
(a>b?g=a:(g=b));
where a , b and g are integer variables , and a and b are taken as input from keyboard.
In the expression,
(a > b ? g = a : g = b);
the relational operator > has the highest precedence, so a > b is grouped as an operand. The conditional-expression operator ? : has the next-highest precedence. Its first operand is a>b, and its second operand is g = a. However, the last operand of the conditional-expression operator is considered to be g rather than g = b, since this occurrence of g binds more closely to the conditional-expression operator than it does to the assignment operator. A syntax error occurs because = b does not have a left-hand operand (l-value).
You should use parentheses to prevent errors of this kind and produce more readable code, which has been done in your second statement
(a > b ? g = a : (g = b));
in which the last operand g = b of ?: has an l-value g, and that's why it is correct.
Alternatively you can do
g = a > b ? a : b
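Both working forms can be wrapped in functions and compared. The function names here are hypothetical, and the sketch only shows that the two forms compute the same g.

```c
#include <assert.h>

/* The parenthesized form from the question: assignments inside ?:. */
int max_inner_assignment(int a, int b) {
    int g;
    (a > b ? g = a : (g = b));
    return g;
}

/* The alternative form: a single assignment outside ?:. */
int max_outer_assignment(int a, int b) {
    int g;
    g = a > b ? a : b;
    return g;
}

/* Asserts that both forms agree for the given inputs. */
void assert_same_result(int a, int b) {
    assert(max_inner_assignment(a, b) == max_outer_assignment(a, b));
}
```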
The expression:
(a>b?g=a:g=b)
parsed as:
(a>b?g=a:g)=b
And we can't assign to an expression, so it's an l-value error.
Read: Conditional operator differences between C and C++ Charles Bailey's answer:
Grammar for ?: is as follows:
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
This means that a ? b : c = d parses as (a ? b : c) = d even though (due to the 'not an l-value' rule) this can't result in a valid expression.
One side note:
Please keep spaces in your expressions so that they are readable. For example,
(a>b?g=a:g=b);
Should be written as:
(a > b? g = a: g = b);
similarly, you should add space after ; and ,.
The problem is operator precedence: In C the ternary conditional operator (?:) has a higher precedence than the assignment operator (=).
Without parentheses (which don't do anything here) your expression would be this:
a > b ? g = a : g = b;
The operator with the highest precedence in there would be the comparison >, so this is where you'll get your first logical grouping:
(a > b) ? g = a : g = b;
The next highest expression is the ternary conditional, which results in the following expression:
((a > b) ? (g = a) : (g)) = b;
As you can see, you'll now end up with an rvalue (i.e. a value, not a variable) on the left side of your assignment operator, something that won't work.
As you already noticed, the solution to this is to simply group the expressions on your own. I'd even consider this good practice, especially if you're unsure how your precedence might play out. If you don't want to think about it, add parentheses. Just keep code readability in mind, so if you can, resolve the operator precedence on your own, to ensure you've got everything right and readable.
As for readability: I'd probably use a classic if() here or move the assignment operator outside the ternary conditional, which is how you usually define max():
g = a > b ? a : b;
Or more general as a macro or inline function:
#define max(a, b) ((a) > (b) ? (a) : (b))
inline int max(int a, int b) {
return a > b ? a : b;
}
if(a>b)
{
g = a;
}
else
{
g = b;
}
that can be replaced with this
g = a > b ? a : b; //if a>b use the first (a) else use the second (b)
Your expression (a>b?g=a:g=b) is parsed as :
(a>b?g=a:g)=b
// ^^^
From the Microsoft documentation :
conditional-expression:
logical-or-expression
logical-or-expression ? expression : conditional-expression
In C, the operator ?: has a higher precedence than the operator =. This means that ( a ? b : c = d ) will be parsed as ( a ? b : c ) = d. Due to the l-value rule, the first expression is also valid but is not doing what you think.
To avoid this error, you can do also :
g = ( a > b ) ? a : b;
This question usually triggers a barrage of answers trying to explain the situation through the concept of operator precedence. In reality it cannot be explained that way, since this is a typical example of an input, on which surrogate concepts like "operator precedence" break down. As you probably know, there's really no "operator precedence" in C. There are only grammatical groupings, which generally cannot be expressed precisely through any linear ordering of operators.
Let's take a look at what the language specification says about it. The relevant portions of C grammar in this case are the grammars of ?: operator and = operator. For ?: operator it is
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
and for the = operator it is
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
In the first case the critical part is the last operand of ?: operator: it is not an expression, but rather a conditional-expression. The conditional-expression is a different entry point into the grammar of C expression: it "enters" the grammar at the point where it is no longer possible to include a top-level = operator into a conditional-expression. The only way to "smuggle" a = operator into a conditional-expression is to descend the grammar all the way to the very bottom
primary-expression:
identifier
constant
string-literal
( expression )
generic-selection
and then wrap around all the way to the top using the ( expression ) branch. This means that a conditional-expression can contain a = operator only when it is explicitly wrapped in (...). E.g. the grammar prohibits you from having g = b as the last operand of ?: operator. If you want something like that, you have to explicitly parenthesize it: <smth> ? <smth> : (g = b).
A very similar situation exists with the second piece of grammar: assignment operator. The left-hand side (LHS) of assignment is unary-expression. And unary-expression "enters" the general grammar of C expression at the point where it is too late to include a top level ?: operator. The only way to reach the ?: operator from unary-expression is to descend all the way down to primary-expression and take the ( expression ) branch. This means that grammar prohibits you from having a > b ? g = a : g as the LHS operand of = operator. If you want something like that, you have to explicitly parenthesize it: (a > b ? g = a : g) = <smth>.
For this reason "popular" answers claiming that "operator precedence" makes the language parse your expression as
(a > b ? g = a : g) = b
are actually completely incorrect. In reality, there's no derivation tree in the formal C grammar that would make your input fit the syntax of the C language. Your input is not parsable at all. It is not an expression. It is simply syntactically invalid. The C language sees it as syntactic gibberish.
Now, in practice you might see some implementations respond with a "lvalue required as left operand of assignment" diagnostic message. Formally, this is a misleading diagnostic message. Since the above input does not satisfy the grammar of a C language expression, there's no "assignment" in it, there's no "left operand" and there's no meaningful "lvalue" requirement.
Why do compilers issue this strange message? Most likely they do indeed parse this input as a valid C expression
(a > b ? g = a : g) = b
The result of ?: is never an lvalue in C, hence the error. However, this interpretation of your input is non-standard (extended?) behavior, which has no basis in formal C language. This behavior of specific implementations might be caused by their attempts to reconcile C and C++ grammars (which are quite different in this area), by their attempts to produce a more readable (albeit "fake") error message or by some other reason.
Typically, in such implementations a similar issue also would pop up in case of inputs like
a + b = 5
The same error would be issued, suggesting a (a + b) = 5 parse, while from the pedantic point of view a + b = 5 is not parsable as an expression at all (for the same reasons as described above).
Again, formally, this is not enough to say that the compiler is "broken": the compiler is required to detect a constraint violation and issue some diagnostic message, which is exactly what happens here. The fact that the text of the diagnostic message does not correctly reflect the nature of the problem is inconsequential (the compiler can simply say "Ha ha ha!"). But one undesirable consequence of such misleading diagnostics is that it misleads users into misinterpreting the problem, which is BTW painfully evident from the barrage of formally incorrect answers posted to this question.

If a regular language only contains Kleene star, then is it possible that it comes from the concatenation of two non-regular languages?

I want to know that given a regular language L that only contains Kleene star operator (e.g (ab)*), is it possible that L can be generated by the concatenation of two non-regular languages? I try to prove that L can be only generated by the concatenation of two regular languages.
Thanks.
This statement is false. Consider these two languages over Σ = {a}:
L1 = { a^n | n is a power of two } ∪ { ε }
L2 = { a^n | n is not a power of two } ∪ { ε }
Neither of these languages is regular (the first one can be proven to be nonregular by using the Myhill-Nerode theorem, and the second is closely related to the complement of L1 and can also be proven to be nonregular).
However, I'm going to claim that L1L2 = a*. First, note that any string in the concatenation L1L2 has the form a^n and therefore is an element of a*. Next, take any string in a*; let it be a^n. If n is a power of two, then it can be formed as the concatenation of a^n from L1 and ε from L2. Otherwise, n isn't a power of two, and it can be formed as the concatenation of ε from L1 and a^n from L2. Therefore, L1L2 = a*, so the theorem you're trying to prove is false.
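The argument can also be spot-checked by brute force for small n. The sketch below works on lengths only, since both languages are over the single letter a; the function names are hypothetical, and checking a finite range illustrates the claim rather than proving it.

```c
#include <assert.h>

int is_power_of_two(unsigned n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* a^n is in L1 iff n is a power of two, or n == 0 (the empty string). */
int in_L1(unsigned n) { return n == 0 || is_power_of_two(n); }

/* a^n is in L2 iff n is not a power of two, or n == 0 (the empty string). */
int in_L2(unsigned n) { return n == 0 || !is_power_of_two(n); }

/* a^n is in L1L2 iff n = p + q with a^p in L1 and a^q in L2. */
int in_concatenation(unsigned n) {
    for (unsigned p = 0; p <= n; p++)
        if (in_L1(p) && in_L2(n - p))
            return 1;
    return 0;
}

/* Returns 1 if every length from 0 to limit is covered by L1L2. */
int covers_all_lengths_up_to(unsigned limit) {
    assert(limit < 100000u);   /* keep the quadratic search small */
    for (unsigned n = 0; n <= limit; n++)
        if (!in_concatenation(n))
            return 0;
    return 1;
}
```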
Hope this helps!
