Shift/reduce conflicts in yacc - c

I am learning lex and yacc programming, and this yacc program to validate and evaluate arithmetic expressions is giving me 10 shift/reduce conflicts. Can you point out what's wrong with this program?
This is 611.y
%{
#include<stdio.h>
int flag=1;
%}
%token id num
%left '(' ')'
%left '+' '-'
%left '/' '*'
%nonassoc UMINUS
%%
stmt:
expression { printf("\n valid exprn");}
;
expression:
'(' expression ')'
| '(' expression {printf("\n Syntax error: Missing right paranthesis");}
| expression '+' expression {printf("\nplus recog!");$$=$1+$3;printf("\n %d",$$);}
| expression '+' { printf ("\n Syntax error: Right operand is missing ");}
| expression '-' expression {printf("\nminus recog!");$$=$1-$3;printf("\n %d",$$);}
| expression '-' { printf ("\n Syntax error: Right operand is missing ");}
| expression '*' expression {printf("\nMul recog!");$$=$1*$3;printf("\n %d",$$);}
| expression '*' { printf ("\n Syntax error: Right operand is missing ");}
| expression '/' expression {printf("\ndivision recog!");if($3==0) printf("\ndivision cant be done, as divisor is zero.");else {$$=$1/$3;printf("\n %d",$$);}}
| expression '/' { printf ("\n Syntax error: Right operand is missing ");}
| expression '%' expression
| expression '%' { printf ("\n Syntax error: Right operand is missing ");}
| id
| num
;
%%
main()
{
printf(" Enter an arithmetic expression\n");
yyparse();
}
yyerror()
{
printf(" Invalid arithmetic Expression\n");
exit(1);
}
This is 611.l
%{
#include "y.tab.h"
#include<stdio.h>
#include<ctype.h>
extern int yylval;
int val;
%}
%%
[a-zA-Z][a-zA-Z0-9]* {printf("\n enter the value of variable %s:",yytext);scanf("%d",&val);yylval=val;return id;}
[0-9]+[.]?[0-9]* {yylval=atoi(yytext);return num;}
[ \t] ;
\n {return 0;}
. {return yytext[0];}
%%
int yywrap()
{
return 1;
}
When I compile the code like this
lex 611.l
yacc -d 611.y
It gives me
yacc:10 shift/reduce conflicts.
Please help me out here.

Two things are wrong:
'%' has no precedence declared; add it to the same %left line as '/' and '*'.
The '(' expression error rule is ambiguous (in an expression like (4*(2+3)+5*7 there are many places where the missing parenthesis could be inserted), and it conflicts with the normal '(' expression ')' rule. Making such a rule work is non-trivial. I recommend removing it and relying on yacc's built-in error handling.
Simple error handling can be implemented like this:
stmt:
expression { printf("\n valid exprn");}
| error { printf(" Invalid arithmetic Expression\n"); }
;
expression:
'(' expression ')'
| '(' error ')' { printf(" Invalid arithmetic Expression\n"); }
| ... /* all the rest */
You won't need the other error handlers either.
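To illustrate what it means for '%' to share a precedence level with '*' and '/', here is a minimal sketch in plain C of a precedence-climbing evaluator. It is not the parser yacc generates; the names prec, eval and eval_with_precedence are made up for this illustration, and division or remainder by zero simply yields 0 as a placeholder.

```c
#include <ctype.h>

/* Binding strength of a binary operator; 0 means "not an operator".
   '%' shares the level of '*' and '/', like  %left '*' '/' '%'  in yacc. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-':           return 1;
    case '*': case '/': case '%': return 2;
    default:                      return 0;
    }
}

static const char *skip(const char *p)
{
    while (*p == ' ') p++;
    return p;
}

/* Parse a non-negative integer literal and advance *p past it. */
static long number(const char **p)
{
    long v = 0;
    *p = skip(*p);
    while (isdigit((unsigned char)**p))
        v = v * 10 + (*(*p)++ - '0');
    return v;
}

/* Precedence climbing: consume operators whose level is >= min_prec.
   Recursing with level + 1 makes every operator left-associative. */
static long eval(const char **p, int min_prec)
{
    long lhs = number(p);
    for (;;) {
        *p = skip(*p);
        char op = **p;
        int level = prec(op);
        if (level == 0 || level < min_prec)
            return lhs;
        (*p)++;                                   /* consume the operator */
        long rhs = eval(p, level + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break;   /* placeholder on zero */
        case '%': lhs = rhs ? lhs % rhs : 0; break;   /* placeholder on zero */
        }
    }
}

/* Hypothetical convenience wrapper around eval(). */
long eval_with_precedence(const char *text)
{
    return eval(&text, 1);
}
```

With this table, eval_with_precedence("7 + 9 % 4 * 2") groups as 7 + ((9 % 4) * 2) and returns 9. If '%' had no entry in prec, the evaluator would stop at it, which is loosely analogous to yacc having no precedence to resolve the conflict with.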

Related

Bison read strings together

I would like to make a project with Flex and Bison.
I have a grammar (that's only a part of mine):
variable_name:
| text {printf("VARIABLE NAME (TEXT) IN BISON: %s\n", $1); $$ = _variable_init($1);}
| character {printf("VARIABLE NAME (CHARACTER) IN BISON: %s\n", $1); $$ = _variable_init($1);}
;
bool_expr:
| true_t { printf("BOOL EXP TRUE\n"); $$ = _bool_expression_init_bool(TRUE); }
| false_t { printf("BOOL EXP FALSE\n"); $$ = _bool_expression_init_bool(FALSE); }
| variable_name { printf("BOOL EXP VARIABLE: %s\n", $1->name); $$ = _bool_expression_init_variable($1); }
| bool_expr eq bool_expr { printf("BOOL EXP ==\n"); $$ = _bool_expression_init_binary_op($1, "==", $3); }
| bool_expr noteq bool_expr { printf("BOOL EXP !=\n"); $$ = _bool_expression_init_binary_op($1, "!=", $3); }
| bool_expr and bool_expr { printf("BOOL EXP AND\n"); $$ = _bool_expression_init_binary_op($1, "&&", $3); }
| bool_expr or bool_expr { printf("BOOL EXP OR\n"); $$ = _bool_expression_init_binary_op($1, "||", $3); }
| '!' bool_expr { printf("BOOL EXP NOT\n"); $$ = _bool_expression_init_unary_op("!", $2); }
| '(' bool_expr ')' { printf("BOOL EXP ()\n"); $$ = _bool_expression_init_with_brackets($2); }
| int_expr eq int_expr { printf("BOOL EXP INT==\n"); $$ = _bool_expression_init_from_int($1, "==", $3); }
| int_expr noteq int_expr { printf("BOOL EXP INT!=\n"); $$ = _bool_expression_init_from_int($1, "!=", $3); }
| int_expr get int_expr { printf("BOOL EXP INT>=\n"); $$ = _bool_expression_init_from_int($1, ">=", $3); }
| int_expr let int_expr { printf("BOOL EXP INT<=\n"); $$ = _bool_expression_init_from_int($1, "<=", $3); }
| int_expr '<' int_expr { printf("BOOL EXP INT<\n"); $$ = _bool_expression_init_from_int($1, "<", $3); }
| int_expr '>' int_expr { printf("BOOL EXP INT>\n"); $$ = _bool_expression_init_from_int($1, ">", $3); }
;
int_expr:
| num { printf("INT EXP NUM\n"); $$ = _int_expression_init_int($1); }
| variable_name { printf("INT EXP VARIABLE\n"); $$ = _int_expression_init_variable($1); }
| int_expr '+' int_expr { printf("INT EXP +\n"); $$ = _int_expression_init_binary_op($1, "+", $3); }
| int_expr '-' int_expr { printf("INT EXP -\n"); $$ = _int_expression_init_binary_op($1, "-", $3); }
| int_expr '*' int_expr { printf("INT EXP *\n"); $$ = _int_expression_init_binary_op($1, "*", $3); }
| int_expr '/' int_expr { printf("INT EXP /\n"); $$ = _int_expression_init_binary_op($1, "/", $3); }
| '(' int_expr ')' { printf("INT EXP ())\n"); $$ = _int_expression_init_bracket($2); }
;
I copied only the important parts (hopefully there is the issue).
So when I want to parse this
var != 10
as a bool_expr the Bison identifies var as a variable and prints:
VARIABLE NAME (TEXT) IN BISON: var
but in the next moment it prints
BOOL EXP VARIABLE: var !=
and it thinks that the variable is "var !=", even though there is a rule
int_expr != int_expr
but it doesn't check this part.
By the way, variable_name has type Variable* (a struct), int_expr has IntExpression* (a struct), and bool_expr has BoolExpression* (a struct).
I don't know what I should do. It worked when I added six rules to bool_expr that are almost the same as the last six, with the first int_expr replaced by variable_name, but that's ugly. As far as I knew, Bison looks for the longest match, not the first.
The log:
VARIABLE NAME (TEXT) IN BISON: var
VARIABLE NAME IN FUNCTION: var //It was called in variable_init function
BOOL EXP VARIABLE: var!=
INT EXP NUM //10
There is nothing in your grammar which allows the parser to distinguish between integer variables and boolean variables. Both are simply variable_name; moreover, either a boolean variable or an integer variable could be followed by a comparison operator. (In the case of var != 10, it would be possible to deduce that var had to be an integer variable, once it is seen that the right-hand side of the operator is an integer. But a != b would still be ambiguous, since the variable names do not carry a marker to indicate their type.)
Bison, by default, produces an LALR(1) parser, which means that every reduction needs to be determined by examining at most one token following the end of the production. (That's what the "(1)" in "LALR(1)" means.) In other words, the parser would have to be able to decide between reducing the variable_name "var" to a bool_expr or an int_expr no later than when it sees the != token. That's not possible, and because it is not possible, Bison should have reported a reduce/reduce conflict, which you are apparently ignoring. (If it didn't, then the rest of your grammar is relevant. But that would be surprising.)
Bison doesn't just give up when it sees a conflict. It makes a somewhat arbitrary choice (in the case of reduce/reduce conflicts, it chooses the reduction which occurs earliest in the grammar, in this case to bool_expr) and builds the parser regardless. Occasionally, this default produces the correct parse, but in most cases the resulting parser is flawed, with behaviour similar to what you are experiencing. So although Bison claims that the conflict report is just a warning, you ignore it at your peril.
As @user253751 notes in a comment, you can ask Bison to produce a GLR parser, which allows arbitrary lookahead (at the cost of slowing down the parse). However, Bison's GLR implementation still requires the grammar to be unambiguous. Ambiguous parses will be detected at run-time, during the parse, and will cause the parse to fail; that would be the case with the vara != varb ambiguous expression noted above. (You can provide your own ambiguity resolution mechanism. But that's an extremely advanced technique, and in this case it won't work unless the ambiguity resolver has access to semantic information, like the declared type of each variable.)
Without seeing the rest of your grammar, it's hard to know whether type resolution could be done at compile-time (because variables need to be declared with a specific type), or only at run-time, but in neither case can that determination be made by the parser. So if you have boolean variables (and even if you don't), you cannot do better in the parser than to just have one expression non-terminal.
If you require declaration before use, then you could do type-resolution in your reduction actions by consulting the symbol table you are building up. At that point, you can either insert an automatic conversion function, or report an error (depending on whether you feel that automatic conversion is convenient). If you only require declaration (not prior declaration), then you can do the type-resolution after the parse is complete, by walking the AST twice: first to build up the symbol table, and second to resolve types.
If you consider type mismatches to be an error, then reporting the error in a semantic action is a lot more user-friendly. During the parse, it is very difficult to produce an error message more informative than "syntax error at line 10". But in the semantic action, you know precisely what the error is and what produced it, so it's easy to produce error messages like "the comparison at line 10 requires that 'var' be an integer variable, not a boolean variable." Your users will thank you.
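The kind of check a semantic action could perform might be sketched as follows. This is only an illustration under stated assumptions: the Symbol table, lookup_type and check_int_operand are hypothetical names, and the table is hard-coded here, whereas a real parser would fill it in as declarations are reduced.

```c
#include <stdio.h>
#include <string.h>

typedef enum { TYPE_INT, TYPE_BOOL, TYPE_UNKNOWN } VarType;

typedef struct {
    const char *name;
    VarType     type;
} Symbol;

/* Hypothetical symbol table; a real one is built while parsing declarations. */
static const Symbol symbols[] = {
    { "var",   TYPE_BOOL },
    { "count", TYPE_INT  },
};

static VarType lookup_type(const char *name)
{
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].type;
    return TYPE_UNKNOWN;
}

static const char *type_name(VarType t)
{
    switch (t) {
    case TYPE_INT:  return "integer";
    case TYPE_BOOL: return "boolean";
    default:        return "undeclared";
    }
}

/* Intended to be called from the action of a rule such as  expr ">=" expr.
   Returns 1 if `name` is an integer variable; otherwise returns 0 and
   writes a diagnostic into buf. */
int check_int_operand(const char *name, int line, char *buf, size_t buflen)
{
    VarType t = lookup_type(name);
    if (t == TYPE_INT) {
        if (buflen > 0)
            buf[0] = '\0';
        return 1;
    }
    if (t == TYPE_UNKNOWN)
        snprintf(buf, buflen, "line %d: '%s' is not declared", line, name);
    else
        snprintf(buf, buflen,
                 "the comparison at line %d requires that '%s' be an "
                 "integer variable, not a %s variable",
                 line, name, type_name(t));
    return 0;
}
```

Because the check runs after the parser has already accepted a single expression non-terminal, the grammar stays free of conflicts and the message can name the offending variable and its type.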
By the way, the usual convention is to use UPPER_CASE for the symbolic names of terminals, like !=, which would usually be named something like NOTEQ or T_NOTEQ. But if you are using Bison, you can make your grammar a lot more readable by using quoted aliases for your terminals:
%token EQ "==" NOTEQ "!="
GEQ ">=" LEQ "<="
Then you can use the symbolic names in your lexical analyser:
"!=" { return NOTEQ; }
without forcing whoever reads your grammar to guess what the name means:
expr: expr "==" expr
| expr "!=" expr
| ...
You must use double quotes; '+' is quite different.

Can't make arithmetic operator precedence with parenthesis in bison

I'm currently building a Python parser and I'm working on the definition of arithmetic expressions. The rules for arithmetic expressions work properly until I add parentheses.
Here is the starting point:
%token TOKEN_ARITH_ADD TOKEN_ARITH_SUB
%token TOKEN_ARITH_MUL TOKEN_ARITH_DIV TOKEN_ARITH_MOD
%token TOKEN_ARITH_POWER
%token TOKEN_ASSIGN
%token TOKEN_PAREN_OPEN TOKEN_PAREN_CLOSE
and then:
arith_expr: factor
| arith_expr TOKEN_ARITH_ADD factor { $$ = ast_init_arith_op($3, "+", $1); };
| arith_expr TOKEN_ARITH_SUB factor { $$ = ast_init_arith_op($3, "-", $1); };
| TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE { $$ = $2; };
;
factor: power { $$ = ast_init_arith_op($1, NULL, NULL); };
| factor TOKEN_ARITH_MUL power { $$ = ast_init_arith_op($3, "*", $1); };
| factor TOKEN_ARITH_DIV power { $$ = ast_init_arith_op($3, "/", $1); };
| factor TOKEN_ARITH_MOD power { $$ = ast_init_arith_op($3, "%", $1); };
;
power: term
| power TOKEN_ARITH_POWER term { $$ = ast_init_arith_op($3, "**", $1); }
term: identifier;
| literal_int;
| literal_float;
The result is that if, for instance, I enter this:
myVar = (a + b) * 2
I get error: syntax error, unexpected TOKEN_ARITH_MUL, expecting TOKEN_EOL.
So I tried changing %token to %left for the first three declarations, with the same problem.
I also tried changing the %token for the assignment to %right; unfortunately I got an error at compile time (error: rule given for assign, which is a token). In retrospect, that makes sense.
It looks like TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE collapses to an arith_expr and the assignment kicks in right away. What am I doing wrong?
According to your grammar, a multiplication operator can appear only between a factor and a power. An expression enclosed in parentheses is neither and cannot be reduced to either. As far as the part of the grammar presented goes, it is an arith_expr.
@n.m.'s comment is correct: you put the rule for a parenthesized expression in the wrong place. It should be a term, not an arith_expr. However, your follow-up comment suggests that you misunderstood. Do not change the production. Just move it, as is, to be one of the alternatives for term:
term: identifier
| literal_int
| literal_float
| TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE
;
That allows a parenthesized expression to appear as a complete expression by itself or as an operand of any operator.
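The layering can be mirrored by a small hand-written recursive-descent evaluator, shown here as a sketch rather than as what Bison generates. It assumes integer literals only and leaves out the power level and identifiers; the function names follow the non-terminals, and evaluate_arith is a hypothetical entry point.

```c
#include <ctype.h>

/* Cursor over the input text; a stand-in for the real token stream. */
static const char *cur;

static void skip_spaces(void)
{
    while (*cur == ' ')
        cur++;
}

static long arith_expr(void);

/* term: literal_int | '(' arith_expr ')'
   The parenthesized form lives here, at the innermost level. */
static long term(void)
{
    skip_spaces();
    if (*cur == '(') {
        cur++;                          /* consume '(' */
        long inner = arith_expr();
        skip_spaces();
        if (*cur == ')')
            cur++;                      /* consume ')' */
        return inner;
    }
    long v = 0;
    while (isdigit((unsigned char)*cur))
        v = v * 10 + (*cur++ - '0');
    return v;
}

/* factor: term | factor '*' term */
static long factor(void)
{
    long v = term();
    for (skip_spaces(); *cur == '*'; skip_spaces()) {
        cur++;                          /* consume '*' */
        v *= term();
    }
    return v;
}

/* arith_expr: factor | arith_expr '+' factor | arith_expr '-' factor */
static long arith_expr(void)
{
    long v = factor();
    for (skip_spaces(); *cur == '+' || *cur == '-'; skip_spaces()) {
        char op = *cur++;
        long rhs = factor();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Hypothetical entry point. */
long evaluate_arith(const char *text)
{
    cur = text;
    return arith_expr();
}
```

Because term() handles the parentheses, the result of (1 + 2) is available as the left operand of '*' in factor(), so evaluate_arith("(1 + 2) * 2") returns 6. Had the parenthesized case been handled only in arith_expr(), factor() would never see it, which corresponds to the syntax error in the question.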

Error: undeclared variable $

When I run the Bison program below (with bison file.y), I get the error "missing a declaration type for $2 in 'seq'":
%union {
char cval;
}
%token <cval> AMINO
%token STARTCODON STOPCODON
%%
series: STARTCODON seq STOPCODON {printf("%s", $2);}
seq : AMINO
| seq AMINO
;
%%
I would like to know why I get this error, and how I can correctly declare the variable $2
You haven't told Bison what type seq is, so it doesn't know what to do with $2.
Use the %type directive:
%type <cval> seq
Note that the type used for $2 is a single char, which is not a string as expected by the "%s" format. You need to come up with a way to create your own string from the sequence.
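One possible way to build such a string is a small helper that appends one character at a time. This is a sketch under assumptions: seq_append is a hypothetical name, and using it would also require adding a char * member to the %union and declaring seq with that member instead of cval.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated string consisting of `prefix` followed by `c`.
   `prefix` may be NULL to start a new sequence. On success the old
   `prefix` pointer must no longer be used; on allocation failure NULL is
   returned and `prefix` is left untouched.

   Possible use in the grammar (assuming a char * union member for seq):
       seq : AMINO        { $$ = seq_append(NULL, $1); }
           | seq AMINO    { $$ = seq_append($1, $2); }
           ;                                                            */
char *seq_append(char *prefix, char c)
{
    size_t len = prefix ? strlen(prefix) : 0;
    char *out = realloc(prefix, len + 2);   /* room for c and the '\0' */
    if (out == NULL)
        return NULL;
    out[len] = c;
    out[len + 1] = '\0';
    return out;
}
```

With seq carrying a char *, the "%s" format in the series rule matches the type of $2; the string should be freed once it has been printed.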

Lex program to recognise valid arithmetic expressions and also to recognise valid identifiers and operators

The program below checks arithmetic expressions like a+b or a-b and prints whether they are valid or invalid:
%{
#include<stdio.h>
#include<stdlib.h>
int c,d,bo=0,bc=0;
%}
operand [a-zA-Z0-9]+
operator [+\-\/*]
%%
 /* the operand count must be one higher than the operator count, otherwise the expression is invalid; e.g. a+b has two operands and one operator */
{operator} {d++;printf("%s is an operator \n",yytext);}
{operand} {c=d+1;printf("%s is an operand \n",yytext);}
"(" {if(bc<=bo)bo++;}
")" {bc++;}
\n {if(bo==bc&&c>d){printf("valid exp");}else {printf("invalid exp");};exit(0);}
%%
int main(void){
yylex();
return 0;
}
The problem I am facing is that when I check a++b it says valid. When I try a+b- and other inputs like )a+b(, (a+b( or +a-b++ it gives the right output; it fails only for a++b, a--b or a+-b.
I'm kind of stuck.
This is the if condition for \n. I put it on \n because when I press Enter it gives me the output and exits.
if(bo==bc && c>d) //c>d means if operand is greater than operator
{ printf("valid exp"); }
else {
printf("invalid exp"); }
I just changed c=d+1 to c++. It was a logic error: instead of counting operands, c=d+1 set the operand count to one more than the operator count every time an operand was seen, so c>d was true whenever the input ended with an operand, as in a++b.
%{
#include<stdio.h>
#include<stdlib.h>
int c,d,bo=0,bc=0;
%}
operand [a-zA-Z0-9]+
operator [+\-\/*]
%%
 /* the operand count must be one higher than the operator count, otherwise the expression is invalid; e.g. a+b has two operands and one operator */
{operator} {d++;printf("%s is an operator \n",yytext);}
{operand} {c++;printf("%s is an operand \n",yytext);}
"(" {if(bc<=bo)bo++;}
")" {bc++;}
\n {if(bo==bc&&c>d){printf("valid exp");}else {printf("invalid exp");};exit(0);}
%%
int main(void){
yylex();
return 0;
}
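The corrected counting logic can also be written as an ordinary C function, which makes it easy to try against the inputs from the question. This is a sketch, not the generated scanner: looks_valid is a hypothetical name, and like the lex version it is only a counting heuristic (an input such as "a b" still passes), so a real validity check would need a grammar.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Mirror of the scanner rules: count operand runs and operators, track
   parentheses, and apply the same final test as the \n rule. */
int looks_valid(const char *s)
{
    int operands = 0, operators = 0, open = 0, close = 0;
    size_t i = 0;

    while (s[i] != '\0' && s[i] != '\n') {
        unsigned char ch = (unsigned char)s[i];
        if (isalnum(ch)) {
            /* one run of letters/digits is one operand, like [a-zA-Z0-9]+ */
            while (isalnum((unsigned char)s[i]))
                i++;
            operands++;                 /* the fix: c++ instead of c=d+1 */
            continue;
        }
        if (strchr("+-*/", ch) != NULL)
            operators++;
        else if (ch == '(') {
            if (close <= open)
                open++;
        } else if (ch == ')')
            close++;
        i++;
    }
    return open == close && operands > operators;
}
```

For example, looks_valid("a+b") returns 1, while looks_valid("a++b"), looks_valid("a+b-") and looks_valid(")a+b(") all return 0.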

Array initialization with a ternary operator?

I don't have access to the C11 specification, therefore I can't investigate this bug.
The following declaration raises an error during compilation:
int why[2] = 1 == 1 ? {1,2} : {3,4};
The error is: expected expression before { and: expected expression before :
This is not valid C11.
You can only initialize an array with an initializer-list not with an expression.
int why[2] = { ... }; // initializer-list {}
Moreover, 1 == 1 ? {1,2} : {3,4} is not a valid C expression because {1, 2} is not a C expression.
Just for information: using compound literals, you can get something close to what you want with a pointer object:
int *why = (1 == 1) ? (int[2]) {1,2} : (int[2]) {3,4};
From Charles Bailey's answer here, the grammar for conditional-expression:
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
And
1 == 1 ? {1,2} : {3,4};
         ^       ^       are not expressions
That is the reason the compiler gives errors like:
error: expected expression before ‘{’ token // means after ?
error: expected expression before ‘:’ token // before :
Edit, as @Rudi Rüssel commented:
the following is valid code in C:
int main(){
{}
;
{1,2;}
}
We use {} to group statements in C.
Note: if I write {1,2} then it's an error (*expected ‘;’ before ‘}’ token*), because 1,2 is an expression but not a statement.
For the OP: read up on what an expression statement is in C, and on block statements and expression statements.
Edit 2:
Note how @ouah uses a compound literal, (int[2]){1,2}, which looks like a typecast, to turn the braces into an expression.
To understand, run this code:
#include <stdio.h>
int main(void){
printf("\n First = %d, Second = %d\n",((int[2]){1,2})[0],((int[2]) {1,2})[1]);
return 0;
}
It works like:
~$ ./a.out
First = 1, Second = 2
Initializer lists are not expressions, so they cannot be used in expressions.
I suggest you leave the array uninitialized and use memcpy.
int why[2];
memcpy( why, 1 == 1 ? (int[2]){1,2} : (int[2]){3,4}, sizeof why );
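The same idea can be wrapped in a function, sketched below. choose_pair is a hypothetical helper; note that inside it the parameter out is a pointer, so sizeof out would give the pointer size, which is why the byte count is spelled out as 2 * sizeof out[0].

```c
#include <string.h>

/* Fill `out` with {1,2} when `cond` is nonzero, otherwise with {3,4}. */
void choose_pair(int cond, int out[2])
{
    /* Each operand is a compound literal, which is an expression of array
       type; in the conditional operator both decay to pointers to int.
       The literals live until the end of the enclosing block. */
    const int *src = cond ? (int[2]){1, 2} : (int[2]){3, 4};

    memcpy(out, src, 2 * sizeof out[0]);
}
```

A caller would declare int why[2]; and then call choose_pair(1 == 1, why); after which why holds 1 and 2.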

Resources