I have written a grammar for if and if-else statements.
The following is a simplified excerpt from my code showing the if and if-else rules, so don't worry about errors unrelated to them. I assure you there are no compilation errors in the code I am actually using:
%token IF ELSE VOID ID VOID_PARAMS
%nonassoc shift_else
%nonassoc ElSE
%%
Func: VOID ID VOID_PARAMS '{' Stmt '}'
;
If_Stmt: IF '(' L_expr ')' Stmt
;
Stmt: If_Stmt shift_else
;
| If_Stmt ELSE Stmt
;
| ';'
| ...
;
L_expr: ...
;
It has been working just fine for a while, but now it is finding errors when it reaches the end of a function. For example:
void foo(void) {
if (1 > 5)
;
}
gives this output ( using yyerror() ):
Found unexpected token: '}' on line 4
Any suggestions as to why this could be happening? And what can I do to fix this?
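After reading if (1 > 5) ; the parser has an If_Stmt and, according to your rules, expects either an ELSE or a shift_else next. Because shift_else appears as a symbol on the right-hand side of Stmt: If_Stmt shift_else, Bison treats it as a real token that must be present in the input.
There is no ELSE at that point, since none appears in the source code.
Unless your lexer conjures a shift_else out of thin air, the next token is a }, which is neither ELSE nor shift_else, hence the error. If shift_else is only meant to name a precedence level for resolving the dangling else, attach it with %prec instead (Stmt: If_Stmt %prec shift_else), so that it sets the rule's precedence without being expected as input.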
So basically, in my Bison file, if yyparse fails (i.e. there is a syntax error) I want to print 'ERROR' and nothing else, in particular none of the output produced by the actions in the stmt rule. If yyparse returns 1, is there any way to skip what the middle part of the Bison file does, for example by writing an if statement above the stmt part? I would appreciate any help! Thanks in advance.
Such as :
%{
#include "header.h"
#include <stdio.h>
void yyerror (const char *s)
{}
extern int line;
%}
%token ...///
%// some tokens types etc...
%union
{
St class;
int value;
char* str;
int line_num;
float float_value;
}
%start prog
%%
prog: '[' stmtlst ']'
;
stmtlst: stmtlst stmt |
;
stmt: setStmt | if | print | unaryOperation | expr
{
if ($1.type==flt && $1.line_num!=0) {
printf("Result of expression on %d is (",$1.line_num);
printf( "%0.1f)\n", $1.float_value);
$$.type=flt;
}
else if ($1.type==integer && $1.line_num!=0){
$$.type=integer;
printf("Result of expression on %d is (%d)\n",$1.line_num,$1.value);
}
else if ($1.type==string && $1.line_num!=0) {
$$.type=string;
printf("Result of expression on %d is (%s)\n",$1.line_num,$1.str);
}
else if ($1.type==mismatch && $1.line_num!=0)
{
$$.type=mismatch;
printf("Type mismatch on %d \n",$1.line_num);
}
else{ }
}
%%
int main ()
{
if (yyparse()) {
// if parse error happens only print this printf and not above stmt part
printf("ERROR\n");
return 1;
}
else {
// successful parsing
return 0;
}
}
Unfortunately, time travel is not an option. By the time the error is detected, the printf calls have already happened. You cannot make them unhappen.
What you can do (and must do if you are writing a compiler rather than a calculator) is create some data structure which represents the parsed program, instead of trying to execute it immediately. That data structure, which could be an abstract syntax tree (AST), a list of three-address instructions, or something else, is then used to produce the compiled program. Only when the compiled program is run will the statements be evaluated.
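As a rough illustration of the idea above, here is a minimal C sketch that defers output rather than building a full AST: the actions would append formatted lines to a list, and main would print them only if the parse succeeds. The names MsgList, msg_append, msg_flush and msg_clear are hypothetical and do not appear in the question's code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One deferred line of output; grammar actions would append these
   instead of calling printf directly. (Hypothetical helper types.) */
typedef struct Msg {
    char *text;
    struct Msg *next;
} Msg;

typedef struct {
    Msg *head;
    Msg *tail;
    size_t count;
} MsgList;

/* Append a copy of text; returns 0 on success, -1 if allocation fails. */
int msg_append(MsgList *list, const char *text)
{
    Msg *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->text = malloc(strlen(text) + 1);
    if (m->text == NULL) {
        free(m);
        return -1;
    }
    strcpy(m->text, text);
    m->next = NULL;
    if (list->tail != NULL)
        list->tail->next = m;
    else
        list->head = m;
    list->tail = m;
    list->count++;
    return 0;
}

/* Write every stored line to out, in order; returns how many were written. */
size_t msg_flush(const MsgList *list, FILE *out)
{
    size_t n = 0;
    for (const Msg *m = list->head; m != NULL; m = m->next) {
        fputs(m->text, out);
        n++;
    }
    return n;
}

/* Free all stored lines and reset the list to empty. */
void msg_clear(MsgList *list)
{
    Msg *m = list->head;
    while (m != NULL) {
        Msg *next = m->next;
        free(m->text);
        free(m);
        m = next;
    }
    list->head = list->tail = NULL;
    list->count = 0;
}
```

With something like this, an action would format its line with snprintf and pass it to msg_append, and main would call msg_flush(&pending, stdout) only when yyparse() returns 0, printing ERROR and discarding the list otherwise. A real compiler would store tree nodes rather than finished strings, but the control flow is the same.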
Here is a little part of my code, and I got an error saying
request for member 's' is something not a structure or a union.
I get this error because I no longer need to write .s, since I declared the symbol's type. My problem is that I need another way to refer to that 's' instead of $3.s, and I can't find how to do it. If I write only $3, I no longer get an error at '$3.s[0]', but I do get one at 'strcpy($3.s, $3.s+1)'.
I am new to lex and yacc, and what I know so far doesn't help me solve this.
%union{
int i;
char *s;
}
%left '+','-'
%left '*','/'
%left UNARYMINUS
%type <i> expr
%type <s> instr
%token <i> NUMBER
%token <s> WORD
%token <s> SPACE
%%
instr: SPACE instr { }
|WORD '=' expr ';' {
int v;
if ($3.s[0]=='$')
{
fprintf(fout, "\tmove\t$%d, %s\n\n", variabile($1.s), $3.s);
strcpy($3.s, $3.s+1);
v=atoi($3.s);
if (v>nvar)
erasereg(v);
}
else
fprintf(fout, "\taddi\t$%d, $0, %s\n\n", variabila($1.s), $3.s);
free($1.s);
free($3.s);
}
;
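With %type <i> expr, you tell Yacc that expr is an integer, so $3 already expands to the i member of the union and $3.s makes no sense to the C compiler; that is where the "request for member" error comes from. Yet the action still checks whether the value points to a string starting with $. It's either one or the other. Instead of trying to cram all the functionality into the block that parses instr, you could:
Match $variables with a lex rule and look them up in the symbol table there: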
"$"[A-Za-z][A-Za-z0-9]* { return var_lookup(yytext); }
Or you could look them up in the yacc rule for expr
expr: WORD {
$$ = $1[0]=='$' ? var_lookup($1) : atoi($1);
}
Also, arguments to %left are separated by spaces, not commas (%left '+' '-'), and you don't call non-function pointers, you use/dereference them.
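The answer calls var_lookup without defining it. One possible definition, as a minimal sketch: a fixed-size table that maps each variable name to a slot index, creating the slot on first use. The table limits and the meaning of the return value (a slot or register number) are assumptions, not something the original code specifies.

```c
#include <string.h>

#define MAX_VARS 64   /* assumed upper bound on distinct variables */
#define MAX_NAME 32   /* assumed maximum name length, including the NUL */

static char var_names[MAX_VARS][MAX_NAME];
static int var_count = 0;

/* Return the slot index for name, creating a slot the first time the
   name is seen. Returns -1 if the table is full or the name is too long. */
int var_lookup(const char *name)
{
    for (int i = 0; i < var_count; i++)
        if (strcmp(var_names[i], name) == 0)
            return i;
    if (var_count == MAX_VARS || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(var_names[var_count], name);
    return var_count++;
}
```

Because the result is an int, it fits the %type <i> expr declaration directly, and the instr action no longer needs to inspect strings at all.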
I'm writing my own scripting language using flex and bison. I have a grammar and I'm able to generate a parser which works fine with a correct script. I would like to be able to add also some meaningful error message for special error situations. For example I would like to be able to recognize unmatched parenthesis for a block of statements or a missing semicolon and so on.
Suppose I have these statements (here the grammar is not complete):
...
statements: statement SEMICOLON statements
| statement SEMICOLON;
statement: ifStatement
| whileStatement
;
ifStatement: IF expression THEN statements END
| IF expression THEN statements ELSE statements END
;
whileStatement: DO statements WHILE expression END
;
...
I would like to be able to print messages such as "Missing semicolon" or "Missing then keyword" and so on. Should I modify my grammar to enable error handling? Or is there some Bison feature to do this?
Update (Sept 2021)
Since version 3.7 Bison supports user-defined error messages: specify %define parse.error custom, and provide a yyreport_syntax_error function, something like:
int
yyreport_syntax_error (const yypcontext_t *ctx)
{
  int res = 0;
  YYLOCATION_PRINT (stderr, *yypcontext_location (ctx));
  fprintf (stderr, ": syntax error");
  // Report the tokens expected at this point.
  {
    enum { TOKENMAX = 10 };
    yysymbol_kind_t expected[TOKENMAX];
    int n = yypcontext_expected_tokens (ctx, expected, TOKENMAX);
    if (n < 0)
      // Forward errors to yyparse.
      res = n;
    else
      for (int i = 0; i < n; ++i)
        fprintf (stderr, "%s %s",
                 i == 0 ? ": expected" : " or", yysymbol_name (expected[i]));
  }
  // Report the unexpected token.
  {
    yysymbol_kind_t lookahead = yypcontext_token (ctx);
    if (lookahead != YYSYMBOL_YYEMPTY)
      fprintf (stderr, " before %s", yysymbol_name (lookahead));
  }
  fprintf (stderr, "\n");
  return res;
}
More about this in the "The Syntax Error Reporting Function yyreport_syntax_error" section of the documentation.
Original Answer (March 2013)
Bison is not the proper tool to generate custom error messages, yet its standard error messages are not too bad either, provided you enable %error-verbose. Have a look at the documentation: http://www.gnu.org/software/bison/manual/bison.html#Error-Reporting.
If you really want to provide custom error message, do read the documentation about YYERROR, and generate rules for the patterns you want to catch, and raise errors yourself. For instance, here dividing by 0 is treated as a syntax error (which is dubious, but provides an example of custom syntax error messages).
exp:
  NUM           { $$ = $1; }
| exp '+' exp   { $$ = $1 + $3; }
| exp '-' exp   { $$ = $1 - $3; }
| exp '*' exp   { $$ = $1 * $3; }
| exp '/' exp
    {
      if ($3)
        $$ = $1 / $3;
      else
        {
          $$ = 1;
          fprintf (stderr, "%d.%d-%d.%d: division by zero",
                   @3.first_line, @3.first_column,
                   @3.last_line, @3.last_column);
        }
    }
Note also that providing strings for tokens generates better error messages:
%token NUM
would generate unexpected NUM, while
%token NUM "number"
would generate unexpected number.
I'm trying to implement a calculator for nor expressions, such as true nor true nor (false nor false) using Flex and Bison, but I keep getting my error message back. Here is my .l file:
%{
#include <stdlib.h>
#include "y.tab.h"
%}
%%
("true"|"false") {return BOOLEAN;}
.|\n {yyerror();}
%%
int main(void)
{
yyparse();
return 0;
}
int yywrap(void)
{
return 0;
}
int yyerror(void)
{
printf("Error\n");
}
Here is my .y file:
/* Bison declarations. */
%token BOOLEAN
%left 'nor'
%% /* The grammar follows. */
input:
/* empty */
| input line
;
line:
'\n'
| exp '\n' { printf ("%s",$1); }
;
exp:
BOOLEAN { $$ = $1; }
| exp 'nor' exp { $$ = !($1 || $3); }
| '(' exp ')' { $$ = $2; }
;
%%
Does anyone see the problem?
The simple way to handle all the single-character tokens, which as @vitaut correctly says you aren't handling at all yet, is to return yytext[0] for the dot rule, and let the parser sort out which ones are legal.
You have also lost the values of the BOOLEANs 'true' and 'false', which should be stored into yylval as 1 and 0 respectively, which will then turn up in $1, $3 etc. If you're going to have more datatypes in the longer term, you need to look into the %union directive.
The reason why you get errors is that your lexer only recognizes one type of token, namely BOOLEAN, but not the newline, parentheses or nor (and you produce an error for everything else). For single letter tokens like parentheses and the newline you can return the character itself as a token type:
\n { return '\n'; }
For nor, though, you should introduce a token type as you did for BOOLEAN and add an appropriate rule to the lexer.
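To make the token scheme from both answers concrete, here is a hand-written C scanner standing in for the flex rules (flex itself would express this as patterns). It stores 1 or 0 in yylval for true and false, returns a dedicated code for nor, and returns every other character as its own token. The token values, the local yylval and the function name next_token are assumptions for illustration; in a real build the token codes come from y.tab.h.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; a real build takes them from y.tab.h. */
enum { BOOLEAN = 258, NOR = 259 };

/* Stand-in for the parser's semantic value variable. */
static int yylval;

/* Scan one token starting at *p and advance *p past it.
   Returns 0 at end of input, BOOLEAN or NOR for the keywords,
   and the character itself for anything else. */
int next_token(const char **p)
{
    const char *s = *p;

    while (*s == ' ' || *s == '\t')    /* skip blanks but keep '\n' */
        s++;
    if (*s == '\0') {
        *p = s;
        return 0;
    }
    if (isalpha((unsigned char)*s)) {
        const char *end = s;
        while (isalpha((unsigned char)*end))
            end++;
        size_t len = (size_t)(end - s);
        if (len == 4 && strncmp(s, "true", 4) == 0) {
            yylval = 1;                /* value later seen as $1, $3, ... */
            *p = end;
            return BOOLEAN;
        }
        if (len == 5 && strncmp(s, "false", 5) == 0) {
            yylval = 0;
            *p = end;
            return BOOLEAN;
        }
        if (len == 3 && strncmp(s, "nor", 3) == 0) {
            *p = end;
            return NOR;
        }
    }
    *p = s + 1;                        /* single-character token */
    return (unsigned char)*s;          /* '(', ')', '\n', or an unknown char */
}
```

Unknown characters are passed through rather than reported here, so the parser decides which ones are legal, as the first answer suggests.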
See the following code for yacc.
If I remove the production factor : '!' expr, the parsing conflict disappears.
What is happening here?
%{
#include <stdio.h>
#include <ctype.h>
%}
%token TRUE
%token FALSE
%%
line : line expr '\n' { printf("%d\n", $2); }
| line '\n'
|
;
expr : expr "or" term { printf("expr : expr or term\n"); $$ = $1 | $3; }
| term { printf("expr : term\n"); }
;
term : term "and" factor { printf("term : term and factor\n"); $$ = $1 & $3; }
| factor { printf("term : factor\n"); }
;
factor : '(' expr ')' { printf("factor : (expr)\n"); $$ = $2; }
| '!' expr { printf("factor : !expr\n"); $$ = !$2; }
| TRUE { printf("factor : TRUE\n"); }
| FALSE { printf("factor : FALSE\n"); }
;
%%
#include "lex.yy.c"
int main(int argc, char** argv)
{
while (yyparse() == 0) {
}
return 0;
}
It looks to me like the conflict probably arises because when the parser sees a '!', it's running into problems with your rewrites for 'expr'. Ignoring the other productions for 'factor', specifically look at these two productions:
expr : expr "or" term { printf("expr : expr or term\n"); $$ = $1 | $3; }
| term { printf("expr : term\n"); }
;
factor : '!' expr { printf("factor : !expr\n"); $$ = !$2; }
Since expr is recursive, when the parser sees a '!', it knows that negation applies to the following expr, but if you write "! TRUE OR TRUE", does that negation apply only to the first true, or to the entire disjunction?
EDIT: In other words, it can't decide if it needs to shift the "or" or reduce "expr".
Setting the -v command line option in yacc will produce a .output file that's got all kinds of goodies in it, including diagnostic information for shift/reduce conflicts. It'll show you all the states of the DFA and where conflicts occur, and sometimes show you exactly why.
Putting negations in their own production logically "between" 'term' and 'factor' should do the trick.
If you change factor: '!' expr to factor: '!' factor, the conflicts will go away.
Analyzing just the first conflict, the problem is that a term can reduce to expr or become a more complex term. Without !, this decision can be made with only one symbol of lookahead.
Note that shift/reduce conflicts are not necessarily errors. By default, yacc resolves such a conflict by shifting, which may well be what you want. Many real production grammars contain a number of shift/reduce conflicts.
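To see what the factor: '!' factor version means for evaluation, here is a small hand-written recursive-descent evaluator in C. It is a different technique from yacc's table-driven parsing and only mirrors the layering of the rules. Single characters stand in for the tokens (T, F, |, &, !), and the function names are assumptions for illustration.

```c
/* Grammar mirrored by the functions below:
     expr   : term   { '|' term }
     term   : factor { '&' factor }
     factor : '!' factor | '(' expr ')' | 'T' | 'F'
   Single characters stand in for the tokens of the yacc grammar. */

static const char *cur;    /* next unread character */

static int parse_expr(void);

static void skip_blanks(void)
{
    while (*cur == ' ')
        cur++;
}

static int parse_factor(void)
{
    skip_blanks();
    if (*cur == '!') {             /* negation applies to the next factor only */
        cur++;
        return !parse_factor();
    }
    if (*cur == '(') {
        cur++;
        int inner = parse_expr();
        skip_blanks();
        if (*cur == ')')
            cur++;
        return inner;
    }
    int v = (*cur == 'T');         /* anything other than 'T' counts as false */
    if (*cur != '\0')
        cur++;
    return v;
}

static int parse_term(void)
{
    int v = parse_factor();
    for (skip_blanks(); *cur == '&'; skip_blanks()) {
        cur++;
        v &= parse_factor();
    }
    return v;
}

static int parse_expr(void)
{
    int v = parse_term();
    for (skip_blanks(); *cur == '|'; skip_blanks()) {
        cur++;
        v |= parse_term();
    }
    return v;
}

/* Evaluate a whole input string; this sketch does no error reporting. */
int eval(const char *text)
{
    cur = text;
    return parse_expr();
}
```

Here eval("!T|T") negates only the first T and then applies the or, while eval("!(T|T)") negates the whole disjunction. That is the reading the rewritten rule fixes, and the one the original '!' expr production left the parser unable to choose.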