How to skip parts of bison if yyparse fails - c

So basically, in my bison file, if yyparse fails (i.e. there is a syntax error), I want to print an 'ERROR' statement and not print anything else that the actions in the grammar part of the bison file print, in fact the stmt part. If yyparse returns 1, is there any way to skip the content of the middle part of the bison file? For example, maybe by writing an if statement above the stmt part? I would appreciate any help! Thanks in advance.
Such as:
%{
#include "header.h"
#include <stdio.h>
void yyerror (const char *s)
{}
extern int line;
%}
%token ...///
%// some tokens types etc...
%union
{
St class;
int value;
char* str;
int line_num;
float float_value;
}
%start prog
%%
prog: '[' stmtlst ']'
;
stmtlst: stmtlst stmt |
;
stmt: setStmt | if | print | unaryOperation | expr
{
if ($1.type==flt && $1.line_num!=0) {
printf("Result of expression on %d is (",$1.line_num);
printf( "%0.1f)\n", $1.float_value);
$$.type=flt;
}
else if ($1.type==integer && $1.line_num!=0){
$$.type=integer;
printf("Result of expression on %d is (%d)\n",$1.line_num,$1.value);
}
else if ($1.type==string && $1.line_num!=0) {
$$.type=string;
printf("Result of expression on %d is (%s)\n",$1.line_num,$1.str);
}
else if ($1.type==mismatch && $1.line_num!=0)
{
$$.type=mismatch;
printf("Type mismatch on %d \n",$1.line_num);
}
else{ }
}
%%
int main ()
{
if (yyparse()) {
// if parse error happens only print this printf and not above stmt part
printf("ERROR\n");
return 1;
}
else {
// successful parsing
return 0;
}
}

Unfortunately, time travel is not an option. By the time the error is detected, the printf calls have already happened. You cannot make them unhappen.
What you can do (and must do if you are writing a compiler rather than a calculator) is create some data structure which represents the parsed program, instead of trying to execute it immediately. That data structure, which could be an abstract syntax tree (AST), a list of three-address instructions, or whatever, will be used to produce the compiled program. Only when the compiled program is run will the statements be evaluated.
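For a calculator-style program, a lighter variation on the same idea is to have the actions append their messages to a log instead of calling printf, and to decide what to print only after yyparse() has returned. The sketch below is a minimal illustration of that, not a full AST; ResultLog, log_append and log_finish are hypothetical names that do not appear in the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result log: grammar actions append here instead of printing. */
typedef struct {
    char  *text;   /* accumulated output, NUL-terminated once non-empty */
    size_t len;    /* bytes used, excluding the terminator */
} ResultLog;

/* Append one line to the log; returns 0 on success, -1 if out of memory. */
int log_append(ResultLog *log, const char *line)
{
    size_t n = strlen(line);
    char *grown = realloc(log->text, log->len + n + 1);
    if (!grown) return -1;
    memcpy(grown + log->len, line, n + 1);   /* copies the terminator too */
    log->text = grown;
    log->len += n;
    return 0;
}

/* Call once after yyparse(): print everything on success (status 0),
   only ERROR otherwise. Returns the number of bytes written to `out`. */
int log_finish(ResultLog *log, int parse_status, FILE *out)
{
    int written = parse_status == 0
        ? fprintf(out, "%s", log->text ? log->text : "")
        : fprintf(out, "ERROR\n");
    free(log->text);
    log->text = NULL;
    log->len = 0;
    return written;
}
```

In the grammar, each printf in the stmt action would become a log_append call (formatting the line with snprintf first), and main would end with something like log_finish(&log, yyparse(), stdout).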

Related

Flex and Bison code - syntax error always

First of all, I need to say that I am very new to Flex and Bison and I am a bit confused. There is a school project that wants us to create a compiler using Flex and Bison for a kind of CLIPS language.
My code has a lot of problems, but the main one is that whatever I type, I see a syntax error, while the result should be something else. The ideal scenario would be for it to work fully for the CLIPS language. E.g. when I write "4" I get a syntax error. Reading my code may help you understand this better. If I write "test 3 4" it doesn't show a syntax error, but it counts it as an unknown token, and that's wrong again. I'm completely lost. The code is a prototype given by the school and we need to make some changes. If you have any questions, don't hesitate to ask. Thank you!
P.S.: Don't mind the comments; they were written in Greek.
FLEX CODE:
%option noyywrap
/* C code defining the required header files and variables.
Anything between %{ and %} is copied verbatim into the C file that
Flex generates. */
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Header file containing the list of all tokens */
#include "token.h"
/* Definition of the current-line counter */
int line = 1;
%}
/* Names and their corresponding definitions (as regular expressions).
After this, the names (on the left) can be used in place of the
usually very long and hard-to-read regular expressions */
/* Regular expressions created according to the language's specification */
DELIMITER [ \t]+
INTCONST [+-]*[1-9][0-9]*
VARIABLE [?][A-Za-z0-9]*
DEFINITIONS [a-zA-Z][-|_|A-Z|a-z|0-9]*
COMMENTS ^;.*$
/* For every pattern (on the left) that matches, the corresponding
code inside the braces is executed. The return statement lets
yylex() return a numeric value */
/* If it meets a delimiter or a comment, it ignores it; if it meets an integer, a variable or a definition, it displays it. In every other case it prints that it does not recognize the token, the line it is on and the string that was given */
%%
{DELIMITER} {;}
"bind" { return BIND;}
"test" { return TEST;}
"read" { return READ;}
"printout" { return PRINTOUT;}
"deffacts" { return DEFFACTS;}
"defrule" { return DEFRULE;}
"->" { return '->';}
"=" { return '=';}
"+" { return '+';}
"-" { return '-';}
"*" { return '*';}
"/" { return '/';}
"(" { return '(';}
")" { return ')';}
{INTCONST} { return INTCONST; }
{VARIABLE} { return VARIABLE; }
{DEFINITIONS} { return DEFINITIONS; }
{COMMENTS} {;}
\n { line++; printf("\n"); }
.+ { printf("\tLine=%d, UNKNOWN TOKEN, value=\"%s\"\n",line, yytext);}
<<EOF>> { printf("#END-OF-FILE#\n"); exit(0); }
%%
/* Table of all tokens, matching the definitions in token.h */
char *tname[11] = {"DELIMITER","INTCONST" , "VARIABLE", "DEFINITIONS", "COMMENTS", "BIND", "TEST", "READ", "PRINTOUT", "DEFFACTS", "DEFRULE"};
BISON CODE:
%{
/* C definitions and declarations. Anything to do with defining or initializing
variables and functions, header files and #define directives goes here */
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char *);
%}
/* Definition of the recognized lexical tokens. */
%token INTCONST VARIABLE DEFINITIONS PLUS NEWLINE MINUS MULT DIV COM BIND TEST READ PRINTOUT DEFFACTS DEFRULE
%%
/* Definition of the grammar rules. Whenever a grammar rule matches
the input, the C code between the braces is executed.
The expected syntax is:
name : rule { C code } */
program:
program expr NEWLINE { printf("%d\n", $2); }
|
;
expr:
INTCONST { $$ = $1; }
| VARIABLE { $$ = $1; }// added the variable
| PLUS expr expr { $$ = $2 + $3; }// added addition as an operation
| MINUS expr expr { $$ = $2 - $3; } // added subtraction as an operation
| MULT expr expr { $$ = $2 * $3; }// added multiplication as an operation
| DIV expr expr { $$ = $2 / $3; }// added division as an operation
| COM { $$ = $1; }// added comments
| DEFFACTS expr { $$ = $2; }// added facts
| DEFRULE expr { $$ = $2; }// added rules
| BIND expr expr { $$ = $2;}// added bind
| TEST expr expr { $$ = $2 ;}// added test
| READ expr expr { $$ = $2 ;}// added read
| PRINTOUT expr expr { $$ = $2 ;}// added printout
;
%%
/* The yyerror function is used to report errors. Specifically, it is called
by yyparse when there is a syntax error. In the case below, the
function essentially prints an error message to the screen. */
void yyerror(char *s) {
fprintf(stderr, "Error: %s\n", s);
}
/* The main function, which is also the program's entry point.
In this case it simply calls Bison's yyparse function
to start the parsing. */
int main(void) {
yyparse();
return 0;
}
TOKEN FILE:
#define DELIMITER 1
#define INTCONST 2
#define VARIABLE 3
#define DEFINITIONS 4
#define COMMENTS 5
#define BIND 6
#define TEST 7
#define READ 8
#define PRINTOUT 9
#define DEFFACTS 10
#define DEFRULE 11
MAKEFILE:
all:
bison -d simple-bison-code.y
flex mini-clips-la.l
gcc simple-bison-code.tab.c lex.yy.c -o B2
./B2
clean:
rm simple-bison-code.tab.c simple-bison-code.tab.h lex.yy.c B2
Your top-level rule is:
program:
program expr NEWLINE
which cannot succeed unless the parser sees a NEWLINE token. But it will never see one, because your lexical scanner never sends one; when it sees a newline, it increments the line count but doesn't return anything.
All your tokens are considered invalid because your lexical scanner uses its own definitions of the token values. You shouldn't do that. The parser generator (bison/yacc) will generate a header file containing the correct definitions; that is, the values it is expecting to see.
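To make the mismatch concrete, here is a small sketch comparing a few values from the question's hand-written token.h with values in the style Bison assigns. Bison numbers named tokens from 258 upward in declaration order, so nothing in the 1 to 11 range can line up. The HW_ prefix and same_token are illustrative names only; the real values should come from the header that bison -d generates (simple-bison-code.tab.h with the Makefile shown), included in the scanner in place of token.h.

```c
/* Values the scanner returns, copied from the question's token.h
   (renamed with an HW_ prefix here to avoid clashing with the enum below). */
enum HandWritten { HW_INTCONST = 2, HW_VARIABLE = 3, HW_TEST = 7 };

/* Values in the style Bison generates for the question's %token line:
   named tokens are numbered from 258 in declaration order. */
enum Generated {
    INTCONST = 258, VARIABLE, DEFINITIONS, PLUS, NEWLINE,
    MINUS, MULT, DIV, COM, BIND, TEST
};

/* Returns 1 if the code sent by the scanner is the one the parser expects. */
int same_token(int from_scanner, int expected_by_parser)
{
    return from_scanner == expected_by_parser;
}
```

With these numbers, the scanner's 2 for an integer constant is not a code the parser knows for any token in the grammar, which is why every input is rejected.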
There are various other problems, probably more than I noticed. The most important is that you should not call exit(0) in the <<EOF>> rule, since that will mean that the parser can never succeed; it does not succeed until it is passed an EOF token. In fact, you should not normally have an <<EOF>> rule; the default action is to return 0 and that is pretty well the only action which makes sense.
Also, '->' is not a correct C literal. The compiler would have complained about it if you had enabled compiler warnings (-Wall), which you should always do, even if you are compiling generated code.
And your scanner's last pattern, intended to trigger on bad tokens, is .+, which will match the entire line, not just the erroneous character. Since (f)lex scanners accept the pattern with the longest match, most of your other patterns will never match. (Flex usually warns you about unmatchable patterns. Didn't you get such a warning?)
The fallback pattern should be .|\n, although you can use . if you are absolutely sure that every newline will be matched by some rule. I like to use %option nodefault, which will cause flex to warn me if there is some possible input not matched by any rule.
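The longest-match point can be checked by hand. The sketch below is a plain C stand-in, not flex itself: it computes how many characters the .+ pattern and a keyword pattern would each match at the start of a line and applies the rule that the longer match wins, with ties going to the rule listed first (the keyword, in the question's scanner). The function names are made up for the illustration.

```c
#include <string.h>

/* Length matched by the pattern `.+`: every character up to the newline. */
size_t len_dot_plus(const char *input)
{
    return strcspn(input, "\n");
}

/* Length matched by a literal keyword pattern, or 0 if it does not match here. */
size_t len_keyword(const char *input, const char *kw)
{
    size_t n = strlen(kw);
    return strncmp(input, kw, n) == 0 ? n : 0;
}

/* Longest match wins; on a tie the earlier rule (the keyword) is chosen,
   so `.+` only wins when it is strictly longer. */
int dot_plus_wins(const char *input, const char *kw)
{
    return len_dot_plus(input) > len_keyword(input, kw);
}
```

For the input line test 3 4, the keyword matches 4 characters and .+ matches 8, so the whole line is reported as one unknown token, which is the behaviour described in the question.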

Swap between stdin and file in Bison

I have the following code in Bison, which extends the mfcalc example from the guide, implementing some functions like yylex() externally with Flex.
To understand my problem, the key rules are in the non-terminal called line at the beginning of the grammar. Concretely, they are the rules EVAL CLOSED_STRING '\n' and END (this token is sent by Flex when EOF is detected). The first opens a file and points the input to that file. The second closes the file and points the input back to stdin.
I'm trying to make a rule eval "file_path" to load tokens from a file and evaluate them. Initially I have yyin = stdin (I use the function setStandardInput() to do this).
When a user enters eval "file_path", the parser swaps yyin from stdin to the file pointer (with the function setFileInput()) and the tokens are read correctly.
When the END rule is reached by the parser, it tries to restore the stdin input, but it gets bugged. The bug is that the calculator doesn't end, but what I write in the input isn't evaluated.
Note: I assume there are no errors in the grammar, because error recovery is not complete. In the file at file_path you can use simple arithmetic operations.
In summary, I want to swap between stdin and file pointers as inputs, but when I swap back to stdin it gets bugged, unless I start the calculator with stdin as the default.
%{
/* Library includes */
#include <stdio.h>
#include <math.h>
#include "utils/fileutils.h"
#include "lex.yy.h"
#include "utils/errors.h"
#include "utils/stringutils.h"
#include "table.h"
void setStandardInput();
void setFileInput(char * filePath);
/* External functions and variables from flex */
extern size_t yyleng;
extern FILE * yyin;
extern int parsing_line;
extern char * yytext;
//extern int yyerror(char *s);
extern int yyparse();
extern int yylex();
int yyerror(char * s);
%}
/***** TOKEN DEFINITION *****/
%union{
char * text;
double value;
}
%type <value> exp asig
%token LS
%token EVAL
%token <text> ID
%token <text> VAR
%token <value> FUNCTION
%token <value> LEXEME
%token <value> RESERVED_WORD
%token <value> NUMBER
%token <value> INTEGER
%token <value> FLOAT
%token <value> BINARY
%token <value> SCIENTIFIC_NOTATION
%token <text> CLOSED_STRING
%token DOCUMENTATION
%token COMMENT
%token POW
%token UNRECOGNIZED_CHAR
%token MALFORMED_STRING_ERROR
%token STRING_NOT_CLOSED_ERROR
%token COMMENT_ERROR
%token DOCUMENTATION_ERROR
%token END
%right '='
%left '+' '-'
%left '/' '*'
%left NEG_MINUS
%right '^'
%right '('
%%
input: /* empty_expression */ |
input line
;
line: '\n'
| asig '\n' { printf("\t%f\n", $1); }
| asig END { printf("\t%f\n", $1); }
| LS { print_table(); }
| EVAL CLOSED_STRING '\n' {
// Getting the file path
char * filePath = deleteStringSorroundingQuotes($2);
setFileInput(filePath);
}
| END { closeFile(yyin); setStandardInput();}
;
exp: NUMBER { $$ = $1; }
| VAR {
lex * result = table_search($1, LEXEME);
if(result != NULL) $$ = result->value;
}
| VAR '(' exp ')' {
lex * result = table_search($1, FUNCTION);
// If the result is a function, then invokes it
if(result != NULL) $$ = (*(result->function))($3);
else yyerror("That identifier is not a function.");
}
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp {
if($3 != 0){ $$ = $1 / $3;};
yyerror("You can't divide a number by zero");
}
| '-' exp %prec NEG_MINUS { $$ = -$2; }
| exp '^' exp { $$ = pow($1, $3); }
| '(' exp ')' { $$ = $2; }
| '(' error ')' {
yyerror("An error has ocurred between the parenthesis."); yyerrok; yyclearin;
}
;
asig: exp { $$ = $1; }
| VAR '=' asig {
int type = insertLexeme($1, $3);
if(type == RESERVED_WORD){
yyerror("You tried to assign a value to a reserved word.");
YYERROR;
}else if(type == FUNCTION){
yyerror("You tried to assign a value to a function.");
YYERROR;
}
$$ = $3;
}
;
%%
void setStandardInput(){
printf("Starting standard input:\n");
yyin = NULL;
yyin = stdin;
yyparse();
}
void setFileInput(char * filePath){
FILE * inputFile = openFile(filePath);
if(inputFile == NULL){
printf("The file couldn't be loaded. Redirecting to standard input: \n");
setStandardInput();
}else{
yyin = inputFile;
}
}
int main(int argc, char ** argv) {
create_table(); // Table instantiation and initzialization
initTable(); // Symbol table initzialization
setStandardInput(); // yyin = stdin
while(yyparse()!=1);
print_table();
// Table memory liberation
destroyTable();
return 0;
}
int yyerror(char * s){
printf("---------- Error in line %d --> %s ----------------\n", parsing_line, s);
return 0;
}
It's not too difficult to create a parser and a scanner which can be called recursively. (See below for an example.) But neither the default bison-generated parser nor the flex-generated scanner is designed to be reentrant. So with the default parser/scanner, you shouldn't call yyparse() inside setStandardInput, because that function is itself called by yyparse.
If you had a recursive parser and scanner, on the other hand, you could significantly simplify your logic. You could get rid of the END token (which is, in any case, practically never a good idea) and just recursively call yyparse in your action for EVAL CLOSED_STRING '\n'.
If you want to use the default parser and scanner, then your best solution is to use Flex's buffer stack to push and later pop a "buffer" corresponding to the file to be evaluated. (The word "buffer" here is a bit confusing, I think. A Flex "buffer" is actually an input source, such as a file; it's called a buffer because only a part of it is in memory, but Flex will read the entire input source as part of processing a "buffer".)
You can read about the buffer stack usage in the flex manual, which includes sample code. Note that in the sample code, the end of file condition is entirely handled inside the scanner, which is usual for this architecture.
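The discipline behind a buffer stack can be sketched without flex. The code below is a hand-rolled analogue, not Flex's API (in a real scanner the corresponding calls are yypush_buffer_state and yypop_buffer_state): eval pushes a new source, and reaching the end of a source pops it and resumes the one underneath. To stay self-contained it uses in-memory strings as sources; InputStack, input_push and input_next are hypothetical names.

```c
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stack of input sources; the top entry is the one being read. */
typedef struct {
    const char *src[MAX_DEPTH];   /* read cursor of each open source */
    int         depth;            /* number of open sources */
} InputStack;

/* Start reading from `text`; returns 0 on success, -1 if the stack is full. */
int input_push(InputStack *st, const char *text)
{
    if (st->depth == MAX_DEPTH) return -1;
    st->src[st->depth++] = text;
    return 0;
}

/* Next character from the current source. When a source is exhausted it is
   popped and reading resumes in the source below it. Returns -1 only when
   every source is exhausted. */
int input_next(InputStack *st)
{
    while (st->depth > 0) {
        const char **top = &st->src[st->depth - 1];
        if (**top != '\0') return (unsigned char)*(*top)++;
        st->depth--;              /* end of this source: pop it */
    }
    return -1;
}
```

Note that the pop happens inside the reading routine, which mirrors the point that end of file is handled entirely inside the scanner: the parser never sees the end of the evaluated file, only the end of the outermost input.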
It is possible in this case to manufacture an end-of-file indicator (although you cannot use END because that is used to indicate the end of all input). That has the advantage of ensuring that the contents of the evaluated file are parsed as a whole, without leaking a partial parse back to the including file, but you will still want to pop the buffer stack inside the scanner because it is annoyingly tricky to get end-of-file handling correct without violating any of the API constraints (one of which is that you cannot reliably read EOF twice on the same "buffer").
In this case, I would recommend generating a reentrant parser and scanner and simply doing a recursive call. It's a clean and simple solution, and avoiding global variables is always good.
A simple example. The simple language below only has echo and eval statements, both of which require a quoted string argument.
There are a variety of ways to hook together a reentrant scanner and reentrant parser. All of them have some quirks and the documentation (although definitely worth reading) has some holes. This is a solution which I've found useful. Note that most of the externally visible functions are defined in the scanner file, because they rely on interfaces defined in that file for manipulating the reentrant scanner context object. You can get flex to export a header with the appropriate definitions, but I've generally found it simpler to write my own wrapper functions and export those. (I don't usually export yyscan_t either; normally I create a context object of my own which has a yyscan_t member.)
There is an annoying circularity which is largely the result of bison not allowing for the possibility to introduce user code at the top of yyparse. Consequently, it is necessary to pass the yyscan_t used to call the lexer as an argument to yyparse, which means that it is necessary to declare yyscan_t in the bison file. yyscan_t is actually declared in the scanner generated file (or the flex-generated header, if you've asked for one), but you can't include the flex-generated header in the bison-generated header because the flex-generated header requires YYSTYPE which is declared in the bison-generated header.
I normally avoid this circularity by using a push parser, but that's pushing the boundaries for this question, so I just resorted to the usual work-around, which is to insert
typedef void* yyscan_t;
in the bison file. (That's the actual definition of yyscan_t, whose actual contents are supposed to be opaque.)
I hope the rest of the example is self-evident, but please feel free to ask for clarification if there is anything which you don't understand.
file recur.l
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "recur.tab.h"
%}
%option reentrant bison-bridge
%option noinput nounput nodefault noyywrap
%option yylineno
%%
"echo" { return T_ECHO; }
"eval" { return T_EVAL; }
[[:alpha:]][[:alnum:]]* {
yylval->text = strdup(yytext);
return ID;
}
["] { yyerror(yyscanner, "Unterminated string constant"); }
["][^"\n]*["] {
yylval->text = malloc(yyleng - 1);
memcpy(yylval->text, yytext + 1, yyleng - 2);
yylval->text[yyleng - 2] = '\0';
return STRING;
}
"." { return yytext[0]; }
[[:digit:]]*("."[[:digit:]]*)? {
yylval->number = strtod(yytext, NULL);
return NUMBER;
}
[ \t]+ ;
.|\n { return yytext[0]; }
%%
/* Use "-" or NULL to parse stdin */
int parseFile(const char* path) {
FILE* in = stdin;
if (path && strcmp(path, "-") != 0) {
in = fopen(path, "r");
if (!in) {
fprintf(stderr, "Could not open file '%s'\n", path);
return 1;
}
}
yyscan_t scanner;
yylex_init (&scanner);
yyset_in(in, scanner);
int rv = yyparse(scanner);
yylex_destroy(scanner);
if (in != stdin) fclose(in);
return rv;
}
void yyerror(yyscan_t yyscanner, const char* msg) {
fprintf(stderr, "At line %d: %s\n", yyget_lineno(yyscanner), msg);
}
file recur.y
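%code {
#include <stdio.h>
#include <stdlib.h>
}
%define api.pure full
%param { yyscan_t scanner }
%union {
char* text;
double number;
}
%code requires {
/* The work-around described above: declare yyscan_t for the parser header */
typedef void* yyscan_t;
int parseFile(const char* path);
}
%code provides {
/* Prototypes of the functions defined in recur.l */
int yylex(YYSTYPE* yylval, yyscan_t scanner);
void yyerror(yyscan_t scanner, const char* msg);
}
%token T_ECHO "echo" T_EVAL "eval"
%token <text> STRING ID
%token <number> NUMBER
%%
program: %empty | program command '\n'
command: echo | eval | %empty
echo: "echo" STRING { printf("%s\n", $2); free($2); }
eval: "eval" STRING { /* parseFile opens the file, parses it recursively and closes it */
int rv = parseFile($2);
free($2);
if (rv) YYABORT;
}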
%%

Lexical & Grammar Analysis using Yacc and Lex

I am fairly new to Yacc and Lex programming, but I am training myself with an analyser for C programs.
However, I am facing a small issue that I haven't managed to solve.
When there is a declaration such as int a,b; I want to save a and b in a simple array. I did manage to do that, but it is saving a bit more than wanted.
It is actually saving "a," or "b;" instead of "a" and "b".
It should have worked, as $1 should only return tID, which is a regular expression recognising only a character string. I don't understand why it takes the comma even though it is defined as a token. Does anyone know how to solve this problem?
Here are the corresponding yacc declarations:
Declaration :
tINT Decl1 DeclN
{printf("Declaration %s\n", $2);}
| tCONST Decl1 DeclN
{printf("Declaration %s\n", $2);}
;
Decl1 :
tID
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
compteur++;}
| tID tEQ E
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
pile[compteur]=$3;
compteur++;}
;
DeclN :
/*epsilon*/
| tVIR Decl1 DeclN
And the extract of the Lex file :
separateur [ \t\n\r]
id [A-Za-z][A-Za-z0-9_]*
nb [0-9]+
nbdec [0-9]+\.[0-9]+
nbexp [0-9]+e[0-9]+
"," { return tVIR; }
";" { return tPV; }
"=" { return tEQ; }
{separateur} ;
{id} { yylval.str = yytext; return tID; }
{nb}|{nbdec}|{nbexp} { yylval.nb = atoi(yytext); return tNB; }
%%
int yywrap() {return 1;}
The problem is that yytext is a reference into lex's token scanning buffer, so it is only valid until the next time the parser calls yylex. You need to make a copy of the string in yytext if you want to return it. Something like:
{id} { yylval.str = strdup(yytext); return tID; }
will do the trick, though it also exposes you to the possibility of memory leaks.
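The "a," symptom can be reproduced in plain C. The sketch below is a simplified stand-in for what a flex-generated scanner typically does: it terminates the current token by writing a '\0' into its buffer and restores the overwritten character when the next token is scanned. MiniScanner, mini_scan and copy_token are hypothetical names, and every token is assumed to be a single character to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's internal buffer that yytext points into. */
typedef struct {
    char   buf[16];    /* input held in the buffer */
    size_t next;       /* index where the next token starts */
    size_t held_at;    /* index currently overwritten with '\0' */
    char   held;       /* character that the '\0' replaced */
    int    holding;    /* non-zero while a character is held */
} MiniScanner;

void mini_init(MiniScanner *s, const char *input)
{
    memset(s, 0, sizeof *s);
    strncpy(s->buf, input, sizeof s->buf - 1);
}

/* Scan one single-character token and return a pointer into buf (like
   yytext), or NULL at the end of the input. */
char *mini_scan(MiniScanner *s)
{
    if (s->holding) s->buf[s->held_at] = s->held;   /* undo previous '\0' */
    s->holding = 0;
    if (s->buf[s->next] == '\0') return NULL;
    char *text = s->buf + s->next++;
    s->held_at = s->next;
    s->held = s->buf[s->next];
    s->buf[s->next] = '\0';                         /* terminate this token */
    s->holding = 1;
    return text;
}

/* Portable equivalent of POSIX strdup(): a private heap copy of `s`. */
char *copy_token(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}
```

After scanning a and then the lookahead comma, the saved pointer reads "a," while the copy still reads "a". The copy has to be released with free() once the parser is done with it, which is where the memory-leak risk comes from.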
Also, in general, when writing lex/yacc parsers involving single-character tokens, it is much clearer to use them directly as character constants (e.g. ',', ';', and '=') rather than defining named tokens (tVIR, tPV, and tEQ in your code).

What is wrong with this Bison grammar?

I'm trying to build a Bison grammar and seem to be missing something. I have kept it very basic so far, but I am still getting a syntax error and can't figure out why:
Here is my Bison Code:
%{
#include <stdlib.h>
#include <stdio.h>
int yylex(void);
int yyerror(char *s);
%}
// Define the types flex could return
%union {
long lval;
char *sval;
}
// Define the terminal symbol token types
%token <sval> IDENT;
%token <lval> NUM;
%%
Program:
Def ';'
;
Def:
IDENT '=' Lambda { printf("Successfully parsed file"); }
;
Lambda:
"fun" IDENT "->" "end"
;
%%
main() {
yyparse();
return 0;
}
int yyerror(char *s)
{
extern int yylineno; // defined and maintained in flex.flex
extern char *yytext; // defined and maintained in flex.flex
printf("ERROR: %s at symbol \"%s\" on line %i", s, yytext, yylineno);
exit(2);
}
Here is my Flex Code
%{
#include <stdlib.h>
#include "bison.tab.h"
%}
ID [A-Za-z][A-Za-z0-9]*
NUM [0-9][0-9]*
HEX [$][A-Fa-f0-9]+
COMM [/][/].*$
%%
fun|if|then|else|let|in|not|head|tail|and|end|isnum|islist|isfun {
printf("Scanning a keyword\n");
}
{ID} {
printf("Scanning an IDENT\n");
yylval.sval = strdup( yytext );
return IDENT;
}
{NUM} {
printf("Scanning a NUM\n");
/* Convert into long to loose leading zeros */
char *ptr = NULL;
long num = strtol(yytext, &ptr, 10);
if( errno == ERANGE ) {
printf("Number was to big");
exit(1);
}
yylval.lval = num;
return NUM;
}
{HEX} {
printf("Scanning a NUM\n");
char *ptr = NULL;
/* convert hex into decimal using offset 1 because of the $ */
long num = strtol(&yytext[1], &ptr, 16);
if( errno == ERANGE ) {
printf("Number was to big");
exit(1);
}
yylval.lval = num;
return NUM;
}
";"|"="|"+"|"-"|"*"|"."|"<"|"="|"("|")"|"->" {
printf("Scanning an operator\n");
}
[ \t\n]+ /* eat up whitespace */
{COMM}* /* eat up one-line comments */
. {
printf("Unrecognized character: %s at linenumber %d\n", yytext, yylineno );
exit(1);
}
%%
And here is my Makefile:
all: parser
parser: bison flex
gcc bison.tab.c lex.yy.c -o parser -lfl
bison: bison.y
bison -d bison.y
flex: flex.flex
flex flex.flex
clean:
rm bison.tab.h
rm bison.tab.c
rm lex.yy.c
rm parser
Everything compiles just fine; I do not get any errors running make all.
Here is my testfile
f = fun x -> end;
And here is the output:
./parser < a0.0
Scanning an IDENT
Scanning an operator
Scanning a keyword
Scanning an IDENT
ERROR: syntax error at symbol "x" on line 1
Since x seems to be recognized as an IDENT, the rule should be correct, but I am still getting a syntax error.
I feel like I am missing something important, hopefully somebody can help me out.
Thanks in advance!
EDIT:
I tried to remove the IDENT in the Lambda rule and the testfile, now it seems to run through the line, but still throws
ERROR: syntax error at symbol "" on line 1
after the EOF.
Your scanner recognizes keywords (and prints out a debugging line, but see below), but it doesn't bother reporting anything to the parser. So they are effectively ignored.
In your bison definition file, you use (for example) "fun" as a terminal, but you do not provide the terminal with a name which could be used in the scanner. The scanner needs this name, because it has to return a token id to the parser.
To summarize, what you need is something like this:
In your grammar, before the %%:
%token T_FUN "fun"
%token T_IF "if"
%token T_THEN "then"
/* Etc. */
In your scanner definition:
fun { return T_FUN; }
if { return T_IF; }
then { return T_THEN; }
/* Etc. */
A couple of other notes:
Your scanner rule for recognizing operators also fails to return anything, so operators will also be ignored. That's clearly not desirable. flex and bison allow an easier solution for single-character operators, which is to let the character be its own token id. That avoids having to create a token name. In the parser, a single-quoted character represents a token-id whose value is the character; that's quite different from a double-quoted string, which is an alias for the declared token name. So you could do this:
"=" { return '='; }
/* Etc. */
but it's easier to do all the single-character tokens at once:
[;+*.<=()-] { return yytext[0]; }
and even easier to use a default rule at the end:
. { return yytext[0]; }
which will have the effect of handling unrecognized characters by returning an unknown token id to the parser, which will cause a syntax error.
This won't work for "->", since that is not a single character token, which will have to be handled in the same way as keywords.
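Put together, the scanner's job looks roughly like the hand-written C function below. It is only a sketch of the behaviour, not what flex generates: keywords and "->" return named token codes (numbered from 258, as Bison does for named tokens), and every other character is returned as its own code. T_END, T_ARROW, T_IDENT and next_token are illustrative names; the real codes come from the bison-generated header.

```c
#include <ctype.h>
#include <string.h>

/* Token codes in the style Bison generates: named tokens start at 258, so
   they can never collide with a plain character code. 0 means end of input. */
enum { T_EOF = 0, T_FUN = 258, T_END, T_ARROW, T_IDENT };

/* Return the code of the next token in *cursor and advance past it. */
int next_token(const char **cursor)
{
    const char *p = *cursor;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;   /* skip whitespace */
    if (*p == '\0') { *cursor = p; return T_EOF; }

    if (isalpha((unsigned char)*p)) {            /* keyword or identifier */
        const char *start = p;
        while (isalnum((unsigned char)*p)) p++;
        size_t n = (size_t)(p - start);
        *cursor = p;
        if (n == 3 && memcmp(start, "fun", 3) == 0) return T_FUN;
        if (n == 3 && memcmp(start, "end", 3) == 0) return T_END;
        return T_IDENT;
    }
    if (p[0] == '-' && p[1] == '>') {            /* multi-character operator */
        *cursor = p + 2;
        return T_ARROW;
    }
    *cursor = p + 1;                 /* any other character is its own code */
    return (unsigned char)*p;
}
```

On the test file f = fun x -> end; this yields T_IDENT, '=', T_FUN, T_IDENT, T_ARROW, T_END, ';' and then 0, which is the sequence the Program rule needs to see.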
Flex will produce debugging output automatically if you use the -d flag when you create the scanner. That's a lot easier than inserting your own debugging printout, because you can turn it off by simply removing the -d option. (You can use %option debug instead if you don't want to change the flex invocation in your makefile.) It's also better because it provides consistent information, including position information.
Some minor points:
The pattern [0-9][0-9]* could more easily be written [0-9]+
The comment pattern "//".* does not require a $ lookahead at the end, since .* will always match the longest sequence of non-newline characters; consequently, the first unmatched character must either be a newline or the EOF. $ lookahead will not match if the pattern is terminated with an EOF, which will cause odd errors if the file ends with a comment without a newline at the end.
There is no point using {COMM}* since the comment pattern does not match the newline which terminates the comment, so it is impossible for there to be two consecutive comment matches. But anyway, after matching a comment and the following newline, flex will continue to match a following comment, so {COMM} is sufficient. (Personally, I wouldn't use the COMM abbreviation; it really adds nothing to readability, IMHO.)

Flex and Bison Calculator

I'm trying to implement a calculator for nor expressions, such as true nor true nor (false nor false) using Flex and Bison, but I keep getting my error message back. Here is my .l file:
%{
#include <stdlib.h>
#include "y.tab.h"
%}
%%
("true"|"false") {return BOOLEAN;}
.|\n {yyerror();}
%%
int main(void)
{
yyparse();
return 0;
}
int yywrap(void)
{
return 0;
}
int yyerror(void)
{
printf("Error\n");
}
Here is my .y file:
/* Bison declarations. */
%token BOOLEAN
%left 'nor'
%% /* The grammar follows. */
input:
/* empty */
| input line
;
line:
'\n'
| exp '\n' { printf ("%s",$1); }
;
exp:
BOOLEAN { $$ = $1; }
| exp 'nor' exp { $$ = !($1 || $3); }
| '(' exp ')' { $$ = $2; }
;
%%
Does anyone see the problem?
The simple way to handle all the single-character tokens, which as @vitaut correctly says you aren't handling at all yet, is to return yytext[0] for the dot rule, and let the parser sort out which ones are legal.
You have also lost the values of the BOOLEANs 'true' and 'false', which should be stored into yylval as 1 and 0 respectively, which will then turn up in $1, $3 etc. If you're going to have more datatypes in the longer term, you need to look into the %union directive.
The reason why you get errors is that your lexer only recognizes one type of token, namely BOOLEAN, but not the newline, parentheses or nor (and you produce an error for everything else). For single letter tokens like parentheses and the newline you can return the character itself as a token type:
\n { return '\n'; }
For nor, though, you should introduce a token type like you did for BOOLEAN and add an appropriate rule to the lexer.
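As a cross-check of what the fixed scanner and grammar should compute, here is a small hand-written evaluator in C. It is a stand-in for the generated parser, not Bison output: true and false carry the semantic values 1 and 0 (the role yylval plays), nor is its own token, and nor chains are folded left to right as %left would do. The function names are made up, and keywords are matched by prefix only, which is enough for a sketch.

```c
#include <string.h>

/* If the input (after spaces) continues with `word`, consume it and return 1. */
static int match_word(const char **p, const char *word)
{
    size_t n = strlen(word);
    while (**p == ' ') (*p)++;
    if (strncmp(*p, word, n) != 0) return 0;
    *p += n;
    return 1;
}

int parse_exp(const char **p, int *value);

/* primary: "true" | "false" | '(' exp ')' ; stores the semantic value. */
static int parse_primary(const char **p, int *value)
{
    if (match_word(p, "true"))  { *value = 1; return 1; }
    if (match_word(p, "false")) { *value = 0; return 1; }
    if (match_word(p, "(")) return parse_exp(p, value) && match_word(p, ")");
    return 0;
}

/* exp: primary { "nor" primary }, folded left to right.
   Returns 1 on success and 0 on a syntax error. */
int parse_exp(const char **p, int *value)
{
    int rhs;
    if (!parse_primary(p, value)) return 0;
    while (match_word(p, "nor")) {
        if (!parse_primary(p, &rhs)) return 0;
        *value = !(*value || rhs);      /* same action as in the grammar */
    }
    return 1;
}
```

For the expression from the question, true nor true nor (false nor false), the steps are (true nor true) = 0, (false nor false) = 1, and 0 nor 1 = 0.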
