Flex and Bison C adder gives error after not executing expression - c

I have been trying to implement a calculator, and I cannot find sources that cover this problem in a Flex, Bison and C program. I really can't tell what I'm doing wrong. Here are my files:
henry#FusionReactor:~/Downloads/YASPLANG$ ls
a.out compiler.output compiler.y lex.yy.c
compiler.l compiler.tab.c file.o README
My grammar:
henry#FusionReactor:~/Downloads/YASPLANG$ cat compiler.y
%{
#include <stdio.h>
void yyerror(const char *msg) {
fprintf(stderr, "%s\n", msg);
}
int yydebug = 1;
%}
%code requires
{
#define YYSTYPE double
}
/*
%union {
char c;
char *s;
double d;
}
*/
%define parse.error verbose
%define parse.lac full
%token NUM
%token PLUS
%token NEWLINE
%left PLUS
%%
answered: %empty {; }
| answered answer {; }//{ printf("%lg is answered.\n",$2); $$ = $1; }
;
answer: NEWLINE { $$ = 0.0; }
| expr NEWLINE { $$ = $1; }
;
expr: expr PLUS expr { $$ = $1 + $3; }//printf("%lg\n", $$ = $1 + $3); printf("Doing %lg + %lg.\n", $1, $3); }
| NUM { $$ = $1; }//printf("%lg\n", $$ = $1); printf("Found number: %lg.\n", $1); }
;
%%
And my lexer:
henry#FusionReactor:~/Downloads/YASPLANG$ cat compiler.l
%{
int yylex(void);
#include "compiler.tab.c"
%}
%%
[ \t] ;
'\n' ;//{ return NEWLINE; }
'+' { return PLUS; }
^[0-9]+(\.[0-9]+)? { sscanf(yytext, "%lf", &yylval); printf("%lf = %s\n", yylval, yytext); return NUM; }
%%
int main(const char **argv, int argc) {
return yyparse();
}
I compiled it with these commands (I keep everything in my Downloads, and the folder name is an acronym, let's not worry about that too much):
henry#FusionReactor:~/Downloads/YASPLANG$ bison --report=all --verbose --debug compiler.y
henry#FusionReactor:~/Downloads/YASPLANG$ flex compiler.l
henry#FusionReactor:~/Downloads/YASPLANG$ gcc lex.yy.c -lfl
When I execute and type "5 + 5 [NEWLINE]" twice, I get:
henry#FusionReactor:~/Downloads/YASPLANG$ ./a.out
Starting parse
Entering state 0
Reducing stack by rule 1 (line 26):
-> $$ = nterm answered ()
Stack now 0
Entering state 1
Reading a token: 5 + 5
5.000000 = 5
Next token is token NUM ()
Shifting token NUM ()
Entering state 3
Reducing stack by rule 6 (line 34):
$1 = token NUM ()
-> $$ = nterm expr ()
Stack now 0 1
Entering state 6
Reading a token: +5
5 + 5
5.000000 = 5
Next token is token NUM ()
LAC: initial context established for NUM
LAC: checking lookahead NUM: Err
Constructing syntax error message
LAC: checking lookahead $end: Err
LAC: checking lookahead NUM: Err
LAC: checking lookahead PLUS: S7
LAC: checking lookahead NEWLINE: S8
syntax error, unexpected NUM, expecting PLUS or NEWLINE
Error: popping nterm expr ()
Stack now 0 1
Error: popping nterm answered ()
Stack now 0
Cleanup: discarding lookahead token NUM ()
Stack now 0
henry#FusionReactor:~/Downloads/YASPLANG$
I am very puzzled; if it would be possible to tell me how and why this does not work, that would be very helpful.

In (f)lex, the pattern '+' matches a sequence of two or more ' characters, because ' is just an ordinary character and + is the one-or-more operator. If you want to match a single +, use "+" or \+ or [+].
Similarly, '\n' matches the three-character sequence apostrophe, newline, apostrophe. Just use \n.
Finally, ^[0-9]+(\.[0-9]+)? will only match a number at the very beginning of a line, because the pattern starts with ^. You want it to match anywhere, so lose the ^.
The reason no sensible error is reported is that (f)lex adds an implicit default rule which matches any single character and executes the ECHO action (which writes the character to stdout). That default rule is hardly ever what you want; I strongly recommend using
%option nodefault
to suppress it. You must then provide your own default rule, with a more sensible action.
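Putting those fixes together, the rules section of the lexer might look something like this (just a sketch of one reasonable way to write it; the catch-all action is my own choice, not the only possibility):
%option nodefault
%%
[ \t]              ;
\n                 { return NEWLINE; }
"+"                { return PLUS; }
[0-9]+(\.[0-9]+)?  { sscanf(yytext, "%lf", &yylval); return NUM; }
.                  { fprintf(stderr, "Unexpected character '%c'\n", *yytext); }
%%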

Related

How to skip parts of bison if yyparse fails

So basically, in my bison file, if yyparse fails (i.e. there is a syntax error) I want to print an 'ERROR' message and not print anything else that I do in the earlier part of the bison file, in fact the stmt part. If yyparse returns 1, is there any way to skip the content of the middle part of the bison file? Such as, I don't know, maybe writing an if statement above the stmt part, etc.? I would appreciate any help! Thanks in advance.
Such as:
%{
#include "header.h"
#include <stdio.h>
void yyerror (const char *s)
{}
extern int line;
%}
%token ...///
%// some tokens types etc...
%union
{
St class;
int value;
char* str;
int line_num;
float float_value;
}
%start prog
%%
prog: '[' stmtlst ']'
;
stmtlst: stmtlst stmt |
;
stmt: setStmt | if | print | unaryOperation | expr
{
if ($1.type==flt && $1.line_num!=0) {
printf("Result of expression on %d is (",$1.line_num);
printf( "%0.1f)\n", $1.float_value);
$$.type=flt;
}
else if ($1.type==integer && $1.line_num!=0){
$$.type=integer;
printf("Result of expression on %d is (%d)\n",$1.line_num,$1.value);
}
else if ($1.type==string && $1.line_num!=0) {
$$.type=string;
printf("Result of expression on %d is (%s)\n",$1.line_num,$1.str);
}
else if ($1.type==mismatch && $1.line_num!=0)
{
$$.type=mismatch;
printf("Type mismatch on %d \n",$1.line_num);
}
else{ }
}
%%
int main ()
{
if (yyparse()) {
// if parse error happens only print this printf and not above stmt part
printf("ERROR\n");
return 1;
}
else {
// successful parsing
return 0;
}
}
Unfortunately, time travel is not an option. By the time the error is detected, the printf calls have already happened. You cannot make them unhappen.
What you can do (and must do if you are writing a compiler rather than a calculator) is create some data structure which represents the parsed program, instead of trying to execute it immediately. That data structure -- which could be an abstract syntax tree (AST), a list of three-address instructions, or whatever -- will be used to produce the compiled program. Only when the compiled program is run will the statements be evaluated.
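As a rough illustration of that idea (the names Result and record_result are made up for this sketch, not part of any real API): the actions record results in a list, and nothing is printed until yyparse has returned successfully.
#include <stdio.h>
#include <stdlib.h>

/* One recorded expression result; a stand-in for a real AST. */
typedef struct Result {
    int line_num;
    double value;
    struct Result *next;
} Result;

static Result *results = NULL;   /* filled in by the parser actions */

/* Call this from an action instead of printf, e.g.
   expr { record_result(line, $1); }                        */
void record_result(int line_num, double value) {
    Result *r = malloc(sizeof *r);
    r->line_num = line_num;
    r->value = value;
    r->next = results;
    results = r;
}

extern int yyparse(void);

int main(void) {
    if (yyparse()) {
        /* A syntax error occurred: nothing has been printed yet. */
        printf("ERROR\n");
        return 1;
    }
    /* Successful parse: now it is safe to print (most recent first). */
    for (Result *r = results; r != NULL; r = r->next)
        printf("Result of expression on %d is (%g)\n", r->line_num, r->value);
    return 0;
}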

Flex and Bison code - syntax error always

First of all, I need to say that I am very new to Flex and Bison and I am a bit confused. There is a school project that wants us to create a compiler, using Flex and Bison, for some kind of CLIPS language.
My code has a lot of problems, but the main one is that whatever I type I get a syntax error, while the result should be something else. The ideal scenario would be for it to fully work for the CLIPS language. E.g. when I write "4" I get a syntax error. Reading my code will maybe help you understand this better. If I write "test 3 4" it doesn't show a syntax error, but it counts it as an unknown token, and that's wrong again. I'm completely lost. The code is a prototype from the school and we need to make some changes to it. If you have any questions, don't hesitate to ask. Thank you!
P.S.: don't mind the comments.
FLEX CODE:
%option noyywrap
/* C code defining the required header files and variables.
Anything between %{ and %} is copied verbatim into the C file
that Flex will generate. */
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Header file containing the list of all tokens */
#include "token.h"
/* Definition of the current-line counter */
int line = 1;
%}
/* Names and their corresponding definitions (as regular expressions).
After this, the names (on the left) can be used in place of the
usually very long and hard-to-read regular expressions */
/* creation of the regular expressions according to the language definitions */
DELIMITER [ \t]+
INTCONST [+-]*[1-9][0-9]*
VARIABLE [?][A-Za-z0-9]*
DEFINITIONS [a-zA-Z][-|_|A-Z|a-z|0-9]*
COMMENTS ^;.*$
/* For every pattern (on the left) that matches, the corresponding
code inside the braces is executed. The return statement allows a
numeric value to be returned through the yylex() function */
/* if it sees a delimiter or a comment it ignores it; if it sees an integer, a variable or a definition it displays it. In any other case it prints that it does not recognize the token, the line it is on, and the string that was given */
%%
{DELIMITER} {;}
"bind" { return BIND;}
"test" { return TEST;}
"read" { return READ;}
"printout" { return PRINTOUT;}
"deffacts" { return DEFFACTS;}
"defrule" { return DEFRULE;}
"->" { return '->';}
"=" { return '=';}
"+" { return '+';}
"-" { return '-';}
"*" { return '*';}
"/" { return '/';}
"(" { return '(';}
")" { return ')';}
{INTCONST} { return INTCONST; }
{VARIABLE} { return VARIABLE; }
{DEFINITIONS} { return DEFINITIONS; }
{COMMENTS} {;}
\n { line++; printf("\n"); }
.+ { printf("\tLine=%d, UNKNOWN TOKEN, value=\"%s\"\n",line, yytext);}
<<EOF>> { printf("#END-OF-FILE#\n"); exit(0); }
%%
/* Table with all the tokens, corresponding to the definitions in token.h */
char *tname[11] = {"DELIMITER","INTCONST" , "VARIABLE", "DEFINITIONS", "COMMENTS", "BIND", "TEST", "READ", "PRINTOUT", "DEFFACTS", "DEFRULE"};
BISON CODE:
%{
/* C definitions and declarations. Anything to do with defining or initializing
variables & functions, header files and #define declarations goes here */
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char *);
%}
/* Definition of the recognized lexical tokens. */
%token INTCONST VARIABLE DEFINITIONS PLUS NEWLINE MINUS MULT DIV COM BIND TEST READ PRINTOUT DEFFACTS DEFRULE
%%
/* Definition of the grammar rules. Every time a grammar rule is matched
against the input, the C code between the braces is executed. The
expected syntax is:
name : rule { C code } */
program:
program expr NEWLINE { printf("%d\n", $2); }
|
;
expr:
INTCONST { $$ = $1; }
| VARIABLE { $$ = $1; }//adding the variable
| PLUS expr expr { $$ = $2 + $3; }//adding addition as an operation
| MINUS expr expr { $$ = $2 - $3; }//adding subtraction as an operation
| MULT expr expr { $$ = $2 * $3; }//adding multiplication as an operation
| DIV expr expr { $$ = $2 / $3; }//adding division as an operation
| COM { $$ = $1; }//adding the comments
| DEFFACTS expr { $$ = $2; }//adding the facts
| DEFRULE expr { $$ = $2; }//adding the rules
| BIND expr expr { $$ = $2; }//adding bind
| TEST expr expr { $$ = $2; }//adding test
| READ expr expr { $$ = $2; }//adding read
| PRINTOUT expr expr { $$ = $2; }//adding printout
;
%%
/* The yyerror function is used for reporting errors. Specifically, it is called
by yyparse whenever a syntax error occurs. In the case below the function
essentially just prints an error message to the screen. */
void yyerror(char *s) {
fprintf(stderr, "Error: %s\n", s);
}
/* The main function, which is the program's entry point.
In this case it simply calls Bison's yyparse function
to start the syntax analysis. */
int main(void) {
yyparse();
return 0;
}
TOKEN FILE:
#define DELIMITER 1
#define INTCONST 2
#define VARIABLE 3
#define DEFINITIONS 4
#define COMMENTS 5
#define BIND 6
#define TEST 7
#define READ 8
#define PRINTOUT 9
#define DEFFACTS 10
#define DEFRULE 11
MAKEFILE:
all:
bison -d simple-bison-code.y
flex mini-clips-la.l
gcc simple-bison-code.tab.c lex.yy.c -o B2
./B2
clean:
rm simple-bison-code.tab.c simple-bison-code.tab.h lex.yy.c B2
Your top-level rule is:
program:
program expr NEWLINE
which cannot succeed unless the parser sees a NEWLINE token. But it will never see one, because your lexical scanner never sends one; when it sees a newline, it increments the line count but doesn't return anything.
All your tokens are considered invalid because your lexical scanner uses its own definitions of the token values. You shouldn't do that. The parser generator (bison/yacc) will generate a header file containing the correct definitions; that is, the values it is expecting to see.
There are various other problems, probably more than I noticed. The most important is that you should not call exit(0) in the <<EOF>> rule, since that will mean that the parser can never succeed; it does not succeed until it is passed an EOF token. In fact, you should not normally have an <<EOF>> rule; the default action is to return 0 and that is pretty well the only action which makes sense.
Also, '->' is not a correct C literal. The compiler would have complained about it if you had enabled compiler warnings (-Wall), which you should always do, even if you are compiling generated code.
And your scanner's last pattern, intended to trigger on bad tokens, is .+, which will match the entire line, not just the erroneous character. Since (f)lex scanners accept the pattern with the longest match, most of your other patterns will never match. (Flex usually warns you about unmatchable patterns. Didn't you get such a warning?)
The fallback pattern should be .|\n, although you can use . if you are absolutely sure that every newline will be matched by some rule. I like to use %option nodefault, which will cause flex to warn me if there is some possible input not matched by any rule.
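To make that concrete, a few of the scanner rules might be rewritten along these lines (only a sketch with a handful of rules; it assumes the header produced by bison -d in the Makefile above, and ARROW is a token name that would have to be declared with %token in the grammar):
%option noyywrap nodefault
%{
#include <stdio.h>
#include "simple-bison-code.tab.h"   /* token numbers generated by bison -d */
int line = 1;
%}
%%
[ \t]+            ;
"deffacts"        { return DEFFACTS; }
"defrule"         { return DEFRULE; }
"->"              { return ARROW; }   /* a named token instead of the '->' literal */
"+"               { return PLUS; }
[+-]?[1-9][0-9]*  { return INTCONST; }
\n                { line++; return NEWLINE; }
.                 { printf("\tLine=%d, UNKNOWN TOKEN, value=\"%s\"\n", line, yytext); }
%%
Note that there is no <<EOF>> rule, so the scanner simply returns 0 at end of input and yyparse can finish normally.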

Swap between stdin and file in Bison

I have the following code in Bison, which extends the mfcalc proposed in the guide, implementing some functions like yylex() externally with FLEX.
To understand my problem, the key rules are in the non-terminal called line at the beginning of the grammar. Concretely, they are EVAL CLOSED_STRING '\n' and END (a token sent by FLEX when EOF is detected). The first opens a file and points the input to that file. The second closes the file and points the input back to stdin.
I'm trying to make a rule eval "file_path" that loads tokens from a file and evaluates them. Initially I have yyin = stdin (I use the function setStandardInput() to do this).
When a user enters eval "file_path" the parser swaps yyin from stdin to the file pointer (with the function setFileInput()) and the tokens are read correctly.
When the END rule is reached by the parser, it tries to restore the stdin input, but it misbehaves: the calculator doesn't end, yet whatever I type on the input isn't evaluated.
Note: I'm assuming there are no errors in the grammar, because error recovery isn't complete. In the file at file_path you can use simple arithmetic operations.
In summary, I want to swap between stdin and file pointers as inputs, but swapping back to stdin misbehaves, unless I started the calculator with stdin as the default.
%{
/* Library includes */
#include <stdio.h>
#include <math.h>
#include "utils/fileutils.h"
#include "lex.yy.h"
#include "utils/errors.h"
#include "utils/stringutils.h"
#include "table.h"
void setStandardInput();
void setFileInput(char * filePath);
/* External functions and variables from flex */
extern size_t yyleng;
extern FILE * yyin;
extern int parsing_line;
extern char * yytext;
//extern int yyerror(char *s);
extern int yyparse();
extern int yylex();
int yyerror(char * s);
%}
/***** TOKEN DEFINITION *****/
%union{
char * text;
double value;
}
%type <value> exp asig
%token LS
%token EVAL
%token <text> ID
%token <text> VAR
%token <value> FUNCTION
%token <value> LEXEME
%token <value> RESERVED_WORD
%token <value> NUMBER
%token <value> INTEGER
%token <value> FLOAT
%token <value> BINARY
%token <value> SCIENTIFIC_NOTATION
%token <text> CLOSED_STRING
%token DOCUMENTATION
%token COMMENT
%token POW
%token UNRECOGNIZED_CHAR
%token MALFORMED_STRING_ERROR
%token STRING_NOT_CLOSED_ERROR
%token COMMENT_ERROR
%token DOCUMENTATION_ERROR
%token END
%right '='
%left '+' '-'
%left '/' '*'
%left NEG_MINUS
%right '^'
%right '('
%%
input: /* empty_expression */ |
input line
;
line: '\n'
| asig '\n' { printf("\t%f\n", $1); }
| asig END { printf("\t%f\n", $1); }
| LS { print_table(); }
| EVAL CLOSED_STRING '\n' {
// Getting the file path
char * filePath = deleteStringSorroundingQuotes($2);
setFileInput(filePath);
}
| END { closeFile(yyin); setStandardInput();}
;
exp: NUMBER { $$ = $1; }
| VAR {
lex * result = table_search($1, LEXEME);
if(result != NULL) $$ = result->value;
}
| VAR '(' exp ')' {
lex * result = table_search($1, FUNCTION);
// If the result is a function, then invokes it
if(result != NULL) $$ = (*(result->function))($3);
else yyerror("That identifier is not a function.");
}
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp {
if($3 != 0){ $$ = $1 / $3;};
yyerror("You can't divide a number by zero");
}
| '-' exp %prec NEG_MINUS { $$ = -$2; }
| exp '^' exp { $$ = pow($1, $3); }
| '(' exp ')' { $$ = $2; }
| '(' error ')' {
yyerror("An error has ocurred between the parenthesis."); yyerrok; yyclearin;
}
;
asig: exp { $$ = $1; }
| VAR '=' asig {
int type = insertLexeme($1, $3);
if(type == RESERVED_WORD){
yyerror("You tried to assign a value to a reserved word.");
YYERROR;
}else if(type == FUNCTION){
yyerror("You tried to assign a value to a function.");
YYERROR;
}
$$ = $3;
}
;
%%
void setStandardInput(){
printf("Starting standard input:\n");
yyin = NULL;
yyin = stdin;
yyparse();
}
void setFileInput(char * filePath){
FILE * inputFile = openFile(filePath);
if(inputFile == NULL){
printf("The file couldn't be loaded. Redirecting to standard input: \n");
setStandardInput();
}else{
yyin = inputFile;
}
}
int main(int argc, char ** argv) {
create_table(); // Table instantiation and initzialization
initTable(); // Symbol table initzialization
setStandardInput(); // yyin = stdin
while(yyparse()!=1);
print_table();
// Table memory liberation
destroyTable();
return 0;
}
int yyerror(char * s){
printf("---------- Error in line %d --> %s ----------------\n", parsing_line, s);
return 0;
}
It's not too difficult to create a parser and a scanner which can be called recursively. (See below for an example.) But neither the default bison-generated parser nor the flex-generated scanner is designed to be reentrant. So with the default parser/scanner, you shouldn't call yyparse() inside setStandardInput, because that function is itself called by yyparse.
If you had a recursive parser and scanner, on the other hand, you could significantly simplify your logic. You could get rid of the END token (which is, in any case, practically never a good idea) and just recursively call yyparse in your action for EVAL CLOSED_STRING '\n'.
If you want to use the default parser and scanner, then your best solution is to use Flex's buffer stack to push and later pop a "buffer" corresponding to the file to be evaluated. (The word "buffer" here is a bit confusing, I think. A Flex "buffer" is actually an input source, such as a file; it's called a buffer because only a part of it is in memory, but Flex will read the entire input source as part of processing a "buffer".)
You can read about the buffer stack usage in the flex manual, which includes sample code. Note that in the sample code, the end of file condition is entirely handled inside the scanner, which is usual for this architecture.
It is possible in this case to manufacture an end-of-file indicator (although you cannot use END because that is used to indicate the end of all input). That has the advantage of ensuring that the contents of the evaluated file are parsed as a whole, without leaking a partial parse back to the including file, but you will still want to pop the buffer stack inside the scanner, because it is annoyingly tricky to get end-of-file handling correct without violating any of the API constraints (one of which is that you cannot reliably read EOF twice on the same "buffer").
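For reference, the buffer-stack approach looks roughly like this in the scanner, following the pattern in the flex manual (a sketch only, using the default non-reentrant scanner as in the question; pushEvalFile is an illustrative name, and closing the popped FILE* is left out for brevity):
%%
 /* ... the ordinary rules ... */
<<EOF>>   {
              yypop_buffer_state();      /* return to the previous input */
              if (!YY_CURRENT_BUFFER)
                  yyterminate();         /* no more buffers: real end of input */
          }
%%
/* Called from the EVAL action to start reading tokens from a file. */
void pushEvalFile(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) {
        fprintf(stderr, "Could not open '%s'\n", path);
        return;
    }
    yypush_buffer_state(yy_create_buffer(f, YY_BUF_SIZE));
}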
In this case, I would recommend generating a reentrant parser and scanner and simply doing a recursive call. It's a clean and simple solution, and avoiding global variables is always good.
A simple example. The simple language below only has echo and eval statements, both of which require a quoted string argument.
There are a variety of ways to hook together a reentrant scanner and reentrant parser. All of them have some quirks and the documentation (although definitely worth reading) has some holes. This is a solution which I've found useful. Note that most of the externally visible functions are defined in the scanner file, because they rely on interfaces defined in that file for manipulating the reentrant scanner context object. You can get flex to export a header with the appropriate definitions, but I've generally found it simpler to write my own wrapper functions and export those. (I don't usually export yyscan_t either; normally I create a context object of my own which has a yyscan_t member.)
There is an annoying circularity which is largely the result of bison not allowing for the possibility to introduce user code at the top of yyparse. Consequently, it is necessary to pass the yyscan_t used to call the lexer as an argument to yyparse, which means that it is necessary to declare yyscan_t in the bison file. yyscan_t is actually declared in the scanner generated file (or the flex-generated header, if you've asked for one), but you can't include the flex-generated header in the bison-generated header because the flex-generated header requires YYSTYPE which is declared in the bison-generated header.
I normally avoid this circularity by using a push parser, but that's pushing the boundaries for this question, so I just resorted to the usual work-around, which is to insert
typedef void* yyscan_t;
in the bison file. (That's the actual definition of yyscan_t, whose actual contents are supposed to be opaque.)
I hope the rest of the example is self-evident, but please feel free to ask for clarification if there is anything which you don't understand.
file recur.l
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "recur.tab.h"
%}
%option reentrant bison-bridge
%option noinput nounput nodefault noyywrap
%option yylineno
%%
"echo" { return T_ECHO; }
"eval" { return T_EVAL; }
[[:alpha:]][[:alnum:]]* {
yylval->text = strdup(yytext);
return ID;
}
["] { yyerror(yyscanner, "Unterminated string constant"); }
["][^"\n]*["] {
yylval->text = malloc(yyleng - 1);
memcpy(yylval->text, yytext + 1, yyleng - 2);
yylval->text[yyleng - 2] = '\0';
return STRING;
}
"." { return yytext[0]; }
[[:digit:]]*("."[[:digit:]]*)? {
yylval->number = strtod(yytext, NULL);
return NUMBER;
}
[ \t]+ ;
.|\n { return yytext[0]; }
%%
/* Use "-" or NULL to parse stdin */
int parseFile(const char* path) {
FILE* in = stdin;
if (path && strcmp(path, "-") != 0) {
in = fopen(path, "r");
if (!in) {
fprintf(stderr, "Could not open file '%s'\n", path);
return 1;
}
}
yyscan_t scanner;
yylex_init (&scanner);
yyset_in(in, scanner);
int rv = yyparse(scanner);
yylex_destroy(scanner);
if (in != stdin) fclose(in);
return rv;
}
void yyerror(yyscan_t yyscanner, const char* msg) {
fprintf(stderr, "At line %d: %s\n", yyget_lineno(yyscanner), msg);
}
file recur.y
%code {
#include <stdio.h>
}
%define api.pure full
%param { yyscan_t scanner }
%union {
char* text;
double number;
}
%code requires {
typedef void* yyscan_t;
int parseFile(const char* path);
}
%code provides {
int yylex(YYSTYPE* yylvalp, yyscan_t scanner);
void yyerror(yyscan_t scanner, const char* msg);
}
%token T_ECHO "echo" T_EVAL "eval"
%token <text> STRING ID
%token <number> NUMBER
%%
program: %empty | program command '\n'
command: echo | eval | %empty
echo: "echo" STRING { printf("%s\n", $2); }
eval: "eval" STRING { FILE* f = fopen($2, "r");
if (f) {
parseFILE(f);
close(f);
}
else {
fprintf(stderr, "Could not open file '%s'\n",
$2);
YYABORT;
}
}
%%

When using multiple buffers with Flex, how do I avoid having tokens get split between buffers

Let's assume I have a simple grammar of positive integers and alphabetic strings separated by commas. I want to parse this grammar using Flex and Bison, and I want to use multiple input buffers with Flex, for whatever reasons (maybe data is arriving over the network or a serial line or whatever). The problem I'm seeing is that when a string or an integer (which are both variable length tokens) are split between the end of one buffer and the start of the next, the lexer reports two tokens when there should only be one.
In the example below, the chunks are "10,", "asdf" and "g,". If that were all in one buffer it would yield the tokens INT(10) COMMA STR(asdfg) COMMA. But with the "g" in a different buffer from the "asdf", the lexer actually produces INT(10) COMMA STR(asdf) STR(g) COMMA. It appears the logic upon reaching the end of the buffer is (1) check whether the input matches a token, (2) refill the buffer. I feel it should be the other way around: (2) refill the buffer, (1) check whether the input matches a token.
I want to make sure I'm not doing something silly with the way I'm changing the buffers.
stdout/stderr:
read_more_input: Setting up buffer containing: 10,
--accepting rule at line 48 ("10")
Starting parse
Entering state 0
Reading a token: Next token is token INT_TERM ()
Shifting token INT_TERM ()
Entering state 1
Return for a new token:
--accepting rule at line 50 (",")
Reading a token: Next token is token COMMA ()
Shifting token COMMA ()
Entering state 4
Reducing stack by rule 2 (line 67):
$1 = token INT_TERM ()
$2 = token COMMA ()
-> $$ = nterm int_non_term ()
Stack now 0
Entering state 3
Return for a new token:
--(end of buffer or a NUL)
--EOF (start condition 0)
read_more_input: Setting up buffer containing: asdf
--(end of buffer or a NUL)
--accepting rule at line 49 ("asdf")
Reading a token: Next token is token STR_TERM ()
Shifting token STR_TERM ()
Entering state 6
Return for a new token:
--(end of buffer or a NUL)
--EOF (start condition 0)
read_more_input: Setting up buffer containing: g,
--accepting rule at line 49 ("g")
Reading a token: Next token is token STR_TERM ()
syntax errorError: popping token STR_TERM ()
Stack now 0 3
Error: popping nterm int_non_term ()
Stack now 0
Cleanup: discarding lookahead token STR_TERM ()
Stack now 0
Lex file:
%{
#include <stdbool.h>
#include "yacc.h"
bool read_more_input(yyscan_t scanner);
%}
%option reentrant bison-bridge
%%
[0-9]+ { yylval->int_value = atoi(yytext); return INT_TERM; }
[a-zA-Z]+ { yylval->str_value = strdup(yytext); return STR_TERM; }
, { return COMMA; }
<<EOF>> {
if (!read_more_input(yyscanner)) {
yyterminate();
}
}
Yacc file:
%{
// This appears to be a bug. This typedef breaks a dependency cycle between the headers.
// See https://stackoverflow.com/questions/44103798/cyclic-dependency-in-reentrant-flex-bison-headers-with-union-yystype
typedef void * yyscan_t;
#include <stdbool.h>
#include "yacc.h"
#include "lex.h"
%}
%define api.pure full
%lex-param {yyscan_t scanner}
%parse-param {yyscan_t scanner}
%define api.push-pull push
%union {
int int_value;
char * str_value;
}
%token <int_value> INT_TERM
%type <int_value> int_non_term
%token <str_value> STR_TERM
%type <str_value> str_non_term
%token COMMA
%%
complete : int_non_term str_non_term { printf(" === %d === %s === \n", $1, $2); }
int_non_term : INT_TERM COMMA { $$ = $1; }
str_non_term : STR_TERM COMMA { $$ = $1; }
%%
char * packets[]= {"10,", "asdf", "g,"};
int current_packet = 0;
bool read_more_input(yyscan_t scanner) {
if (current_packet >= 3) {
fprintf(stderr, "read_more_input: No more input\n");
return false;
}
fprintf(stderr, "read_more_input: Setting up buffer containing: %s\n", packets[current_packet]);
size_t buffer_size = strlen(packets[current_packet]) + 2;
char * buffer = (char *) calloc(buffer_size, sizeof(char));
memcpy(buffer, packets[current_packet], buffer_size - 2);
yy_scan_buffer(buffer, buffer_size, scanner);
current_packet++;
return true;
}
int main(int argc, char** argv) {
yyscan_t scanner;
yylex_init(&scanner) ;
read_more_input(scanner);
yyset_debug(1, scanner);
yydebug = 1;
int status;
yypstate *ps = yypstate_new ();
YYSTYPE pushed_value;
do {
status = yypush_parse(ps, yylex(&pushed_value, scanner), &pushed_value, scanner);
} while(status == YYPUSH_MORE);
yypstate_delete (ps);
yylex_destroy (scanner) ;
return 0;
}
That is not the expected use case for multiple buffers. Multiple input buffers are generally used to handle things like #include or even macro expansion, where the included text definitely should respect token boundaries. (Consider an #included file which has an unterminated comment...)
If you want to paste together inputs from different sources in a way which allows tokens to flow across buffer boundaries, redefine the YY_INPUT macro to meet your needs.
YY_INPUT is the macro hook for customising input; it is given a buffer and a maximum length, and it must copy at most that many bytes into the buffer and indicate how many bytes were provided (0 bytes is taken as meaning end of input, at which point yywrap will be called).
YY_INPUT is expanded inside yylex so it has access to the yylex arguments, which includes the lexer state. yywrap in a reentrant lexer is called with the scanner state as an argument. So you can use the two mechanisms together, if you want to.
Unfortunately, this does not allow "zero-copy" buffer switching. But flex is not optimised for in-memory input buffers in general: you can provide flex with a buffer using yy_scan_buffer but the buffer must be terminated with two NUL bytes, and it will be modified during the scan, so the feature is rarely useful.
Here's a trivial example, which lets you set yylex up with a NULL-terminated argv-like array of strings, and proceeds to lex them all as a single input. (If you choose to use argv+1 to initialize this array, you'll notice that it runs tokens together from consecutive arguments.)
%{
#include <string.h>
#include <parser.tab.h>
#define YY_EXTRA_TYPE char**
/* FIXME:
* This assumes that none of the string segments are empty
* strings (or for the feature-not-a-bug interpretation,
* it allows the list to be terminated by NULL or an empty string).
*/
#define YY_INPUT(buf,result,max_size) { \
char* segment = *yyextra; \
if (segment == NULL) result = 0; \
else { \
size_t avail = strnlen(segment, max_size); \
memcpy(buf, segment, avail); \
if (segment[avail]) *yyextra += avail; \
else ++yyextra; \
result = avail; \
} \
}
%}
%option reentrant bison-bridge
%option noinput nounput nodefault noyywrap
%%
[[:space:]]+ ;
[0-9]+ { yylval->number = strtol(yytext, 0, 10); return NUMBER; }
[[:alpha:]_][[:alnum:]_]* { yylval->string = strdup(yytext); return ID; }
. { return *yytext; }
%%
/* This function must be exported in some header */
void yylex_strings(char** argv, yyscan_t scanner) {
yyset_extra(argv, scanner);
}
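A possible driver for that scanner, just to show how the pieces connect (it dumps tokens rather than parsing; lex.yy.h is an assumption, i.e. a header generated with flex's --header-file option, and parser.tab.h comes from bison -d, matching the #include in the scanner above):
#include <stdio.h>
#include "parser.tab.h"   /* YYSTYPE and the token numbers, from bison -d */
#include "lex.yy.h"       /* flex --header-file=lex.yy.h (assumed) */

void yylex_strings(char** argv, yyscan_t scanner);   /* from the scanner above */

int main(int argc, char** argv) {
    yyscan_t scanner;
    yylex_init(&scanner);
    yylex_strings(argv + 1, scanner);   /* treat all the arguments as one input */

    YYSTYPE value;
    int token;
    while ((token = yylex(&value, scanner)) != 0)
        printf("token %d: '%s'\n", token, yyget_text(scanner));

    yylex_destroy(scanner);
    return 0;
}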

Flex and Bison Calculator

I'm trying to implement a calculator for nor expressions, such as true nor true nor (false nor false) using Flex and Bison, but I keep getting my error message back. Here is my .l file:
%{
#include <stdlib.h>
#include "y.tab.h"
%}
%%
("true"|"false") {return BOOLEAN;}
.|\n {yyerror();}
%%
int main(void)
{
yyparse();
return 0;
}
int yywrap(void)
{
return 0;
}
int yyerror(void)
{
printf("Error\n");
}
Here is my .y file:
/* Bison declarations. */
%token BOOLEAN
%left 'nor'
%% /* The grammar follows. */
input:
/* empty */
| input line
;
line:
'\n'
| exp '\n' { printf ("%s",$1); }
;
exp:
BOOLEAN { $$ = $1; }
| exp 'nor' exp { $$ = !($1 || $3); }
| '(' exp ')' { $$ = $2; }
;
%%
Does anyone see the problem?
The simple way to handle all the single-character tokens, which as @vitaut correctly says you aren't handling at all yet, is to return yytext[0] for the dot rule, and let the parser sort out which ones are legal.
You have also lost the values of the BOOLEANs 'true' and 'false', which should be stored into yylval as 1 and 0 respectively, which will then turn up in $1, $3 etc. If you're going to have more datatypes in the longer term, you need to look into the %union directive.
The reason why you get errors is that your lexer only recognizes one type of token, namely BOOLEAN, but not the newline, parentheses or nor (and you produce an error for everything else). For single letter tokens like parentheses and the newline you can return the character itself as a token type:
\n { return '\n'; }
For nor, though, you should introduce a token type like you did for BOOLEAN and add an appropriate rule to the lexer.
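Putting the two answers together, the lexer rules might end up looking something like this (a sketch; NOR is a token name you would declare in the grammar with %token NOR and %left NOR, replacing the 'nor' literal):
("true"|"false")   { yylval = (yytext[0] == 't'); return BOOLEAN; }
"nor"              { return NOR; }
[ \t]              ;
.|\n               { return yytext[0]; }   /* let the parser reject anything else */
The grammar rule then becomes exp NOR exp { $$ = !($1 || $3); }, and the parentheses and newline are handled by the parser through the '(' , ')' and '\n' character literals it already uses.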