I'm writing a parser for a language similar to C, and I'm having a problem with the expression rule.
I've seen in many posts how others declare expressions, but the same approach doesn't work for me: I get many conflicts. Can you help me, please?
.y file :
%{
#include <stdio.h>
#include <stdlib.h>
extern FILE *yyin;
extern int yylex();
int line=1;
int error=0;
#define YYERROR_VERBOSE
void yyerror(const char *msg)
{
error = 1;
printf("ERROR in line %d : %s.\n", line, msg);
}
%}
%expect 2
%start programme
%token SEMICOLON
%token COMMA
%token L_PARENTHESIS
%token R_PARENTHESIS
%token L_REC_BRACKET
%token R_REC_BRACKET
%token L_CURLY_BRACKET
%token R_CURLY_BRACKET
%token QUESTION_MARK
%token COLON
%token DOT
%token AND
%token STAR
%token PLUS
%token MINUS
%token EXCLAMATION_MARK
%token SLASH
%token PERCENT
%token LESS
%token GREATER
%token LESS_EQUAL
%token GREATER_EQUAL
%token EQUAL
%token NOT_EQUAL
%token LOGIC_AND
%token LOGIC_OR
%token UN_PLUS
%token UN_MINUS
%token EQ
%token MUL_EQUAL
%token DIV_EQUAL
%token MOD_EQUAL
%token PLUS_EQUAL
%token MINUS_EQUAL
%token TRUE
%token FALSE
%token NUL
%token NUMBER
%token T_WORD
%token WORD
%right L_PARENTHESIS SEMICOLON
%nonassoc STAR
%right SLASH PERCENT PLUS MINUS LESS GREATER
%right LESS_EQUAL GREATER_EQUAL EQUAL NOT_EQUAL LOGIC_AND LOGIC_OR
%right UN_PLUS UN_MINUS
%right AND EXCLAMATION_MARK
%right EQ MUL_EQUAL DIV_EQUAL MOD_EQUAL PLUS_EQUAL MINUS_EQUAL
%%
programme : expression;
expression : TRUE
| FALSE
| NUL
| id L_PARENTHESIS list_exp R_PARENTHESIS
| expression L_REC_BRACKET expression R_REC_BRACKET
| unitary_op expression
| expression binary_op exp
| unitary_op_assig expression
| expression unitary_op_assig
| expression binary_op_assig expression
| expression QUESTION_MARK expression COLON expression
| exp
| DELETE expression
;
exp : T_WORD
| id
| L_PARENTHESIS expression R_PARENTHESIS
| double_const;
id: WORD;
list_exp : list expression;
list: | list_exp COMMA;
int_const : NUMBER int_const
| NUMBER;
double_const : int_const
| int_const DOT int_const;
binary_op : STAR
| SLASH
| PERCENT
| PLUS
| MINUS
| LESS
| GREATER
| LESS_EQUAL
| GREATER_EQUAL
| EQUAL
| NOT_EQUAL
| LOGIC_AND
| LOGIC_OR;
unitary_op_assig : UN_PLUS
| UN_MINUS;
unitary_op : AND
| STAR
| PLUS
| MINUS
| EXCLAMATION_MARK;
binary_op_assig : EQ
| MUL_EQUAL
| DIV_EQUAL
| MOD_EQUAL
| PLUS_EQUAL
| MINUS_EQUAL;
%%
int main(int argc, char *argv[])
{
if(argc == 2) yyin = fopen(argv[1], "r");
else if(argc < 2){
printf("No file found.\n");
return 0;
} else{
printf("Only one file is permitted.\n");
return 0;
}
yyparse ();
if(error == 0) printf("Finished at %d line. No errors!\n",line);
return 0;
}
.l file
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "bison.tab.h"
extern int line;
%}
%%
"\n" {line++;}
";" {return SEMICOLON;}
"," {return COMMA;}
"(" {return L_PARENTHESIS;}
")" {return R_PARENTHESIS;}
"[" {return L_REC_BRACKET;}
"]" {return R_REC_BRACKET;}
"{" {return L_CURLY_BRACKET;}
"}" {return R_CURLY_BRACKET;}
"?" {return QUESTION_MARK;}
":" {return COLON;}
"." {return DOT;}
"&" {return AND;}
"*" {return STAR;}
"+" {return PLUS;}
"-" {return MINUS;}
"!" {return EXCLAMATION_MARK;}
"/" {return SLASH;}
"%" {return PERCENT;}
"<" {return LESS;}
">" {return GREATER;}
"<=" {return LESS_EQUAL;}
">=" {return GREATER_EQUAL;}
"==" {return EQUAL;}
"!=" {return NOT_EQUAL;}
"&&" {return LOGIC_AND;}
"||" {return LOGIC_OR;}
"++" {return UN_PLUS;}
"--" {return UN_MINUS;}
"=" {return EQ;}
"*=" {return MUL_EQUAL;}
"/=" {return DIV_EQUAL;}
"%=" {return MOD_EQUAL;}
"+=" {return PLUS_EQUAL;}
"-=" {return MINUS_EQUAL;}
"true" {return TRUE;}
"false" {return FALSE;}
"NULL" {return NUL;}
[0-9] {return NUMBER;}
[a-zA-Z][a-zA-Z0-9]* {return WORD;}
[a-zA-Z0-9]+ {return T_WORD;}
. {;}
%%
Running bison -d bison.y prints the message: conflicts: (a number) shift/reduce conflicts.
For example, bison -v bison.y reports:
state 71
54 expression: expression . L_REC_BRACKET expression R_REC_BRACKET
58 | expression . unitary_op_assig
59 | expression . binary_op_assig expression
61 | expression . QUESTION_MARK expression COLON expression
64 | DELETE expression .
L_REC_BRACKET shift, and go to state 75
QUESTION_MARK shift, and go to state 76
UN_PLUS shift, and go to state 43
UN_MINUS shift, and go to state 44
EQ shift, and go to state 77
MUL_EQUAL shift, and go to state 78
DIV_EQUAL shift, and go to state 79
MOD_EQUAL shift, and go to state 80
PLUS_EQUAL shift, and go to state 81
MINUS_EQUAL shift, and go to state 82
L_REC_BRACKET [reduce using rule 64 (expression)]
QUESTION_MARK [reduce using rule 64 (expression)]
UN_PLUS [reduce using rule 64 (expression)]
UN_MINUS [reduce using rule 64 (expression)]
EQ [reduce using rule 64 (expression)]
MUL_EQUAL [reduce using rule 64 (expression)]
DIV_EQUAL [reduce using rule 64 (expression)]
MOD_EQUAL [reduce using rule 64 (expression)]
PLUS_EQUAL [reduce using rule 64 (expression)]
MINUS_EQUAL [reduce using rule 64 (expression)]
$default reduce using rule 64 (expression)
unitary_op_assig go to state 83
binary_op_assig go to state 84
I am learning bison/flex. I have successfully parsed a simple C source file with bison/flex.
Now I am wondering how to parse a header file that the test C code includes. Can bison/flex do that?
To put it more simply, I am attaching sample code to illustrate my question.
Here is the test file, which also includes a header file (.h).
test.c, which includes the header file header.h:
#include <stdio.h>
#include "header.h"
int main (int c, int b)
{
bigNumber a; /* When the parser comes across
"bigNumber", I want it to know the datatype of
"bigNumber" and print its
type as defined in "header.h" */
while ( 1 )
{
newData d; /* Same should happen to "newData" also */
}
}
header.h
#define newData int
#define bigNumber double
lexer.l
%{
#include <stdio.h>
#include <string.h>
#include "c.tab.h"
%}
alpha [a-zA-Z]
digit [0-9]
%%
[ \t] { ; }
[ \n] { yylineno = yylineno + 1;}
int { return INT; }
float { return FLOAT; }
char { return CHAR; }
void { return VOID; }
double { return DOUBLE; }
for { return FOR; }
while { return WHILE; }
if { return IF; }
else { return ELSE; }
printf { return PRINTF; }
struct { return STRUCT; }
^"#include ".+ { ; }
{digit}+ { return NUM; }
{alpha}({alpha}|{digit})* { return ID; }
"<=" { return LE; }
">=" { return GE; }
"==" { return EQ; }
"!=" { return NE; }
">" { return GT; }
"<" { return LT; }
"." { return DOT; }
\/\/.* { ; }
\/\*(.*\n)*.*\*\/ { ; }
. { return yytext[0]; }
%%
bison file (c.y)
%{
#include <stdio.h>
#include <stdlib.h>
#include"lex.yy.c"
#include<ctype.h>
int count=0;
extern FILE *fp;
%}
%token INT FLOAT CHAR DOUBLE VOID
%token FOR WHILE
%token IF ELSE PRINTF
%token STRUCT
%token NUM ID
%token INCLUDE
%token DOT
%right '='
%left AND OR
%left '<' '>' LE GE EQ NE LT GT
%%
start
: Function
| Declaration
;
/* Declaration block */
Declaration
: Type Assignment ';'
| Assignment ';'
| FunctionCall ';'
| ArrayUsage ';'
| Type ArrayUsage ';'
| StructStmt ';'
| error
;
/* Assignment block */
Assignment
: ID '=' Assignment
| ID '=' FunctionCall
| ID '=' ArrayUsage
| ArrayUsage '=' Assignment
| ID ',' Assignment
| NUM ',' Assignment
| ID '+' Assignment
| ID '-' Assignment
| ID '*' Assignment
| ID '/' Assignment
| NUM '+' Assignment
| NUM '-' Assignment
| NUM '*' Assignment
| NUM '/' Assignment
| '\'' Assignment '\''
| '(' Assignment ')'
| '-' '(' Assignment ')'
| '-' NUM
| '-' ID
| NUM
| ID
;
/* Function Call Block */
FunctionCall
: ID'('')'
| ID'('Assignment')'
;
/* Array Usage */
ArrayUsage
: ID'['Assignment']'
;
/* Function block */
Function
: Type ID '(' ArgListOpt ')' CompoundStmt
;
ArgListOpt
: ArgList
|
;
ArgList
: ArgList ',' Arg
| Arg
;
Arg
: Type ID
;
CompoundStmt
: '{' StmtList '}'
;
StmtList
: StmtList Stmt
|
;
Stmt
: WhileStmt
| Declaration
| ForStmt
| IfStmt
| PrintFunc
| ';'
;
/* Type Identifier block */
Type
: INT
| FLOAT
| CHAR
| DOUBLE
| VOID
;
/* Loop Blocks */
WhileStmt
: WHILE '(' Expr ')' Stmt
| WHILE '(' Expr ')' CompoundStmt
;
/* For Block */
ForStmt
: FOR '(' Expr ';' Expr ';' Expr ')' Stmt
| FOR '(' Expr ';' Expr ';' Expr ')' CompoundStmt
| FOR '(' Expr ')' Stmt
| FOR '(' Expr ')' CompoundStmt
;
/* IfStmt Block */
IfStmt
: IF '(' Expr ')' Stmt
;
/* Struct Statement */
StructStmt
: STRUCT ID '{' Type Assignment '}'
;
/* Print Function */
PrintFunc
: PRINTF '(' Expr ')' ';'
;
/*Expression Block*/
Expr
:
| Expr LE Expr
| Expr GE Expr
| Expr NE Expr
| Expr EQ Expr
| Expr GT Expr
| Expr LT Expr
| Assignment
| ArrayUsage
;
%%
int main(int argc, char *argv[])
{
yyin = fopen(argv[1], "r");
if(!yyparse())
printf("\nParsing complete\n");
else
printf("\nParsing failed\n");
fclose(yyin);
return 0;
}
yyerror(char *s) {
printf("%d : %s %s\n", yylineno, s, yytext );
}
int yywrap()
{
return 1;
}
What modifications should be made to the lexer (.l) and bison (.y) files so that, when the C file being parsed includes a header file, the parser goes to that header file, reads it, and returns to the original test C file? And if a custom-defined datatype is present, the parser should know its datatype from the header file and print it.
Is this possible?
What modifications do I have to make?
Thank you
Flex has a feature which makes it relatively easy to handle things like C's #include directive. It's described at length in the flex manual chapter on multiple input buffers, with code examples, and you should refer to that document for precise details. (I put some sample code at the end of this answer.)
In the flex manual, the scanner itself recognizes the #include directive and handles the inclusion transparently; the parser never sees the directive. That has a certain appeal; the parser only needs to parse a stream of tokens, and the lexical analyser takes full responsibility for producing the token stream, which includes reading from included files.
But as your header.h shows, handling #include directives is not all that is required. To actually parse C, you need to implement the C preprocessor, or at least as much of it as you care about. That includes being able to #define macros, and it also includes substituting the macro definition for any use of the macro in the program. That's a much more complicated process. And that's not all, because the preprocessor also allows conditional inclusion (#ifdef, #if, etc.). In order to implement the #if directive, you need to be able to parse and evaluate an arbitrary arithmetic expression (without variables but with macro substitution), which might best be done by calling a C expression parser.
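As a rough illustration of the bookkeeping that simple object-like macros such as the two in header.h would need, here is a minimal C sketch of a macro table. The names `define_macro`, `lookup_macro`, `MAX_MACROS` and `MAX_TEXT` are invented for this example, and it deliberately ignores function-like macros, #undef and rescanning of the replacement text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MACROS 64   /* arbitrary limit chosen for this sketch */
#define MAX_TEXT   64   /* longest name or replacement stored, including NUL */

struct macro {
    char name[MAX_TEXT];
    char replacement[MAX_TEXT];
};

static struct macro macro_table[MAX_MACROS];
static size_t macro_count = 0;

/* Return the replacement text for name, or NULL if name is not a macro. */
const char *lookup_macro(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < macro_count; i++)
        if (strcmp(macro_table[i].name, name) == 0)
            return macro_table[i].replacement;
    return NULL;
}

/* Record "#define name replacement". Returns 0 on success, -1 if the
 * table is full or either string is too long to store. */
int define_macro(const char *name, const char *replacement) {
    assert(name != NULL && replacement != NULL);
    if (strlen(name) >= MAX_TEXT || strlen(replacement) >= MAX_TEXT)
        return -1;
    /* A redefinition simply overwrites the earlier replacement. */
    for (size_t i = 0; i < macro_count; i++) {
        if (strcmp(macro_table[i].name, name) == 0) {
            strcpy(macro_table[i].replacement, replacement);
            return 0;
        }
    }
    if (macro_count == MAX_MACROS)
        return -1;
    strcpy(macro_table[macro_count].name, name);
    strcpy(macro_table[macro_count].replacement, replacement);
    macro_count++;
    return 0;
}
```

With something like this in place, the scanner's identifier rule could call `lookup_macro(yytext)` and, on a hit, scan the replacement text instead of returning an identifier token. That wiring is not shown here.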
There are various ways of structuring a solution which includes a preprocessor. One possibility is to implement the entire preprocessor in the lexical scanner, consistent with the sample code included in this answer. However, as can also be seen in that sample code, such an implementation will be quite irritating because it will basically involve writing a state machine for each preprocessor directive, something which could much more easily be accomplished with a parser generator.
So a second possibility is to embed the preprocessor in the C parser. But that will require a lot of communication from the parser to the lexer, complicated by the fact that the parse is not usually synchronised with the lexical analysis (because the parser often has read a lookahead token which has not yet been parsed when parser actions execute). For example, if the macro definition mapping is kept in the parser, then the parser will have to be able to push replacement tokens onto the input stream so that they are subsequently read before the lookahead token.
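The idea of pushing replacement tokens back onto the input can be sketched in C as a small stack that is consulted before the real scanner. The names `push_back_token`, `next_token`, `stub_scanner` and `MAX_PENDING` are hypothetical, and the sketch does not attempt the harder part described above, namely re-queuing the lookahead token that the parser already holds.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 128   /* arbitrary capacity for this sketch */

static int pending[MAX_PENDING];
static size_t pending_count = 0;

/* Queue a token so that it is delivered before anything the scanner
 * produces. Returns 0 on success, -1 if the stack is full. */
int push_back_token(int token) {
    if (pending_count == MAX_PENDING)
        return -1;
    pending[pending_count++] = token;
    return 0;
}

/* Stand-in for yylex in this sketch: always reports end of input. */
int stub_scanner(void) {
    return 0;
}

/* Deliver pushed-back tokens first (most recently pushed first) and
 * fall back to the real scanner only when none are waiting. */
int next_token(int (*scan)(void)) {
    assert(scan != NULL);
    if (pending_count > 0)
        return pending[--pending_count];
    return scan();
}
```

Because the storage is a stack, the tokens of a macro replacement would be pushed in reverse order so that they come out in source order.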
And yet another possibility is to put the preprocessor as a third component, between the lexer and the parser. Bison can produce "push-parsers", in which the parser is called with each successive token rather than calling yylex every time it needs a token. That's a much more convenient interface for integrating with a macro preprocessor. In this model, the preprocessor could be implemented as a separate bison-generated parser which reads tokens from the lexer in the normal way and feeds them one at a time to the C parser using the push API.
The full preprocessor implementation is not as complicated as a C compiler. But it's not something which can be summarised in a few paragraphs in a Stack Overflow answer.
So the best I can do here is provide a simple implementation (adapted from the flex manual) of the buffer state stack approach to the #include directive. (I assume that you are familiar with (f)lex "start conditions", which are used to build a simple state machine for parsing preprocessor directives. If not, see the previous chapter of the flex manual.)
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%x PP PP_INCLUDE PP_INCLUDE_SKIP
%%
/* If a line starts with a #, handle it as a preprocessor directive */
^[[:blank:]]*# { BEGIN(PP); }
/* The rest of your rules go here */
/* ... */
<PP>{
/* This state parses only the first word of the directive. */
[[:blank:]]+ ; /* Ignore whitespace */
"/*"[^*]*[*]+([^*/][^*]*[*]+)*"/" ; /* Also ignore comments */
include { BEGIN(PP_INCLUDE); } /* include directive */
/* Handle other preprocessor directives here */
\n { BEGIN(INITIAL); } /* null directive does nothing */
.+ { yyerror("Invalid preprocessor directive"); }
}
<PP_INCLUDE>{
/* This state parses and handles #include directives */
["][^"]+["] |
[<][^>]+[>] { yytext[yyleng - 1] = 0;
FILE* incl = find_include_file(yytext + 1);
if (incl) {
yyin = incl;
yypush_buffer_state(yy_create_buffer(yyin, YY_BUF_SIZE));
BEGIN(INITIAL);
}
else {
fprintf(stderr, "Could not find include file %s\n", yytext + 1);
/* You might want to stop the parse instead of continuing. */
BEGIN(PP_INCLUDE_SKIP);
}
}
}
<PP_INCLUDE_SKIP,PP_INCLUDE>{
/* PP_INCLUDE_SKIP is used to ignore the rest of the preprocessor directive,
* producing an error if anything is on the line other than a comment.
*/
[[:blank:]]+ ; /* Ignore whitespace */
"/*"[^*]*[*]+([^*/][^*]*[*]+)*"/" ; /* Also ignore comments */
.+|\n { yyerror("Invalid #include directive");
BEGIN(INITIAL);
}
}
<*><<EOF>> { yypop_buffer_state();
/* The buffer stack starts with a buffer reading from yyin.
* If the EOF was found in the initial input file, the stack will
* be empty after the pop, and YY_CURRENT_BUFFER will be NULL. In
* that case, the parse is finished and we return EOF to the caller.
* Otherwise, we need to skip the rest of the #include directive
* and continue producing tokens from where we left off.
*/
if (YY_CURRENT_BUFFER)
BEGIN(PP_INCLUDE_SKIP);
else
return 0;
}
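The scanner code above calls find_include_file without defining it. One possible shape for that helper, written in C, is sketched below. The search directories are an assumption made for illustration, and the sketch treats "file" and <file> identically, which a real C implementation would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Directories searched in order; this list is an assumption of the sketch. */
static const char *const include_dirs[] = { ".", "include" };

/* Try each directory in turn and return the first file that opens,
 * or NULL if the name cannot be found in any of them. */
FILE *find_include_file(const char *name) {
    assert(name != NULL);
    char path[1024];
    size_t dir_count = sizeof include_dirs / sizeof include_dirs[0];
    for (size_t i = 0; i < dir_count; i++) {
        int written = snprintf(path, sizeof path, "%s/%s", include_dirs[i], name);
        if (written < 0 || (size_t)written >= sizeof path)
            continue;   /* path does not fit in the buffer; skip this directory */
        FILE *file = fopen(path, "r");
        if (file != NULL)
            return file;
    }
    return NULL;
}
```

The caller owns the returned FILE pointer, so it would normally be closed once the corresponding buffer has been popped at end of file.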
I created a small interpreter using flex/bison.
It can only print numbers, but I want to know how I can add printing of strings.
lexer :
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}
%%
<INITIAL>[s|S][h|H][o|O][w|W] {return show;}
<INITIAL>[0-9a-zA-z]+ {yylval.num=atoi(yytext);return string;}
<INITIAL>[\-\+\=\;\*\/] {return yytext[0];}
%%
int yywrap (void) {return 1;}
yacc :
%{
void yyerror(char *s);
#include <stdio.h>
#include <stdlib.h>
%}
%union {int num;}
%start line
%token show
%token <num> number
%type <num> line exp term
%%
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
;
exp : term {$$ = $1;}
| exp '+' term {$$ = $1 + $3;}
| exp '-' term {$$ = $1 - $3;}
| exp '*' term {$$ = $1 * $3;}
| exp '/' term {$$ = $1 / $3;}
;
term : number {$$ = $1;}
%%
int main (void)
{
return yyparse();
}
void yyerror (char *s)
{
printf("-%s at %s !\n",s );
}
test data :
show 5;
show 5+5;
show 5*2-5+1;
I want to add string handling code to the lexer:
<INITIAL>\" {BEGIN(STRING);}
<STRING>\" {BEGIN(INITIAL);}
Now, how do I use the content matched in <STRING>?
Can you help me complete my interpreter?
I need to add these examples to my interpreter:
show "hello erfan";//hello erfan
show "hello ".5;//hello 5
Please help me.
At the moment your interpreter doesn't work on numbers either! (It's an interpreter because it generates the results directly and does not generate code like a compiler would.)
To make it work for numbers (again), you have to return a number token from the lexer, not a string. This line is wrong:
<INITIAL>[0-9a-zA-z]+ {yylval.num=atoi(yytext);return string;}
It should return a number token:
<INITIAL>[0-9]+ {yylval.num=atoi(yytext);return number;}
Now let's add a string. I see you made a start:
<INITIAL>\" {BEGIN(STRING);}
<STRING>\" {BEGIN(INITIAL);}
We need to add the state for a string to the lexer:
%x STRING
We should also match the contents of the string. I'll cheat a little here:
<STRING>[^"]*\" {BEGIN(INITIAL); return(string);}
We also need to return the string value in the lval. Cheating again, I can store a char pointer in the integer:
<STRING>[^"]*\" {BEGIN(INITIAL); yylval.num=strdup(yytext); return(string); }
Now we have to add strings to the yacc grammar. I'm cheating again by not allowing integers and strings to be mixed. You can expand that later if you wish:
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
| show string ';' {printf("showing : %s\n",$2);}
| line show string ';' {printf("showing : %s\n",$3);}
;
We need to remember to declare the string token:
%token <num> number string
Now we can put that all together:
The lexer file:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}
%x STRING
%%
<INITIAL>[s|S][h|H][o|O][w|W] {return show;}
<INITIAL>[0-9]+ {yylval.num=atoi(yytext);return number;}
<INITIAL>[\-\+\=\;\*\/] {return yytext[0];}
<INITIAL>\" {BEGIN(STRING);}
<STRING>[^"]*\" {BEGIN(INITIAL);yylval.num=strdup(yytext);return(string);}
%%
int yywrap (void) {return 1;}
The parser file:
%{
void yyerror(char *s);
#include <stdio.h>
#include <stdlib.h>
%}
%union {int num;}
%start line
%token show
%token <num> number string
%type <num> line exp term
%%
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
| show string ';' {printf("showing : %s\n",$2);}
| line show string ';' {printf("showing : %s\n",$3);}
;
exp : term {$$ = $1;}
| exp '+' term {$$ = $1 + $3;}
| exp '-' term {$$ = $1 - $3;}
| exp '*' term {$$ = $1 * $3;}
| exp '/' term {$$ = $1 / $3;}
;
term : number {$$ = $1;}
%%
int main (void)
{
return yyparse();
}
void yyerror (char *s)
{
printf("-%s !\n", s);
}
#include "lex.yy.c"
It's basic and it works (I tested it). I've left plenty of things to be polished. You can remove the quote character from the string text; you can make the string token have a string value rather than an integer, to avoid the horrible type mismatch; and you can make the show statements a bit more complex. But at least I've got you started.
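As one way of doing the first of those polishing steps, here is a small C sketch of a helper that copies the matched text without its closing quote. The name `copy_without_closing_quote` is invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first length - 1 characters of text, dropping the final
 * character (the closing quote). Returns NULL if allocation fails.
 * The caller owns the result and should free() it. */
char *copy_without_closing_quote(const char *text, size_t length) {
    assert(text != NULL && length > 0);
    char *copy = malloc(length);        /* length - 1 characters plus NUL */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, length - 1);
    copy[length - 1] = '\0';
    return copy;
}
```

In the scanner, the STRING rule could then call it with yytext and yyleng. Storing the result cleanly would need a pointer member in the %union, for example a hypothetical `char *str` alongside `num`, with the string token declared to use it; that change is an assumption of this sketch, not something the code above already does.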
I am experiencing an issue trying to use Flex and Bison together. When I reach the step of compiling with gcc (gcc -c y.tab.c lex.yy.c), I keep getting errors for the flex file saying
error: expected identifier or ‘(’ before string constant
Here is the code :
FLEX (filename is arxeioflex.l) :
%{
#include "y.tab.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%option noyywrap
id [a-zA-Z][a-zA-Z0-9]*
num [0-9]*
%%
%%
%%
"extern" {return EXTERN;}
"void" {return VOID;}
"(" {return LP;}
")" {return RP;}
"int" {return INT;}
"bool" {return BOOL;}
"string" {return STRING;}
";" {return SC;}
"," {return S;}
"&" {return DEC;}
"begin" {return BEGIN;}
"end" {return END;}
"{" {return LB;}
"}" {return RB;}
"if" {return IF;}
"else" {return ELSE;}
"=" {return ASIGN;}
"return" {return RETURN;}
"||" {return OR;}
"&&" {return AND;}
"!" {return NOT;}
"==" {return EQUAL;}
"!=" {return NEQUAL;}
"<" {return LESS;}
">" {return GREATER;}
"<=" {return LESSEQUAL;}
">=" {return GREATEREQUAL;}
"*" {return 'MUL';}
"/" {return DIV;}
"%" {return MOD;}
"+" {return 'PLUS';}
"-" {return MINUS;}
"true" {return TRUE;}
"false" {return FALSE;}
BISON :
%{
#include <stdio.h>
#include <math.h>
void yyerror(char *);
extern FILE *yyin;
extern FILE *yyout;
%}
%token EXTERN;
%token VOID;
%token LP RP;
%token ID;
%token INT BOOL STRING;
%token SC;
%token S; /*Seperator*/
%token DEC; /*&*/
%token BEGIN END;
%token LB RB;
%token IF ELSE;
%token ASSIGN; /*=*/
%token RETURN;
%token OR;
%token AND;
%token NOT;
%token EQUAL;
%token NEQUAL;
%token LESS;
%token GREATER;
%token LESSEQUAL;
%token GREATEREQUAL;
%token MUL;
%token DIV;
%token MOD;
%token PLUS;
%token MINUS;
%token TRUE;
%token FALSE;
%%
program: externs header definitions commands;
externs: extern_prot
|externs externs;
extern_prot: EXTERN function_prot;
header: VOID ID SC SC;
definitions: definition
| definition definitions;
definition: variable_def
| function_def
| function_prot;
variable_def: data_type var_list;
data_type: INT
| BOOL
| STRING;
var_list: ID
| ID S var_list;
function_def: f_header definitions commands;
function_prot: f_header SC;
f_header: f_type ID LP typical_par_list RP;
f_type: INT
| BOOL
| VOID;
typical_par_list: typical_par
| typical_par S typical_par_list;
typical_par: data_type DEC ID;
commands: BEGIN n_command END;
/*n_command = (entolh*) */
n_command: command
| command n_command;
command: plain_command SC
| struct_command
| complex_command;
complex_command: LB n_command RB;
struct_command: if_clause;
plain_command: assignment
| function_call
| return_command
| null_command;
if_clause: IF LP general_expr RP command else_clause;
else_clause: ELSE command;
assignment: ID ASSIGN general_expr;
function_call: ID LP real_par_list RP;
real_par_list: real_par
| real_par real_par_list;
real_par: general_expr;
return_command: RETURN general_expr;
null_command: ;
general_expr: general_term
| general_term OR general_expr;
general_term: general_factor
| general_factor AND general_term;
general_factor: NOT general_first_term;
general_first_term: plain_expr comparison_field;
comparison_field: comparison_effector plain_expr;
comparison_effector: EQUAL
| NEQUAL
| LESS
| GREATER
| LESSEQUAL
| GREATEREQUAL;
plain_expr: plain_term
| plain_term PLUS plain_expr
| plain_term MINUS plain_expr;
plain_term: plain_factor
| plain_factor MUL plain_term
| plain_factor DIV plain_term
| plain_factor MOD plain_term;
plain_factor: plain: PLUS plain_first_term
| MINUS plain_first_term;
plain_first_term: ID
| constant
| function_call
| LP general_expr RP;
constant: ID
| TRUE
| FALSE;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main ( int argc, char **argv )
{
++argv; --argc;
if ( argc > 0 )
yyin = fopen( argv[0], "r" );
else
yyin = stdin;
yyout = fopen ( "output", "w" );
yyparse ();
return 0;
}
ERRORS :
arxeioflex.l:13: error: expected identifier or ‘(’ before ‘%’ token
arxeioflex.l:15: error: expected identifier or ‘(’ before string constant
arxeioflex.l:16: error: expected identifier or ‘(’ before string constant
arxeioflex.l:17: error: expected identifier or ‘(’ before string constant
arxeioflex.l:18: error: expected identifier or ‘(’ before string constant
arxeioflex.l:19: error: expected identifier or ‘(’ before string constant
arxeioflex.l:20: error: expected identifier or ‘(’ before string constant
arxeioflex.l:21: error: expected identifier or ‘(’ before string constant
arxeioflex.l:22: error: expected identifier or ‘(’ before string constant
arxeioflex.l:23: error: expected identifier or ‘(’ before string constant
arxeioflex.l:24: error: expected identifier or ‘(’ before string constant
arxeioflex.l:25: error: expected identifier or ‘(’ before string constant
arxeioflex.l:26: error: expected identifier or ‘(’ before string constant
arxeioflex.l:27: error: expected identifier or ‘(’ before string constant
arxeioflex.l:28: error: expected identifier or ‘(’ before string constant
arxeioflex.l:29: error: expected identifier or ‘(’ before string constant
arxeioflex.l:30: error: expected identifier or ‘(’ before string constant
arxeioflex.l:31: error: expected identifier or ‘(’ before string constant
arxeioflex.l:32: error: expected identifier or ‘(’ before string constant
arxeioflex.l:33: error: expected identifier or ‘(’ before string constant
arxeioflex.l:34: error: expected identifier or ‘(’ before string constant
arxeioflex.l:35: error: expected identifier or ‘(’ before string constant
arxeioflex.l:36: error: expected identifier or ‘(’ before string constant
arxeioflex.l:37: error: expected identifier or ‘(’ before string constant
arxeioflex.l:38: error: expected identifier or ‘(’ before string constant
arxeioflex.l:39: error: expected identifier or ‘(’ before string constant
arxeioflex.l:40: error: expected identifier or ‘(’ before string constant
arxeioflex.l:41: error: expected identifier or ‘(’ before string constant
arxeioflex.l:42: error: expected identifier or ‘(’ before string constant
arxeioflex.l:43: error: expected identifier or ‘(’ before string constant
arxeioflex.l:44: error: expected identifier or ‘(’ before string constant
arxeioflex.l:45: error: expected identifier or ‘(’ before string constant
arxeioflex.l:46: error: expected identifier or ‘(’ before string constant
arxeioflex.l:47: error: expected identifier or ‘(’ before string constant
arxeioflex.l:49: error: expected identifier or ‘(’ before ‘%’ token
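arxeioflex.l has too many sections. A flex file has at most three sections (definitions, rules, user code), separated by two %% lines. With the extra %% lines, your rules land in the user code section, which flex copies verbatim into lex.yy.c, so the C compiler tries to read them as C. Replace: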
num [0-9]*
%%
%%
%%
"extern" {return EXTERN;}
by
num [0-9]*
%%
"extern" {return EXTERN;}
I am learning lex and yacc programming, and this yacc program to validate and evaluate arithmetic expressions is giving me 25 shift/reduce conflicts. After reading other Stack Overflow explanations, I understand that the issue is precedence; however, I am not sure how to resolve it using error handling. I was wondering if someone could help me out.
I have tried an Association handler like this:
%token VARNAME
%token DIGIT
%token EQ
%token ADD SUB MULT DIV
%token LPAREN RPAREN
%token END
%left ADD SUB MULT DIV LPAREN
%right RPAREN END
%nonassoc EQ VARNAME DIGIT
But, it doesn't work and I'm confused. This is my 1.y file:
%{
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
FILE *fp;
extern FILE *yyin;
extern int yylex();
extern int yyparse();
extern int yydebug;
yydebug = 1;
if (argc == 2)
{
fp = fopen(argv[1], "r");
if (!fp)
{
fprintf (stderr, "Usage: %s <filename>\n", argv[0]);
}
else
{
yyin = fp;
yyparse();
fprintf (stdout, "***PARSE COMPLETE***\n");
}
}
else
fprintf (stderr, "Usage: %s <filename>\n", argv[0]);
}
void yyerror (const char *err)
{
fprintf (stderr, "Error: %s\n", err);
}
%}
%token VARNAME
%token DIGIT
%token EQ
%token ADD SUB MULT DIV
%token LPAREN RPAREN
%token END
%%
statement:
END
|
expression END
{printf ("DONE!\n");};
|
error
{ printf(" INVALID EXPRESSION\n"); }
;
expression:
VARNAME
{printf("PARSED A VARIABLE!\n");};
|
DIGIT
{printf("PARSED A DIGIT!\n");};
|
expression ADD expression
{printf("PARSED A PLUS SIGN!\n");};
|
expression SUB expression
{printf("PARSED A MINUS SIGN!\n");};
|
expression MULT expression
{printf("PARSED A MULTPLY SIGN!\n");};
|
expression DIV expression
{printf("PARSED A DIVIDE SIGN!\n");};
|
expression EQ expression
{printf("PARSED A EQUALS SIGN!\n");};
|
LPAREN expression RPAREN
{printf("PARSED A PARENTHESIS!\n");};
;
%%
Are the association declarations shown above correct?
The issues are fairly well covered in the GNU bison manual.
As @Jonathan Leffler says:
You need to specify the precedence and associativity of the operators. Look up %left, %right and %nonassoc. Or write the grammar in a different manner altogether. For example, the grammar in the C standard doesn't need the precedence levels or associativity.
The %token items look plausible. I think %left does not need LPAREN; nor does %right need RPAREN. I'm not sure whether you need %nonassoc at all. I assume that the 'code'( at the start and the )'code' at the end are artefacts of trying to format a comment. Normally, you want ADD and SUB at a lower priority than MULT and DIV.
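In the arithmetic example, these conflicts are usually resolved by layering the grammar, with one nonterminal per precedence level:
expression : term
| expression ADD term
| expression SUB term
;
term : factor
| term MULT factor
| term DIV factor
;
factor : VARNAME
| DIGIT
| LPAREN expression RPAREN
;
Now both the precedence and the associativity of the operators are encoded in an unambiguous grammar: MULT and DIV bind more tightly than ADD and SUB, and the left recursion makes all four operators left-associative, so that a SUB b SUB c groups as (a SUB b) SUB c.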
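To see why splitting the grammar into expression, term and factor levels yields the usual precedence, it can help to look at a hand-written evaluator with the same three levels. The C sketch below is not what bison generates; it is a recursive-descent illustration with invented function names, it works on digits and parentheses only (no variables), and it does no error reporting. Operators of equal precedence are grouped from left to right, as in ordinary arithmetic.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Position in the text being evaluated (state private to this sketch). */
static const char *cursor;

static long parse_expression(void);

static void skip_spaces(void) {
    while (isspace((unsigned char)*cursor))
        cursor++;
}

/* factor : number | '(' expression ')' */
static long parse_factor(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                          /* consume '(' */
        long inner = parse_expression();
        skip_spaces();
        if (*cursor == ')')
            cursor++;                      /* consume ')' */
        return inner;
    }
    long value = 0;
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* term : factor, then any number of ('*' | '/') factor */
static long parse_term(void) {
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '*' && op != '/')
            return value;
        cursor++;
        long rhs = parse_factor();
        if (op == '*')
            value *= rhs;
        else if (rhs != 0)
            value /= rhs;                  /* division by zero leaves value unchanged */
    }
}

/* expression : term, then any number of ('+' | '-') term */
static long parse_expression(void) {
    long value = parse_term();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '+' && op != '-')
            return value;
        cursor++;
        long rhs = parse_term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
}

/* Evaluate an arithmetic expression given as text. */
long evaluate(const char *text) {
    assert(text != NULL);
    cursor = text;
    return parse_expression();
}
```

Because parse_expression only ever combines complete terms, a multiplication is always finished before the surrounding addition sees it, which is the same effect the layered grammar achieves.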
I am trying to use flex and yacc to parse 'C' source code. Unfortunately, I am getting the error "expected identifier or '(' before '{' token" on lines 1, 12, 13, 14, and so on. Any ideas why?
This is my flex file (called mini.l):
%{
%}
digit [0-9]
letter [a-zA-Z]
number (digit)+
id (letter|_)(letter|digit|_)*
integer (int)
character (char)
comma [,]
%%
{integer} {return INT;}
{character} {return CHAR;}
{number} {return NUM;}
{id} {return IDENTIFIER;}
{comma} {return ',';}
[-+*/] {return *yytext;}
. {}
%%
main()
{
yylex();
}
The corresponding yacc file (called my_yacc.y) is as shown below:
%{
#include <ctype.h>
#include <stdio.h>
/* #include "myhead.h" */
#include "mini.l"
#define YYSTYPE double
# undef fprintf
%}
%token INT
%token CHAR
%token IDENTIFIER
%token NUM
%token ','
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines:lines expr '\n' {printf("%g\n",$2);}
|lines '\n'
|D
|
;
expr :expr '*' expr {$$=$1*$3;}
|expr '/' expr {$$=$1/$3;}
|expr '+' expr {$$=$1+$3;}
|expr '-' expr {$$=$1+$3;}
|'(' expr ')' {$$=$2;}
|'-' expr %prec UMINUS {$$=-$2;}
|IDENTIFIER {}
|NUM {}
;
T :INT {}
|CHAR {}
;
L :L ',' IDENTIFIER {}
|IDENTIFIER {}
;
D :T L {printf("T is %g, L is %g",$1,$2);}
;
%%
/*void yyerror (char *s)
{
fprintf (stderr, "%s\n", s);
}
*/
I am compiling the generated code using the commands:
flex mini.l
yacc my_yacc.y
gcc y.tab.c -ly
The errors you are seeing come from the C compiler and are caused by a misunderstanding of how flex and yacc work together. When I run it through the same tools as you, I get the following errors, as you also do, and as noted by @KeithThompson:
In file included from my_yacc.y:5:0:
mini.l:1:1: error: expected identifier or '(' before '%' token
%{
^
mini.l:5:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before' letter'
letter [a-zA-Z]
^
mini.l:12:11: error: expected identifier or '(' before '{' token
{integer} {return INT;}
^
... elided the rest ...
Although you have used the commands flex, yacc and gcc in the correct sequence, you have included the file mini.l in your yacc input. This is incorrect. You should be including the output that flex generated from mini.l. This file is called lex.yy.c. It also needs to be included at the end of the yacc input file. (If you don't do that, you get the error found by @flolo.) If you make the necessary changes to your yacc file, you will have this:
%{
#include <ctype.h>
#include <stdio.h>
/* #include "myhead.h" */
/* #include "mini.l" */
#define YYSTYPE double
# undef fprintf
%}
%token INT
%token CHAR
%token IDENTIFIER
%token NUM
%token ','
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines:lines expr '\n' {printf("%g\n",$2);}
|lines '\n'
|D
|
;
expr :expr '*' expr {$$=$1*$3;}
|expr '/' expr {$$=$1/$3;}
|expr '+' expr {$$=$1+$3;}
|expr '-' expr {$$=$1-$3;}
|'(' expr ')' {$$=$2;}
|'-' expr %prec UMINUS {$$=-$2;}
|IDENTIFIER {}
|NUM {}
;
T :INT {}
|CHAR {}
;
L :L ',' IDENTIFIER {}
|IDENTIFIER {}
;
D :T L {printf("T is %g, L is %g",$1,$2);}
;
%%
void yyerror (char *s)
{
fprintf (stderr, "%s\n", s);
}
#include "lex.yy.c"
If you run your command sequence on this now, you will find that it both compiles and runs, and also processes C language input correctly.
The mistake you made is a common one among new users of flex and yacc.