Unintentional concatenation in Bison/Yacc grammar - c

I am experimenting with lex and yacc and have run into a strange issue, but I think it would be best to show you my code before detailing the issue. This is my lexer:
%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
%}
%%
[a-zA-Z]+ {
yylval.strV = yytext;
return ID;
}
[0-9]+ {
yylval.intV = atoi(yytext);
return INTEGER;
}
[\n] { return *yytext; }
[ \t] ;
. yyerror("invalid character");
%%
int yywrap(void) {
return 1;
}
This is my parser:
%{
#include <stdio.h>
int yydebug=1;
void prompt();
void yyerror(char *);
int yylex(void);
%}
%union {
int intV;
char *strV;
}
%token INTEGER ID
%%
program: program statement EOF { prompt(); }
| program EOF { prompt(); }
| { prompt(); }
;
args: /* empty */
| args ID { printf(":%s ", $<strV>2); }
;
statement: ID args { printf("%s", $<strV>1); }
| INTEGER { printf("%d", $<intV>1); }
;
EOF: '\n'
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
void prompt() {
printf("> ");
}
int main(void) {
yyparse();
return 0;
}
A very simple language, consisting of no more than strings and integers, and a basic REPL. Now, you'll note in the parser that args are output with a leading colon, the intention being that, when combined with the first pattern of the statement rule, the interaction with the REPL would look something like this:
> aaa aa a
:aa :a aaa>
However, the interaction is this:
> aaa aa a
:aa :a aaa aa aa
>
Why does the token ID in the following rule
statement: ID args { printf("%s", $<strV>1); }
| INTEGER { printf("%d", $<intV>1); }
;
have the semantic value of the total input string, newline included? How can my grammar be reworked so that I get the interaction I intended?

You have to preserve token strings as they are read if you want them to remain valid. I modified the statement rule to read:
statement: ID { printf("<%s> ", $<strV>1); } args { printf("%s", $<strV>1); }
| INTEGER { printf("%d", $<intV>1); }
;
Then, with your input, I get the output:
> aaa aa a
<aaa> :aa :a aaa aa a
>
Note that at the time the initial ID is read, the token is exactly what you expected. But, because you did not preserve the token, the string has been modified by the time you get back to printing it after the args have been parsed.

I think there is an associativity conflict between the args and statement productions. This is borne out by the (partial) output from the bison -v parser.output file:
Nonterminals, with rules where they appear
$accept (6)
on left: 0
program (7)
on left: 1 2 3, on right: 0 1 2
statement (8)
on left: 4 5, on right: 1
args (9)
on left: 6 7, on right: 4 7
EOF (10)
on left: 8, on right: 1 2
Indeed, I'm having a hard time trying to figure out what your grammar is trying to accept. As a side note, I'd probably move your EOF production into the lexer as an EOL token; this will make resynchronizing on parse errors easier.
Better explanation of your intent would be helpful.

Related

Why doesn't this grammar parse the return statement?

I am trying to write a grammar that can parse the following 3 inputs
-- testfile --
class hi implements ho:
var x:int;
end;
-- testfile2 --
interface xs:
myFunc(int,int):int
end;
-- testfile3 --
class hi implements ho:
method myMethod(x:int)
return y;
end
end;
this is lexer.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "parser.tab.h"
#include <string.h>
int line_number = 0;
void lexerror(char *message);
%}
newline (\n|\r\n)
whitespace [\t \n\r]*
digit [0-9]
alphaChar [a-zA-Z]
alphaNumChar ({digit}|{alphaChar})
hexDigit ({digit}|[A-Fa-f])
decNum {digit}+
hexNum {digit}{hexDigit}*H
identifier {alphaChar}{alphaNumChar}*
number ({hexNum}|{decNum})
comment "/*"[.\r\n]*"*/"
anything .
%s InComment
%option noyywrap
%%
<INITIAL>{
interface return INTERFACE;
end return END;
class return CLASS;
implements return IMPLEMENTS;
var return VAR;
method return METHOD;
int return INT;
return return RETURN;
if return IF;
then return THEN;
else return ELSE;
while return WHILE;
do return DO;
not return NOT;
and return AND;
new return NEW;
this return THIS;
null return _NULL;
":" return COL;
";" return SCOL;
"(" return BRACL;
")" return BRACR;
"." return DOT;
"," return COMMA;
"=" return ASSIGNMENT;
"+" return PLUS;
"-" return MINUS;
"*" return ASTERISK;
"<" return LT;
{decNum} {
yylval = atoi(yytext);
return DEC;
}
{hexNum} {
const int len = strlen(yytext)-1;
char* substr = (char*) malloc(sizeof(char) * len);
strncpy(substr,yytext,len);
yylval = (int)strtol
( substr
, NULL
, 16);
free (substr);
return HEX;
}
{identifier} {
yylval= (char *) malloc(sizeof(char)*strlen(yytext));
strcpy(yylval, yytext);
return ID;
}
{whitespace} {}
"/*" BEGIN InComment;
}
{newline} line_number++;
<InComment>{
"*/" BEGIN INITIAL;
{anything} {}
}
. lexerror("Illegal input");
%%
void lexerror(char *message)
{
fprintf(stderr,"Error: \"%s\" in line %d. = %s\n",
message,line_number,yytext);
exit(1);
}
this is parser.y:
%{
# include <stdio.h>
int yylex(void);
void yyerror(char *);
extern int line_number;
%}
%start Program
%token INTERFACE END CLASS IMPLEMENTS VAR METHOD INT RETURN IF THEN ELSE
%token WHILE DO NOT AND NEW THIS _NULL EOC SCOL COL BRACL BRACR DOT COMMA
%token ASSIGNMENT PLUS ASTERISK MINUS LT EQ DEC HEX ID NEWLINE
%%
Program: INTERFACE Interface SCOL { printf("interface\n"); }
| CLASS Class SCOL { printf("class\n");}
| error { printf("error on: %s\n", $$); }
;
Interface: ID COL
AbstractMethod
END
;
AbstractMethod: ID BRACL Types BRACR COL Type
;
Types : Type COMMA Types
| Type
;
Class: ID
IMPLEMENTS ID COL
Member SCOL
END
;
Member: VAR ID COL Type
| METHOD ID BRACL Pars BRACR Stats END
;
Type: INT
| ID
;
Pars: Par COMMA Pars
| Par
;
Par: ID COL Type
;
Stats: Stat SCOL Stat
| Stat
;
Stat: RETURN Expr
| IF Expr THEN Stats MaybeElse END
| WHILE Expr DO Stats END
| VAR ID COL Type COL ASSIGNMENT Expr
| ID COL ASSIGNMENT Expr
| Expr
;
MaybeElse :
| ELSE Stats
;
Expr: NOT Term
| NEW ID
| Term PLUS Term
| Term ASTERISK Term
| Term AND Term
| Term ArithOp Term
| Term
;
ArithOp: MINUS
| LT
| ASSIGNMENT
;
Term: BRACL Expr BRACR
| Num
| THIS
| ID
| Term DOT ID BRACL Exprs BRACR
| error { printf("error in term: %s\n", $$); }
;
Num : HEX
| INT
;
Exprs : Expr COMMA Exprs
| Expr
;
%%
void yyerror(char *s) {
fprintf(stderr, "Parse Error on line %i: %s\n", line_number, s);
}
int main(void){
yyparse();
}
The first two inputs are recognized as expected.
However, the third one fails with the error "error on: y" and I have no idea why.
As I see it, this should be a Class with a Member METHOD that contains a Stat(ement) RETURN with an Expr Term being an ID.
I tried commenting out and removing all the unnecessary bits, but the result is still the same.
I also took a look at the parser to verify that my identifiers parse correctly, but as I see it they should.
Why is the y in return y not recognized here?
Is there some conflict in the grammar I am unaware of?
(Please note that I am not expecting you to fix the complete grammar; I am merely asking for the reason this is not working. I am sure there are other errors in there, but I am really stuck fixing this one.)
here is also my makefile:
CC = gcc
LEX = flex
YAC = bison
scanner: parser.y lexer.l
$(YAC) -d -Wcounterexamples parser.y
$(LEX) lexer.l
$(CC) parser.tab.c parser.tab.h lex.yy.c -o parser
clean:
rm -f *.tab.h *.tab.c *.gch *.yy.c
rm ./parser
testing:
cat testfile3 | ./parser
First, there is an error in your grammar:
Stats: Stat SCOL Stat
| Stat
;
must be
Stats: Stat SCOL Stats
| Stat
;
(an 's' added at the end of the first line)
Second, the input in testfile3 does not follow your grammar; it must be:
class hi implements ho:
method myMethod(x:int)
return y
end;
end;
That is, the ';' after return y must be moved to after the first end.
(return x would seem more logical, but that is another subject: you do not check the validity of the ID.)
Apart from that, a class can have only one member, which is very restrictive.

Why isn't my bison printing the variable names?

So I'm using a flex/bison parser, but the variable names aren't printing correctly. It understands the number values. I've tried messing with everything but I'm lost. Here's a link to the output; where it prints "Data: 0" is where I'm trying to get the variable name: [https://imgur.com/vJDpgpR][1]
invocation is: ./frontEnd data.txt
//main.c
#define BUF_SIZE 1024
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern FILE* yyin;
extern yyparse();
int main(int argc, char* argv[]){
if(argc < 2){
FILE* fp = fopen("temp.txt", "a");
printf("Entering data: \n");
void *content = malloc(BUF_SIZE);
if (fp == 0)
printf("error opening file");
int read;
while ((read = fread(content, BUF_SIZE, 1, stdin))){
fwrite(content, read, 1, fp);
}
if (ferror(stdin))
printf("There was an error reading from stdin");
fclose(fp);
yyparse(fp);
}
if(argc == 2){
yyin = fopen(argv[2], "r");
if(!yyin)
{
perror(argv[2]);
printf("ERROR: file does not exist.\n");
return 0;
}
yyparse (yyin);
}
return 0;
}
void yyerror(char *s){
fprintf(stderr, "error: exiting %s \n", s);
}
//lex.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "parser.tab.h"
extern SYMTABNODEPTR symtable[SYMBOLTABLESIZE];
extern int curSymSize;
%}
%option noyywrap
%option nounput yylineno
%%
"stop" return STOP;
"iter" return ITER;
"scanf" return SCANF;
"printf" return PRINTF;
"main" return MAIN;
"if" return IF;
"then" return THEN;
"let" return LET;
"func" return FUNC;
"//" return COMMENT; printf("\n");
"start" return START;
"=" return ASSIGN;
"=<" return LE;
"=>" return GE;
":" return COLON;
"+" return PLUS;
"-" return MINUS;
"*" return MULT;
"/" return DIV;
"%" return MOD;
"." return DOT;
"(" return RPAREN;
")" return LPAREN;
"," return COMMA;
"{" return RBRACE;
"}" return LBRACE;
";" return SEMICOLON;
"[" return LBRACK;
"]" return RBRACK;
"==" return EQUAL;
[A-Z][a-z]* { printf("SYNTAX ERROR: Identifiers must start with lower case. "); }
[a-zA-Z][_a-zA-Z0-9]* {
printf("string: %s \n", yytext);
yylval.iVal = strdup(yytext);
yylval.iVal = addSymbol(yytext);
return ID;
}
[0-9]+ {
yylval.iVal = atoi(yytext);
printf("num: %s \n", yytext);
return NUMBER; }
[ _\t\r\s\n] ;
^"#".+$ return COMMENT;
. {printf("ERROR: Invalid Character "); yyterminate();}
<<EOF>> { printf("EOF: line %d\n", yylineno); yyterminate(); }
%%
// stores all variable id is in an array
SYMTABNODEPTR newSymTabNode()
{
return ((SYMTABNODEPTR)malloc(sizeof(SYMTABNODE)));
}
int addSymbol(char *s)
{
extern SYMTABNODEPTR symtable[SYMBOLTABLESIZE];
extern int curSymSize;
int i;
i = lookup(s);
if(i >= 0){
return(i);
}
else if(curSymSize >= SYMBOLTABLESIZE)
{
return (NOTHING);
}
else{
symtable[curSymSize] = newSymTabNode();
strncpy(symtable[curSymSize]->id,s,IDLENGTH);
symtable[curSymSize]->id[IDLENGTH-1] = '\0';
return(curSymSize++);
}
}
int lookup(char *s)
{
extern SYMTABNODEPTR symtable[SYMBOLTABLESIZE];
extern int curSymSize;
int i;
for(i=0;i<curSymSize;i++)
{
if(strncmp(s,symtable[i]->id,IDLENGTH) == 0){
return (i);
}
}
return(-1);
}
// parser.y
%{
#define YYERROR_VERBOSE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern char *yytext;
extern int yylex();
extern void yyerror(char *);
extern int yyparse();
extern FILE *yyin;
/* ------------- some constants --------------------------------------------- */
#define SYMBOLTABLESIZE 50
#define IDLENGTH 15
#define NOTHING -1
#define INDENTOFFSET 2
#ifdef DEBUG
char *NodeName[] =
{
"PROGRAM", "BLOCK", "VARS", "EXPR", "N", "A", "R", "STATS", "MSTAT", "STAT",
"IN", "OUT", "IF_STAT", "LOOP", "ASSIGN", "RO", "IDVAL", "NUMVAL"
};
#endif
enum ParseTreeNodeType
{
PROGRAM, BLOCK, VARS, EXPR, N, A, R, STATS, MSTAT, STAT,
IN, OUT,IF_STAT, LOOP, ASSIGN, RO, IDVAL, NUMVAL
};
#define TYPE_CHARACTER "char"
#define TYPE_INTEGER "int"
#define TYPE_REAL "double"
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
#ifndef NULL
#define NULL 0
#endif
// definitions for parse tree
struct treeNode {
int item;
int nodeID;
struct treeNode *first;
struct treeNode *second;
};
typedef struct treeNode TREE_NODE;
typedef TREE_NODE *TREE;
TREE makeNode(int, int, TREE, TREE);
#ifdef DEBUG
void printTree(TREE, int);
#endif
// symbol table definitions.
struct symbolTableNode{
char id[IDLENGTH];
};
typedef struct symbolTableNode SYMTABNODE;
typedef SYMTABNODE *SYMTABNODEPTR;
SYMTABNODEPTR symtable[SYMBOLTABLESIZE];
int curSymSize = 0;
%}
%start program
%union {
char *sVal;
int iVal;
TREE tVal;
}
// list of all tokens
%token SEMICOLON GE LE EQUAL COLON RBRACK LBRACK ASSIGNS LPAREN RPAREN COMMENT
%token DOT MOD PLUS MINUS DIV MULT RBRACE LBRACE START MAIN STOP LET COMMA
%token SCANF PRINTF IF ITER THEN FUNC
%left MULT DIV MOD ADD SUB
// tokens defined with values and rule names
%token<iVal> NUMBER ID
//%token<sVal> ID
%type<tVal> program type block vars expr N A R stats mStat stat in out if_stat loop assign RO
%%
program : START vars MAIN block STOP
{
TREE tree;
tree = makeNode(NOTHING, PROGRAM, $2,$4);
#ifdef DEBUG
printTree(tree, 0);
#endif
}
;
block : RBRACE vars stats LBRACE
{
$$ = makeNode(NOTHING, BLOCK, $2, $3);
}
;
vars : /*empty*/
{
$$ = makeNode(NOTHING, VARS,NULL,NULL);
}
| LET ID COLON NUMBER vars
{
$$ = makeNode($2, VARS, $5,NULL);
printf("id: %d", $2);
}
;
//variable:
// type ID{$$ = newNode($2,VARIABLE,$1,NULL,NULL);};
//type:
// INT {$$ = newNode(INT,TYPE,NULL,NULL,NULL);}
// | BOOL {$$ = newNode(BOOL,TYPE,NULL,NULL,NULL);}
// | CHAR {$$ = newNode(CHAR,TYPE,NULL,NULL,NULL);}
// | STRING{$$ = newNode(STRING,TYPE,NULL,NULL,NULL);};
expr : N DIV expr
{
$$ = makeNode(DIV, EXPR, $1, $3);
}
| N MULT expr
{
$$ = makeNode(MULT, EXPR, $1, $3);
}
| N
{
$$ = makeNode(NOTHING, EXPR, $1,NULL);
}
;
N : A PLUS N
{
$$ = makeNode(PLUS, N, $1, $3);
}
| A MINUS N
{
$$ = makeNode(MINUS, N, $1, $3);
}
| A
{
$$ = makeNode(NOTHING, N, $1,NULL);
}
;
A : MOD A
{
$$ = makeNode(NOTHING, A, $2,NULL);
}
| R
{
$$ = makeNode(NOTHING, A, $1,NULL);
}
;
R : LBRACK expr RBRACK
{
$$ = makeNode(NOTHING, R, $2,NULL);
}
| ID
{
$$ = makeNode($1, IDVAL, NULL,NULL);
}
| NUMBER
{
$$ = makeNode($1, NUMVAL, NULL,NULL);
}
;
stats : stat mStat
{
$$ = makeNode(NOTHING, STATS, $1, $2);
}
;
mStat : /* empty */
{
$$ = makeNode(NOTHING, MSTAT, NULL,NULL);
}
| stat mStat
{
$$ = makeNode(NOTHING, MSTAT, $1, $2);
}
;
stat: in DOT
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
| out DOT
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
| block
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
| if_stat DOT
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
| loop DOT
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
| assign DOT
{
$$ = makeNode(NOTHING, STAT, $1,NULL);
}
;
in : SCANF LBRACK ID RBRACK
{
$$ = makeNode($3, IN,NULL,NULL);
}
;
out : PRINTF LBRACK expr RBRACK
{
$$ = makeNode(NOTHING, OUT,$3,NULL);
}
;
if_stat : IF LBRACK expr RO expr RBRACK THEN block
{
$$ = makeNode(NOTHING, IF_STAT, $4, $8);
}
;
loop : ITER LBRACK expr RO expr RBRACK block
{
$$ = makeNode(NOTHING, LOOP, $4, $7);
}
;
assign : ID ASSIGNS expr
{
$$ = makeNode($1, ASSIGN, $3,NULL);
}
;
RO : LE
{
$$ = makeNode(LE, RO, NULL,NULL);
}
| GE
{
$$ = makeNode(GE, RO, NULL,NULL);
}
| EQUAL
{
$$ = makeNode(EQUAL, RO, NULL,NULL);
}
| COLON COLON
{
$$ = makeNode(EQUAL, RO, NULL,NULL);
}
;
%%
// node generator
TREE makeNode(int iVal, int nodeID, TREE p1, TREE p2)
{
TREE t;
t = (TREE)malloc(sizeof(TREE_NODE));
t->item = iVal;
t->nodeID = nodeID;
t->first = p1;
t->second = p2;
//printf("NODE CREATED");
return(t);
}
// prints the tree with indentation for depth
void printTree(TREE tree, int depth){
int i;
if(tree == NULL) return;
for(i=depth;i;i--)
printf(" ");
if(tree->nodeID == NUMBER)
printf("INT: %d ",tree->item);
else if(tree->nodeID == IDVAL){
if(tree->item > 0 && tree->item < SYMBOLTABLESIZE )
printf("id: %s ",symtable[tree->item]->id);
else
printf("unknown id: %d ", tree->item);
}
if(tree->item != NOTHING){
printf("Data: %d ",tree->item);
}
// If out of range of the table
if (tree->nodeID < 0 || tree->nodeID > sizeof(NodeName))
printf("Unknown ID: %d\n",tree->nodeID);
else
printf("%s\n",NodeName[tree->nodeID]);
printTree(tree->first,depth+2);
printTree(tree->second,depth+2);
}
#include "lex.yy.c"
// heres the makefile I use for compilation
frontEnd: lex.yy.c parser.tab.c
gcc parser.tab.c main.c -o frontEnd -lfl -DDEBUG
parser.tab.c parser.tab.h: parser.y
bison -d parser.y
lex.yy.c: lex.l
flex lex.l
clean:
rm lex.yy.c y.tab.c frontEnd
// data.txt
start
let x : 13
main {
scanf [ x ] .
printf [ 34 ] .
} stop
[1]: https://i.stack.imgur.com/xlNnh.png
I think this has a lot more to do with your AST and symbol table functions than with your parser, and practically nothing to do with bison itself.
For example, your function to print trees won't attempt to print an identifier's name if its symbol table index is 0.
if(tree->item > 0 && tree->item < SYMBOLTABLESIZE)
But the first symbol entered in the table will have index 0. (Perhaps you fixed this between pasting your code and generating the results. You should always check that the code you paste in a question corresponds precisely to the output which you show. But this isn't the only bug in your code; it's just an example.)
As another example, the immediate problem which causes Data: 0 to be printed instead of the symbol name is that your tree printer only prints symbol names for AST nodes of type IDVAL, but you create an AST IN node whose data field contains the variable's symbol table index. So either you need to fix your tree printer so it knows about IN nodes, or you need to change the IN node so that it has a child which is the IDVAL node. (That's probably the best solution in the long run.)
It's always a temptation to blame bison (or whatever unfamiliar tool you're using at the moment) for bugs, instead of considering the possibility that you've introduced bugs in your own support code. To avoid falling into this trap, it's always a good idea to test your library functions separately before using them in a more complicated project. For example, you could write a small test driver that builds a fixed AST tree, prints it, and deletes it. Once that works, and only when that works, you can check to see if your parser can build and print the same tree by parsing an input.
You will find that some simple good software design practices will make this whole process much smoother:
Organise your code into separate component files, each with its own header file. Document the library interfaces (and, if necessary, data structures) using comments in the header file. Briefly describe what each function's purpose is. If you can't find a brief description, it may be that the function is trying to do too many different things.
In your parser, the functions and declarations needed to build and use ASTs are scattered between different parts of your lexer and parser files. This makes them much harder to read, debug, maintain and even use.
No matter what your teacher might tell you, if you find it necessary to #include the generated lexical scanner directly into the parser, then you probably have not found a good way to organise your support functions. You should always aim to make it possible to separately compile the parser and the scanner.
For data structures like your AST node, which use different member variables in different ways depending on an enumerated node type -- which is a model you'll find in other C projects as well, but is particularly common in parsers -- document the precise use of each field for every enumeration value. And make sure that every time you change the way you use the data or add new enumeration values, you fix the documentation accordingly.
This documentation will make it much easier to verify that your AST is being built correctly. As an additional benefit, you (or others using your code) will have an accurate description of how to interpret the contents of AST nodes, which makes it much easier to write code which analyses the tree.
In short, the way to write, debug and maintain any non-trivial project is not by "messing around" but by being systematic and modular. While it might seem like all of this takes precious time, particularly the documentation, it will almost always save you a lot of time in the long run.

Flex and Bison how to find depth level of command

I have a big problem in Bison: I need to find the maximum depth level of a command (P) inside if statements.
So I wrote this for language.l (FLEX):
%{
#include "jazyk.tab.h"
int max = 0;
int j = 0;
%}
%%
[ \t]+
[Bb][Ee][Gg][Ii][Nn] return(LBEGIN);
[Ee][Nn][Dd] return(LEND);
[Ii][Ff] {j++; if(j>max)max=j; return(LIF);}
[Tt][Hh][Ee][Nn] return(LTHEN);
// command to find max depth level in If statement
[Pp] return(LP);
// V is statement
[Vv] return(LV);
[.] return(.);
[;] return(;);
[-+&~|^/%*(),!] { printf("unknown character in input: %c\n", *yytext);}
[\n] yyterminate();
%%
void maximum()
{
printf("Maximum depth level of command(P): %i\n", max);
}
And this for language.y (BISON)
%{
#include <stdio.h>
#define YYSTYPE float
void koniec(YYSTYPE);
extern char *yytext;
int counterIf;
int counterP;
%}
// define the "terminal symbol" token types (in CAPS by convention)
%token LBEGIN
%token LEND
%token LIF
%token LTHEN
%token LP
%token LV
%token .
%token ;
%start PROGRAM
%%
// the first rule defined is the highest-level rule
PROGRAM: LBEGIN prikazy LEND .
prikazy: prikaz ; prikazy
prikaz: LIF LV LTHEN prikaz {counterIf++;}
prikaz:
prikaz: LP
%%
int main() {
counterIf = 0;
counterP = 0;
printf("Examples to better copy in console: \n");
printf("begin p; p; end. \n");
printf("begin if v then p; end.\n");
printf("begin p; if v then if v then p; p; end.\n");
printf("\n");
if (yyparse()==0){
printf("Sucesfull \n");
printf("If counter: \n");
printf("%d \n", counterIf);
printf("Maximal depth level of command(P): \n");
printf("%d \n", counterP);
maximum();
}
else
printf("Wrong \n");
}
As an example of the intended functionality: when I write begin if v then p; end. the result must be: IF: 1; Max depth level of P: 2;
Or:
begin
p;
if v then
if v then p;
p;
end.
Result: IF: 2; max depth: 3;
I'm really desperate right now. Please help me with the depth counter :-( (And I'm sorry it's not all in English.)
Don't try to compute the depth in the scanner. The scanner has no idea about the structure of the program. The parser understands the nesting, so it is where you should count the depth.
Since you're not currently using semantic values for anything, I took the liberty of using them for the statistics. If you had real semantic values, you could add the statistics structure as a member, or use the location value.
When the parser encounters an if statement, it knows that there is one more if statement and that the current nesting depth is one more than the nesting depth of the target of the if.
I added a syntax for if statements with blocks because it was trivial and it makes the program a bit more interesting. When the parser adds a statement to a block, it needs to sum the current if count for the block and the if count for the new statement, and compute the maximum depth as the maximum of the two depths. The function merge_statistics does that.
I didn't really understand what your nesting depth should be; it's possible that the {0, 0} should be {0, 1}. (In the case of an empty block, I assumed that the nesting depth is 0 because there are no statements. But maybe you won't even allow empty blocks.)
You'll need to compile with a compiler which understands C99 (-std=c99 or -std=c11 if you use gcc) because I use compound literals.
I also removed the yyterminate call from your scanner and fixed it so that it insists on spaces between tokens, although maybe you don't care about that.
scanner
%option noinput nounput noyywrap yylineno nodefault
%{
#include "jazyk.tab.h"
%}
%%
[[:space:]]+
[Bb][Ee][Gg][Ii][Nn] return(LBEGIN);
[Ee][Nn][Dd] return(LEND);
[Ii][Ff] return(LIF);
[Tt][Hh][Ee][Nn] return(LTHEN);
[Pp] return(LP);
[Vv] return(LV);
[[:alpha:]]+ { printf("Unknown token: %s\n", yytext); }
[.;] return(*yytext);
. { printf("unknown character in input: %c\n", *yytext);}
parser
%{
#include <stdio.h>
typedef struct statistics {
int if_count;
int max_depth;
} statistics;
statistics merge_statistics(statistics a, statistics b) {
return (statistics){a.if_count + b.if_count,
a.max_depth > b.max_depth ? a.max_depth : b.max_depth};
}
#define YYSTYPE statistics
extern int yylineno;
int yylex();
void yyerror(const char* message);
%}
%token LIF "if" LTHEN "then" LBEGIN "begin" LEND "end"
%token LV
%token LP
%start program
%%
program: block '.' { printf("If count: %d, max depth: %d\n",
$1.if_count, $1.max_depth); }
block: "begin" statements "end" { $$ = $2; }
statements: /* empty */ { $$ = (statistics){0, 0}; }
| statements statement ';' { $$ = merge_statistics($1, $2); }
statement : LP { $$ = (statistics){0, 1}; }
| block
| "if" LV "then" statement { $$ = (statistics){$4.if_count + 1,
$4.max_depth + 1}; }
%%
void yyerror(const char* message) {
printf("At %d: %s\n", yylineno, message);
}
int main(int argc, char** argv) {
int status = yyparse();
/* Unnecessary because yyerror will print an error message */
if (status != 0) printf("Parse failed\n");
return status;
}
Test run:
$ ./jazyk <<<'begin if v then p; end.'
If count: 1, max depth: 2
$ ./jazyk <<<'begin p; if v then if v then p; p; end.'
If count: 2, max depth: 3
$ ./jazyk <<<'begin if v then begin if v then p; p; end; end.'
If count: 2, max depth: 3
$ ./jazyk <<<'begin if v then begin p; if v then p; end; end.'
If count: 2, max depth: 3
$ ./jazyk <<<'begin if v then begin if v then p; if v then p; end; end.'
If count: 3, max depth: 3

BISON AST production prints scrambled values

I'm trying to make a simple parser. It's for a homework assignment but also for my own experimentation. I have completed the lexer and the parser, and I'm now trying to output an AST. The problem is that when I add, for example, two integers, the resulting tree is printed with unrecognizable symbols. A valid input would be +(1,1) and the expected output is (+ 1 1). Instead, I'm getting ( + �|k �|k ). I've tried many things without any significant result. The sprintf function null-terminates its output, so that is probably not the problem. Below is the parser code (.y file):
%{
#define YYDEBUG 1
%}
%start program
%token NUMBER
%token ID
%token PLUS MINUS TIMES
%token LP RP EQUALS COMMA
%token END
%token LET IN AND
%left PLUS MINUS
%left TIMES
%left LET IN AND
%left EQUALS
%%
program:{printf("Empty Input\n");} /* empty */
| program line /* do nothing */
line: expr END { printtree($1); printf("\n");}
;
expr : /*Empty*/
| LET deflist IN expr {}
| ID { printf("Found ID\n"); $$ = make_id_leaf($1);}
| NUMBER { printf("Found NUMBER\n"); $$ = make_number_leaf($1);}
| PLUS LP expr COMMA expr RP {$$ = make_plus_tree($3,$5); printf("Found expr PLUS expr.\n"); }
| TIMES LP expr COMMA expr RP {$$ = make_times_tree($3,$5); printf("Found expr TIMES expr. Result:%d\n", $$);}
| MINUS ID
| MINUS NUMBER { printf("found MINUS NUMBER\n"); }
;
deflist : definition
| definition AND deflist
;
definition : /*Empty*/
| ID EQUALS expr {printf("Found EQ\n");}
;
%%
/*int main (void) {return yyparse ( );}*/
int yyerror (char *s) {fprintf (stderr, "%s\n", s);}
The lexer file:
%{
#include "parser.h"
%}
DIGIT [0-9]
LETTER [a-zA-Z]
%%
LET {printf("Encountered LET\n"); return(LET);}
IN {printf("Encountered IN\n"); return(IN);}
AND {printf("Encountered AND\n"); return(AND);}
{DIGIT}+ {yylval = atoi(yytext); return NUMBER;}
{LETTER}* { if (strlen(yytext) <= 8){
yylval = strlen(yytext);
printf( "<ID, %s> ", yytext );
return(ID);
} else {
yytext[8] = '\0';
printf("WARNING! Long identifier. Truncating to 8 chars\n");
printf( "<ID, %s> ", yytext );
return(ID);
}
}
[ \t] ;
[\n] return(END);
"+" return(PLUS);
"-" return(MINUS);
"*" return(TIMES);
"=" return(EQUALS);
"(" return(LP);
")" return(RP);
"," return(COMMA);
<<EOF>> return(0);
%%
int yywrap (void) {return 1;}
The main.c which includes the yyparse() function:
#include <stdio.h>
#include <stdlib.h>
#include "tree.h"
#include "treedefs.h"
int main(int argc, char **argv){
yyparse();
return 0;
}
And the treedefs.h file which includes the function definitions (I've included only the struct definition, the number leaf and the plus tree):
typedef struct tree{
char *token;
TREE *l;
TREE *r;
TREE *child;
}TREE;
/* Make number leaves */
TREE *make_number_leaf(int n){
TREE *leafNum = malloc(sizeof(TREE));
char *c, ch[8];
sprintf(ch, "%d", n); /* Effective way to convert int to string */
c = ch;
leafNum->token = c;
leafNum->l = NULL;
leafNum->r = NULL;
leafNum->child = NULL;
printf("NUM Leaf is: %s\n", leafNum->token);
return (leafNum);
}
/* Addition tree */
TREE *make_plus_tree(TREE *l, TREE *r){
TREE *plusTree = malloc(sizeof(TREE));
plusTree->token = "+";
plusTree->l = l;
plusTree->r = r;
plusTree->child = NULL;
return (plusTree);
}
void printtree(TREE *tree)
{
if (tree->l || tree->r){
printf("(");
}
printf(" %s ", tree->token);
if (tree->l){
printtree(tree->l);
}
if (tree->r){
printtree(tree->r);
}
if (tree->l || tree->r){
printf(")");
}
}
The file tree.h includes only some declarations, no big deal, and definitely not related to the issue.
Why do the numbers look like this? And how can I fix it? Any help will be greatly appreciated.
This problem actually has nothing to do with bison or flex. It's in your make_number_leaf implementation:
TREE *make_number_leaf(int n){
TREE *leafNum = malloc(sizeof(TREE));
char *c, ch[8];
// ^ local variable
sprintf(ch, "%d", n); /* Effective way to convert int to string */
c = ch;
leafNum->token = c;
// ^ dangling pointer
// Remainder omitted
}
As indicated in the comments above, ch is a local (stack-allocated) variable, whose lifetime ends when the function returns. Assigning its address to the variable c does nothing to change that. So the value of c which is stored into leafNum->token will become a dangling pointer as soon as the function returns.
So when you later attempt to print out the token, you are printing out the contents of random memory.
You need to malloc a character buffer, and remember to free it when you are freeing the TREE. (However, in the case where leafNum->token is a string literal, you cannot call free, so you need to be a bit cleverer.)

How to write a pure parser and reentrant scanner by "win_flex bison"?

I've written a parser for evaluating a logical expression. I know flex and bison use global variables (like yylval). I want a pure parser and a reentrant scanner for multithreaded programming. My '.y' file is here:
%{
#include <stdio.h>
#include <string>
#define YYSTYPE bool
void yyerror(char *);
//int yylex (YYSTYPE* lvalp);
int yylex(void);
bool parseExpression(const std::string& inp);
%}
%token INTEGER
%left '&' '|'
%%
program:
program statement '\n'
| /* NULL */
;
statement:
expression { printf("%d\n", $1); return $1; }
;
expression:
INTEGER
| expression '|' expression { $$ = $1 | $3; }
| expression '&' expression { $$ = $1 & $3; }
| '(' expression ')' { $$ = $2; }
| '!' expression { $$ = !$2; }
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
void main(void) {
std::string inp = "0|0\n";
bool nasi = parseExpression(inp);
printf("%s%d\n", "nasi ", nasi);
printf("Press ENTER to close. ");
getchar();
}
My '.l' file is here:
/* Lexer */
%{
#include "parser.tab.h"
#include <stdlib.h>
#include <string>
#define YYSTYPE bool
void yyerror(char *);
%}
%%
[0-1] {
if (strcmp(yytext, "0")==0)
{
yylval = false;
//*lvalp = false;
}
else
{
yylval = true;
//*lvalp = true;
}
return INTEGER;
}
[&|!()\n] { return *yytext; }
[ \t] ; /* skip whitespace */
. yyerror("Unknown character");
%%
int yywrap(void) {
return 1;
}
bool parseExpression(const std::string& inp)
{
yy_delete_buffer(YY_CURRENT_BUFFER);
/*Copy string into new buffer and Switch buffers*/
yy_scan_string(inp.c_str());
bool nasi = yyparse();
return nasi;
}
I've added %pure_parser to both files, changed the yylex declaration to int yylex (YYSTYPE* lvalp); and replaced yylval with *lvalp, but I got an error: 'lvalp' is undeclared identifier. There are many examples about 'reentrant' and 'pure', but I can't find a good guideline.
Could someone guide me?
Thanks in advance.
Fortunately, I managed to do it. Here is my code. I think it can be a good guideline for anyone who wants to write a pure parser.
My reentrant scanner:
/* Lexer */
%{
#include "parser.tab.h"
#include <stdlib.h>
#include <string>
#define YYSTYPE bool
void yyerror (yyscan_t yyscanner, char const *msg);
%}
%option reentrant bison-bridge
%%
[0-1] {
if (strcmp(yytext, "0")==0)
{
*yylval = false;
}
else
{
*yylval = true;
}
//yylval = atoi(yytext);
return INTEGER;
}
[&|!()\n] { return *yytext; }
[ \t] ; /* skip whitespace */
. yyerror (yyscanner, "Unknown character");
%%
int yywrap(yyscan_t yyscanner)
{
return 1;
}
bool parseExpression(const std::string& inp)
{
yyscan_t myscanner;
yylex_init(&myscanner);
struct yyguts_t * yyg = (struct yyguts_t*)myscanner;
yy_delete_buffer(YY_CURRENT_BUFFER,myscanner);
/*Copy string into new buffer and Switch buffers*/
yy_scan_string(inp.c_str(), myscanner);
bool nasi = yyparse(myscanner);
yylex_destroy(myscanner);
return nasi;
}
My pure parser:
%{
#include <stdio.h>
#include <string>
#define YYSTYPE bool
typedef void* yyscan_t;
void yyerror (yyscan_t yyscanner, char const *msg);
int yylex(YYSTYPE *yylval_param, yyscan_t yyscanner);
bool parseExpression(const std::string& inp);
%}
%define api.pure full
%lex-param {yyscan_t scanner}
%parse-param {yyscan_t scanner}
%token INTEGER
%left '&' '|'
%%
program:
program statement '\n'
| /* NULL */
;
statement:
expression { printf("%d\n", $1); return $1; }
;
expression:
INTEGER
| expression '|' expression { $$ = $1 | $3; }
| expression '&' expression { $$ = $1 & $3; }
| '(' expression ')' { $$ = $2; }
| '!' expression { $$ = !$2; }
;
%%
void yyerror (yyscan_t yyscanner, char const *msg){
fprintf(stderr, "%s\n", msg);
}
void main(void) {
std::string inp = "1|0\n";
bool nasi = parseExpression(inp);
printf("%s%d\n", "nasi ", nasi);
printf("Press ENTER to close. ");
getchar();
}
Notice that I've cheated and defined yyg myself as
struct yyguts_t * yyg = (struct yyguts_t*)myscanner;
I couldn't find another way to get YY_CURRENT_BUFFER. So if someone knows a better way to get YY_CURRENT_BUFFER, please tell me.
Here is a complete Flex/Bison C++ example. Everything is reentrant, no use of global variables. Both parser/lexer are encapsulated in a class placed in a separate namespace. You can instantiate as many "interpreters" in as many threads as you want.
https://github.com/ezaquarii/bison-flex-cpp-example
Disclaimer: it's not tested on Windows, but the code should be portable with minor tweaks.
