I can't seem to figure out how to concatenate two strings in yacc.
Here is the lex code:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+ {yylval.intval=atoi(yytext); return NR;}
[a-zA-Z]+ {yylval.strval=yytext; return STR;}
"0exit" {return 0;}
[ \t] ;
\n {return 0;}
. {return yytext[0];}
And here I have the basics to add two strings:
%{
#include <stdio.h>
#include <string.h>
%}
%union {
int intval;
char* strval;
}
%token STR NR
%type <intval>NR
%type <strval>STR
%type <strval>operatie
%left '+' '-'
%right '='
%start S
%%
S : S operatie {}
| operatie {printf("%s\n",$<strval>$);}
;
operatie : STR '+' STR { char* s=malloc(sizeof(char)*(strlen($1)+strlen($3)+1));
strcpy(s,$1); strcat(s,$3);
$$=s;}
;
%%
int main(){
yyparse();
}
The code runs, but the output is wrong.
If I input
aaaa + bbbb
I get the output
aaaa + bbbbbbbb
The problem is here:
yylval.strval = yytext;
yytext changes with every token and every buffer. Change it to
yylval.strval = strdup(yytext);
yytext is only valid until the lexer starts looking for the next token. So if you want to pass a string token from (f)lex to yacc/bison, you need to strdup it, and the parser code needs to free the copy.
See the Bison FAQ.
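To see why the output came out as "aaaa + bbbbbbbb", here is a minimal sketch in plain C of how a flex-style scanner commonly hands out tokens: as pointers into its input buffer, terminated by a temporary '\0' that is undone when the next token is scanned. This is a toy model, not flex's actual code; scanner_init, next_token and copy_string are hypothetical names, and copy_string does what strdup does using only standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy model (an assumption, not flex source): tokens are pointers INTO
   the input buffer, terminated by temporarily writing '\0' after them. */
static char *cursor;     /* next unread character */
static char *hold_pos;   /* where the last temporary '\0' was written */
static char hold_char;   /* the character that '\0' replaced */

static void scanner_init(char *buffer) {
    assert(buffer != NULL);
    cursor = buffer;
    hold_pos = NULL;
}

/* Return the next run of letters, or a single other character.
   Returns NULL at end of input. */
static char *next_token(void) {
    if (hold_pos != NULL)
        *hold_pos = hold_char;          /* undo the previous '\0' */
    while (*cursor == ' ')
        cursor++;
    if (*cursor == '\0') {
        hold_pos = NULL;
        return NULL;
    }
    char *start = cursor;
    if (isalpha((unsigned char)*cursor)) {
        while (isalpha((unsigned char)*cursor))
            cursor++;
    } else {
        cursor++;
    }
    hold_pos = cursor;                  /* terminate the token in place */
    hold_char = *cursor;
    *cursor = '\0';
    return start;
}

/* Same effect as strdup(): a private copy that the caller must free(). */
static char *copy_string(const char *text) {
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}
```

In this model, a pointer saved for the first token of "aaaa + bbbb" reads as the whole line once the third token has been scanned, while a copy taken at scan time still reads "aaaa". Concatenating the stale pointer with "bbbb" gives exactly the output from the question.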
I have been working with a sample problem to construct a three address code for an expression. But to my surprise, YACC seems to skip my first input symbol. I will attach an image with the output to make it clear.
The rules aren't too complicated so I don't seem to understand where the issue is.
Here is my lex file:
%{
#include"y.tab.h"
%}
%%
[a-zA-Z]+ return ID;
[0-9]+ return NUM;
. return yytext[0];
%%
int yywrap(){return 1;}
Here is my yacc file:
%{
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char st[50][50];
extern char * yytext;
int top=-1;char t[5]="0";char temp[5];
void push();
void code();
%}
%token NUM ID
%left '+' '-'
%left '*' '/'
%%
S:' 'E
|' 'A'='E
;
A:ID{push();printf("yytext is %s\n",yytext);}
;
E:E'+'{push();}T{code();}
|E'-'{push();}T{code();}
|T
;
T:T'*'{push();}F{code();}
|T'/'{push();}F{code();}
|F
;
F:ID{push();}
|NUM{push();}
;
%%
void push(){strcpy(st[++top],yytext);}
void code(){
strcpy(temp,"t");
strcat(temp,t);
t[0]++;
printf("%s = %s %s %s \n",temp,st[top-2],st[top-1],st[top]);
top=top-2;
strcpy(st[top],temp);
}
int main(){yyparse();}
int yyerror(){exit(0);}
I expect the printf in the A: ID production to print the ID entered, but it prints '=' instead.
Here is my output:
[image: my output]
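To be sure that it had seen a complete A, yacc had to read one token ahead (the lookahead), which was =. Scanning that token overwrote your first token in yytext, so by the time the action runs, yytext no longer holds the ID. Save the text in the lexer action (for example through yylval) instead of reading yytext from the parser.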
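The timing can be sketched in plain C. This is only a model under stated assumptions: text_model stands in for yytext, saved_id for a value the lexer action would pass through yylval, and scan and the token codes are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { TOK_ID = 1, TOK_OTHER = 2 };   /* hypothetical token codes */

static char text_model[50];   /* stands in for yytext */
static char saved_id[50];     /* copy made at scan time, like yylval */

/* Model of one scanner call: the shared text is always overwritten,
   and identifiers are additionally copied while they are still current. */
static int scan(const char *lexeme, int kind) {
    assert(strlen(lexeme) < sizeof text_model);
    strcpy(text_model, lexeme);
    if (kind == TOK_ID)
        strcpy(saved_id, lexeme);
    return kind;
}
```

Calling scan("a", TOK_ID) and then scan("=", TOK_OTHER) mirrors what the parser does before it runs the action for A: ID. At that point text_model holds "=", while saved_id still holds "a".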
I am writing a lex specification, used together with yacc, that accepts arithmetic expressions. I want to skip blank space when the expression contains any. I tried \s, but the space is not eliminated, and yacc then reports a syntax error because the blank is not an operator.
LEX
%{
#include <stdio.h>
#include "y.tab.h"
int yylval;/*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit ([0-9])*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n\r\s]+$
other .
%%
{ws} { /*Nothing*/ }
{digit} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { return yytext[0];}
%%
I created a small interpreter using flex/bison.
So far it can only print a number. How can I add printing of strings?
lexer :
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}
%%
<INITIAL>[s|S][h|H][o|O][w|W] {return show;}
<INITIAL>[0-9a-zA-z]+ {yylval.num=atoi(yytext);return string;}
<INITIAL>[\-\+\=\;\*\/] {return yytext[0];}
%%
int yywrap (void) {return 1;}
yacc :
%{
void yyerror(char *s);
#include <stdio.h>
#include <stdlib.h>
%}
%union {int num;}
%start line
%token show
%token <num> number
%type <num> line exp term
%%
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
;
exp : term {$$ = $1;}
| exp '+' term {$$ = $1 + $3;}
| exp '-' term {$$ = $1 - $3;}
| exp '*' term {$$ = $1 * $3;}
| exp '/' term {$$ = $1 / $3;}
;
term : number {$$ = $1;}
%%
int main (void)
{
return yyparse();
}
void yyerror (char *s)
{
printf("-%s at %s !\n",s );
}
test data :
show 5;
show 5+5;
show 5*2-5+1;
I want to add string handling to the lexer:
<INITIAL>\" {BEGIN(STRING);}
<STRING>\" {BEGIN(INITIAL);}
Now, how do I use the content matched in <STRING>?
Can you help me complete my interpreter?
I need to support these examples in my interpreter:
show "hello erfan";//hello erfan
show "hello ".5;//hello 5
Please help me.
At the moment your interpreter doesn't work on numbers either! (It's an interpreter because it produces the results directly and does not generate code as a compiler would.)
To make it work for numbers (again) you'd have to return a number token from the lexer and not a string. This line is wrong:
<INITIAL>[0-9a-zA-z]+ {yylval.num=atoi(yytext);return string;}
It should return a number token:
<INITIAL>[0-9]+ {yylval.num=atoi(yytext);return number;}
Now let's add a string. I see you made a start:
<INITIAL>\" {BEGIN(STRING);}
<STRING>\" {BEGIN(INITIAL);}
We need to add the state for a string to the lexer:
%x STRING
We should also match the contents of the string. I'll cheat a little here:
<STRING>[^"]*\" {BEGIN(INITIAL); return(string);}
We also need to return the string value in yylval. Cheating again, I can store a char pointer in the integer (this only works on platforms where a pointer fits in an int):
<STRING>[^"]*\" {BEGIN(INITIAL); yylval.num=strdup(yytext); return(string); }
Now we have to add strings to the yacc grammar. I'm cheating again by not allowing integers and strings to be mixed. You can expand that later if you wish:
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
| show string ';' {printf("showing : %s\n",$2);}
| line show string ';' {printf("showing : %s\n",$3);}
;
We need to remember to declare the string token:
%token <num> number string
Now we can put that all together:
The lexer file:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}
%x STRING
%%
<INITIAL>[s|S][h|H][o|O][w|W] {return show;}
<INITIAL>[0-9]+ {yylval.num=atoi(yytext);return number;}
<INITIAL>[\-\+\=\;\*\/] {return yytext[0];}
<INITIAL>\" {BEGIN(STRING);}
<STRING>[^"]*\" {BEGIN(INITIAL);yylval.num=strdup(yytext);return(string);}
%%
int yywrap (void) {return 1;}
The parser file:
%{
void yyerror(char *s);
#include <stdio.h>
#include <stdlib.h>
%}
%union {int num;}
%start line
%token show
%token <num> number string
%type <num> line exp term
%%
line : show exp ';' {printf("showing : %d\n",$2);}
| line show exp ';' {printf("showing : %d\n",$3);}
| show string ';' {printf("showing : %s\n",$2);}
| line show string ';' {printf("showing : %s\n",$3);}
;
exp : term {$$ = $1;}
| exp '+' term {$$ = $1 + $3;}
| exp '-' term {$$ = $1 - $3;}
| exp '*' term {$$ = $1 * $3;}
| exp '/' term {$$ = $1 / $3;}
;
term : number {$$ = $1;}
%%
int main (void)
{
return yyparse();
}
void yyerror (char *s)
{
printf("-%s !\n",s );
}
#include "lex.yy.c"
It's basic and it works (I tested it). I've left plenty of things to be polished. You can remove the quote character from the string text; you can make the string token have a string value rather than an integer, to avoid the horrible type mismatch; and you can make the show statements a bit more complex. But at least I've got you started.
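As a sketch of the first polish item: with the pattern [^"]*\" the matched text ends in the closing quote, so the copy can simply leave the last character off. The helper below is hypothetical (it is not part of flex), and it assumes the caller passes the lexeme and its length, which in a lexer action would be yytext and yyleng.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a lexeme of the form  contents"  without the
   closing quote. The caller owns the result and must free() it. */
static char *copy_without_closing_quote(const char *lexeme, size_t length) {
    assert(lexeme != NULL);
    assert(length > 0 && lexeme[length - 1] == '"');
    char *copy = malloc(length);        /* length - 1 characters plus '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, lexeme, length - 1);
    copy[length - 1] = '\0';
    return copy;
}
```

The result is a char pointer, so using it cleanly also means giving the %union a char * member for the string token, which is the second polish item above.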
I'm trying to create a simple parser/compiler, mostly for homework, but eventually for learning purposes and for fun too. I've written both the lexer and the parser file (for an initial subset of commands) and I want to output an AST. However, I'm stuck at a "syntax error" message, even when I'm trying to parse a simple '1+1'. Here is the lexer file:
%{
#include "parser.tab.h"
%}
DIGIT [0-9]
LETTER [a-zA-Z]
%%
[ \t\n] ;
{DIGIT}+ {yylval = atoi(yytext); return NUMBER;}
{LETTER}* { if (strlen(yytext) <= 8){
printf( "<ID, %s> ", yytext );
} else {
yytext[8] = '\0';
printf("WARNING! Long identifier. Truncating to 8 chars\n");
printf( "<ID, %s> ", yytext );
}
}
"+" {printf("Found '+' symbol\n");return(PLUS);}
"-" return(MINUS);
"*" return(TIMES);
"/" return(DIVIDE);
"(" return(LEFT_PARENTHESIS);
")" return(RIGHT_PARENTHESIS);
<<EOF>> return(END_OF_FILE);
%%
int yywrap (void) {return 1;}
And here is the parser file:
%{
#include <stdio.h>
/*#include "tree.h"
#include "treedefs.h"*/
int yylex();
#define YYSTYPE int
%}
%start program
%token NUMBER
%token ID
%token PLUS MINUS TIMES EQUAL
%token LEFT_PARENTHESIS RIGHT_PARENTHESIS
%token LET IN AND
%token END_OF_FILE
%left PLUS MINUS
%left TIMES DIVIDE
%%
program: /* empty */
| exp { printf("Result: %d\n", $1); }
| END_OF_FILE {printf("Encountered EOF\n");}
;
exp: NUMBER { $$ = $1;}
| exp PLUS exp { $$ = $1 + $3; }
| exp TIMES exp { $$ = $1 * $3; }
| '(' exp ')' { $$ = $2;}
;
%%
int yyerror (char *s) {fprintf (stderr, "%s\n", s);
}
Also, I've created a main.c to keep the main() function separate. You can ignore the tree*.h files, as they only contain functions related to the AST.
#include <stdio.h>
#include <stdlib.h>
#include "tree.h"
#include "treedefs.h"
int main(int argc, char **argv){
yyparse();
TREE *RootNode = malloc(sizeof(TREE));
return 0;
}
I've read tons of examples, but I couldn't find anything (VERY) different from what I wrote. What am I doing wrong? Any help will be greatly appreciated.
Your grammar accepts an expression OR an end of file. So if you give it an expression followed by an end of file, you get an error.
Another problem is that you return the token END_OF_FILE at the end of the input, rather than 0 -- bison is expecting a 0 for the EOF token and will give a syntax error if it doesn't see one at the end of the input.
The easiest fix for both of those is to get rid of the END_OF_FILE token and have the <<EOF>> rule return 0. Then your grammar becomes:
program: /* empty */ { printf("Empty input\n"); }
| exp { printf("Result: %d\n", $1); }
;
...rest of the grammar
Now you have the (potential) issue that your grammar only accepts a single expression. You might want to support multiple expressions separated by newlines or some other separator (; perhaps?), which can be done in a variety of ways.
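The end-of-input convention from this answer can be sketched with a hand-written scanner in plain C. This is a toy, not generated code: toy_yylex, count_tokens, token_value and the token codes are hypothetical names. The only point is that the scanner returns 0 once the input is exhausted, and the caller treats 0 as the end.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { NUMBER = 258, PLUS = 259 };   /* hypothetical token codes */

static const char *input_pos;        /* next unread character */
static int token_value;              /* plays the role of yylval */

/* Toy scanner: returns a token code, or 0 at end of input. */
static int toy_yylex(void) {
    assert(input_pos != NULL);
    while (*input_pos == ' ' || *input_pos == '\t' || *input_pos == '\n')
        input_pos++;
    if (*input_pos == '\0')
        return 0;                    /* end of input: 0, not a custom token */
    if (isdigit((unsigned char)*input_pos)) {
        token_value = 0;
        while (isdigit((unsigned char)*input_pos))
            token_value = token_value * 10 + (*input_pos++ - '0');
        return NUMBER;
    }
    if (*input_pos == '+') {
        input_pos++;
        return PLUS;
    }
    return (unsigned char)*input_pos++;   /* any other character as itself */
}

/* Count tokens the way a parser consumes them: until the scanner says 0. */
static int count_tokens(const char *text) {
    int count = 0;
    input_pos = text;
    while (toy_yylex() != 0)
        count++;
    return count;
}
```

With this convention, "1+1" yields three tokens followed by 0, and no END_OF_FILE token is needed in the grammar.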
There are a few problems with the code.
First, your lexer should include this:
%{
#include "parser.tab.h"
extern int yylval; // this line was missing
%}
Second, assuming you want the code to evaluate at the end of a statement, you have to include a rule for the end of a statement. That is, assuming it's to be line-oriented, you'd replace your current whitespace rule with these:
[ \t] {}
[\n] { return 0; }
Third, one of your lines is munged. Instead of this:
printf("WARNING! Long identifier. Truncating to 8 chars\n"$
It should be this:
printf("WARNING! Long identifier. Truncating to 8 chars\n");
First of all, I am new to flex/lex, so this could be an easy question for you, or a hard one, because I don't know exactly where the problem is.
My Code:
/* example.lex */
%{
#include <stdio.h>
#include "global.h"
extern int yylval;
%}
%option noyywrap
delim [\t\n]
ws [\t\n]+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
nummer [0-9]+
%%
{ws} { /* Dont Do Anything */ }
{id} { yylval = atoi(yytext); return ID; }
{nummer} { yylval = atoi(yytext); return NUM; }
"+" { return '+'; }
"-" { return '-'; }
"*" { return '*'; }
%%
This is everything that my example.lex file has. Let me know if you need more information.
Any tips or help on what I should try to fix this problem are welcome.
yylval is usually defined by bison (yacc). If you are not using bison, then you need to define yylval yourself.
In your case, if you are not using bison, you can simply remove the "extern" from the yylval declaration you have, which turns it into a definition. If you use yylval in another file, you will have to declare it "extern" in that file.
If you are using yacc, you need to #include "y.tab.h" in your lex file. You can create y.tab.h by running 'yacc -d file.y' or 'bison -y -d file.y' (plain 'bison -d file.y' writes file.tab.h instead).
If you are looking for a very simple answer, then change:
extern int yylval;
to
int yylval;
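The difference between the two lines can be sketched in a single C file. This assumes no yacc/bison parser is linked in, so the file has to supply the definition itself; lex_number is a hypothetical stand-in for a lexer action.

```c
#include <assert.h>
#include <stdlib.h>

extern int yylval;   /* declaration only: promises a definition elsewhere */
int yylval;          /* definition: normally supplied by the generated parser */

/* Hypothetical stand-in for a lexer action that stores a token's value. */
static void lex_number(const char *text) {
    assert(text != NULL);
    yylval = atoi(text);
}
```

With only the extern line, the program compiles but fails at link time with an undefined reference to yylval. With the definition present, the lexer code links on its own.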