First of all, I am new to flex/lex, so this could be an easy question for you, or a hard one, because I don't know exactly where the problem is.
My Code:
/* example.lex */
%{
#include <stdio.h>
#include "global.h"
extern int yylval;
%}
%option noyywrap
delim [\t\n]
ws [\t\n]+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
nummer [0-9]+
%%
{ws} { /* Dont Do Anything */ }
{id} { yylval = atoi(yytext); return ID; }
{nummer} { yylval = atoi(yytext); return NUM; }
"+" { return '+'; }
"-" { return '-'; }
"*" { return '*'; }
%%
This is the entire contents of my example.lex file. Let me know if you need more information.
Any tips or help on what I should try to fix this problem are welcome.
yylval is usually defined by bison (yacc). If you are not using bison, then you need to define yylval yourself.
In your case, if you are not using bison, you can simply remove the "extern" from the yylval declaration you have. If you use yylval in another file, you will have to declare it "extern" in that file.
If you are using yacc, you need to #include "y.tab.h" in your lex file. You can create y.tab.h by running 'yacc -d file.y' or 'bison -y -d file.y'. Note that plain 'bison -d file.y' writes the header to file.tab.h instead, so include that name if you run bison without -y.
If you are looking for a very simple answer, then change:
extern int yylval;
to
int yylval;
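To make the declaration/definition difference concrete, here is a minimal single-file sketch in plain C. It is not flex output, and set_token_value is a made-up helper that only stands in for a lexer action:

```c
#include <assert.h>
#include <stdlib.h>

/* Declaration only: promises that yylval exists somewhere,
   but reserves no storage for it. */
extern int yylval;

/* Definition: reserves the storage. Exactly one of these may exist
   per program; with bison/yacc the generated parser provides it. */
int yylval;

/* Hypothetical helper standing in for a lexer action such as
   { yylval = atoi(yytext); return NUM; } */
int set_token_value(const char *text)
{
    assert(text != NULL);
    yylval = atoi(text);
    return yylval;
}
```

If only the extern line is present and nothing else in the program defines yylval, the code compiles but the link step fails with an undefined reference to yylval.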
Bison always prints the input instead of running the action.
I am starting out with Bison and am trying to get it working with the simplest possible rule.
Lexer
%{
#include <stdio.h>
#include "wip.tab.h"
%}
%%
[\t\n ]+ ;
[a−z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
%%
Parser
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char const *);
FILE *yyin;
%}
%union{
char *sval;
}
%token IDENTIFIER
%%
input:
%empty
| input line
;
line:
'\n'
| IDENTIFIER {printf("OK\n");}
;
%%
int main(void) {
FILE *myfile = fopen("example.wip", "r");
if (!myfile) {
printf("File can't be opened\n");
return -1;
}
yyin = myfile;
yyparse();
}
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
The "example.wip" input file
hello
I expect the "OK" output in my terminal but the parser just prints the content of the file.
Thanks in advance.
Bison always prints the input instead of running the action.
Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.
However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of your lexer file that prints a proper error message for unrecognized characters:
. { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
And sure enough, this will tell us that all the characters in "hello" are unrecognized.
So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? What's wrong is the −. It's not a regular ASCII dash, but a Unicode dash that has no special meaning to flex. So flex interprets [a−z] as a character class that can match one of three characters: a, the Unicode dash or z - not as a range from a to z.
To fix this, just replace it with a regular dash.
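A quick way to catch this kind of problem is to scan a pattern for bytes outside plain ASCII. The following is only a sketch: count_non_ascii is a hypothetical helper, and it assumes the lexer file is UTF-8 encoded, where a character such as U+2212 MINUS SIGN takes three bytes (E2 88 92).

```c
#include <assert.h>
#include <stddef.h>

/* Count the bytes in a string that fall outside 7-bit ASCII.
   A flex pattern meant to be plain ASCII should give 0. */
size_t count_non_ascii(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            n++;
    }
    return n;
}
```

Under that assumption, count_non_ascii("[a-z]+") returns 0, while count_non_ascii("[a\xE2\x88\x92z]+") returns 3, one for each byte of the encoded dash.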
I am writing a lex specification, to be used with yacc, that accepts arithmetic expressions. I want to skip blank space when the expression contains any. I tried \s, but the space is not skipped, and yacc then reports a syntax error because the space is not an operator.
LEX
%{
#include <stdio.h>
#include "y.tab.h"
int yylval;/*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit ([0-9])*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n\r\s]+$
other .
%%
{ws} { /*Nothing*/ }
{digit} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { return yytext[0];}
%%
I'm trying to create a simple parser/compiler, mostly for homework, but eventually for learning purposes and for fun too. I've written both the lexer and the parser file (for an initial subset of commands) and I want to output an AST. However, I'm stuck at a "syntax error" message, even when I'm trying to parse a simple '1+1'. Here is the lexer file:
%{
#include "parser.tab.h"
%}
DIGIT [0-9]
LETTER [a-zA-Z]
%%
[ \t\n] ;
{DIGIT}+ {yylval = atoi(yytext); return NUMBER;}
{LETTER}* { if (strlen(yytext) <= 8){
printf( "<ID, %s> ", yytext );
} else {
yytext[8] = '\0';
printf("WARNING! Long identifier. Truncating to 8 chars\n");
printf( "<ID, %s> ", yytext );
}
}
"+" {printf("Found '+' symbol\n");return(PLUS);}
"-" return(MINUS);
"*" return(TIMES);
"/" return(DIVIDE);
"(" return(LEFT_PARENTHESIS);
")" return(RIGHT_PARENTHESIS);
<<EOF>> return(END_OF_FILE);
%%
int yywrap (void) {return 1;}
And here is the parser file:
%{
#include <stdio.h>
/*#include "tree.h"
#include "treedefs.h"*/
int yylex();
#define YYSTYPE int
%}
%start program
%token NUMBER
%token ID
%token PLUS MINUS TIMES EQUAL
%token LEFT_PARENTHESIS RIGHT_PARENTHESIS
%token LET IN AND
%token END_OF_FILE
%left PLUS MINUS
%left TIMES DIVIDE
%%
program: /* empty */
| exp { printf("Result: %d\n", $1); }
| END_OF_FILE {printf("Encountered EOF\n");}
;
exp: NUMBER { $$ = $1;}
| exp PLUS exp { $$ = $1 + $3; }
| exp TIMES exp { $$ = $1 * $3; }
| '(' exp ')' { $$ = $2;}
;
%%
int yyerror (char *s) {fprintf (stderr, "%s\n", s);
}
Also, I've created a main.c to keep the main() function separate. You can ignore the tree*.h files, as they only contain functions related to the AST.
#include <stdio.h>
#include <stdlib.h>
#include "tree.h"
#include "treedefs.h"
int main(int argc, char **argv){
yyparse();
TREE *RootNode = malloc(sizeof(TREE));
return 0;
}
I've read tons of examples, but I couldn't find anything (VERY) different from what I wrote. What am I doing wrong? Any help will be greatly appreciated.
Your grammar accepts an expression OR an end of file. So if you give it an expression followed by an end of file, you get an error.
Another problem is that you return the token END_OF_FILE at the end of the input, rather than 0 -- bison is expecting a 0 for the EOF token and will give a syntax error if it doesn't see one at the end of the input.
The easiest fix for both of those is to get rid of the END_OF_FILE token and have the <<EOF>> rule return 0. Then your grammar becomes:
program: /* empty */ { printf("Empty input\n"); }
| exp { printf("Result: %d\n", $1); }
;
...rest of the grammar
Now you have the (potential) issue that your grammar only accepts a single expression. You might want to support multiple expressions separated by newlines or some other separator (';' perhaps?), which can be done in a variety of ways.
There are a few problems with the code.
First, your lexer should include this:
%{
#include "parser.tab.h"
extern int yylval; // this line was missing
%}
Second, assuming you want the code to evaluate at the end of a statement, you have to include a rule for the end of a statement. That is, assuming it's to be line-oriented, you'd replace your current whitespace rule with these:
[ \t] {}
[\n] { return 0; }
Third, one of your lines is munged. Instead of this:
printf("WARNING! Long identifier. Truncating to 8 chars\n"$
It should be this:
printf("WARNING! Long identifier. Truncating to 8 chars\n");
I'm currently working on a simple infix-to-postfix compiler for a given grammar. I'm currently at the stage of syntax analysis. I have already written a lexical analyzer, using Flex library, however I'm stuck on a seemingly simple problem. The information below might seem like a lot to process, but I presume the problem is rather basic to anyone with some experience in compiler construction.
Here is my lexer:
%{
#include <stdlib.h>
#include "global.h"
int lineno = 1, tokenval = NONE;
%}
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
%option noinput
%option nounput
%%
[ \t]+ {}
\n {lineno++;}
{digit}+ {tokenval = atoi(yytext);
printf("digit\n");
return NUM;}
{id} {int p;
p = lookup(yytext);
if(p==0){
p = insert(yytext, ID);
}
tokenval = p;
return symtable[p].token;
}
<<EOF>> {return DONE;}
. {tokenval = NONE;
return yytext[0];}
Nothing special here, just defining some tokens and handling them.
And my parser.y file:
%{
#include "global.h"
%}
%token digit
%%
start: line {printf("success!\n");};
line: expr ';' line | expr ;
expr: digit;
%%
void yyerror(char const *s)
{
printf("error\n");
};
int main()
{
yyparse();
return 0;
}
The problem is on the line:
expr: digit;
The compiler evidently has some problem with the digit token, since if I put any constant other than a digit there instead, it all works fine, and expressions like -; or +; are accepted. I have no idea why this is happening, especially since I'm pretty sure my lexical analyzer works fine.
The global.h file is just a linkage for other files, contains necessary function prototypes and links to any necessary variables:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#define BSIZE 128
#define NONE -1
#define EOS '\0'
#define NUM 256
#define DIV 257
#define MOD 258
#define ID 259
#define DONE 260
extern int tokenval;
extern int lineno;
struct entry
{
char *lexptr;
int token;
};
extern struct entry symtable[];
int insert (char s[], int tok);
void error (char *m) ;
int lookup (char s[]) ;
void init () ;
void parse () ;
int yylex (void) ;
void expr () ;
void term () ;
void factor () ;
void match (int t) ;
void emit (int t, int tval) ;
void yyerror(char const *s);
Your scanner returns NUM when it has found a sequence of digits, not digit. The identifier digit is just used internally in your Flex specification.
Then you have another digit defined as a token in your Bison grammar, but it is not connected in any way to the Flex one.
To fix this, use NUM, both in your Bison grammar and as a return value from the lexer. Don't declare it yourself with #define, but let Bison create those declarations, from your %token definitions. You can use the -d flag to get Bison to output a header file. Run Bison before Flex, and #include Bison's output header file, with NUM in it, in your Flex code.
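For orientation, here is roughly the kind of declaration a header written by 'bison -d' contains for "%token NUM". This is a sketch only: the real header is generated by Bison, the number 258 is illustrative (Bison normally numbers named tokens from 258 upward, which is one more reason a hand-written #define NUM 256 will not match), and token_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the generated token declarations.
   Both the lexer and the parser must see this one definition. */
enum yytokentype { NUM = 258 };

/* Hypothetical helper: maps a token code to a printable description. */
const char *token_name(int tok)
{
    assert(tok >= 0);   /* token codes are never negative */
    switch (tok) {
    case 0:   return "end of input";
    case NUM: return "NUM";
    default:  return (tok < 256) ? "literal character" : "unknown";
    }
}
```

Because the lexer includes the generated header rather than its own #define, the value it returns for NUM is the value the parser was built with.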
I can't seem to figure out how to concatenate two strings in yacc.
Here is the lex code
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+ {yylval.intval=atoi(yytext); return NR;}
[a-zA-Z]+ {yylval.strval=yytext; return STR;}
"0exit" {return 0;}
[ \t] ;
\n {return 0;}
. {return yytext[0];}
And here I have the basics to add two strings
%{
#include <stdio.h>
#include <string.h>
%}
%union {
int intval;
char* strval;
}
%token STR NR
%type <intval>NR
%type <strval>STR
%type <strval>operatie
%left '+' '-'
%right '='
%start S
%%
S : S operatie {}
| operatie {printf("%s\n",$<strval>$);}
;
operatie : STR '+' STR { char* s=malloc(sizeof(char)*(strlen($1)+strlen($3)+1));
strcpy(s,$1); strcat(s,$3);
$$=s;}
;
%%
int main(){
yyparse();
}
The code runs, but the output is wrong. It looks something like this:
If I input
aaaa + bbbb
I get the output
aaaa + bbbbbbbb
The problem is here:
yylval.strval = yytext;
yytext changes with every token and every buffer. Change it to
yylval.strval = strdup(yytext);
yytext is only valid until the lexer starts looking for the next token. So if you want to pass a string token from (f)lex to yacc/bison, you need to strdup it, and the parser code needs to free the copy.
See the Bison FAQ.
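The effect can be reproduced without flex at all. In this sketch token_buf stands in for yytext (a buffer the scanner reuses for every token), and scan_into_buffer, dup_lexeme and concat_tokens are hypothetical helpers. dup_lexeme does what strdup does, written out with malloc so the example only needs standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char token_buf[64];   /* stands in for yytext: reused for every token */

/* Pretend to scan one token: overwrite the shared buffer and return it. */
const char *scan_into_buffer(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < sizeof token_buf);
    strcpy(token_buf, lexeme);
    return token_buf;
}

/* Private copy of a lexeme (what strdup would give); caller must free it. */
char *dup_lexeme(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Concatenate two strings into fresh storage; caller must free it. */
char *concat_tokens(const char *a, const char *b)
{
    char *s = malloc(strlen(a) + strlen(b) + 1);
    if (s == NULL)
        return NULL;
    strcpy(s, a);
    strcat(s, b);
    return s;
}
```

If the pointer returned for the first token is kept as is, it shows the second token's text once the next scan runs. The copy made with dup_lexeme still holds the first token, so concatenating the copy with the second token gives the intended result.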