I have a new problem that follows on from my earlier question, "Call a function in a Yacc file from another c file". This time I have run into a problem with the yyin variable in Lex and Yacc. My code is as follows:
calc.l
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER;}
[ \t];
\n return 0;
. return yytext[0];
%%
calc.y
%{
#include <stdio.h>
#include <string.h>
extern FILE* yyin;
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf("= %d\n",$1);}
;
expression: NUMBER '+' NUMBER {$$=$1+$3;}
| NUMBER '-' NUMBER {$$=$1-$3;}
| NUMBER 'x' NUMBER {$$=$1*$3;}
| NUMBER '/' NUMBER
{ if($3 == 0)
yyerror("Error, cannot divided by zero");
else
$$=$1/$3;
}
| NUMBER {$$=$1;}
;
%%
void parse(FILE* fileInput)
{
    yyin= fileInput;
    while(feof(yyin)==0)
    {
        yyparse();
    }
}
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc,char* argv[])
{
    FILE* fileInput;
    char inputBuffer[36];
    char lineData[36];
    if((fileInput=fopen(argv[1],"r"))==NULL)
    {
        printf("Error reading files, the program terminates immediately\n");
        exit(0);
    }
    parse(fileInput);
}
test.txt
2+1
5+1
1+3
This is how my code works:
I created main.c to open a file and then call a function, parse(), with fileInput as its argument.
In calc.y, I set the pointer yyin to fileInput so that the parser can loop and read all the lines in a text file (test.txt).
The problem is that not all the lines of the text file are read. The result I got is
= 3
The result I should get is
= 3
= 6
= 4
How can I solve this problem? Note that I want main.c to stay as it is.
This has nothing to do with yyin. Your problem is that your grammar does not recognize a series of statements; it only recognizes one. Having parsed the first line of the file and, ultimately, reduced it to a statement, the parser has nothing more it can do. It has no way to reduce a series of symbols starting with a statement to anything else. Since there is still input left when the parser jams up, it will return an error code to its caller, but the caller ignores that.
To enable the parser to handle multiple statements, you need a rule for that. Presumably you'll want that resulting symbol to be the start symbol, instead of statement. This would be a reasonable rule:
statements: statement
| statements statement
;
Your lexical analyzer returns 0 when it reads a newline.
All grammars from Yacc or Bison recognize 0 as meaning EOF (end of file). You told your grammar that you'd reached the end of file; it believed you.
Don't return 0 for newline. You'll probably need to have your first grammar rule iterate (accept a sequence of statements) — as suggested by John Bollinger in another answer.
Bison always prints the input instead of running the action.
I'm just starting out with Bison and I'm trying to get it working with the simplest rule possible.
Lexer
%{
#include <stdio.h>
#include "wip.tab.h"
%}
%%
[\t\n ]+ ;
[a−z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
%%
Parser
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char const *);
FILE *yyin;
%}
%union{
char *sval;
}
%token IDENTIFIER
%%
input:
%empty
| input line
;
line:
'\n'
| IDENTIFIER {printf("OK\n");}
;
%%
int main(void) {
    FILE *myfile = fopen("example.wip", "r");
    if (!myfile) {
        printf("File can't be opened\n");
        return -1;
    }
    yyin = myfile;
    yyparse();
}
void yyerror(char const *s) {
    fprintf(stderr, "%s\n", s);
}
The "example.wip" input file
hello
I expect the "OK" output in my terminal but the parser just prints the content of the file.
Thanks in advance.
Bison always prints the input instead of running the action.
Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.
However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of your lexer file that prints a proper error message for unrecognized characters:
. { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
And sure enough, this will tell us that all the characters in "hello" are unrecognized.
So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? What's wrong is the −. It's not a regular ASCII dash, but a Unicode dash that has no special meaning to flex. So flex interprets [a−z] as a character class that can match one of three characters: a, the Unicode dash or z - not as a range from a to z.
To fix this, just replace it with a regular dash.
I'm trying to create a simple parser/compiler, mostly for homework, but eventually for learning purposes and for fun too. I've written both the lexer and the parser file (for an initial subset of commands) and I want to output an AST. However, I'm stuck at a "syntax error" message, even when I'm trying to parse a simple '1+1'. Here is the lexer file:
%{
#include "parser.tab.h"
%}
DIGIT [0-9]
LETTER [a-zA-Z]
%%
[ \t\n] ;
{DIGIT}+ {yylval = atoi(yytext); return NUMBER;}
{LETTER}* { if (strlen(yytext) <= 8){
                printf( "<ID, %s> ", yytext );
            } else {
                yytext[8] = '\0';
                printf("WARNING! Long identifier. Truncating to 8 chars\n");
                printf( "<ID, %s> ", yytext );
            }
          }
"+" {printf("Found '+' symbol\n");return(PLUS);}
"-" return(MINUS);
"*" return(TIMES);
"/" return(DIVIDE);
"(" return(LEFT_PARENTHESIS);
")" return(RIGHT_PARENTHESIS);
<<EOF>> return(END_OF_FILE);
%%
int yywrap (void) {return 1;}
And here is the parser file:
%{
#include <stdio.h>
/*#include "tree.h"
#include "treedefs.h"*/
int yylex();
#define YYSTYPE int
%}
%start program
%token NUMBER
%token ID
%token PLUS MINUS TIMES EQUAL
%token LEFT_PARENTHESIS RIGHT_PARENTHESIS
%token LET IN AND
%token END_OF_FILE
%left PLUS MINUS
%left TIMES DIVIDE
%%
program: /* empty */
| exp { printf("Result: %d\n", $1); }
| END_OF_FILE {printf("Encountered EOF\n");}
;
exp: NUMBER { $$ = $1;}
| exp PLUS exp { $$ = $1 + $3; }
| exp TIMES exp { $$ = $1 * $3; }
| '(' exp ')' { $$ = $2;}
;
%%
int yyerror (char *s) {fprintf (stderr, "%s\n", s);
}
Also, I've created a main.c to keep the main() function separate. You can ignore the tree*.h files, as they only contain functions related to the AST.
#include <stdio.h>
#include <stdlib.h>
#include "tree.h"
#include "treedefs.h"
int main(int argc, char **argv){
yyparse();
TREE *RootNode = malloc(sizeof(TREE));
return 0;
}
I've read tons of examples but I couldn't find anything (VERY) different from what I wrote. What am I doing wrong? Any help will be greatly appreciated.
Your grammar accepts an expression OR an end of file. So if you give it an expression followed by an end of file, you get an error.
Another problem is that you return the token END_OF_FILE at the end of the input, rather than 0 -- bison is expecting a 0 for the EOF token and will give a syntax error if it doesn't see one at the end of the input.
The easiest fix for both of those is to get rid of the END_OF_FILE token and have the <<EOF>> rule return 0. Then your grammar becomes:
program: /* empty */ { printf("Empty input\n"); }
| exp { printf("Result: %d\n", $1); }
;
...rest of the grammar
Now you have the (potential) issue that your grammar only accepts a single expression. You might want to support multiple expressions separated by newlines or some other separator (; perhaps?), which can be done in a variety of ways.
There are a few problems with the code.
First, your lexer should include this:
%{
#include "parser.tab.h"
extern int yylval; // this line was missing
%}
Second, assuming you want the code to evaluate at the end of a statement, you have to include a rule for the end of a statement. That is, assuming it's to be line-oriented, you'd replace your current whitespace rule with these:
[ \t] {}
[\n] { return 0; }
Third, one of your lines is munged. Instead of this:
printf("WARNING! Long identifier. Truncating to 8 chars\n"$
It should be this:
printf("WARNING! Long identifier. Truncating to 8 chars\n");
I'm writing a program that expands acronyms. That is, if it finds 'OMS' it should print 'Organización Mundial del trabajo', but once I compile and run the program it never terminates.
I'm using Windows as a development environment. I have seen examples but I can't see where the error is. Here is the code:
%option noyywrap
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
%}
%%
OMS {fprintf(yyout,"Organización Mundial del trabajo");}
%%
int main(int argc, char *argv[]) {
    FILE*yyin=fopen(argv[1],"r");
    FILE*yyout=fopen(argv[2],"w");
    yylex();
    fclose(yyin);
    fclose(yyout);
    return 0;
}
FILE*yyin=fopen(argv[1],"r");
FILE*yyout=fopen(argv[2],"w");
These lines declare and initialize two local variables named yyin and yyout. They are closed at the end of the function, but otherwise remain unused (that is, no one does any input/output with them). They are not available to the rest of the program. Meanwhile, the global variables yyin and yyout, which are entirely separate from these local variables, remain untouched.
What you need to do is simply remove FILE* from both lines:
yyin=fopen(argv[1],"r");
yyout=fopen(argv[2],"w");
Now the names yyin and yyout refer to the global variables that are known to the rest of the program.
I'm really new to lex and yacc. I'd like to write an extremely simple program that asks for a string as input, stores it in a variable, and checks whether the same value is entered again. Let's say:
input1 = 'abc'
input2 = 'def'
input3 = 'ghi'
input4 = 'def'
STOP input2 equals input4
part of my lex file:
%option noyywrap
%{
#include <stdlib.h>
#include <string.h>
%}
alpha [a-zA-Z]
%%
{alpha}* return ID;
part of my yacc file
%{
# include <stdio.h>
# include <ctype.h>
# include <string.h>
%}
%union {
char* lexeme;
}
%token ID
%%
all the inputs should be matched within the ID token.
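Not sure about the bare return ID; it tells the parser which kind of token was seen, but not its text.
The matched text is in yytext (a char*), so I would copy it into yylval.lexeme before returning ID.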
You can actually take a look at this simple calculator example
About your issue, you may need to create/implement a list of char* to store each of your input token during the parsing, and then check if the current one belongs to the list. As this requires more work, the examples above (and on all the website) should help.
While I was digging into the OpenNTPD source files, I noticed keywords and syntax that I've never seen in any C code before, such as %}, %%, %type and %token, in a file named parse.y:
%{
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
...
%}
%token LISTEN ON
%token SERVER SERVERS SENSOR CORRECTION RTABLE REFID WEIGHT
%token ERROR
%token <v.string> STRING
%token <v.number>
....
grammar : /* empty */
| grammar '\n'
| grammar main '\n'
| grammar error '\n' { file->errors++; }
;
main : LISTEN ON address listen_opts {
struct listen_addr *la;
struct ntp_addr *h, *next;
if ($3->a)
...
Most of the file's contents have the usual C syntax except these keywords. Does someone know what these keywords are and what they are used for?
My guess is that this is Yacc code (i.e. the definition of a grammar), not plain C. This is a notation similar to BNF.
And if you look at *.l files, you might also see a lot of C code, mixed with %%, %x, %s, %option etc. Then it's a lexer input file, which often is accompanied by a yacc *.y file.