Why does Bison just print the input?

Bison always prints the input instead of running the action.
I'm new to Bison and I'm trying to get it working with the simplest possible rule.
Lexer
%{
#include <stdio.h>
#include "wip.tab.h"
%}
%%
[\t\n ]+ ;
[a−z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
%%
Parser
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char const *);
FILE *yyin;
%}
%union{
char *sval;
}
%token IDENTIFIER
%%
input:
%empty
| input line
;
line:
'\n'
| IDENTIFIER {printf("OK\n");}
;
%%
int main(void) {
FILE *myfile = fopen("example.wip", "r");
if (!myfile) {
printf("File can't be opened\n");
return -1;
}
yyin = myfile;
yyparse();
}
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
The "example.wip" input file
hello
I expect the "OK" output in my terminal but the parser just prints the content of the file.
Thanks in advance.

Bison always prints the input instead of running the action.
Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.
However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of the rules section of your lexer file (just before the second %%) that prints a proper error message for unrecognized characters:
. { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
And sure enough, this will tell us that all the characters in "hello" are unrecognized.
So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? The problem is the −. It's not a regular ASCII dash but a Unicode dash, which has no special meaning to flex. Because flex works on bytes, it reads [a−z] as a character class that matches a single byte: a, z, or one of the bytes that make up the encoding of the Unicode dash. It is not a range from a to z.
To fix this, just replace it with a regular dash.
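This kind of lookalike character can be found mechanically. A minimal sketch, assuming the .l file is saved as UTF-8: scan the text for any byte above 0x7F. A lookalike such as U+2212 MINUS SIGN shows up as the byte sequence E2 88 92, while the ASCII dash is the single byte 0x2D. The function name first_non_ascii is made up for this example.

```c
#include <stddef.h>

/* Returns the offset of the first byte above 0x7F in s (for example the
   first byte of a UTF-8 encoded Unicode dash), or -1 if s is plain ASCII. */
long first_non_ascii(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Cast first: plain char may be signed, so bytes >= 0x80 could
           otherwise compare as negative values. */
        if ((unsigned char)s[i] > 0x7F)
            return (long)i;
    }
    return -1;
}
```

Calling it on each line of the lexer file (for example, lines read with fgets) and reporting every line where it does not return -1 would point straight at the [a−z] pattern.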

Related

Why does yytext skip the first input in YACC?

I have been working on a sample problem that generates three-address code for an expression. But to my surprise, YACC seems to skip my first input symbol. I will attach an image with the output to make it clear.
The rules aren't complicated, so I don't understand where the issue is.
Here is my lex file:
%{
#include"y.tab.h"
%}
%%
[a-zA-Z]+ return ID;
[0-9]+ return NUM;
. return yytext[0];
%%
int yywrap(){return 1;}
Here is my yacc file:
%{
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char st[50][50];
extern char * yytext;
int top=-1;char t[5]="0";char temp[5];
void push();
void code();
%}
%token NUM ID
%left '+' '-'
%left '*' '/'
%%
S:' 'E
|' 'A'='E
;
A:ID{push();printf("yytext is %s\n",yytext);}
;
E:E'+'{push();}T{code();}
|E'-'{push();}T{code();}
|T
;
T:T'*'{push();}F{code();}
|T'/'{push();}F{code();}
|F
;
F:ID{push();}
|NUM{push();}
;
%%
void push(){strcpy(st[++top],yytext);}
void code(){
strcpy(temp,"t");
strcat(temp,t);
t[0]++;
printf("%s = %s %s %s \n",temp,st[top-2],st[top-1],st[top]);
top=top-2;
strcpy(st[top],temp);
}
int main(){yyparse();}
int yyerror(){exit(0);}
I expect the print in the A:ID production to print the ID entered, but it prints the '=' instead.
Here is my output:
[screenshot: my output]
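To be sure that it has seen an A (and not the ID that starts an E), yacc has to read one token ahead, the '='. Reading that lookahead token overwrites yytext, so by the time the A:ID action runs, yytext holds "=" instead of your first token.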
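The effect can be reproduced without Bison. Below is a toy simulation in C (all names are hypothetical): a shared buffer plays the role of yytext, and the "parser" reads one lookahead token before running the action. It also shows the usual remedy, copying the text at the moment the token is scanned. In a real lexer that copy would be something like yylval.sval = strdup(yytext) with a %union, as in the first question's lexer.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-in for flex's yytext: a single shared buffer that every call
   to toy_lex overwrites, just as each new match replaces yytext. */
static char toy_yytext[64];

/* Copies the next space-separated word of *p into toy_yytext. */
static void toy_lex(const char **p)
{
    size_t n = 0;
    while (**p == ' ')
        (*p)++;
    while (**p != '\0' && **p != ' ' && n < sizeof toy_yytext - 1)
        toy_yytext[n++] = *(*p)++;
    toy_yytext[n] = '\0';
}

/* Mimics the order of events for S: ' ' A '=' E with A: ID { ... }:
   the parser scans the ID, then one lookahead token, and only then runs
   the A:ID action. Reports what that action would see through the shared
   buffer and through a copy saved right after the ID was scanned. */
void simulate_lookahead(const char *input, char *via_yytext,
                        char *via_copy, size_t size)
{
    const char *p = input;
    char saved[sizeof toy_yytext];

    toy_lex(&p);                   /* scan the ID */
    strcpy(saved, toy_yytext);     /* copy taken in the "lexer action" */
    toy_lex(&p);                   /* lookahead '=' overwrites the buffer */

    /* The A:ID action runs only now. */
    snprintf(via_yytext, size, "%s", toy_yytext);
    snprintf(via_copy, size, "%s", saved);
}
```

For the input "x = 5", the shared buffer holds "=" when the action runs, while the saved copy still holds "x". With Bison, the action would then read the saved string through $1 instead of yytext (assuming ID is declared with a string type, e.g. %token <sval> ID).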

C with Bison+Flex Checking File for Rules

This is my html.l:
DOC_START "<html>"|"<HTML>"
DOC_END "</html>"|"</HTML>"
SPACE " "
TEXT .
%%
%%
This my html.y:
%{
#include "lex.yy.c"
%}
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
This is my html file:
<HTML>
foo bar
</HTML>
I compile the flex file first, then the bison file. I get a "has no rules" error.
I want to check whether this file is a proper HTML file as described by the Doc rule, and print an error or a message to stdout. What do I need to do?
You have not followed the specification for a lex program as shown in the manual.
Although you have specified some regular expressions and given them names (in the definitions section), you have not told lex what to do when it finds them (in the rules section, which you left empty). Add a rules section that returns a token for each one, like this:
DOC_START "<html>"|"<HTML>"
DOC_END "</html>"|"</HTML>"
SPACE " "
TEXT .
%%
{DOC_START} return DOC_START;
{DOC_END} return DOC_END;
{SPACE} return SPACE;
{TEXT} return TEXT;
%%
Your bison code has not specified the tokens that are coming from lex, so you need to add those:
%{
#include "lex.yy.c"
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
However, if you do it this way the lex code is compiled before the token declarations. To fix this, put the include at the bottom of the file:
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
#include "lex.yy.c"
Nearly there...
Now to output an error message, we need to provide code for the yyerror function. You wanted the output to go to stdout; we'll need the standard IO library stdio.h for that:
%{
#include <stdio.h>
void yyerror(const char* s);
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
void yyerror(const char* s)
{
fprintf(stdout, "Syntax error: %s\n", s);
}
#include "lex.yy.c"
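Now I notice that your compiler is following the C99 standard and issuing warnings about implicit function declarations. The code that flex and bison generate sometimes causes these warnings. With older versions of gcc these are only warnings, not errors, and can be ignored; if you do not want to see them, you can put the option -ansi on your gcc compile line. Newer compilers such as GCC 14 treat implicit function declarations as errors in C99 and later modes, so declaring int yylex(void); in the prologue is the more robust fix.
Your code will now run; I tested it.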
If you are getting errors like main not defined, you have not supplied the yacc library (-ly on the gcc line), but you could just type up your own main program:
%{
#include <stdio.h>
void yyerror(const char* s);
int yylex(void); /* defined in lex.yy.c, which is included at the bottom */
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
void yyerror(const char* s)
{
fprintf(stdout, "Syntax error: %s\n", s);
}
int main (void)
{
return(yyparse());
}
#include "lex.yy.c"
Now you will see that it compiles and runs, but every HTML file will give you a syntax error. This is because your bison grammar is incorrect. You have only allowed one single space or one single character inside an HTML file (not a sequence of them). If you can't solve that problem you need to ask another question, or even better, read your teacher's class notes more carefully!

yyin doesn't read all data in a text file

I have a new problem that follows on from my question "Call a function in a Yacc file from another c file". This time I have a problem with yyin in Lex and Yacc. My code is as follows:
calc.l
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER;}
[ \t];
\n return 0;
. return yytext[0];
%%
calc.y
%{
#include <stdio.h>
#include <string.h>
extern FILE* yyin;
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf("= %d\n",$1);}
;
expression: NUMBER '+' NUMBER {$$=$1+$3;}
| NUMBER '-' NUMBER {$$=$1-$3;}
| NUMBER 'x' NUMBER {$$=$1*$3;}
| NUMBER '/' NUMBER
{ if($3 == 0)
yyerror("Error, cannot divided by zero");
else
$$=$1/$3;
}
| NUMBER {$$=$1;}
;
%%
void parse(FILE* fileInput)
{
yyin= fileInput;
while(feof(yyin)==0)
{
yyparse();
}
}
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc,char* argv[])
{
FILE* fileInput;
char inputBuffer[36];
char lineData[36];
if((fileInput=fopen(argv[1],"r"))==NULL)
{
printf("Error reading files, the program terminates immediately\n");
exit(0);
}
parse(fileInput);
}
test.txt
2+1
5+1
1+3
This is how my code works:
I created main.c to open a file and then call a function, parse(), with the parameter fileInput.
In calc.y, I set the pointer yyin to fileInput so the parser can loop and read all the lines in a text file (test.txt).
The problem is that yyin didn't read all the lines in the file. The result I got is
= 3
The result I should get is
= 3
= 6
= 4
How can I solve this problem? Note that I want to keep main.c.
This has nothing to do with yyin. Your problem is that your grammar does not recognize a series of statements; it only recognizes one. Having parsed the first line of the file and, ultimately, reduced it to a statement, the parser has nothing more it can do. It has no way to reduce a series of symbols starting with a statement to anything else. Since there is still input left when the parser jams up, it will return an error code to its caller, but the caller ignores that.
To enable the parser to handle multiple statements, you need a rule for that. Presumably you'll want that resulting symbol to be the start symbol, instead of statement. This would be a reasonable rule:
statements: statement
| statements statement
;
Your lexical analyzer returns 0 when it reads a newline.
Parsers generated by Yacc or Bison treat a token value of 0 as meaning EOF (end of file). You told your parser that you'd reached the end of the file, and it believed you.
Don't return 0 for newline. You'll probably also need your first grammar rule to iterate (accept a sequence of statements), as suggested by John Bollinger in another answer.
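To see why returning 0 for a newline cuts the input short, here is a toy simulation in C. It is not Bison's parser; it is a hypothetical lexer and driver loop that, like yyparse, stops at the first token 0. The newline_is_eof flag switches between the question's \n return 0; rule and a lexer that returns '\n' as an ordinary token. All names are made up for this sketch.

```c
#include <ctype.h>

/* Token code for a number; Bison would pick its own value (258 or above). */
enum { TOK_EOF = 0, TOK_NUMBER = 258 };

/* Toy lexer over a string. If newline_is_eof is nonzero it mimics the
   question's "\n return 0;" rule; otherwise it returns '\n' as a token. */
static int calc_lex(const char **p, int newline_is_eof)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
    if (**p == '\0')
        return TOK_EOF;
    if (**p == '\n') {
        (*p)++;
        return newline_is_eof ? TOK_EOF : '\n';
    }
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p))
            (*p)++;
        return TOK_NUMBER;
    }
    return (unsigned char)*(*p)++;   /* single-character token such as '+' */
}

/* Toy driver: like yyparse, it stops at the first token 0. Counts the
   non-empty lines (statements) it gets to see before that happens. */
int count_statements(const char *input, int newline_is_eof)
{
    const char *p = input;
    int count = 0, in_statement = 0, tok;

    while ((tok = calc_lex(&p, newline_is_eof)) != TOK_EOF) {
        if (tok == '\n') {
            count += in_statement;
            in_statement = 0;
        } else {
            in_statement = 1;
        }
    }
    return count + in_statement;
}
```

With the contents of test.txt, the newline-as-0 lexer lets the driver see only the first line, while the '\n'-token lexer lets it see all three; a real grammar would then need a statements rule that consumes those '\n' tokens. One further detail the answers do not cover: flex reads a non-interactive yyin in large blocks, so after the first yyparse call the small test.txt is most likely already in flex's buffer and feof(yyin) is already true. The while loop in parse() therefore probably never calls yyparse a second time.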

Cannot find cause of 'syntax error' message in Bison

I'm trying to create a simple parser/compiler, mostly for homework, but eventually for learning purposes and for fun too. I've written both the lexer and the parser file (for an initial subset of commands) and I want to output an AST. However, I'm stuck at a "syntax error" message, even when I'm trying to parse a simple '1+1'. Here is the lexer file:
%{
#include "parser.tab.h"
%}
DIGIT [0-9]
LETTER [a-zA-Z]
%%
[ \t\n] ;
{DIGIT}+ {yylval = atoi(yytext); return NUMBER;}
{LETTER}* { if (strlen(yytext) <= 8){
printf( "<ID, %s> ", yytext );
} else {
yytext[8] = '\0';
printf("WARNING! Long identifier. Truncating to 8 chars\n");
printf( "<ID, %s> ", yytext );
}
}
"+" {printf("Found '+' symbol\n");return(PLUS);}
"-" return(MINUS);
"*" return(TIMES);
"/" return(DIVIDE);
"(" return(LEFT_PARENTHESIS);
")" return(RIGHT_PARENTHESIS);
<<EOF>> return(END_OF_FILE);
%%
int yywrap (void) {return 1;}
And here is the parser file:
%{
#include <stdio.h>
/*#include "tree.h"
#include "treedefs.h"*/
int yylex();
#define YYSTYPE int
%}
%start program
%token NUMBER
%token ID
%token PLUS MINUS TIMES EQUAL
%token LEFT_PARENTHESIS RIGHT_PARENTHESIS
%token LET IN AND
%token END_OF_FILE
%left PLUS MINUS
%left TIMES DIVIDE
%%
program: /* empty */
| exp { printf("Result: %d\n", $1); }
| END_OF_FILE {printf("Encountered EOF\n");}
;
exp: NUMBER { $$ = $1;}
| exp PLUS exp { $$ = $1 + $3; }
| exp TIMES exp { $$ = $1 * $3; }
| '(' exp ')' { $$ = $2;}
;
%%
int yyerror (char *s) {fprintf (stderr, "%s\n", s);
}
Also, I've created a main.c, to keep the main() function separately. You can omit the tree*.h files as they only include functions relative to the AST.
#include <stdio.h>
#include <stdlib.h>
#include "tree.h"
#include "treedefs.h"
int main(int argc, char **argv){
yyparse();
TREE *RootNode = malloc(sizeof(TREE));
return 0;
}
I've read tons of examples but I couldn't find something (VERY) different from what I wrote. What am I doing wrong? Any help, will be greatly appreciated.
Your grammar accepts an expression OR an end of file. So if you give it an expression followed by an end of file, you get an error.
Another problem is that you return the token END_OF_FILE at the end of the input, rather than 0 -- bison is expecting a 0 for the EOF token and will give a syntax error if it doesn't see one at the end of the input.
The easiest fix for both of those is to get rid of the END_OF_FILE token and have the <<EOF>> rule return 0. Then your grammar becomes:
program: /* empty */ { printf("Empty input\n"); }
| exp { printf("Result: %d\n", $1); }
;
...rest of the grammar
Now you have the (potential) issue that your grammar only accepts a single expression. You might want to support multiple expressions separated by newlines or some other separator (perhaps ;), which can be done in a variety of ways.
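The end-marker problem can be made concrete with a hand-written recognizer in C for the question's program rule. This is only an illustration: exp is simplified to NUMBER (PLUS NUMBER)* to keep it short, the token values are hypothetical, and Bison itself uses LALR tables, not code like this. What it shares with Bison is that, after a complete program, the only acceptable next token is 0.

```c
#include <stdbool.h>

/* Hypothetical token codes; Bison assigns its own values in parser.tab.h,
   but the end marker is always 0. */
enum { END_MARKER = 0, NUMBER = 258, PLUS, END_OF_FILE };

/* Hand-written recognizer for
       program: empty | exp | END_OF_FILE
   with exp simplified to NUMBER (PLUS NUMBER)*. tok must end with
   END_MARKER. Accepts only if the token after a complete program is 0. */
bool accepts(const int *tok)
{
    int i = 0;

    if (tok[i] == END_OF_FILE) {
        i++;
    } else if (tok[i] == NUMBER) {
        i++;
        while (tok[i] == PLUS && tok[i + 1] == NUMBER)
            i += 2;
    }
    return tok[i] == END_MARKER;   /* anything else is a syntax error */
}
```

The token sequence NUMBER PLUS NUMBER END_OF_FILE is rejected, because END_OF_FILE arrives where only 0 is allowed. Once the lexer returns 0 at end of input, NUMBER PLUS NUMBER is accepted.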
There are a few problems with the code.
First, your lexer should include this:
%{
#include "parser.tab.h"
extern int yylval; // this line was missing
%}
Second, assuming you want the code to evaluate at the end of a statement, you have to include a rule for the end of a statement. That is, assuming it's to be line-oriented, you'd replace your current whitespace rule with these:
[ \t] {}
[\n] { return 0; }
Third, one of your lines is munged. Instead of this:
printf("WARNING! Long identifier. Truncating to 8 chars\n"$
It should be this:
printf("WARNING! Long identifier. Truncating to 8 chars\n");

Lex rule for C preprocessor directive

I am writing a lex program to tokenize a C program. I've written the following rule to match a C preprocessor directive
#.* {printf("\n%s is a PREPROCESSOR DIRECTIVE",yytext);}
But when I use a file as input through yyin, the preprocessor directives in the file are matched, but the yytext displayed is empty.
E.g., I get
is a PREPROCESSOR DIRECTIVE
There is no problem when yyin is stdin but this arises only when a file is input. Is there an alternate LEX rule?
Focus on the fact that it doesn't work with a file, rather than on the lex specification, because the file handling is more likely to be the cause. The printf in the lex file should always print at least the #. The following does work with a file:
%{
#include <stdio.h>
#include <stdlib.h> /* for exit() */
%}
%%
#.* { printf("'%s' preproc\n", yytext); }
%%
int yywrap(void)
{
return 1;
}
int main(int argc, char ** argv)
{
if (argc > 1)
{
if ((yyin = fopen(argv[1], "r")) == NULL)
{
fprintf(stderr, "Can't open `%s'.\n", argv[1]);
exit(1);
}
}
return (yylex());
}
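One possible cause that the answer does not mention, offered here only as an assumption since the question does not confirm it: if the input file has Windows CRLF line endings, #.* stops at '\n' but includes the '\r'. Printing that carriage return on a terminal moves the cursor back to the start of the line, so " is a PREPROCESSOR DIRECTIVE" is written over the directive and it looks as if yytext were empty. Text typed on stdin ends lines with '\n' only, which would explain why stdin works. A minimal helper for that case (the name strip_trailing_cr is made up):

```c
#include <string.h>

/* Removes any trailing '\r' characters from s in place (as left behind by
   CRLF line endings when a pattern like #.* stops at '\n') and returns the
   new length. */
size_t strip_trailing_cr(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == '\r')
        s[--n] = '\0';
    return n;
}
```

It would be called on yytext at the start of the #.* action, before the printf; shortening yytext in place is permitted by flex, while lengthening it is not. Alternatively, a pattern such as #[^\r\n]* keeps the carriage return out of yytext, paired with a rule like \r ; that discards it.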
