C with Bison+Flex: Checking a File Against Grammar Rules

This is my html.l:
DOC_START "<html>"|"<HTML>"
DOC_END "</html>"|"</HTML>"
SPACE " "
TEXT .
%%
%%
This is my html.y:
%{
#include "lex.yy.c"
%}
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
This is my html file:
<HTML>
foo bar
</HTML>
I compile the flex file first, then the bison file. It gives a "has no rules" error.
I want to check whether this file is a proper HTML file as described by the Doc rule, and it is expected to print an error or a message to stdout. What do I need to do?

You have not followed the specification for a lex program as shown in the manual.
Although you have specified some regular expressions and given them names (in the definitions section), you have not told lex what to do when it finds them (in the rules section, which you left empty). Add a rules section that returns a token, like this:
DOC_START "<html>"|"<HTML>"
DOC_END "</html>"|"</HTML>"
SPACE " "
TEXT .
%%
{DOC_START} return DOC_START;
{DOC_END} return DOC_END;
{SPACE} return SPACE;
{TEXT} return TEXT;
%%
Your bison code has not specified the tokens that are coming from lex, so you need to add those:
%{
#include "lex.yy.c"
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
However, if you do it this way the lex code is compiled before the token declarations, because bison copies the %{ ... %} prologue near the top of the generated parser, before the token macros are defined. To fix this, put the include at the bottom of the file:
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
#include "lex.yy.c"
Nearly there...
Now to output an error message, we need to provide code for the yyerror function. You wanted the output to go to stdout; we'll need the standard IO library stdio.h for that:
%{
#include <stdio.h>
void yyerror(const char* s);
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
void yyerror(const char* s)
{
fprintf(stdout, "Syntax error: %s\n", s);
}
#include "lex.yy.c"
Now I notice that your compiler is following the C99 standard and issuing warnings about implicit function declarations. The flex and bison tools sometimes generate code that triggers these warnings. They are only warnings, not errors, and can be ignored; if you do not want to see them, you can add the -ansi option to your gcc compile line.
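For reference, a typical build-and-run sequence looks something like this (assuming the file names used above; lex.yy.c is #included by the parser, so only the bison output needs compiling, and test.html stands for whatever your input file is called):
flex html.l                              # writes lex.yy.c
bison html.y                             # writes html.tab.c
gcc -ansi -o html html.tab.c -ly -lfl    # -ly supplies main, -lfl supplies yywrap
./html < test.html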
Your code will now run - I tested it.
If you are getting errors like "main not defined", you have not supplied the yacc library (-ly on the gcc line), but you could also just write your own main program:
%{
#include <stdio.h>
void yyerror(const char* s);
%}
%token DOC_START DOC_END TEXT SPACE
%%
Doc : DOC_START Other DOC_END
Other : TEXT
| SPACE
%%
void yyerror(const char* s)
{
fprintf(stdout, "Syntax error: %s\n", s);
}
int main (void)
{
return(yyparse());
}
#include "lex.yy.c"
Now you will see that it compiles and runs, but every html file will give you a syntax error. This is because your bison grammar is incorrect: you have only allowed one single space or one single character inside an html file, not a sequence of them. If you can't solve that problem you need to ask another question - or, even better, read your teacher's class notes more carefully!
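For readers who do get stuck: the conventional shape for such a rule is recursive, so that it accepts a sequence rather than a single token. Purely as a sketch, in the same style as the grammar above:
Other : TEXT
| SPACE
| Other TEXT
| Other SPACE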

Related

Why Bison just prints the input?

Bison always prints the input instead of running the action.
I'm beginning with Bison and I'm trying to make it work with the simplest rule possible.
Lexer
%{
#include <stdio.h>
#include <string.h>  /* for strdup in the IDENTIFIER rule */
#include "wip.tab.h"
%}
%%
[\t\n ]+ ;
[a−z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
%%
Parser
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char const *);
FILE *yyin;
%}
%union{
char *sval;
}
%token IDENTIFIER
%%
input:
%empty
| input line
;
line:
'\n'
| IDENTIFIER {printf("OK\n");}
;
%%
int main(void) {
FILE *myfile = fopen("example.wip", "r");
if (!myfile) {
printf("File can't be opened\n");
return -1;
}
yyin = myfile;
yyparse();
}
void yyerror(char const *s) {
fprintf(stderr, "%s\n", s);
}
The "example.wip" input file
hello
I expect the "OK" output in my terminal but the parser just prints the content of the file.
Thanks in advance.
Bison always prints the input instead of running the action.
Bison-generated parsers never print the input unless that's what the actions say. Since none of your actions print anything other than "OK", that can't be what's going on here.
However, by default flex-generated lexers do print the input when they see a character that they don't recognize. To verify that this is what's going on, we can add a rule at the end of your lexer file that prints a proper error message for unrecognized characters:
. { fprintf(stderr, "Unrecognized character: '%c'\n", yytext[0]); }
And sure enough, this will tell us that all the characters in "hello" are unrecognized.
So what's wrong with the [a−z]+ pattern? Why doesn't it match "hello"? What's wrong is the −. It's not a regular ASCII dash, but a Unicode dash that has no special meaning to flex. So flex interprets [a−z] as a character class that can match one of three characters: a, the Unicode dash or z - not as a range from a to z.
To fix this, just replace it with a regular dash.
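That is, the rule becomes (same action as before, just with an ordinary ASCII dash in the character class):
[a-z]+ { yylval.sval = strdup(yytext); return IDENTIFIER;}
With that change, running the parser on the example.wip file above should print OK as expected.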

yyin doesn't read all data in a text file

I have a new problem, following on from the question Call a function in a Yacc file from another c file. This time I am running into a problem with the yyin pointer in Lex and Yacc. My code is as follows:
calc.l
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval=atoi(yytext); return NUMBER;}
[ \t]   ;
\n return 0;
. return yytext[0];
%%
calc.y
%{
#include <stdio.h>
#include <string.h>
extern FILE* yyin;
%}
%token NAME NUMBER
%%
statement: NAME '=' expression
| expression {printf("= %d\n",$1);}
;
expression: NUMBER '+' NUMBER {$$=$1+$3;}
| NUMBER '-' NUMBER {$$=$1-$3;}
| NUMBER 'x' NUMBER {$$=$1*$3;}
| NUMBER '/' NUMBER
{ if($3 == 0)
yyerror("Error, cannot divided by zero");
else
$$=$1/$3;
}
| NUMBER {$$=$1;}
;
%%
void parse(FILE* fileInput)
{
yyin= fileInput;
while(feof(yyin)==0)
{
yyparse();
}
}
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc,char* argv[])
{
FILE* fileInput;
char inputBuffer[36];
char lineData[36];
if((fileInput=fopen(argv[1],"r"))==NULL)
{
printf("Error reading files, the program terminates immediately\n");
exit(0);
}
parse(fileInput);
}
test.txt
2+1
5+1
1+3
This is how my code works:
I created main.c to open a file and then call a function, parse(), with the parameter fileInput.
In calc.y, I set the pointer yyin to fileInput so the parser can loop and read all lines in the text file (test.txt).
The problem is that not all lines in the text file are read. The result I got is
= 3
The result I should get is
= 3
= 6
= 4
How can I solve this problem? Note that I want main.c to remain unchanged.
This has nothing to do with yyin. Your problem is that your grammar does not recognize a series of statements; it only recognizes one. Having parsed the first line of the file and, ultimately, reduced it to a statement, the parser has nothing more it can do. It has no way to reduce a series of symbols starting with a statement to anything else. Since there is still input left when the parser jams up, it will return an error code to its caller, but the caller ignores that.
To enable the parser to handle multiple statements, you need a rule for that. Presumably you'll want that resulting symbol to be the start symbol, instead of statement. This would be a reasonable rule:
statements: statement
| statements statement
;
Your lexical analyzer returns 0 when it reads a newline.
All parsers generated by Yacc or Bison treat the token value 0 as meaning EOF (end of file). You told your parser that you'd reached the end of the file; it believed you.
Don't return 0 for newline. You'll probably need to have your first grammar rule iterate (accept a sequence of statements) — as suggested by John Bollinger in another answer.
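Putting the two answers together, a minimal sketch of the changes would be the following. In calc.l, replace the whitespace and newline rules with one rule that skips both, instead of returning 0:
[ \t\n]   ;
In calc.y, add the statements rule from the other answer and declare it as the start symbol (in the declarations section, before the first %%):
%start statements
With those changes a single call to yyparse() reads the whole file and prints one "= ..." result per line.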

lex yacc compare two strings

I'm really new to lex and yacc. I'd like to write an extremely simple program that asks for a string as input, stores it in a variable, and checks whether the same value is entered again. Let's say:
input1 = 'abc'
input2 = 'def'
input3 = 'ghi'
input4 = 'def'
STOP input2 equals input4
part of my lex file:
%option noyywrap
%{
#include <stdlib.h>
#include <string.h>
%}
alpha [a-zA-Z]
%%
{alpha}* return ID;
part of my yacc file
%{
# include <stdio.h>
# include <ctype.h>
# include <string.h>
%}
%union {
char* lexeme;
}
%token ID
%%
All the inputs should be matched by the ID token.
Returning ID on its own is not quite enough: the parser only learns that it saw an identifier, not which one. You also need to pass the matched text along through yylval, using yytext, which holds the token's characters as a char*.
You can also take a look at this simple calculator example.
About your actual issue: you will need to create a list of char* that stores each input token during the parse, and then check whether the current token already belongs to that list. As this requires a bit more work, the examples above (and elsewhere on that website) should help.
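A minimal sketch of that idea, assuming the %union shown above (the helper name check_duplicate and the MAX_SEEN limit are only illustrative; note also the + instead of *, so the pattern cannot match an empty string):
{alpha}+   { yylval.lexeme = strdup(yytext); return ID; }

/* In the C section of the yacc file: remember every string seen so far and
   return the 1-based index of an earlier equal string, or 0 if it is new. */
#define MAX_SEEN 100
static char *seen[MAX_SEEN];
static int nseen = 0;

static int check_duplicate(const char *s)
{
    int i;
    for (i = 0; i < nseen; i++)
        if (strcmp(seen[i], s) == 0)
            return i + 1;
    if (nseen < MAX_SEEN)
        seen[nseen++] = strdup(s);
    return 0;
}
A grammar action for ID can then call check_duplicate($1) and print the STOP message when it returns a non-zero index; for $1 to have the right type, the token also needs to be declared as %token <lexeme> ID.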

What are the keywords '%type' and '%token' used for in C?

While I was digging into the OpenNTPD source code, I noticed new keywords and syntax that I've never seen in any C code before, such as %{, %}, %%, %type and %token, in a file named parse.y:
%{
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
...
%}
%token LISTEN ON
%token SERVER SERVERS SENSOR CORRECTION RTABLE REFID WEIGHT
%token ERROR
%token <v.string> STRING
%token <v.number>
....
grammar : /* empty */
| grammar '\n'
| grammar main '\n'
| grammar error '\n' { file->errors++; }
;
main : LISTEN ON address listen_opts {
struct listen_addr *la;
struct ntp_addr *h, *next;
if ($3->a)
...
Most of the file's contents have the usual C syntax, except for these keywords. Does anyone know what these keywords are and what they are used for?
My guess is that this is Yacc code (i.e. the definition of a grammar), not plain C. This is a notation similar to BNF.
And if you look at *.l files, you might also see a lot of C code, mixed with %%, %x, %s, %option etc. Then it's a lexer input file, which often is accompanied by a yacc *.y file.
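To sketch what two of those directives do (illustrative declarations, not taken from the OpenNTPD file): %token declares the terminal symbols that the lexer returns to the parser, and %type tells Yacc/Bison which %union member holds the semantic value of a nonterminal:
%union {
    char *str;
    int   num;
}
%token LISTEN ON         /* terminals with no semantic value */
%token <str> STRING      /* terminal whose value lives in the str member */
%type  <num> expression  /* nonterminal whose $$ uses the num member */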

what is the use of tokens.h when I am programming a lexer?

I am programming a lexer in C and I read somewhere about a header file called tokens.h. Does such a file exist? If so, what is it used for?
tokens.h is a header generated by yacc or bison that contains a list of the tokens in your grammar. (The generated header is usually named y.tab.h by yacc or <parser>.tab.h by bison; tokens.h is simply a common name it gets copied or renamed to.)
Your yacc/bison input file may contain token declarations like:
%token INTEGER
%token ID
%token STRING
%token SPACE
Running this file through yacc/bison with the -d option will produce a header containing preprocessor definitions for these tokens:
/* Something like this; real generated headers start the token codes
   above 255 so they never clash with single-character tokens... */
#define INTEGER 257
#define ID 258
#define STRING 259
#define SPACE 260
Probably, tokens.h is a file generated by the parser generator (Yacc/Bison) containing token definitions so you can return tokens from the lexer to the parser.
With Lex/Flex and Yacc/Bison, it works like this:
parser.y:
%token FOO
%token BAR
%%
start: FOO BAR;
%%
lexer.l:
%{
#include "tokens.h"
%}
%%
foo {return FOO;}
bar {return BAR;}
%%
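For completeness, a build that shows where the header comes from might look like this (a sketch: --defines lets you pick the header name so it matches the #include in lexer.l, and -ly/-lfl supply the main, yyerror and yywrap that these minimal files leave out):
bison --defines=tokens.h -o parser.c parser.y
flex lexer.l
cc -o demo parser.c lex.yy.c -ly -lfl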
