We have a tutorial on flex and bison before we start our compilation techniques course at university.
The following test input should be split into lines and newlines:
testtest test data
second line in the data
another line without a trailing newline
This is what my parser should output:
Line: testtest test data
NL
Line: second line in the data
NL
Line: another line without a trailing newline
When I run the following:
cat test.txt | ./parser
This returns:
LINE: testtest test data
It's a bad: syntax error
This is in my .y file:
%{
#include<stdio.h>
int yylex(); /* Suppress C99 warning on OSX */
extern char *yytext; /* Correct for Flex */
unsigned int total;
%}
%token LINE
%token NL
%%
line : LINE {printf("LINE: %s\n", yytext);}
;
newline : NL {printf("NL\n");}
;
And this is in my binary.flex file:
%top{
#define YYSTYPE int
#include "binary.tab.h" /* Token values generated by bison */
}
%option noyywrap
%%
[^\n\r/]+ return LINE;
\n return NL;
%%
Any ideas on how to solve this problem?
PS: This is my .c file:
#include<stdio.h>
#include "binary.tab.h"
extern unsigned int total;
int yyerror(char *c)
{
printf("It's a bad: %s\n", c);
return 0;
}
int main(int argc, char **argv)
{
if(!yyparse())
printf("It's a mario time: %d\n",total);
return 0;
}
Your bison grammar recognizes precisely one LINE (without a newline) because bison takes the first non-terminal in the grammar, here line, as the start symbol. The parser recognizes just that, and no more; the newline rule is never reached.
If you want to recognize multiple lines, each consisting of a LINE and possibly a NL, you'll need to add a definition for an input consisting of multiple lines, each consisting of ... . I'm not sure why you would use bison for this, though, since the original problem seems easy to solve with just flex.
By the way, if your input file includes a \r character, none of your flex patterns will recognize it (the flex-generated default rule will catch it, but that is almost never what you want). Use %option nodefault so that you get a warning about this sort of error. And react when you see warnings: you will have seen several when you ran bison on your bison file, I'm sure.
Running the code below in iTerm2 bash. Code file was created using Vim.
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
I ran the commands:
$flex fb1-1.1
$cc lex.yy.c -lfl
This is the error that it returns
fb1-1.1:17:1: warning: type specifier missing, defaults to 'int'
[-Wimplicit-int]
main(int argc, char **argv)
^
1 warning generated.
ld: library not found for -lfl
clang: error: linker command failed with exit code 1 (use -v to see invocation)
EDIT: Works now. Changed the main() to
int main(int argc, char* argv[])
Also changed -lfl to -ll:
$flex fb1-1.1
$cc lex.yy.c -ll
$./a.out
this is a text
^D
1 4 15
Assembled from comments (because it was easier than finding a dupe):
In modern C (that is, C from this century), all functions need a return type, and the only two standard prototypes for main are:
int main(void)
int main(int argc, char* argv[])
An obsolescent way to write the first is int main().
On Mac OS, the flex distro doesn't include libfl.a. It comes with libl.a. So use -ll instead of -lfl. But much better is to avoid the problem by telling flex not to require yywrap by putting the following declaration in your prologue:
%option noyywrap
Even better is to use the following:
%option noinput nounput noyywrap nodefault
noinput and nounput will avoid "unused function" warnings when you compile with warnings enabled (which you should always do). nodefault tells flex to not insert a default action, and to produce a warning if one would be necessary. The default action is to echo the unmatched character on stdout, which is usually undesirable and often confusing.
I am trying to understand lex/yacc and currently I am failing at hello world. I probably messed something up, somewhere, but I can't seem to find it.
Also, I am not experienced with C language or with lex/flex/yacc/bison so this is all new to me.
test.l file;
%option noyywrap
%{
#include <stdio.h>
%}
%%
"hey" printf("hello!");
%%
int main()
{
return 0;
}
I compile this on windows, with the commands;
lex test.l
This returns lex.yy.c file without errors or warnings.
I then compile with;
cc lex.yy.c
which, without errors or warnings, creates a.exe as it is supposed to.
When I then run the file with input from another file:
a.exe < input
It returns nothing.
Input file;
"hey"
Any information is welcome, since every single guide I found either creates errors (when literally copy pasted, even after clean install and guided-install) or is simply outdated or listed for "windows" while it uses commands that are non-windows >.<
The problem is the double quotes around "hey" in your .l file. In a lex pattern, quotes mark a literal string, so the rule matches the text hey, not "hey". If you change your input file to contain just hey rather than "hey", your code should work. If you want to match the quote characters as well, the rule should be: "\"hey\""
Also lex should auto include stdio.h so you probably don't need it.
LOL you forgot to call yylex();
So:
%option noyywrap
%%
"hey" printf("hello!");
%%
int main()
{
yylex();
return 0;
}
Important subtlety
You may not notice this right away, but once main calls yylex(), even your original input file (with the quotes around hey) produces output. The rule still matches the hey between the quotes, and you will get:
"hello!"
Notice the quote characters.
That's because lex adds a default rule that copies any otherwise unmatched character straight to the output. Your "hey" rule matches hey, and the quotes around it in the input fall through to the default rule, so they come out around hello!
Your main does nothing - you need to call the lexer.
int main()
{
yylex();
return 0;
}
I am super new to FLEX, I'm trying to write a simple FLEX lex file, but for some reason I get the error in the header.
The complete error is:
bash syntax error near unexpected token '('
The command I am running is:
flex sample.lex
OR
flex sample.l
Where am I going wrong here?
%{
union{
int val;
char name [30];
char str [80];
}yylval;
#include <stdio.h>
%}
%%
. ECHO;
%%
int main(int argc, char **argv)
{
if (argc != 2) {
printf("no input file>\n");
exit (1);
}
printf("token lexeme attribute\n");
printf("--------------------------\n");
yyin=fopen(argv[1], "r");
if(yyin!=0)
{ printf("file opened");
fclose(yyin);}
exit (0);
}
int yywrap () { return 1; }
Are you running Flex on the file, and then GCC? I was able to compile this. First, run
flex test.l
This should generate a file named "lex.yy.c", which is C code generated by Flex. This file contains code to scan the input, and the C code in your .l file. Then running
gcc lex.yy.c
should produce an executable "a.out". Running
./a.out test.txt
produced the output you're printing, including "file opened". As was pointed out, main does not have a return type, but I guess my compiler was able to figure it out.
However, even if you can compile this, it looks like you are misunderstanding a few aspects of Flex.
union{
int val;
char name [30];
char str [80];
}yylval;
is a regular C union. Bison's %union declaration is used when a semantic value can have more than one type, and is not the same thing. %union should not be in the .l file.
Also, you need to call yylex() to actually start scanning the input.
Right now, I would suggest going to the Flex manual and doing some of the starter examples: http://westes.github.io/flex/manual/Simple-Examples.html. If you are using Flex and Bison together, you should also go through some of the Bison examples: http://www.gnu.org/software/bison/manual/html_node/Examples.html#Examples
I am writing a lexical analyzer; here is the code:
%{
#include <stdio.h>
void showToken(char*);
%}
%%
int main(){
void showToken(char* name){
printf("<%s,%s>",name,yytext);
}
return 0;
}
%%
I am getting the following :
~/hedor1>cc -c -o lexical.o lexical.c
lexical.l:40: error: expected identifier or '(' before '%' token
I can't find where the problem is. Moreover, in the code section, must I write:
int main(){}
what happens if I don't write the main function above?
Primary problem
You can only have two %% lines in a Lex (Flex) analyzer.
...definitions...
%%
...lexical patterns...
%%
...everything else...
The programs Lex and Flex simply copy the content of the file after the second %% verbatim into the generated C code. And C doesn't like %% at any time.
Nitpick
You shouldn't nest functions inside each other like you're trying to with:
int main(){
void showToken(char* name){
printf("<%s,%s>",name,yytext);
}
return 0;
}
You need to separate main() from showToken(). (There is a GCC-specific extension that does allow nested functions. Don't use it.)
Also, when you have a line number in an error message, it is helpful to insert a comment to identify the line in the source, or to describe the line that is identified. We shouldn't have to count the lines in your code. If the error is in the first 5 lines or so, identifying the line isn't crucial; in between (lines 6-12) it's generally better to indicate the line number; and by the time it has reached the teens, it is close to essential.
I'm trying to understand flex/bison, but the documentation is a bit difficult for me, and I've probably grossly misunderstood something. Here's a test case: http://namakajiri.net/misc/bison_charlit_test/
File "a" contains the single character 'a'. "foo.y" has a trivial grammar like this:
%%
file: 'a' ;
The generated parser can't parse file "a"; it gives a syntax error.
The grammar "bar.y" is almost the same, only I changed the character literal for a named token:
%token TOK_A;
%%
file: TOK_A;
and then in bar.lex:
a { return TOK_A; }
This one works just fine.
What am I doing wrong in trying to use character literals directly as bison terminals, like in the docs?
I'd like my grammar to look like "statement: selector '{' property ':' value ';' '}'" and not "statement: selector LBRACE property COLON value SEMIC RBRACE"...
I'm running bison 2.5 and flex 2.5.35 in debian wheezy.
Rewrite
The problem is a runtime problem, not a compile time problem.
The trouble is that you have two radically different lexical analyzers.
The bar.lex analyzer recognizes an a in the input and returns it as a TOK_A and ignores everything else.
The foo.lex analyzer echoes every single character, but that's all.
foo.lex — as written
%{
#include "foo.tab.h"
%}
%%
foo.lex — equivalent
%{
#include "foo.tab.h"
%}
%%
. { ECHO; }
foo.lex — required
%{
#include "foo.tab.h"
%}
%%
. { return *yytext; }
Working code
Here's some working code with diagnostic printing in place.
foo-lex.l
%%
. { printf("Flex: %d\n", *yytext); return *yytext; }
foo.y
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *s);
%}
%%
file: 'a' { printf("Bison: got file!\n"); }
;
%%
int main(void)
{
yyparse();
}
void yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
}
Compilation and execution
$ flex foo-lex.l
$ bison foo.y
$ gcc -o foo foo.tab.c lex.yy.c -lfl
$ echo a | ./foo
Flex: 97
Bison: got file!

$
Point of detail: how did that blank line get into the output? Answer: the lexical analyzer put it there. The pattern . does not match a newline, so the newline was treated as if there was a rule:
\n { ECHO; }
This is why the input was accepted. If you change the foo-lex.l file to:
%%
. { printf("Flex-1: %d\n", *yytext); return *yytext; }
\n { printf("Flex-2: %d\n", *yytext); return *yytext; }
and then recompile and run again, the output is:
$ echo a | ./foo
Flex-1: 97
Bison: got file!
Flex-2: 10
syntax error
$
with no blank lines. This is because the grammar doesn't allow a newline to appear in a valid 'file'.