passing data to bison grammar from flex - c

Having an issue with my flex / bison grammar. Not sure if it is the way that I have set up the recursion that is shooting myself in the foot.
When trying to access the data passed via yylval I would use the $1... variable for each element of the production. However when doing this it is not splitting the values into each token. It prints the whole production. This only happens with the second sentence in the metadata production, the first seems to be OK.
I was intending to create a check(int token_val) function that contains a switch(token_val) and checks the return value for each token and then acts on its yytext appropriately. Is there a way to use the $ variable notation that will give me the return value from the commands production? Or is this the incorrect way to go about things?
I have checked the references for this but maybe I have missed something, would appreciate someone to clarify.
Code: bison
input: input metadata
| metadata
;
metadata: command op data {printf("%s is valid.\n", $3);} // check_data($1) ?
| data op data op data op data {printf("row data is valid\n\t %s\n", $1);}
;
command: PROD_TITL
| _DIR
| DOP
| DIT
| FORMAT
| CAMERA
| CODEC
| DATE
| REEL
| SCENE
| SLATE
;
op: EQUALS
| COLON
| SEP
;
data: META
| REEL_ID
| SCENE_ID
| SLATE_ID
| TAKE
| MULTI_T
| LENS
| STOP
| FILTERS
;
%%
int main(void) {
return yyparse();
}
lex:
%{
#include <stdio.h>
#include <string.h>
#include "ca_mu.tab.h"
%}
%option yylineno
%%
\"[^"\n]*["\n] {yylval = yytext; return META;}
[aA-aZ][0-9]+ {yylval = yytext; return REEL_ID;}
([0-9aA-zZ]*\/[0-9aA-zZ]*) {yylval = yytext; return SCENE_ID;}
[0-9]+ {yylval = yytext; return SLATE_ID;}
[0-9][aA-zZ]+ {yylval = yytext; return TAKE;}
[0-9]+-[0-9]+ {yylval = yytext; return MULTI_T;}
[0-9]+MM {yylval = yytext; return LENS;}
T[0-9]\.[0-9]+ {yylval = yytext; return STOP;}
"{"([^}]*)"}" {yylval = yytext; return FILTERS;}
Output sample:
"My Production" is valid.
"Dir Name" is valid.
"DOP Name" is valid.
"DIT Name" is valid.
"16:9" is valid.
"Arri Alexa" is valid.
"ProRes" is valid.
"02/12/2020" is valid.
A001 is valid.
23/22a is valid.
001 is valid.
row data is valid
1, 50MM, T1.8, { ND.3 } // $1 prints all tokens?
row data is valid
3AFS, 50MM, T1.8, {ND.3}
input
/* This is a comment */
production_title = "My Production"
director = "Dir Name"
DOP = "DOP Name"
DIT = "DIT Name"
format = "16:9"
camera = "Arri Alexa"
codec = "ProRes"
date = "02/12/2020"
reel: A001
scene: 23/22a
slate: 001
1, 50MM, T1.8, { ND.3 }
3AFS, 50MM, T1.8, {ND.3}
slate: 002
1, 65MM, T1.8, {ND.3 BPM1/2}
slate: 003
1-3, 24MM, T1.9, {ND.3}
END

The problem is here, in your scanner actions:
yylval = yytext;
You must never do this.
yytext points into a temporary buffer which is only valid until the next call to yylex(), and that means you are effectively making yylval a dangling pointer. Always copy the string, as with:
yylval = strdup(yytext);
(Don't forget to call free() on the copied strings when you no longer need the copies.)

I think your language is too simple and doesn't define the structure of the input. For example:
reel: A001 // repetition of reels, consisting of....
scene: 23/22a // repetition of scenes, consisting of...
slate: 001 // repetition of slates, consisting of...
1, 50MM, T1.8, { ND.3 } // repetition of slate data
This is a structure, so the input is:
movie: metadata reels
;
metadata: /* see your stuff */ ;
reels: reel
| reels reel
;
reel: REEL REEL_ID scenes
;
scenes: scene
| scenes scene
;
scene: SCENE SCENE_ID slates
;
slates: slate
| slates slate
;
slate: SLATE SLATE_ID slate_datas
;
slate_datas: slate_data
| slate_datas slate_data
;
slate_data: /*your stuff*/ ;

Related

Structure of yacc definitions

I am in the process of writing a parser for a markup language for a personal project:
sample:
/* This is a comment */
production_title = "My Production"
director = "Joe Smith"
DOP = "John Blogs"
DIT = "Random Name"
format = "16:9"
camera = "Arri Alexa"
codec = "ProRes"
date = _auto
Reel: A001
Scene: 23/22a
Slate: 001
1-2, 50MM, T1.8, {ND.3}
3AFS, 50MM, T1.8, {ND.3}
Slate: 002:
1, 65MM, T1.8, {ND.3 BPM1/2}
Slate: 003:
1-3, 24MM, T1.9 {ND.3}
Reel: A002
Scene: 23/22a
Slate: 004
1-5, 32MM, T1.9, {ND.3}
Scene: 23/21
Slate: 005
1, 100MM, T1.9, {ND.6}
END
I have started learning lex and yacc, and have run into a couple of issues regarding the structure of the grammar definitions.
yacc.y
%{
#include <stdio.h>
int yylex();
void yyerror(char *s);
%}
%token PROD_TITL _DIR DOP DIT FORMAT CAMERA CODEC DATE EQUALS
%right META
%%
meta: PROD_TITL EQUALS META {
printf("%s is set to %s\n",$1, $3);
}
| _DIR EQUALS META {
printf("%s is set to %s\n",$1, $3);
}
%%
int main(void) {
return yyparse();
}
void yyerror(char *s) {fprintf(stderr, "%s\n", s);}
lex.l
%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
%}
%%
"production_title" {yylval = strdup(yytext); return PROD_TITL;}
"director" {yylval = strdup(yytext); return _DIR;}
"DOP" return DOP;
"DIT" return DIT;
"format" return FORMAT;
"camera" return CAMERA;
"codec" return CODEC;
"date" return DATE;
"exit" exit(EXIT_SUCCESS);
\"[^"\n]*["\n] { yylval = strdup(yytext);
return META;
}
= return EQUALS;
[ \t\n] ;
"/*"([^*]|\*+[^*/])*\*+"/" ;
. printf("unrecognized input\n");
%%
int yywrap(void) {
return 1;
}
The main issue that I am having is that the program only runs correctly on the first parse; after that it reports a syntax error, which is incorrect. Is this something to do with the way that I have written the grammar?
example output from sample.txt and typed in commands:
hc#linuxtower:~/Documents/CODE/parse> ./a.out < sample.txt
production_title is set to "My Production"
syntax error
hc#linuxtower:~/Documents/CODE/parse> ./a.out
production_title = "My Production"
production_title is set to "My Production"
director = "Joe Smith"
syntax error
When compiling I get warnings in the lex.l file with regards to my regex's:
ca_mu.l: In function ‘yylex’:
ca_mu.l:9:9: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
"production_title" {yylval = strdup(yytext); return PROD_TITL;}
^
ca_mu.l:10:9: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
"director" {yylval = strdup(yytext); return _DIR;}
^
ca_mu.l:20:10: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
\"[^"\n]*["\n] { yylval = strdup(yytext);
^
Could this be the source of the problem or an additional issue?
Those are two separate issues.
Your grammar is as follows, leaving out the actions:
meta: PROD_TITL EQUALS META
| _DIR EQUALS META
That means that your grammar accepts one of two sequences, both having exactly three tokens. That is, it accepts "PROD_TITL EQUALS META" or "_DIR EQUALS META". That's it. Once it finds one of those things, it has parsed as much as it knows how to parse, and it expects to be told that the input is complete. Any other input is an error.
The compiler is complaining about yylval = strdup(yytext); because it has been told that yylval is of type int. That's yacc/bison's default semantic type; if you don't do anything to change it, that's what bison will assume, and it will declare yylval with that type (in effect, extern int yylval;) in the header file it generates, so that the lexer knows what the semantic type is. If you search the internet you'll probably find a variety of macro hacks suggested to change this, but the correct way to do it with a "modern" bison is to insert the following declaration in your bison file, in the declarations section before the first %%:
%define api.value.type { char* }
Later on, you'll probably find that you want a union type instead of making everything a string. Before you reach that point, you should read the section in the Bison manual on Defining Semantic Values. (In fact, you'd be well-advised to read the Bison manual from the beginning up to that point, including the simple examples in section 2. It's not that long, and it's pretty easy reading.)

Flex and Bison code - syntax error always

First of all I need to say that I am very new to Flex and Bison and I am a bit confused. There is a school project that wants us to create a compiler using Flex and Bison for a CLIPS-like language.
My code has a lot of problems, but the main one is that whatever I type, I get a syntax error when the result should be something else. Ideally it would work fully for the CLIPS language. For example, when I write "4" I get a syntax error. Reading my code may help you understand this better. If I write "test 3 4" it doesn't show a syntax error, but it counts it as an unknown token, and that's wrong again. I'm completely lost. The code is a prototype provided by the school and we need to make some changes. If you have any questions, don't hesitate to ask. Thank you!
P.S.: don't mind the comments; they were written in Greek.
FLEX CODE:
%option noyywrap
/* C code that defines the required header files and variables.
Anything between %{ and %} is copied verbatim into the C file that
Flex will generate. */
%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Header file containing the list of all tokens */
#include "token.h"
/* Definition of the current-line counter */
int line = 1;
%}
/* Names and their corresponding definitions (as regular expressions).
After this point, the names (on the left) can be used in place of the
usually very long and hard-to-read regular expressions. */
/* Regular expressions created according to the language's definitions */
DELIMITER [ \t]+
INTCONST [+-]*[1-9][0-9]*
VARIABLE [?][A-Za-z0-9]*
DEFINITIONS [a-zA-Z][-|_|A-Z|a-z|0-9]*
COMMENTS ^;.*$
/* For each pattern (on the left) that matches, the corresponding code
inside the braces is executed. The return statement allows a numeric
value to be returned through the yylex() function. */
/* If it encounters a delimiter or a comment, it ignores it; if it encounters an integer, a variable, or a definition, it displays it. In every other case it prints that it does not recognize the token, the current line, and the string that was given. */
%%
{DELIMITER} {;}
"bind" { return BIND;}
"test" { return TEST;}
"read" { return READ;}
"printout" { return PRINTOUT;}
"deffacts" { return DEFFACTS;}
"defrule" { return DEFRULE;}
"->" { return '->';}
"=" { return '=';}
"+" { return '+';}
"-" { return '-';}
"*" { return '*';}
"/" { return '/';}
"(" { return '(';}
")" { return ')';}
{INTCONST} { return INTCONST; }
{VARIABLE} { return VARIABLE; }
{DEFINITIONS} { return DEFINITIONS; }
{COMMENTS} {;}
\n { line++; printf("\n"); }
.+ { printf("\tLine=%d, UNKNOWN TOKEN, value=\"%s\"\n",line, yytext);}
<<EOF>> { printf("#END-OF-FILE#\n"); exit(0); }
%%
/* Array of all the tokens, matching the definitions in token.h */
char *tname[11] = {"DELIMITER","INTCONST" , "VARIABLE", "DEFINITIONS", "COMMENTS", "BIND", "TEST", "READ", "PRINTOUT", "DEFFACTS", "DEFRULE"};
BISON CODE:
%{
/* C definitions and declarations. Anything to do with defining or initializing
variables and functions, header files, and #define statements goes here. */
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(char *);
%}
/* Definition of the recognized tokens. */
%token INTCONST VARIABLE DEFINITIONS PLUS NEWLINE MINUS MULT DIV COM BIND TEST READ PRINTOUT DEFFACTS DEFRULE
%%
/* Definition of the grammar rules. Each time a grammar rule matches
the input data, the C code between the braces is executed.
The expected syntax is:
name : rule { C code } */
program:
program expr NEWLINE { printf("%d\n", $2); }
|
;
expr:
INTCONST { $$ = $1; }
| VARIABLE { $$ = $1; } // added the variable
| PLUS expr expr { $$ = $2 + $3; } // added addition as an operation
| MINUS expr expr { $$ = $2 - $3; } // added subtraction as an operation
| MULT expr expr { $$ = $2 * $3; } // added multiplication as an operation
| DIV expr expr { $$ = $2 / $3; } // added division as an operation
| COM { $$ = $1; } // added comments
| DEFFACTS expr { $$ = $2; } // added facts
| DEFRULE expr { $$ = $2; } // added rules
| BIND expr expr { $$ = $2;} // added bind
| TEST expr expr { $$ = $2 ;} // added test
| READ expr expr { $$ = $2 ;} // added read
| PRINTOUT expr expr { $$ = $2 ;} // added printout
;
%%
/* The yyerror function is used for reporting errors. Specifically, it is called
by yyparse when there is a syntax error. In the case below, the
function essentially prints an error message to the screen. */
void yyerror(char *s) {
fprintf(stderr, "Error: %s\n", s);
}
/* The main function, which is also the program's entry point.
In this case it simply calls Bison's yyparse function
to start the syntactic analysis. */
int main(void) {
yyparse();
return 0;
}
TOKEN FILE:
#define DELIMITER 1
#define INTCONST 2
#define VARIABLE 3
#define DEFINITIONS 4
#define COMMENTS 5
#define BIND 6
#define TEST 7
#define READ 8
#define PRINTOUT 9
#define DEFFACTS 10
#define DEFRULE 11
MAKEFILE:
all:
bison -d simple-bison-code.y
flex mini-clips-la.l
gcc simple-bison-code.tab.c lex.yy.c -o B2
./B2
clean:
rm simple-bison-code.tab.c simple-bison-code.tab.h lex.yy.c B2
Your top-level rule is:
program:
program expr NEWLINE
which cannot succeed unless the parser sees a NEWLINE token. But it will never see one, because your lexical scanner never sends one; when it sees a newline, it increments the line count but doesn't return anything.
All your tokens are considered invalid because your lexical scanner uses its own definitions of the token values. You shouldn't do that. The parser generator (bison/yacc) will generate a header file containing the correct definitions; that is, the values it is expecting to see.
There are various other problems, probably more than I noticed. The most important is that you should not call exit(0) in the <<EOF>> rule, since that will mean that the parser can never succeed; it does not succeed until it is passed an EOF token. In fact, you should not normally have an <<EOF>> rule; the default action is to return 0 and that is pretty well the only action which makes sense.
Also, '->' is not a correct C literal. The compiler would have complained about it if you had enabled compiler warnings (-Wall), which you should always do, even if you are compiling generated code.
And your scanner's last pattern, intended to trigger on bad tokens, is .+, which will match the entire line, not just the erroneous character. Since (f)lex scanners accept the pattern with the longest match, most of your other patterns will never match. (Flex usually warns you about unmatchable patterns. Didn't you get such a warning?)
The fallback pattern should be .|\n, although you can use . if you are absolutely sure that every newline will be matched by some rule. I like to use %option nodefault, which will cause flex to warn me if there is some possible input not matched by any rule.

Why is my bison/flex not working as intended?

I have this homework assignment where I have to transform some input into a particular output. The problem I'm having is that I can only convert the first line into the output I need, the other lines return a "syntax error" error.
Additionally, if I change the lines order, no lines are converted so only one particular line is working.
This is my input file:
Input.txt
B0102 Bobi 2017/01/16 V8 1, massage 12.50
J1841 Jeco 20.2 2017/01/17 V8 2, Tosse 2, tosquia 22.50
B2232 Bobi 2017/01/17 Tosse 1, Leptospirose 1, bath 30.00, massage 12.50
B1841 Jeco 21.4 2017/01/18 Leptospirose 1, Giardiase 2
And this is the output I should obtain:
Output
Bobi (B0102) paid 2 services/vaccines 22.50
Jeco (J1841) paid 3 services/vaccines 62.50
Bobi (B2232) paid 4 services/vaccines 62.50
Jeco (B1841) paid 2 services/vaccines 30.00
If I change the line order in the input file, not even the first line is converted.
However, if the order is as I showed above, this is my output:
Bobi (B0102) paid 2 services/vaccines 22.50
syntax error
This is my code:
file.y
%{
#include "file.h"
#include <stdio.h>
int yylex();
int counter = 0;
int vaccineCost = 10;
%}
%union{
char* code;
char* name;
float value;
int quantity;
};
%token COMMA WEIGHT DATE SERVICE VACCINE
%token CODE
%token NAME
%token VALUE
%token QUANTITY
%type <name> NAME
%type <code> CODE
%type <value> VALUE
%type <quantity> QUANTITY
%type <value> services
%start begining
%%
begining: /*empty*/
| animal
;
animal: CODE NAME WEIGHT DATE services {printf("%s (%s) paid %d services/vaccines %.2f\n", $2, $1, counter, $5); counter = 0;}
| CODE NAME DATE services {printf("%s (%s) paid %d services/vaccines %.2f\n", $2, $1, counter, $4); counter = 0;}
;
services: services COMMA SERVICE VALUE {$$ = $1 + $4; counter++;}
| services COMMA VACCINE QUANTITY{$$ = $1 + $4*vaccineCost;counter++;}
| SERVICE VALUE{$$ = $2;counter++;}
| VACCINE VALUE
{$$ = $2*vaccineCost;counter++;}
;
%%
int main(){
yyparse();
return 0;
}
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
file.flex
%option noyywrap
%{
#include "file.h"
#include "file.tab.h"
#include <stdio.h>
#include <string.h>
%}
/*Patterns*/
YEAR 20[0-9]{2}
MONTH 0[1-9]|1[0-2]
DAY 0[1-9]|[1-2][0-9]|3[0-1]
%%
, {return COMMA;}
[A-Z][0-9]{4} {yylval.code = strdup(yytext); return CODE;}
[A-Z][a-z]* {yylval.name = strdup(yytext); return NAME;}
[0-9]+[.][0-9] {return WEIGHT;}
{YEAR}"/"{MONTH}"/"{DAY} {return DATE;}
(banho|massagem|tosquia) {return SERVICE;}
[0-9]+\.[0-9]{2} {yylval.value = atof(yytext);return VALUE;}
(V8|V10|Anti-Rabatica|Giardiase|Tosse|Leptospirose) {return VACCINE;}
[1-9] {yylval.quantity = atoi(yytext);return QUANTITY;}
\n
.
<<EOF>> return 0;
%%
And these are the commands I execute:
bison -d file.y
flex -o file.c file.flex
gcc file.tab.c file.c -o exec -lfl
./exec < Input.txt
Can anyone point me in the right direction or tell me what is wrong with my code?
Thanks, and if my explanation wasn't good enough I'll try my best to explain it better!
There are at least two different problems which cause those symptoms.
Your top-level grammar only accepts at most a single animal:
begining: /*empty*/
| animal
So an input containing more than one line won't be allowed. You need a top-level rule which accepts any number of animals. (By the way, modern bison versions let you write %empty as the right-hand side of an empty production, instead of having to (mis)use a comment.)
The order of your scanner rules means that most of the words you want to recognise as VACCINE will instead be recognised as NAME. Recall that when two patterns match the same token, the first one in the file will win. So with these rules:
[A-Z][a-z]* {yylval.name = strdup(yytext); return NAME;}
(V8|V10|Anti-Rabatica|Giardiase|Tosse|Leptospirose) {return VACCINE;}
Tokens like Tosse, which could match either rule, will be assumed to match the first rule. Only V8 and Anti-Rabatica, which [A-Z][a-z]* doesn't match in full, will fall through to the second rule. So your first input line doesn't trigger this problem, but all the other ones do.
You probably should handle newline characters syntactically, unless you allow treatment records to be split over multiple lines. And be aware that many (f)lex versions do not allow empty actions, as in your last two flex rules. This may cause lexical errors.
And finally
<<EOF>> return 0;
is unnecessary. That's how the scanner handles end-of-file by default. <<EOF>> rules are often wrong or redundant, and should only be used when clearly needed (and with great care).

Lexical & Grammar Analysis using Yacc and Lex

I am fairly new to Yacc and Lex programming, but I am training myself with an analyser for C programs.
However, I am facing a small issue that I haven't managed to solve.
When there is a declaration such as int a,b; I want to save a and b in a simple array. I did manage to do that, but it is saving a bit more than wanted.
It is actually saving "a," or "b;" instead of "a" and "b".
It should have worked, as $1 should only return tID, which is a regular expression recognising only an identifier string. I don't understand why it takes the comma even though it is defined as a token. Does anyone know how to solve this problem?
Here is the corresponding yacc declarations :
Declaration :
tINT Decl1 DeclN
{printf("Declaration %s\n", $2);}
| tCONST Decl1 DeclN
{printf("Declaration %s\n", $2);}
;
Decl1 :
tID
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
compteur++;}
| tID tEQ E
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
pile[compteur]=$3;
compteur++;}
;
DeclN :
/*epsilon*/
| tVIR Decl1 DeclN
And the extract of the Lex file :
separateur [ \t\n\r]
id [A-Za-z][A-Za-z0-9_]*
nb [0-9]+
nbdec [0-9]+\.[0-9]+
nbexp [0-9]+e[0-9]+
"," { return tVIR; }
";" { return tPV; }
"=" { return tEQ; }
{separateur} ;
{id} { yylval.str = yytext; return tID; }
{nb}|{nbdec}|{nbexp} { yylval.nb = atoi(yytext); return tNB; }
%%
int yywrap() {return 1;}
The problem is that yytext is a reference into lex's token scanning buffer, so it is only valid until the next time the parser calls yylex. You need to make a copy of the string in yytext if you want to return it. Something like:
{id} { yylval.str = strdup(yytext); return tID; }
will do the trick, though it also exposes you to the possibility of memory leaks.
Also, in general when writing lex/yacc parsers involving single-character tokens, it is much clearer to use them directly as character constants (e.g. ',', ';', and '=') rather than defining named tokens (tVIR, tPV, and tEQ in your code).

C Compiler using flex

I am making a C compiler using flex on Red Hat Linux, but it's not going well; please help. It's not showing any of these.
Why is it not showing the output of the printf statements?
How do I create a check for libraries and the predefined directives?
keywords auto | break | default | case | for | char | malloc |int |const |continue |do |double |if |else | enum | extern |float | goto |long |void |return |short |static | sizeof|switch| typedef |union|unsigned| signed | volatile| while | do | struct
%%
{keywords}*
{
printf ( "\n\n\nThese are keywords:" ) ; printf( "'%s'\n" , yytext ) ;
}
"#include define"
{
printf ( "\n\n\nThese are predefined directives:" );
printf( "'%s'\n",yytext );
}
"#include <"|"#include \" "
{
printf ( "\n\n\nThese are libraries:");
printf ( "'%s'\n",yytext);
}
""|,|;|:|!
{
printf ("\n\n\n These are puctuation :") ;
printf("'%s'\n",yytext);
}
"<<"|">>"|"<"|">"|"<="|">="|"+"|"-"|"*"|"/"|"{"|"}"|"["|"]"|"&&"|"||"
{
printf("\n\n\nThese are operators:");
printf("'%s'\n",yytext);
}
"//"|"/*"|"*/"
{
printf("\n\n\nThese are comments:");
printf("'%s'\n",yytext);
}
%%
int yywrap(void)
{
return 1;
}
int main (void)
{
yylex();
return 0;
}
Your flex input is pretty broken, and won't actually generate a scanner, just a bunch of (probably very confusing) error messages.
The first issue is that spaces and newlines are very important syntactically for flex: spaces separate patterns from actions, and newlines end actions. So all the spaces in your keyword pattern are causing problems; you need to get rid of them:
keywords auto|break|default|case|for|char|malloc|int|const|continue|do|double|if|else|enum|extern|float|goto|long|void|return|short|static|sizeof|switch|typedef|union|unsigned|signed|volatile|while|do|struct
Then, you can't have a newline between the pattern and action in a rule (otherwise you end up with two patterns and no actions). So you need:
{keywords}* {
printf ( "\n\n\nThese are keywords:" ) ; printf( "'%s'\n" , yytext ) ;
}
and likewise for the rest of the rules.
Once you do that, you can get your code to run through flex and produce a scanner. Now you can give it input and see what it does.
Anything that matches one of the rules will trigger that rule. For example, break will trigger the "keywords" rule, but so will things like casecharswitch or any other string of keywords mashed together. That's because you used a * on that rule.
Anything that doesn't match any of the rules will simply be echoed to the output. Since that probably isn't what you want, you need to make sure that you have rules that match . and \n at the end of the rules, so that every possible input will match some rule.
