I am writing a parser and a scanner on Ubuntu. In my flex file "scanner.l" I have an IDENTIFIER token and a BOOL_LITERAL token. IDENTIFIER is any word and BOOL_LITERAL is either true or false.
In my bison file "parser.y" the grammar should accept a BOOL_LITERAL through the primary production.
However, the code is not working as intended.
Here are all of my files:
scanner.l
%{
#include <string>
#include <vector>
using namespace std;
#include "listing.h"
#include "tokens.h"
%}
%option noyywrap
ws [ \t\r]+
comment (\-\-.*\n)|\/\/.*\n
line [\n]
digit [0-9]
int {digit}+
real {int}"."{int}([eE][+-]?{digit})?
boolean ["true""false"]
punc [\(\),:;]
addop ["+""-"]
mulop ["*""\/"]
relop [="/=">">=""<="<]
id [A-Za-z][A-Za-z0-9]*
%%
{ws} { ECHO; }
{comment} { ECHO; nextLine();}
{line} { ECHO; nextLine();}
{relop} { ECHO; return(RELOP); }
{addop} { ECHO; return(ADDOP); }
{mulop} { ECHO; return(MULOP); }
begin { ECHO; return(BEGIN_); }
boolean { ECHO; return(BOOLEAN); }
end { ECHO; return(END); }
endreduce { ECHO; return(ENDREDUCE); }
function { ECHO; return(FUNCTION); }
integer { ECHO; return(INTEGER); }
real { ECHO; return(REAL); }
is { ECHO; return(IS); }
reduce { ECHO; return (REDUCE); }
returns { ECHO; return(RETURNS); }
and { ECHO; return(ANDOP); }
{boolean} { ECHO; return(BOOL_LITERAL); }
{id} { ECHO; return(IDENTIFIER);}
{int} { ECHO; return(INT_LITERAL); }
{real} { ECHO; return(REAL_LITERAL); }
{punc} { ECHO; return(yytext[0]); }
. { ECHO; appendError(LEXICAL, yytext); }
%%
parser.y
%{
#include <string>
using namespace std;
#include "listing.h"
int yylex();
void yyerror(const char* message);
%}
%error-verbose
%token INT_LITERAL REAL_LITERAL BOOL_LITERAL
%token IDENTIFIER
%token ADDOP MULOP RELOP ANDOP
%token BEGIN_ BOOLEAN END ENDREDUCE FUNCTION INTEGER IS REDUCE RETURNS REAL
%%
function:
function_header optional_variable body ;
function_header:
FUNCTION IDENTIFIER RETURNS type ';' ;
parameters:
parameters ',' |
parameter ;
parameter:
IDENTIFIER ':' type |
;
optional_variable:
variable |
;
variable:
IDENTIFIER ':' type IS statement_ ;
type:
INTEGER |
BOOLEAN |
REAL ;
body:
BEGIN_ statement_ END ';' ;
statement_:
statement ';' |
error ';' ;
statement:
expression |
REDUCE operator reductions ENDREDUCE ;
operator:
ADDOP |
MULOP ;
reductions:
reductions statement_ |
;
expression:
expression ANDOP relation |
relation ;
relation:
relation RELOP term |
term;
term:
term ADDOP factor |
factor ;
factor:
factor MULOP primary |
primary ;
primary:
'(' expression ')' |
INT_LITERAL |
REAL_LITERAL |
BOOL_LITERAL |
IDENTIFIER ;
%%
void yyerror(const char* message)
{
appendError(SYNTAX, message);
}
int main(int argc, char *argv[])
{
firstLine();
yyparse();
lastLine();
return 0;
}
Other associated files:
listing.h
enum ErrorCategories {LEXICAL, SYNTAX, GENERAL_SEMANTIC, DUPLICATE_IDENTIFIER,
UNDECLARED};
void firstLine();
void nextLine();
int lastLine();
void appendError(ErrorCategories errorCategory, string message);
listing.cc
#include <cstdio>
#include <string>
using namespace std;
#include "listing.h"
static int lineNumber;
static string error = "";
static int totalErrors = 0;
static void displayErrors();
void firstLine()
{
lineNumber = 1;
printf("\n%4d ",lineNumber);
}
void nextLine()
{
displayErrors();
lineNumber++;
printf("%4d ",lineNumber);
}
int lastLine()
{
printf("\r");
displayErrors();
printf(" \n");
return totalErrors;
}
void appendError(ErrorCategories errorCategory, string message)
{
string messages[] = { "Lexical Error, Invalid Character ", "",
"Semantic Error, ", "Semantic Error, Duplicate Identifier: ",
"Semantic Error, Undeclared " };
error = messages[errorCategory] + message;
totalErrors++;
}
void displayErrors()
{
if (error != "")
printf("%s\n", error.c_str());
error = "";
}
makefile
compile: scanner.o parser.o listing.o
g++ -o compile scanner.o parser.o listing.o
scanner.o: scanner.c listing.h tokens.h
g++ -c scanner.c
scanner.c: scanner.l
flex scanner.l
mv lex.yy.c scanner.c
parser.o: parser.c listing.h
g++ -c parser.c
parser.c tokens.h: parser.y
bison -d -v parser.y
mv parser.tab.c parser.c
mv parser.tab.h tokens.h
listing.o: listing.cc listing.h
g++ -c listing.cc
Note:
I have to run "makefile", then "bison -d parser.y", and finally "makefile" again. Then I run the command "./compile < incremental1.txt" and get the following error:
[Image: screenshot of the error output]
Please help me understand why I am getting a syntax error.
@SoronelHaetir has certainly identified one of the problems with your parser. But that problem cannot create the syntax error message which appears in your image. [Note 1] Your grammar allows identifiers in exactly the same place as boolean literals, so the fact that true is actually scanned as an identifier will not produce a syntax error in an expression which starts with true and. (In other words, x and... would be parsed just the same.)
The problem is actually your use of 8.E+1 as a numeric literal. Your rule for REAL_LITERAL uses the pattern
{int}"."{int}([eE][+-]?{digit})?
which doesn't match 8.E+1 because no {int} follows the "." character. So when the scanner reaches the input 8.E+1, it produces the INT_LITERAL 8, which is the longest match. When it is asked for the next token, it first sees a ".", but that doesn't match any pattern so it uses the default fallback action (ECHO), and then continues to the next character (E), which matches the IDENTIFIER pattern. And the input
true and 8 E ...
is indeed a syntax error: there is an unexpected identifier following the 8, and that's what bison reports.
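The mismatch described above can be checked outside of flex. The following is a minimal sketch that translates the question's REAL_LITERAL pattern into a POSIX extended regular expression and tests it with regcomp/regexec. The helper names matches_pattern and is_real_literal are hypothetical, and the translation assumes that flex and POSIX agree on this particular pattern, which should hold for the simple constructs used here.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when `text` matches the POSIX extended regex
   `pattern`. The pattern is expected to be anchored with ^ and $. */
bool matches_pattern(const char *pattern, const char *text)
{
    regex_t compiled;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* pattern failed to compile */
    bool matched = regexec(&compiled, text, 0, NULL, 0) == 0;
    regfree(&compiled);
    return matched;
}

/* Assumed ERE translation of  {int}"."{int}([eE][+-]?{digit})?  */
bool is_real_literal(const char *text)
{
    return matches_pattern("^[0-9]+\\.[0-9]+([eE][+-]?[0-9])?$", text);
}
```

With this sketch, is_real_literal("8.E+1") is false because no digit follows the decimal point, while is_real_literal("8.0E+1") is true.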
Aside from fixing the pattern for real literals, you should make sure that you do something sensible with unrecognised characters; flex's default action -- which basically just ignores characters that can't match any pattern -- is not of much use, particularly in debugging (as I think the above explanation demonstrates).
There are a number of other issues with your patterns involving the same misconception about the syntax of character classes as shown in the boolean literal pattern. This indicates to me that you did not attempt to test your lexical scanner before hooking it into your parser. That's an essential step in writing parsers; if your lexical scanner is not returning the tokens you expect it to return, you're going to have a lot of trouble trying to figure out what errors there might be in your grammar.
You might find the debugging techniques outlined in this answer useful. (That post also has links to the flex and bison manuals. Section 6 of the flex manual is a brief but complete guide to the syntax of flex patterns, and you might want to take a few minutes to read it.)
Notes
1. Please copy and paste the text of error messages into your questions rather than using an image showing a screenshot. Images are very hard to read on smartphones, for example, or for people who rely on screen-readers. And it's not possible to copy a part of a screenshot into an answer, which I would have preferred to have done here.
Your boolean pattern should be "true"|"false" not ["true""false"].
Honestly, the way your patterns are set up is just weird. Is there some reason not to use:
...
%%
"true" { /* */ return BOOL_LITERAL; }
"false" { /* */ return BOOL_LITERAL; }
Patterns make sense when you aren't trying to match literals but here you are.
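To see why the bracketed definition does not do what was intended, here is a minimal sketch that compares the two forms using POSIX extended regular expressions. The function names are hypothetical, and the sketch assumes that a POSIX bracket expression treats the quote characters literally in the same way a flex character class does, which is worth confirming against flex itself.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: whole-string match against a POSIX extended regex. */
static bool whole_match(const char *pattern, const char *text)
{
    regex_t compiled;
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool matched = regexec(&compiled, text, 0, NULL, 0) == 0;
    regfree(&compiled);
    return matched;
}

/* The question's definition: a bracket expression, so it matches exactly
   ONE character out of the set  " t r u e f a l s  */
bool matches_bracket_version(const char *text)
{
    return whole_match("^[\"true\"\"false\"]$", text);
}

/* The corrected definition: an alternation of two whole words. */
bool matches_alternation_version(const char *text)
{
    return whole_match("^(true|false)$", text);
}
```

Under these assumptions the bracket version accepts the single character t (or even a lone double quote) but rejects the word true, which is why the longer {id} match wins in the original scanner.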
Related
I'm trying to create a simple Pascal compiler using Flex/Bison and I want to check what semantic values are stored within tokens. I have the following code for flex:
...
{ID} {yylval.stringValue= strdup(yytext); return(ID);}
...
And following code in bison:
...
program: PROGRAM ID LBRACKET identifier_list RBRACKET DELIM declarations subprogram_declarations compound_statement DOT {printf($2);}
...
And following test file:
program example(input, output);
...
Flex and bison recognize everything correctly and the parse succeeds, but when I try to check token values as in the code above, it has no effect:
Starting parse
Entering state 0
Reading a token: Next token is token PROGRAM ()
Shifting token PROGRAM ()
Entering state 1
Reading a token: Next token is token ID ()
Shifting token ID ()
Entering state 3
Is there a way to print the token value inside the (), like token ID (example)? I've checked similar questions and they do it the same way, so maybe I'm just missing something.
P.S. When I enable debug mode for flex it shows that it accepted "example" by rule {ID}, but where is that value stored, and how can I use it later?
Flex and bison communicate semantic values through the semantic union yylval, by default a global variable. (Note 1) If a token has a semantic value, the flex action which reports that token type must set the appropriate member of the semantic union to the token's value, and bison will extract the value and place it on the parser stack.
Bison relies on user declarations to tell which union member is used for the semantic value of tokens and non-terminals (if they have semantic values). So if you have the flex action:
{ID} {yylval.stringValue= strdup(yytext); return(ID);}
one would expect to see the following in the corresponding bison input file:
%union {
/* ... */
char* stringValue;
}
%token <stringValue> ID
The last line tells bison that ID is a token type, and that its associated semantic type is the one with member name stringValue. Subsequently, you can refer to the semantic value of the token and bison will automatically insert the member access, so that if you have the rule:
program: PROGRAM ID LBRACKET identifier_list RBRACKET
DELIM declarations subprogram_declarations compound_statement DOT
{ printf("%s\n", $2); /* Always use a format string in a printf! */ }
The $2 will be replaced with the equivalent of stack[frame_base + 2].stringValue.
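The member access that bison inserts can be pictured with a small stand-alone sketch. Everything here is illustrative: SemanticValue, numberValue and string_value_of are hypothetical names, and bison's real value stack uses its own internal identifiers and layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for the union that bison generates from %union. */
typedef union {
    char *stringValue;
    long  numberValue;
} SemanticValue;

/* Returns the stringValue member of the n-th right-hand-side symbol
   (1-based, like $1, $2, ...). `frame_base` is the index of the slot just
   before the first symbol of the rule on the value stack. */
const char *string_value_of(const SemanticValue *stack, int frame_base, int n)
{
    return stack[frame_base + n].stringValue;
}
```

In this picture, writing $2 in an action for a rule whose second symbol was declared with <stringValue> corresponds to calling string_value_of(stack, frame_base, 2).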
However, there is little point using an action like that in a bison file, since it is easy to use bison's trace facility to see how bison is processing the token stream. When traces are enabled, the token will be recorded when it is first seen by bison, as opposed to the above rule which won't print the ID token's semantic value until the entire program has been parsed.
By default, the trace facility only prints the token type, since Bison has no idea how to print an arbitrary semantic value. However, you can define printer rules for semantic types or for specific tokens or non-terminals. These rules should print the semantic value (without delimiters) to the output stream yyoutput. In such a rule, $$ can be used to access the semantic value (and bison will fill in the member access, as above).
Here's a complete example of a simple language consisting only of function calls:
File printer.y
%{
#include <stdio.h>
#include <stdlib.h> /* for free() */
#include <string.h>
int yylex(void);
%}
%defines
%define parse.trace
%union {
char* str;
long num;
}
%token <str> ID
%token <num> NUM
%type <str> call
/* Printer rules: token-specific, non-terminal-specific, and by type. */
%printer { fprintf(yyoutput, "%s", $$); } ID
%printer { fprintf(yyoutput, "%s()", $$); } call
%printer { fprintf(yyoutput, "%ld", $$); } <num>
/* Destructor rule: by semantic type */
%destructor { free($$); } <str>
%code provides {
void yyerror(const char* msg);
}
%%
prog: %empty
| prog stmt ';'
stmt: %empty
| call { free($1); /* See Note 2 */ }
call: ID '(' args ')' { $$ = $1; /* For clarity; this is the default action */ }
args: %empty
| arglist
arglist: value
| arglist ',' value
value: NUM
| ID { free($1); /* See Note 2 */ }
| call { free($1); /* ditto */ }
%%
int main(int argc, char** argv) {
if (argc > 1 && strcmp(argv[1], "-d") == 0) yydebug = 1;
return yyparse();
}
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}
File printer.l
%{
#include <stdlib.h>
#include <string.h> /* for strdup() */
#include "printer.tab.h"
%}
%option noinput nounput noyywrap nodefault
%%
[[:space:]]+ ;
[[:alpha:]_][[:alnum:]_]* { yylval.str = strdup(yytext); return ID; }
[[:digit:]]+ { yylval.num = strtol(yytext, NULL, 10); return NUM; }
. return *yytext;
To build:
bison -d -t -o printer.tab.c printer.y
flex -o printer.lex.c printer.l
gcc -Wall -ggdb -std=c11 -o printer printer.lex.c printer.tab.c
Notes:
1. The semantic type doesn't have to be a union, but it's very common. See the bison manual for other options.
2. The strdup used to create the token must be matched with a free somewhere. In this simple example, the semantic value of the ID token (and the call non-terminal) are only used for tracing, so they can be freed as soon as they are consumed by some other non-terminal. Bison does not invoke the destructor for tokens which are used by a parsing rule; it assumes that the programmer knows whether the token will or will not be needed. The destructor rules are used for tokens which Bison itself pops off the stack, typically in response to syntax errors.
Bison cannot know by itself where the semantic values should be taken from, so you have to define %printer rules for your tokens. In your case, you have to declare the type of the token and a corresponding printer:
%token <stringValue> ID
%printer { fprintf(yyoutput, "%s", $$); } ID;
Define one printer for each token that you want to inspect in traces; then it should work as you expect.
Hi, I am new to bison and flex and I am trying to create a simple calculator; however, I am getting errors when I try to compile.
The following is my flex .l file (named a.l):
%{
#include "a.tab.h"
%}
number [0-9]+
%%
"+" {return ADD;}
"-" {return SUB;}
"*" {return MUL;}
"/" {return DIV;}
"|" {return ABS;}
{number} { return NUMBER;}
\n {return EOF;}
[ \t] { }
. {printf("Mystery Character %s\n", yytext); }
%%
and the following is my bison .y file(named a.y):
%{
#include <stdio.h>
int yyparse(void);
%}
%token NUMBER ADD SUB MUL DIV ABS EOL
%%
calclist: /*nothing*/ | calclist exp EOL { printf("=%d\n, $1") };
exp: factor
| exp ADD factor { $$ = $1 + $3; }
| exp SUB factor { $$ = $1 - $3; }
;
factor: term |
| factor MUL term { $$ = $1 * $3; }
| factor DIV term { $$ = $1 / $3; }
term: NUMBER
| ABS term { $$ = $2 >= 0? $2 : - $2; }
;
%%
int main(void)
{
return(yyparse());
}
void yyerror(char *s)
{
fprintf(stderr, "Error : Exiting %s\n", s);
}
This is what I write in the console:
flex a.l
bison a.y
gcc a.tab.c -lfl -o a.exe
The error I get is:
a.tab.c:(.text+0x1f2): undefined reference to `yylex'
collect2.exe: error: ld returned 1 exit status
I also get the following warnings:
a.tab.c: In function 'yyparse':
a.tab.c:595:16: warning: implicit declaration of function 'yylex' [-Wimplicit-function-declaration]
# define YYLEX yylex ()
^
a.tab.c:1240:16: note: in expansion of macro 'YYLEX'
yychar = YYLEX;
^~~~~
a.tab.c:1396:7: warning: implicit declaration of function 'yyerror' [-Wimplicit-function-declaration]
yyerror (YY_("syntax error"));
^~~~~~~
a.y: At top level:
a.y:32:6: warning: conflicting types for 'yyerror'
void yyerror(char *s)
^~~~~~~
a.tab.c:1396:7: note: previous implicit declaration of 'yyerror' was here
yyerror (YY_("syntax error"));
^~~~~~~
Would anybody be able to explain to me why these errors/warnings may be occurring?
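Two C files are generated, one by flex and one by bison. The one created by flex is called "lex.yy.c", and you need to compile and link it as well, for example: gcc a.tab.c lex.yy.c -lfl -o a.exe. Also run bison with -d (bison -d a.y) so that the header a.tab.h, which a.l includes, is generated.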
I'm starting on the whole world of Flex and Bison. So I followed a tutorial to write this l file for flex:
%{
#include <stdio.h>
#include <stdlib.h>
void yyerror(char *);
#include "y.tab.h"
%}
%%
/******************** RULES ********************/
/* One letter variables */
[a-z] {
yylval = *yytext - 'a'; // This returns a number between 0 and 25 representing the letter variable.
printf("VAR: %s\n",yytext);
return VARIABLE;
}
/* Integer constants */
[0-9]+ {
yylval = atoi(yytext);
printf("INT: %d\n",yylval);
return INTEGER;
}
/* Operators */
[-+()=/*\n]+ { printf("OPR: %s\n",yytext); return *yytext; /*\n is considered an operator because it signals the end of a statement*/ }
/* This skips white space and tab characters */
[ \t] ;
/* Anything else is not allowed */
. yyerror("Invalid character found");
/***************** SUBROUTINES *****************/
%%
int yywrap(void){
return 1;
}
And this is the grammar:
/***************** DEFINITIONS *****************/
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%{
void yyerror(char *);
int yylex(void);
int sym[26];
%}
%%
/******************** RULES ********************/
program:
program statement '\n'
|
;
statement:
expr { printf("EXPR: %d\n", $1); }
| VARIABLE '=' expr { sym[$1] = $3; }
;
expr:
INTEGER
| VARIABLE { $$ = sym[$1]; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;
%%
/***************** SUBROUTINES *****************/
void yyerror(char *s){
printf("%s\n",s);
}
int main(void) {
yyparse();
return 0;
}
And several questions arise. The first comes up when compiling. This is how I compile:
bison -d bas.y -o y.tab.c
flex bas.l
gcc y.tab.h lex.yy.c y.tab.c -o bas_fe
Which gives me two warnings like this:
bas.y:24:7: warning: incompatible implicit declaration of built-in function ‘printf’
expr { printf("EXPR: %d\n", $1); }
^
bas.y: In function ‘yyerror’:
bas.y:39:4: warning: incompatible implicit declaration of built-in function ‘printf’
printf("%s\n",s);
Now, they are warnings and the print work, but I found it odd, since I have clearly included the libraries for use of the printf function.
My real question arises from my interaction with the program. This is the console output:
x = (3+5)
VAR: x
OPR: =
OPR: (
INT: 3
OPR: +
INT: 5
x
OPR: )
VAR: x
syntax error
Several questions arise from this.
1) Upon inputting x = (3+5) the program printout does not include the ')' Why?
2) When inputting x (expected output would have been 8) only then the ')' appears. Why?
3) And then there is the "syntax error" message. I'm assuming the message is automatically generated within the code of y.tab.c. Can it be changed to something more meaningful? Am I right in assuming that the syntax error occurs because the program found ), then a newline, then the variable, and that this DOES NOT correspond to a program statement as defined by the grammar?
I have clearly included the libraries for use of the printf function.
You included stdio.h in your flex file, but not in your bison file. And the warnings about printf being undeclared are from your bison file, not your flex file.
When you compile multiple files with gcc (or any other C compiler), the files are compiled independently and then linked together. So your command
gcc y.tab.h lex.yy.c y.tab.c -o bas_fe
does not concatenate the three files and compile them as a single unit. Rather, it compiles the three files independently, including uselessly compiling the header file y.tab.h.
What you should do is add a prolog block including #include <stdio.h> to your bas.y file.
[-+()=/*\n]+ {... return *yytext; ...}
This flex pattern matches any number of characters from the set [-+()=/*\n]. So in the input x=(3+5)\n, the )\n is being matched as a single token. However, the action returns *yytext, the first character of yytext, effectively ignoring the \n. Since your grammar requires \n, that creates a syntax error.
Simply remove the repetition operator from the pattern.
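The effect of the repetition operator can be reproduced in plain C. This is only a sketch of the matching behaviour, not of flex itself: strspn stands in for the longest-match rule, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The characters inside the question's operator character class. */
static const char OPERATOR_CHARS[] = "-+()=/*\n";

/* Length matched by the pattern WITH the trailing '+': the longest run of
   operator characters at the start of `input`. */
size_t match_length_with_plus(const char *input)
{
    assert(input != NULL);
    return strspn(input, OPERATOR_CHARS);
}

/* Length matched WITHOUT the '+': at most one operator character. */
size_t match_length_single(const char *input)
{
    assert(input != NULL);
    return (*input != '\0' && strchr(OPERATOR_CHARS, *input) != NULL) ? 1 : 0;
}
```

For the input ")\nx", the first function reports a match of two characters, so the newline is swallowed together with the parenthesis and only ')' is returned. The second reports one character, which leaves the newline to be returned as its own token.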
Can the error message be changed to something more meaningful?
If you have a reasonably modern bison, add the declaration
%error-verbose
to the beginning of your bison file. (From Bison 3.0 on, %error-verbose is deprecated in favor of %define parse.error verbose.)
I'm trying to understand flex/bison, but the documentation is a bit difficult for me, and I've probably grossly misunderstood something. Here's a test case: http://namakajiri.net/misc/bison_charlit_test/
File "a" contains the single character 'a'. "foo.y" has a trivial grammar like this:
%%
file: 'a' ;
The generated parser can't parse file "a"; it gives a syntax error.
The grammar "bar.y" is almost the same, only I changed the character literal for a named token:
%token TOK_A;
%%
file: TOK_A;
and then in bar.lex:
a { return TOK_A; }
This one works just fine.
What am I doing wrong in trying to use character literals directly as bison terminals, like in the docs?
I'd like my grammar to look like "statement: selector '{' property ':' value ';' '}'" and not "statement: selector LBRACE property COLON value SEMIC RBRACE"...
I'm running bison 2.5 and flex 2.5.35 in debian wheezy.
Rewrite
The problem is a runtime problem, not a compile time problem.
The trouble is that you have two radically different lexical analyzers.
The bar.lex analyzer recognizes an a in the input and returns it as a TOK_A and ignores everything else.
The foo.lex analyzer echoes every single character, but that's all.
foo.lex — as written
%{
#include "foo.tab.h"
%}
%%
foo.lex — equivalent
%{
#include "foo.tab.h"
%}
%%
. { ECHO; }
foo.lex — required
%{
#include "foo.tab.h"
%}
%%
. { return *yytext; }
Working code
Here's some working code with diagnostic printing in place.
foo-lex.l
%%
. { printf("Flex: %d\n", *yytext); return *yytext; }
foo.y
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *s);
%}
%%
file: 'a' { printf("Bison: got file!\n"); }
;
%%
int main(void)
{
yyparse();
}
void yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
}
Compilation and execution
$ flex foo-lex.l
$ bison foo.y
$ gcc -o foo foo.tab.c lex.yy.c -lfl
$ echo a | ./foo
Flex: 97
Bison: got file!
$
Point of detail: how did that blank line get into the output? Answer: the lexical analyzer put it there. The pattern . does not match a newline, so the newline was treated as if there was a rule:
\n { ECHO; }
This is why the input was accepted. If you change the foo-lex.l file to:
%%
. { printf("Flex-1: %d\n", *yytext); return *yytext; }
\n { printf("Flex-2: %d\n", *yytext); return *yytext; }
and then recompile and run again, the output is:
$ echo a | ./foo
Flex-1: 97
Bison: got file!
Flex-2: 10
syntax error
$
with no blank lines. This is because the grammar doesn't allow a newline to appear in a valid 'file'.
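The convention behind character-literal tokens can also be sketched without flex: the scanner simply returns the character's own code, and returns 0 at end of input. The function next_char_token below is a hypothetical hand-written stand-in that reads from an in-memory string rather than from yyin.

```c
#include <stddef.h>

/* Hypothetical hand-written scanner over an in-memory string. Like the
   flex rule  . { return *yytext; }  combined with a matching rule for
   newline, it returns each character's own code as the token number and
   returns 0 at end of input, which a yacc-style parser treats as EOF. */
int next_char_token(const char **cursor)
{
    unsigned char c = (unsigned char)**cursor;
    if (c == '\0')
        return 0;
    (*cursor)++;
    return c;
}
```

A grammar that mentions 'a' expects exactly this value from the scanner, so a scanner that only echoes characters and never returns them can never satisfy the rule.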
Now I'm getting something else. When I run bison -d calc.y, a lot of source code is printed to the console (with many m4_define lines), but no file is generated. My code now looks like this:
%{
#define YYSTYPE double
#include <math.h>
%}
%token NUM
%%
input: /* empty */
| input line
;
line: '\n'
| exp '\n' { printf ("\t%.10g\n", $1); }
;
exp: NUM { $$ = $1; }
| exp exp '+' { $$ = $1 + $2; }
| exp exp '-' { $$ = $1 - $2; }
| exp exp '*' { $$ = $1 * $2; }
| exp exp '/' { $$ = $1 / $2; }
/* Exponentiation */
| exp exp '^' { $$ = pow ($1, $2); }
/* Unary minus */
| exp 'n' { $$ = -$1; }
;
%%
/* Lexical analyzer returns a double floating point
number on the stack and the token NUM, or the ASCII
character read if not a number. Skips all blanks
and tabs, returns 0 for EOF. */
#include <ctype.h>
#include <stdio.h>
yylex ()
{
int c;
/* skip white space */
while ((c = getchar ()) == ' ' || c == '\t')
;
/* process numbers */
if (c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", &yylval);
return NUM;
}
/* return end-of-file */
if (c == EOF)
return 0;
/* return single chars */
return c;
}
yyerror (s) /* Called by yyparse on error */
char *s;
{
printf ("%s\n", s);
}
main ()
{
yyparse ();
}
Original Question
I'm trying to create my own development language, but it's hard to get started: I'm getting many errors and I don't know how to solve them. This is my code:
#include <ctype.h>
#include <stdio.h>
yylex ()
{
int c;
/* skip white space */
while ((c = getchar ()) == ' ' || c == '\t')
;
/* process numbers */
if (c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", &yylval);
return NUM;
}
/* return end-of-file */
if (c == EOF)
return 0;
/* return single chars */
return c;
}
main ()
{
yyparse ();
}
The calc.y source code file:
%token NUM
%%
input:
| input line
;
line: '\n'
| exp '\n' { printf ("\t%.10g\n", $1); }
;
exp: NUM { $$ = $1; }
| exp exp '+' { $$ = $1 + $2; }
| exp exp '-' { $$ = $1 - $2; }
| exp exp '*' { $$ = $1 * $2; }
| exp exp '/' { $$ = $1 / $2; }
/* Exponentiation */
| exp exp '^' { $$ = pow ($1, $2); }
/* Unary minus */
| exp 'n' { $$ = -$1; }
;
%%
And now the compiler log:
C:\Documents and Settings\Nathan Campos\Desktop>gcc calc.tab.c -lm -o rpcalc
calc.tab.c: In function `yylex':
calc.tab.c:15: error: `yylval' undeclared (first use in this function)
calc.tab.c:15: error: (Each undeclared identifier is reported only once
calc.tab.c:15: error: for each function it appears in.)
calc.tab.c:16: error: `NUM' undeclared (first use in this function)
What is wrong?
Revised answer
The amended code you provide compiles almost cleanly - you should #include <stdio.h> so that printf() is declared before it is used. (You should also use prototypes for functions - such as yyerror(const char *str), and generally drag the code into the 21st Century.)
It even responds correctly to '1 2 +'.
With a single file, you don't need to use 'bison -d'.
If you are seeing garbage, you need to review your build commands and build environment.
Original answer
Where to begin?
Recommendation: get hold of the O'Reilly book on Lex and Yacc (from the library) or Flex and Bison (an August 2009 update/rewrite - probably not in the library yet). If you need a resource quicker, then I suggest the Unix Version 7 manuals or the GNU Bison manual - both of which are available online. In particular, read the 7th edition documents on Lex and Yacc; you're not trying to do what wasn't covered in the original descriptions (though the C code there pre-dates the C89 standard by a decade or more).
You need to use bison -d to generate a header containing the token numbers. For source file 'zzz.y', this will generate C code 'zzz.tab.c' and 'zzz.tab.h'.
You need to include 'zzz.tab.h' in the main program.
You need to use C99 and should, therefore, have a return type on yylex() and main().
You need to declare yylval. Fortunately, the Bison 'zzz.tab.h' file will do that correctly; it isn't quite as simple as it appears.
You may want to allow for negative numbers in your lexical analyzer (-3.1416). You may want to allow for explicitly positive numbers too (+3.1416).
You probably need to ensure that '$$' and friends are of type double rather than the default type of int (#define YYSTYPE double).
IIRC, yylval is declared by Bison, but you have to supply the type. Without that, the compiler gets confused and can give misleading errors.
You need to define YYSTYPE. In your case, you may get away with "#define YYSTYPE double". Do this in the grammar, in a %{ ... %} block near the top.
%union is also available for declaring YYSTYPE as a union.
This looks like the Bison manual standard rpcalc example, so I'll assume you can look up "YYSTYPE" and "%union" easily enough.
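The points above about YYSTYPE, yylval and explicit return types can be put together in a small sketch. It is a string-based variant of the question's yylex(), not the rpcalc code itself: lex_from_string is a hypothetical name, the NUM value is made up for illustration (bison normally assigns it in the generated header), and yylval is defined locally only because no generated parser is present.

```c
#include <ctype.h>
#include <stdlib.h>

#define YYSTYPE double      /* semantic values are doubles */
enum { NUM = 258 };         /* hypothetical token number for illustration */

YYSTYPE yylval;             /* stand-in for the variable bison declares */

/* Hypothetical string-based variant of the question's yylex(). */
int lex_from_string(const char **cursor)
{
    const char *p = *cursor;
    while (*p == ' ' || *p == '\t')         /* skip blanks and tabs */
        p++;
    if (*p == '.' || isdigit((unsigned char)*p)) {
        char *end;
        yylval = strtod(p, &end);           /* store the value for the parser */
        if (end != p) {
            *cursor = end;
            return NUM;
        }
    }
    if (*p == '\0') {                       /* end of input */
        *cursor = p;
        return 0;
    }
    *cursor = p + 1;                        /* any other single character */
    return (unsigned char)*p;
}
```

Because YYSTYPE is double here, assigning the result of strtod to yylval is type-correct. With the default int type the same assignment would silently truncate the value.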
In your .y file add these lines:
void yyerror(char *errorinfo);
int yylex(void);
You'll also need to define your token types as generated by flex:
%token NUM
and so on.
Edit: Flex and Bison (O'Reilly) is a great resource.
You have not declared the variable yylval anywhere.
It is not a good idea to #define YYSTYPE. It certainly doesn't work with the Bison I tried it with. What you should do is to tell Bison that the values you work with are doubles:
%union {
double dval;
}
%type <dval> exp NUM
Bison will now generate an appropriate YYSTYPE for you.
But I recommend that you look at a tutorial or something. Thomas Niemann's A Compact Guide to Lex & Yacc is a good one, giving examples and step-by-step explanations.
I teach a compiler course, and my experience is that Bison, and grammars in general, can be difficult to get started with through trial and error alone.