Error: undeclared variable $ - c

When I run the Bison program below (with bison file.y), I get the error "missing a declaration type for $2 in 'seq'":
%union {
char cval;
}
%token <cval> AMINO
%token STARTCODON STOPCODON
%%
series: STARTCODON seq STOPCODON {printf("%s", $2);}
seq : AMINO
| seq AMINO
;
%%
I would like to know why I get this error and how I can correctly declare the type of $2.

You haven't told Bison what type seq is, so it doesn't know what to do with $2.
Use the %type directive:
%type <cval> seq
Note that the type used for $2 is a single char, which is not a string as expected by the "%s" format. You need to come up with a way to create your own string from the sequence.
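One possible approach, sketched here under stated assumptions: keep AMINO as <cval>, add a second union member (hypothetically named sval, of type char *), declare %type <sval> seq, and build the string one character at a time with a small C helper. The helper name seq_append is hypothetical and not part of Bison; the intended grammar actions are shown in the comment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for the seq actions, assuming
 *   %union { char cval; char *sval; }
 *   %type <sval> seq
 * and actions like
 *   seq: AMINO      { $$ = seq_append(NULL, $1); }
 *      | seq AMINO  { $$ = seq_append($1, $2); }
 *
 * Appends character c to the heap string s (NULL starts a new string).
 * Returns the possibly moved string, or NULL if allocation fails,
 * in which case the old string is freed. */
char *seq_append(char *s, char c)
{
    size_t len = s ? strlen(s) : 0;
    char *grown = realloc(s, len + 2);   /* room for c and the '\0' */
    if (grown == NULL) {
        free(s);
        return NULL;
    }
    grown[len] = c;
    grown[len + 1] = '\0';
    return grown;
}
```

The series action could then print $2 with "%s" and free it afterwards. Calling strlen on every append is quadratic in the sequence length; storing the length alongside the buffer would avoid that if sequences get long.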

Related

Bison read strings together

I would like to make a project with Flex and Bison.
I have a grammar (that's only a part of mine):
variable_name:
| text {printf("VARIABLE NAME (TEXT) IN BISON: %s\n", $1); $$ = _variable_init($1);}
| character {printf("VARIABLE NAME (CHARACTER) IN BISON: %s\n", $1); $$ = _variable_init($1);}
;
bool_expr:
| true_t { printf("BOOL EXP TRUE\n"); $$ = _bool_expression_init_bool(TRUE); }
| false_t { printf("BOOL EXP FALSE\n"); $$ = _bool_expression_init_bool(FALSE); }
| variable_name { printf("BOOL EXP VARIABLE: %s\n", $1->name); $$ = _bool_expression_init_variable($1); }
| bool_expr eq bool_expr { printf("BOOL EXP ==\n"); $$ = _bool_expression_init_binary_op($1, "==", $3); }
| bool_expr noteq bool_expr { printf("BOOL EXP !=\n"); $$ = _bool_expression_init_binary_op($1, "!=", $3); }
| bool_expr and bool_expr { printf("BOOL EXP AND\n"); $$ = _bool_expression_init_binary_op($1, "&&", $3); }
| bool_expr or bool_expr { printf("BOOL EXP OR\n"); $$ = _bool_expression_init_binary_op($1, "||", $3); }
| '!' bool_expr { printf("BOOL EXP NOT\n"); $$ = _bool_expression_init_unary_op("!", $2); }
| '(' bool_expr ')' { printf("BOOL EXP ()\n"); $$ = _bool_expression_init_with_brackets($2); }
| int_expr eq int_expr { printf("BOOL EXP INT==\n"); $$ = _bool_expression_init_from_int($1, "==", $3); }
| int_expr noteq int_expr { printf("BOOL EXP INT!=\n"); $$ = _bool_expression_init_from_int($1, "!=", $3); }
| int_expr get int_expr { printf("BOOL EXP INT>=\n"); $$ = _bool_expression_init_from_int($1, ">=", $3); }
| int_expr let int_expr { printf("BOOL EXP INT<=\n"); $$ = _bool_expression_init_from_int($1, "<=", $3); }
| int_expr '<' int_expr { printf("BOOL EXP INT<\n"); $$ = _bool_expression_init_from_int($1, "<", $3); }
| int_expr '>' int_expr { printf("BOOL EXP INT>\n"); $$ = _bool_expression_init_from_int($1, ">", $3); }
;
int_expr:
| num { printf("INT EXP NUM\n"); $$ = _int_expression_init_int($1); }
| variable_name { printf("INT EXP VARIABLE\n"); $$ = _int_expression_init_variable($1); }
| int_expr '+' int_expr { printf("INT EXP +\n"); $$ = _int_expression_init_binary_op($1, "+", $3); }
| int_expr '-' int_expr { printf("INT EXP -\n"); $$ = _int_expression_init_binary_op($1, "-", $3); }
| int_expr '*' int_expr { printf("INT EXP *\n"); $$ = _int_expression_init_binary_op($1, "*", $3); }
| int_expr '/' int_expr { printf("INT EXP /\n"); $$ = _int_expression_init_binary_op($1, "/", $3); }
| '(' int_expr ')' { printf("INT EXP ()\n"); $$ = _int_expression_init_bracket($2); }
;
I copied only the important parts (hopefully the issue is in there).
So when I want to parse this
var != 10
as a bool_expr, Bison identifies var as a variable and prints:
VARIABLE NAME (TEXT) IN BISON: var
but then it prints
BOOL EXP VARIABLE: var !=
and it treats "var !=" as the variable, even though there is a rule
int_expr != int_expr
which it never tries.
By the way, variable_name has type Variable* (a struct), int_expr has IntExpression* (a struct), and bool_expr has BoolExpression* (a struct).
I don't know what I should do. It worked when I added six more rules to bool_expr that are almost the same as the last six, with the first int_expr replaced by variable_name, but that's ugly. As far as I knew, Bison searches for the longest match, not the first one.
The log:
VARIABLE NAME (TEXT) IN BISON: var
VARIABLE NAME IN FUNCTION: var //It was called in variable_init function
BOOL EXP VARIABLE: var!=
INT EXP NUM //10
There is nothing in your grammar which allows the parser to distinguish between integer variables and boolean variables. Both are simply variable_name; moreover, either a boolean variable or an integer variable could be followed by a comparison operator. (In the case of var != 10, it would be possible to deduce that var had to be an integer variable, once it is seen that the right-hand side of the operator is an integer. But a != b would still be ambiguous, since the variable names do not carry a marker to indicate their type.)
Bison, by default, produces an LALR(1) parser, which means that every reduction needs to be determined by examining at most one token following the end of the production. (That's what the "(1)" in "LALR(1)" means.) In other words, the parser would have to be able to decide between reducing the variable_name "var" to a bool_expr or an int_expr no later than when it sees the != token. That's not possible, and because it is not possible, Bison should have reported a reduce/reduce conflict, which you are apparently ignoring. (If it didn't, then the rest of your grammar is relevant. But that would be surprising.)
Bison doesn't just give up when it sees a conflict. It makes a somewhat arbitrary choice (in the case of reduce/reduce conflicts, it chooses the reduction which occurs earliest in the grammar, in this case to bool_expr) and builds the parser regardless. Occasionally, this default produces the correct parse, but in most cases the resulting parser is flawed, with behaviour similar to what you are experiencing. So although Bison claims that the conflict report is just a warning, you ignore it at your peril.
As #user253751 notes in a comment, you can ask Bison to produce a GLR parser, which allows arbitrary lookahead (at the cost of slowing down the parse). However, Bison's GLR implementation still requires the grammar to be unambiguous. Ambiguous parses will be detected at run-time, during the parse, and will cause the parse to fail; that would be the case with the ambiguous expression a != b noted above. (You can provide your own ambiguity resolution mechanism. But that's an extremely advanced technique, and in this case it won't work unless the ambiguity resolver has access to semantic information, like the declared type of each variable.)
Without seeing the rest of your grammar, it's hard to know whether type resolution could be done at compile-time (because variables need to be declared with a specific type), or only at run-time, but in neither case can that determination be made by the parser. So if you have boolean variables (and even if you don't), you cannot do better in the parser than to just have one expression non-terminal.
If you require declaration before use, then you could do type-resolution in your reduction actions by consulting the symbol table you are building up. At that point, you can either insert an automatic conversion function, or report an error (depending on whether you feel that automatic conversion is convenient). If you only require declaration (not prior declaration), then you can do the type-resolution after the parse is complete, by walking the AST twice: first to build up the symbol table, and second to resolve types.
If you consider type mismatches to be an error, then reporting the error in a semantic action is a lot more user-friendly. During the parse, it is very difficult to produce an error message more informative than "syntax error at line 10". But in the semantic action, you know precisely what the error is and what produced it, so it's easy to produce error messages like "the comparison at line 10 requires that 'var' be an integer variable, not a boolean variable." Your users will thank you.
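To make the declaration-before-use approach concrete, here is a rough C sketch of a symbol table consulted from a semantic action, together with the kind of error message described above. Everything here (VarType, declare_var, lookup_type, check_operand) is hypothetical; a real implementation would more likely use a hash table holding your existing Variable structs. An action for a single expression non-terminal, such as expr: expr NOTEQ expr, could call check_operand on each variable operand and report the message if it returns 0.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variable types recorded at declaration time. */
typedef enum { TYPE_UNKNOWN, TYPE_INT, TYPE_BOOL } VarType;

#define MAX_SYMBOLS 64

/* A deliberately tiny symbol table. */
static struct {
    const char *name;   /* must outlive the table (e.g. a strdup'd copy) */
    VarType type;
} symtab[MAX_SYMBOLS];
static int symcount = 0;

/* Look up a variable's declared type; TYPE_UNKNOWN if undeclared. */
VarType lookup_type(const char *name)
{
    for (int i = 0; i < symcount; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].type;
    return TYPE_UNKNOWN;
}

/* Record a declaration. Returns 0 on success, -1 if the name is
 * already declared or the table is full. */
int declare_var(const char *name, VarType type)
{
    if (lookup_type(name) != TYPE_UNKNOWN || symcount == MAX_SYMBOLS)
        return -1;
    symtab[symcount].name = name;
    symtab[symcount].type = type;
    symcount++;
    return 0;
}

/* A type name with its article, for use in error messages. */
static const char *type_phrase(VarType t)
{
    switch (t) {
    case TYPE_INT:  return "an integer";
    case TYPE_BOOL: return "a boolean";
    default:        return "an undeclared";
    }
}

/* Check that variable `name`, used in the comparison at `line`, has type
 * `wanted`. Returns 1 if it does; otherwise writes a message into buf
 * (at most size bytes) and returns 0. */
int check_operand(const char *name, VarType wanted, int line,
                  char *buf, size_t size)
{
    VarType actual = lookup_type(name);
    if (actual == wanted)
        return 1;
    snprintf(buf, size,
             "the comparison at line %d requires that '%s' be %s variable, "
             "not %s variable.",
             line, name, type_phrase(wanted), type_phrase(actual));
    return 0;
}
```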
By the way, the usual convention is to use UPPER_CASE for the symbolic names of terminals, so != would usually be named something like NOTEQ or T_NE. But if you are using Bison, you can make your grammar a lot more readable by using quoted aliases for your terminals:
%token EQ "==" NOTEQ "!="
GEQ ">=" LEQ "<="
Then you can use the symbolic names in your lexical analyser:
"!=" { return NOTEQ; }
without forcing whoever reads your grammar to guess what the name means:
expr: expr "==" expr
| expr "!=" expr
| ...
You must use double quotes; '+' is quite different.

Can't make arithmetic operator precedence with parenthesis in bison

I'm currently building a Python parser and I'm at the definition of arithmetic expressions. The rules for arithmetic expressions work properly until I add parentheses.
Here is the starting point:
%token TOKEN_ARITH_ADD TOKEN_ARITH_SUB
%token TOKEN_ARITH_MUL TOKEN_ARITH_DIV TOKEN_ARITH_MOD
%token TOKEN_ARITH_POWER
%token TOKEN_ASSIGN
%token TOKEN_PAREN_OPEN TOKEN_PAREN_CLOSE
and then:
arith_expr: factor
| arith_expr TOKEN_ARITH_ADD factor { $$ = ast_init_arith_op($3, "+", $1); };
| arith_expr TOKEN_ARITH_SUB factor { $$ = ast_init_arith_op($3, "-", $1); };
| TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE { $$ = $2; };
;
factor: power { $$ = ast_init_arith_op($1, NULL, NULL); };
| factor TOKEN_ARITH_MUL power { $$ = ast_init_arith_op($3, "*", $1); };
| factor TOKEN_ARITH_DIV power { $$ = ast_init_arith_op($3, "/", $1); };
| factor TOKEN_ARITH_MOD power { $$ = ast_init_arith_op($3, "%", $1); };
;
power: term
| power TOKEN_ARITH_POWER term { $$ = ast_init_arith_op($3, "**", $1); }
term: identifier;
| literal_int;
| literal_float;
The result is that if, for instance, I enter this:
myVar = (a + b) * 2
I get error: syntax error, unexpected TOKEN_ARITH_MUL, expecting TOKEN_EOL.
So I tried changing %token to %left for the first three, with the same result.
I also tried changing the %token for the assignment to %right, but I got an error at compile time (error: rule given for assign, which is a token), which in retrospect makes sense.
It looks like TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE collapses to an arith_expr and the assignment kicks in right away. What am I doing wrong?
According to your grammar, a multiplication operator can appear only between a factor and a power. An expression enclosed in parentheses is neither and cannot be reduced to either. As far as the part of the grammar presented goes, it is an arith_expr.
#n.m.'s comment is correct: you put the rule for a parenthesized expression in the wrong place. It should be a term, not an arith_expr. However, your followup comment suggests that you misunderstood. Do not change the production. Just move it, as is, to be one of the alternatives for term:
term: identifier
| literal_int
| literal_float
| TOKEN_PAREN_OPEN arith_expr TOKEN_PAREN_CLOSE
;
That allows a parenthesized expression to appear as a complete expression by itself or as an operand of any operator.
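To illustrate why the placement matters, here is a small recursive-descent evaluator in C. This is a different parsing technique from Bison's LALR(1), used only as an analogy: each function mirrors one layer of the grammar (arith_expr, factor, power, term), and because term handles the parenthesized case, (3 + 4) * 2 parses with the parenthesized part as an operand of *. It is a sketch that assumes non-negative integer literals only (no identifiers, no error reporting), and, like the grammar in the question, it treats ** as left-associative.

```c
#include <ctype.h>

/* Hypothetical input cursor shared by the parsing functions below. */
static const char *cursor;

static long arith_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

/* term: NUMBER | '(' arith_expr ')'
 * The parenthesized case lives at the lowest level, so it can be an
 * operand of any operator. */
static long term(void)
{
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                     /* consume '(' */
        long v = arith_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;                 /* consume ')'; error handling omitted */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*cursor))
        v = v * 10 + (*cursor++ - '0');
    return v;
}

/* power: term | power "**" term  (left-recursive, as in the question) */
static long power(void)
{
    long v = term();
    for (;;) {
        skip_spaces();
        if (cursor[0] != '*' || cursor[1] != '*')
            return v;
        cursor += 2;                  /* consume "**" */
        long exponent = term();
        long result = 1;
        for (long i = 0; i < exponent; i++)
            result *= v;
        v = result;
    }
}

/* factor: power | factor ('*' | '/' | '%') power */
static long factor(void)
{
    long v = power();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '*' && op != '/' && op != '%')
            return v;
        cursor++;
        long rhs = power();
        if (op == '*')
            v *= rhs;
        else if (op == '/')
            v /= rhs;                 /* no division-by-zero check */
        else
            v %= rhs;
    }
}

/* arith_expr: factor | arith_expr ('+' | '-') factor */
static long arith_expr(void)
{
    long v = factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            v += factor();
        } else if (*cursor == '-') {
            cursor++;
            v -= factor();
        } else {
            return v;
        }
    }
}

/* Evaluate an expression made of non-negative integer literals. */
long evaluate(const char *src)
{
    cursor = src;
    return arith_expr();
}
```

If the parenthesized case were handled in arith_expr instead of term, factor would never see it as an operand, which is the same reason the Bison grammar rejected (a + b) * 2.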

How to find a substring

Given a text file containing a string, I would like to find some specific substrings/sequences inside this string.
Bison file .y (Declaration+Rules)
%token <cval> AMINO
%token STARTCODON STOPCODON
%type <cval> seq
%%
series: STARTCODON seq STOPCODON {printf("%s", $2);}
seq: AMINO
| seq AMINO
;
%%
Here I want to print every sequence between STARTCODON and STOPCODON
Flex file .l (Rules)
%%
("ATG")+ {return STARTCODON;}
("TAA"|"TAG"|"TGA")+ {return STOPCODON;}
("GCT"|"GCC"|"GCA"|"GCG")+ {yylval.cval = 'A';
return AMINO;}
("CGT"|"CGC"|"CGA"|"CGG"|"AGA"|"AGG")+ {yylval.cval = 'R';
return AMINO;}
.
.
.
[ \t]+ /*ignore whitespace*/
\n /*ignore end of line*/
. {printf("-");}
%%
When I run the code I get only the output of the rule . {printf("-");}.
I am new to Flex/Bison, and I suspect that:
The bison rule series: STARTCODON seq STOPCODON {printf("%s", $2);} is not correct.
Flex doesn't correctly divide the entire string into tokens of 3 characters.
EDIT:
(Example) Input file: DnaSequence.txt:
Input string: cccATGAATTATTAGzzz, where the lowercase characters (ccc, zzz) produce the (correct) output -, ATG is the STARTCODON, AATTAT is a sequence of two AMINO tokens (AAT TAT), and TAG is the STOPCODON.
This input string produces the (wrong) output ---.
EDIT:
Following the suggestions of #JohnBollinger, I have added <<EOF>> {return ENDTXT;} to the Flex file, and the rule finalseries: series ENDTXT; to the Bison file.
Now it prints yyerror's error message, indicating a parse error.
I suppose that we need a STARTTXT token, but I don't know how to implement it.
I am new to Flex/Bison, and I suspect that:
The bison rule series: STARTCODON seq STOPCODON {printf("%s", $2);} is not correct.
The rule is syntactically acceptable. It would be semantically correct if the value of token 2 were a C string, in which case it would cause that value to be printed to the standard output, but your Flex file appears to assume that type <cval> is char, which is not a C string, nor directly convertible to one.
Flex doesn't correctly divide the entire string into tokens of 3 characters.
Your Flex input looks OK to me, actually. And the example input / output you present indicates that Flex is indeed recognizing all your triplets from ATG to TAG, else the rule for . would be triggered more than three times.
The datatype problem is a detail that you'll need to sort out, but the main problem is that your production for seq does not set its semantic value. How that results in (seemingly) nothing being printed when the series production is used for a reduction depends on details that you have not disclosed, and probably involves undefined behavior.
If <cval> were declared as a string (char *), and if your lexer set its values as strings rather than as characters, then setting the semantic value might look something like this:
seq: AMINO { $$ = calloc(MAX_AMINO_ACIDS + 1, 1); /* check for allocation failure ... */
strcpy($$, $1); }
| seq AMINO { $$ = $1; strcat($$, $2); }
;
You might consider sticking with char as the type for the semantic value of AMINO, and defining seq to have a different type (i.e. char *). That way, your changes could be restricted to the grammar file. That would, however, call for a different implementation of the semantic actions in the production for seq.
Finally, note that although you say
Here I want to print every sequence between STARTCODON and STOPCODON
your grammar, as presented, has series as its start symbol. Thus, once it reduces the token sequence to a series, it expects to be done. If additional tokens follow (say those of another series) then that would be erroneous. If that's something you need to support then you'll need a higher-level start symbol representing a sequence of multiple series.

Shift/reduce conflicts in yacc

I am learning lex and yacc programming, and this yacc program to validate and evaluate arithmetic expressions is giving me 10 shift/reduce conflicts. Can you point out what's wrong with this program?
This is 611.y
%{
#include<stdio.h>
#include<stdlib.h>
int flag=1;
%}
%token id num
%left '(' ')'
%left '+' '-'
%left '/' '*'
%nonassoc UMINUS
%%
stmt:
expression { printf("\n valid exprn");}
;
expression:
'(' expression ')'
| '(' expression {printf("\n Syntax error: Missing right paranthesis");}
| expression '+' expression {printf("\nplus recog!");$$=$1+$3;printf("\n %d",$$);}
| expression '+' { printf ("\n Syntax error: Right operand is missing ");}
| expression '-' expression {printf("\nminus recog!");$$=$1-$3;printf("\n %d",$$);}
| expression '-' { printf ("\n Syntax error: Right operand is missing ");}
| expression '*' expression {printf("\nMul recog!");$$=$1*$3;printf("\n %d",$$);}
| expression '*' { printf ("\n Syntax error: Right operand is missing ");}
| expression '/' expression {printf("\ndivision recog!");if($3==0) printf("\ndivision can't be done, as divisor is zero.");else {$$=$1/$3;printf("\n %d",$$);}}
| expression '/' { printf ("\n Syntax error: Right operand is missing ");}
| expression '%' expression
| expression '%' { printf ("\n Syntax error: Right operand is missing ");}
| id
| num
;
%%
main()
{
printf(" Enter an arithmetic expression\n");
yyparse();
}
yyerror()
{
printf(" Invalid arithmetic Expression\n");
exit(1);
}
This is 611.l
%{
#include "y.tab.h"
#include<stdio.h>
#include<ctype.h>
extern int yylval;
int val;
%}
%%
[a-zA-Z][a-zA-Z0-9]* {printf("\n enter the value of variable %s:",yytext);scanf("%d",&val);yylval=val;return id;}
[0-9]+[.]?[0-9]* {yylval=atoi(yytext);return num;}
[ \t] ;
\n {return 0;}
. {return yytext[0];}
%%
int yywrap()
{
return 1;
}
When I compile the code like this
lex 611.l
yacc -d 611.y
It gives me
yacc:10 shift/reduce conflicts.
Please help me out here.
Two things are wrong:
1. The precedence of '%' is missing; add it to the line with '/' and '*' (%left '/' '*' '%').
2. The '(' expression error handler is ambiguous (in an expression such as (4*(2+3)+5*7 there are many places where the missing parenthesis could be inserted), and it is in fact in conflict with the normal '(' expression ')' rule. It is non-trivial to make such a handler work. I recommend removing it and relying on yacc's built-in error handling.
Simple error handling can be implemented like this:
stmt:
expression { printf("\n valid exprn");}
| error { printf(" Invalid arithmetic Expression\n"); }
;
expression:
'(' expression ')'
| '(' error ')' { printf(" Invalid arithmetic Expression\n"); }
| ... /* all the rest */
You won't need any of the other error handlers either.

bison: why is the result wrong when printing a constant in an action?

I have this grammar:
%token T_SHARE
%token T_COMMENT T_PUBLIC T_WRITEABLE T_PATH T_GUESTOK T_VALID_USERS
T_WRITE_LIST T_CREATE_MODE T_DIRECTORY_MODE
%union
{
int number;
char *string;
}
%token <string> T_STRING
%token <number> T_NUMBER T_STATE
%%
parameters:
|parameters parameter
;
parameter:
section_share
|comment
....
section_share:
'[' T_SHARE ']' {section_print(T_SHARE);}
;
comment:
T_COMMENT '=' T_STRING {print(2);parameter_print(T_COMMENT);}
;
the function print is:
void print(int arg)
{
printf("%d\n", arg);
}
but instead of the argument 2, print outputs other values, such as "8508438", with no apparent pattern. Why?
It's very hard to understand what you are trying to ask, but I think you are confusing tokens' numeric codes with their semantic values. In particular, there is nothing special about the print(2) call in the action associated with your 'comment' rule. It is copied literally to the generated parser, so, given the definition of the print() function, a literal '2' should be printed each time that rule fires. I think that's what you say you observe.
If instead you want to print the semantic value associated with a symbol in the rule, then the syntax has the form $n, where the number after the dollar sign is the number of the wanted symbol in the rule, counting from 1. Thus, in the 'comment' rule, the semantic value associated with the T_STRING symbol can be referenced as $3. For example:
comment:
T_COMMENT '=' T_STRING { printf("The string is %s\n", $3); }
;
Semantic values of primitive tokens must be set by your lexical analyzer to be available; semantic values of non-terminals must be set by actions in your grammar. Note also that mid-rule actions get included in the count.
Although token symbols such as your T_COMMENT can be used directly in actions, it is not typically useful to do so. These symbols will be resolved by the C preprocessor to numbers characteristic of the specific symbol. The resulting token codes have nothing to do with the specific values parsed.
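The difference between a token code and a semantic value can be sketched in plain C. The enum and union below are hand-written stand-ins for part of what a Bison-generated header typically contains; Bison assigns the codes itself (usually starting at 258, in declaration order), so the numbers here are illustrative only and the real codes for your grammar will differ. The function describe_string_token is hypothetical.

```c
#include <stdio.h>

/* Illustrative stand-in for the generated token codes (abbreviated). */
enum yytokentype { T_SHARE = 258, T_COMMENT = 259, T_STRING = 260 };

/* Mirrors the %union in the question. */
typedef union {
    int number;
    char *string;
} YYSTYPE;

/* Format a T_STRING token's code next to its semantic value.
 * The code is the same constant for every T_STRING token; only the
 * value, which the lexer stores in yylval, changes from token to token. */
int describe_string_token(char *buf, size_t size, YYSTYPE value)
{
    return snprintf(buf, size, "code %d, value \"%s\"",
                    T_STRING, value.string);
}
```

Passing T_COMMENT or T_SHARE to a function, as section_print(T_SHARE) does, therefore always passes the same integer, no matter what text was parsed.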
