issue with the function definition in my grammar - c

I have an issue with the function definition in my C grammar, which can be found here http://www.archive-host.com/files/1959635/24fe084677d7655eb57ba66e1864081450017dd9/cAST.txt: function definitions are not recognized correctly, and I can't multiply a function call by something (as in n*factorielle(n-1)).
The code I am trying to parse is this:
int factorielle(int n)
{ int x;
if ( n == 0)
return 1;
else return n*factorielle(n-1);
}
The function definition is this one :
function_definition
: declaration_specifiers declarator compound_statement
| declarator compound_statement
;
declaration_specifiers should match int and declarator should match factorielle(int n). To achieve this, I replaced this:
direct_declarator
: ID ((direct_declarator '[' ']') | (direct_declarator '(' parameter_type_list ')') | (direct_declarator '(' identifier_list ')') | (direct_declarator '(' ')') )*
with
direct_declarator
: ID ((direct_declarator '[' ']') | (direct_declarator '(' parameter_type_list ')') | (direct_declarator '(' identifier_list ')') | (direct_declarator '(' ')') | '(' parameter_type_list ')' )*
But it does not help much.
As for the multiplication, I don't know how to handle it without introducing a conflict.
Is there a way to fix this, please?

You're likely to have a difficult time parsing real C code using a pure grammar with pure ANTLR.
The reason is that certain declarations look like legitimate executable statements. (While the referenced answer seems to be about LR(1) parsers, it is really about parsers that cannot handle ambiguity; ANTLR cannot).
The only way to tell them apart is to use context information available from earlier symbol declarations. So you will have to collect symbol types as you parse, and inspect that information in the grammar rule reductions to decide whether such instances are statements or declarations. (I don't know how one implements this in ANTLR, although I believe it to be possible).
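As a rough illustration of the kind of context information involved (not ANTLR-specific, and with hypothetical names such as declare_type_name and classify_identifier), here is a minimal sketch in C of a table of typedef names. A statement like T * x; is a declaration if T was introduced by a typedef and a multiplication otherwise, so the parser would consult such a table whenever it sees an identifier:

```c
#include <string.h>

#define MAX_TYPE_NAMES 64
#define MAX_NAME_LEN   32

/* What the parser should treat an identifier as. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

/* Names introduced by typedef so far (hypothetical, fixed-size table). */
static char type_names[MAX_TYPE_NAMES][MAX_NAME_LEN];
static int type_name_count = 0;

/* Returns 1 if name was previously recorded as a type name, else 0. */
int is_type_name(const char *name) {
    for (int i = 0; i < type_name_count; i++)
        if (strcmp(type_names[i], name) == 0)
            return 1;
    return 0;
}

/* Records a typedef name. Returns 1 on success,
   0 if the table is full or the name is too long. */
int declare_type_name(const char *name) {
    if (is_type_name(name))
        return 1;
    if (type_name_count == MAX_TYPE_NAMES || strlen(name) >= MAX_NAME_LEN)
        return 0;
    strcpy(type_names[type_name_count++], name);
    return 1;
}

/* Called for every identifier: the answer decides whether
   "T * x;" starts a declaration or is an expression statement. */
enum token_kind classify_identifier(const char *name) {
    return is_type_name(name) ? TOK_TYPE_NAME : TOK_IDENTIFIER;
}
```

This sketch ignores scoping (a real C front end would need one table per block, since an inner declaration can hide a typedef name), so treat it only as a starting point.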

I may have found a solution to the first part of the issue by replacing
compound_statement
: '{' '}'
| '{' statement_list '}'
;
with
compound_statement
: '{' '}'
| '{' statement_list '}'
| '{' external_declaration+ '}'
;
and adding this to direct_declarator:
| ID '(' parameter_type_list ')'
But I don't know if it will bring some conflicts.

Related

How can I know if a variable was already declared using lex/yacc?

I have to design an original programming language and provide a syntactic analyzer for it. Now I'm at the point where I should check whether a variable was already declared; if it was, I shouldn't be able to declare it again. How can I do this? With an array? (How?)
This is what I've done so far.
lex:
%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
%}
%option noyywrap
%%
"/*"([^*]|\*+[^*/])*\*+"/" ;
"float"|"char"|"string" {return TYPE;}
"int" {return INTTYPE;}
"for" {return FOR;}
"while" {return WHILE;}
"if" {return IF;}
"else" {return ELSE;}
"bool" {return BOOLTYPE;}
"true"|"false" {return BOOLVALUE;}
"++"|"--" {return INCRDECR;}
\"[^\"]*\" {return STRINGVALUE;}
"scat"|"scmp"|"scpy"|"slen" {return STRINGFUNCTION;}
"protected"|"private"|"public" { return CLASSTYPE;}
"class" {return CLASS;}
"eval" {return EVAL;}
[1-9][0-9]*|0 {yylval = atoi(yytext); return INTVALUE;}
[A-Za-z][A-Za-z0-9]* {yylval = strdup(yytext); return ID;}
"=" {return ASSIGN;}
"<="|"<"|">="|">"|"=="|"!=" {return COMP;}
"begin_prog" {return BGIN;}
"end_prog" {return END;}
[ \t] ;
\n {yylineno++;}
. {return yytext[0];}
yacc:
%{
#include <stdio.h>
extern FILE* yyin;
extern char* yytext;
extern int yylineno;
%}
%token TYPE INTTYPE FOR WHILE IF ELSE BOOLTYPE BOOLVALUE INCRDECR STRINGVALUE STRINGFUNCTION CLASSTYPE CLASS EVAL INTVALUE ID ASSIGN COMP BGIN END
%start s
%left '.'
%left ','
%left ';'
%left '+' '-'
%left '*' '/' '%'
%%
s: declarations block {printf ("correct input \n ");}
;
declarations : declaration ';'
| declarations declaration ';'
;
declaration : TYPE ID
|BOOLTYPE ID ASSIGN BOOLVALUE
|INTTYPE ID ASSIGN INTVALUE
|INTTYPE ID
|TYPE ID '(' list_param ')'
|TYPE ID '[' INTVALUE ']'
|INTTYPE ID '(' list_param ')'
|INTTYPE ID '[' INTVALUE ']'
|BOOLTYPE ID
;
block: BGIN blockinstr END
;
blockinstr: blockinstr listfiw
| blockinstr classs
| blockinstr fdeclaration
| listfiw
| listinstr
| stringg
| classs
| fdeclaration
;
fdeclaration : TYPE EVAL '(' list_paramf ')' '{' listinstr '}'
| INTTYPE EVAL '(' list_paramf ')' '{' listinstr '}'
| TYPE ID '(' list_param ')' '{' listinstr '}'
| INTTYPE ID '(' list_param ')' '{' listinstr '}'
;
list_param : parameter
| list_param ',' parameter
;
parameter : TYPE ID
| parameterf
;
list_paramf : parameterf
| list_paramf ',' parameterf
;
parameterf : INTTYPE ID
;
listfiw : iff
|forr
|whilee
;
iff: IF '(' ID COMP INTVALUE ')' '{' listinstr '}'
| IF '(' ID COMP ID ')' '{' listinstr '}'
| IF '(' ID ')' '{' listinstr '}'
| IF '(' ID COMP INTVALUE ')' '{' listinstr '}' ELSE '{' listinstr '}'
| IF '(' ID COMP ID ')' '{' listinstr '}' ELSE '{' listinstr '}'
| IF '(' ID ')' '{' listinstr '}' ELSE '{' listinstr '}'
| IF '(' ID COMP BOOLVALUE ')' '{' listinstr '}'
| IF '(' ID COMP BOOLVALUE ')' '{' listinstr '}' ELSE '{' listinstr '}'
;
forr: FOR '(' ID ASSIGN INTVALUE ';' ID COMP INTVALUE ';' ID INCRDECR ')' '{' listinstr '}'
| FOR '(' ID ASSIGN INTVALUE ';' ID COMP INTVALUE ';' ID INCRDECR ')' '{' FOR '(' ID ASSIGN INTVALUE ';' ID COMP INTVALUE ';' ID INCRDECR ')' '{' listinstr '}' '}'
;
whilee: WHILE '(' ID COMP INTVALUE ')' '{' listinstr '}'
| WHILE '(' INTVALUE ')' '{' listinstr '}'
;
listinstr : instruction ';'
| listinstr instruction ';'
;
instruction: ID ASSIGN BOOLVALUE
| ID ASSIGN operations
| ID '(' operations ')'
;
operations: operations '+' operations
| operations '*' operations
|operations '-' operations
|operations '/' operations
|operations '%' operations
|'(' operations ')'
|ID '[' INTVALUE ']'
|ID '(' operations ')'
|INTVALUE
|ID
;
stringg : stringg STRINGFUNCTION '(' ID ')' ';'
| stringg STRINGFUNCTION '(' ID ',' ID ')' ';'
| STRINGFUNCTION '(' ID ')' ';'
| STRINGFUNCTION '(' ID ',' ID ')' ';'
;
classs: CLASS ID '{' classlists '}'
;
classlists: classlist
| classlists classlist
;
classlist: CLASSTYPE ':' declarations
;
%%
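int yyerror(char * s){
printf("err: %s line:%d\n",s,yylineno);
return 0;
}
int main(int argc, char** argv){
if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
yyin=fopen(argv[1],"r");
if (!yyin) { perror(argv[1]); return 1; }
yyparse();
fclose(yyin);
return 0;
}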
This works for declarations of any type.
I hope you can help me with my problem.
Thanks!!
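One common approach is to keep a symbol table and consult it in the action of each declaration rule. The following is only a minimal sketch in C using a linked list; the names is_declared and declare_variable are hypothetical, and it assumes the semantic value of ID is a char* (which would need a %union or equivalent in the yacc file, since the default yylval type is int):

```c
#include <stdlib.h>
#include <string.h>

/* One entry per declared variable (hypothetical minimal symbol table). */
struct symbol {
    char *name;
    struct symbol *next;
};

static struct symbol *symbols = NULL;

/* Returns 1 if name has already been declared, else 0. */
int is_declared(const char *name) {
    for (struct symbol *s = symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return 1;
    return 0;
}

/* Tries to declare name.
   Returns 1 if it was added, 0 if it was already declared,
   -1 if memory could not be allocated. */
int declare_variable(const char *name) {
    if (is_declared(name))
        return 0;
    struct symbol *sym = malloc(sizeof *sym);
    if (sym == NULL)
        return -1;
    sym->name = malloc(strlen(name) + 1);
    if (sym->name == NULL) {
        free(sym);
        return -1;
    }
    strcpy(sym->name, name);
    sym->next = symbols;   /* push on the front of the list */
    symbols = sym;
    return 1;
}
```

Under those assumptions, an action such as { if (declare_variable($2) == 0) yyerror("variable already declared"); } on the TYPE ID and INTTYPE ID alternatives would report a redeclaration. An array works too; the list just avoids choosing a maximum size up front.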

Flex/Bison start condition

How can I enable a start condition at the beginning of a rule and disable it at the end? I have to ignore whitespace within some bison rules only.
How can I ignore whitespace inside nested brackets?
define_directive:
DEFINE '(' class_name ')'{ ... }
;
I'm trying to write a parser for this sample code with some more rules.
#/*
* #Template Family
* #Description sample script template for Mate Programming language
* (multi-line comment)
*/
#namespace(sample)
#require(String fatherName)
#require(String motherName)
#require(Array childrenNames)
#define(Family : Template) #// end of header anything can go in body section below (comment)
Family Description
==================
Father's Name: #(fatherName)
Mother's Name: #(motherName)
Number of child: #(childrenNamesCount,0) #// valuation operator is null safe (comment)
List of children's names
------------------------
#foreach(childName:childrenNames)
> #(childName)
#empty
> there is no child name to display.
#end
##(varName) #// this should not be interpreted because escaped with # (comment)
The lexer and parser are partially implemented. My problem is how to deal with whitespace inside statement keywords like #foreach and #require.
Whitespace should be ignored for these.
Desired sample output:
Family Description
==================
Father's Name: Mira
Mother's Name: James
Number of child: 0
List of children's names
------------------------
> there is no child name to display.
##(varName)
Bison file content:
command:
fileword
| valuation
| alternative
| loop
| command_directive
;
fileword:
tokenword { scriptlangy_echo(yytext,"fileword.tokenword"); }
| MAGICESC { scriptlangy_echo("#","fileword.MAGICESC"); }
;
tokenword:
IDENTIFIER | NUMBER | STRING_LITERAL | WHITESPACE
| INC_OP | DEC_OP | AND_OP | OR_OP | LE_OP | GE_OP | EQ_OP | NE_OP | L_OP | G_OP
| ';' | ',' | ':' | '=' | ']' | '.' | '&' | '[' | '!' | '~' | '-' | '+' | '*' | '/' | '%' | '^' | '|' | ')' | '}' | '?' | '{' | '('
;
valuation:
'#' '(' expression ')' {
fprintf(yyout, "<val>");
}
| '#' '(' expression ',' default_value ')' {
fprintf(yyout, "<val>");
}
;
loop:
for_loop
| foreach_loop
| while_loop
;
while_loop:
WHILE '(' expression ')' end_block
| WHILE '(' expression ')' commands end_block
;
for_loop:
FOR '(' expression_statement expression_statement expression')' end_block
| FOR '(' expression_statement expression_statement expression')' commands end_block
;
foreach_loop:
foreach_block end_block
| foreach_block empty_block end_block
;
foreach_block:
FOREACH '(' IDENTIFIER ')'
| FOREACH '(' IDENTIFIER ':' expression')' commands
;
The key part of your question seems to be this:
I have to ignore whitespace with some bison rules only. How to ignore
whitespace inside nested brackets.
As I remarked in comments, your implementation idea of somehow doing this by having your parser rules manipulate scanner start conditions is pretty much a non-starter. Forget about that.
Since evidently your scanner does not, in general, ignore whitespace, it must emit tokens that represent whitespace, or perhaps tokens that represent something else plus whitespace (ugly). If it emits whitespace tokens then the thing to do is simply to account for them in your grammar rules. This is completely possible. In fact, you can build a parser for any context-free language on top of a scanner that just returns every character as its own token. The scanner / parser dichotomy is a functional and conceptual convenience, not a necessity.
For example, then, suppose we want to be able to parse numeric array literals, formed as a nonempty, comma-delimited list of decimal numbers enclosed in curly braces, with optional whitespace around commas and inside the braces. Suppose further that we have these terminal symbols to work with:
OPEN // open brace
CLOSE // close brace
NUM // maximal sequence of one or more decimal digits
COMMA // a comma
WS // a maximal run of whitespace
We might then write these rules:
array: array_start array_elements CLOSE;
array_start: OPEN
| OPEN WS
;
array_elements: array_element
| array_elements array_separator array_element
;
array_element: NUM
| NUM WS
;
array_separator: COMMA
| COMMA WS
;
There are, of course, many other ways to set up the details, but, generally speaking, this is how you handle whitespace with parser rules: not by ignoring it, but by accepting it.
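To make the idea concrete outside of bison, here is a small hand-written recognizer in C for the same array grammar. It is only a sketch: the token enum, the END_OF_INPUT marker and the function names are made up for illustration, and the token sequence is assumed to come from a scanner that emits WS tokens as described above.

```c
/* Terminal symbols, plus a hypothetical end-of-input marker. */
enum token { OPEN, CLOSE, NUM, COMMA, WS, END_OF_INPUT };

static const enum token *cur;   /* next unread token */

/* Consumes the next token if it is t. Returns 1 if consumed, else 0. */
static int accept_token(enum token t) {
    if (*cur == t) {
        cur++;
        return 1;
    }
    return 0;
}

/* array_element: NUM | NUM WS */
static int array_element(void) {
    if (!accept_token(NUM))
        return 0;
    accept_token(WS);           /* optional trailing whitespace */
    return 1;
}

/* array: array_start array_elements CLOSE
   Returns 1 if the whole token sequence is one array literal, else 0. */
int parse_array(const enum token *tokens) {
    cur = tokens;
    if (!accept_token(OPEN))    /* array_start: OPEN | OPEN WS */
        return 0;
    accept_token(WS);
    if (!array_element())
        return 0;
    while (accept_token(COMMA)) {   /* array_separator: COMMA | COMMA WS */
        accept_token(WS);
        if (!array_element())
            return 0;
    }
    return accept_token(CLOSE) && *cur == END_OF_INPUT;
}
```

The whitespace is never thrown away by the scanner here; each place where it is allowed is simply an optional WS in the corresponding rule.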

Bison loop for conflict

To solve the dangling else problem, I used the following solution:
stmt : stmt_matched
| stmt_unmatched
;
stmt_unmatched : IF '(' exp ')' stmt
| IF '(' exp ')' stmt_matched ELSE stmt_unmatched
;
stmt_matched : IF '(' exp ')' stmt_matched ELSE stmt_matched
| stmt_for
| ...
;
When defining the grammar rules for the for loop, I get a shift/reduce conflict due to the same problem:
stmt_for : FOR '(' exp ';' exp ';' exp ')' stmt
;
How can I solve this problem?
Not all for statements are matched. Consider, for example
if (c) for (;;) if (d) ; else ;
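So it is necessary to divide for statements into for_matched and for_unmatched: for_matched has a stmt_matched body and is an alternative of stmt_matched, while for_unmatched has a stmt_unmatched body and is an alternative of stmt_unmatched. (And similarly with other compound statements such as while.)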

Parsing with MPC library returns error on grammar definition

I'm trying to use MPC to define a grammar for a language called Wittgen (https://esolangs.org/wiki/Wittgen)
I defined the following grammar:
mpc_parser_t* Variable = mpc_new("variable");
mpc_parser_t* Assign_Operator = mpc_new("assign");
mpc_parser_t* Remind_Operator = mpc_new("remind");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Envinronment = mpc_new("envinronment");
mpca_lang(MPCA_LANG_DEFAULT,
" variable : /[a-zA-Z0-9]+/ ;"
" assign : '=' ;"
" remind : '#' ;"
" expr : <variable> | <remind> <variable> '}' | <variable> <assign> <expr>+ '}' ;"
" envinronment : /^/<expr>+/$/ ;",
Variable, Assign_Operator, Remind_Operator, Expr, Envinronment);
When I try to input a variable or a remind operator (like "foo247" or "#foo247}"), it parses correctly,
but when I try to parse an assignment ("foo247=foo}"), it just returns
WITTGEN> foo357=foo}
<stdin>:1:7: error: expected one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ', one or more of one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ', '#' or end of input at '='
I can't find the error. I'm sure something is defined wrongly in the grammar, but I can't find any clue in the official documentation or in the examples.
I'm not an expert on mpc and I may be wrong, in fact I'm having my own problems with it at the moment, but I don't think it supports left recursion. So since expr is contained within the expr rule it causes an error.
Edit* I was able to solve my issue by moving part of my expansion over. So the equivalent for yours would be to move variable all the way to the right so it tries to parse using the other two expansions first. I can't say for sure if that's causing your issue, but it could be worth a shot.
I had my question answered by the author of mpc.
I simply changed the rule definition from
" expr : <variable> | <remind> <variable> '}' | <variable> <assign> <expr>+ '}' ;"
to:
" expr : <remind> <variable> '}' | <variable> <assign> <expr>+ '}' | <variable>;"
This was happening because there's no backtracking in mpc, so the order of the alternatives in a rule is important.
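To see why the order matters, here is a toy model in C of an ordered choice that commits to the first alternative that succeeds. This is not mpc's implementation, only a hand-written sketch of the same grammar (whitespace handling omitted); the variable_first flag and all function names are invented for the illustration.

```c
#include <ctype.h>

/* Each parse_* function returns the position after the match, or -1. */

static int parse_expr(const char *s, int pos, int variable_first);

/* variable : /[a-zA-Z0-9]+/ */
static int parse_variable(const char *s, int pos) {
    int start = pos;
    while (isalnum((unsigned char)s[pos]))
        pos++;
    return pos > start ? pos : -1;
}

/* <remind> <variable> '}' */
static int parse_remind(const char *s, int pos) {
    if (s[pos] != '#')
        return -1;
    int p = parse_variable(s, pos + 1);
    if (p < 0 || s[p] != '}')
        return -1;
    return p + 1;
}

/* <variable> <assign> <expr>+ '}' */
static int parse_assign(const char *s, int pos, int variable_first) {
    int p = parse_variable(s, pos);
    if (p < 0 || s[p] != '=')
        return -1;
    int q = parse_expr(s, p + 1, variable_first);
    if (q < 0)
        return -1;
    while (q >= 0) {            /* one or more expr */
        p = q;
        q = parse_expr(s, p, variable_first);
    }
    return s[p] == '}' ? p + 1 : -1;
}

/* Ordered choice: the first alternative that succeeds wins,
   and the others are never tried at this position. */
static int parse_expr(const char *s, int pos, int variable_first) {
    int p;
    if (variable_first) {       /* original order */
        if ((p = parse_variable(s, pos)) >= 0) return p;
        if ((p = parse_remind(s, pos)) >= 0) return p;
        return parse_assign(s, pos, variable_first);
    }
    /* reordered: bare variable tried last */
    if ((p = parse_remind(s, pos)) >= 0) return p;
    if ((p = parse_assign(s, pos, variable_first)) >= 0) return p;
    return parse_variable(s, pos);
}

/* envinronment : <expr>+ followed by end of input. Returns 1 or 0. */
int accepts(const char *s, int variable_first) {
    int p = parse_expr(s, 0, variable_first);
    if (p < 0)
        return 0;
    for (int q; (q = parse_expr(s, p, variable_first)) >= 0; )
        p = q;
    return s[p] == '\0';
}
```

With the original order, "foo357=foo}" is consumed as a bare variable up to the '=', after which nothing matches, which lines up with the error reported at column 7. With the variable alternative last, the assignment alternative gets a chance first and the whole input is accepted.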

Execute an action for all component of a rule in bison

I have the following rule in my bison file :
affectation: VAR '=' expr ';'
| VAR PLUSEQ expr ';'
| VAR MINUSEQ expr ';'
;
I would like the parser to display the variable name and its content every time an affectation is done. For that, I use the action {printf("%s:%s\n",$1, $3);}. However, as there are 3 forms of affectation, is there a way to apply this action to all the alternatives without writing:
affectation: VAR '=' expr ';'
{printf("%s:%s\n",$1, $3);}
| VAR PLUSEQ expr ';'
{printf("%s:%s\n",$1, $3);}
| VAR MINUSEQ expr ';'
{printf("%s:%s\n",$1, $3);}
;
Basically, the answer is no. In most use cases, the three productions would have different semantics, so it would be normal for there to be three different actions, although they might share code. (As always, refactoring the shared code can reduce the need for duplication.)
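For instance, the shared part could live in a small C helper that each action calls. This is just a sketch with hypothetical function names, assuming (as the printf in the question does) that both semantic values are strings:

```c
#include <stdio.h>

/* Hypothetical shared helper: writes "name:value\n" into buf.
   Returns what snprintf returns (the length that was needed). */
int format_affectation(char *buf, size_t size,
                       const char *name, const char *value) {
    return snprintf(buf, size, "%s:%s\n", name, value);
}

/* Prints the trace line; each of the three actions would call this. */
void trace_affectation(const char *name, const char *value) {
    char buf[256];
    format_affectation(buf, sizeof buf, name, value);
    fputs(buf, stdout);
}
```

Each action then shrinks to { trace_affectation($1, $3); }, and any operator-specific work stays in its own production.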
If the three rules were really semantically identical, you could collect the different operators into a prefix rule:
aff_pfx: VAR '=' | VAR PLUSEQ | VAR MINUSEQ
affectation: aff_pfx expr ';' { handle($1, $2); }
That relies on the default action copying $$ = $1 in all the productions for aff_pfx, so it is not fully general. Also, it completely erases any distinction between the three syntaxes, which seems unlikely to be correct.
If you are just trying to produce a trace of the parse, take a look at bison's built-in debugging features (the -t / --debug option and the yydebug variable).

Resources