Parsing with MPC library returns error on grammar definition - c

I'm trying to use MPC to define a grammar for a language called Wittgen (https://esolangs.org/wiki/Wittgen)
I defined the following grammar:
mpc_parser_t* Variable = mpc_new("variable");
mpc_parser_t* Assign_Operator = mpc_new("assign");
mpc_parser_t* Remind_Operator = mpc_new("remind");
mpc_parser_t* Expr = mpc_new("expr");
mpc_parser_t* Envinronment = mpc_new("envinronment");
mpca_lang(MPCA_LANG_DEFAULT,
" variable : /[a-zA-Z0-9]+/ ;"
" assign : '=' ;"
" remind : '#' ;"
" expr : <variable> | <remind> <variable> '}' | <variable> <assign> <expr>+ '}' ;"
" envinronment : /^/<expr>+/$/ ;",
Variable, Assign_Operator, Remind_Operator, Expr, Envinronment);
When I input a variable or a remind operator (like "foo247" or "#foo247}"), it parses correctly,
but when I try to parse an assignment ("foo247=foo}"), all I get is:
WITTGEN> foo357=foo}
<stdin>:1:7: error: expected one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ', one or more of one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ', '#' or end of input at '='
I can't find the error. I'm sure something is defined incorrectly in the grammar, but I can't find any clue in the official documentation or in the examples.

I'm not an expert on mpc and I may be wrong (in fact I'm having my own problems with it at the moment), but I don't think it supports left recursion, so having expr inside the expr rule may be what causes the error.
Edit* I was able to solve my issue by moving part of my expansion over. So the equivalent for yours would be to move variable all the way to the right so it tries to parse using the other two expansions first. I can't say for sure if that's causing your issue, but it could be worth a shot.

I had my question answered by the author of mpc.
I simply changed the rule definition from
" expr : <variable> | <remind> <variable> '}' | <variable> <assign> <expr>+ '}' ;"
to:
" expr : <remind> <variable> '}' | <variable> <assign> <expr>+ '}' | <variable>;"
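This happened because there's no backtracking in mpc, so the order of the alternatives in a rule matters: with <variable> listed first, expr succeeds on foo357 alone and the assignment alternative is never tried, which is why the error points at '='.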
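The effect of the alternative order can be reproduced without mpc. The following is a minimal hand-written sketch, not mpc's implementation: it assumes, as the answer above states, that the first alternative that succeeds is kept and later ones are never retried. The names match_expr, parse_all, VARIABLE_FIRST and VARIABLE_LAST are made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical illustration, not mpc code: ordered choice where the first
   alternative that succeeds is kept and the others are never tried. */
enum order { VARIABLE_FIRST, VARIABLE_LAST };

static size_t match_expr(const char *s, enum order ord);

/* variable : /[a-zA-Z0-9]+/ ;  returns characters consumed, 0 on failure. */
static size_t match_variable(const char *s) {
    size_t n = 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* <remind> <variable> '}' */
static size_t match_remind(const char *s) {
    size_t v;
    if (s[0] != '#') return 0;
    v = match_variable(s + 1);
    if (v == 0 || s[1 + v] != '}') return 0;
    return 1 + v + 1;
}

/* <variable> <assign> <expr>+ '}' */
static size_t match_assign(const char *s, enum order ord) {
    size_t pos = match_variable(s), e, count = 0;
    if (pos == 0 || s[pos] != '=') return 0;
    pos++;
    while ((e = match_expr(s + pos, ord)) > 0) { pos += e; count++; }
    if (count == 0 || s[pos] != '}') return 0;
    return pos + 1;
}

/* expr, with <variable> tried either first or last. */
static size_t match_expr(const char *s, enum order ord) {
    size_t n;
    if (ord == VARIABLE_FIRST && (n = match_variable(s)) > 0) return n;
    if ((n = match_remind(s)) > 0) return n;
    if ((n = match_assign(s, ord)) > 0) return n;
    if (ord == VARIABLE_LAST && (n = match_variable(s)) > 0) return n;
    return 0;
}

/* envinronment : /^/ <expr>+ /$/ ;  returns 1 if the whole input parses. */
int parse_all(const char *input, enum order ord) {
    size_t pos = 0, n, count = 0;
    assert(input != NULL);
    while ((n = match_expr(input + pos, ord)) > 0) { pos += n; count++; }
    return count > 0 && input[pos] == '\0';
}
```

With VARIABLE_FIRST, parse_all("foo357=foo}", ...) stops after foo357 and fails at the '=' in column 7, the same position as in the reported error. With VARIABLE_LAST the assignment alternative gets its chance before the bare variable, and the whole input is accepted.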

Related

Parser that checks that every { is closed and matches the closest one

My scanner works fine, but I couldn't find what's wrong with my parser:
semi: "{" vallist "}"
| "{" "}""
;
val: tSTR
| tInt
;
vallist: vallist , val
| val
;
You have a number of problems, some of which are probably just typos in your copy-paste (what you have above will be rejected by bison).
Your main problem is probably using " (double quotes) for your tokens, which for the most part doesn't do anything useful -- it creates a 'new' token that is not the same as the single character token your lexer probably returns.
Instead, you want to use ' (single quotes):
semi: '{' vallist '}'
| '{' '}'
;
val: tSTR
| tInt
| semi
;
vallist: vallist ',' val
| val
;

How to define standard mathematical notation with mpc parser

I am reading the compiler tutorial at www.buildyourownlisp.com. It uses a parser combinator library called mpc. What I have at the moment will parse Polish notation, but I'm trying to work out how to use standard notation with it. I just can't seem to work out how to do it.
The rules of the parser are as follows:
.           Any character is required.
a           The character a is required.
[abcdef]    Any character in the set abcdef is required.
[a-f]       Any character in the range a to f is required.
a?          The character a is optional.
a*          Zero or more of the character a are required.
a+          One or more of the character a are required.
^           The start of input is required.
$           The end of input is required.
"ab"        The string ab is required.
'a'         The character a is required.
'a' 'b'     First 'a' is required, then 'b' is required.
'a' | 'b'   Either 'a' is required, or 'b' is required.
'a'*        Zero or more 'a' are required.
'a'+        One or more 'a' are required.
<abba>      The rule called abba is required.
The polish notation is written like this:
" \
number: /-?[0-9]+/'.'?/[0-9]+/ ; \
operator: '+' | '-' | '*' | '/' | '%' | \"add\" | \"sub\" | \"mul\" | \"div\" ; \
expr: <number> | '(' <operator> <expr>+ ')' ; \
dlispy: /^/ <operator> <expr>+ /$/ ;",
I've managed to make it accept decimal numbers by adding '.'?/[0-9]+/, but I can't work out how to restructure it to accept standard notation, e.g. 2*(3+2) instead of * 2 (+ 3 2). I know that I'll have to rewrite the expr and the dlispy rules, but I'm new to regex and BNF. Hope you can help, thanks.
Written as yacc rules, that would be:
expr : '(' expr ')'
| expr operator expr
| number
;
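The rule expr operator expr is left-recursive and does not encode operator precedence, so for a top-down parser it is commonly split into layers, one per precedence level. The sketch below is a hand-written recursive-descent evaluator in plain C, not mpc code; the rule names expr, term and factor and the function eval are assumptions made for the illustration, and error reporting is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Layered grammar without left recursion (a sketch, names hypothetical):
     expr   : term   (('+' | '-') term)* ;
     term   : factor (('*' | '/') factor)* ;
     factor : number | '(' expr ')' ;                                  */

static double parse_expr(const char **p);

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

/* factor : number | '(' expr ')' */
static double parse_factor(const char **p) {
    double value;
    char *end;
    skip_spaces(p);
    if (**p == '(') {
        (*p)++;
        value = parse_expr(p);
        skip_spaces(p);
        if (**p == ')') (*p)++;
        return value;
    }
    value = strtod(*p, &end);   /* no error handling in this sketch */
    *p = end;
    return value;
}

/* term : factor (('*' | '/') factor)*   (binds tighter than + and -) */
static double parse_term(const char **p) {
    double value = parse_factor(p);
    for (;;) {
        skip_spaces(p);
        if (**p == '*') { (*p)++; value *= parse_factor(p); }
        else if (**p == '/') { (*p)++; value /= parse_factor(p); }
        else return value;
    }
}

/* expr : term (('+' | '-') term)* */
static double parse_expr(const char **p) {
    double value = parse_term(p);
    for (;;) {
        skip_spaces(p);
        if (**p == '+') { (*p)++; value += parse_term(p); }
        else if (**p == '-') { (*p)++; value -= parse_term(p); }
        else return value;
    }
}

/* Evaluate an infix expression such as "2*(3+2)". */
double eval(const char *text) {
    assert(text != NULL);
    return parse_expr(&text);
}
```

The same three-layer shape can be written in the grammar notation listed in the question, with the repetition operator * taking the place of the loops; each layer only refers to the next tighter one, so no rule starts by calling itself.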

Execute an action for all component of a rule in bison

I have the following rule in my bison file :
affectation: VAR '=' expr ';'
| VAR PLUSEQ expr ';'
| VAR MINUSEQ expr ';'
;
I would like the parser to display the variable name and its content every time an affectation is done. For that, I use the action {printf("%s:%s\n",$1, $3);}. However, as there are 3 forms of affectation, is there a way to apply this action to all the alternatives without writing:
affectation: VAR '=' expr ';'
{printf("%s:%s\n",$1, $3);}
| VAR PLUSEQ expr ';'
{printf("%s:%s\n",$1, $3);}
| VAR MINUSEQ expr ';'
{printf("%s:%s\n",$1, $3);}
;
Basically, the answer is no. In most use cases, the three productions would have different semantics, so it would be normal for there to be three different actions, although they might share code. (As always, refactoring the shared code can reduce the need for duplication.)
If the three rules were really semantically identical, you could collect the different operators into a prefix rule:
aff_pfx: VAR '=' | VAR PLUSEQ | VAR MINUSEQ
affectation: aff_pfx expr ';' { handle($1, $2); }
That relies on the default action copying $$ = $1 in all the productions for aff_pfx, so it is not fully general. Also, it completely erases any distinction between the three syntaxes, which seems unlikely to be correct.
If you are just trying to produce a trace of the parse, take a look at bison's built-in debugging features.
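As a sketch of the refactoring mentioned above, the shared printing code can live in one C helper that each action calls. The helper names format_affectation and trace_affectation are hypothetical, and the sketch assumes that $1 and $3 are C strings, as the printf in the question implies.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "name:value\n" into out.
   Returns what snprintf returns (the length that was needed). */
int format_affectation(char *out, size_t size,
                       const char *name, const char *value) {
    assert(out != NULL && name != NULL && value != NULL);
    return snprintf(out, size, "%s:%s\n", name, value);
}

/* Hypothetical helper shared by the three actions, so that each one
   shrinks to:  { trace_affectation($1, $3); }  */
void trace_affectation(const char *name, const char *value) {
    char line[256];
    format_affectation(line, sizeof line, name, value);
    fputs(line, stdout);
}
```

The three productions still carry three actions, but each is a one-line call, and any action that later needs different semantics can diverge without touching the others.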

AngularDart directive expression grammar for mustache and other directives

What is the grammar of expressions that are permitted within AngularDart mustaches {{...}} and other directives?
Here is an EBNF grammar for AngularDart expressions, in the same notation used in the Dart Programming Language Specification. These expressions can appear as arguments to Angular directives. While the grammar allows, e.g., a semicolon-separated list of expressions, assignments and conditionals, these will not be accepted by all directives---e.g., ng-click supports multiple expressions possibly with assignments, whereas the mustache directive {{...}} expects a single expression.
expressions: expression (';' expressions)?
expression:
literal
| id args? # variable or function
| expression '.' id args? # member
| expression '|' id filterArg* # filter
| expression '[' expression ']'
| preOp expression
| expression binOp expression
| expression '?' expression ':' expression
| expression '=' expression # assignment
args: '(' expressionList? ')'
filterArg: ':' expression
expressionList: expression (',' expression)?
literal:
'null'
| stringLiteral
| numberLiteral
| boolLiteral
| '[' expressionList? ']'
| '{' (keyValuePair (',' keyValuePair)? )? '}'
keyValuePair:
expression ':' expression
The preOp and binOp are mainly those supported by Dart (though I will have to crosscheck that). There is a more nicely formatted version of the above here (I could not get the MD to cooperate).

issue with the function definition in my grammar

I have an issue with the function definition in my C grammar, which can be found here http://www.archive-host.com/files/1959635/24fe084677d7655eb57ba66e1864081450017dd9/cAST.txt: the function definition is not recognized correctly, and I can't multiply a function call by something.
The code I am trying to input is this one:
int factorielle(int n)
{ int x;
if ( n == 0)
return 1;
else return n*factorielle(n-1);
}
The function definition is this one :
function_definition
: declaration_specifiers declarator compound_statement
| declarator compound_statement
;
declaration_specifiers should match int and declarator should match factorielle(int n). To achieve this, I replaced this:
direct_declarator
: ID ((direct_declarator '[' ']') | (direct_declarator '(' parameter_type_list ')') | (direct_declarator '(' identifier_list ')') | (direct_declarator '(' ')') )*
with
direct_declarator
: ID ((direct_declarator '[' ']') | (direct_declarator '(' parameter_type_list ')') | (direct_declarator '(' identifier_list ')') | (direct_declarator '(' ')') | '(' parameter_type_list ')' )*
But it does not help much.
As for the multiplication, I don't know how to handle it without introducing a conflict.
Is there a way to fix this, please?
You're likely to have a difficult time parsing real C code using a pure grammar with pure ANTLR.
The reason is that certain declarations look like legitimate executable statements. (While the referenced answer seems to be about LR(1) parsers, it is really about parsers that cannot handle ambiguity; ANTLR cannot).
The only way to tell them apart is to use context information available from earlier symbol declarations. So you will have to collect symbol types as you parse, and inspect that information in the grammar rule reductions to decide whether such instances are statements or declarations. (I don't know how one implements this in ANTLR, although I believe it to be possible).
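A minimal sketch of that idea in plain C, independent of ANTLR: keep a table of the names introduced by typedef so far, and consult it when a statement such as T * x; could be either a declaration or a multiplication. All names here (declare_type_name, is_type_name, classify_star_statement) are hypothetical, and scoping is ignored.

```c
#include <assert.h>
#include <string.h>

#define MAX_TYPE_NAMES 64

/* Hypothetical table of names introduced by typedef so far.
   Only pointers are stored, so the strings must outlive the table. */
static const char *type_names[MAX_TYPE_NAMES];
static int type_name_count = 0;

/* Record a typedef name; returns 0 if the table is full. */
int declare_type_name(const char *name) {
    assert(name != NULL);
    if (type_name_count == MAX_TYPE_NAMES) return 0;
    type_names[type_name_count++] = name;
    return 1;
}

/* Returns 1 if name was previously recorded as a type name. */
int is_type_name(const char *name) {
    int i;
    assert(name != NULL);
    for (i = 0; i < type_name_count; i++)
        if (strcmp(type_names[i], name) == 0) return 1;
    return 0;
}

enum kind { DECLARATION, EXPRESSION_STATEMENT };

/* Classify a statement of the form "<first> * <second> ;" by looking up
   its first identifier: a type name makes it a pointer declaration,
   anything else makes it a multiplication. */
enum kind classify_star_statement(const char *first) {
    return is_type_name(first) ? DECLARATION : EXPRESSION_STATEMENT;
}
```

In a real parser the lookup would be called from the grammar's actions or predicates, and the table would need to track scopes; how to wire that into ANTLR is not covered here.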
I may have found a solution to the first part of the issue by replacing
compound_statement
: '{' '}'
| '{' statement_list '}'
;
with
compound_statement
: '{' '}'
| '{' statement_list '}'
| '{' external_declaration+ '}'
;
and adding this to direct_declarator:
| ID '(' parameter_type_list ')'
But I don't know if it will bring some conflicts.

Resources