How to parse a lambda term - C

I would like to parse lambda calculus terms. I don't know how to parse a term while respecting parenthesis priority. Example:
(lx ly (x(xy)))(lx ly xxxy)
I can't find a good way to do this; I just can't see which algorithm fits.
A term is represented by a structure that has a type (APPLICATION, ABSTRACTION, VARIABLE) and
a right and a left component of type "struct term".
Any idea how to do this?
EDIT
Sorry to disturb you again, but I really want to understand. Could you check the function "expression()" and let me know if I am right?
Term* expression(){
    if(current==LINKER){
        /* abstraction: LINKER, bound variable, body */
        Term* t = create_node(ABSTRACTION);
        get_next_symbol();
        t->right = create_node_variable();
        get_next_symbol();
        t->left = expression();
        return t;
    }
    else if(current==OPEN_PARENTHESIS){
        /* parenthesised term; assumes application() returns the parsed Term* */
        Term* t = application();
        get_next_symbol();
        if(current != CLOSE_PARENTHESIS){
            printf("Error\n");
            exit(1);
        }
        return t;
    }
    else if(current==VARIABLE){
        return create_node_variable();
    }
    else if(current==END_OF_TERM)
    {
        printf("Error\n");
        exit(1);
    }
    return NULL; /* unexpected symbol */
}
Thanks

The grammar can be simplified by separating the application from other expressions:
EXPR -> l{v} APPL    "abstraction"
     -> (APPL)       "brackets"
     -> {v}          "variable"
APPL -> EXPR+        "application"
The only difference from your approach is that the application is represented as a list of expressions, because abcd can be implicitly read as (((ab)c)d), so you might as well store it as abcd while parsing.
Based on this grammar, a simple recursive descent parser can be created with a single character of lookahead:
EXPR: 'l'   // read character, then APPL, return as abstraction
      '('   // read APPL, read ')', return as-is
      any   // read character, return as variable
      eof   // fail
APPL: ')'   // unread character, return as application
      any   // read EXPR, append to list, loop
      eof   // return as application
The root symbol is APPL, of course. As a post-parsing step, you can turn your APPL = list of EXPR into a tree of applications. The recursive descent is so simple that you can easily turn it into an imperative solution with an explicit stack if you wish.
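As a rough illustration of the grammar above, here is a minimal sketch in C. It is not the asker's code: the Term layout, the helper names (node, peek, fail, put, parse, show) and the choice of lowercase letters other than 'l' as variables are all assumptions made for the example. Instead of collecting a list and converting it afterwards, it folds each application to the left while parsing, which gives the same (((ab)c)d) tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { VARIABLE, ABSTRACTION, APPLICATION } Kind;

typedef struct Term {
    Kind kind;
    char name;           /* VARIABLE: its name; ABSTRACTION: the bound variable */
    struct Term *left;   /* ABSTRACTION: body; APPLICATION: function */
    struct Term *right;  /* APPLICATION: argument */
} Term;

static const char *src;  /* cursor into the input string */

static void fail(const char *msg) {
    fprintf(stderr, "parse error: %s\n", msg);
    exit(1);
}

static Term *node(Kind kind, char name, Term *left, Term *right) {
    Term *t = malloc(sizeof *t);
    if (t == NULL) fail("out of memory");
    t->kind = kind; t->name = name; t->left = left; t->right = right;
    return t;
}

/* One character of lookahead, skipping blanks. */
static char peek(void) {
    while (isspace((unsigned char)*src)) src++;
    return *src;
}

static Term *appl(void);

/* EXPR -> 'l' v APPL | '(' APPL ')' | v */
static Term *expr(void) {
    char c = peek();
    if (c == 'l') {
        src++;
        char v = peek();
        if (!islower((unsigned char)v) || v == 'l') fail("expected a variable after 'l'");
        src++;
        return node(ABSTRACTION, v, appl(), NULL);
    }
    if (c == '(') {
        src++;
        Term *t = appl();
        if (peek() != ')') fail("expected ')'");
        src++;
        return t;
    }
    if (!islower((unsigned char)c)) fail("expected 'l', '(' or a variable");
    src++;
    return node(VARIABLE, c, NULL, NULL);
}

/* APPL -> EXPR+, folded to the left so that abcd becomes (((ab)c)d). */
static Term *appl(void) {
    Term *t = expr();
    while (peek() != ')' && peek() != '\0')
        t = node(APPLICATION, 0, t, expr());
    return t;
}

/* Parse a complete term; the root symbol is APPL. */
Term *parse(const char *text) {
    src = text;
    Term *t = appl();
    if (peek() != '\0') fail("unbalanced ')'");
    return t;
}

/* Append s to buf, checking that it fits in cap bytes. */
static void put(char *buf, size_t cap, const char *s) {
    assert(strlen(buf) + strlen(s) < cap);
    strcat(buf, s);
}

/* Append a fully parenthesised rendering of t to buf. */
void show(const Term *t, char *buf, size_t cap) {
    char name[2] = { t->name, '\0' };
    switch (t->kind) {
    case VARIABLE:
        put(buf, cap, name);
        break;
    case ABSTRACTION:
        put(buf, cap, "(l"); put(buf, cap, name); put(buf, cap, ".");
        show(t->left, buf, cap);
        put(buf, cap, ")");
        break;
    case APPLICATION:
        put(buf, cap, "(");
        show(t->left, buf, cap);
        put(buf, cap, " ");
        show(t->right, buf, cap);
        put(buf, cap, ")");
        break;
    }
}
```

Parsing the example term (lx ly (x(xy)))(lx ly xxxy) and passing the result to show with an empty buffer gives ((lx.(ly.(x (x y)))) (lx.(ly.(((x x) x) y)))). The sketch never frees its nodes; a real program would add a matching cleanup function.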

Related

How to write code for part of a recursive descent parser? [duplicate]

I'm looking for an answer to this Stack Overflow question: Can somebody walk me through what this question is trying to ask of me?
It asks to do the following, as the person who replied to it explains:
Write two functions, ifblock and logic_expr, as part of a recursive descent parser in a language of your choosing.
<ifblock> --> if(<logic_expr>){<stmts>} [else {<stmts>}]
<logic_expr> --> <value> == <value> | <value> != <value>
For other non terminal symbols, 'stmts' and 'value' you are allowed to assume the existence of pre-written functions by the same names.
To get the next token from the input stream you can call 'lex()' which returns a code, as listed in the codes for the terminal symbols. Implement the 'ifblock' by requesting token codes by calling 'lex()' and to evaluate and match those with the required tokens according to the language grammar.
To evaluate the logical expression of the 'if' you need to step into a function 'logic_expr', which you need to write, for evaluating a logical expression as defined in the grammar, and you may assume that the non-terminal 'value' does already exist.
The answer could look like this. Of course, this is just truncated pseudo-code for the grammar parser, with not much in the way of error handling or an AST builder.
void parse() {
    while (!EOF)
        if (lex() == IF)
            ifblock();
        else
            return_error;   // what is it?
}
void ifblock() {
    if (lex() != LP)
        return_error;
    le = logic_expr();
    if (lex() != RP)
        return_error;
    // parse statements in {}, optional else (if (lex() == ELSE)) with {}
    // if no errors
    create_if_node(le, st, ...);
}
node logic_expr() {   // 'node' stands for whatever AST node type you use
    v1 = value();
    op = lex();
    v2 = value();
    if (op == EQ)
        return create_eq_node(v1, v2);
    else if (op == NEQ)
        return create_neq_node(v1, v2);
    return_error;
}
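The pseudo-code above can be turned into something compilable along these lines. This is only a sketch under stated assumptions: the token codes (T_IF, T_LP and so on), the array-backed lex(), the one-token unlex() pushback, and the stub versions of value() and stmts() are all hypothetical stand-ins for what the assignment would provide. Unlike the pseudo-code, ifblock() here consumes the 'if' token itself, and the functions return 1 or 0 instead of building AST nodes.

```c
#include <assert.h>

/* Hypothetical token codes; the assignment would supply its own. */
enum { T_EOF, T_IF, T_ELSE, T_LP, T_RP, T_LB, T_RB, T_EQ, T_NEQ, T_VALUE, T_STMT };

static const int *input;   /* token stream, terminated by T_EOF */
static int pending = -1;   /* one token pushed back by unlex(), or -1 */

/* Return the next token code; keeps returning T_EOF at the end. */
int lex(void) {
    if (pending >= 0) { int t = pending; pending = -1; return t; }
    return *input == T_EOF ? T_EOF : *input++;
}

/* Push one token back so that the next lex() returns it again. */
void unlex(int tok) { assert(pending < 0); pending = tok; }

/* Stand-ins for the pre-written functions; each returns 1 on success. */
int value(void) { return lex() == T_VALUE; }

int stmts(void) {           /* zero or more statements */
    int t;
    while ((t = lex()) == T_STMT) { }
    unlex(t);
    return 1;
}

/* <logic_expr> --> <value> == <value> | <value> != <value> */
int logic_expr(void) {
    if (!value()) return 0;
    int op = lex();
    if (op != T_EQ && op != T_NEQ) return 0;
    return value();
}

/* { <stmts> } */
static int block(void) {
    return lex() == T_LB && stmts() && lex() == T_RB;
}

/* <ifblock> --> if(<logic_expr>){<stmts>} [else {<stmts>}] */
int ifblock(void) {
    if (lex() != T_IF) return 0;
    if (lex() != T_LP || !logic_expr() || lex() != T_RP) return 0;
    if (!block()) return 0;
    int t = lex();
    if (t == T_ELSE) return block();
    unlex(t);               /* no else branch: leave the token for the caller */
    return 1;
}

/* Parse exactly one ifblock followed by the end of input. */
int parse_tokens(const int *tokens) {
    input = tokens;
    pending = -1;
    return ifblock() && lex() == T_EOF;
}
```

The optional else is the only place that needs lookahead, which is why a single pushed-back token is enough here.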

Formal grammar of XML

I'm trying to build a small parser for XML files in C. I know I could find a finished solution, but I need just some basic functionality for an embedded project. I'm trying to create a grammar that describes XML without attributes, just tags, but it doesn't seem to work and I can't figure out why.
Here is the grammar:
XML : FIRST_TAG NIZ
NIZ : VAL NIZ | eps
VAL : START VAL END
| STR
| eps
Here is the part of the C code that implements this grammar:
void check() {
    getSymbol();
    if( sym == FIRST_LINE )
    {
        niz();
    }
    else {
        printf("FIRST_LINE EXPECTED");
        exit(1);
    }
}
void niz() {
    getSymbol();
    if( sym == ERROR )
        return;
    if( sym == START ) {
        back = 1;
        val();
        niz();
    }
    printf(" EPS OR START EXPECTED\n");
}
void val() {
    getSymbol();
    if( sym == ERROR )
        return;
    if( sym == START ) {
        back = 0;
        val();
        getSymbol();
        if( sym != END ) {
            printf("END EXPECTED");
            exit(1);
        }
        return;
    }
    if( sym == EMPTY_TAG || sym == STR)
        return;
    printf("START, STR, EMPTY_TAG OR EPS EXPECTED\n");
    exit(1);
}
void getSymbol() {
    int pom;
    if(back == 1) {
        back = 0;
        return;
    }
    sym = getNextToken(cmd + offset, &pom);
    offset += pom + 1;
}
EDIT: Here is the example of XML file that does not satisfy this grammar:
<?xml version="1.0"?>
<VATCHANGES>
<DATE>15/08/2012</DATE>
<TIME>1452</TIME>
<EFDSERIAL>01KE000001</EFDSERIAL>
<CHANGENUM>1</CHANGENUM>
<VATRATE>A</VATRATE>
<FROMVALUE>16.00</FROMVALUE>
<TOVALUE>18.00</TOVALUE>
<VATRATE>B</VATRATE>
<FROMVALUE>2.00</FROMVALUE>
<TOVALUE>0.00</TOVALUE>
<VATRATE>C</VATRATE>
<FROMVALUE>5.00</FROMVALUE>
<TOVALUE>0.00</TOVALUE>
<DATE>25/05/2010</DATE>
<CHANGENUM>2</CHANGENUM>
<VATRATE>C</VATRATE>
<FROMVALUE>0.00</FROMVALUE>
<TOVALUE>4.00</TOVALUE>
</VATCHANGES>
It gives END EXPECTED at the output.
First, your grammar needs some work. Assuming the preamble is handled correctly, you have a basic error in the definition of NIZ.
NIZ : VAL NIZ | eps
VAL : START VAL END
| STR
| eps
So we enter NIZ and look for VAL first. The problem is that eps appears among the productions of both VAL and NIZ. If VAL produces nothing (i.e. eps), it consumes no tokens in the process (as it must, since eps is the empty production), and NIZ reduces to:
NIZ: eps NIZ | eps
which isn't good.
Consider something more along these lines. I threw this together without much thought beyond getting a purely basic construction.
XML: START_LINE ELEMENT
ELEMENT: OPENTAG BODY CLOSETAG
OPENTAG: lt id(n) gt
CLOSETAG: lt fs id(n) gt
BODY: ELEMENT | VALUE
VALUE: str | eps
This is super basic. Terminals include:
lt: '<'
gt: '>'
fs: '/'
str: any alphanumeric string excluding chars lt or gt.
id(n): any alphanumeric string excluding chars lt, gt, or fs.
I can almost feel the wrath of the XML purists raining down on me right now, but the point I'm trying to get across is that, when a grammar is well-defined, the RDP will literally write itself. Obviously the lexer (i.e. the token engine) needs to handle the terminals accordingly. Note: the id(n) is an id-stack to ensure you properly close the innermost tag, and is an attribute of your parser in accordance with how it manages tag ids. It's not traditional, but it makes things MUCH easier.
This can/should clearly be expanded to include stand-alone element declarations and short-cut element closure. For example, this grammar allows for elements of this form:
<ElementName>...</ElementName>
but not of this form:
<ElementName/>
Nor does it account for short-cut termination such as:
<ElementName>...</>
Accounting for such additions will obviously complicate the grammar considerably, but also make the parser substantially more robust. Like I said, the sample above is basic with a capital B. If you're really going to embark on this, these are things you want to consider when designing your grammar, and thus also your RDP by consequence.
Anyway, just consider how a few reworks in your grammar can/will substantially make this easier on you.
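To show how directly the basic grammar above maps onto code, here is a minimal sketch in C. It works on characters rather than on lexer tokens, and the names (parse_xml, element, body, id, eat, MAX_ID) are made up for the example. The id-stack is implicit: each call to element() keeps its own open-tag name on the C call stack and compares it with the closing tag. The start line is only skipped, not validated.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_ID 32          /* assumed upper bound on tag name length */

static const char *p;      /* cursor into the document text */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

/* Consume the character c if it is next; returns 1 on success. */
static int eat(char c) {
    if (*p != c) return 0;
    p++;
    return 1;
}

/* id: a non-empty run of alphanumeric characters, copied into out. */
static int id(char *out) {
    size_t n = 0;
    while (isalnum((unsigned char)*p) && n < MAX_ID - 1) out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}

static int element(void);

/* BODY: ELEMENT | VALUE, where VALUE is str or nothing. */
static int body(void) {
    skip_ws();
    if (p[0] == '<' && p[1] != '/') return element();
    while (*p != '\0' && *p != '<' && *p != '>') p++;   /* str, possibly empty */
    return 1;
}

/* ELEMENT: OPENTAG BODY CLOSETAG, with matching tag names. */
static int element(void) {
    char open[MAX_ID], close[MAX_ID];
    if (!eat('<') || !id(open) || !eat('>')) return 0;                /* lt id gt */
    if (!body()) return 0;
    skip_ws();
    if (!eat('<') || !eat('/') || !id(close) || !eat('>')) return 0;  /* lt fs id gt */
    return strcmp(open, close) == 0;
}

/* XML: START_LINE ELEMENT. Returns 1 if the whole text matches. */
int parse_xml(const char *text) {
    assert(text != NULL);
    p = text;
    skip_ws();
    if (strncmp(p, "<?", 2) != 0) return 0;
    const char *end = strstr(p, "?>");
    if (end == NULL) return 0;
    p = end + 2;
    skip_ws();
    if (!element()) return 0;
    skip_ws();
    return *p == '\0';
}
```

Because BODY is a single ELEMENT or VALUE, this sketch accepts one nested element per level, exactly as the grammar is written. Accepting a sequence of child elements, which the sample file in the question needs, would mean turning the single element() call in body() into a loop.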

Trying to make match on a rule that uses "recursive" identifier in flex

I have this line:
0, 6 -> W(1) L(#);
or
\# -> #shift_right R W(1) L
I have to parse this line with flex, take every element on each side of the arrow and put it in a list. I know how to match simple things, but I don't know how to match multiple things with the same rule. I'm not allowed to increase the limit for rules. I have a hint: parse the pieces, the pieces will then combine, and I can use states; but I don't know how to do that, and I can't find examples on the net. Can someone help me?
So, here an example:
{
a -> W(b) #invert_loop;
b -> W(a) #invert_loop;
-> L(#)
}
When this section begins I have to create a structure for each line, where I put what is on the left of -> in a vector (those are some parameters) and the right side in a list, where each term is itself another structure. For what is on the right side I wrote rules:
writex W([a-zA-Z0-9.#]) for W(anything).
So I need to parse these lines so that I can put the parameters and the structures into the big structure. Something like this (for the first line):
new bigStruc with param = a and list of struct = W(anything), #invert (the latter is a notation for a reference to another structure)
So what I need to know is how to parse these lines so that I can create and fill these bigStruct objects, while also using the rules for the simple structures (I have everything I need for those structures, but I don't know how to parse the line so that I can use these methods).
Sorry for my English; I hope this time I was clearer about what I want.
Last-minute edit: I have matched the whole line with one rule and then work on it with strtok. Is there a way to use the previous rules to see what type of structure I have to create? I mean, rather than writing lots of ifs, can I use writex W([a-zA-Z0-9.#]) to know that I have to create that kind of structure?
OK, let's see how this snippet works for you:
/* these are exclusive start conditions, so only their own rules are active; for inclusive ones, use %s */
%x dataStructure
%x addRules
%%
<dataStructure>-> { BEGIN addRules; }
\{ { BEGIN dataStructure; }
<addRules>; { BEGIN dataStructure; }
<dataStructure>\} { BEGIN INITIAL; }
<dataStructure>[^, \t\n}\-]+ { ECHO; } //this will output each comma separated token
<dataStructure>. { } //ignore anything else
<dataStructure>\n { } //ignore anything else
<addRules>[^ \t\n;]+ { ECHO; } //this will output each space separated rule
<addRules>. { } //ignore anything else
<addRules>\n { } //ignore anything else
%%
I'm not entirely sure what it is you want. Edit your original post to include the contents of your comments, with examples, and please structure your English better. If you can't explain what you want without contradicting yourself, I can't help you.

How would I go about writing an interpreter in C? [closed]

I'd love some references, or tips, possibly an e-book or two. I'm not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!
BTW: It must be C.
Any more replies would be appreciated.
A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:
The language has a stack and 7 instructions:
push <num>      # push a number onto the stack
pop             # pop off the first number on the stack
add             # pop off the top 2 items on the stack and push their sum onto the stack (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication by creating a loop using some of the other instructions with this one.
ifeq <address>  # examine the top of the stack; if it's 0, continue, else jump to <address>, where <address> is a line number
jump <address>  # jump to a line number
print           # print the value at the top of the stack
dup             # push a copy of what's at the top of the stack back onto the stack
Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.
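A minimal sketch of such a machine in C might look like the following. The representation is an assumption made for the example: instructions are already decoded into an Instr array (so no text parsing is shown), addresses are 0-based indexes into that array, and the names run, Instr, STACK_MAX and max_steps are invented here. Besides printing, it records printed values in an output array so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { PUSH, POP, ADD, IFEQ, JUMP, PRINT, DUP } Op;

typedef struct {
    Op op;
    int arg;   /* PUSH: the number; IFEQ and JUMP: target line (0-based); otherwise unused */
} Instr;

#define STACK_MAX 256

/* Runs prog[0..len-1]. Every printed value is also stored in out (at most
   out_cap entries). Returns the number of values printed, or -1 on a stack
   error, a jump outside the program, or more than max_steps instructions. */
int run(const Instr *prog, int len, int *out, int out_cap, long max_steps) {
    int stack[STACK_MAX];
    int sp = 0;          /* number of items on the stack */
    int pc = 0;          /* index of the next instruction */
    int printed = 0;

    assert(prog != NULL && out != NULL);
    while (pc < len) {
        if (max_steps-- <= 0) return -1;
        Instr in = prog[pc++];
        switch (in.op) {
        case PUSH:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = in.arg;
            break;
        case POP:
            if (sp < 1) return -1;
            sp--;
            break;
        case ADD:
            if (sp < 2) return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case IFEQ:   /* top == 0: continue with the next line; otherwise jump */
            if (sp < 1) return -1;
            if (stack[sp - 1] != 0) {
                if (in.arg < 0 || in.arg > len) return -1;
                pc = in.arg;
            }
            break;
        case JUMP:
            if (in.arg < 0 || in.arg > len) return -1;
            pc = in.arg;
            break;
        case PRINT:  /* the value stays on the stack */
            if (sp < 1) return -1;
            printf("%d\n", stack[sp - 1]);
            if (printed < out_cap) out[printed] = stack[sp - 1];
            printed++;
            break;
        case DUP:
            if (sp < 1 || sp == STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        }
    }
    return printed;
}
```

For example, the program push 3, print, push -1, add, ifeq 1, print counts down: it prints 3, 2, 1 and 0, because ifeq keeps jumping back to line 1 until the top of the stack reaches 0.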
Edit:
From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.
What I would suggest is to first learn about the following topics:
scanf, printf, putchar, getchar - basic C IO functions
struct - the basic data structure in C
malloc - how to allocate memory, and the difference between stack memory and heap memory
linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to understand structs and malloc first)
Then it'll be good to learn a bit more about the string.h library as well:
strcmp, strdup - a couple of string functions that will be useful.
In short, C has a much steeper learning curve than Python, just because it's a lower-level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in Python.
The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.
I see this is a bit of a late reply; however, since this thread showed up in second place in the result list when I searched for writing an interpreter, and no one has mentioned anything very concrete, I will provide the following example:
Disclaimer: This is just some simple code I wrote in a hurry to have a foundation for the explanation below, and it is therefore not perfect, but it compiles and runs, and seems to give the expected answers.
Read the following C code from bottom to top:
#include <stdio.h>
#include <stdlib.h>
double expression(void);
double vars[26]; // variables
char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d = 0; scanf("%lf", &d); return d; } // read one double (0 if none could be read)
void expect(char c) { // expect char c from stream
    char d = get();
    if (c != d) {
        fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
    }
}
double factor(void) { // read a factor
    double f;
    char c = peek();
    if (c == '(') { // an expression inside parentheses?
        expect('(');
        f = expression();
        expect(')');
    } else if (c >= 'A' && c <= 'Z') { // a variable ?
        expect(c);
        f = vars[c - 'A'];
    } else { // or, a number?
        f = number();
    }
    return f;
}
double term(void) { // read a term
    double t = factor();
    while (peek() == '*' || peek() == '/') { // * or / more factors
        char c = get();
        if (c == '*') {
            t = t * factor();
        } else {
            t = t / factor();
        }
    }
    return t;
}
double expression(void) { // read an expression
    double e = term();
    while (peek() == '+' || peek() == '-') { // + or - more terms
        char c = get();
        if (c == '+') {
            e = e + term();
        } else {
            e = e - term();
        }
    }
    return e;
}
double statement(void) { // read a statement
    double ret;
    char c = peek();
    if (c >= 'A' && c <= 'Z') { // variable ?
        expect(c);
        if (peek() == '=') { // assignment ?
            expect('=');
            double val = expression();
            vars[c - 'A'] = val;
            ret = val;
        } else {
            ungetc(c, stdin);
            ret = expression();
        }
    } else {
        ret = expression();
    }
    expect('\n');
    return ret;
}
int main(void) {
    printf("> "); fflush(stdout);
    for (;;) {
        double v = statement();
        printf(" = %lf\n> ", v); fflush(stdout);
    }
    return EXIT_SUCCESS;
}
This is a simple recursive descent parser for basic mathematical expressions supporting one-letter variables. Running it and typing some statements yields the following results:
> (1+2)*3
= 9.000000
> A=1
= 1.000000
> B=2
= 2.000000
> C=3
= 3.000000
> (A+B)*C
= 9.000000
You can alter get(), peek() and number() to read from a file or a list of code lines. You should also write a function to read identifiers (basically words). Then you expand the statement() function so that it can change which line runs next, in order to do branching. Last, you add the branch operations you want to the statement function, like
if "condition" then
    "statements"
else
    "statements"
endif
while "condition" do
    "statements"
endwhile
function fac(x)
    if x = 0 then
        return 1
    else
        return x*fac(x-1)
    endif
endfunction
Obviously you can make the syntax whatever you like. You need to think about ways to define functions and how to handle arguments/parameter variables, local variables and global variables. If desired, also arrays and data structures, references/pointers, and input/output.
In order to handle recursive function calls you probably need to use a stack.
In my opinion it would be easier to do all this with C++ and the STL, where for example one std::map could be used to hold local variables and another map could be used for globals...
It is of course possible to write a compiler that builds an abstract syntax tree out of the code and then traverses this tree in order to produce either machine code or some kind of byte code that is executed on a virtual machine (like Java and .NET). This gives better performance than naively parsing and executing line by line, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its target virtual machine.
If someone wants to learn to write an interpreter, they should try making the most basic, simple and practical working interpreter.

Parsing some particular statements with antlr3 in C target

I have some questions about antlr3 with a tree grammar in the C target.
I have almost finished my interpreter (functions, variables, boolean and math expressions are OK) and I have kept the most difficult statements for the end (like if, switch, etc.).
1) I would like to interpret a simple loop statement:
repeat: ^(REPEAT DIGIT stmt);
I've seen many examples but nothing about the tree walker (only a topic here using the macros MARK() / REWIND(m) with @init / @after, which did not work: I get ANTLR errors, "unexpected node at offset 0"). How can I interpret this statement in C?
2) Same question with a simple if statement:
if: ^(IF condition stmt elseifstmt* elsestmt?);
The problem is to skip the statement if the condition is false and test the other elseif/else statements.
3) I have some statements which can stop the script (like "break" or "exit"). How can I interrupt the tree walker and skip the following tokens?
4) When a lexer or parser error is detected, ANTLR reports an error. But I would like to produce my own error messages. How can I get the line number where the parser failed?
Ask me if you want more details.
Thank you very much (and I apologize for my poor English).
About the repeat statement, I think I've found a way to do it. On antlr.org, I found a complete interpreter for the C-- language, but written in Java.
Here is the while statement (a bit different, but the approach is the same):
whileStmt
scope{
    Boolean breaked;
}
@after{
    CommonTree stmtNode=(CommonTree)$whileStmt.start.getChild(1);
    CommonTree exprNode=(CommonTree)$whileStmt.start.getChild(0);
    int test;
    $whileStmt::breaked=false;
    while($whileStmt::breaked==false){
        stream.push(stream.getNodeIndex(exprNode));
        test=expr().value;
        stream.pop();
        if (test==0) break;
        stream.push(stream.getNodeIndex(stmtNode));
        stmt();
        stream.pop();
    }
}
    : ^(WHILE . .)
    ;
I've tried to transform this code into C language:
repeat
scope {
    int breaked;
    int tours;
}
@after
{
    int test;
    pANTLR3_BASE_TREE repeatstmt = (pANTLR3_BASE_TREE)$repeat.start->getChild($repeat.start,1);
    pANTLR3_BASE_TREE exprstmt = (pANTLR3_BASE_TREE)$repeat.start->getChild($repeat.start,0);
    $repeat::breaked = 0;
    test = 1;
    while($repeat::breaked == 0)
    {
        TW_FOLLOWPUSH(exprstmt);
        TW_FOLLOWPOP();
        test++;
        if(test == $repeat::tours)
            break;
        TW_FOLLOWPUSH(repeatstmt);
        CTX->repeat(CTX);
        TW_FOLLOWPOP();
    }
}
    : ^(REPEAT DIGIT stmt)
    {
        $repeat::tours = $DIGIT.text->toInt32($DIGIT.text);
    }
But nothing happens (stmt is parsed just once).
Do you have any idea about this, please?
About the custom error messages, I've found the macro GETLINE() in the lexer. It works when the tree walker fails, but ANTLR continues to display its own error messages for lexer or parser errors.
Thanks.

Resources