How would I go about writing an interpreter in C? [closed] - c

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I'd love some references, or tips, possibly an e-book or two. I'm not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!
BTW: It must be C.
Any more replies would be appreciated.

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:
The language has a stack and 6 instructions:
push <num> # push a number on to the stack
pop # pop off the first number on the stack
add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.
ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number
jump <address> # jump to a line number
print # print the value at the top of the stack
dup # push a copy of what's at the top of the stack back onto the stack.
Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.
Edit:
From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.
What I would suggest is to first learn about the following topics:
scanf, printf, putchar, getchar - basic C IO functions
struct - the basic data structure in C
malloc - how to allocate memory, and the difference between stack memory and heap memory
linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to
understand structs and malloc first)
Then it'll be good to learn a bit more about the string.h library as well
- strcmp, strdup - a couple useful string functions that will be useful.
In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.

I see this is a bit of a late reply, however since this thread showed up at second place in the result list when I did a search for writing an interpreter and no one have mentioned anything very concrete I will provide the following example:
Disclaimer: This is just some simple code I wrote in a hurry in order to have a foundation for the explanation below and are therefore not perfect, but it compiles and runs, and seems to give the expected answers.
Read the following C-code from bottom to top:
#include <stdio.h>
#include <stdlib.h>
double expression(void);
double vars[26]; // variables
char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d; scanf("%lf", &d); return d; } // read one double
void expect(char c) { // expect char c from stream
char d = get();
if (c != d) {
fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
}
}
double factor(void) { // read a factor
double f;
char c = peek();
if (c == '(') { // an expression inside parantesis?
expect('(');
f = expression();
expect(')');
} else if (c >= 'A' && c <= 'Z') { // a variable ?
expect(c);
f = vars[c - 'A'];
} else { // or, a number?
f = number();
}
return f;
}
double term(void) { // read a term
double t = factor();
while (peek() == '*' || peek() == '/') { // * or / more factors
char c = get();
if (c == '*') {
t = t * factor();
} else {
t = t / factor();
}
}
return t;
}
double expression(void) { // read an expression
double e = term();
while (peek() == '+' || peek() == '-') { // + or - more terms
char c = get();
if (c == '+') {
e = e + term();
} else {
e = e - term();
}
}
return e;
}
double statement(void) { // read a statement
double ret;
char c = peek();
if (c >= 'A' && c <= 'Z') { // variable ?
expect(c);
if (peek() == '=') { // assignment ?
expect('=');
double val = expression();
vars[c - 'A'] = val;
ret = val;
} else {
ungetc(c, stdin);
ret = expression();
}
} else {
ret = expression();
}
expect('\n');
return ret;
}
int main(void) {
printf("> "); fflush(stdout);
for (;;) {
double v = statement();
printf(" = %lf\n> ", v); fflush(stdout);
}
return EXIT_SUCCESS;
}
This is an simple recursive descend parser for basic mathematical expressions supporting one letter variables. Running it and typing some statements yields the following results:
> (1+2)*3
= 9.000000
> A=1
= 1.000000
> B=2
= 2.000000
> C=3
= 3.000000
> (A+B)*C
= 9.000000
You can alter the get(), peek() and number() to read from a file or list of code lines. Also you should make a function to read identifiers (basically words). Then you expand the statement() function to be able to alter which line it runs next in order to do branching. Last you add the branch operations you want to the statement function, like
if "condition" then
"statements"
else
"statements"
endif.
while "condition" do
"statements"
endwhile
function fac(x)
if x = 0 then
return 1
else
return x*fac(x-1)
endif
endfunction
Obviously you can decide the syntax to be as you like. You need to think about ways of define functions and how to handle arguments/parameter variables, local variables and global variables. If preferable arrays and data structures. References∕pointers. Input/output?
In order to handle recursive function calls you probably need to use a stack.
In my opinion this would be easier to do all this with C++ and STL. Where for example one std::map could be used to hold local variables, and another map could be used for globals...
It is of course possible to write a compiler that build an abstract syntax tree out of the code. Then travels this tree in order to produce either machine code or some kind of byte code which executed on a virtual machine (like Java and .Net). This gives better performance than naively parse line by line and executing them, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its targeted virtual machine.
If someone wants to learn to write an interpreter, they should try making the most basic simple and practical working interpreter.

Related

How does the break statement work in this function? [duplicate]

Can you break out of an if statement or is it going to cause crashes? I'm starting to acquaint myself with C, but this seems controversial. The first image is from a book on C
("Head First C") and the snippet shows code written by Harvard's CS classes staff. What is actually going on and has it something to do with C standards?
breaks don't break if statements.
On January 15, 1990, AT&T's long-distance telephone system crashed, and 60,000 people lost their phone service. The cause? A developer working on the C code used in the exchanges tried to use a break to break out of an if statement. But breaks don't break out of ifs. Instead, the program skipped an entire section of code and introduced a bug that interrupted 70 million phone calls over nine hours.
for (size = 0; size < HAY_MAX; size++)
{
// wait for hay until EOF
printf("\nhaystack[%d] = ", size);
int straw = GetInt();
if (straw == INT_MAX)
break;
// add hay to stack
haystack[size] = straw;
}
printf("\n");
break interacts solely with the closest enclosing loop or switch, whether it be a for, while or do .. while type. It is frequently referred to as a goto in disguise, as all loops in C can in fact be transformed into a set of conditional gotos:
for (A; B; C) D;
// translates to
A;
goto test;
loop: D;
iter: C;
test: if (B) goto loop;
end:
while (B) D; // Simply doesn't have A or C
do { D; } while (B); // Omits initial goto test
continue; // goto iter;
break; // goto end;
The difference is, continue and break interact with virtual labels automatically placed by the compiler. This is similar to what return does as you know it will always jump ahead in the program flow. Switches are slightly more complicated, generating arrays of labels and computed gotos, but the way break works with them is similar.
The programming error the notice refers to is misunderstanding break as interacting with an enclosing block rather than an enclosing loop. Consider:
for (A; B; C) {
D;
if (E) {
F;
if (G) break; // Incorrectly assumed to break if(E), breaks for()
H;
}
I;
}
J;
Someone thought, given such a piece of code, that G would cause a jump to I, but it jumps to J. The intended function would use if (!G) H; instead.
This is actually the conventional use of the break statement. If the break statement wasn't nested in an if block the for loop could only ever execute one time.
MSDN lists this as their example for the break statement.
As already mentioned that, break-statement works only with switches and loops. Here is another way to achieve what is being asked. I am reproducing
https://stackoverflow.com/a/257421/1188057 as nobody else mentioned it. It's just a trick involving the do-while loop.
do {
// do something
if (error) {
break;
}
// do something else
if (error) {
break;
}
// etc..
} while (0);
Though I would prefer the use of goto-statement.
I think the question is a little bit fuzzy - for example, it can be interpreted as a question about best practices in programming loops with if inside. So, I'll try to answer this question with this particular interpretation.
If you have if inside a loop, then in most cases you'd like to know how the loop has ended - was it "broken" by the if or was it ended "naturally"? So, your sample code can be modified in this way:
bool intMaxFound = false;
for (size = 0; size < HAY_MAX; size++)
{
// wait for hay until EOF
printf("\nhaystack[%d] = ", size);
int straw = GetInt();
if (straw == INT_MAX)
{intMaxFound = true; break;}
// add hay to stack
haystack[size] = straw;
}
if (intMaxFound)
{
// ... broken
}
else
{
// ... ended naturally
}
The problem with this code is that the if statement is buried inside the loop body, and it takes some effort to locate it and understand what it does. A more clear (even without the break statement) variant will be:
bool intMaxFound = false;
for (size = 0; size < HAY_MAX && !intMaxFound; size++)
{
// wait for hay until EOF
printf("\nhaystack[%d] = ", size);
int straw = GetInt();
if (straw == INT_MAX)
{intMaxFound = true; continue;}
// add hay to stack
haystack[size] = straw;
}
if (intMaxFound)
{
// ... broken
}
else
{
// ... ended naturally
}
In this case you can clearly see (just looking at the loop "header") that this loop can end prematurely. If the loop body is a multi-page text, written by somebody else, then you'd thank its author for saving your time.
UPDATE:
Thanks to SO - it has just suggested the already answered question about crash of the AT&T phone network in 1990. It's about a risky decision of C creators to use a single reserved word break to exit from both loops and switch.
Anyway this interpretation doesn't follow from the sample code in the original question, so I'm leaving my answer as it is.
You could possibly put the if into a foreach a for, a while or a switch like this
Then break and continue statements will be available
foreach ([1] as $i) if ($condition) { // Breakable if
//some code
$a = "b";
// Le break
break;
// code below will not be executed
}
for ($i=0; $i < 1 ; $i++) if ($condition) {
//some code
$a = "b";
// Le break
break;
// code below will not be executed
}
switch(0){ case 0: if($condition){
//some code
$a = "b";
// Le break
break;
// code below will not be executed
}}
while(!$a&&$a=1) if ($condition) {
//some code
$a = "b";
// Le break
break;
// code below will not be executed
}

Using a c-program to read an NMEA-string

I am trying to make a c-program that will will a string, but I want it only to read a very small part of it.
The NMEA-telegram that I try to read is $WIXDR, and do receive the necessary strings.
Here's 2 examples of strings that I get into the CPU:
$WIXDR,C,1.9,C,0,H,83.2,P,0,P,1023.9,H,0*46
$WIXDR,V,0.01,M,0,Z,10,s,0,R,0.8,M,0,V,0.0,M,1,Z,0,s,1,R,0.0,M,1,R,89.9,M,2,R,0.0,M,3*60
If it were only 1 string (not both C and V), this would not be a problem for me.
The problem here is that it's 2 seperate strings. One with the temperature, and one with rain-info.
The only thing that I'm interested in is the value "1.9" from
$WIXDR,C,1.9,C,0......
Here's what I have so far:
void ProcessXDR(char* buffPtr)
{
char valueBuff[10];
int result, x;
float OutSideTemp;
USHORT uOutSideTemp;
// char charTemperature, charRain
IODBerr eCode;
//Outside Temperature
result = ReadAsciiVariable(buffPtr, &valueBuff[0], &buffPtr, sizeof(valueBuff));
sscanf(&valueBuff[0],"%f",&OutSideTemp);
OutSideTemp *= 10;
uOutSideTemp = (USHORT)OutSideTemp;
eCode = IODBWrite(ANALOG_IN,REG_COM_XDR,1,&uOutSideTemp,NULL);
}
// XDR ...
if(!strcmp(&nmeaHeader[0],"$WIXDR"))
{
if(PrintoutEnable)printf("XDR\n");
ProcessXDR(buffPtr);
Timer[TIMER_XDR] = 1200; // Update every minute
ComStateXDR = 1;
eCode = IODBWrite(DISCRETE_IN,REG_COM_STATE_XDR,1,&ComStateXDR,NULL);
}
There's more, but this is the main part that I have.
I have found the answer to my own question. The code that would do as I intented is as follows:
What my little code does, is to look for the letter C, and if the C is found, it will take the value after it and put it into "OutSideTemp". The reason I had to look for C is that there is also a similar string received with the letter V (Rain).
If someone have any input in a way it could be better, I don't mind, but this little piece here does what I need it to do.
Here's to example telegrams I receive (I wanted the value 3.0 to be put into "OutSideTemp"):
$WIXDR,C,3.0,C,0,H,59.2,P,0,P,1026.9,H,04F
$WIXDR,V,0.00,M,0,Z,0,s,0,R,0.0,M,0,V,0.0,M,1,Z,0,s,1,R,0.0,M,1,R,89.9,M,2,R,0.0,M,358
void ProcessXDR(char* buffPtr)
{
char valueBuff[10];
int result, x;
float OutSideTemp;
USHORT uOutSideTemp;
// char charTemperature, charRain
IODBerr eCode;
// Look for "C"
result = ReadAsciiVariable(buffPtr, &valueBuff[0], &buffPtr, sizeof(valueBuff));
// sscanf(&valueBuff[0],"%f",&charTemperature);
if (valueBuff[0] == 'C')
//Outside Temperature
result = ReadAsciiVariable(buffPtr, &valueBuff[0], &buffPtr, sizeof(valueBuff));
sscanf(&valueBuff[0],"%f",&OutSideTemp);
OutSideTemp *= 10;
uOutSideTemp = (USHORT)OutSideTemp;
eCode = IODBWrite(ANALOG_IN,REG_COM_XDR,1,&uOutSideTemp,NULL);
}

Can gcc/clang optimize initialization computing?

I recently wrote a parser generator tool that takes a BNF grammar (as a string) and a set of actions (as a function pointer array) and output a parser (= a state automaton, allocated on the heap). I then use another function to use that parser on my input data and generates a abstract syntax tree.
In the initial parser generation, there is quite a lot of steps, and i was wondering if gcc or clang are able to optimize this, given constant inputs to the parser generation function (and never using the pointers values, only dereferencing them) ? Is is possible to run the function at compile time, and embed the result (aka, the allocated memory) in the executable ?
(obviously, that would be using link time optimization, since the compiler would need to be able to check that the whole function does indeed have the same result with the same parameters)
What you could do in this case is have code that generates code.
Have your initial parser generator as a separate piece of code that runs independently. The output of this code would be a header file containing a set of variable definitions initialized to the proper values. You then use this file in your main code.
As an example, suppose you have a program that needs to know the number of bits that are set in a given byte. You could do this manually whenever you need:
int count_bits(uint8_t b)
{
int count = 0;
while (b) {
count += b & 1;
b >>= 1;
}
return count;
}
Or you can generate the table in a separate program:
int main()
{
FILE *header = fopen("bitcount.h", "w");
if (!header) {
perror("fopen failed");
exit(1);
}
fprintf(header, "int bit_counts[256] = {\n");
int count;
unsigned v;
for (v=0,count=0; v<256; v++) {
uint8_t b = v;
while (b) {
count += b & 1;
b >>= 1;
}
fprintf(header, " %d,\n" count);
}
fprintf(header, "};\n");
fclose(header);
return 0;
}
This create a file called bitcount.h that looks like this:
int bit_counts[256] = {
0,
1,
1,
2,
...
7,
};
That you can include in your "real" code.

passing struct to function C

I have initialised 3 instances of a cache I have defined using typedef. I have done some processing on them in a serious of if statements in the following way :
cache cache1;
cache cache2;
cache cache3;
int a;
void main(...) {
if (a == 0) {
cache1.attribute = 5;
}
else if (a == 1) {
cache2.attribute = 1;
}
else if (a == 2) {
cache3.attribute = 2 ;
}
However now I need to make the design modular in the following way:
cache cache1;
cache cache2;
cache cache3;
void cache_operator( cache user_cache, int a ) {
user_cache.attribute = a;
}
void main(...) {
if (a == 0) {
cache_operator(cache1,5);
}
else if (a == 1) {
cache_operator(cache2,1);
}
...
I am having trouble with passing the cache to the method. I'm used to java programming and I'm not very familiar with c pointers. However, if I pass the cache itself as shown above I am passing a copy of the cache on the stack which then produces results different to the original code. How do I properly transform the first design into the second design when it comes to passing the appropriate cache to the function and making sure it is accessed properly.
In C language, if you want to keep track of the original 'data' instead of creating a copy in the function, you have to pass the pointer of that data to that function.
Pointer in C is just like the reference to object in JAVA.
Following is how you do it.
void cache_operator( cache *user_cache, int a )
{
user_cache->attribute = a;
}
Following is how you call the function.
cache_operator(&cache1,5);
I also started with JAVA. I don't know why some universities nowadays use JAVA as beginning language... It is quite strange, since JAVA is a high-level language making the abstraction of low-level detail, whereas C is a rather low-level language. In the past, this will never be the case..

How to make external Mathematica functions interruptible?

I had an earlier question about integrating Mathematica with functions written in C++.
This is a follow-up question:
If the computation takes too long I'd like to be able to abort it using Evaluation > Abort Evaluation. Which of the technologies suggested in the answers make it possible to have an interruptible C-based extension function? How can "interruptibility" be implemented on the C side?
I need to make my function interruptible in a way which will corrupt neither it, nor the Mathematica kernel (i.e. it should be possible to call the function again from Mathematica after it has been interrupted)
For MathLink - based functions, you will have to do two things (On Windows): use MLAbort to check for aborts, and call MLCallYieldFunction, to yield the processor temporarily. Both are described in the MathLink tutorial by Todd Gayley from way back, available here.
Using the bits from my previous answer, here is an example code to compute the prime numbers (in an inefficient manner, but this is what we need here for an illustration):
code =
"
#include <stdlib.h>
extern void primes(int n);
static void yield(){
MLCallYieldFunction(
MLYieldFunction(stdlink),
stdlink,
(MLYieldParameters)0 );
}
static void abort(){
MLPutFunction(stdlink,\" Abort \",0);
}
void primes(int n){
int i = 0, j=0,prime = 1, *d = (int *)malloc(n*sizeof(int)),ctr = 0;
if(!d) {
abort();
return;
}
for(i=2;!MLAbort && i<=n;i++){
j=2;
prime = 1;
while (!MLAbort && j*j <=i){
if(i % j == 0){
prime = 0;
break;
}
j++;
}
if(prime) d[ctr++] = i;
yield();
}
if(MLAbort){
abort();
goto R1;
}
MLPutFunction(stdlink,\"List\",ctr);
for(i=0; !MLAbort && i < ctr; i++ ){
MLPutInteger(stdlink,d[i]);
yield();
}
if(MLAbort) abort();
R1: free(d);
}
";
and the template:
template =
"
void primes P((int ));
:Begin:
:Function: primes
:Pattern: primes[n_Integer]
:Arguments: { n }
:ArgumentTypes: { Integer }
:ReturnType: Manual
:End:
";
Here is the code to create the program (taken from the previous answer, slightly modified):
Needs["CCompilerDriver`"];
fullCCode = makeMLinkCodeF[code];
projectDir = "C:\\Temp\\MLProject1";
If[! FileExistsQ[projectDir], CreateDirectory[projectDir]]
pname = "primes";
files = MapThread[
Export[FileNameJoin[{projectDir, pname <> #2}], #1,
"String"] &, {{fullCCode, template}, {".c", ".tm"}}];
Now, here we create it:
In[461]:= exe=CreateExecutable[files,pname];
Install[exe]
Out[462]= LinkObject["C:\Users\Archie\AppData\Roaming\Mathematica\SystemFiles\LibraryResources\
Windows-x86-64\primes.exe",161,10]
and use it:
In[464]:= primes[20]
Out[464]= {2,3,5,7,11,13,17,19}
In[465]:= primes[10000000]
Out[465]= $Aborted
In the latter case, I used Alt+"." to abort the computation. Note that this won't work correctly if you do not include a call to yield.
The general ideology is that you have to check for MLAbort and call MLCallYieldFunction for every expensive computation, such as large loops etc. Perhaps, doing that for inner loops like I did above is an overkill though. One thing you could try doing is to factor the boilerplate code away by using the C preprocessor (macros).
Without ever having tried it, it looks like the Expression Packet functionality might work in this way - if your C code goes back and asks mathematica for some more work to do periodically, then hopefully aborting execution on the mathematica side will tell the C code that there is no more work to do.
If you are using LibraryLink to link external C code to the Mathematica kernel, you can use the Library callback function AbortQ to check if an abort is in progress.

Resources