A more complete recursive descent c interpreter - c

I've seen several implementations of recursive descent c interpreters which all seem
to do a pretty good job - yet they all only implement a small portion of the C language -
for example they don't support structs or typedefs etc -
Does anyone know of any code that supports a large portion of the C language.
I know adding more functionality would be pretty trivial - but I'm a bit strapped
for time.

Picoc supports more that most of the Tiny/Small C interpreters. You might give it a look. And it does support structures.

If you just want to use it, this one looks awfully good for the job. There was a Dr. Dobb's article on it a while back ... there it is

Related

How much lisp to implement in C before writing extension in itself?

I am implementing a lisp interpreter in C, i have implemented along with few primitives like cons , car, cdr , eq, basic arithmetic stuff.
Just before i was starting to implement define and lambda it occurred to me that i need to implement an environment. I am unsure if i could implement it in lisp itself.
My intent is to implement minimal amount of lisp so that i could write extension to the language in itself. I am not sure how much is minimal, Would implementing FFI Qualify as minimal ?
The answer to your question depends on the meaning that you give to the word “minimal”.
Given your question, and assuming that you don't want to make an implementation competing with the nowdays fine implementations of Common Lisp and Schema, my hypothesis is that with “minimal” you intend: Turing complete, that is capable of expressing any computation expressible in a general purpose programming language.
With this assumption, you need to implement three other things:
conditional forms (cond)
lambda expressions (lambda)
a way of defining recursive lambda expression (labels or defun)
Your interpreter then should be able to evaluate forms. This should be sufficient to have a language equivalent to the initial LISP, that allow to express in the language any computable function.
First off, you are talking about first writing a LISP interpreter. You have a lot of choices to take when it comes to scoping, LISP1 vs LISP2 since these questions alter the implementation core. An interpreter is a general purpose program that reads and evaluates code. It can support abstractions but it won't extend itself by making more native stuff.
If you are interested in such stuff you can perhaps make a compiler instead. Eg. there are many Sceme like subsets that compiles to C or Java code, but you can make your own VM. Thus it can indeed compile itself to be run on it's own target machine (self hosting) if all the forms and procedures you use has been implemented using the primitives supported by the compiler.
Making a dumb compiler is not much difference from making an interpreter. That is very clear if yo've watched the SICP videos (10A is about compilation, 7A-B is about interpreters)
The environment can be a chain of pairs just as in a LISP interpreter. It would be difficult to implement the environment of itself in LISP without making it a very difficult Lisp language to use (unless it's compiled that is)
You may use the data structures of lisp and the primitives from the C code though.
Making a FFI is a fast way to give your language lots of features. It solves the chicken and egg problem by using other peoples work from within your language. In fuses the top (primitives and syntax) and the bottom layer (a runtime) of your system. It's the ultimate primitive and you can think of it as system call or message bus to the runtime.
I strongly suggest to read Queinnec's book: Lisp In Small Pieces. It is a book dedicated entirely to answer your question, and it explains in detail the many trade-offs and the internals of Lisp implementations and definitions, by giving many explained examples of Lisp interpreters and compilers.
You might also consider using libffi. You could be interested in the internals of M.Serrano's Bigloo & Hop implementations. You might even look inside my MELT lisp-like language to customize the GCC
compiler.
You also need to learn more about garbage collection (you might read the GC handbook). You could use Boehm's conservative Garbage Collector (or something else, e.g. my Qish or MPS) or write your own GC.
You may want to learn more about Chicken, Scheme 48, Guile and read their papers and look inside their code.
See also J.Pitrat's blog: it is not about Lisp (but about bootstrapping strong AI) and has several fascinating entries related to bootstrapping.

What libraries would be useful for implementing a small language interpreter in C?

For my own learning experience, I want to try writing an interpreter for a simple programming language in C – the main thing I think I need is a hash table library, but a general purpose collection of data structures and helper functions would be pretty helpful. What would you guys recommend?
libbasekit - by the author of Io. You can also use libcoroutine.
One library I recommend looking into is libgc, a garbage collector for C.
You use it by replacing calls to malloc, realloc, strdup, etc. with their libgc counterparts (e.g. GC_MALLOC). It works by scanning the stack, global variables, and GC-allocated blocks, looking for numbers that might be pointers. Believe it or not, it actually performs quite well (almost on par with the very good ptmalloc, which is the default (non-garbage collected) malloc implementation in GNU/Linux), and a lot of programs use it (including Mono and GCJ). A disadvantage, though, is it might not play well with other libraries you may want to use, and you may even have to recompile some of them by hand to replace calls to malloc with GC_MALLOC.
Honestly - and I know some people will hate me for it - but I recommend you use C++. You don't have to bust a gut to learn it just to be able to start your project. Just use it like C, but in an hour you can learn how to use std::map<> (an associative container), std::string for easy textual data handling, and std::vector<> for a resizable heap-allocated array. If you want to spend an extra hour or two, learn to put member functions in classes (don't worry about polymorphism, virtual functions etc. to begin with), and you'll get a more organised program.
You need no more than the standard library for a suitably small language with simple constructs. The most complex part of an interpreted language is probably expression evaluation. For both that, procedure-calling, and construct-nesting you will need to understand and implement stack data structures.
The code at the link above is C++, but the algorithm is described clearly and you could re-implement it easily in C. There again there are few valid arguments for not using C++ IMO.
Before diving into what libraries to use I suggest you learn about grammars and compiler design. Especially input parsing is for compilers and interpreters similar, that is tokenizing and parsing. The process of tokenizing converts a stream characters (your input) into a stream of tokens. A parser takes this stream of tokens and matches it with your grammar.
You don't mention what language you're writing an interpreter for. But very likely that language contains recursion. In that case you need to use a so-called bottom-up parser which you cannot write by hand but needs to be generated. If you try write such a parser by hand you will end up with a error-prone mess.
If you're developing for a posix platform then you can use lex and yacc. These tools are a bit old but very powerful for building parsers. Lex can generate code that implements the tokenizing process and yacc can generate a bottom-up parser.
My answer probably raises more questions than it answers. That's because the field of compilers/interpreters is quite complex and cannot simply be explained in a short answer. Just get a good book on compiler design.

How do I implement parsing?

I am designing a compiler in C. I want to know which technique I should use, top-down or bottom up? I have only implemented operator precedence using bottom up. I have applied the following
rules:
E:=E+E
E:=E-E
E:=E/E
E:=E*E
E:=E^E
I want to know that am I going the right away?
If I want to include if-else, loops, arrays, functions, do I need to implement parsing?
If yes, how do I implement it? Any one can
I have only implemented token collection and operator precedence. What is the next steps?
Lex & Yacc is your answer. Or Flex and Bison which are branched version of original tools.
They are free, they are the real standard for writing lexers and parsers in C and used all around everywhere.
In addition O'Reilly has released a small pearl of 300 pages: Flex & Bison. I bought it and it really explains you how to write a good parser for a programming language and handle all the subtle things (error recovery, conflicts, scopes and so on). It will answer also your questions about how you are parsing expressions: your approach is right with a top-down parser but you'll discover that is not enough to handle operator precedences.
Of course, for hobby, you could write your own lexer and parser but it would be just an academic effort that is nice to understand how FSM and parser work but with no so much fun :)
If you are, instead, interested in programming language design or complex implementations I suggest this book: Programming Language Pragmatics that is not so famous because of the Dragon Book but it really explains why and how various characteristics can and should be implemented in a compiler. The Dragon Book is a bible too, and it will cover at a real low level how to write a parser.. but it would be sort of boring, I warn you..
The best way to implement a good parser in C is using flex & yacc
Your question is quite vague and hard to answer without a more specific, detailed question. The "Dragon book" is an excellent reference though for someone seeking to implement a compiler from scratch, or as others have pointed out Lex and Yacc.
If you intend to implement the parser by hand, you will want to do a recursive descent parser. The code directly reflects the grammar, so it's fairly easy to figure out and understand. It places some restrictions on your grammar (you can't have any left-recursive nonterminals), but you can work around those problems.
However, it depends on the complexity of the grammar; hand-hacking a parser for anything much more complicated than basic arithmetic expressions gets very tedious very quickly. If you're trying to implement anything that looks like a real programming language, use a parser generator like yacc or bison.

HAT-trie in ANSI C implementation?

I am looking for ANSI C HAT-trie implementation released under some free license. I have not found one. Can you point me to some standalone implementation or a program that uses
HAT-tries to get at least slight idea how to implement it the roght way, please?
The original paper on HAT-trie can be found here:
http://crpit.com/confpapers/CRPITV62Askitis.pdf
PS: In case faster cache-conscious data structured well-suited for strings evolved since
the time the above paper was written, please point me to the papers or example source codes rather.
Someone is implementing it in C++ over on github
https://github.com/chris-vaszauskas/hat-trie
If you need a plain C implementation, this would be a good base to start from.
Java is also fairly readable for a C programmer
http://www.stochasticgeometry.ie/2008/05/06/implementing-hat-tries-in-java/
Please see the HAT-trie implementation site at code.google.com/p/hat-trie for implementation notes and source code.

C to IEC 61131-3 IL compiler

I have a requirement for porting some existing C code to a IEC 61131-3 compliant PLC.
I have some options of splitting the code into discrete function blocks and weaving those blocks into a standard solution (Ladder, FB, Structured Text etc). But this would require carving up the C code in order to build each function block.
When looking at the IEC spec I realsied that the IEC Instruction List form could be a target language for a compiler. The wikepedia article lists two development tools:
CoDeSys
Beremiz
But these seem to be targeted compiling IEC languages to C, not C to IEC.
Another possible solution is to push the C code through a C to Pascal translator and use that as a starting point for a Structured Text solution.
If not any of these I will go down the route of splitting the code up into function blocks.
Edit
As prompted by mlieson's reply I should have mentioned that the C code is an existing real-time control system. So the programs algorithms should already suit a PLC environment.
Maybe this answer comes too late but it is possible to call C code from CoDeSys thanks to an external library.
You can find documentation on the CoDeSys forum at http://forum-en.3s-software.com/viewtopic.php?t=620
That would give you to use your C code into the PLC with minor modifcations. You'll just have to define the functions or function blocks interfaces.
My guess is that a C to Pascal translator will not get you near enough for being worth the trouble. Structured text looks a lot like Pascal, but there are differences that you will need to fix everywhere.
Not a bug issue, but don't forget that PLCs runtime enviroment is a bit different. A C applications starts at main() and ends when main() returns. A PLC calls it main() over and over again, 100:s of times per second and it never ends.
Usally lengthy calculations and I/O needs to be coded in diffent fashion than a C appliation would use.
Unless your C source is many many thousands lines of code - Rewrite it.
It is impossible. To be short: the IL language is a 4GL (i.e. limited to
the domain, as well as other IEC 61131-3 languages -- ST, FBD, LD, SFC).
The C language is a 3GL.
To understand the problem, try to answer the question, which way to
express in IL manipulations with a pointer? for example, to express call a
function by a pointer. What about interrupts? Low level access to the
peripherial devices?
(really, there are more problems)
BTW, there is the Reflex language, aka "C with processes". Reflex is a 4GL for the
control domain with C-like syntax. But the known translators produce
C-code and Python-code.
If the amount of code to convert is a few thousand lines, recoding by hand is probably your best bet.
If you have lots of code to convert, then an automated tool might be very effective.
Using the DMS Software Reengineering Toolkit we've built translators to map mechanical motion diagrams into RLL (PLC) code. DMS also has full C parser/analyzers/front ends. The pieces are there to build a C to RLL code.
This isn't an easy task. It likely takes 6-12 man-months to configure DMS to something resembling what you want. If that's less than what it takes to do by hand, then its the right way to do it.
There are a few IEC development environments and target hardware that can use C blocks... I would also take a look at the reasons why it HAS to be an IEC-61131 complaint target. I have written extensively on compliance and why it doesn't mean squat.
SOFTplc corp can help I'm sure with user defined loadable modules... and they can be in C..
Schneider also supports C function blocks...
Labview too!! not sure why IEC is important that's all!! the compiler if existed would create bad code for sure:)
Your best bet is to split your C code into smaller parts which can be recoded as PLC functional blocks and use C to PASCAL convertor for each block which you will rewrite in structured text. Prepare to do a lot of manual work since automated conversion will probably disappoint you.
Also take a look at this page: http://www.control.com/thread/1026228786
Every time I've done this, I just parsed and converted it by hand from C directly to ST. I only ran into a few functions that required complete rewrites, although there was very little that dealt with pointers, which is something that ST generally chokes on, unfortunately.
Using the existing C code as blocks that are called by the PLC program would have the added advantage that the C blocks could run at the same periodicity that they did before, and their function is likely already well documented and tested. This would minimize any effect on changes from the existing control system. This is an architecture for controls with software PLCs that I have seen used before.

Resources