Where can I find a Lisp reader in C?

I have a Lisp reader written in Java that I'm thinking of translating into C. (Or perhaps C++.) It's a fairly complete and useful hack, so the main issue is doing the dynamic-storage allocation in a language without garbage collection. If someone has already thought this through I'd rather borrow their code than figure it out myself. (C is not my favorite language.)
Of course, having a Lisp reader makes no sense unless you're planning to do something with the things you read, so perhaps I should have phrased the question as "Where do I find a simple Lisp core written in C?", but in my experience the hardest unavoidable part of writing a Lisp (somewhat surprisingly) is the reader. Plus, I don't want to have a garbage collector; I'm anticipating an application where the list structures will be freed more or less by hand.

Gary Knott's Interpreting Lisp is very nice.
You may also try others, like Jim Mayfield's Lisp. There are probably lots of little Lisps out there...
You mentioned that you don't like C. Maybe you'd like Haskell -- in which case you could try "Write yourself a Scheme in 48 hours", an interesting tutorial (you get to write a Scheme interpreter in Haskell).
Update: I know that a Lisper would hardly feel comfortable using Haskell, but hey, it's much more comfortable than C (at least for me)! Besides that, Haskell has a good FFI, so it should be easy to use the Haskell-made Lisp reader as a C-compatible library.
Update 2: If you want to use XLisp, as suggested by another user, you will probably need src/xlread.c (863 lines) and include/xlisp.h (1379 lines) -- but I could be wrong...
Update 3: If you use Gary Knott's Lisp (one single C file with 942 lines), the function signature is int32 sread(void). This would be my choice if I didn't need anything fancy (like read macros) or highly optimized (there is a PDF paper that describes how the code is implemented, so you won't have to find your way in a labyrinth). The documentation for the function is:
This procedure scans an input string g using a lexical token scanning
routine, e(), where e() returns
1 if the token is '('
2 if the token is '''
3 if the token is '.'
4 if the token is ')'
or a typed pointer d to an atom or number stored in row ptrv(d) in the
atom or number tables.
Due to the typecode (8 or 9) of d, d is a negative 32-bit integer. The
token found by e() is stripped from the front of g.
SREAD constructs an S-expression and returns a typed pointer to it as
its result.
Note that Gary's Lisp is old and you'll need to change it so that it compiles. Instead of including linuxenv.h, include:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <setjmp.h>
Also, it does not work on 64-bit machines (the documentation of sread should tell you why...)
Update 4: There are also the Scheme implementations by Nils Holm (there are books describing the internals).

Lisp500 http://code.google.com/p/lisp5000
ThinLisp http://www.thinlisp.org

lispreader is a simple Lisp file parser done in plain C.
If you want C++, you can dig around in the SuperTux source code; it contains a Lisp file parser written in C++.
If you want an actual implementation of Lisp instead of just a parser, you could have a look at Abuse, which contains a small one as the game's scripting language.

MIT Professor Rivest published a set of small readers for s-expressions back in 1997 (http://people.csail.mit.edu/rivest/sexp.html) as part of his DARPA-supported research on public key cryptography. The code only does reading and printing, and is well described in a document written in the style of an internet RFC.
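To give a sense of how small a reader with hand-managed storage can be, here is a minimal sketch in C. It is not taken from Rivest's code or from any of the Lisps mentioned in this thread; the Node type and the names sexp_read and node_free are made up for illustration. It handles only atoms and parenthesised lists (no quote, dotted pairs, strings or read macros), and the caller frees the result by hand.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: an atom has text != NULL; a list cell has
   text == NULL and uses car/cdr. The empty list is a NULL pointer. */
typedef struct Node {
    char *text;
    struct Node *car;
    struct Node *cdr;
} Node;

/* Free a whole structure by hand: iterate along cdr, recurse into car. */
void node_free(Node *n) {
    while (n) {
        Node *next = n->cdr;
        free(n->text);
        node_free(n->car);
        free(n);
        n = next;
    }
}

static void skip_space(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

Node *sexp_read(const char **p, int *err);

/* Read elements after '(' up to the matching ')'. */
static Node *read_list(const char **p, int *err) {
    Node *head = NULL, **tail = &head;
    for (;;) {
        skip_space(p);
        if (**p == '\0') { *err = 1; break; }        /* unbalanced '(' */
        if (**p == ')') { (*p)++; return head; }
        Node *item = sexp_read(p, err);
        if (*err) break;
        Node *cell = calloc(1, sizeof *cell);
        if (!cell) { node_free(item); *err = 1; break; }
        cell->car = item;
        *tail = cell;
        tail = &cell->cdr;
    }
    node_free(head);                                 /* clean up on error */
    return NULL;
}

/* Read one expression starting at *p and advance *p past it.
   The caller sets *err to 0 beforehand; it becomes 1 on failure. */
Node *sexp_read(const char **p, int *err) {
    skip_space(p);
    if (**p == '(') { (*p)++; return read_list(p, err); }
    if (**p == ')' || **p == '\0') { *err = 1; return NULL; }
    const char *start = *p;
    while (**p && !isspace((unsigned char)**p) && **p != '(' && **p != ')')
        (*p)++;
    size_t len = (size_t)(*p - start);
    Node *atom = calloc(1, sizeof *atom);
    char *text = malloc(len + 1);
    if (!atom || !text) { free(atom); free(text); *err = 1; return NULL; }
    memcpy(text, start, len);
    text[len] = '\0';
    atom->text = text;
    return atom;
}
```

A caller would point a const char * at the input, set an int error flag to 0, call sexp_read, and later pass the result to node_free. Because every node has exactly one owner, freeing is a single tree walk; shared or circular structure would need a different scheme.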

There are lots of embeddable Scheme implementations; off the top of my head: SIOD, Guile, ChickenScheme, Scheme48...

Related

Scheme to C translator

I tried to generate C code starting from a Scheme function, but I have not managed to find any translator from Scheme to C. I tried to convert this function to C.
(define f
  (lambda (n)
    (if (= n 0) 1
        (* n (f (- n 1))))))
(display (f 10))
(newline)
I tried to use Gambit (gsc), and it generates a C file that looks merely like a file to load in some interpreter, not a file containing a main function that can be executed.
Is there some application that generates C code that can be directly executed? The functions from standard scheme library like display should be linked with some object file.
EDIT:
My purpose is to understand the algorithms used by professional translators.
There are many such translators, dating back at least to the 1980s. I think CHICKEN is a good current one.
If you want to use that:
get CHICKEN;
build & install it with the appropriate make incantation (this was painless for me on OSX, so it should be very painless indeed on Linux, although it may be harder on Windows);
stash your code in a file I'll call f.scm;
if you want to see the C code, compile with chicken f.scm, which will produce a few hundred lines of incomprehensible C;
if you want just the executable, use csc to create it.
There is an extensive manual which you will need to read if you want to do anything nontrivial, such as linking in C libraries or talking to Scheme code from C.
Without knowing what you are after, this smells as if it may be an XY problem. In particular:
if you want a Scheme system which will allow you to talk to code written in C, then you probably want a system with an FFI, not one that compiles to C;
if you want a Scheme system which will create native executables, then you probably want, well, a Scheme system which will create native executables, not one which compiles to C.
There are many examples of each of these. Some of these systems may also compile to, or via, C, but one does not depend on the other.
Finally, if you want to understand how Scheme compilers which target C work (or how Scheme compilers which target any language, including assembler, work), then the traditional approach probably still works best: find a well-written one for which source is available, and read & tinker with its source code.
Basically no Scheme to C translators will do what you want. They create hideous code not meant to be read and they rely on the underlying C compiler to do much of the optimization. Chicken and Gambit make use of header files, while Stalin, which I have, does not; however, it is based on R4RS rather than R5RS or later.
You are probably better off reading Abdulaziz Ghuloum's paper An Incremental Approach to Compiler Construction (PDF) or perhaps Matt Might's articles on parsing, continuations and compilation. Further down he actually has a Scheme to C and a Scheme to Java translator with different approaches to closure conversion. In the end nothing beats doing it yourself, so have a go!

Why doesn't C11 support lambda functions

The new C++11 standard supports lambda functions, which I think is a useful feature. I understand that the C and C++ standards differ from each other but I don't understand why C11 doesn't support lambda functions. I think it could have a lot of use.
Is there a reason why the developers of the C11 standard chose not to include this feature?
2021 update: lambdas based on C++ syntax with minimalist semantics were voted into C23 this year. More details will emerge as the committee pins down precisely what else the feature will bring in from C++ and other implementations.
2016 update: Apple-style lambdas with closures were once again presented to the Working Group at the London 2016 meeting, in a new proposal document that tries to address several of the failings of the previous attempt, tidying up the terminology and explanations and going into much more detail on how closures and lambdas can be made "C-like".
Since the reception was cautiously positive (7-0-9 Yes/No/Abstain), it's looking very possible that something similar to this will make it into the language soon.
The short answer is simply that C doesn't include lambda functions because nobody has yet made an acceptable proposal to the ISO C working group to include lambda functions.
You can take a look at a list of some of the proposals discussed by the working group here: http://www.open-std.org/jtc1/sc22/wg14/www/documents
The only proposal for lambdas of any kind that I can find in that list is Apple's blocks (as demonstrated in Yu Hao's answer), in document N1451. That proposal is discussed further in N1483, which compares it to C++ lambdas, and in N1493 and N1542, which are the minutes of the meetings where those documents were presented.
There were several reasons why the proposal in N1451 couldn't be accepted, given in N1542:
initially the committee had difficulty understanding the proposal
it uses incorrect citations and terminology which contradicts the existing C standard
it is apparently vague and incomplete
Apple was in the process of trying to patent the feature (not clear if this is an obstacle to standardisation or not but I would assume so)
A completely new feature with completely new semantics proposed in 2010 had precisely zero chance of being ready in time for 2011, and would have held up the release of C11
Blocks as presented are not compatible with C++11 lambdas
It also looks like they were unconvinced that it was currently demonstrating enough utility. C standardisation apparently tries to be very conservative, and with only one major compiler implementing the feature it's likely that they would want to wait and see how it competes with C++ lambdas, and whether anybody else picks it up. It's not really a "C" feature as opposed to a "Clang" feature until multiple compilers are offering it.
All that said, the committee's votes did apparently lean very slightly in favour of the feature (6-5-4 Yes/No/Abstain), but not enough for the necessary consensus to include it.
As far as I can tell, the other big one, C++11 lambdas, have not been proposed for inclusion into C by anybody; and if you don't ask you don't get.
Any proposal for lambdas in C would add a whole slew of new rules about variable lifetimes and locations and copying and allocation and... etc. For a lot of people this potentially starts to look very un-C-like, with values getting moved around behind the programmer's back or having sudden unexpected changes in their lifespan - avoiding this sort of thing is half the reason people choose to write in C nowadays. So there also has to be a proposal that actually falls in line with C's "philosophy" before it can be taken seriously. I'm sure this can be done, but both of the big proposals so far have been designed for languages with a very different "philosophy" where this sort of thing is less of an obstacle, and don't necessarily reflect C's purpose and character as they currently stand.
C is intended to be a small and simple language. It deliberately omits high-level features when the same things can be done by simpler means. It aims to provide only the basic features that are absolutely necessary for portable programming.
C doesn't have references because they are just pointers. C doesn't have classes, inheritance and virtual functions, because you can just use structs and make vtables yourself using function pointers. It doesn't have a garbage collector because programmers can keep track of memory allocations themselves. It doesn't have templates because they are in fact just macros. If you need exceptions you can use longjmp, and instead of namespaces you simply add prefixes to names.
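As a rough illustration of the "structs plus function pointers" claim, here is a minimal sketch of a hand-made vtable. The Shape, Rect and Square types and all function names are invented for this example; it shows one common way to lay things out, not the only one.

```c
/* Hypothetical "interface": one function pointer per virtual method. */
typedef struct ShapeVtable {
    double (*area)(const void *self);
    const char *(*name)(const void *self);
} ShapeVtable;

/* Every object begins with a pointer to the vtable of its concrete type. */
typedef struct Shape {
    const ShapeVtable *vt;
} Shape;

/* "Derived" types embed the base as their first member. */
typedef struct Rect   { Shape base; double w, h; } Rect;
typedef struct Square { Shape base; double side; } Square;

static double rect_area(const void *self) {
    const Rect *r = self;
    return r->w * r->h;
}
static const char *rect_name(const void *self) { (void)self; return "rect"; }

static double square_area(const void *self) {
    const Square *s = self;
    return s->side * s->side;
}
static const char *square_name(const void *self) { (void)self; return "square"; }

static const ShapeVtable rect_vtable   = { rect_area, rect_name };
static const ShapeVtable square_vtable = { square_area, square_name };

/* "Constructors" wire each object to its vtable. */
Rect rect_make(double w, double h) {
    Rect r = { { &rect_vtable }, w, h };
    return r;
}
Square square_make(double side) {
    Square s = { { &square_vtable }, side };
    return s;
}

/* Dynamic dispatch: the caller only knows it has some Shape. */
double shape_area(const Shape *shape) { return shape->vt->area(shape); }
const char *shape_name(const Shape *shape) { return shape->vt->name(shape); }
```

Code that holds a const Shape * (for example &r.base for a Rect r) can call shape_area without knowing the concrete type. The price is that the wiring a C++ compiler would generate is written and maintained by hand.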
Adding any of these high-level shortcuts might make programming a little more comfortable, but this comes at the cost of making the language more complicated, which must not be underestimated. It is a slippery slope that directly leads to the mess that C++ has become.
C doesn't have lambda functions because they are not really necessary. Instead you can just use a static function and put the context in a struct.
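A minimal sketch of that idea follows, using a counter as the example. The Counter and IntCallback types and the function names are hypothetical; the point is only that the captured variables live in a struct the caller owns, and the "lambda" body is an ordinary static function that receives that struct through a void pointer.

```c
/* Hypothetical context: the variables a closure would have captured. */
typedef struct Counter {
    int next;
    int increment;
} Counter;

/* A "closure" is a plain function pointer paired with its context. */
typedef struct IntCallback {
    int (*fn)(void *ctx);
    void *ctx;
} IntCallback;

/* The body of the would-be lambda, as an ordinary static function. */
static int counter_step(void *ctx) {
    Counter *c = ctx;
    int ret = c->next;
    c->next += c->increment;
    return ret;
}

/* Pair the function with a caller-owned context. */
IntCallback counter_callback(Counter *c) {
    IntCallback cb = { counter_step, c };
    return cb;
}

/* Anything that accepts an IntCallback can invoke it like this. */
int callback_invoke(IntCallback cb) { return cb.fn(cb.ctx); }
```

With a Counter initialised to { 5, 2 }, three successive calls to callback_invoke return 5, 7 and 9. Nothing is copied or allocated behind the programmer's back: the Counter lives wherever the caller put it, and the callback is valid only as long as that storage is.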
This is really just my opinion, since I don't know what the committee thinks.
On the one hand, Lisp has supported lambda expressions since its birth in 1958. The C programming language was born in 1972. So lambda expressions actually have a longer history than C, and if you ask why C11 doesn't support lambda expressions, the same question can be asked about C89.
On the other hand, lambda expressions have always been a functional programming thing, and are being absorbed into imperative programming languages only gradually. Some of the "higher" languages (e.g., Java, before the planned Java 8) don't support them yet.
Finally, C and C++ are always learning from each other, so maybe it will be in the next C standard. For now, you can take a look at Blocks, a non-standard extension added by Apple. This is example code from Wikipedia:
#include <stdio.h>
#include <Block.h>

typedef int (^IntBlock)();

IntBlock MakeCounter(int start, int increment) {
    __block int i = start;
    return Block_copy( ^ {
        int ret = i;
        i += increment;
        return ret;
    });
}

int main(void) {
    IntBlock mycounter = MakeCounter(5, 2);
    printf("First call: %d\n", mycounter());
    printf("Second call: %d\n", mycounter());
    printf("Third call: %d\n", mycounter());
    /* because it was copied, it must also be released */
    Block_release(mycounter);
    return 0;
}
/* Output:
First call: 5
Second call: 7
Third call: 9
*/

Tool to produce self-referential programs?

Many results in computability theory (such as Kleene's second recursion theorem) ensure that it is possible to construct programs that can operate over their own source code. For example, in Michael Sipser's "Introduction to the Theory of Computation," he proves a special case of the Recursion Theorem, which states that any program representing a function that accepts two strings and produces a string can be converted into an equivalent program where the second argument is equal to the program's own source code. Moreover, this process can be done automatically.
The construction that one uses to produce programs with access to their own source code is well-known (most theory of computation books contain it) and is often used to generate quines. My question is whether someone has written a general-purpose tool that accepts as input a program in some language (perhaps C, for example) that contains some placeholder for the source of the program, then processes the program to produce a new program with access to its own source code. This would make it possible, for example, to generate quines automatically, or to write programs that can introspect on their syntax trees (possibly enabling reflection in languages that don't already support it). If not, I was planning on writing my own version of such a tool, but I don't want to reinvent the wheel if this has already been done.
EDIT: Based on @Henning Makholm's suggestion, I decided to just sit down and implement such a program. The resulting program (which I've dubbed "kleene") accepts as input a C++ program and produces a new C++ program that can access its own source code by calling the function kleene::MySource(). This means that you could transform this very simple program into a quine using the kleene program:
#include <iostream>
int main() {
    std::cout << kleene::MySource() << std::endl;
}
If you're curious to check it out, it's available here on my website.
Lots of examples at the Wikipedia article and links therefrom. After looking at one or two it should be obvious how to build a quine generator for a given language that takes an arbitrary piece of payload code as input.
One problem with your reflection idea is that the program cannot, in general, know that what it has constructed is its own source code.
Our DMS Software Reengineering Toolkit is a program transformation system that will accept programs in arbitrary syntax (described to DMS in an explicit parameter called a "domain description"), parse them to ASTs, carry out analyses and transformations of the ASTs, and regenerate revised program text from the modified version.
DMS is of course coded in a language (actually a set of domain-specific languages) for which there are already DMS domain descriptions. So, DMS can read itself, and we use that capability to bootstrap additional DMS capabilities and optimize its performance.
So while we aren't producing quines, we are building programs with self-enhancing code.
And yes, your observation about such a tool providing reflection for arbitrary languages is smack on. Most reflection facilities provided in languages allow access only to those things the language-compiler folks thought of paramount importance to access at runtime, such as "method names". Things they weren't interested in, of course, aren't accessible; ever seen a reflection mechanism that will tell you what's in an expression? In a comment?
DMS provides complete access to all the details of the source code, by virtue of inspecting the code from outside, using general purpose, complete mechanisms. If your language doesn't have reflection, DMS is the way to access the code and reason arbitrarily about it. Even if your language has reflection, DMS can reason about programs in your language in ways that your language cannot, because it can't get access to its own detailed structure.

What libraries would be useful for implementing a small language interpreter in C?

For my own learning experience, I want to try writing an interpreter for a simple programming language in C – the main thing I think I need is a hash table library, but a general purpose collection of data structures and helper functions would be pretty helpful. What would you guys recommend?
libbasekit - by the author of Io. You can also use libcoroutine.
One library I recommend looking into is libgc, a garbage collector for C.
You use it by replacing calls to malloc, realloc, strdup, etc. with their libgc counterparts (e.g. GC_MALLOC). It works by scanning the stack, global variables, and GC-allocated blocks, looking for numbers that might be pointers. Believe it or not, it actually performs quite well (almost on par with the very good ptmalloc, which is the default (non-garbage collected) malloc implementation in GNU/Linux), and a lot of programs use it (including Mono and GCJ). A disadvantage, though, is it might not play well with other libraries you may want to use, and you may even have to recompile some of them by hand to replace calls to malloc with GC_MALLOC.
Honestly, and I know some people will hate me for it, I recommend you use C++. You don't have to bust a gut to learn it just to be able to start your project. Just use it like C, but in an hour you can learn how to use std::map<> (an associative container), std::string for easy textual data handling, and std::vector<> for a resizable heap-allocated array. If you want to spend an extra hour or two, learn to put member functions in classes (don't worry about polymorphism, virtual functions etc. to begin with), and you'll get a more organised program.
You need no more than the standard library for a suitably small language with simple constructs. The most complex part of an interpreted language is probably expression evaluation. For that, as well as for procedure calling and construct nesting, you will need to understand and implement stack data structures.
The code at the link above is C++, but the algorithm is described clearly and you could re-implement it easily in C. Then again, there are few valid arguments for not using C++, IMO.
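To make the role of stacks in expression evaluation concrete, here is a small sketch in plain C using only the standard library. It is not the algorithm from the linked page (the link is not available here); it is a common two-stack, operator-precedence scheme, and the names Stack, eval_expr and STACK_MAX are invented for the example. It handles non-negative integers, + - * / and parentheses only.

```c
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64   /* assumed fixed capacity, enough for a sketch */

typedef struct { long items[STACK_MAX]; int top; } Stack;

static int push(Stack *s, long v) {
    if (s->top == STACK_MAX) return 0;
    s->items[s->top++] = v;
    return 1;
}
static long pop(Stack *s) { return s->items[--s->top]; }

static int precedence(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                              /* '(' and anything else */
}

/* Pop one operator and two operands, push the result; 0 on error. */
static int apply_top(Stack *values, Stack *ops) {
    if (ops->top < 1 || values->top < 2) return 0;
    char op = (char)pop(ops);
    long rhs = pop(values), lhs = pop(values);
    switch (op) {
    case '+': return push(values, lhs + rhs);
    case '-': return push(values, lhs - rhs);
    case '*': return push(values, lhs * rhs);
    case '/': return rhs != 0 && push(values, lhs / rhs);
    default:  return 0;                    /* e.g. an unmatched '(' */
    }
}

/* Evaluate s; returns 1 and stores the value in *result, or 0 on error. */
int eval_expr(const char *s, long *result) {
    Stack values = { {0}, 0 }, ops = { {0}, 0 };
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            if (!push(&values, strtol(s, &end, 10))) return 0;
            s = end;
        } else if (*s == '(') {
            if (!push(&ops, '(')) return 0;
            s++;
        } else if (*s == ')') {
            while (ops.top > 0 && ops.items[ops.top - 1] != '(')
                if (!apply_top(&values, &ops)) return 0;
            if (ops.top == 0) return 0;    /* unmatched ')' */
            pop(&ops);                     /* discard the '(' */
            s++;
        } else if (precedence(*s)) {
            /* Apply pending operators of equal or higher precedence first. */
            while (ops.top > 0 &&
                   precedence((char)ops.items[ops.top - 1]) >= precedence(*s))
                if (!apply_top(&values, &ops)) return 0;
            if (!push(&ops, *s)) return 0;
            s++;
        } else {
            return 0;                      /* unknown character */
        }
    }
    while (ops.top > 0)
        if (!apply_top(&values, &ops)) return 0;
    if (values.top != 1) return 0;
    *result = values.items[0];
    return 1;
}
```

For example, eval_expr("1 + 2 * 3", &v) stores 7 in v and eval_expr("(1 + 2) * 3", &v) stores 9. The same push/pop discipline carries over to call frames and nested constructs in an interpreter.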
Before diving into what libraries to use, I suggest you learn about grammars and compiler design. Input handling in particular, that is, tokenizing and parsing, is similar for compilers and interpreters. The process of tokenizing converts a stream of characters (your input) into a stream of tokens. A parser takes this stream of tokens and matches it with your grammar.
You don't mention what language you're writing an interpreter for. But very likely that language contains recursion. In that case you need to use a so-called bottom-up parser, which you cannot write by hand but which needs to be generated. If you try to write such a parser by hand you will end up with an error-prone mess.
If you're developing for a POSIX platform then you can use lex and yacc. These tools are a bit old but very powerful for building parsers. Lex can generate code that implements the tokenizing process and yacc can generate a bottom-up parser.
My answer probably raises more questions than it answers. That's because the field of compilers/interpreters is quite complex and cannot simply be explained in a short answer. Just get a good book on compiler design.

C to IEC 61131-3 IL compiler

I have a requirement for porting some existing C code to an IEC 61131-3 compliant PLC.
I have some options of splitting the code into discrete function blocks and weaving those blocks into a standard solution (Ladder, FB, Structured Text etc). But this would require carving up the C code in order to build each function block.
When looking at the IEC spec I realised that the IEC Instruction List form could be a target language for a compiler. The Wikipedia article lists two development tools:
CoDeSys
Beremiz
But these seem to be targeted at compiling IEC languages to C, not C to IEC.
Another possible solution is to push the C code through a C to Pascal translator and use that as a starting point for a Structured Text solution.
If none of these works, I will go down the route of splitting the code up into function blocks.
Edit
As prompted by mlieson's reply I should have mentioned that the C code is an existing real-time control system. So the program's algorithms should already suit a PLC environment.
Maybe this answer comes too late but it is possible to call C code from CoDeSys thanks to an external library.
You can find documentation on the CoDeSys forum at http://forum-en.3s-software.com/viewtopic.php?t=620
That would let you use your C code in the PLC with minor modifications. You'll just have to define the function or function block interfaces.
My guess is that a C to Pascal translator will not get you near enough for being worth the trouble. Structured text looks a lot like Pascal, but there are differences that you will need to fix everywhere.
Not a big issue, but don't forget that a PLC's runtime environment is a bit different. A C application starts at main() and ends when main() returns. A PLC calls its main() over and over again, hundreds of times per second, and it never ends.
Usually lengthy calculations and I/O need to be coded in a different fashion than a C application would use.
Unless your C source is many thousands of lines of code: rewrite it.
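The difference in execution model can be sketched in C as follows. This is an illustration only, with made-up names (SumTask, sum_task_scan, ITEMS_PER_SCAN), not code for any particular PLC runtime. A loop that a normal C program would run to completion is turned into a step function that keeps its state between calls and does a bounded amount of work per scan.

```c
#include <stddef.h>

#define ITEMS_PER_SCAN 4   /* assumed work budget per scan cycle */

/* Hypothetical state kept between scans; on a PLC this would live in
   retained variables or a function block instance. */
typedef struct {
    const int *data;
    size_t count;
    size_t index;   /* next element to process */
    long sum;       /* partial result so far */
    int done;       /* set once the whole array has been processed */
} SumTask;

void sum_task_start(SumTask *t, const int *data, size_t count) {
    t->data = data;
    t->count = count;
    t->index = 0;
    t->sum = 0;
    t->done = (count == 0);
}

/* Called once per scan: processes a bounded slice, then returns so the
   rest of the cycle (I/O update, other logic) is not starved. */
void sum_task_scan(SumTask *t) {
    for (size_t i = 0; i < ITEMS_PER_SCAN && t->index < t->count; i++)
        t->sum += t->data[t->index++];
    if (t->index == t->count)
        t->done = 1;
}
```

With ten elements and a budget of four per scan, the task finishes on the third call to sum_task_scan. A plain for loop over the whole array would be the natural C version, but on a PLC it could overrun the cycle time if the array were large.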
It is impossible. To be short: the IL language is a 4GL (i.e. limited to the domain, as are the other IEC 61131-3 languages: ST, FBD, LD, SFC). The C language is a 3GL.
To understand the problem, try to answer the question: how would you express pointer manipulation in IL? For example, how would you express a call to a function through a pointer? What about interrupts? Low-level access to peripheral devices?
(Really, there are more problems.)
BTW, there is the Reflex language, aka "C with processes". Reflex is a 4GL for the control domain with C-like syntax. But the known translators produce C code and Python code.
If the amount of code to convert is a few thousand lines, recoding by hand is probably your best bet.
If you have lots of code to convert, then an automated tool might be very effective.
Using the DMS Software Reengineering Toolkit we've built translators to map mechanical motion diagrams into RLL (PLC) code. DMS also has full C parser/analyzers/front ends. The pieces are there to build a C to RLL translator.
This isn't an easy task. It likely takes 6-12 man-months to configure DMS to something resembling what you want. If that's less than what it takes to do by hand, then it's the right way to do it.
There are a few IEC development environments and target hardware that can use C blocks... I would also take a look at the reasons why it HAS to be an IEC-61131 compliant target. I have written extensively on compliance and why it doesn't mean squat.
SOFTplc corp can help I'm sure with user defined loadable modules... and they can be in C..
Schneider also supports C function blocks...
LabVIEW too! I'm just not sure why IEC is important, that's all. The compiler, if it existed, would create bad code for sure :)
Your best bet is to split your C code into smaller parts which can be recoded as PLC function blocks, and to use a C to Pascal converter for each block, which you will then rewrite in Structured Text. Prepare to do a lot of manual work, since automated conversion will probably disappoint you.
Also take a look at this page: http://www.control.com/thread/1026228786
Every time I've done this, I just parsed and converted it by hand from C directly to ST. I only ran into a few functions that required complete rewrites, although there was very little that dealt with pointers, which is something that ST generally chokes on, unfortunately.
Using the existing C code as blocks that are called by the PLC program would have the added advantage that the C blocks could run at the same periodicity that they did before, and their function is likely already well documented and tested. This would minimize any change in behavior from the existing control system. This is an architecture for controls with software PLCs that I have seen used before.
