Tiny garbage collector in C for embedded devices

Is there some open source tiny GC implementation (preferably as one C source file)?

Google search provides tinygc.sourceforge.net :)

I've got some prototype code that might give you a head start. If all your pointers are "managed" through your interface, you can chop up a heap in any convenient way and use the classic algorithms from 70s dissertations. My adventures with a PostScript garbage collector began here.
On reading through it again, the code may not be what you're looking for. It's designed to run on top of an OS. In particular, it uses relative integer locations as much as possible to allow the entire memory space to be moved by the OS if needed for a reallocation. I imagine you don't need to do that (although it guarantees that internal relocations are ok, too). But the code should show that a garbage collector doesn't have to be horribly complicated. It's just a tree traversal. It's futzing with some bits and following some pointers. Keep it simple. You can do it.
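For flavour, here is a minimal sketch of the classic mark-and-sweep idea this answer alludes to. All names are illustrative, and the calloc/free calls stand in for carving objects out of whatever static arena an embedded target would actually use. The mark phase really is just the tree traversal described above (a production collector would use an explicit stack or pointer reversal rather than recursion):

#include <stddef.h>
#include <stdlib.h>

typedef struct gc_obj {
    struct gc_obj *next;          /* every object lives on one global list */
    struct gc_obj *children[2];   /* the "managed" pointers this object holds */
    unsigned marked;
} gc_obj;

static gc_obj *all_objects = NULL;

static gc_obj *gc_alloc(void)
{
    gc_obj *o = calloc(1, sizeof *o);   /* stand-in for arena allocation */
    if (o) {
        o->next = all_objects;
        all_objects = o;
    }
    return o;
}

/* Mark phase: a plain tree traversal from a root. */
static void gc_mark(gc_obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    gc_mark(o->children[0]);
    gc_mark(o->children[1]);
}

/* Sweep phase: free everything the traversal never reached. */
static void gc_sweep(void)
{
    gc_obj **p = &all_objects;
    while (*p) {
        gc_obj *o = *p;
        if (o->marked) {
            o->marked = 0;        /* clear for the next collection */
            p = &o->next;
        } else {
            *p = o->next;
            free(o);
        }
    }
}

static void gc_collect(gc_obj **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        gc_mark(roots[i]);
    gc_sweep();
}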

Related

Bottom Level Graphics Programming / Working with Screen Hardware

It is said that a 'low level' programming language is one that works 'close' to the hardware. I work with C as well as Python and Java, and it is generally accepted that C is a 'lower level' language, or works 'closer' to the hardware, than these other two languages. This of course makes sense because Python and Java are C-derived languages.
So what I want to know is what is the 'lowest level' or 'closest to hardware' way of working with the computer screen to display pixel graphics on a monitor?
Basically, how does the screen work on the programming 'back end'?
Just to be perfectly clear I'll give an analogy. When I first learned about python lists, I figured they were multiple numbers stored together. What I later learned was that they are actually pointers to locations in memory reserved for the length of the array in units of the data type.
So what I know about the screen is that it's a grid of pixels, x wide and y tall. When you refer to this grid with functions that accept some form of vector, a pixel will light up in a specified colour at that pixel's coordinates (0,0 starts at the top left, of course). What is really going on in those functions?
Machine code is the lowest you can go. And even this still gets interpreted by the CPU (dependent on the CPU, of course).
Generally most people will go only as low as assembler though.
(I didn't mention GPU programming as you mentioned Python, Java, C and made an assumption you are trying to get to grips with the old way of rendering).
EDIT
Missed your other question
Pointers are used when a block of memory needs referencing. Simple data types do not require a pointer (but can still have one). This is dependent on the language and how it allows data type allocation.
In the case of a list/array of data, you cannot allocate it on the stack since it has a varying length, so a block of memory is requested from the operating system, which will return a pointer to a section of memory that can be used. The operating system does not care how this is used; it's up to the application to know how much to allocate and how it should use it.
This is why languages use data types: so the compiler knows how to manipulate that block of memory. Nothing stops you manipulating this block of memory directly, but you must follow the rules the compiler uses, since any further code executed could cause errors.
EDIT
Additional based on comment.
When modifying a buffer that is linked to hardware, you need to understand the specs for the buffer. Usually you have already requested the device to use a certain MODE of operation, so you know how the hardware will behave. This is why various standards arose to help programmers use these. I am going to reference a very good old-school hardware programming guide that should help you a lot: the PC Game Programmer's Encyclopedia.
These concepts still apply to operating systems now: in Windows you would use the GDI to allocate these screens, and the GDI interfaces to a driver to present the required image. The guide I linked is about using hardware directly from old operating systems which have no graphics support (MS-DOS, for example).
The word you are looking for is frame buffer. For an introduction see http://en.wikipedia.org/wiki/Framebuffer
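To make that concrete, here is a minimal sketch of the modern Linux equivalent of poking video memory directly: mapping /dev/fb0 into the process and plotting one pixel. It assumes a 32-bits-per-pixel XRGB mode and skips most error handling; real code must check the vinfo fields before computing offsets.

#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0)
        return 1;

    struct fb_var_screeninfo vinfo;
    struct fb_fix_screeninfo finfo;
    ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);
    ioctl(fd, FBIOGET_FSCREENINFO, &finfo);

    uint8_t *fb = mmap(NULL, finfo.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED)
        return 1;

    /* The "function that accepts a vector" ultimately reduces to this:
       offset = y * bytes_per_row + x * bytes_per_pixel. */
    unsigned x = 100, y = 50;
    size_t off = (size_t)y * finfo.line_length
               + (size_t)x * (vinfo.bits_per_pixel / 8);
    *(uint32_t *)(fb + off) = 0x00FF0000;   /* red, assuming XRGB8888 */

    munmap(fb, finfo.smem_len);
    close(fd);
    return 0;
}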

What is an efficient heap implementation for an allocation subsystem?

Let's say I want to write a 3D game engine - somewhere along the lines of DOOM and Quake. And I want to do it in pure C (not sure if that's relevant, but just in case...).
The first thing worth tackling, I think, would be the engine's memory allocation. I've looked at some source code for this (Quake 3, DOOM 3), and in terms of allocation management, I've found that a B-tree seems like a good way to go, but I'm not sure if that would be the right choice.
A binary heap would be simpler, but from what I've read I'm not sure it would scale well. Maybe I'm wrong?
Ideally, I'm looking for something between O(1) and O(n log n) runtime efficiency. I'm not sure if this is realistic or not though :)
Thoughts?
Basically, to start you can simply use the regular malloc provided by your compiler/build environment. Then, once your engine looks great, you could attempt to write your own memory allocator.
Doom 3 offers both options (selected at compile time): either it uses an internal memory allocator (the default), or it can use the regular malloc.
You can have a look at Heap.cpp in the Doom 3 source code. It is based on B-trees. But honestly, you'll probably find it very difficult to understand!
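If you just want a feel for how a custom allocator can hit O(1), here is a minimal fixed-size block pool, a common engine building block. This is not Doom 3's Heap.cpp (which, as noted, is B-tree based); the names and sizes are illustrative.

#include <stddef.h>

#define POOL_BLOCKS 1024
#define BLOCK_SIZE  64           /* must hold at least a pointer */

typedef union block {
    union block *next;           /* valid only while the block is free */
    unsigned char bytes[BLOCK_SIZE];
} block;

static block pool[POOL_BLOCKS];
static block *free_list = NULL;

void pool_init(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool[i].next = free_list;    /* push every block onto the list */
        free_list = &pool[i];
    }
}

void *pool_alloc(void)
{
    block *b = free_list;
    if (b)
        free_list = b->next;         /* pop in O(1) */
    return b;
}

void pool_free(void *p)
{
    block *b = p;                    /* must have come from pool_alloc */
    b->next = free_list;             /* push in O(1) */
    free_list = b;
}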

What specific examples are there of knowing C making you a better high level programmer?

I know about the existence of questions such as this one and this one. Let me explain.
After reading Joel's article Back to Basics and seeing many similar questions on SO, I've begun to wonder what are specific examples of situations where knowing stuff like C can make you a better high level programmer.
What I want to know is if there are many examples of this. Many times, the answer to this question is something like "Knowing C gives you a better feel of what's happening under the covers" or "You need a solid foundation for your program", and these answers don't have much meaning. I want to understand the different specific ways in which you will benefit from knowing low level concepts.
Joel gave a couple of examples: Binary databases vs XML, and strings. But two examples don't really justify learning C and/or Assembly. So my question is this: What specific examples are there of knowing C making you a better high level programmer?
My experience with teaching students and working with people who only studied high-level languages is that they tend to think at a certain high level of abstraction, and they assume that "everything comes for free". They can become very competent programmers, but eventually they have to deal with some code that has performance issues, and then it comes back to bite them.
When you work a lot with C, you do think about memory allocation. You often think about memory layout (and cache locality if that's an issue). You understand how and why certain graphics operations just cost a lot. How efficient or inefficient certain socket behaviors are. How buffers work, etc. I feel that using the abstractions in a higher level language when you do know how it is implemented below the covers sometimes gives you "that extra secret sauce" when thinking about performance.
For example, Java has a garbage collector and you can't assign things to memory directly. And yet, you can make certain design choices (e.g., with custom data structures) that affect performance because of the same reasons this would be an issue in C.
Also, and more generally, I feel that it is important for a power programmer to not only know big-O notation (which most schools teach), but that in real-life applications the constant is also important (which schools try to ignore). My anecdotal experience is that people with skills in both language levels tend to have a better understanding of the constant, perhaps because of what I described above.
In addition, many higher level systems that I have seen interface with lower level libraries and infrastructures. For instance, some communications, databases or graphics libraries. Some drivers for certain devices, etc. If you are a power programmer, you may eventually have to venture out there, and it helps to at least have an idea of what is going on.
Knowing low level stuff can help a lot.
To become a racing driver, you have to learn and understand the basic physics of how tyres grip the road. Anyone can learn to drive pretty fast, but you need a good understanding of the "low level" stuff (forces and friction, racing lines, fine throttle and brake control, etc) to get those last few percent of performance that will allow you to win the race.
For example, if you understand how the CPU architecture works in your computer, you can write code that works better with it (e.g. if you know you have a certain CPU cache size or a certain number of bytes in each CPU cache line, you can arrange your data structures and the way that you access them to make the best use of the cache - for example, processing many elements of an array in order is often faster than processing random elements, due to the CPU cache). If you have a multi-core computer, then understanding how low level techniques like threading work can give huge benefits (just as not understanding the low level can lead to disaster in threading).
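A minimal sketch of that cache-line point: both loops below sum the same array, but the traversal order alone changes the running time. The array size and the timing approach are illustrative only.

#include <stdio.h>
#include <time.h>

#define N 2048

static int grid[N][N];

int main(void)
{
    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)        /* row-major: follows memory layout, */
        for (int j = 0; j < N; j++)    /* so each cache line is fully used  */
            sum += grid[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)        /* column-major: jumps N*sizeof(int) */
        for (int i = 0; i < N; i++)    /* bytes per step and misses a lot   */
            sum += grid[i][j];
    clock_t t2 = clock();

    printf("row-major:    %.3fs\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("column-major: %.3fs\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return (int)sum;   /* keeps the compiler from discarding the loops */
}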
If you understand how Disk I/O and caching works, you can modify file operations to work well with it (e.g. if you read from one file and write to another, working on large batches of data in RAM can help reduce I/O contention between the reading and writing phases of your code, and vastly improve throughput)
If you understand how virtual functions work, you can design high-level code that uses virtual functions well. If used incorrectly they can severely hamper performance.
If you understand how drawing is handled, you can use clever tricks to improve drawing speed. e.g. You can draw a chessboard by alternately drawing 64 white and black squares. But it is often faster to draw 32 white squares and then 32 black ones (because you only have to change the drawing colour twice instead of 64 times). But you can actually draw the whole board black, then XOR 4 stripes across the board and 4 stripes down the board in white, and this can be much faster still (2 colour changes, and only 9 rectangles to draw instead of 64). This chessboard trick teaches you a very important programming skill: Lateral thinking. By designing your algorithm well, you can often make a big difference to how well your program operates.
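Here is that chessboard trick as a small self-contained sketch on an in-memory bitmap (the board size and 'colours' are stand-ins for real drawing calls): one fill plus eight XOR stripes, nine rectangles in all.

#include <stdio.h>
#include <string.h>

#define W 8
#define H 8
static unsigned char board[H][W];

/* XOR a rectangle of the bitmap with the given color. */
static void xor_rect(int x, int y, int w, int h, unsigned char color)
{
    for (int r = y; r < y + h; r++)
        for (int c = x; c < x + w; c++)
            board[r][c] ^= color;
}

int main(void)
{
    memset(board, 0, sizeof board);    /* rectangle 1: whole board black */
    for (int i = 1; i < W; i += 2) {
        xor_rect(i, 0, 1, H, 1);       /* 4 vertical white stripes */
        xor_rect(0, i, W, 1, 1);       /* 4 horizontal white stripes */
    }
    /* A cell crossed by exactly one stripe is white; a cell crossed by
       two flips back to black -- the checkerboard appears. */
    for (int r = 0; r < H; r++) {
        for (int c = 0; c < W; c++)
            putchar(board[r][c] ? '#' : '.');
        putchar('\n');
    }
    return 0;
}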
Understanding C, or for that matter, any low level programming language, gives you an opportunity to understand things like memory usage (i.e. why is it a bad thing to create several million heavy objects), how pointers/object references work, etc.
The problem is that as we've created ever increasing levels of abstraction, we find ourselves doing a lot of 'lego block' programming, without understanding how the legos actually function. And by having almost infinite resources, we start treating memory and resources like water, and tend to solve problems by throwing more iron at the situation.
While not limited to C, there's a tremendous benefit to working at a low level with much smaller, memory constrained systems like the Arduino or old-school 8-bit processors. It lets you experience close to the metal coding in a much more approachable package, and after spending time squeezing apps into 512K, you will find yourself applying these skills at a larger level within your day to day programming.
So the language itself is not important, but having a deeper appreciation for how all of the bits come together, and how to work effectively at a level closer to the hardware is a set of skills beneficial to any software developer.
For one, knowing C helps you understand how memory works in the OS and in other high level languages. When your C# or Java program balloons in memory usage, understanding that references (which are basically just pointers) take memory too, and understanding how many of the data structures are implemented (which you learn from making your own in C), helps you realize that your dictionary may be reserving huge amounts of memory that aren't actually used.
For another, knowing C can help you understand how to make use of lower level operating system features. You don't need this often, but sometimes you may need memory mapped files, or to use marshalling in C#, and C will greatly help understand what you're doing when that happens.
I think C has also helped my understanding of network protocols, but I can't put my finger on specific examples. I was reading another SO question the other day where someone was complaining about how C's bit-fields are 'basically useless' and I was thinking how elegantly C bit fields represent low-level network protocols. High level languages dealing with structures of bits always end up a mess!
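As an illustration of that last point, here is a sketch of describing a packed protocol field with C bit-fields, using the first byte of an IPv4 header (4-bit version, 4-bit header length). One caveat: the order in which a compiler packs bit-fields is implementation-defined, which is why portable code guards on endianness (the macros below are GCC/Clang conventions) or falls back to shifts and masks.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct ipv4_first_byte {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint8_t ihl     : 4;   /* header length in 32-bit words */
    uint8_t version : 4;   /* always 4 for IPv4 */
#else
    uint8_t version : 4;
    uint8_t ihl     : 4;
#endif
};

int main(void)
{
    uint8_t raw = 0x45;    /* the common case: version 4, IHL 5 */
    struct ipv4_first_byte f;
    memcpy(&f, &raw, 1);   /* reinterpret the wire byte */
    printf("version=%u ihl=%u\n", f.version, f.ihl);
    return 0;
}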
In general, the more you know, the better programmer you will be.
However, sometimes knowing another language, such as C, can make you do the wrong thing, because there might be an assumption that is not true in a higher-level language (such as Python, or PHP). For example, one might assume that finding the length of a list might be O(N) where N is the length of the list. However, this is probably not the case in many high-level language instances. In Python, for most list-like things the cost is O(1).
Knowing more about the specifics of a language will help, but knowing more in general might lead one to make incorrect assumptions.
Just "knowing" C would not make you better.
But if you understand the whole picture (how native binaries work, how the CPU works with them, what the architecture's limitations are), you can write code which is easier on the CPU.
For example, how the L1/L2 caches affect your work, and how you should write your code to get more hits in the L1/L2 caches. When working with C/C++ and doing heavy optimizations, you will have to go down to that kind of thing.
It isn't so much knowing C as it is that C is closer to the bare metal than many other languages. You need to be more aware of how to allocate/deallocate memory because you have to do it yourself. Doing it yourself helps you understand the implications of many decisions that you make.
To me any language is acceptable as long as you understand how the compiler/interpreter (basically) maps your code onto the machine. It's a bit easier to do in a language that exposes this directly, but you should be able to, with a bit of reading, figure out how memory is allocated and organized, what sort of indexing patterns are more optimal than others, what constructs are more efficient for particular applications, etc.
More important, I think, is a good understanding of operating systems, memory architectures, and algorithms. If you understand how your algorithm works, why it would be better to choose one algorithm or data structure over another (e.g., HashSet vs. List), and how your code maps onto the machine, it shouldn't matter what language you are using.
This is my experience of how I learnt and taught myself programming, specifically understanding C. This goes back to the early 1990s, so it may be a bit antique, but the passion and the drive are important:
Learn to understand the low level principles of the computer, such as EGA/VGA programming, here's a link to the Simtel archive on the C programmer's guide to the PC.
Understand how TSRs work
Download the whole archive of Bob Stout's snippets, a big collection of C code where each snippet does one thing only. Study them and understand them; not only that, the collection of snippets strives to be portable.
Browse the International Obfuscated C Code Contest (IOCCC) online, and see how C code can be abused; understand the intricacies of the language. The worst code abuse is the winner! Download the archives and study them.
I personally loved the infamous Ponzo's C Tutorial, which helped me immensely; unfortunately, the archive is very hard to find. If anyone knows where to obtain it, please leave a comment and I will amend this answer to include the link. There is another one that I can remember, Coronado's [Generic?] C Tutorial; again, my memory on this one is hazy...
Look at Dr. Dobb's Journal and The C Users Journal here. I do not know if you can still get them in print, but they were classics; I can remember the feeling of holding a printed copy in my hand and tearing off home to type in the code to see what happens!
Grab an ancient copy of Turbo C v2, which I believe you can get from borland.com, and just play with 16-bit C programming to get a feel for it and mess with the pointers... Sure, it is ancient and old, but playing with pointers on it is fine.
Understand and learn pointers (link here to the legacy Simtel.net): a crucial step toward C guru-ship, for want of a better word. You will also find a host of downloads pertaining to the C programming language there; I remember actually ordering the Simtel CD Archive and looking for the C stuff...
A couple of things that you have to deal directly with in C that other languages abstract away from you include explicit memory management (malloc) and dealing directly with pointers.
My girlfriend is one semester from graduating MIT (where they mainly use Java, Scheme, and Python) with a Computer Science degree, and she is currently working at a company whose codebase is in C++. For the first few days she had a difficult time understanding all the pointers/references/etc.
On the other hand, I found moving from C++ to Java very easy, because I was never confused about pass-references-by-value vs pass-by-reference.
Similarly, in C/C++ it is much more apparent that primitives are just the compiler treating the same sets of bits in different ways, as opposed to a language like Python or Ruby where everything is an object with its own distinct properties.
A simple (not entirely realistic) example to illustrate some of the advice above. Consider the seemingly harmless
while (true)
    for (Iterator iter = foo.iterator(); iter.hasNext(); )
        bar.doSomething(iter.next());
or the even higher level
while (true)
    for (Baz b : foo)
        bar.doSomething(b);
A possible problem here is that each time round the while loop a new object (the iterator) is created. If all you care about is programmer convenience, then the latter is definitely better. But if the loop has to be efficient or the machine is resource constrained then you are pretty much at the mercy of the designers of your high level language.
For example, a typical complaint for doing high-performance Java is having execution stop while garbage (such as all those allocated Iterator objects) is reclaimed. Not very good if your software is charged with tracking incoming missiles, auto-piloting a passenger jet, or just not leaving the user wondering why the GUI has stopped responding.
One possible solution (still in the higher-level language) would be to weaken the convenience of the iterator to something like
Iterator iter = new Iterator();
while (true)
    for (foo.initAlreadyAllocatedIterator(iter); iter.hasNext(); )
        bar.doSomething(iter.next());
But this would only make sense if you had some idea about memory allocation...otherwise it just looks like a nasty API. Convenience always costs somewhere, and knowing lower-level stuff can help you identify and mitigate those costs.

What is the best way to transition to C from higher level languages?

I'm a web coder: I currently enjoy AS3 and deal with PHP. I own a Nintendo DS and want to give C a go.
From a higher level, what basic things/creature comforts are going to go missing?
I can't find [for... in] loops, so I assume they aren't there. It looks like I'm going to have to declare things religiously, and I assume I have no objects (which I dealt with in PHP a while ago).
Hash tables? Funny data types?
To sum it up, you'll basically get:
Typed variables
Functions
Pointers
Standard libraries
Then, you make the rest. That may be a little too simplified, but it's a rough idea of what you'll face.
It can be daunting to begin with and there may be a learning curve to overcome. Here's a few speed bumps you may encounter:
String? What string?
One big thing to get used to would be strings. There is no such thing as a string in C. A string is a "null-terminated character array" (sometimes called C strings), which basically means an array of type char with the final element being a \0 (char value 0).
In memory, a char array of length 4 containing Hi! would appear as:
char[0] == 'H'
char[1] == 'i'
char[2] == '!'
char[3] == '\0'
Also, strings don't know their own length (no such thing as "objects" that come for free in C), so the standard library call strlen is required, which is more or less a for loop that goes through the string until it hits a \0 character. (This means it's an O(N) operation: the longer the string, the longer it takes to find the length, unlike the O(1) operation of most string implementations in modern languages.)
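Roughly, and glossing over the word-at-a-time tricks real C libraries use, strlen amounts to this sketch:

#include <stddef.h>

/* Walk until the terminating '\0'. The cost is O(N) in the
   string length no matter how cleverly it is implemented. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}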
Garbage collection?
No such thing is as a garbage collector in C. In fact, you need to allocate and deallocate memory yourself:
#include <stdlib.h>  /* for malloc and free */
/* Allocate enough memory for an array of 10 int values. */
int *array_of_ints = malloc(sizeof(int) * 10);
if (array_of_ints == NULL) {
    /* malloc can fail; always check before using the pointer */
}
/* Done with the array? Don't forget to free the memory! */
free(array_of_ints);
Failing to clean up after allocation of memory can lead to things called memory leaks which I'm sure you've heard of before.
Pointers!
And as always, when we talk about C, we can't forget about pointers. The whole concept of references to variables and dereferencing pointers can be a serious headache-inducing concept, but once you get the hang of it, it's actually not too bad.
Except for the times when you expect it to work one way, but you find out that you didn't quite understand pointers well enough and it actually does something else -- as they say, been there, done that.
Oh, and pointers are probably going to be one of the first times you'll actually see a program crash bad enough that the operating system will yell at you. A segmentation fault is not something the computer likes a lot.
Types
All variables in C will have types. C is a statically-typed language, meaning that variable types will be checked at compile time. This might take some getting used to at the beginning, but can also be seen as a good thing, as it can reduce runtime errors such as type errors where you try to assign a number to a string.
However, it is possible to perform typecasts, so you can cast an int (an integer value) to a double (a floating-point value). It is not possible, however, to cast an int directly to a string like char * and get a textual representation.
So, for example, in some languages the following is allowed:
// Example of a very weakly-typed pseudolanguage with implicit typecasts:
number n = 42
string s = "answer: "
string result = s + n // Result: "answer: 42"
In C, one would have to call an itoa function to get a char* representation of an int, then use strcat to concatenate two strings.
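Since itoa is not actually in the C standard, a portable sketch of the "answer: 42" example uses snprintf, shown both directly and via the strcat recipe described above:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 42;

    /* The direct, portable route: */
    char result[32];
    snprintf(result, sizeof result, "answer: %d", n);

    /* Or, closer to the itoa-then-strcat recipe: */
    char s[32] = "answer: ";
    char num[16];
    snprintf(num, sizeof num, "%d", n);   /* standard stand-in for itoa */
    strcat(s, num);

    printf("%s\n", result);   /* prints: answer: 42 */
    return 0;
}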
Conclusion
Those things said, learning C coming from a higher-level language can be very eye-opening and probably challenging to begin with, but once you get the hang of it, it can be pretty fun to work with.
I'd recommend starting to experiment with a C compiler, and have a good book or reference.
I think many people will recommend the K&R book, which is indeed an excellent book.
At first, I didn't think recommending K&R as the first C book would be a good idea because it may be a little bit on the difficult side, but on second thought, I think it is a very comprehensive and well-written book that can be good for getting into C if you already have some programming experience.
Good luck!
Well... You might be in for something of a culture shock. C has just 32 standard keywords (in C89), and that includes the basic types.
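For reference, here is the full C89 list (C99 later added a handful more, such as inline, restrict, and _Bool):

auto     break    case     char     const    continue default  do
double   else     enum     extern   float    for      goto     if
int      long     register return   short    signed   sizeof   static
struct   switch   typedef  union    unsigned void     volatile while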
C's standard library is pretty functional (more so than people perhaps expect), but very, very thin when compared to what higher-level languages give you. There is no hash table in sight, and you are correct to assume that C does not have syntactic or semantic support for objects.
It is possible to write pretty object-oriented code anyway, but you will have to jump through a few hoops, and do much more manually since the language won't help you. See for instance the GTK+ UI toolkit for an example of a well-designed object-oriented C library/API.
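In miniature, the pattern looks something like the sketch below: data in a struct, methods as function pointers, and "inheritance" by embedding the base struct first. The names here are made up for illustration, not taken from GTK+.

#include <stdio.h>

typedef struct shape {
    const char *name;
    double (*area)(const struct shape *self);  /* "virtual method" */
} shape;

typedef struct circle {
    shape base;        /* must come first so casts to shape * work */
    double radius;
} circle;

static double circle_area(const shape *self)
{
    const circle *c = (const circle *)self;
    return 3.14159265358979 * c->radius * c->radius;
}

int main(void)
{
    circle c = { { "circle", circle_area }, 2.0 };
    shape *s = (shape *)&c;                        /* "upcast" */
    printf("%s area: %f\n", s->name, s->area(s));  /* dynamic dispatch */
    return 0;
}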
I'm a web coder: I currently enjoy AS3 and deal with PHP. I own a Nintendo DS and want to give C a go.
Why do you want to do C programming?
What are your reasons, what do you hope to achieve?
Is it in order to write software for the Nintendo DS?
From a higher level, what basic things/creature comforts are going to go missing?
Given your background, I think you'll personally miss dynamic typing support the most. In other words, you will have to be very explicit in your C programs: your data must be declared with proper types so that the compiler knows what type of data you are working with. This also applies to any sort of memory management, i.e. basically anything once you start working with data structures that are not plain old data.
For example, where you would do something like this in php:
function multiply($x) {
    return ($x * $x);
}
You would have to do something like this in C:
int multiply(int x) {
    return (x * x);
}
While these may seem fairly similar, there are big differences, namely typing restrictions: the php version will also work with floating point values, while in C you would have to explicitly provide versions for different types and ranges of values (C types are constrained to certain ranges).
I can't find [for... in] loops, so assume they aren't there
in C, it looks more like the following:
int c;
for (c = 0; c <= 10; c++) {
    // loop body
}
it looks like I'm going to have to declare things religiously
Yes, very much so: much more than you'll appreciate.
and I assume I have no objects (which I dealt with in PHP a while ago).
Correct, no objects, but OOP can still be emulated in other ways, such as function(struct obj)
Depending on your goals and motivation, you may find C a pretty frustrating language to start serious programming with; you may want to look into related alternatives such as Java instead.
Dynamic arrays and garbage collection: they're not built into C, so you'll need to roll your own or use a pre-existing solution.
The standard procedure is that you manage the memory yourself which might sound like something horrible but it really isn't. For example in AS3 and PHP you can create an array and forget it when you're done with it. In C you'll have to make sure to deallocate it yourself or memory will leak and bad stuff can/will happen.
You'll particularly miss automatic memory management, and semantically meaningful datatypes such as strings, tables &c. However, learning C well is quite instructive, even though you probably don't want to use it for application-level programming, so I suggest you grab a "K&R" (Kernighan and Ritchie's seminal book) and give it a go -- you'll find plenty of free libraries on the web to use and study as you proceed beyond that, though you'll have to discipline yourself to use proper memory management heuristics... happy learning!
I was just doing some research online, and it seems there's a viable possibility to use Lua for developing on the Nintendo DS. This may in fact be the easiest way for someone familiar with high-level languages to get started doing embedded development, without sacrificing too much HLL power and without experiencing the inevitable culture shock when migrating from an HLL to C: microlua; here are the API docs.
So you might want to give it a go, possibly using an emulator for starters.
Keep us posted!
I'm pretty sure you want to be looking at C++, not C. C++ is basically object-oriented C.
What you'll REALLY miss is the ability to rapidly prototype and test changes. You can't just change a line of code and run. Even using build tools like "make" a recompile can often take several minutes. This is even worse when you consider that it's really easy to make mistakes in C/C++. On large projects I reckon I spend more time compiling than actually coding. As a long-term user of script languages this is my biggest issue with using C.
Moving directly from a higher-level language running on a machine with effectively infinite resources to a DS is going to be a challenge, and not just because of the language.
The Nintendo DS has only 4MB of RAM, a 67MHz ARM9 (plus a 33MHz ARM7), no operating system, and the development libraries available (such as libnds) provide only a thin abstraction over the hardware itself.
So, in addition to having to deal with manual memory management, a simpler language with fewer creature comforts, static typing, lack of objects, and the need to run a compile step before you can see any changes, you also have to deal with memory fragmentation, a very slow CPU by modern standards, and needing to interact with the hardware directly in order to do anything useful.
When writing code for the DS, the only other option is C++. You can't use a lot of the advanced features that make C++ worthwhile on such a limited system. You'd be writing C code using a C++ compiler.
That said, it's a lot of fun. You can screw around with the hardware all you like, and there's no need to interface with the operating system, because there isn't one.
C is the next level above straight assembler and allows you to operate close to the metal. This gives power to do amazing stuff but also to easily shoot yourself in the foot!
One such example is direct memory access and the perils and wonder of pointer arithmetic. Pointers are very powerful, fast, and handy; however, they require careful management. See this SO question for an example.
Also as mentioned by the other answerers you will have to do your own memory management. Again powerful and painful.
I would recommend studying a good textbook and finding some quality example code. The key thing is to learn the patterns that make all this stuff hang together correctly and elegantly (well, as much as possible). A good debugger will also really help; get familiar with the standard C libraries too.
You may notice your applications crashing at the drop of a hat initially, but persevere: C is definitely worth at least dabbling in. You will understand some of the amazing abstractions higher level languages provide and what is really going on under the hood.
We need more homebrew developers. I am a developer for the GBA/NDS and many other embedded platforms, and I hope to see that you continue with this. I would say skip to ARM assembler and then back up to C or any other language you like; once you know how the processor works, languages are just syntax.
I assume your prior experience covers the programming mindset: breaking things down into bite-sized chunks, writing code to perform those chunks, then another module that links those together, and so on. Then C is just another language, a very, very simple language; no need to dive into the corners of it, drive down the middle. It is a good habit to declare variables, etc., and here you will have to; the compilers will tell you when you have forgotten something. You are not going to need big concepts, big structures, or language magic. This is embedded, you are resource limited: write some bytes here, read a register there, extract a bit from the data to see if a button has been pressed, write a register in response to move a sprite, etc.
I think you will find the NDS much harder than C at first; there are two processors and some infrastructure to get the simplest of working binaries. Granted, there are many, many examples out there as well. I generally recommend (and still do) starting with the GBA, then graduating to the NDS. Bite-sized chunks.
A lot of things from OOP are the same or almost the same in PHP and C#.
You don't play with pointers in C# (compared to C++), so I would definitely recommend going with C# if you want to play with C.
What C are you talking about?
C#
foreach(string item in itemsCollection)
{
...
}
PHP
foreach($itemsCollection as $key=>$value)
{
...
}
etc.
I like C# because it is strongly typed and your types are automatically checked while you write code... The possibility of trying to save an integer into a string, or vice versa, is zero, compared to PHP where you can save anything into anything...

How to implement continuations?

I'm working on a Scheme interpreter written in C. Currently it uses the C runtime stack as its own stack, which is presenting a minor problem with implementing continuations. My current solution is manual copying of the C stack to the heap then copying it back when needed. Aside from not being standard C, this solution is hardly ideal.
What is the simplest way to implement continuations for Scheme in C?
A good summary is available in Implementation Strategies for First-Class Continuations, an article by Clinger, Hartheimer, and Ost. I recommend looking at Chez Scheme's implementation in particular.
Stack copying isn't that complex and there are a number of well-understood techniques available to improve performance. Using heap-allocated frames is also fairly simple, but you trade off by creating overhead for the "normal" situation where you aren't using explicit continuations.
If you convert input code to continuation passing style (CPS) then you can get away with eliminating the stack altogether. However, while CPS is elegant it adds another processing step in the front end and requires additional optimization to overcome certain performance implications.
I remember reading an article that may be of help to you: Cheney on the M.T.A. :-)
Some implementations of Scheme I know of, such as SISC, allocate their call frames on the heap.
#ollie: You don't need to do the hoisting if all your call frames are on the heap. There's a tradeoff in performance, of course: the time to hoist, versus the overhead required to allocate all frames on the heap. Maybe it should be a tunable runtime parameter in the interpreter. :-P
If you are starting from scratch, you really should look in to Continuation Passing Style (CPS) transformation.
Good sources include "LISP in small pieces" and Marc Feeley's Scheme in 90 minutes presentation.
It seems Dybvig's thesis is unmentioned so far. It is a delight to read. The heap-based model is the easiest to implement, but the stack-based one is more efficient. Ignore the string-based model.
R. Kent Dybvig. "Three Implementation Models for Scheme".
http://www.cs.indiana.edu/~dyb/papers/3imp.pdf
Also check out the implementation papers on ReadScheme.org.
https://web.archive.org/http://library.readscheme.org/page8.html
The abstract is as follows:
This dissertation presents three implementation models for the Scheme Programming Language. The first is a heap-based model used in some form in most Scheme implementations to date; the second is a new stack-based model that is considerably more efficient than the heap-based model at executing most programs; and the third is a new string-based model intended for use in a multiple-processor implementation of Scheme.

The heap-based model allocates several important data structures in a heap, including actual parameter lists, binding environments, and call frames.

The stack-based model allocates these same structures on a stack whenever possible. This results in less heap allocation, fewer memory references, shorter instruction sequences, less garbage collection, and more efficient use of memory.

The string-based model allocates versions of these structures right in the program text, which is represented as a string of symbols. In the string-based model, Scheme programs are translated into an FFP language designed specifically to support Scheme. Programs in this language are directly executed by the FFP machine, a multiple-processor string-reduction computer.

The stack-based model is of immediate practical benefit; it is the model used by the author's Chez Scheme system, a high-performance implementation of Scheme. The string-based model will be useful for providing Scheme as a high-level alternative to FFP on the FFP machine once the machine is realized.
Besides the nice answers you've got so far, I recommend Andrew Appel's Compiling with Continuations. It's very well written and while not dealing directly with C, it is a source of really nice ideas for compiler writers.
The Chicken Wiki also has pages that you'll find very interesting, such as internal structure and compilation process (where CPS is explained with an actual example of compilation).
Examples that you can look at are: Chicken (a Scheme implementation, written in C that support continuations); Paul Graham's On Lisp - where he creates a CPS transformer to implement a subset of continuations in Common Lisp; and Weblocks - a continuation based web framework, which also implements a limited form of continuations in Common Lisp.
Continuations aren't the problem: you can implement those with regular higher-order functions using CPS. The issue with naive stack allocation is that tail calls are never optimised, which means you can't be a proper Scheme (the language requires proper tail calls).
The best current approach to mapping Scheme's spaghetti stack onto the C stack is using trampolines: essentially extra infrastructure to handle non-C-like calls and exits from procedures. See Trampolined Style (ps).
There's some code illustrating both of these ideas.
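To give a flavour of the trampoline idea in C: the step function below returns a thunk describing the next call instead of making it, and a flat driver loop "bounces" until the computation signals it is done, so the C stack never grows. The names and the sum-to-n example are illustrative only.

#include <stdio.h>
#include <stdlib.h>

typedef struct thunk {
    struct thunk *(*fn)(long n, long acc, long *result);
    long n, acc;
} thunk;

static thunk *make_thunk(thunk *(*fn)(long, long, long *), long n, long acc)
{
    thunk *t = malloc(sizeof *t);   /* a real system would pool these */
    t->fn = fn;
    t->n = n;
    t->acc = acc;
    return t;
}

/* Tail-recursive sum of 1..n, written as a step function: it RETURNS
   the next call instead of making it. */
static thunk *sum_step(long n, long acc, long *result)
{
    if (n == 0) {
        *result = acc;
        return NULL;                /* done: no next step */
    }
    return make_thunk(sum_step, n - 1, acc + n);
}

static long trampoline(thunk *t)
{
    long result = 0;
    while (t) {                     /* bounce: constant C stack depth */
        thunk *next = t->fn(t->n, t->acc, &result);
        free(t);
        t = next;
    }
    return result;
}

int main(void)
{
    printf("%ld\n", trampoline(make_thunk(sum_step, 50000, 0)));
    return 0;
}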
The traditional way is to use setjmp and longjmp, though there are caveats.
Here's a reasonably good explanation
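A sketch of what that looks like, and of the main caveat: setjmp/longjmp gives you escaping continuations (non-local exit, like a call/cc used only to abort a computation), since longjmp can only jump up the live stack; re-entering a continuation after its frame has returned needs one of the other strategies in this thread.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf escape;

static int search(const int *v, int len, int target)
{
    for (int i = 0; i < len; i++)
        if (v[i] == target)
            longjmp(escape, i + 1);   /* "invoke the continuation" */
    return -1;
}

int main(void)
{
    int data[] = { 3, 1, 4, 1, 5, 9 };
    int r = setjmp(escape);           /* "capture the continuation" */
    if (r == 0)
        printf("not found: %d\n", search(data, 6, 5));
    else
        printf("found at index %d\n", r - 1);   /* prints: found at index 4 */
    return 0;
}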
Continuations basically consist of the saved state of the stack and CPU registers at the point of a context switch. At the very least you don't have to copy the entire stack to the heap when switching; you could just redirect the stack pointer.
Continuations are trivially implemented using fibers (http://en.wikipedia.org/wiki/Fiber_%28computer_science%29). The only things that need careful encapsulation are parameter passing and return values.
In Windows fibers are done using the CreateFiber/SwitchToFiber family of calls.
In POSIX-compliant systems it can be done with makecontext/swapcontext.
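A minimal sketch of that POSIX route (getcontext/makecontext/swapcontext from <ucontext.h>; obsolescent in POSIX.1-2008 but still widely available): two contexts handing control back and forth, the mechanism a fiber-style continuation implementation builds on.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;

static void coro(void)
{
    printf("coro: step 1\n");
    swapcontext(&coro_ctx, &main_ctx);   /* yield back to main */
    printf("coro: step 2\n");
}                                         /* falls through to uc_link */

int main(void)
{
    char stack[64 * 1024];

    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = sizeof stack;
    coro_ctx.uc_link = &main_ctx;        /* where to go when coro returns */
    makecontext(&coro_ctx, coro, 0);

    printf("main: starting coro\n");
    swapcontext(&main_ctx, &coro_ctx);   /* run until the coro yields */
    printf("main: resuming coro\n");
    swapcontext(&main_ctx, &coro_ctx);   /* run it to completion */
    printf("main: done\n");
    return 0;
}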
boost::coroutine has a working implementation of coroutines for C++ that can serve as a reference point for implementation.
As soegaard pointed out, the main reference remains R. Kent Dybvig. "Three Implementation Models for Scheme".
The idea is, a continuation is a closure that keeps its evaluation control stack. The control stack is required in order to continue the evaluation from the moment the continuation was created using call/cc.
Often, invoking the continuation takes a long time to execute and fills memory with duplicated stacks. I wrote this stupid code to demonstrate it; in MIT Scheme it makes the interpreter crash.
The code sums the first 1000 numbers: 1+2+3+...+1000.
(call-with-current-continuation
 (lambda (break)
   ((lambda (s) (s s 1000 break))
    (lambda (s n cc)
      (if (= 0 n)
          (cc 0)
          (+ n
             ;; non-tail-recursive,
             ;; the stack grows at each recursive call
             (call-with-current-continuation
              (lambda (__)
                (s s (- n 1) __)))))))))
If you switch from 1000 to 100,000 the code takes about 2 seconds, and if you grow the input number further it will crash.
Use an explicit stack instead.
Patrick is correct: the only way you can really do this is to use an explicit stack in your interpreter, and hoist the appropriate segment of stack into the heap when you need to convert to a continuation.
This is basically the same as what is needed to support closures in languages that support them (closures and continuations being somewhat related).
