I am in the process of creating a small language that is compiled to bytecode and run on a custom VM, the architecture of which has largely been influenced by what I've read about Python and Lua. There are two stacks - a data stack that stores function arguments, local variables and temporary values, and a frame stack that contains one entry per active function call. Each entry on the frame stack contains information such as the current function, instruction pointer (indexes into the bytecode array for the current function) and a base pointer (indexes into the data stack - marks where the function's args/locals begin).
Where I've become unstuck is implementing a REPL, or more specifically, the implementation of eval(). The idea so far has been to continuously evaluate user input within the same stack frame - but I can't see a clean way to allow new local variables to be created inside eval(). Because temporary data is always above locals on the stack (the stack grows upwards) the only approach I've been able to think of is to somehow notice that new locals have been created by eval() and then use some hackery to rearrange the stack - but this creates problems in the general case. For example, if there was a recursive function that conditionally used eval() I would need to walk the frame stack and possibly adjust the data stack for each frame.
Is my VM capable of supporting a sensible implementation of eval()? If yes, is the approach outlined above sensible? If no, what architectural changes are required?
Related
I read that the dynamic link points to the previous activation record (aka a "stack frame"), so it makes sense in a dynamically scoped programming language. But in a statically scoped language, why isn't the access link (which points to the activation record of the function one nesting level down) enough?
And specifically in C: why is an access link not needed, and why is a dynamic link needed?
I will use this nomenclature which is more familiar to me:
Activation record: Stack frame
Dynamic link: [saved] frame pointer
So, I interpret your question as: Why are frame pointers needed?[1]
A frame pointer is not required.
Some compilers (e.g. Green Hills C++, GCC with -O2) don't generate one by default, or can be asked not to generate it (MSVC, GCC).
That said, it does of course have its benefits:
Easy traversal of the call stack: generating a stack trace is as easy as walking a linked list whose head is the current frame pointer. This makes implementing stack traces and debuggers much easier.
Easier code generation: stack variables can be referenced by indexing off the frame pointer instead of the constantly changing stack pointer. The stack pointer moves with every push/pop; the frame pointer stays constant within a function (between the prologue and epilogue).
Should things go awry, stack unwinding can be done using the frame pointer. This is how Borland’s structured exception handling (SEH) works.
Streamlined stack management: implementations of setjmp(3), alloca(3) and C99 VLAs in particular may (and usually do) depend on it.
Drawbacks:
Register usage: 32-bit x86 has only 8 general-purpose registers, and one of them would need to be dedicated entirely to holding the frame pointer.
Overhead: the prologue/epilogue is generated for every function.
But as you noticed, a compiler can generate perfectly fine code without having to maintain a frame pointer.
[1] If that's not what's meant, please elaborate.
Your question might be related to the -fomit-frame-pointer optimizing option of GCC, then see this.
BTW, many people use the term call frames (in the call stack) for what you call activation records. The notions of continuation, continuation-passing style and A-normal form are closely related.
A dynamic link is really only useful for nested functions (and perhaps closures). Standard C does not have nested functions, so it needs none of the related tricks (display links, trampolines, ...). Some people speak of display links.
The GCC compiler provides nested functions as a C language extension, and implements them with links between activation records, much as you are thinking. Read also the wiki pages on the man or boy test and on trampolines.
I was watching a video about CUDA and the Barnes-Hut algorithm where it was stated that it is necessary to place a depth limit on the tree for the GPU, and then the idea popped into my head about possibly doing recursion in the heap.
Basically, I am wondering just that: Is it possible to allocate memory from the heap and use that as a temporary "stack" in which to place function calls for the recursive function in question to somewhat delay a stack overflow?
If so, how could it be implemented? Would we allocate space for a pointer to the function? I assume it would involve storing function addresses on the heap, but I'm not too sure.
[edit] I just wanted to add that this is a purely theoretical question, and I would imagine that doing this would slow the program down once it starts using the heap.
[edit] As per request, the compiler I am using is GCC 4.8.4 on Ubuntu 14.04 (64-bit)
Sure. This is called continuation-passing style. The standard library supports it with setjmp() and longjmp(), and stores the information needed to restore control to an earlier point in a structure called a jmp_buf. (There are several restrictions on where you can restore from.) You would store these in a stack, which is just a LIFO queue.
A more general approach is to run the program as a state machine and store the information needed to backtrack the program state, called a continuation, in a data structure called a trampoline. A common reason to want to do this is to get the equivalent of tail-recursion in an implementation that doesn’t optimize it and might chew up lots of stack space. One real-world application where someone I know is currently writing a trampoline is a GLL parser where the grammar is represented as a directed graph, the result of the parse is a shared packed parse forest, and the parser often needs to backtrack to try a different rule.
Continuation-passing and trampolines seem to be regarded as fancy style because they come from the world of functional programming, while longjmp() is regarded as an ugly low-level hack and even the Linux man page says not to use it.
You can simulate this by implementing your own heap-based stack as an array of structures, with each structure representing a stack frame that holds the equivalent of parameters and local variables. Instead of a function calling itself recursively, the function loops and each "call" explicitly pushes a new frame onto the stack.
I did exactly this years ago while attempting to solve a simple board game. The program was originally recursive, and it took forever to run. I changed it to the above structure, and this made it simple to make the app interruptible/restartable. When interrupted the app dumped its "stack" to a state file. When restarted, the app loaded the state file and continued where it left off.
This does require some care if the stack frame structure contains embedded pointers, but it's not insurmountable.
This question was prompted by studying the C language.
I have seen in my data structure course that in many cases where recursion proves to be a quick and easy solution (e.g. quicksort, traversal of binary search tree, etc.) it has been explicitly mentioned that using a self-created stack is a better idea.
The reason given is that recursion requires many function calls, and function calls are 'slower'.
But how does using a self-created stack prove any better, given that any function call makes use of the stack anyway?
There are two real reasons that self-created stacks can be more efficient than the execution stack:
The execution stack is meant to handle a generalized case of calling new functions. That means it has a lot of overhead: it has to contain pointers to the preceding function, it has to contain pointers to values on the heap, and a number of other bookkeeping items. This may be more than you need for your specific calculation if your calculation is, indeed, specific. All the additional management decreases efficiency. In situations where the function is very heavy and there are relatively few calls, this is fine. In a situation where the function itself is simpler, but there are many function calls, the cost of overhead increases disproportionately.
A generalized stack hides a lot of details from you, preventing you from taking advantage of directly referencing a different part of the stack. For instance, the root of the stack is hidden from you. Let's say you're searching for a particular value in a large tree, using recursion. At some point you're a thousand nodes deep in the tree and you find the value. Success! But then you have to climb out of the tree one function at a time, meaning at least a thousand returns just to deliver the value. (*) If you had written your own stack, you could return immediately. Or suppose you have an algorithm that, at certain nodes in the tree, requires you to back up n stack frames before continuing execution. With the generalized stack frame you are required to back out of those frames one at a time until you find the one you're looking for. If you designed the stack specifically for your algorithm, you can provide a mechanism to jump to that point of execution in one instruction rather than n.
Thus, it can behoove you to write your own stack when you can either throw out parts of the generalized stack-frame mechanism that you don't need but that cost time, or when the algorithm can take advantage of moving rapidly through the stack because it knows what it is doing (whereas a generalized stack 'protects' you from doing this by hiding everything behind its abstraction). Remember that function calls are just one particular generalized abstraction for handling code: if for some reason they add a constraint that makes your code awkward, you can probably create a stripped-down version that more directly addresses your need.
You might also create your own stack if the memory allotted to your stack is small compared to the number of times you must recurse, such as if you have a very large input domain or if you're running on specialized small-footprint hardware or a similar situation. Again, though, it depends on the algorithm you're running and how the generalized stack solution helps or hinders it.
(*) Tail recursion can often help, but because tail recursion is by definition only entering a stack frame one level deeper, I'm assuming you're talking about a situation where that is not strictly possible.
With a self-created stack you push only the few variables that actually matter.
With recursion, the return information and the state of all variables are pushed onto the stack for every call.
If the recursion depth is high, the second case (recursion) exhausts the memory pretty quickly.
Generally a function call has some overhead before anything inside the function is done. The code generated for a function call basically ensures that you'll find everything like you left it when you, well, return; while it gives you at the same time a clean empty environment inside the called function. In fact this convenience is one of the most crucial services C provides, next to the standard library. (In many other respects C is a mere macro assembler -- did you ever look at a C source and the generated assembler side by side?).
In particular, usually a few registers must be saved, and possibly parameters must be copied onto the call stack. The effort required depends on the processor, compiler and calling convention. For example, parameters and return values may be passed in registers rather than on the stack (but then the parameters must be saved anyway for each recursive call, mustn't they?).
The overhead is relatively large if the function is small; that's why inlining can be powerful. Inlining recursive function calls is similar to loop unrolling. I don't know whether current compilers do that on a regular basis (they might). But it's risky to rely on the compiler, so I would avoid recursive implementations of trivial functions, like computing the factorial, if speed is important.
Maybe using a self created stack was not necessarily recommended for performance reasons. One good reason I can think of is that the "regular" stack may be of fixed size (often 1MB), so for example sorting large amounts of data would cause a stack overflow.
If we use a user-defined stack inside a for loop, does that stack get created in system heap memory? And does a user-defined stack take much more time to fill than the system stack? Thanks
Dynamically allocated memory comes from the heap rather than the system stack, so yes: a user-defined stack would reside on the heap.
It all depends. Usually the system stack is quite fast, but I imagine you want a user-defined one to avoid blowing the stack, which implies a runtime that pushes its own frame with some payload even when nothing is passed (no arguments). In that case a stack based on an array might be faster, since you push less data onto it at each iteration.
A stack based on other data structures will vary in efficiency, but will ultimately never be as efficient as an array.
I would use the system stack until I knew for sure I needed something else (in most languages you can set the stack size); you also end up writing less code, which stays readable and understandable.
If you do need your own stack, build it for readability first, even though it will then most likely be slower than just increasing the memory for the system stack.
I bet you already know the three rules of optimization.
In a C program that doesn't use recursion, it should be possible in theory to work out the maximum/worst case stack size needed to call a given function, and anything that it calls. Are there any free, open source tools that can do this, either from the source code or compiled ELF files?
Alternatively, is there a way to extract a function's stack frame size from an ELF file, so I can try to work it out manually?
I'm compiling for the MSP430 using MSPGCC 3.2.3 (I know it's an old version, but I have to use it in this case). The stack space to allocate is set in the source code, and should be as small as possible so that the rest of memory can be used for other things. I have read that you need to take account of the stack space used by interrupts, but the system I'm using already takes account of this - I'm trying to work out how much extra space to add on top of that. Also, I've read that function pointers make this difficult. In the few places where function pointers are used here, I know which functions they can call, so could take account of these cases manually if the stack space needed for the called functions and the calling functions was known.
Static analysis seems like a more robust option than stack painting at runtime, but working it out at runtime is an option if there's no good way to do it statically.
Edit:
I found GCC's -fstack-usage flag, which saves the frame size for each function as it is compiled. Unfortunately, MSPGCC doesn't support it. But it could be useful for anyone who is trying to do something similar on a different platform.
While static analysis is the best method for determining maximum stack usage you may have to resort to an experimental method. This method cannot guarantee you an absolute maximum but can provide you with a very good idea of your stack usage.
You can check your linker script to get the location of __STACK_END and __STACK_SIZE. You can use these to fill the stack space with an easily recognizable pattern like 0xDEAD or 0xAA55. Run your code through a torture test to try and make sure as many interrupts are generated as possible.
After the test you can examine the stack space to see how much of the stack was overwritten.
Interesting question.
I would expect this information to be statically available in the debugging data included in debug builds.
I had a brief look at the DWARF standard, and it does specify two attributes for functions, DW_AT_frame_base and DW_AT_static_link, which can be used to compute "the frame base of the relevant instance of the subroutine that immediately encloses the subroutine or entry point".
I think the only way to go is static analysis. You need to account for the space of all non-static local variables, which are mostly going to be pointers, but pointers that are stored on the stack anyway. You'll also need to reserve space for the return address within the caller, as it's stored by the compiler on the stack so control can return to the caller after your function returns, and space for all your function parameters.
Based on that, if you have a tool able to count all parameters and auto variables and figure out their sizes, you should be able to calculate the minimum stack frame size you'll need.
Please note that the compiler may also align values on the stack for your particular architecture, which can make the stack space requirements a little bigger than what you'd expect from this calculation.
Some embedded IDEs can give info on stack usage at runtime.
I know that IAR Embedded Workbench supports it.
Be aware that you need to take into account that interrupts occur asynchronously, so take the biggest stack usage scenario and add the interrupt context to it. If nested interrupts are supported, as on ARM processors, you need to take that into account as well.
TinyOS has some work done on stack size analysis. It is described here:
http://tinyos.stanford.edu/tinyos-wiki/index.php/Stack_Analysis
They only support AVR, but say that "MSP430 is not difficult to support but this is not super high priority". In any case, the page provides lots of resources.