I have two questions related to memory. First, some background: I am a novice-to-intermediate C programmer.
I have written several tree-like data structures with a variable number of nodes at each level. One such structure can have as its data a number of integer variables, which are themselves primary data for integer trees. I have written recursive functions for generating trees with random numbers of nodes at different levels, and I pass pointers to randomly generated integer trees as parameters when generating the main data structure.
I have also written recursive code for operating on these tree structures, such as printing the tree. Just for my own learning, I created a queue and a stack for my nodes and wrote iterative functions for in-order, pre-order, and post-order printing of the tree. I think I am beginning to get the hang of it.
Now the question.
(a) I need to write other functions, which are obviously easy and clean if written using pure recursion. I can see how they could be written iteratively; it is not difficult, just tedious. The maximum depth of my trees will be 3-5, but the number of nodes at each level is large. It is my understanding that every recursive call stores a return address (and locals) on the stack, so if the depth is large, the program can run out of stack space. But if the depth is shallow, the penalty (memory/speed) of using a recursive function may not be terrible.
Do people have recommendations on criteria for deciding whether an iterative or recursive solution is preferable? I have read various threads on the site about iterative solutions, but could not find anything that directly speaks to this issue.
(b) The second question relates to requesting memory from the system. I know that some applications can request a certain amount of memory. I am using mingw-gcc 4.x with the NetBeans IDE. How can I specify the maximum amount of memory that the program can use in debug/release mode? Or does it depend solely on the available RAM, with no explicit specification necessary?
Thanks in advance,
paras
~RT
"The maximum depth of my trees will be 3-5"
This depth of recursion will not challenge the default stack size of any version of Windows, or any other system you'll ever see that doesn't have "Watch out! The stack is tiny!" plastered all over it. Most programs go a lot more than 3-5 calls deep, without involving any recursion at all.
So as long as your recursion only goes "down" your tree, not "across" its breadth, you're fine. Unless of course you're doing something really unusual like sticking an enormous array on the stack as a local variable.
Am I right that your (post-order) recursion looks something like this?
void doNode(struct node *n) {
    for (int i = 0; i < n->num_nodes; ++i) {
        doNode(n->nodes[i]);
    }
    // do some work on this node
}
If so, then for 3-5 levels the only way it'll use a lot of stack is if it looks like this:
void doNode(struct node *n) {
    int myRidiculousArray[100*1000] = { 0 };
    for (int i = 0; i < n->num_nodes; ++i) {
        doNode(n->nodes[i]);
    }
    // do some work on this node, using myRidiculousArray
}
If you have millions of nodes, then there may be some performance gain to be had from avoiding the function call overhead per node. But write it the easy way first, and then come back and look at it later if you're ever desperate for a bit more performance. It's fairly rare for the overhead of a function call per node to be the reason your code is slow - it does happen, but usually only after you've fixed a bunch of other, even worse, slowdowns.
If you write your function using tail recursion (provided you're compiling with optimization enabled) you won't run into problems with stack or memory space.
In the end you need to program your functions so that you can understand them, so do whatever is easier for you.
Even an iterative implementation is a recursive algorithm if you're using a stack to store nodes; both use O(f) space, where "f" is a function that's "more" than a constant (c is O(f) but f is not O(1)). You might still wind up using less memory with the iterative version if the elements of your stack are smaller than a call-stack frame. If so, you can look into reducing the size of a call stack by using closures, assuming the language supports them.
Iterative algorithms will have O(1) space requirements. Even a recursive implementation can achieve this using tail calls, as Dashogun mentions.
Spend a little time trying to find an iterative algorithm. If you can't find one, I recommend going with the recursive implementation unless you know for certain that you need to handle a recursive structure that (these days) has a depth of at least 2^13. For a binary tree, that's 2^(2^13) nodes, which I very much doubt you'll see.
(a) Recursion is not bad in itself. However, if the iterative algorithm is close in complexity, you should use the iterative one. Before committing to a recursive algorithm, some prerequisites apply:
- You should make sure that the recursion depth (and the local variables in the re-entered functions) will not make you exceed the stack size. For the depth you mentioned, on Windows this would be a problem in very few cases. Additionally, you can add a safety check on the height of the tree.
(b) If you are asking about the stack size: I see you use mingw, so you probably build for Windows. The stack size in Windows is per thread. Have a look here at how to set up your reserved and initially committed stack size.
If you are asking about heap memory allocation have a look here. But the short story is that you can use all the memory the system can provide for heap allocations.
Related
I have seen a lot of algorithms for searching a sorted binary tree, but all of them use the same approach: recursion. I know recursion is expensive compared to loops, because every time we call the search function a new stack frame is created for that call, which will eventually use a lot of memory if the binary search tree is too big.
Why can't we search the binary search tree like this:
while (root != NULL)
{
    if (root->data == data)
        return 0;
    else if (data > root->data)
        root = root->right;
    else
        root = root->left;
}
I think that this way is faster and more efficient than the recursive way, correct me if I am wrong!
Probably your way (which is the common way to code that in C) will be faster, but you should benchmark, because some C compilers (e.g. recent GCC when invoked with gcc -O2 ...) are able to optimize most tail calls into a jump (passing values in registers). Tail call optimization means that the caller's stack frame is reused by the callee, so the call stack stays bounded. See this question.
FWIW, in OCaml (or Scheme, or Haskell, or most Common Lisp implementations) you would code a tail-recursive call and you know that the compiler is optimizing it as a jump.
Recursion is not always slower than loops (in particular for tail-calls). It is a matter of optimization by the compiler.
Read about continuations and continuation passing style. If you know only C, learn some functional language (Ocaml, Haskell, or Scheme with SICP ...) where tail-calls are very often used. Read about call/cc in Scheme.
Yes, that's the normal way of doing it.
Theoretically both your solution and the recursive solution have the same big-O complexity; in theory they are both O(log n). If you want performance measured in seconds, you need to get practical: write the code for both methods (iterative and recursive), run them, and measure the run time.
And is there a way to easily monitor your stack depth in a linux environment?
Consider the case of a basic app in C, compiled with gcc, in Ubuntu.
How about if you do NOT allow dynamic memory allocation (no malloc/free-ing)?
Can you know your max stack depth after you compile?
No. Consider a recursive function that might call itself any number of times, depending on input. You can't know how many times the function might be invoked, one inside the last, without knowing what the input to the program is.
I expect that it might be possible to determine the max stack depth for some programs, but you can't determine that for all programs.
And is there a way to easily monitor your stack depth in a linux environment?
I don't know about an easy way to monitor the stack depth continuously, but you can determine the stack depth at any point using gdb's -stack-info-depth command.
How about if you do NOT allow dynamic memory allocation (no malloc/free-ing)?
That doesn't make a difference. Consider a recursive Fibonacci function -- that wouldn't need to allocate any memory dynamically, but the number of stack frames will still vary depending on input.
It is possible to do a call-graph analysis. The longest path in the graph is the maximum stack depth. However, there are limitations.
With recursive functions, all bets are off, the depth of recursion depends on the run time input, it is not possible to deduce that in compile time analysis. [It is possible to detect the presence of recursive function by analyzing the call graph and looking for self edges, i.e. edges with same source and destination.]
Furthermore, the same issue is present if there are loops/cycles in the call graph. [As mentioned by @Caleb: a()->b()->c()->d()->e()->f()->g()->c()] [Using algorithms from graph theory, it is possible to detect the presence of cycles as well.]
References for call graph:
http://en.wikipedia.org/wiki/Call_graph
Tools to get a pictorial function call graph of code
While implementing a lexical analyzer in C, I found myself writing a recursive code:
// Return a List of Tokens
List Lexer_run(const char* input) {
    if (input[0] == '\0') {
        return new_List();
    } else {
        // ... find a token, advancing past a few characters to nextInput
        return List_add(Lexer_run(nextInput), newToken);
    }
}
Consider another example in an implementation of a linked list:
int List_length(List* this) {
    if (!this) {
        return 0;
    } else {
        return 1 + List_length(this->next);
    }
}
I am wondering if I can always use such recursive code in C or if I should avoid it unless the case
really requires recursion (for example recursive descent parser or a tree structure)
What I think so far
Advantages of recursion:
readable and elegant
Drawbacks:
will rapidly cause a stack overflow if the recursion is deep (on my computer that's around 1'000'000 calls)
can be inefficient compared to an iterative version
Solutions:
use tail-call optimization and let the compiler transform my recursion into loops but I find tail-call code to be less readable.
Increase stack size for my program
Note
My question is not specifically about my examples but rather a general question if one should use
recursion in C.
As a rule, you want to use recursion for tasks that are inherently recursive, such as walking recursive data structures, serializing structures defined recursively, or producing recursive structures from "flat" input (e.g. parsing a language). You do not want to apply recursion for tasks that can be expressed in terms of iteration, such as walking linear data structures.
Recursion is a powerful mechanism, but using it in place of iteration is like swatting a fly with a sledgehammer *. Memory efficiency and a possibility of stack overflow are both very important considerations, but they are secondary to understandability of your code. Even if you eliminate the negative consequences of applying recursion where an iteration is sufficient by letting the compiler optimize tail call for you, readers of your program would be scratching their heads trying to understand first what you did, and then why you did it.
When you apply recursion to a recursive task (trees, recursive descent parsers, divide and conquer algorithms that process the entire input, searches with backtracking) your code becomes more readable, because it matches the task at hand. On the other hand, when you apply recursion to an inherently non-recursive task, your code becomes harder to read.
* This metaphor on recursion vs. iteration is borrowed from an introductory chapter of one of Dijkstra's books.
Tail call optimization is frankly not to be trusted. There are too many gotchas which can scare the optimizer away from applying it in apparently innocuous cases. Be glad that it's there, but don't rely on it.
Because of this (and because of the fixed stack size), you generally want to avoid recursion unless you actually need its implicit stack structure [you don't, for your List_length]. Even then, be aware of the potential for stack overflows.
The primary benefit of writing recursive functions is readability. That makes them "first draft" material for algorithms like recursive descent which have a naturally recursive structure. Then rewrite them as iterative (with a stack as necessary) if/when you run into trouble.
A side benefit of doing things that way: you can keep the recursive version around as a reference implementation, and unit test them to ensure they are equivalent.
I am new here, so apologies if I did the post in the wrong way.
I was wondering if someone could please explain why C is so slow with function calling?
It's easy to give a shallow answer to the standard question about recursive Fibonacci, but I would appreciate knowing the "deeper" reason, as deep as possible.
Thanks.
Edit 1: Sorry for that mistake. I misunderstood a Wikipedia article.
When you make a function call, your program has to put several registers on the stack, maybe push some more stuff, and mess with the stack pointer. That's about all for what can be "slow". Which is, actually, pretty fast. About 10 machine instructions on an x86_64 platform.
It's slow if your code is sparse and your functions are very small. This is the case of the Fibonacci function. However, you have to distinguish between "slow calls" and "slow algorithm": calculating the Fibonacci sequence with a recursive implementation is pretty much the slowest straightforward way of doing it. There is almost as much code involved in the function body as in the function prologue and epilogue (where pushing and popping take place).
There are cases in which calling functions will actually make your code faster overall. When you deal with large functions and your registers are crowded, the compiler may have a rough time deciding in which register to store data. However, isolating code inside a function call will simplify the compiler's task of deciding which register to use.
So, no, C calls are not slow.
Based on the additional information you posted in the comment, it seems that what is confusing you is this sentence:
"In languages (such as C and Java) that favor iterative looping constructs, there is usually significant time and space cost associated with recursive programs, due to the overhead required to manage the stack and the relative slowness of function calls;"
in the context of recursive Fibonacci calculations.
What this is saying is that making recursive function calls is slower than looping but this does not mean that function calls are slow in general or that function calls in C are slower than function calls in other languages.
Fibonacci generation is naturally a recursive algorithm, and so the most obvious and natural implementation involves many function calls, but it can also be expressed as an iteration (a loop) instead.
The Fibonacci function can be written in a form with a special property called tail recursion (the naive two-call version is not tail recursive, but an accumulator-passing version is). A tail-recursive function can be easily and automatically converted into an iteration, even though it is expressed as a recursive function. Some languages, particularly functional languages where recursion is very common and iteration is rare, guarantee that they will recognize this pattern and automatically transform such a recursion into an iteration "under the hood". Some optimizing C compilers will do this as well, but it is not guaranteed. In C, since iteration is both common and idiomatic, and since the tail-recursive optimization is not necessarily going to be made for you by the compiler, it is a better idea to write it explicitly as an iteration to achieve the best performance.
So interpreting this quote as a comment on the speed of C function calls, relative to other languages, is comparing apples to oranges. The other languages in question are those that can take certain patterns of function calls (which happen to occur in Fibonacci number generation) and automatically transform them into something that is faster, but is faster because it is actually not a function call at all.
C is not slow with function calls.
The overhead of calling a C function is extremely low.
I challenge you to provide evidence to support your assertion.
There are a couple of reasons C can be slower than some other languages for a job like computing Fibonacci numbers recursively. Neither really has anything to do with slow function calls though.
In quite a few functional languages (and languages where a more or less functional style is common), recursion (often very deep recursion) is quite common. To keep speed reasonable, many implementations of such languages do a fair amount of work optimizing recursive calls to (among other things) turn them into iteration when possible.
Quite a few also "memoize" results from previous calls -- i.e., they keep track of the results from a function for a number of values that have been passed recently. When/if the same value is passed again, they can simply return the appropriate value without re-calculating it.
It should be noted, however, that the optimization here isn't really faster function calls -- it's avoiding (often many) function calls.
Recursive Fibonacci is the reason, not the C language. Recursive Fibonacci looks something like:
int f(int i)
{
    return i < 2 ? 1 : f(i-1) + f(i-2);
}
This is the slowest straightforward algorithm for calculating a Fibonacci number, and using the stack to store the chain of pending calls makes it slower still.
I'm not sure what you mean by "a shallow answer to the standard question about Recursive Fibonacci".
The problem with the naive recursive implementation is not that the function calls are slow, but that you make an exponentially large number of calls. By caching the results (memoization) you can reduce the number of calls, allowing the algorithm to run in linear time.
Of all the languages out there, C is probably the fastest (unless you are an assembly language programmer). Most C function calls are almost pure stack operations. When you call a function, this translates in your binary code to the CPU pushing any parameters you pass onto the stack, then calling the function. The function pops your parameters and executes whatever code makes up its body. Finally, any return value is handed back, the function ends, and the parameters are popped off. Stack operations on any CPU are usually faster than anything else.
If you are using a profiler or something that is saying a function call you are making is slow, then it HAS to be the code inside your function. Try posting your code here and we will see what is going on.
I'm not sure what you mean. C is basically one abstraction layer on top of CPU assembly instructions, which is pretty fast.
You should clarify your question really.
In some languages, mostly of the functional paradigm, function calls made at the end of a function body can be optimized so that the same stack frame is re-used. This can potentially save both time and space. The benefit is particularly significant when the function is both short and recursive, so that the stack overhead might otherwise dwarf the actual work being done.
The naive Fibonacci algorithm will therefore run much faster with such optimization available. C does not generally perform this optimization, so its performance could suffer.
BUT, as has been stated already, the naive algorithm for the Fibonacci numbers is horrendously inefficient in the first place. A more efficient algorithm will run much faster, whether in C or another language. Other Fibonacci algorithms probably will not see nearly the same benefit from the optimization in question.
So in a nutshell, there are certain optimizations that C does not generally support that could result in significant performance gains in certain situations, but for the most part, in those situations, you could realize equivalent or greater performance gains by using a slightly different algorithm.
I agree with Mark Byers, since you mentioned the recursive Fibonacci. Try adding a printf, so that a message is printed each time you do an addition. You will see that the recursive Fibonacci is doing a lot more additions than it may appear at first glance.
What the article is talking about is the difference between recursion and iteration.
This is under the topic called algorithm analysis in computer science.
Suppose I write the fibonacci function and it looks something like this:
//finds the nth fibonacci
int rec_fib(int n) {
    if (n == 1)
        return 1;
    else if (n == 2)
        return 1;
    else
        return rec_fib(n-1) + rec_fib(n-2);
}
Which, if you write it out on paper (I recommend this), you will see this pyramid-looking shape emerge.
It's taking A Whole Lotta Calls to get the job done.
However, there is another way to write fibonacci (there are several others too)
int fib(int n) //this one taken from scriptol.com, since it takes more thought to write it out.
{
    int first = 0, second = 1;
    int tmp;
    while (n--)
    {
        tmp = first + second;
        first = second;
        second = tmp;
    }
    return first;
}
This one only takes time directly proportional to n, instead of the big pyramid shape you saw earlier that grew out in two dimensions.
With algorithm analysis you can determine exactly the speed of growth in terms of run-time vs. size of n of these two functions.
Also, some recursive algorithms are fast(or can be tricked into being faster). It depends on the algorithm - which is why algorithm analysis is important and useful.
Does that make sense?
I have a circular, statically allocated buffer in C, which I'm using as a queue for a breadth-first search. I'd like to have the top N elements in the queue sorted. It would be easy to just use a regular qsort() - except it's a circular buffer, and the top N elements might wrap around. I could, of course, write my own sorting implementation that uses modular arithmetic and knows how to wrap around the array, but I've always thought that writing sorting functions is a good exercise, but something better left to libraries.
I thought of several approaches:
Use a separate linear buffer - first copy the elements from the circular buffer, then apply qsort, then copy them back. Using an additional buffer means an additional O(N) space requirement, which brings me to
Sort the "top" and "bottom" halve using qsort, and then merge them using the additional buffer
Same as 2. but do the final merge in-place (I haven't found much on in-place merging, but the implementations I've seen don't seem worth the reduced space complexity)
On the other hand, spending an hour contemplating how to elegantly avoid writing my own quicksort, instead of adding those 25 (or so) lines might not be the most productive either...
Correction: Made a stupid mistake of switching DFS and BFS (I prefer writing a DFS, but in this particular case I have to use a BFS), sorry for the confusion.
Further description of the original problem:
I'm implementing a breadth first search (for something not unlike the fifteen puzzle, just more complicated, with about O(n^2) possible expansions in each state, instead of 4). The "bruteforce" algorithm is done, but it's "stupid" - at each point, it expands all valid states, in a hard-coded order. The queue is implemented as a circular buffer (unsigned queue[MAXLENGTH]), and it stores integer indices into a table of states. Apart from two simple functions to queue and dequeue an index, it has no encapsulation - it's just a simple, statically allocated array of unsigned's.
Now I want to add some heuristics. The first thing I want to try is to sort the expanded child states after expansion ("expand them in a better order") - just like I would if I were programming a simple best-first DFS. For this, I want to take part of the queue (representing the most recent expanded states), and sort them using some kind of heuristic. I could also expand the states in a different order (so in this case, it's not really important if I break the FIFO properties of the queue).
My goal is not to implement A*, or a depth first search based algorithm (I can't afford to expand all states, but if I don't, I'll start having problems with infinite cycles in the state space, so I'd have to use something like iterative deepening).
I think you need to take a big step back from the problem and try to solve it as a whole - chances are good that the semi-sorted circular buffer is not the best way to store your data. If it is, then you're already committed and you will have to write the buffer to sort the elements - whether that means performing an occasional sort with an outside library, or doing it when elements are inserted I don't know. But at the end of the day it's going to be ugly because a FIFO and sorted buffer are fundamentally different.
Previous answer, which assumes your sort library has a robust and feature-filled API (as requested in your question, this does not require you to write your own mod sort or anything; it depends on the library supporting arbitrarily located data, usually through a callback function - if your sort doesn't support linked lists, it can't handle this):
The circular buffer has already solved this problem using % (mod) arithmetic. QSort, etc don't care about the locations in memory - they just need a scheme to address the data in a linear manner.
They work as well for linked lists (which are not linear in memory) as they do for 'real' linear non circular arrays.
So if you have a circular array with 100 entries, and you need to sort the top 10, and those ten happen to wrap around the end of the array, then you feed the sort the following two bits of information:
The function to locate an array item is (x % 100)
The items to be sorted are at locations 95 to 104
The function will convert the addresses the sort uses into an index into the real array, and the fact that the array wraps around is hidden. Although it may look weird to sort an array past its bounds, a circular array, by definition, has no bounds. The % operator handles that for you, and you might as well be referring to that part of the array as 1295 to 1304 for all it cares.
Bonus points for having an array with 2^n elements.
Additional points of consideration:
It sounds to me that you're using a sorting library which is incapable of sorting anything other than a linear array - so it can't sort linked lists, or arrays with anything other than simple ordering. You really only have three choices:
You can re-write the library to be more flexible (ie, when you call it you give it a set of function pointers for comparison operations, and data access operations)
You can re-write your array so it somehow fits your existing libraries
You can write custom sorts for your particular solution.
Now, for my part I'd re-write the sort code so it was more flexible (or duplicate it and edit the new copy so you have sorts which are fast for linear arrays, and sorts which are flexible for non-linear arrays)
But the reality is that right now your sort library is so simple you can't even tell it how to access data that is non linearly stored.
If it's that simple, there should be no hesitation to adapting the library itself to your particular needs, or adapting your buffer to the library.
Trying an ugly kludge, like somehow turning your buffer into a linear array, sorting it, and then putting it back in is just that - an ugly kludge that you're going to have to understand and maintain later. You're going to 'break' into your FIFO and fiddle with the innards.
-Adam
I'm not seeing exactly the solution you asked for in c. You might consider one of these ideas:
If you have access to the source for your libc's qsort(), you might copy it and simply replace all the array access and indexing code with appropriately generalized equivalents. This gives you some modest assurance that the underlying sort is efficient and has few bugs. No help with the risk of introducing your own bugs, of course. Same big-O as the system qsort, but possibly with a worse constant factor.
If the region to be sorted is small compared to the size of the buffer, you could use the straight ahead linear sort, guarding the call with a test-for-wrap and doing the copy-to-linear-buffer-sort-then-copy-back routine only if needed. Introduces an extra O(n) operation in the cases that trip the guard (for n the size of the region to be sorted), which makes the average O(n^2/N) < O(n).
I see that C++ is not an option for you. ::sigh:: I will leave this here in case someone else can use it.
If C++ is an option you could (subclass the buffer if needed and) overload the [] operator to make the standard sort algorithms work. Again, should work like the standard sort with a multiplier penalty.
Perhaps a priority queue could be adapted to solve your issue.
You could rotate the circular queue until the subset in question no longer wraps around. Then just pass that subset to qsort like normal. This might be expensive if you need to sort frequently or if the array element size is very large. But if your array elements are just pointers to other objects then rotating the queue may be fast enough. And in fact if they are just pointers then your first approach might also be fast enough: making a separate linear copy of a subset, sorting it, and writing the results back.
Do you know about the rules regarding optimization? You can google them (you'll find a few versions, but they all say pretty much the same thing, DON'T).
It sounds like you are optimizing without testing. That's a huge no-no. On the other hand, you're using straight C, so you are probably on a restricted platform that requires some level of attention to speed, so I expect you need to skip the first two rules because I assume you have no choice:
Rules of optimization:
1. Don't optimize.
2. If you know what you are doing, see rule #1.
You can go to the more advanced rules:
Rules of optimization (cont.):
3. If you have a spec that requires a certain level of performance, write the code unoptimized and write a test to see if it meets that spec. If it meets it, you're done. NEVER write code taking performance into consideration until you have reached this point.
4. If you complete step 3 and your code does not meet the spec, recode it, leaving your original "most obvious" code in there as comments, and retest. If it still does not meet the requirements, throw the new code away and use the unoptimized code.
5. If your improvements made the tests pass, ensure that the tests remain in the codebase and are re-run, and that your original code remains in there as comments.
Okay, so finally--I'm not saying this because I read it somewhere. I've spent DAYS trying to untangle some god-awful messes that other people coded because it was "Optimized"--and the really funny part is that 9 times out of 10, the compiler could have optimized it better than they did.
I realize that there are times when you will NEED to optimize, all I'm saying is write it unoptimized, test and recode it. It really won't take you much longer--might even make writing the optimized code easier.
The only reason I'm posting this is because almost every line you've written concerns performance, and I'm worried that the next person to see your code is going to be some poor sap like me.
How about something like this example here? It easily sorts a part, or whatever you want, without having to allocate a lot of extra memory.
It takes only two pointers, a status bit, and a counter for the for loop.
#include <stdio.h>
#include <stdlib.h>

#define PRINT_PROGRESS
#define N 10

typedef unsigned char BYTE;

BYTE buff[N] = {4, 5, 2, 1, 3, 5, 8, 6, 4, 3};
BYTE *a = buff;
BYTE *b = buff;
BYTE changed = 0;

int main(void)
{
    BYTE n = 0;
    do
    {
        b++;                      /* b runs one element ahead of a */
        changed = 0;
        for (n = 0; n < (N - 1); n++)
        {
            if (*a > *b)          /* swap adjacent out-of-order elements */
            {
                *a ^= *b;
                *b ^= *a;
                *a ^= *b;
                changed = 1;
            }
            a++;
            b++;
        }
        a = buff;                 /* rewind for the next pass */
        b = buff;
#ifdef PRINT_PROGRESS
        for (n = 0; n < N; n++)
            printf("%d", buff[n]);
        printf("\n");
#endif
    } while (changed);
    system("pause");
    return 0;
}