what is the most efficient way to search binary search tree? - c

I have seen a lot of search algorithms to search in a binary sorted tree, but all of them are using the same way: recursion. I know recursion is expensive in comparison to loops, because every time we call the search function, a new stack frame is created for that method, which will eventually use a lot of memory if the binary search tree is too big.
Why can't we search the binary search tree like this:
while (root!=NULL)
{
if (root->data==data)
return 0;
else if (data > root->data)
root=root->right;
else
root=root->left;
}
I think that this way is faster and more efficient than the recursive way, correct me if I am wrong!

Probably your way -which is the common way to code that in C- might be faster, but you should benchmark, because some C compilers (e.g. recent GCC when invoked with gcc -O2 ...) are able to optimize most tail calls as a jump (and passing values in registers). tail call optimization means that a call stack frame is reused by the callee (so the call stack stays bounded). See this question.
FWIW, in OCaml (or Scheme, or Haskell, or most Common Lisp implementations) you would code a tail-recursive call and you know that the compiler is optimizing it as a jump.
Recursion is not always slower than loops (in particular for tail-calls). It is a matter of optimization by the compiler.
Read about continuations and continuation passing style. If you know only C, learn some functional language (Ocaml, Haskell, or Scheme with SICP ...) where tail-calls are very often used. Read about call/cc in Scheme.

Yes, that's the normal way of doing it.

Theoretically both your solution and the recursive solution have the same Big Oh complexity. In theory they are both O(log n). If you want performance measured in seconds you need to go practical, write the code of both methods (iterative, recursive), run them and measure the run time.

Related

Is there any use for recursions?

I've recently learned about tail-recursions as a way to make a recursion that doesn't crash when you give it too big of a number to work with. I realised that I could easily rewrite a tail-recursion as a while loop and have it do basically exactly the same thing, which lead me wondering - is there any use for recursions when you can do everything with a normal loop?
Yes, recursion code looks smaller and is easier to understand, but it also has a chance of completely crashing, while a simple loop cannot crash doing the same task.
I'll take for example the Haskell language, it is Purely functional:
Every function in Haskell is a function in the mathematical sense
(i.e., "pure"). Even side-effecting IO operations are but a
description of what to do, produced by pure code. There are no
statements or instructions, only expressions which cannot mutate
variables (local or global) nor access state like time or random
numbers.
So, in haskell a recursive function is tail recursive if the final result of the
recursive call is the final result of the function itself. If the
result of the recursive call must be further processed (say, by adding
1 to it, or consing another element onto the beginning of it), it is
not tail recursive. (see here)
On the other hand, in many programming languages, calling a function uses stack space, so a function that is tail recursive can build up a large stack of calls to itself, which wastes memory. Since in a tail call, the containing function is about to return, its environment can actually be discarded and the recursive call can be entered without creating a new stack frame. This trick is called tail call elimination or tail call optimisation and allows tail-recursive functions to recur indefinitely.
It's been a long while since I posted this question and my opinion on the topic has changed. Here's why:
I learned Haskell, and it's a language that fixes everything bad about recursion - recursive definitions and algorithms are turned into normal looping algorithms and most of the time you don't even use recursion directly and instead use map, fold, filter, or a combination of those. And with everything bad removed, the good sides of functional programming start to shine through - everything is closer to its mathematical definition, not obscured by clunky loops and variables.
To someone else who is struggling to understand why recursion is great, go learn Haskell. It has a lot of other very interesting features like being lazy (values are evaluated only when they're requested), static (variables can never be modified), pure (functions cannot do anything other than take input and return output, so no printing to the console), strongly typed with a very expressive type system, filled with mind-blowing abstractions like Functor, Monad, State, and much more. I can almost say it's life-changing.

On using recursion in C

While implementing a lexical analyzer in C, I found myself writing a recursive code:
// Return a List of Tokens
List Lexer_run(const char* input) {
if(input[0] == '\0') {
return new_List();
} else {
// ... Find Token and possibly advance in input for a few characters
return List_add(Lexer_run(nextInput), newToken);
}
Consider another example in an implementation of linked list
List_length(List* this) {
if(!this) {
return 0;
} else {
return 1 + List_length(this->next);
}
}
I am wondering if I can always use such recursive code in C or if I should avoid it unless the case
really requires recursion (for example recursive descent parser or a tree structure)
What I think so far
Advantages of recursion:
readable and elegant
Drawbacks:
will rapidly cause a stack overflow (in my computer it's around 1'000'000 calls)
can be inefficient comparing to an iterative version
Solutions:
use tail-call optimization and let the compiler transform my recursion into loops but I find tail-call code to be less readable.
Increase stack size for my program
Note
My question is not specifically about my examples but rather a general question if one should use
recursion in C.
As a rule, you want to use recursion for tasks that are inherently recursive, such as walking recursive data structures, serializing structures defined recursively, or producing recursive structures from "flat" input (e.g. parsing a language). You do not want to apply recursion for tasks that can be expressed in terms of iteration, such as walking linear data structures.
Recursion is a powerful mechanism, but using it in place of iteration is like swatting a fly with a sledgehammer *. Memory efficiency and a possibility of stack overflow are both very important considerations, but they are secondary to understandability of your code. Even if you eliminate the negative consequences of applying recursion where an iteration is sufficient by letting the compiler optimize tail call for you, readers of your program would be scratching their heads trying to understand first what you did, and then why you did it.
When you apply recursion to a recursive task (trees, recursive descent parsers, divide and conquer algorithms that process the entire input, searches with backtracking) your code becomes more readable, because it matches the task at hand. On the other hand, when you apply recursion to an inherently non-recursive task, your code becomes harder to read.
* This metaphor on recursion vs. iteration is borrowed from an introductory chapters of one of Dijkstra's books.
Tail call optimization is frankly not to be trusted. There's too many gotchas which can scare the optimizer away from applying it in apparently innocuous cases. Be glad that it's there, but don't rely on it.
Because of this (and because of the fixed stack size), you generally want to avoid recursion unless you actually need its implicit stack structure [you don't, for your List_length]. Even then, be aware of the potential for stack overflows.
The primary benefit of writing recursive functions is readability. That makes them "first draft" material for algorithms like recursive descent which have a naturally recursive structure. Then rewrite them as iterative (with a stack as necessary) if/when you run into trouble.
A side benefit of doing things that way: you can keep the recursive version around as a reference implementation, and unit test them to ensure they are equivalent.

How to solve problems that have many distinct levels (they are recursive) in c

I would like to know how to handle problems where there is a process which is used in many different 'levels' of a problem in c, preferable in an 'idiomatic' way.I know I did not explain this well enough, so let me give an example:
Consider the general problem of making a game solver, which is supposed to print the best next move.I think that it should check all possible moves in a for loop and see if it is a winning move(in this round) if it is, return the move, otherwise check every possible move the opponent can play against your move (for loop) and call the function to find the best move again.
However, I find that this approach has some limitations, such as performance (the program will spend it's time running boilerplate code required to call the functions etc)
and limited flexibility , since the function will have to find a method to communicate with the caller how good a move was found.That is, if it could be done at all.
bestmove()
{
for (;i<maxmove;i++)
{
if(checkifwinning(moves[i])) return;
for (;n<maxopponentmove;n++)
{
bestmove();
}
}
I have been messing with haskell for a while now, so I am afraid that my mind is set on seeking recursive solutions.I hope that you can show me a way to write this function in a 'c native' way.
The C language permits recursive functions. However, it has no notion of tail-recursive calls (but in some occasions, recent GCC compilers are able to optimize some tail-recursive calls into purely iterative machine code).
When coding recursive functions you should avoid too deep recursion and too big local call frames (so use heap memory).
You are talking about searching a game tree to find a best move; you could use a standard algorithm like minimax. Your approach seems to be a depth-first search of the tree which terminates at the first winning move found; note that this won't find the shortest path to a winning move nor does it guard against the opponent's winning.
There are ways to speed up the searching of game trees such as alpha-beta pruning. Such a standard algorithm is the way to go - much better than worrying about the overhead of calling functions in C, etc. C function calls are not expensive. Beware of such "optimizations" when writing C code - the optimizer is likely to be better at such things anyway. At the very least, first write a version in the most straightforward way so you have something to benchmark against. Your job is to find a good algorithm.
I think what you really need is a branch-and-bound like algorithm: keep a list of "open" moves (i.e., not yet considered) and a list of "closed" moves (i.e., already considered). In C this would best be implemented having two queues (open and closed). Then you would have an algorithm as follows:
while (open is not empty)
{
pop gameState from open
if (gameState in closed)
continue;
push gameState to closed
for (eachMove)
{
computeNewState();
addStateToOpen();
}
}
There are several advantages to this compared to a recursive approach:
you are sure that every game state is only considered once
you do not "overwhelm" the stack

Why is C slow with function calls?

I am new here so apologies if I did the post in a wrong way.
I was wondering if someone could please explain why is C so slow with function calling ?
Its easy to give a shallow answer to the standard question about Recursive Fibonacci, but I would appreciate if I knew the "deeper" reason as deep as possible.
Thanks.
Edit1 : Sorry for that mistake. I misunderstood an article in Wiki.
When you make a function call, your program has to put several registers on the stack, maybe push some more stuff, and mess with the stack pointer. That's about all for what can be "slow". Which is, actually, pretty fast. About 10 machine instructions on an x86_64 platform.
It's slow if your code is sparse and your functions are very small. This is the case of the Fibonacci function. However, you have to make a difference between "slow calls" and "slow algorithm": calculating the Fibonacci suite with a recursive implementation is pretty much the slowest straightforward way of doing it. There is almost as much code involved in the function body than in the function prologue and epilogue (where pushing and popping takes place).
There are cases in which calling functions will actually make your code faster overall. When you deal with large functions and your registers are crowded, the compiler may have a rough time deciding in which register to store data. However, isolating code inside a function call will simplify the compiler's task of deciding which register to use.
So, no, C calls are not slow.
Based on the additional information you posted in the comment, it seems that what is confusing you is this sentence:
"In languages (such as C and Java)
that favor iterative looping
constructs, there is usually
significant time and space cost
associated with recursive programs,
due to the overhead required to manage
the stack and the relative slowness of
function calls;"
In the context of a recursive implementation fibonacci calculations.
What this is saying is that making recursive function calls is slower than looping but this does not mean that function calls are slow in general or that function calls in C are slower than function calls in other languages.
Fibbonacci generation is naturally a recursive algorithm, and so the most obvious and natural implementation involves many function calls, but is can also be expressed as an iteration (a loop) instead.
The fibonacci number generation algorithm in particular has a special property called tail recursion. A tail-recursive recursive function can be easily and automatically converted into an iteration, even if it is expressed as a recursive function. Some languages, particularly functional languages where recursion is very common and iteration is rare, guarantee that they will recognize this pattern and automatically transform such a recursion into an iteration "under the hood". Some optimizing C compilers will do this as well, but it is not guaranteed. In C, since iteration is both common and idiomatic, and since the tail recursive optimization is not necessarily going to be made for you by the compiler, it is a better idea to write it explicitly as an iteration to achieve the best performance.
So interpreting this quote as a comment on the speed of C function calls, relative to other languages, is comparing apples to oranges. The other languages in question are those that can take certain patterns of function calls (which happen to occur in fibbonnaci number generation) and automatically transform them into something that is faster, but is faster because it is actually not a function call at all.
C is not slow with function calls.
The overhead of calling a C function is extremely low.
I challenge you to provide evidence to support your assertion.
There are a couple of reasons C can be slower than some other languages for a job like computing Fibonacci numbers recursively. Neither really has anything to do with slow function calls though.
In quite a few functional languages (and languages where a more or less functional style is common), recursion (often very deep recursion) is quite common. To keep speed reasonable, many implementations of such languages do a fair amount of work optimizing recursive calls to (among other things) turn them into iteration when possible.
Quite a few also "memoize" results from previous calls -- i.e., they keep track of the results from a function for a number of values that have been passed recently. When/if the same value is passed again, they can simply return the appropriate value without re-calculating it.
It should be noted, however, that the optimization here isn't really faster function calls -- it's avoiding (often many) function calls.
The Recursive Fibonacci is the reason, not C-language. Recursive Fibonacci is something like
int f(int i)
{
return i < 2 ? 1 : f(i-1) + f(i-2);
}
This is the slowest algorithm to calculate Fibonacci number, and by using stack store called functions list -> make it slower.
I'm not sure what you mean by "a shallow answer to the standard question about Recursive Fibonacci".
The problem with the naive recursive implementation is not that the function calls are slow, but that you make an exponentially large number of calls. By caching the results (memoization) you can reduce the number of calls, allowing the algorithm to run in linear time.
Of all the languages out there, C is probably the fastest (unless you are an assembly language programmer). Most C function calls are 100% pure stack operations. Meaning when you call a function, what this translates too in your binary code is, the CPU pushes any parameters you pass to your function onto the stack. Afterwards, it calls the function. The function then pops your parameters. After that, it executes whatever code makes up your function. Finally, any return parameters are pushed onto the stack, then the function ends and the parameters are popped off. Stack operations on any CPU are usually faster then anything else.
If you are using a profiler or something that is saying a function call you are making is slow, then it HAS to be the code inside your function. Try posting your code here and we will see what is going on.
I'm not sure what you mean. C is basically one abstraction layer on top of CPU assembly instructions, which is pretty fast.
You should clarify your question really.
In some languages, mostly of the functional paradigm, function calls made at the end of a function body can be optimized so that the same stack frame is re-used. This can potentially save both time and space. The benefit is particularly significant when the function is both short and recursive, so that the stack overhead might otherwise dwarf the actual work being done.
The naive Fibonacci algorithm will therefore run much faster with such optimization available. C does not generally perform this optimization, so its performance could suffer.
BUT, as has been stated already, the naive algorithm for the Fibonacci numbers is horrendously inefficient in the first place. A more efficient algorithm will run much faster, whether in C or another language. Other Fibonacci algorithms probably will not see nearly the same benefit from the optimization in question.
So in a nutshell, there are certain optimizations that C does not generally support that could result in significant performance gains in certain situations, but for the most part, in those situations, you could realize equivalent or greater performance gains by using a slightly different algorithm.
I agree with Mark Byers, since you mentioned the recursive Fibonacci. Try adding a printf, so that a message is printed each time you do an addition. You will see that the recursive Fibonacci is doing a lot more additions that it may appear at first glance.
What the article is talking about is the difference between recursion and iteration.
This is under the topic called algorithm analysis in computer science.
Suppose I write the fibonacci function and it looks something like this:
//finds the nth fibonacci
int rec_fib(n) {
if(n == 1)
return 1;
else if (n == 2)
return 1;
else
return fib(n-1) + fib(n - 2)
}
Which, if you write it out on paper (I recommend this), you will see this pyramid-looking shape emerge.
It's taking A Whole Lotta Calls to get the job done.
However, there is another way to write fibonacci (there are several others too)
int fib(int n) //this one taken from scriptol.com, since it takes more thought to write it out.
{
int first = 0, second = 1;
int tmp;
while (n--)
{
tmp = first+second;
first = second;
second = tmp;
}
return first;
}
This one only takes the length of time that is directly proportional to n,instead of the big pyramid shape you saw earlier that grew out in two dimensions.
With algorithm analysis you can determine exactly the speed of growth in terms of run-time vs. size of n of these two functions.
Also, some recursive algorithms are fast(or can be tricked into being faster). It depends on the algorithm - which is why algorithm analysis is important and useful.
Does that make sense?

memory and depth (vs. breadth) of recursion in c

I have two questions related to memory. First some background. I am a novice-intermediate c programmer.
I have written several different tree like data-structures with variable number of nodes at each level. One such structure, can have as its data a number of integer variables, which themselves are primary data for integer trees. I have written recursive functions for genrating trees with random numbers of nodes at different levels. I pass pointers to randomly generated integer trees as parameters for generating the main data-structure.
I have also written recursive code for operating on these tree structures, such as printing the tree. Just for my learning, I created queue and stack for my nodes and wrote iterative functions for in-order, pre-order and posr-order printing of the tree. I think, I am beginning to get the hang of it.
Now the question.
(a) I need to write other functions, which are obviously easy and clean if written using pure recursion. I can see how it could be written iteratively. It is not difficult, just tedious. The maximum depth of my trees will be 3-5, however, the number of nodes at each level is large. It is my understanding, that every recursive call will store addresses on a stack. If the depth is large, it can run out of memory. But if the depth is shallow, the penalty (memory/speed) of using a recursive function may not be terrible.
Do people have recommendations on criteria for deciding if an iterative/recursive solution is preferable?? I have read various threads on the site about iterative soution, but could not find any thing that directly speaks to this issue.
(b) Second, question relates to requesting memory from the system. I know that some applications can request certain amount of memory. I am using mingw-gcc4.x with Netbeans IDE. How can I specify the maximum amount of memory that the program can use in debug / release mode? Or, does it depend solely on the available RAM and no explicit specification is necessary?!
Thanks in advance,
paras
~RT
"The maximum depth of my trees will be 3-5"
This depth of recursion will not challenge the default stack size of any version of Windows, or any other system you'll ever see that doesn't have "Watch out! The stack is tiny!" plastered all over it. Most programs go a lot more than 3-5 calls deep, without involving any recursion at all.
So as long as your recursion only goes "down" your tree, not "across" its breadth, you're fine. Unless of course you're doing something really unusual like sticking an enormous array on the stack as a local variable.
Am I right that your (post-order) recursion looks something like this?
void doNode(struct node *n) {
for (int i = 0; i < n->num_nodes; ++i) {
doNode(n->nodes[i]);
}
// do some work on this node
}
If so, then for 3-5 levels the only way it'll use a lot of stack is if it looks like this:
void doNode(struct node *n) {
int myRidiculousArray[100*1000] = { 0 };
for (int i = 0; i < n->num_nodes; ++i) {
doNode(n->nodes[i]);
}
// do some work on this node, using myRidiculousArray
}
If you have millions of nodes, then there may be some performance gain to be had from avoiding the function call overhead per node. But write it the easy way first, and then come back and look at it later if you're ever desperate for a bit more performance. It's fairly rare for the overhead of a function call per node to be reason your code is slow - it does happen, but usually only after you've fixed a bunch of other, even worse, slowdowns.
If you write your function using tail recursion (provided you're compiling with optimization enabled) you won't run into problems with stack or memory space.
In the end you need to program your functions so you can understand them so do whatever is easier for you.
Even an iterative implementation is a recursive algorithm if you're using a stack to store nodes; both use O(f) space, where "f" is a function that's "more" than a constant (c is O(f) but f is not O(1)). You might still wind up using less memory with the iterative version if the elements of your stack are smaller than a call-stack frame. If so, you can look into reducing the size of a call stack by using closures, assuming the language supports them.
Iterative algorithms will have O(1) space requirements. Even a recursive implementation can achieve this using tail calls, as Dashogun mentions.
Spend a little time trying to find an iterative algorithm. If you can't find one, I recommend going with the recursive implementation unless you know for certain that you need to handle a recursive structure that (these days) has a depth of at least 213. For a binary tree, that's 2213 nodes, which I very much doubt you'll see.
(a) Recursion is not bad in itself. However if writing the iterative algo is close in complexity you should use the iterative one. Before commiting to a recursive algorithm some prerequisites apply:
-You should make sure that the recursion depth (and the local variables in the re-entrant functions) will not make you exceed the stack size. For the depth you mentioned used on Windows this would be a problem in very few cases. Additionally you can add a safety check on the height of the tree.
(b) If you are asking about the stack size: I see you use mingw, thus you probably build for Windows. The stack size in Windows is per thread. Have a look here how to setup your reserved and initially commited stack size.
If you are asking about heap memory allocation have a look here. But the short story is that you can use all the memory the system can provide for heap allocations.

Resources