Why is Haskell so slow compared to C for the Fibonacci sequence?

I am just a beginner in Haskell, and I am writing code to display the first N numbers of the Fibonacci sequence. Here is my code in Haskell:
fib_seq 1 = 1:[]
fib_seq 2 = 1:1:[]
fib_seq n = sum (take 2 (fib_seq (n-1))) : fib_seq (n-1)
When I run this code for higher numbers like fib_seq 40 in GHCi, it takes a long time to evaluate, my computer hangs, and I have to interrupt it. However, when I write the exact same logic in C (I just print the numbers instead of saving them in a list),
#include <stdio.h>

int fib_seq(int n) {
    if (n == 1) return 1;
    else if (n == 2) return 1;
    else return fib_seq(n - 1) + fib_seq(n - 2);
}

void print_fib(int n) {
    if (n == 0) return;
    else printf("%i ", fib_seq(n));
    print_fib(n - 1);
}

int main(int argc, char *argv[]) {
    print_fib(40);
    return 0;
}
The code is very fast: it takes about 1 second to run when compiled with GCC. Is Haskell supposed to be this much slower than C? I have looked up other answers on the internet and they say something about memoization, but I am just beginning Haskell and I don't know what that means. What I am saying is that the C code and the Haskell code I wrote do the same exact steps, yet Haskell is so much slower that it hangs my GHCi. A 1-2 second difference is something I would never worry about, and if C had taken the same time as Haskell I would not worry either. But Haskell hanging while C does it in 1 second is unacceptable.

The following program, compiled with ghc -O2 test.hs, is +/-2% the speed of the C code you posted, compiled with gcc -O2 test.c.
fib_seq :: Int -> Int
fib_seq 1 = 1
fib_seq 2 = 1
fib_seq n = fib_seq (n-1) + fib_seq (n-2)
main = mapM_ (print . fib_seq) [40,39..1]
Some comments:
Unlike you, I implemented the exact same logic. I doubt this is the real difference, though; see the remaining comments for much more likely causes.
I specified the same types as C uses for the arithmetic. You didn't, which is likely to run into two problems: using Integer instead of Int for big-number arithmetic, and having a class-polymorphic type instead of a monomorphic one, which adds overhead on every function call.
I compiled the code. GHCi is built to be interactive as quickly as possible, not to produce fast code.
I don't have the right version of LLVM installed at the moment, but it will often crunch through heavily numeric code like this much better than GHC's own code generator. I wouldn't be too surprised if it ended up being faster than gcc.
Of course, using one of the many well-known better algorithms for Fibonacci would trump all this nonsense.
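For reference, since the question asks what memoization means: it is caching the result of each call so that every Fibonacci number is computed at most once. Here is a minimal sketch in C (my own illustration, not code from the thread; the names memo and fib_memo are made up):

#include <stdio.h>

#define N 40

/* memo[n] caches fib(n) once computed; 0 means "not computed yet". */
static long long memo[N + 1];

static long long fib_memo(int n)
{
    if (n <= 2)
        return 1;
    if (memo[n] == 0)
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}

int main(void)
{
    /* Print fib(40) down to fib(1), like the Haskell answer above. */
    for (int n = N; n >= 1; n--)
        printf("%lld ", fib_memo(n));
    printf("\n");
    return 0;
}

With the table, fib_memo(40) costs about 40 additions, where the naive recursion makes hundreds of millions of calls.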

Guess what happens if "fib_seq (n-1)" is evaluated twice on each recursion step.
And then try this:
fib_seq 1 = 1:[]
fib_seq 2 = 1:1:[]
fib_seq n = sum (take 2 f) : f
  where f = fib_seq (n-1)
Naming the shared result f means each level of the recursion is computed only once, so the list is built in linear time. Without the sharing, every level spawns two copies of the level below, an exponential blow-up of roughly 2^n calls, even worse than the naive C version's roughly 1.6^n.

Related

Do recursive functions have limitations? E.g., how many layers of calls can a function make?

I made a recursive function that prints how many terms there are in a Collatz sequence, given a starting number. Here is the code, with n=13 as an example:
int collatz(long n, long o)
{
    if (n != 1) {
        if (n % 2 == 0)
            return collatz(n / 2, o + 1);
        else
            return collatz((n * 3) + 1, o + 1);
    } else
        printf("%ld\t", o);
}
void main()
{
    collatz(13, 0);
}
The function runs as expected; however, with some integers such as n=113383, something overflows (I guess) and it returns:
Process returned -1073741571 (0xC00000FD) execution time : 4.631 s
Press any key to continue.
Excuse my non-technical explanation, and many thanks!
There is no limit to recursion depth in the C standard itself. You might cause a stack overflow, but the stack size differs between environments; I think Windows defaults to 1 MB and Linux to 8 MB. It also depends on the size of the stack frame for the function, which in turn depends on how many variables it has and of which types.
In your case, you have two long variables, which are probably 8 bytes each. You also have the string "%ld\t", which is 5 bytes and might end up on the stack, but I'm not sure. On top of that there is the overhead of the return address and the pointer to the previous stack frame, which are 8 bytes each on a 64-bit system. So the stack frame for your function will be roughly 32 bytes, maybe a little more. On a Linux system I'd therefore guess that your function would crash at a depth of around 200,000.
If this is a problem, consider rewriting the function as a non-recursive variant. Look at Blaze's answer for how that can be done in your case. And as andreee commented below:
Additional note: you can increase the stack size under Linux with ulimit -s (also possible: ulimit -s unlimited), and in MSVC you can set the /F compilation flag to increase the stack size of your program. For MinGW, see this post.
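If you want to see the limit on your own machine, a tiny experiment along these lines (my own illustration, not from the thread) prints the recursion depth until the crash. Compile it without optimization, since an optimizer may turn the tail call into a loop that never overflows:

#include <stdio.h>

/* Recurse until the stack overflows, printing the current depth.
   The last number printed approximates the maximum recursion depth. */
static void probe(unsigned long depth)
{
    volatile char frame[32];          /* force a real stack frame per call */
    frame[0] = 0;
    fprintf(stderr, "%lu\n", depth);  /* stderr is unbuffered */
    probe(depth + 1);
}

int main(void)
{
    probe(0);
    return 0;
}

The depth it reaches can then be compared with the rough 200,000 estimate above.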
What happens here is a stack overflow. This happens because every call of the function creates a new stack frame, and if there are too many, the stack's memory runs out. You can fix it by using iteration instead of recursion.
Also, long might not be able to hold the numbers that the Collatz sequence produces for a starting value of 113383 (it didn't for me with MSVC). Use long long instead, which is at least 64 bits wide. All in all, it could look like this:
#include <stdio.h>

void collatz(long long n)
{
    long o;
    for (o = 0; n > 1; o++) {
        if (n % 2 == 0)
            n /= 2;
        else
            n = n * 3 + 1;
    }
    printf("%ld\t", o);
}

int main()
{
    collatz(113383);
    return 0;
}
Note how instead of recursion we now have a for loop.

Improvement in execution time by adding automatic variables

I was playing with the C language on my own and I was trying to write the fastest possible algorithm to find amicable numbers.
This is what I wrote (I've just started, so please don't suggest ways to improve the algorithm; I want to work it out on my own):
#include <stdio.h>
#include <time.h>

#define MAX (200000)

int dividersSum(int);
void amicable();

int main() {
    clock_t start = clock();
    amicable();
    double executionTime = ((double)clock() - start) / CLOCKS_PER_SEC;
    printf("\nEXECUTION TIME: %lf", executionTime);
    return 0;
}

int dividersSum(int n) {
    int i, sum;
    for (sum = 1, i = 2; i <= n / 2; i++) {
        if (!(n % i)) {
            sum += n / i;
        }
    }
    return sum;
}

void amicable() {
    int a, divSum, tot = 0;
    for (a = 1; a < MAX; a++) {
        divSum = dividersSum(a);
        if (divSum > a && dividersSum(divSum) == a) {
            printf("\n\t%d\t\t%d", a, dividersSum(a));
            tot++;
        }
    }
    printf("\n\nTOT: %d", tot);
}
Now, this works fine. Or, at least, probably not that fine since it took exactly 40 seconds to complete, but it works.
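As a sanity check of the logic (my own scaffolding around the question's dividersSum, not part of the original post): the classic smallest amicable pair is 220 and 284, and each should map to the other.

#include <stdio.h>

/* Same as the question's dividersSum: sum of the proper divisors of n. */
int dividersSum(int n) {
    int i, sum;
    for (sum = 1, i = 2; i <= n / 2; i++)
        if (!(n % i))
            sum += n / i;
    return sum;
}

int main(void) {
    /* Expected output: 284 220 */
    printf("%d %d\n", dividersSum(220), dividersSum(284));
    return 0;
}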
But if I change this line:
int i, sum;
Into this:
int i, sum, a = 4, b = 4, c = 4, d = 4, e = 4, f = 4;
It "drastically" improves. It takes 36 seconds to complete.
I get these execution times from the console timer. I know that isn't accurate at all (as soon as I get the chance to work on this algorithm again, I'll time it properly with the time.h library), but I have tried the two versions of the code over 50 times, and I always get 40 or more seconds for the "normal" version and 36 or less for the other one.
I've also tried changing the machine where I run the program, but the modified version always takes about 10% less time to execute.
Of course this makes no sense to me (I'm pretty new to programming, and Googling didn't help, partly because I don't really know what to look for). The only thing I can think of is a compiler optimization (I use the hated Dev-C++), but which optimization? And if that is the case, why doesn't the compiler apply the same optimization to the "normal" code, since it would make it faster?
Oh, if you're wondering why I declared random variables: I wanted to test whether using more variables causes a measurable slowdown. I now know it is a very naive way to test that, but as I said at the beginning of the post, I was "playing"...
Well, I asked my teacher at university. He ran the two versions on his machine and was fairly surprised at first (43 seconds for the "normal" one and 36 for the faster one).
Then he told me that he didn't know exactly why this happens, but he speculated that it comes from the way the compiler lays out the code. Probably the extra variables force the compiler to place the code in different pages of memory, and that is why it happens.
Of course he wasn't sure about his answer, but it seems plausible to me.
It's interesting how things like this can happen sometimes.
Moreover, as Brendan said in the comments section:
If the compiler isn't ignoring (or warning about) unused variables (even though it's a relatively trivial optimisation), then the compiler is poo (either bad in general or crippled by command line options), and the answer to your question should be "the compiler is poo" (e.g. fails to optimise the second version into exactly the same output code as the first version).
Of course, if someone has a better explanation, I would be happy to hear it!
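One way to put the two variants side by side yourself (my own harness, not from the thread; the function names are made up) is to time both versions of the divisor sum in a single program. With optimizations enabled, the two should run in the same time, because the dead locals get eliminated; without optimizations, a difference like the one described above should show up here too:

#include <stdio.h>
#include <time.h>

#define MAX 200000

/* Same algorithm as the question's dividersSum. */
static int divSumPlain(int n)
{
    int i, sum;
    for (sum = 1, i = 2; i <= n / 2; i++)
        if (!(n % i))
            sum += n / i;
    return sum;
}

/* Identical, plus the deliberately unused locals from the modified
   version; expect -Wunused-variable warnings. */
static int divSumPadded(int n)
{
    int i, sum, a = 4, b = 4, c = 4, d = 4, e = 4, f = 4;
    for (sum = 1, i = 2; i <= n / 2; i++)
        if (!(n % i))
            sum += n / i;
    return sum;
}

static void timeIt(const char *label, int (*fn)(int))
{
    clock_t start = clock();
    long long total = 0;
    for (int n = 1; n < MAX; n++)
        total += fn(n);   /* the checksum keeps the calls from being discarded */
    double secs = ((double)clock() - start) / CLOCKS_PER_SEC;
    printf("%s: %f s (checksum %lld)\n", label, secs, total);
}

int main(void)
{
    timeIt("plain ", divSumPlain);
    timeIt("padded", divSumPadded);
    return 0;
}

Expect each pass to take tens of seconds at MAX = 200000, just like the original program.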

Which code executes faster?

These are two pieces of code.
Code 1:
int d;
d = 0;
d = a + b;
print d + c + e;
Code 2:
print a + b + c + e;
I am learning C programming, and I have some doubts about this code: which version executes faster, and which uses less memory?
Given what you have posted,
Example 1
int d;
d = 0;
d = a + b;
/* print d+c+e; */
printf("%i\n", d + c + e);
Example 2
/* print a+b+c+e; */
printf("%i\n", a + b + c + e);
Which is faster is tricky. If your compiler optimizes d away in Example 1, the two are equivalent. On the other hand, if the compiler can't determine that the d = 0 store is discarded (and it may not), then it can't treat d as effectively const int d = a + b;, and the examples will not be equivalent, with Example 2 being (slightly) faster.
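For completeness, here is a compilable version of both fragments (the sample values and the surrounding main are my own scaffolding, since the question doesn't declare the variables):

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2, c = 3, e = 4;   /* sample values, not from the question */

    /* Example 1: goes through a named temporary. */
    int d;
    d = 0;          /* dead store; an optimizer will discard it */
    d = a + b;
    printf("%i\n", d + c + e);

    /* Example 2: a single expression, no temporary. */
    printf("%i\n", a + b + c + e);

    return 0;
}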

How to understand a recursive function call for large inputs

The result of the following code is 0,1,2,0, which I fully understand after writing out every call explicitly. But I wonder whether there is an easier way to understand what the recursive function does and to find the result faster. I mean, we can't write out every call if a=1000.
#include <stdio.h>

void fun(int);
typedef int (*pf)(int, int);
int proc(pf, int, int);

int main()
{
    int a = 3;
    fun(a);
    return 0;
}

void fun(int n)
{
    if (n > 0)
    {
        fun(--n);
        printf("%d,", n);
        fun(--n);
    }
}
Your question isn't "what does this do?" but "how do I understand recursive functions for large values?".
Recursion is a great tool for certain kinds of problems. If, for some reason, you ever had to print that sequence of numbers, the above code would be a good way to solve the problem. Recursion is also used in contexts where you have a recursive structure (like a tree or a list) or are dealing with recursive input, like parsers.
You might see the code for a recursive function and think "what does this do?" but it's more likely that the opposite will happen: you will find a problem that you need to solve by writing a program. With experience you will learn to see which problems require a recursive solution, and that's a skill you must develop as a programmer.
The principle of recursion is that you perform a [usually simple] function repeatedly. So to understand a recursive function you usually need to understand only one step, and how to repeat it.
With the above code you don't necessarily need to answer "what output does this code give?" but rather "how does it work, and what process does the code follow?". You can do both on paper, and by stepping through the algorithm you can usually gain insight into what it does. One factor that complicates this example is that it isn't tail-recursive, which means more work is needed to understand the program.
To 'understand' any program you don't necessarily need to be able to simulate it and calculate the output, and the same applies here.
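For contrast, here is what tail recursion looks like (my own example, not from the thread). In a tail-recursive function the recursive call is the last thing that happens, so each step can be read on its own and even mechanically replaced by a loop:

#include <stdio.h>

/* Tail-recursive countdown: nothing happens after the recursive call,
   so the whole call chain behaves like a simple loop. */
void countdown(int n)
{
    if (n <= 0)
        return;
    printf("%d,", n);
    countdown(n - 1);   /* tail call */
}

int main(void)
{
    countdown(5);       /* prints 5,4,3,2,1, */
    return 0;
}

In fun above, by contrast, work remains after the first recursive call (a printf and a second call), which is why tracing it takes more effort.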
All you need to do is add some debug statements to understand it a little better. This is the output of the print statements I added to trace through it:
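The instrumented code itself isn't shown here, but a reconstruction along these lines (my own; the message wording is taken from the trace) reproduces the output below exactly:

#include <stdio.h>

void fun(int n)
{
    printf("before if, n is = %d\n", n);
    if (n > 0)
    {
        printf("before fun call, n is = %d\n", n);
        fun(--n);
        printf("printing = %d\n", n);   /* replaces the original printf("%d,", n) */
        fun(--n);
        printf("after second fun call, n is = %d\n", n);
    }
}

int main(void)
{
    printf("start program\n");
    fun(3);
    printf("end program\n");
    return 0;
}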
start program
before if, n is = 3
before fun call, n is = 3
before if, n is = 2
before fun call, n is = 2
before if, n is = 1
before fun call, n is = 1
before if, n is = 0
printing = 0
before if, n is = -1
after second fun call, n is = -1
printing = 1
before if, n is = 0
after second fun call, n is = 0
printing = 2
before if, n is = 1
before fun call, n is = 1
before if, n is = 0
printing = 0
before if, n is = -1
after second fun call, n is = -1
after second fun call, n is = 1
end program

Efficiency of boolean comparisons? In C

I'm writing a loop in C, and I am just wondering how to optimize it a bit. It's not crucial here, as I'm just practicing, but for further knowledge I'd like to know:
In a loop, for example the following snippet:
int i = 0;
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
If it checks both, wouldn't:
int i = 0;
while (i != 10) {
    printf("%d\n", i);
    i++;
}
be more efficient?
Thanks!
Both will be translated into a single assembly instruction. Most CPUs have comparison instructions for LESS THAN, LESS THAN OR EQUAL, EQUAL, and NOT EQUAL.
One of the interesting things about these optimization questions is that they often show why you should code for clarity and correctness first, before worrying about the performance impact of such operations (which oh-so-often makes no difference at all).
Your two example loops do not have the same behavior:
int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
    printf("%d\n", i);
    i++;
}
And,
int i = 0;
/* this will print 10 lines (0..9) */
while (i != 10) {
    printf("%d\n", i);
    i++;
}
To answer your question, though: it is nearly certain that the performance of the two constructs would be identical (assuming you fixed the problem so the loop counts are the same). For example, if your processor could only check for equality and for less-than in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) test into an (i < 11) test, or maybe an (i != 11) test.
This is a clear example of premature optimization, which IMHO is something programmers new to their craft are way too prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.
Speaking to your specific questions: first, a <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career, and that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6, and if the target architecture required that only < be used, that is what the code generator would emit.
Your second thought, that while (i != 10) might be more efficient, raises an interesting issue of defensive programming. First, no, it isn't any more efficient on any reasonable target architecture. However, it creates the potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? With i < 10 the loop still terminates; with i != 10 the test may never be true again, so the loop could run effectively forever, with whatever other consequences the error brings.
Finally, when wondering about this kind of thing, it is often worthwhile to find out what code your compiler actually generates. Most compilers provide a mechanism to do this; for GCC, learn about the -S option, which makes it emit the assembly code directly instead of producing an object file.
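For instance (the file and function names here are my own, not from the thread), you can put each form of the condition into a tiny function and inspect what GCC emits:

/* conditions.c -- inspect with: gcc -O2 -S conditions.c */
int le10(int i) { return i <= 10; }
int lt11(int i) { return i < 11; }
int ne10(int i) { return i != 10; }

On a typical x86-64 build, each function compiles to a single compare followed by a single flag-reading instruction, and the first two usually produce identical assembly because the compiler canonicalizes comparisons against constants.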
The operators <= and < each translate to a single instruction in assembly; there should be no performance difference.
Note that testing against 0 can be a bit faster on some processors than testing against any other constant, so it can be reasonable to make a loop run backward (note that this also reverses the output order, so it only applies when the iteration order doesn't matter):
int i = 10;
while (i != 0)
{
    printf("%d\n", i);
    i--;
}
Note that micro-optimizations like these usually gain you only a very small amount of performance; your time is better spent on efficient algorithms.
Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?
Neither; it will most likely check (i < 11). The <= 10 is just there to give better meaning to your code, since 11 would be a magic number that really means (10+1).
It depends on the architecture and compiler. On most architectures there is a single instruction for <= (or for its opposite, which can be negated), so if the code is translated into a loop, the comparison will most likely be a single instruction. (On x86 or x86_64 it is one instruction.)
The compiler might also unroll the loop into a straight-line sequence of increments, and when only constant expressions are involved it can even optimize the ++ away and leave only constants.
And Ira is right: the cost of the comparison vanishes next to the printf, which may take millions of clock cycles to execute.
I'm writing a loop in C, and I am just wondering on how to optimize it a bit.
If you compile with optimizations turned on, the biggest optimization will come from unrolling that loop.
It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to measure actual differences between the compares. You should be careful when profiling test cases that use constants, as the compiler may optimize the code away entirely.
Disassemble and see. Depending on the processor, the optimization level, and a number of other things, this simple example actually unrolls or does things that do not reflect your real question. Compiling both of your example loops with gcc -O1 resulted in the same assembly (for ARM).
A less-than in your C code often turns into a branch-if-greater-than-or-equal to the far side of the loop. If your processor doesn't have a greater-than-or-equal, it may use a branch-if-greater-than and a branch-if-equal, two instructions.
Typically, though, there will be a register holding i, an instruction to increment it, and then an instruction to compare i with 10. Equal-to, greater-than-or-equal, and less-than are generally each a single instruction, so you should not normally see a difference.
// Case I
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
    printf("%d\n", i);
    i++;
}

// Case II
int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}
The Case I code takes more space but runs faster, and the Case II code takes less space but runs slower: it executes the loop test twice as often to produce the same ten lines of output. Case I is a manual loop unrolling, a classic space-time trade-off: you can often trade code size for speed, or vice versa, but rarely improve both at once. Apart from that, the two loops do the same thing.
