Optimize Radix Sort - c

For a programming assignment I had to create a radix sort algorithm that worked with floating point numbers in their binary versions. The end goal of the assignment was to sort 100 million floating point numbers in under 2 minutes.
The algorithm itself took me a while to get working but as of now it has been really efficient. I am able to sort the 100 million numbers in about 30 seconds. The important note is that the floats have to be converted to unsigned ints so that bit-wise operations can be applied to them. The code that I used to do that I saw in a video on Quake 3's Fast Inverse Square Root Algorithm. Here is the code that I wrote for the assignment.
void radixsort(float *array, unsigned int length, unsigned int bits) {
double sum = 0;
// Create buckets
float *bucketsA = (float *) malloc(length * sizeof(float));
float *bucketsB = (float *) malloc(length * sizeof(float));
unsigned int aIndex = 0, bIndex = 0;
for (int i = 0; i < bits; i++) {
// Sort array using digit position d as the key.
for (int j = 0; j < length; j++) {
if (i == 0) sum += array[j];
unsigned int conversion = * (unsigned int *) &array[j];
int positionBit = nthBit(conversion, i);
if (positionBit == 0) {
bucketsA[aIndex] = array[j];
aIndex++;
}
else {
bucketsB[bIndex] = array[j];
bIndex++;
}
}
// Combine and move sorted buckets into original array
if(i == bits - 1) {
reverseArray(bucketsB, bIndex);
memcpy(array, bucketsB, bIndex * sizeof(float));
memcpy(array + bIndex, bucketsA, aIndex * sizeof(float));
}
else {
memcpy(array, bucketsA, aIndex * sizeof(float));
memcpy(array + aIndex, bucketsB, bIndex * sizeof(float));
}
// Reset the memory of the buckets
memset(&bucketsA[0], 0, sizeof(float) * length);
memset(&bucketsB[0], 0, sizeof(float) * length);
// Reset bucket index
aIndex = bIndex = 0;
}
printf("Total: %f\n", sum);
// Free reserved memory
free(bucketsA);
free(bucketsB);
}
I have two questions relating to the following code:
The first is that when I commented out freeing the reserved memory at the bottom I was expecting a segmentation fault but nothing happened, why is this? I might be a little confused on how memory on the heap and stack works but I thought that if the memory was not freed even though it is local it should cause an error. Freeing the memory actually slowed down the program as well.
The second question is does anyone see any obvious way to make the program run faster? I am already way below the minimum requirement for time and most of my classmates' programs are sorting the 100 million numbers in 1:30+ so I have nothing to worry about but I saw in a post someone managed to get their radix sort to run through 100 million in about 15 seconds. In this implementation I started with LSB so I wonder if starting with MSB will make the program faster but I don't have much time to really test with finals coming up. Thank you for the help.
EDIT: Here is the code for the nthBit and reverseArray
int nthBit(int number, int n) {
return (number >> n) & 1;
}
void reverseArray(float *array, int size) {
for (int i = 0; i < (size / 2); i++) {
float swap = array[size - 1 - i];
array[size - 1 - i] = array[i];
array[i] = swap;
}
}
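On the speed question, one common approach is to sort a whole byte per pass instead of a single bit, which cuts the number of passes over the data from 32 to 4. The sketch below is only an illustration under stated assumptions: 32-bit IEEE-754 floats, no NaNs in the input, and enough memory for two temporary arrays. The names `radix_sort_floats`, `float_to_key` and `key_to_float` are made up for this example. Each float's bits are remapped so that plain unsigned comparison matches float order, and memcpy is used for the conversion instead of a pointer cast.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a float's bits to an unsigned key whose integer order matches
 * the float order: negatives are bit-inverted, positives get the
 * top bit set. Assumes 32-bit IEEE-754 floats and no NaNs. */
static uint32_t float_to_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* Inverse of float_to_key. */
static float key_to_float(uint32_t k)
{
    uint32_t u = (k & 0x80000000u) ? (k & 0x7FFFFFFFu) : ~k;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* LSD radix sort, 8 bits per pass. Returns 0 on success and -1 if
 * the temporary buffers cannot be allocated. */
int radix_sort_floats(float *array, size_t length)
{
    if (length == 0)
        return 0;

    uint32_t *src = malloc(length * sizeof *src);
    uint32_t *dst = malloc(length * sizeof *dst);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    for (size_t i = 0; i < length; i++)
        src[i] = float_to_key(array[i]);

    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        /* Histogram of the current byte. */
        for (size_t i = 0; i < length; i++)
            count[(src[i] >> shift) & 0xFFu]++;

        /* Turn counts into starting offsets. */
        size_t offset = 0;
        for (size_t b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        /* Stable scatter into the other buffer. */
        for (size_t i = 0; i < length; i++)
            dst[count[(src[i] >> shift) & 0xFFu]++] = src[i];

        uint32_t *tmp = src;
        src = dst;
        dst = tmp;
    }

    /* After the final swap the sorted keys are in src. */
    for (size_t i = 0; i < length; i++)
        array[i] = key_to_float(src[i]);

    free(src);
    free(dst);
    return 0;
}
```

Because the keys already order negatives before positives, no final reversal of a "negative" bucket is needed. Whether this beats the bit-at-a-time version on a given machine would have to be measured.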

Related

Nth Fibonacci using pointers in C; recursive and array

I have this code so far. It works and does what I want it to. I'm wondering if I could make it better. I do not really care for user input or any other "finish touches," just want to make the code more efficient and maybe more useful for future projects.
Excessive comments are for my personal use, I find it easier to read when I go back to old projects for references and what not.
Thanks!
#include<stdio.h>
#include<stdlib.h>
void fabonacci(int * fibArr,int numberOfSeries){
int n;
//allocate memory size
fibArr = malloc (sizeof(int) * numberOfSeries);
//first val, fib = 0
*fibArr = 0;//100
fibArr++;
//second val, fib = 1
*fibArr = 1;//104
fibArr++;
//printing first two fib values 0 and 1
printf("%i\n%i\n", *(fibArr- 2),*(fibArr- 1));
//loop for fib arr
for(n=0;n<numberOfSeries -2;n++,fibArr++){
//108 looking back at 104 looking back at 100
//112 looking back at 108 looking back at 104
*fibArr = *(fibArr-1) + *(fibArr -2);
//printing fib arr
printf("%i\n", *fibArr);
}
}
int main(){
//can implm user input if want
int n = 10;
int *fib;
//calling
fabonacci(fib,n);
}
Your code is halfway between two possible interpretations and I can't tell which one you meant. If you want fibonacci(n) to just give the nth number and not have any external side effects, you should write it as follows:
int fibonacci(int n) {
int lo, hi;
lo = 0;
hi = 1;
while(n-- > 0) {
int tmp = lo + hi;
lo = hi;
hi = tmp;
}
return lo;
}
You need no mallocs or frees because this takes constant, stack-allocated space.
If you want, instead, to store the entire sequence in memory as you compute it, you may as well require that the memory already be allocated, because this allows the caller to control where the numbers go.
// n < 0 => undefined behavior
// not enough space allocated for (n + 1) ints in res => undefined behavior
void fibonacci(int *res, int n) {
res[0] = 0;
if(n == 0) { return; }
res[1] = 1;
if(n == 1) { return; }
for(int i = 2; i <= n; i++) {
res[i] = res[i-1] + res[i-2];
}
}
It is now the caller's job to allocate memory:
int main(){
int fib[10]; // room for F_0 to F_9
fibonacci(fib, 9); // fill up to F_9
int n = ...; // some unknown number
int *fib2 = malloc(sizeof(int) * (n + 2)); // room for (n + 2) values
if(fib2 == NULL) { /* error handling */ }
fibonacci(fib2 + 1, n); // leave 1 space at the start for other purposes.
// e.g. you may want to store the length into the first element
fib2[0] = n + 1;
// this fibonacci is more flexible than before
// remember to free it
free(fib2);
}
And you can wrap this to allocate space itself while still leaving the more flexible version around:
int *fibonacci_alloc(int n) {
int *fib = malloc(sizeof(int) * (n + 1));
if(fib == NULL) { return NULL; }
fibonacci(fib, n);
return fib;
}
One way to improve the code is to let the caller create the array, and pass the array to the fibonacci function. That eliminates the need for fibonacci to allocate memory. Note that the caller can allocate/free if desired, or the caller can just declare an array.
The other improvement is to use array notation inside of the fibonacci function. You may be thinking that the pointer solution has better performance. It doesn't matter. The maximum value for n is 47 before you overflow a 32-bit int, so n is not nearly big enough for performance to be a consideration.
Finally, the fibonacci function should protect itself from bad values of n. For example, if n is 1, then the function should put a 0 in the first array entry, and not touch any other entries.
#include <stdio.h>
void fibonacci(int *array, int length)
{
if (length > 0)
array[0] = 0;
if (length > 1)
array[1] = 1;
for (int i = 2; i < length; i++)
array[i] = array[i-1] + array[i-2];
}
int main(void)
{
int fib[47];
int n = sizeof(fib) / sizeof(fib[0]);
fibonacci(fib, n);
for (int i = 0; i < n; i++)
printf("fib[%d] = %d\n", i, fib[i]);
}
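The overflow limit quoted above can be checked in code rather than memorized. The sketch below is an illustration only; `largest_fib_index_for_int` is a hypothetical helper name. It walks the sequence and stops just before the next sum would exceed INT_MAX, testing for overflow before adding so that no signed overflow ever happens.

```c
#include <limits.h>

/* Returns the largest index i such that F(i) fits in an int,
 * with F(0) = 0 and F(1) = 1. */
int largest_fib_index_for_int(void)
{
    int prev = 0;   /* F(index - 1) */
    int curr = 1;   /* F(index)     */
    int index = 1;

    /* prev + curr fits exactly when curr <= INT_MAX - prev. */
    while (curr <= INT_MAX - prev) {
        int next = prev + curr;
        prev = curr;
        curr = next;
        index++;
    }
    return index;
}
```

With a 32-bit int this returns 46, so F(0) through F(46) fit, which is exactly the 47 entries of the array declared in main above.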

kth smallest number - quicksort faster than quickselect

I have implemented the following quickselect algorithm to achieve O(n) complexity for median selection (more generally kth smallest number):
static size_t partition(struct point **points_ptr, size_t points_size, size_t pivot_idx)
{
const double pivot_value = points_ptr[pivot_idx]->distance;
/* Move pivot to the end. */
SWAP(points_ptr[pivot_idx], points_ptr[points_size - 1], struct point *);
/* Perform the element moving. */
size_t border_idx = 0;
for (size_t i = 0; i < points_size - 1; ++i) {
if (points_ptr[i]->distance < pivot_value) {
SWAP(points_ptr[border_idx], points_ptr[i], struct point *);
border_idx++;
}
}
/* Move pivot to act as a border element. */
SWAP(points_ptr[border_idx], points_ptr[points_size - 1], struct point *);
return border_idx;
}
static struct point * qselect(struct point **points_ptr, size_t points_size, size_t k)
{
const size_t pivot_idx = partition(points_ptr, points_size, rand() % points_size);
if (k == pivot_idx) { //k lies on the same place as a pivot
return points_ptr[pivot_idx];
} else if (k < pivot_idx) { //k lies on the left of the pivot
//points_ptr remains the same
points_size = pivot_idx;
//k remains the same
} else { //k lies on the right of the pivot
points_ptr += pivot_idx + 1;
points_size -= pivot_idx + 1;
k -= pivot_idx + 1;
}
return qselect(points_ptr, points_size, k);
}
Then I compared it with glibc's qsort(), which is O(n log n), and was surprised by qsort()'s superior performance. Here is the measurement code:
double wtime;
wtime = 0.0;
for (size_t i = 0; i < 1000; ++i) {
qsort(points_ptr, points_size, sizeof (*points_ptr), compar_rand);
wtime -= omp_get_wtime();
qsort(points_ptr, points_size, sizeof (*points_ptr), compar_distance);
wtime += omp_get_wtime();
}
printf("qsort took %f\n", wtime);
wtime = 0.0;
for (size_t i = 0; i < 1000; ++i) {
qsort(points_ptr, points_size, sizeof (*points_ptr), compar_rand);
wtime -= omp_get_wtime();
qselect(points_ptr, points_size, points_size / 2);
wtime += omp_get_wtime();
}
printf("qselect took %f\n", wtime);
with results similar to qsort took 0.280432, qselect took 8.516676 for an array of 10000 elements. Why is quicksort faster than quickselect?
The first obvious answer is: Maybe qsort does not implement quicksort.
It has been some time since I read the standard, but I don't think anything in it requires qsort() to be implemented as quicksort.
Second: Existing C standard libraries are often heavily optimized (e.g. using special assembly instructions where available). Combined with the complex performance characteristics of modern CPUs, this may very well lead to an O(n log n) algorithm (which quicksort, in the worst case, is not) being faster than an O(n) algorithm.
My guess would be that you are messing up the cache, something that valgrind / cachegrind should be able to tell you.
Thanks for your suggestions, guys. The problem with my implementation of quickselect was that it exhibits its worst-case complexity of O(n^2) for inputs that contain many repeated elements, which was my case. Glibc's qsort() (it uses mergesort by default) does not exhibit O(n^2) here.
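To make the quadratic behavior concrete, here is a small sketch that counts comparisons for a two-way partition on integers. It is a simplified stand-in for the code in the question, not the original: it works on plain int arrays and always picks the last element as the pivot (with all-equal input the choice of pivot makes no difference). The function names are made up for this example.

```c
#include <stddef.h>

/* Two-way (Lomuto) partition around the last element.
 * Increments *comparisons once per element comparison. */
static size_t partition_two_way(int *a, size_t n,
                                unsigned long long *comparisons)
{
    const int pivot = a[n - 1];
    size_t border = 0;

    for (size_t i = 0; i + 1 < n; i++) {
        (*comparisons)++;
        if (a[i] < pivot) {
            int t = a[border];
            a[border] = a[i];
            a[i] = t;
            border++;
        }
    }

    /* Move the pivot to its final position. */
    int t = a[border];
    a[border] = a[n - 1];
    a[n - 1] = t;
    return border;
}

/* Runs quickselect for index k (requires k < n) and returns the
 * number of element comparisons it performed. */
unsigned long long quickselect_comparisons(int *a, size_t n, size_t k)
{
    unsigned long long comparisons = 0;

    for (;;) {
        const size_t p = partition_two_way(a, n, &comparisons);
        if (k == p)
            return comparisons;
        if (k < p) {
            n = p;
        } else {
            a += p + 1;
            n -= p + 1;
            k -= p + 1;
        }
    }
}
```

When every element is equal, nothing is ever smaller than the pivot, so each round strips off just one element. For 1000 equal values and k = 500 this performs 375249 comparisons, roughly 3n^2/8, and the count grows quadratically with n.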
I have modified my partition() function to perform a basic 3-way partitioning and the median-of-three which works nicely for quickselect:
/** \brief Quicksort's partition procedure.
*
* In linear time, partition a list into three parts: less than, greater than
* and equal to the pivot, for example input 3 2 7 4 5 1 4 1 will be
* partitioned into 3 2 1 1 | 5 7 | 4 4 where 4 is the pivot.
* A modified version of the median-of-three strategy is implemented; it ends with
* the median at the end of the array (this saves us one or two swaps).
*/
static void partition(struct point **points_ptr, size_t points_size,
size_t *less_size, size_t *equal_size)
{
/* Modified median-of-three and pivot selection. */
struct point **first_ptr = points_ptr;
struct point **middle_ptr = points_ptr + (points_size / 2);
struct point **last_ptr = points_ptr + (points_size - 1);
if ((*first_ptr)->distance > (*last_ptr)->distance) {
SWAP(*first_ptr, *last_ptr, struct point *);
}
if ((*first_ptr)->distance > (*middle_ptr)->distance) {
SWAP(*first_ptr, *middle_ptr, struct point *);
}
if ((*last_ptr)->distance > (*middle_ptr)->distance) { //reversed
SWAP(*last_ptr, *middle_ptr, struct point *);
}
const double pivot_value = (*last_ptr)->distance;
/* Element swapping. */
size_t greater_idx = 0;
size_t equal_idx = points_size - 1;
size_t i = 0;
while (i < equal_idx) {
const double elem_value = points_ptr[i]->distance;
if (elem_value < pivot_value) {
SWAP(points_ptr[greater_idx], points_ptr[i], struct point *);
greater_idx++;
i++;
} else if (elem_value == pivot_value) {
equal_idx--;
SWAP(points_ptr[i], points_ptr[equal_idx], struct point *);
} else { //elem_value > pivot_value
i++;
}
}
*less_size = greater_idx;
*equal_size = points_size - equal_idx;
}
/** A selection algorithm to find the kth smallest element in an unordered list.
*/
static struct point * qselect(struct point **points_ptr, size_t points_size,
size_t k)
{
size_t less_size;
size_t equal_size;
partition(points_ptr, points_size, &less_size, &equal_size);
if (k < less_size) { //k lies in the less-than-pivot partition
points_size = less_size;
} else if (k < less_size + equal_size) { //k lies in the equals-to-pivot partition
return points_ptr[points_size - 1];
} else { //k lies in the greater-than-pivot partition
points_ptr += less_size;
points_size -= less_size + equal_size;
k -= less_size + equal_size;
}
return qselect(points_ptr, points_size, k);
}
Results are indeed linear and better than qsort() (I have used the Fisher-Yates shuffle as @IVlad suggested, so the absolute qsort() times are worse):
array size   qsort      qselect    speedup
1000         0.044678   0.008671   5.152328
5000         0.248413   0.045899   5.412160
10000        0.551095   0.096064   5.736730
20000        1.134857   0.191933   5.912773
30000        2.169177   0.278726   7.782467

How to implement summation using parallel reduction in OpenCL?

I'm trying to implement a kernel which does parallel reduction. The code below works only some of the time, and I have not been able to pin down why it goes wrong when it does.
__kernel void summation(__global float* input, __global float* partialSum, __local float *localSum){
int local_id = get_local_id(0);
int workgroup_size = get_local_size(0);
localSum[local_id] = input[get_global_id(0)];
for(int step = workgroup_size/2; step>0; step/=2){
barrier(CLK_LOCAL_MEM_FENCE);
if(local_id < step){
localSum[local_id] += localSum[local_id + step];
}
}
if(local_id == 0){
partialSum[get_group_id(0)] = localSum[0];
}}
Essentially I'm summing the values per work group and storing each work group's total into partialSum; the final summation is done on the host. Below is the code which sets up the values for the summation.
size_t global[1];
size_t local[1];
const int DATA_SIZE = 15000;
float *input = NULL;
float *partialSum = NULL;
int count = DATA_SIZE;
local[0] = 2;
global[0] = count;
input = (float *)malloc(count * sizeof(float));
partialSum = (float *)malloc(global[0]/local[0] * sizeof(float));
int i;
for (i = 0; i < count; i++){
input[i] = (float)i+1;
}
I'm thinking it has something to do when the size of the input is not a power of two? I noticed it begins to go off for numbers around 8000 and beyond. Any assistance is welcome. Thanks.
"I'm thinking it has something to do when the size of the input is not a power of two?"
Yes. Consider what happens when you try to reduce, say, 9 elements. Suppose you launch 1 work-group of 9 work-items:
for (int step = workgroup_size / 2; step > 0; step /= 2){
// At iteration 0: step = 9 / 2 = 4
barrier(CLK_LOCAL_MEM_FENCE);
if (local_id < step) {
// Branch taken by threads 0 to 3
// Only 8 numbers added up together!
localSum[local_id] += localSum[local_id + step];
}
}
You're never summing the 9th element, hence the reduction is incorrect. An easy solution is to pad the input data with enough zeroes to make the work-group size the immediate next power-of-two.
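On the host side, the padding suggested above could look roughly like the sketch below. This is only an illustration: `next_pow2` and `pad_to_pow2` are hypothetical helper names, and the sketch pads the whole input so that its element count is a power of two, which any smaller power-of-two work-group size then divides evenly. It relies on all-bits-zero representing 0.0f, which holds for IEEE-754 floats.

```c
#include <stdlib.h>
#include <string.h>

/* Smallest power of two that is >= n (returns 1 for n == 0).
 * Assumes the result fits in size_t. */
size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Returns a newly allocated copy of input, zero-padded up to the
 * next power of two, and stores the padded length in *padded_count.
 * Returns NULL on allocation failure. The caller frees the result. */
float *pad_to_pow2(const float *input, size_t count, size_t *padded_count)
{
    const size_t padded = next_pow2(count);
    float *out = calloc(padded, sizeof *out);  /* zero-filled */

    if (out == NULL)
        return NULL;
    memcpy(out, input, count * sizeof *out);
    *padded_count = padded;
    return out;
}
```

The extra zeroes do not change the sum, so the kernel can run unchanged on the padded buffer with the padded length as the global size.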

Finding cyclic single transposition vector in C

I have the input as array A = [ 2,3,4,1]
The output is simply all possible permutations of the elements in A that can be reached by a single transposition (a single swap of two neighbouring elements). So the output is:
[3,2,4,1],[ 2,4,3,1],[2,3,1,4],[1,3,4,2]
Circular transpositioning is allowed. Hence [2,3,4,1] ==> [1,3,4,2] is allowed and a valid output.
How to do it in C?
EDIT
In python, it would be done as follows:
def Transpose(alist):
    leveloutput = []
    n = len(alist)
    for i in range(n):
        x=alist[:]
        x[i],x[(i+1)%n] = x[(i+1)%n],x[i]
        leveloutput.append(x)
    return leveloutput
This solution uses dynamic memory allocation, this way you can do it for an array of size size.
int *swapvalues(const int *const array, size_t size, int left, int right)
{
int *output;
int stored;
output = malloc(size * sizeof(int));
if (output == NULL) /* check for success */
return NULL;
/* copy the original values into the new array */
memcpy(output, array, size * sizeof(int));
/* swap the requested values */
stored = output[left];
output[left] = output[right];
output[right] = stored;
return output;
}
int **transpose(const int *const array, size_t size)
{
int **output;
int i;
int j;
/* generate a swapped copy of the array. */
output = malloc(size * sizeof(int *));
if (output == NULL) /* check success */
return NULL;
j = 0;
for (i = 0 ; i < size - 1 ; ++i)
{
/* allocate space for `size` ints */
output[i] = swapvalues(array, size, j, 1 + j);
if (output[i] == NULL)
goto cleanup;
/* in the next iteration swap the next two values */
j += 1;
}
/* do the same to the first and last element now */
output[i] = swapvalues(array, size, 0, size - 1);
if (output[i] == NULL)
goto cleanup;
return output;
cleanup: /* some malloc call returned NULL, clean up and exit. */
if (output == NULL)
return NULL;
for (j = i ; j >= 0 ; j--)
free(output[j]);
free(output);
return NULL;
}
int main()
{
int array[4] = {2, 3, 4, 1};
int i;
int **permutations = transpose(array, sizeof(array) / sizeof(array[0]));
if (permutations != NULL)
{
for (i = 0 ; i < 4 ; ++i)
{
int j;
fprintf(stderr, "[ ");
for (j = 0 ; j < 4 ; ++j)
{
fprintf(stderr, "%d ", permutations[i][j]);
}
fprintf(stderr, "] ");
free(permutations[i]);
}
fprintf(stderr, "\n");
}
free(permutations);
return 0;
}
Although some people think goto is evil, this is a very nice use for it. Don't use it to control the flow of your program (for instance to create a loop); that is confusing. But for the exit point of a function that has to do several things before returning, I think it's actually a good fit. That is my opinion: for me it makes the code easier to understand, though I might be wrong.
Have a look at this code I have written with an example :
void transpose() {
int arr[] = {3, 5, 8, 1};
int l = sizeof (arr) / sizeof (arr[0]);
int i, j, k;
for (i = 0; i < l; i++) {
j = (i + 1) % l;
int copy[l];
for (k = 0; k < l; k++)
copy[k] = arr[k];
int t = copy[i];
copy[i] = copy[j];
copy[j] = t;
printf("{%d, %d, %d, %d}\n", copy[0], copy[1], copy[2], copy[3]);
}
}
Sample Output :
{5, 3, 8, 1}
{3, 8, 5, 1}
{3, 5, 1, 8}
{1, 5, 8, 3}
A few notes:
a single memory block is preferred to, say, an array of pointers because of better locality and less heap fragmentation;
the cyclic transposition is only one, it can be done separately, thus avoiding the overhead of the modulo operator in each iteration.
Here's the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int *single_transposition(const int *a, unsigned int n) {
// Output size is known, can use a single allocation
int *out = malloc(n * n * sizeof(int));
// Perform the non-cyclic transpositions
int *dst = out;
for (int i = 0; i < n - 1; ++i) {
memcpy(dst, a, n * sizeof (int));
int t = dst[i];
dst[i] = dst[i + 1];
dst[i + 1] = t;
dst += n;
}
// Perform the cyclic transposition, no need to impose the overhead
// of the modulo operation in each of the above iterations.
memcpy(dst, a, n * sizeof (int));
int t = dst[0];
dst[0] = dst[n-1];
dst[n-1] = t;
return out;
}
int main() {
int a[] = { 2, 3, 4, 1 };
const unsigned int n = sizeof a / sizeof a[0];
int *b = single_transposition(a, n);
for (int i = 0; i < n * n; ++i)
printf("%d%c", b[i], (i % n) == n - 1 ? '\n' : ' ');
free(b);
}
There are many ways to tackle this problem, and the most important questions are how you're going to consume the output and how variable the size of the array is. You've already said the array is going to be very large, therefore I assume memory, not CPU, will be the biggest bottleneck here.
If the output is going to be used only a few times (especially just once), it may be best to use a functional approach: generate every transposition on the fly, and never have more than one in memory at a time. For this approach many high-level languages would work as well as (maybe sometimes even better than) C.
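A minimal C sketch of the on-the-fly idea, assuming the consumer can be expressed as a callback: the swap is applied in place, the callback sees the transposed array, and the swap is undone before the next one. The names `visit_fn`, `for_each_transposition`, `first_counter` and `count_first` are invented for this illustration.

```c
#include <stddef.h>

/* Called once per transposition with the array in its swapped state. */
typedef void (*visit_fn)(const int *array, size_t n, void *ctx);

/* Applies each neighbouring swap (including the cyclic one between
 * the last and first element) in place, calls visit, then undoes
 * the swap. No extra memory is allocated. */
void for_each_transposition(int *a, size_t n, visit_fn visit, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        const size_t j = (i + 1 == n) ? 0 : i + 1;  /* wrap around */

        int t = a[i];           /* apply the swap */
        a[i] = a[j];
        a[j] = t;

        visit(a, n, ctx);

        t = a[i];               /* undo the swap */
        a[i] = a[j];
        a[j] = t;
    }
}

/* Example visitor: counts arrangements that start with a given value. */
struct first_counter {
    int value;
    size_t hits;
};

static void count_first(const int *array, size_t n, void *ctx)
{
    struct first_counter *counter = ctx;
    if (n > 0 && array[0] == counter->value)
        counter->hits++;
}
```

For the array {2, 3, 4, 1} and a counter looking for 2, the visitor fires four times and records two hits ([2,4,3,1] and [2,3,1,4]); the input array is left unchanged afterwards.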
If the size of the array is fixed, or semi-fixed (e.g. a few sizes known at compile time), you can define structures, ideally using C++ templates.
If the size is dynamic and you still want to have every transposition in memory, then you should allocate one huge memory block and treat it as a contiguous array of arrays. This is very simple and straightforward at the machine level. Unfortunately it's best tackled using pointer arithmetic, a feature of C/C++ that is renowned for being difficult to understand. (It isn't if you learn C from the basics, but people coming down from high-level languages have a proven track record of getting it completely wrong the first time.)
The other approach is to have a big array of pointers to smaller arrays, which results in a double pointer, the ** which is even more terrifying to newcomers.
Sorry for the long post which is not a real answer, but IMHO there are too many questions left open to choose the best solution, and I feel you need a bit more basic C knowledge to manage them on your own.
/edit:
As other solutions are already posted, here's a solution with a minimal memory footprint. This is the most limiting approach: it uses the same buffer over and over, and you must be sure that your code is finished with one transposition before moving on to the next. On the bright side, it'll still work just fine when the other solutions would require a terabyte of memory. It's also so undemanding that it might as well be implemented in a high-level language. I insisted on using C++ in case you would like to have more than one matrix at a time (e.g. comparing them or running several threads concurrently).
#include <cstdio>
#include <cstring>
#define NO_TRANSPOSITION -1
class Transposable1dMatrix
{
private:
int * m_pMatrix;
int m_iMatrixSize;
int m_iCurrTransposition;
//transposition N means that elements N and N+1 are swapped
//transposition -1 means no transposition
//transposition (size-1) means cyclic transposition
//as usual in C (size-1) is the last valid index
public:
Transposable1dMatrix(int MatrixSize)
{
m_iMatrixSize = MatrixSize;
m_pMatrix = new int[m_iMatrixSize];
m_iCurrTransposition = NO_TRANSPOSITION;
}
int* GetCurrentMatrix()
{
return m_pMatrix;
}
bool IsTransposed()
{
return m_iCurrTransposition != NO_TRANSPOSITION;
}
void ReturnToOriginal()
{
if(!IsTransposed())//already in original state, nothing to do here
return;
//apply same transpostion again to go back to original
TransposeInternal(m_iCurrTransposition);
m_iCurrTransposition = NO_TRANSPOSITION;
}
void TransposeTo(int TranspositionIndex)
{
if(IsTransposed())
ReturnToOriginal();
TransposeInternal(TranspositionIndex);
m_iCurrTransposition = TranspositionIndex;
}
private:
void TransposeInternal(int TranspositionIndex)
{
int Swap1 = TranspositionIndex;
int Swap2 = TranspositionIndex+1;
if(Swap2 == m_iMatrixSize)
Swap2 = 0;//this is the cyclic one
int tmp = m_pMatrix[Swap1];
m_pMatrix[Swap1] = m_pMatrix[Swap2];
m_pMatrix[Swap2] = tmp;
}
};
int main()
{
int arr[] = {2, 3, 4, 1};
int size = 4;
//allocate
Transposable1dMatrix* test = new Transposable1dMatrix(size);
//fill data
memcpy(test->GetCurrentMatrix(), arr, size * sizeof (int));
//run test
for(int x = 0; x<size;x++)
{
test->TransposeTo(x);
int* copy = test->GetCurrentMatrix();
printf("{%d, %d, %d, %d}\n", copy[0], copy[1], copy[2], copy[3]);
}
}

How to generate Fibonacci faster [duplicate]

I am a CSE student preparing for a programming contest. Now I am working on the Fibonacci series. I have an input file of a few kilobytes containing positive integers. The input format looks like
3 5 6 7 8 0
A zero means the end of file. The output should look like
2
5
8
13
21
My code is
#include<stdio.h>
int fibonacci(int n) {
if (n==1 || n==2)
return 1;
else
return fibonacci(n-1) +fibonacci(n-2);
}
int main() {
int z;
FILE * fp;
fp = fopen ("input.txt","r");
while(fscanf(fp,"%d", &z) && z)
printf("%d \n",fibonacci(z));
return 0;
}
The code works fine for the sample input and gives accurate results, but for my real input set it takes longer than my time limit. Can anyone help me out?
You could simply use a tail recursion version of a function that returns the two last fibonacci numbers if you have a limit on the memory.
int fib(int n)
{
int a = 0;
int b = 1;
while (n-- > 1) {
int t = a;
a = b;
b += t;
}
return b;
}
This is O(n) and needs constant space.
You should probably look into memoization.
http://en.wikipedia.org/wiki/Memoization
It has an explanation and a fib example right there
You can do this by matrix multiplication, raising the matrix to the power n and then multiplying it by a vector. You can raise it to that power in logarithmic time.
I think you can find the problem here. It's in Romanian but you can translate it with Google Translate. It's exactly what you want, and the solution is listed there.
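The matrix idea can be sketched in C as follows. This is an illustration rather than the linked solution: the type `mat2` and the functions `mat2_mul` and `fib_matrix` are names chosen for this example, and results are only exact while F(n) fits in 64 bits (n <= 93). It uses exponentiation by squaring on the matrix [[1,1],[1,0]], whose n-th power is [[F(n+1), F(n)], [F(n), F(n-1)]], so the answer can be read straight from the matrix without a separate vector multiplication.

```c
#include <stdint.h>

/* 2x2 matrix [[a, b], [c, d]]. */
typedef struct {
    uint64_t a, b, c, d;
} mat2;

static mat2 mat2_mul(mat2 x, mat2 y)
{
    mat2 r = {
        x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
        x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d
    };
    return r;
}

/* Returns F(n) with F(0) = 0, F(1) = 1, using O(log n) matrix
 * multiplications. Exact for n <= 93; larger n wraps modulo 2^64. */
uint64_t fib_matrix(unsigned n)
{
    mat2 result = {1, 0, 0, 1};  /* identity */
    mat2 base = {1, 1, 1, 0};

    while (n > 0) {
        if (n & 1u)
            result = mat2_mul(result, base);
        base = mat2_mul(base, base);
        n >>= 1;
    }
    return result.b;  /* top-right entry of [[1,1],[1,0]]^n is F(n) */
}
```

In a contest that asks for the result modulo some number, the same structure applies with a reduction after every multiplication and addition.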
Your algorithm is recursive, and approximately has O(2^N) complexity.
This issue has been discussed on stackoverflow before:
Computational complexity of Fibonacci Sequence
There is also a faster implementation posted in that particular discussion.
Look in Wikipedia, there is a formula that gives the number in the Fibonacci sequence with no recursion at all
Use memoization. That is, you cache the answers to avoid unnecessary recursive calls.
Here's a code example:
#include <stdio.h>
int memo[10000]; // adjust to however big you need, but the result must fit in an int
// and keep in mind that fibonacci values grow rapidly :)
int fibonacci(int n) {
if (memo[n] != -1)
return memo[n];
if (n==1 || n==2)
return 1;
else
return memo[n] = fibonacci(n-1) +fibonacci(n-2);
}
int main() {
for(int i = 0; i < 10000; ++i)
memo[i] = -1;
fibonacci(50);
}
Nobody mentioned the 2 value stack array version, so I'll just do it for completeness.
#include <stdint.h>
// do not call with i == 0
uint64_t Fibonacci(uint64_t i)
{
// we'll only use two values on stack,
// initialized with F(1) and F(2)
uint64_t a[2] = {1, 1};
// We do not enter loop if initial i was 1 or 2
while (i-- > 2)
// A bitwise AND allows switching the storing of the new value
// from index 0 to index 1.
a[i & 1] = a[0] + a[1];
// the last store happens when i == 2 (after the decrement),
// so the return value is always in a[2 & 1] => a[0].
return a[0];
}
This is an O(n), constant stack space solution that performs about the same as memoization when compiled with optimization.
// Calc of fibonacci f(99), gcc -O2
Benchmark            Time(ns)   CPU(ns)   Iterations
BM_2stack/99                2         2    416666667
BM_memoization/99           2         2    318181818
The BM_memoization used here will initialize the array only once and reuse it for every other call.
The 2 value stack array version performs identically to a version with a temporary variable when optimized.
You can also use the fast doubling method of generating Fibonacci series
Link: fastest-way-to-compute-fibonacci-number
It is actually derived from the results of the matrix exponentiation method.
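For reference, a short C sketch of fast doubling, using the identities F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. The function names `fib_pair` and `fib_fast_doubling` are chosen for this example, and the result is only exact while F(n) fits in 64 bits (n <= 93).

```c
#include <stdint.h>

/* Computes F(n) into *fn and F(n+1) into *fn1, with F(0) = 0 and
 * F(1) = 1. Recursion depth is about log2(n). */
static void fib_pair(unsigned n, uint64_t *fn, uint64_t *fn1)
{
    if (n == 0) {
        *fn = 0;
        *fn1 = 1;
        return;
    }

    uint64_t a, b;                    /* a = F(k), b = F(k+1), k = n / 2 */
    fib_pair(n / 2, &a, &b);

    uint64_t c = a * (2 * b - a);     /* F(2k)   */
    uint64_t d = a * a + b * b;       /* F(2k+1) */

    if (n % 2 == 0) {
        *fn = c;
        *fn1 = d;
    } else {
        *fn = d;
        *fn1 = c + d;                 /* F(2k+2) */
    }
}

/* Returns F(n). Exact for n <= 93; larger n wraps modulo 2^64. */
uint64_t fib_fast_doubling(unsigned n)
{
    uint64_t f, f1;
    fib_pair(n, &f, &f1);
    return f;
}
```

Only one recursive call is made per level, which is what keeps the number of steps logarithmic in n.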
Use the golden-ratio
Build an array Answer[100] in which you cache the results of fibonacci(n).
Check in your fibonacci code to see if you have precomputed the answer, and
use that result. The results will astonish you.
Are you guaranteed that, as in your example, the input will be given to you in ascending order? If so, you don't even need memoization; just keep track of the last two results, start generating the sequence but only display the Nth number in the sequence if N is the next index in your input. Stop when you hit index 0.
Something like this:
int i = 0;
while ( true ) {
i++; //increment index
fib_at_i = generate_next_fib()
while ( next_input_index() == i ) {
println fib_at_i
}
}
I leave exit conditions and actually generating the sequence to you.
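One possible way to fill in those details in C, as a sketch: it assumes the queries have already been read into an array, are in non-decreasing order, are all at least 1, and are small enough for the results to fit in 64 bits (at most 93). The function name `fib_for_sorted_queries` is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* For each queries[q] (non-decreasing, each >= 1), stores
 * F(queries[q]) in results[q], with F(1) = F(2) = 1. The sequence
 * is generated only once, in a single forward pass. */
void fib_for_sorted_queries(const unsigned *queries, size_t count,
                            uint64_t *results)
{
    uint64_t prev = 0;  /* F(index - 1) */
    uint64_t curr = 1;  /* F(index)     */
    unsigned index = 1;

    for (size_t q = 0; q < count; q++) {
        /* Advance the sequence until it reaches the requested index. */
        while (index < queries[q]) {
            uint64_t next = prev + curr;
            prev = curr;
            curr = next;
            index++;
        }
        results[q] = curr;
    }
}
```

For the sample input 3 5 6 7 8 this produces 2 5 8 13 21, matching the expected output in the question.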
In C#:
static int fib(int n)
{
if (n < 2) return n;
if (n == 2) return 1;
int k = n / 2;
int a = fib(k + 1);
int b = fib(k);
if (n % 2 == 1)
return a * a + b * b;
else
return b * (2 * a - b);
}
Matrix multiplication, no float arithmetic, O(log N) time complexity assuming integer multiplication/addition is done in constant time.
Here goes python code
def fib(n):
    x,y = 1,1
    mat = [1,1,1,0]
    n -= 1
    while n>0:
        if n&1==1:
            x,y = x*mat[0]+y*mat[1], x*mat[2]+y*mat[3]
        n >>= 1
        mat[0], mat[1], mat[2], mat[3] = mat[0]*mat[0]+mat[1]*mat[2], mat[0]*mat[1]+mat[1]*mat[3], mat[0]*mat[2]+mat[2]*mat[3], mat[1]*mat[2]+mat[3]*mat[3]
    return x
You can reduce the overhead of the if statement: Calculating Fibonacci Numbers Recursively in C
First of all, you can use memoization or an iterative implementation of the same algorithm.
Consider the number of recursive calls your algorithm makes:
fibonacci(n) calls fibonacci(n-1) and fibonacci(n-2)
fibonacci(n-1) calls fibonacci(n-2) and fibonacci(n-3)
fibonacci(n-2) calls fibonacci(n-3) and fibonacci(n-4)
Notice a pattern? You are computing the same function a lot more times than needed.
An iterative implementation would use an array:
int fibonacci(int n) {
int arr[maxSize + 1];
arr[1] = arr[2] = 1; // ideally you would use 0-indexing, but I'm just trying to get a point across
for ( int i = 3; i <= n; ++i )
arr[i] = arr[i - 1] + arr[i - 2];
return arr[n];
}
This is already much faster than your approach. You can do it faster on the same principle by only building the array once up until the maximum value of n, then just print the correct number in a single operation by printing an element of your array. This way you don't call the function for every query.
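A sketch of that build-once, look-up-many idea, assuming results are kept in 64-bit unsigned integers so the table can hold F(0) through F(93). The names `FIB_TABLE_SIZE`, `fib_table`, `fib_table_init` and `fib_lookup` are chosen for this illustration.

```c
#include <stdint.h>

/* F(93) is the largest Fibonacci number that fits in 64 bits,
 * so the table holds F(0) through F(93). */
enum { FIB_TABLE_SIZE = 94 };

static uint64_t fib_table[FIB_TABLE_SIZE];

/* Fills the table once; call this before any lookup. */
void fib_table_init(void)
{
    fib_table[0] = 0;
    fib_table[1] = 1;
    for (int i = 2; i < FIB_TABLE_SIZE; i++)
        fib_table[i] = fib_table[i - 1] + fib_table[i - 2];
}

/* Answers a query with a single array read.
 * Requires n < FIB_TABLE_SIZE and a prior call to fib_table_init(). */
uint64_t fib_lookup(unsigned n)
{
    return fib_table[n];
}
```

Each query then costs one array access, however many numbers the input file contains.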
If you can't afford the initial precomputation time (but this usually only happens if you're asked for the result modulo something, otherwise they probably don't expect you to implement big number arithmetic and precomputation is the best solution), read the fibonacci wiki page for other methods. Focus on the matrix approach, that one is very good to know in a contest.
#include<stdio.h>
int g(int n,int x,int y)
{
return n==0 ? x : g(n-1,y,x+y);}
int f(int n)
{
return g(n,0,1);}
int main (void)
{
int i;
for(i=1; i<=10 ; i++)
printf("%d\n",f(i));
return 0;
}
In functional programming there is a special algorithm for computing Fibonacci numbers. The algorithm uses accumulative recursion, which is used to minimize the stack size needed by an algorithm. I think it will help you minimize the time. You can try it if you want.
int ackFib (int n, int m, int count){
if (count == 0)
return m;
else
return ackFib(n+m, n, count-1);
}
int fib(int n)
{
return ackFib (0, 1, n+1);
}
Use any of these: a for loop in O(n) time, the golden-ratio formula in O(1) time, and two recursive versions:
private static long fibonacciWithLoop(int input) {
long prev = 0, curr = 1, next = 0;
for(int i = 1; i < input; i++){
next = curr + prev;
prev = curr;
curr = next;
}
return curr;
}
public static long fibonacciGoldenRatio(int input) {
double termA = Math.pow(((1 + Math.sqrt(5))/2), input);
double termB = Math.pow(((1 - Math.sqrt(5))/2), input);
double factor = 1/Math.sqrt(5);
return Math.round(factor * (termA - termB));
}
public static long fibonacciRecursive(int input) {
if (input <= 1) return input;
return fibonacciRecursive(input - 1) + fibonacciRecursive(input - 2);
}
public static long fibonacciRecursiveImproved(int input) {
if (input == 0) return 0;
if (input == 1) return 1;
if (input == 2) return 1;
if (input >= 93) throw new RuntimeException("Input out of bounds");
// n is odd
if (input % 2 != 0) {
long a = fibonacciRecursiveImproved((input+1)/2);
long b = fibonacciRecursiveImproved((input-1)/2);
return a*a + b*b;
}
// n is even
long a = fibonacciRecursiveImproved(input/2 + 1);
long b = fibonacciRecursiveImproved(input/2 - 1);
return a*a - b*b;
}
using namespace std;
void mult(LL A[ 3 ][ 3 ], LL B[ 3 ][ 3 ]) {
int i,
j,
z;
LL C[ 3 ][ 3 ];
memset(C, 0, sizeof( C ));
for(i = 1; i <= N; i++)
for(j = 1; j <= N; j++) {
for(z = 1; z <= N; z++)
C[ i ][ j ] = (C[ i ][ j ] + A[ i ][ z ] * B[ z ][ j ] % mod ) % mod;
}
memcpy(A, C, sizeof(C));
};
void readAndsolve() {
int i;
LL k;
ifstream I(FIN);
ofstream O(FOUT);
I>>k;
LL A[3][3];
LL B[3][3];
A[1][1] = 1; A[1][2] = 0;
A[2][1] = 0; A[2][2] = 1;
B[1][1] = 0; B[1][2] = 1;
B[2][1] = 1; B[2][2] = 1;
for(i = 0; ((1<<i) <= k); i++) {
if( k & (1<<i) ) mult(A, B);
mult(B, B);
}
O<<A[2][1];
}
//1,1,2,3,5,8,13,21,34,...
int main() {
readAndsolve();
return(0);
}
public static int GetNthFibonacci(int n)
{
var previous = -1;
var current = 1;
int element = 0;
while (1 <= n--)
{
element = previous + current;
previous = current;
current = element;
}
return element;
}
This is similar to answers given before, but with some modifications. Memoization, as stated in other answers, is another way to do this, but I dislike code that doesn't scale as technology changes (the size of an unsigned int varies depending on the platform, so the highest value in the sequence that can be reached may also vary), and memoization is ugly in my opinion.
#include <iostream>
using namespace std;
void fibonacci(unsigned int count) {
unsigned int x=0,y=1,z=0;
while(count--!=0) {
cout << x << endl; // you can put x in an array or whatever
z = x;
x = y;
y += z;
}
}
int main() {
fibonacci(48);// 48 values in the sequence is the maximum for a 32-bit unsigned int
return 0;
}
Additionally, if you use <limits> it's possible to write a compile-time constant expression that would give you the largest index within the sequence that can be reached for any integral data type.
#include<stdio.h>
main()
{
int a,b=2,c=5,d;
printf("%d %d ", b, c);
do
{
d=b+c;
b=c;
c=d;
printf("%d ", d);
}

Resources