I'm learning about parallelization, and in one exercise I'm given a couple of algorithms whose performance I should improve. One of them is a Fibonacci sequence generator:
array[0] = 0;
array[1] = 1;
for (q = 2; q < MAX; q++) {
array[q] = array[q-1] + array[q-2];
}
My suspicion is that this cannot be optimized (by parallelization), since every number depends on the two preceding numbers (and therefore indirectly on all preceding numbers). How could this be parallelized?
The Fibonacci sequence is determined by its first two elements alone; in fact, you can parallelize it, although it is ugly:
F(n + 2) = F(n + 1) + F(n)
F(n + 3) = F(n + 1) + F(n + 2) = F(n + 1) * 2 + F(n)
F(n + 4) = F(n + 2) + F(n + 3) = F(n + 1) * 3 + F(n) * 2
F(n + 5) = F(n + 3) + F(n + 4) = F(n + 1) * 5 + F(n) * 3
F(n + 6) = F(n + 4) + F(n + 5) = F(n + 1) * 8 + F(n) * 5
Hopefully by now, you can see that:
F(n + k) = F(n + 1) * F(k) + F(n) * F(k - 1)
So after computing the first k numbers, you can use this relation to compute the next k numbers of the sequence simultaneously, in parallel.
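A minimal sketch of the block idea, assuming 64-bit unsigned values (so max must stay at or below 94) and a block size k >= 2. Each block starts from a known pair a[n], a[n + 1] and fills a[n + 2] .. a[n + k] with the identity F(n + j) = F(n + 1) * F(j) + F(n) * F(j - 1). Those k - 1 values do not depend on each other, so the inner loop carries an OpenMP pragma (ignored when compiled without -fopenmp). fib_blocked is a hypothetical name.

```c
#include <stdint.h>

/* Fill a[0..max-1] with Fibonacci numbers, block by block, using
   F(n + j) = F(n + 1) * F(j) + F(n) * F(j - 1).
   Assumptions: k >= 2 and max <= 94 (F(93) is the largest value that fits in uint64_t). */
void fib_blocked(uint64_t *a, int max, int k)
{
    /* Serial prefix F(0) .. F(k); these serve as the coefficients F(j). */
    if (max > 0) a[0] = 0;
    if (max > 1) a[1] = 1;
    for (int q = 2; q <= k && q < max; q++)
        a[q] = a[q - 1] + a[q - 2];

    /* Each block reads only the known pair a[n], a[n + 1] and the prefix,
       so the k - 1 values it writes can be computed in parallel. */
    for (int n = k - 1; n + 2 < max; n += k - 1) {
        #pragma omp parallel for
        for (int j = 2; j <= k; j++) {
            if (n + j < max)
                a[n + j] = a[n + 1] * a[j] + a[n] * a[j - 1];
        }
    }
}
```

The serial loop over blocks remains, so this only pays off when k is large enough for the parallel work to outweigh the threading overhead.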
You could also use the direct (closed-form) formula for Fibonacci numbers, Binet's formula, to compute them in parallel, but that is less interesting (and probably too simple for the learning purpose the exercise is meant to serve).
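A sketch of the direct-formula route, assuming the formula meant is Binet's formula F(n) = round(phi^n / sqrt(5)) with phi = (1 + sqrt(5)) / 2. Every index is computed independently, which is what makes it trivially parallel. It relies on double precision, so this sketch assumes n <= 50; phi^n is computed with a plain loop to avoid depending on libm. fib_binet and fib_fill_binet are hypothetical names.

```c
#include <stdint.h>

/* Binet's formula: F(n) = round(phi^n / sqrt(5)).
   Sketch only: double precision limits accuracy, so we assume n <= 50. */
uint64_t fib_binet(int n)
{
    const double phi = 1.6180339887498949;    /* (1 + sqrt(5)) / 2 */
    const double sqrt5 = 2.2360679774997898;
    double p = 1.0;
    for (int i = 0; i < n; i++)               /* phi^n without libm */
        p *= phi;
    return (uint64_t)(p / sqrt5 + 0.5);       /* round to nearest */
}

/* No element depends on another, so the loop parallelizes directly. */
void fib_fill_binet(uint64_t *a, int max)
{
    #pragma omp parallel for
    for (int q = 0; q < max; q++)
        a[q] = fib_binet(q);
}
```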
The best way to approach it is to use the 2-dimensional matrix form of Fibonacci:
[[1, 1], [1, 0]]^n = [[F(n + 1), F(n)], [F(n), F(n - 1)]]
Now you can easily extend it: simple matrix multiplication concepts will do it.
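A minimal sketch of the matrix approach, assuming 64-bit unsigned entries (so n <= 93). It uses the identity [[1, 1], [1, 0]]^n = [[F(n + 1), F(n)], [F(n), F(n - 1)]] and computes the power by repeated squaring; mat2, mat_mul and fib_matrix are hypothetical names. Because matrix multiplication is associative, M^(a + b) = M^a * M^b, so separate powers could also be computed on different threads and combined; the sketch shows only the serial O(log n) part.

```c
#include <stdint.h>

/* 2x2 matrix [[a, b], [c, d]] of unsigned 64-bit integers. */
typedef struct { uint64_t a, b, c, d; } mat2;

static mat2 mat_mul(mat2 x, mat2 y)
{
    mat2 r = {
        x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
        x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d
    };
    return r;
}

/* F(n) is the top-right entry of [[1, 1], [1, 0]]^n.
   Exponentiation by squaring needs O(log n) matrix products.
   Assumes n <= 93; unsigned wraparound in the unused entry F(n + 1) is harmless. */
uint64_t fib_matrix(unsigned n)
{
    mat2 result = {1, 0, 0, 1};   /* identity */
    mat2 base = {1, 1, 1, 0};
    while (n > 0) {
        if (n & 1)
            result = mat_mul(result, base);
        n >>= 1;
        if (n > 0)                /* skip the final, unused squaring */
            base = mat_mul(base, base);
    }
    return result.b;
}
```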
Or you can take another mathematical route, such as:
A positive integer n is a Fibonacci number if and only if 5n^2 - 4 or 5n^2 + 4 is a perfect square.
http://en.wikipedia.org/wiki/Fibonacci_number
So, given a large number, you can use this test to find the next two Fibonacci numbers and continue with ordinary addition from there.
That way, you could partition the problem into (0 to N/2) and (N/2 + 1 to N) and run the two halves in parallel threads.
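A sketch of the perfect-square test in C, assuming n < 10^9 so that 5n^2 + 4 fits in uint64_t. The square root is computed with an integer binary search, which avoids floating-point rounding errors on large values; isqrt_u64, is_perfect_square and is_fibonacci are hypothetical names.

```c
#include <stdint.h>

/* floor(sqrt(v)) by binary search, without floating point. */
static uint64_t isqrt_u64(uint64_t v)
{
    uint64_t lo = 0, hi = 4294967295u;   /* floor(sqrt(2^64 - 1)) */
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        if (mid * mid <= v)              /* mid < 2^32, so no overflow */
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

static int is_perfect_square(uint64_t v)
{
    uint64_t r = isqrt_u64(v);
    return r * r == v;
}

/* n is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4 is a perfect square.
   Assumes n < 10^9 so that 5n^2 + 4 fits in uint64_t. */
int is_fibonacci(uint64_t n)
{
    uint64_t t = 5 * n * n;
    return is_perfect_square(t + 4) || (t >= 4 && is_perfect_square(t - 4));
}
```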
I am trying to implement n * (n + 1) / 2 knowing that n is an int <= 2^16 - 1 (this guarantees that n * (n + 1) / 2 <= 2^31 - 1 so there is no overflow).
Then we know that n * (n + 1) / 2 is guaranteed to be a non-negative integer. When calculating this value in a program, though, if we do the multiplication n * (n + 1) first, we might run into integer overflow. My idea is to use a clumsy condition:
int m;
if (n % 2 == 0) {
m = (n / 2) * (n + 1);
} else {
m = n * ((n + 1) / 2);
}
Is there any more concise way of doing this?
There is a more concise way to write your test using the ternary operator:
int m = (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
But it is likely to generate the exact same code.
You could take advantage of the extra precision long long is guaranteed to provide (at least 63 value bits):
int m = (long long)n * (n + 1) / 2;
Whether this is more or less efficient than the test version will depend on the target CPU and the compiler version and options. This version is simpler to read and understand, which is valuable. Adding a comment to explain why the result will be in range would be useful.
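The answer suggests documenting why the result stays in range; a minimal sketch of what that might look like, with triangular as a hypothetical name:

```c
/* n * (n + 1) / 2 for 0 <= n <= 65535 (2^16 - 1).
   The result is at most 65535 * 65536 / 2 = 2147450880 <= INT_MAX,
   but the intermediate product n * (n + 1) is not, so it is computed
   in long long (at least 64 bits) before narrowing back to int. */
int triangular(int n)
{
    return (int)((long long)n * (n + 1) / 2);
}
```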
Derived from a suggestion by Amadeus, here is a more concise but much less readable alternative that does not use 64-bit arithmetic:
int m = (n + (n & 1)) / 2 * (n + 1 - (n & 1));
Demonstration:
if n is odd, we get m = (n + 1) / 2 * n;
if n is even, we get m = n / 2 * (n + 1);
The simplest solution is perhaps to use a larger intermediate type:
int m = (int)((long long)n * (n + 1) / 2) ;
It is not necessary to cast all operands since automatic type promotion will apply.
What do you think about:
m = ((n + (n & 1)) >> 1) * ( n + !(n & 1));
Explanation:
This solution tries to achieve two objectives:
Do not overflow
Avoid if-then-else conditions, so that the code is pipeline-friendly
To avoid overflow, we divide first and then multiply. Halving a number (dividing by 2) has a useful property: if the number is even, the division is exact and can be done with a simple right shift by 1.
So, to guarantee that the number we halve is even without an if-then-else condition, we use the following trick:
A number is odd if its lowest bit is one (extracted by ANDing it with 1), and even otherwise. Therefore, if n is even, we halve n itself and multiply by n + 1; if n is odd, we first add 1 (that is, n & 1) to make it even, halve it, and multiply by n. The second factor, n + !(n & 1), is n + 1 for even n and n for odd n.
In other words, this solution is equivalent to:
if ( n is even )
m = (n >> 1) * (n + 1);
else
m = ( (n + 1) >> 1) * n;
and one more:
int m = (n/2 * n) + ((n%2) * (n/2)) + (n/2) + (n%2);
maybe
result = n * (n / 2) + (n & 1) * (n / 2 + 1) + n / 2;
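Because the input range is only 0 to 65535, the branch-free variants can be checked exhaustively against the 64-bit version. A minimal sketch (all function names are hypothetical) covering the Amadeus-derived formula, the shift version, and the four-term version:

```c
/* Reference: 64-bit intermediate product. */
static int tri_ref(int n)   { return (int)((long long)n * (n + 1) / 2); }

/* Variants from the answers, none of which uses 64-bit arithmetic. */
static int tri_mask(int n)  { return (n + (n & 1)) / 2 * (n + 1 - (n & 1)); }
static int tri_shift(int n) { return ((n + (n & 1)) >> 1) * (n + !(n & 1)); }
static int tri_split(int n) { return (n/2 * n) + ((n%2) * (n/2)) + (n/2) + (n%2); }

/* Returns the first n in [0, 65535] where a variant disagrees, or -1 if none. */
int first_mismatch(void)
{
    for (int n = 0; n <= 65535; n++) {
        int r = tri_ref(n);
        if (tri_mask(n) != r || tri_shift(n) != r || tri_split(n) != r)
            return n;
    }
    return -1;
}
```

The same loop is a cheap way to test any further one-liner before trusting it.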
I'm looking for a way to compute a sum of sums. The input is an integer n, and the problem is to find sum(1) + sum(1+2) + sum(1+2+3) + ... + sum(1+2+...+n). I need a highly optimized solution, using dynamic programming or a mathematical formula.
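#include <stdio.h>
/* findSumN was not shown; assumed here to return 1 + 2 + ... + k */
int findSumN(int k) { return k * (k + 1) / 2; }
int main()
{
    int sum = 0;
    int i = 0, n = 6;
    for( i = 1; i <= n; i++ )   /* <= so that sum(1+2+...+n) is included */
        sum = sum + findSumN( i );
    printf( "%d\n", sum );
    return 0;
}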
You can often find a formula for series like this by calculating the first few terms and using the results to search the On-Line Encyclopedia of Integer Sequences.
1 = 1
1 + (1+2) = 4
4 + (1+2+3) = 10
10 + (1+2+3+4) = 20
20 + (1+2+3+4+5) = 35
35 + (1+2+3+4+5+6) = 56
The sequence you're trying to calculate (1, 4, 10, 20, 35, 56, ...) is A000292, which has the following formula:
a(n) = n × (n + 1) × (n + 2) / 6
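A minimal C sketch of that formula (sum_of_triangular is a hypothetical name). One of n and n + 1 is even and one of n, n + 1, n + 2 is divisible by 3, so the divisions can be done before multiplying, which keeps intermediates small; each division is exact. For n = 6 it returns 56.

```c
#include <stdint.h>

/* a(n) = n (n + 1) (n + 2) / 6 (OEIS A000292), dividing early to limit overflow. */
uint64_t sum_of_triangular(uint64_t n)
{
    uint64_t a = n, b = n + 1, c = n + 2;
    if (a % 2 == 0) a /= 2; else b /= 2;      /* exactly one of n, n+1 is even */
    if (a % 3 == 0) a /= 3;                   /* halving kept divisibility by 3 */
    else if (b % 3 == 0) b /= 3;
    else c /= 3;
    return a * b * c;
}
```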
If you play with the numbers, you can find some patterns. Start with
sum(1 + 2 + 3 ... + N) = ((1 + N) * N) /2
Then there is a relationship between the max number and the value above: the ratio between the total and that value starts at 1 and increases by 1/3 every time the max number increases by 1. So we get the factor:
(1 + ((1.0 / 3.0) * (max - 1)))
I am not good enough at math to explain why this pattern occurs. Perhaps someone can explain it in a math way.
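The observed factor follows from the A000292 formula a(N) = N(N + 1)(N + 2)/6 quoted in the other answer: dividing it by the last triangular number N(N + 1)/2 gives

```latex
\frac{\sum_{k=1}^{N} \frac{k(k+1)}{2}}{\frac{N(N+1)}{2}}
  = \frac{N(N+1)(N+2)/6}{N(N+1)/2}
  = \frac{N+2}{3}
  = 1 + \frac{N-1}{3}
```

which is exactly the factor (1 + ((1.0 / 3.0) * (max - 1))) with N = max.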
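The following is my solution; no iteration is needed.
#include <stdio.h>
int main()
{
    int min = 1;
    int max = 11254;
    /* (double) avoids int overflow in (min + max) * max for larger max */
    double sum = ((double)(min + max) * max / 2) * (1 + ((1.0 / 3.0) * (max - 1)));
    printf("%.f\n", sum);
    return 0;
}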
Look at the closed form of sum(n) = 1 + 2 + … + n and look up Pascal's triangle identities. This immediately gives a very fast computation method.
As
binom(k,2) + binom(k,3) = binom(k+1,3), that is,
binom(k,2) = binom(k+1,3) - binom(k,3),
and sum(k) = 1 + 2 + ... + k = binom(k+1,2), the summation of binom(k+1,2) from k=M to N telescopes to
binom(N+2,3) - binom(M+1,3) = (N+2)*(N+1)*N/6 - (M+1)*M*(M-1)/6
= (N+1-M) * ((N+1)²+(N+1)M+M²-1)/6
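A minimal C sketch of that closed form; sum_of_sums is a hypothetical name, and the code assumes 1 <= M <= N and that the intermediate product (six times the result) fits in long long. For M = 1 and N = 6 it gives 56, matching the sequence 1, 4, 10, 20, 35, 56.

```c
/* Sum of sum(k) = 1 + 2 + ... + k for k = M .. N, via
   (N+1-M) * ((N+1)^2 + (N+1)M + M^2 - 1) / 6. The division by 6 is exact. */
long long sum_of_sums(long long M, long long N)
{
    long long a = N + 1;
    return (a - M) * (a * a + a * M + M * M - 1) / 6;
}
```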
I'm having some trouble optimizing a function that returns the number of neighbors of a cell in a Conway's Game of Life implementation. I'm trying to learn C and get better at coding. I'm not very good at recognizing potential optimizations, and I've spent a lot of time online reading about various methods, but it's not really clicking for me yet.
Specifically, I'm trying to figure out how to unroll this nested for loop in the most efficient way, but each time I try, I just make the runtime longer.
I'm including the function; I don't think any other context is needed. Thanks for any advice you can give!
Here is the code for the countNeighbors() function:
static int countNeighbors(board b, int x, int y)
{
int n = 0;
int x_left = max(0, x-1);
int x_right = min(HEIGHT, x+2);
int y_left = max(0, y-1);
int y_right = min(WIDTH, y+2);
int xx, yy;
for (xx = x_left; xx < x_right; ++xx) {
for (yy = y_left; yy < y_right; ++yy) {
n += b[xx][yy];
}
}
return n - b[x][y];
}
Instead of declaring board as b[WIDTH][HEIGHT], declare it as b[WIDTH + 2][HEIGHT + 2]. This adds an extra margin of zeros, which prevents out-of-bounds indexing. So, instead of:
x x
x x
We will have:
0 0 0 0
0 x x 0
0 x x 0
0 0 0 0
x denotes used cells; 0 denotes unused (padding) cells.
Typical trade off: a bit of memory for speed.
Thanks to that, we don't have to call the min and max functions (whose if statements are bad for performance).
Finally, I would write your function like this:
int countNeighborsFast(board b, int x, int y)
{
int n = 0;
n += b[x-1][y-1];
n += b[x][y-1];
n += b[x+1][y-1];
n += b[x-1][y];
n += b[x+1][y];
n += b[x-1][y+1];
n += b[x][y+1];
n += b[x+1][y+1];
return n;
}
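A self-contained sketch of the padded layout, with hypothetical 5 x 5 dimensions and unsigned char cells (both assumptions). It indexes the board as [HEIGHT + 2][WIDTH + 2] so that x runs over rows, as in the question's countNeighbors; live cells occupy indices 1..HEIGHT and 1..WIDTH, and the border stays zero. padded_board and countNeighborsPadded are hypothetical names.

```c
#define WIDTH  5
#define HEIGHT 5

/* Row/column 0 and HEIGHT+1/WIDTH+1 form a zero border. */
typedef unsigned char padded_board[HEIGHT + 2][WIDTH + 2];

/* Branch-free neighbor count; valid for 1 <= x <= HEIGHT, 1 <= y <= WIDTH. */
int countNeighborsPadded(padded_board b, int x, int y)
{
    return b[x-1][y-1] + b[x-1][y] + b[x-1][y+1]
         + b[x][y-1]               + b[x][y+1]
         + b[x+1][y-1] + b[x+1][y] + b[x+1][y+1];
}
```

Calling it at a corner such as (1, 1) reads the zero border instead of going out of bounds, so no min/max clamping is needed.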
Benchmark (updated)
Full, working source code.
Thanks to Jongware's comment, I added linearization (reducing the array's dimensions from 2 to 1) and changed int to char.
I also made the main loop linear and calculate the returned sum directly, without an intermediate n variable.
The 2D array was 10002 x 10002; the 1D array had 100040004 elements.
The CPU I have is a Pentium Dual-Core T4500 at 2.30 GHz, further details here (output of cat /proc/cpuinfo).
Results at the default optimization level (-O0):
Original: 15.50s
Mine: 10.13s
Linear: 2.51s
LinearAndChars: 2.48s
LinearAndCharsAndLinearLoop: 2.32s
LinearAndCharsAndLinearLoopAndSum: 1.53s
That's about 10x faster than the original version.
Results at -O2:
Original: 6.42s
Mine: 4.17s
Linear: 0.55s
LinearAndChars: 0.53s
LinearAndCharsAndLinearLoop: 0.42s
LinearAndCharsAndLinearLoopAndSum: 0.44s
About 15x faster.
Results at -O3:
Original: 10.44s
Mine: 1.47s
Linear: 0.26s
LinearAndChars: 0.26s
LinearAndCharsAndLinearLoop: 0.25s
LinearAndCharsAndLinearLoopAndSum: 0.24s
About 44x faster.
The last version, LinearAndCharsAndLinearLoopAndSum is:
typedef char board3[(HEIGHT + 2) * (WIDTH + 2)];
int countNeighborsLinearAndCharsAndLinearLoopAndSum(board3 b, int pos)
{
    return
        b[pos - 1 - (WIDTH + 2)] +
        b[pos - (WIDTH + 2)] +
        b[pos + 1 - (WIDTH + 2)] +
        b[pos - 1] +
        b[pos + 1] +
        b[pos - 1 + (WIDTH + 2)] +
        b[pos + (WIDTH + 2)] +
        b[pos + 1 + (WIDTH + 2)];
}
/* main loop over the padded board b3 (declared in the full source) */
int i;
for (i = WIDTH + 3; i <= (WIDTH + 2) * (HEIGHT + 1) - 2; i++)
    countNeighborsLinearAndCharsAndLinearLoopAndSum(b3, i);
Changing 1 + (WIDTH + 2) to WIDTH + 3 won't help, because the compiler takes care of it anyway (even at -O0).
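A sketch of the linearized version that also uses the counts, under hypothetical 5 x 5 dimensions. Two things differ from the benchmark loop: the results are accumulated (a compiler may otherwise drop calls whose return value is ignored), and the loop walks row by row so that the padding cells at columns 0 and WIDTH + 1, which the flat loop from WIDTH + 3 also visits, are skipped. STRIDE, linear_board, neighbors_at and total_neighbors are hypothetical names.

```c
#define WIDTH  5
#define HEIGHT 5
#define STRIDE (WIDTH + 2)   /* length of one padded row */

typedef unsigned char linear_board[(HEIGHT + 2) * STRIDE];

static int neighbors_at(const unsigned char *b, int pos)
{
    return b[pos - 1 - STRIDE] + b[pos - STRIDE] + b[pos + 1 - STRIDE]
         + b[pos - 1]                            + b[pos + 1]
         + b[pos - 1 + STRIDE] + b[pos + STRIDE] + b[pos + 1 + STRIDE];
}

/* Sum the neighbor counts of all interior cells, row by row,
   so padding columns are never treated as cells. */
long total_neighbors(const unsigned char *b)
{
    long total = 0;
    for (int row = 1; row <= HEIGHT; row++)
        for (int pos = row * STRIDE + 1; pos <= row * STRIDE + WIDTH; pos++)
            total += neighbors_at(b, pos);
    return total;
}
```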
So my question is, more specifically, how to do this in C. I'm aware that O(log n) usually means that we'll be using recursion by somehow splitting one of the parameters.
What I'm trying to achieve is the sum of x^k for k = 0 to n.
For example, exponent_sum(x, n) would take these parameters.
Then,
exponent_sum(4, 4) would be 4^0 + 4^1 + 4^2 + 4^3 + 4^4 = 341.
I'm not sure where to start. Some hints would be really appreciated.
One way to look at the sum is as a number in base x consisting of all 1s.
For example, 4^4 + 4^3 + 4^2 + 4^1 + 4^0 is 11111 in base 4.
In any base, a string of 1s is going to be equal to 1 followed by a string of the same number of 0s, minus 1, divided by the base minus 1.
For example:
in base 10: (1000 - 1) / 9 = 999 / 9 = 111
in base 16: (0x10000 - 1) / 0xF = 0xFFFF / 0xF = 0x1111
in base 8: (0100 - 1) / 7 = 077 / 7 = 011
etc
So put these together and we can generalize that
exponent_sum(x, n) = (x^(n + 1) - 1) / (x - 1)
For example, exponent_sum(4, 4) = (4^5 - 1) / 3 = 1023 / 3 = 341
So the big-O complexity is the same as that of computing x^n, which exponentiation by squaring does in O(log n) multiplications.
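A sketch of the closed form in C, with pow_u64 as a hypothetical helper doing exponentiation by squaring, so the whole computation takes O(log n) multiplications. It assumes non-negative x and that x^(n+1) fits in uint64_t, and it handles x = 1 separately because the denominator would be zero. exponent_sum_closed is a hypothetical name.

```c
#include <stdint.h>

/* x^e by exponentiation by squaring: O(log e) multiplications.
   Unsigned arithmetic, so an overflowing final squaring wraps instead of being undefined. */
static uint64_t pow_u64(uint64_t x, uint64_t e)
{
    uint64_t r = 1;
    while (e > 0) {
        if (e & 1)
            r *= x;
        x *= x;
        e >>= 1;
    }
    return r;
}

/* 1 + x + ... + x^n = (x^(n+1) - 1) / (x - 1) for x != 1.
   Assumes x^(n+1) fits in uint64_t. */
uint64_t exponent_sum_closed(uint64_t x, uint64_t n)
{
    if (x == 1)
        return n + 1;              /* every term is 1 */
    return (pow_u64(x, n + 1) - 1) / (x - 1);
}
```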
Let me add another proof for the sake of completeness:
s = 1 + x^1 + x^2 + ... + x^n
Then
xs = x(1 + x^1 + x^2 + ... + x^n) = x^1 + x^2 + ... + x^n + x^(n+1) = s - 1 + x^(n+1)
Solving for s:
(x - 1)s = x^(n+1) - 1,
s = (x^(n+1) - 1)/(x - 1)
Another way to see the solution is like this: suppose the sum is S written as
S = 1 + x + x^2 + ... + x^k
Then if we multiply both sides of it by x we get
S*x = x * (1 + x + x^2 + ... + x^k)
= x + x^2 + ... + x^k + x^(k+1)
then add 1 to both sides
S*x + 1 = 1 + x + x^2 + ... + x^k + x^(k+1)
= (1 + x + x^2 + ... + x^k) + x^(k+1)
= S + x^(k+1)
which means
S*x - S = x^(k+1) - 1
S*(x - 1) = x^(k+1) - 1
so
S = (x^(k+1) - 1) / (x - 1)
Use the theory of geometric progressions, where
sum = first-term * (pow(common-ratio, number-of-terms) - 1) / (common-ratio - 1);
Here the first term is obviously 1,
the common ratio is the number x itself,
and the number of terms is n + 1.
But the common ratio must not be 1 (otherwise the denominator is zero).
For
common-ratio = 1:
sum = first-term * number-of-terms = n + 1.
You can evaluate the sum directly, without using the geometric progression formula. This has the advantage that no division is required (which matters if, for example, you want to adapt the code to return the result modulo some large number).
Let S(k) be the sum x^0 + ... + x^{k-1}; it satisfies these recurrence relations:
S(1) = 1
S(2n) = S(n) * (1 + x^n)
S(2n+1) = S(n) * (1 + x^n) + x^{2n}
Using these, the only difficulty is arranging to keep a running value of xp to use as x^n. Otherwise the algorithm is very similar to a bottom-up implementation of exponentiation by squaring.
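#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

int64_t exponent_sum(int64_t x, int64_t k) {
    /* Unsigned arithmetic: xp ends up as x^(k+1), which can exceed INT64_MAX
       even when the result fits; unsigned wraparound is well-defined. */
    uint64_t r = 0, xp = 1, ux = (uint64_t)x;
    /* Invariant: r = S(m), xp = x^m, where m is the prefix of the bits of k+1 read so far. */
    for (int i = 63; i >= 0; i--) {
        r *= 1 + xp;      /* S(2m) = S(m) * (1 + x^m) */
        xp *= xp;         /* x^(2m) */
        if (((k + 1) >> i) & 1) {
            r += xp;      /* S(2m+1) = S(2m) + x^(2m) */
            xp *= ux;     /* x^(2m+1) */
        }
    }
    return (int64_t)r;
}

int main(int argc, char *argv[]) {
    for (int k = 0; k < 10; k++) {
        printf("4^0 + 4^1 + ... + 4^%d = %" PRId64 "\n", k, exponent_sum(4, k));
    }
    return 0;
}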
Possible Duplicate:
What’s the complexity of for i: for o = i+1
I have written the following sorting algorithm for arrays of length 5:
int myarray[5] = {2,4,3,5,1};
int i;
for (i = 0; i < 5; i++)
{
    int j;
    for (j = i + 1; j < 5; j++)
    {
        if (myarray[i] > myarray[j]) {
            int tmp = myarray[i];
            myarray[i] = myarray[j];
            myarray[j] = tmp;
        }
    }
    printf("%d", myarray[i]);   /* myarray[i] is final only after the inner loop */
}
I believe that the complexity of this sorting algorithm is O(n*n), because each element is compared with the rest. However, I also notice that each time we advance the outer loop, we don't compare with all the rest, but with the rest minus i. What would be the complexity?
It's still O(n²) (or O(n * n), as you wrote). Only the highest order term matters when analysing computational complexity.
You are right:
It's O((n - 1) + (n - 2) + ... + 1)
But mathematically it's just:
= O(n * (n - 1) / 2)
but that is just:
= O(n^2)
You are right that it is O(n^2).
Here's how to calculate it. On the first iteration, you will look at n elements; on the next, n - 1, and so on. If you write two copies of that sum and divide by two, you can pair up the terms, such that you add the first term of the first copy (n) to the last term of the second copy (1), and so on. You wind up with n copies of n + 1, so the sum winds up being n * (n + 1) / 2. Big-O only distinguishes asymptotic behavior; the asymptotic behavior is described by the highest-order term, without regard to constant factors, which is n^2.
n + (n - 1) + (n - 2) ... + 1
= 2 * (n + (n - 1) + (n - 2) ... + 1) / 2
= ((n + 1) + (n - 1 + 2) + (n - 2 + 3) + ... + (1 + n)) / 2
= ((n + 1) + (n + 1) + ... + (n + 1)) / 2
= n * (n + 1) / 2
= 1/2 * n^2 + 1/2 * n
= O(n^2)
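A quick way to check the count empirically: the sketch below (count_comparisons is a hypothetical name) counts how often the inner comparison of the question's loop runs. It returns n(n - 1)/2, which differs from n(n + 1)/2 only by the linear term n, so either way the growth is quadratic.

```c
/* Count how many times the inner comparison of the question's sort executes
   for an array of length n: (n-1) + (n-2) + ... + 1 + 0. */
long count_comparisons(long n)
{
    long count = 0;
    for (long i = 0; i < n; i++)
        for (long j = i + 1; j < n; j++)
            count++;
    return count;
}
```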
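This is a simple exchange sort (a close relative of selection sort rather than bubble sort, which swaps adjacent elements), and it is indeed of complexity O(n^2).
The entire run time of the algorithm can be summarized by the following summation: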
n + (n-1) + (n-2) + ... + 1 = n(n+1)/2
Since only the highest order term is of interest in asymptotic analysis, the complexity is O(n^2)
Big-O notation is asymptotic. It means that we overlook constant factors and lower-order terms such as the - i. The complexity of your algorithm is O(N²) (see also here).
For a fixed array length of 5, the complexity is O(1). Big-O notation only makes sense for large inputs, where an increase or decrease is not only visible but relevant.
If you were to extend it, it would be O(n^2), yes.
For nested loops, multiply the iteration counts:
n * m * ... (one factor per loop)
For the code above, the worst case is n * n = n^2.
Big-O gives an upper bound,
so the complexity can't be greater than this.
For
i = 0 it runs n times
i = 1 it runs n - 1 times
i = 2 it runs n - 2 times
....
So total sum = (n) + (n-1) + (n-2) + .... + 1
sum = (n*n) - (0 + 1 + 2 + ... + (n-1))
= n^2 - n(n-1)/2
So big-O complexity = O(n^2) { upper bound; lower-order terms, added or subtracted, get ignored }