i = 1;
while (i <= n) {
    j = n - i;
    while (j >= 2) {
        for (k = 1; k <= j; k++) {
            s = s + Arr[k];
        }
        j = j - 2;
    }
    i = i + 1;
}
The part that confuses me is where it says
j = n - i;
while(j >= 2){
I'm not really sure how to show my work on that part. I'm pretty sure the algorithm is O(n^3) though.
You can simplify it a bit in order to see things more clearly:
for(i = 1; i <= n; i++)
{
    for(j = n - i; j >= 2; j -= 2)
    {
        for(k = 1; k <= j; k++)
        {
            s = s + Arr[k];
        }
    }
}
Now things should be simpler
for(i = 1; i <= n; i++) : O(n) [executes exactly n times, actually]
for(j = n - i; j >= 2; j -= 2) : about (n-1)/2 iterations when i = 1, (n-2)/2 when i = 2, and so on... O(n)
for(k = 1; k <= j; k++) : j iterations, so n-1 for the first j when i = 1, n-3 for the next j, and so on... O(n)
s = s + Arr[k]; [simple operation] : O(1)
Multiply the bounds of the nested steps and you get O(n^3)
If you are still having trouble with it, I would suggest you run a few simulations of this code with varying n values and a counter inside the loops. Hopefully you'll be able to see that each loop contributes a factor of O(n).
I'm trying to make it so the user will input an odd number of stars on the bottom row of the pyramid. The program will build that pyramid using only odd integers less than the one input. On the outside of the pyramid are underscores.
String result = "";
int sideWidth = -1, midWidth = -1;
for (int i=1; i<=numSymbols ; i++)
{
for (int j=numSymbols; j>i; j--)
{
System.out.print("_");
}
for (int k=1; k<=(i * 2) -1; k++)
{
System.out.print("*");
}
System.out.println();
}
return result;
Expected output if user puts 7:
___*___
__***__
_*****_
*******
What I actually got if user puts 7:
______*
_____***
____*****
___*******
__*********
_***********
*************
You need to use numSymbols / 2 + 1 instead of numSymbols as the row count, since 7 symbols need only 4 rows. Also, the _ loop should run as j = numSymbols / 2; j >= i and appear twice: once before the stars and once after them. Finally, you don't need 'k'.
for (int i = 1; i <= numSymbols / 2 + 1; i++)
{
    for (int j = numSymbols / 2; j >= i; j--)
    {
        System.out.print("_");
    }
    for (int j = 1; j <= i * 2 - 1; j++)
    {
        System.out.print("*");
    }
    for (int j = numSymbols / 2; j >= i; j--)
    {
        System.out.print("_");
    }
    System.out.println();
}
For a simpler form:
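// i is the number of stars in the current row: 1, 3, 5, ...
for (int i = 1; i <= numSymbols; i += 2)
{
    // padding on each side is numSymbols / 2 - i / 2 underscores,
    // so the bound must be strict (j > i / 2), not j >= i / 2
    for (int j = numSymbols / 2; j > i / 2; j--)
        System.out.print("_");
    for (int j = 1; j <= i; j++)
        System.out.print("*");
    for (int j = numSymbols / 2; j > i / 2; j--)
        System.out.print("_");
    System.out.println();
}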
int mmult_omp(double *c,
double *a, int aRows, int aCols,
double *b, int bRows, int bCols, int numThreads)
{
for (i = 0; i < aRows; i++) {
for (j = 0; j < bCols; j++) {
c[i*bCols + j] = 0;
}
for (k = 0; k < aCols; k++) {
for (j = 0; j < bCols; j++) {
c[i*bCols + j] += a[i*aCols + k] * b[k*bCols + j];
}
}
}
for (i = 0; i < aRows; i++) {
for (j = 0; j < bCols; j++) {
c[i*bCols + j] = 0;
for (k = 0; k < aCols; k++) {
c[i*bCols + j] += a[i*aCols + k] * b[k*bCols + j];
}
}
}
Why is the first algorithm faster than the second?
I’ve used C’s time library and the first algorithm is objectively faster than the second. Why is that?
This code is very hard to understand. I had to copy it and reformat it to see what loops were what. I'm not really sure why one is faster but here's a great resource to see why.
Here are links to inspect the assembly output:
link for #1
link for #2
There are many implementations of the Sieve of Eratosthenes online. Through searching Google, I found this implementation in C.
#include <stdio.h>
#include <stdlib.h>
#define limit 100 /*size of integers array*/
int main(){
unsigned long long int i,j;
int *primes;
int z = 1;
primes = malloc(sizeof(int) * limit);
for (i = 2;i < limit; i++)
primes[i] = 1;
for (i = 2;i < limit; i++)
if (primes[i])
for (j = i;i * j < limit; j++)
primes[i * j] = 0;
printf("\nPrime numbers in range 1 to 100 are: \n");
for (i = 2;i < limit; i++)
if (primes[i])
printf("%llu\n", i); /* i is unsigned long long, so %llu rather than %d */
return 0;
}
I then attempted to update the existing code so that the C program would follow what is described by Scott Ridgway in Parallel Scientific Computing. In the first chapter, the author describes what is known as the prime number sieve. Instead of finding the primes up to a number k, the modified sieve searches for primes n in the range k <= n <= k^2. Ridgway provides the pseudocode for this algorithm.
To match the pseudocode provided by the author, I modified the original program above and wrote
#include <stdio.h>
#include <stdlib.h>
#define limit 10 /*size of integers array*/
int main(){
unsigned long long int i,j,k;
int *primes;
int *arr[100];
int z = 1;
primes = malloc(sizeof(int) * limit);
for (i = 2;i < limit; i++)
primes[i] = 1;
for (i = 2;i < limit; i++)
if (primes[i])
for (j = i;i * j < limit; j++)
primes[i * j] = 0;
/* Code which prints out primes for Sieve of Eratosthenes */
/*printf("\nPrime numbers in range 1 to 100 are: \n");
for (i = 2;i < limit; i++)
if (primes[i])
//printf("Element[%d] = %d\n", i, primes[i]);*/
for (k=limit; k < limit*limit; k++)
for (j = primes[0]; j = arr[sizeof(arr)/sizeof(arr[0]) - 1]; j++)
if ((k % j) == 0)
arr[k]=0;
arr[k] = 1;
printf("\nPrime numbers in range k to k^2 are: \n");
for (k=limit; k < limit*limit; k++)
if (arr[k])
printf("Element[%d] = %d\n", k, k);
return 0;
}
which returns
Prime numbers in range k to k^2 are:
Element[10] = 10
Element[14] = 14
Element[15] = 15
Element[16] = 16
Element[17] = 17
Element[18] = 18
Element[19] = 19
.
.
.
This is clearly wrong. I think that my mistake is in my interpretation of the pseudocode
as
for (k=limit; k < limit*limit; k++)
for (j = primes[0]; j = arr[sizeof(arr)/sizeof(arr[0]) - 1]; j++)
if ((k % j) == 0)
arr[k]=0;
arr[k] = 1;
As I am new to C, I likely made an elementary mistake. I'm not sure what is wrong with the five lines of code above and have therefore asked a question on Stack Overflow.
Your loop statement has a problem: the variable j should be used as an index into primes, which points to an array of int flags holding 0 or 1. Here the primes array plays the role of S(k) in the algorithm.
for (k=limit; k < limit*limit; k++)
for (j = primes[0]; j = arr[sizeof(arr)/sizeof(arr[0]) - 1]; j++)
if ((k % j) == 0)
arr[k]=0;
arr[k] = 1;
So the for loop over j should be
for (j = 2; j < limit; j++)
And the condition in the if statement should be
if (primes[j] && (k % j) == 0)
{
arr[k] = 0;
break;
}
If this condition is true, we exit the inner for loop over j. After that loop, check the value of j to see whether the loop ran to completion (j == limit), which means no prime below limit divides k.
if (j == limit) arr[k] = 1;
So here is the entire for loop (outer and inner) that I modified.
for (k = limit; k < limit*limit; k++)
{
for (j = 2; j < limit; j++)
{
if (primes[j] && (k % j) == 0)
{
arr[k] = 0;
break;
}
}
if (j == limit) arr[k] = 1;
}
And here is the entire solution:
#include <stdio.h>
#include <stdlib.h>
#define limit 10 /*size of integers array*/
int main() {
unsigned long long int i, j, k;
int *primes;
int arr[limit*limit];
int z = 1;
primes = (int*)malloc(sizeof(int) * limit);
for (i = 2; i < limit; i++)
primes[i] = 1;
for (i = 2; i < limit; i++)
if (primes[i])
for (j = i; i * j < limit; j++)
primes[i * j] = 0;
/* Code which prints out primes for Sieve of Eratosthenes */
/*printf("\nPrime numbers in range 1 to 100 are: \n");
for (i = 2;i < limit; i++)
if (primes[i])
//printf("Element[%d] = %d\n", i, primes[i]);*/
for (k = limit; k < limit*limit; k++)
{
for (j = 2; j < limit; j++)
{
if (primes[j] && (k % j) == 0)
{
arr[k] = 0;
break;
}
}
if (j == limit) arr[k] = 1;
}
printf("\nPrime numbers in range k to k^2 are: \n");
for (k = limit; k < limit*limit; k++)
if (arr[k] == 1)
printf("Element %llu\n", k); /* k is unsigned long long, so %llu rather than %d */
return 0;
}
I've searched for hours and spent many more trying to figure how to fix this problem. I need to find the inverse of a predefined matrix using
A^-1 = I + (B + B^2 + ... + B^20) where B = I-A.
void invA(double a[][3], double id[][3], double z[][3])
{
int i, j, n, k;
double pb[3][3] = {1.,0.,0.,0.,1.,0.,0.,0.,1.};
double temp[3][3] = {1.,0.,0.,0.,1.,0.,0.,0.,1.};
double b[3][3];
temp[i][j] = 0;
b[i][j] = 0;
for(i = 0; i < 3; i++)
for (j = 0; j < 3; j++)
b[i][j] = id[i][j] - a[i][j];
for (n = 0; n < 20; n++) //run loop n times
{
for (i = 0; i < 3; i++) //find b to the power 20
for (j = 0; j < 3; j++)
for (k = 0; k < 3; k++)
temp[i][j] += pb[i][k] * b[k][j];
for (i = 0; i < 3; i++) //allocate pb from temp
for (j = 0; j < 3; j++)
pb[i][j] = temp[i][j];
for (i = 0; i < 3; i++) //summing b n time
for (j = 0; j < 3; j++) //to find inverse
z[i][j] = z[i][j] + pb[i][j];
}
}
Matrix a is the defined matrix, id is the identity and z is the inverse (result). I can't seem to figure out where I've gone wrong.
You have a few problems.
First, temp[i][j] = 0; and b[i][j] = 0; at the beginning of the function use uninitialized variables i and j. The behaviour is undefined, and who knows how temp is actually initialized.
Then, temp must be reinitialized to a zero matrix at each iteration. I don't know exactly what your code computes, but it is certainly not a power of b.
Finally, unless z is initialized to I, you are missing the initial term I of the series.
All that said, I highly recommend factoring most of the loops out into functions: matAdd() and matMult(). Once they are unit tested, the rest is much simpler.
I have a number crunching C program which involves a main loop with two conditionals:
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
for (k = 0; k < N; k++) {
if (k == i || k == j) continue;
...(calculate a, b, c, d (depending on k)
if (a*a + b*b + c*c < d*d) {break;}
} //k
} //j
} //i
The hardware here is the SPE of the Cell processor, where branching carries a big penalty. To speed up my program I need to remove these 2 conditionals. Do you know of any good strategies for this?
For the first one, you could break it into multiple loops, e.g. change:
for(int i = 0; i < 1000; i++)
for(int j = 0; j < 1000; j++) {
for(int k = 0; k < 1000; k++) {
if(k==i || k == j) continue;
// other code
}
}
to:
for(int i = 0; i < 1000; i++)
for(int j = 0; j < 1000; j++) {
for(int k = 0; k < min(i, j); k++) {
// other code
}
for(int k = min(i, j) + 1; k < max(i, j); k++) {
// other code
}
for(int k = max(i, j) + 1; k < 1000; k++) {
// other code
}
}
To remove the second, you could store the previous total and use it in the for loop conditions, i.e.:
int left_side, right_side;
for(int i = 0; i < N; i++)
    for(int j = 0; j < N; j++) {
        // reset for each (i, j), otherwise one early exit would suppress every later k loop
        left_side = 1; right_side = 0;
        for(int k = 0; k < min(i, j) && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        for(int k = min(i, j) + 1; k < max(i, j) && left_side >= right_side; k++) {
            // same as in previous loop
        }
        for(int k = max(i, j) + 1; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
Implementing min and max without branching could also be tricky. Maybe this version is better:
int i, j, k,
    left_side, right_side;
for(i = 0; i < N; i++) {
    // this loop covers the case where j < i
    for(j = 0; j < i; j++) {
        // reset for each (i, j), otherwise one early exit would suppress every later k loop
        left_side = 1; right_side = 0;
        k = 0;
        for(; k < j && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        k++; // skip k == j
        for(; k < i && left_side >= right_side; k++) {
            // same as in previous loop
        }
        k++; // skip k == i
        for(; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
    // skip j == i; note that the original loop does run j == i (skipping only k == i),
    // so handle that case separately if it matters
    j++;
    // and now, j > i
    for(; j < N; j++) {
        left_side = 1; right_side = 0; // reset for each (i, j)
        k = 0;
        for(; k < i && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        k++; // skip k == i
        for(; k < j && left_side >= right_side; k++) {
            // same as in previous loop
        }
        k++; // skip k == j
        for(; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
}
I agree with 'sje397'.
Besides this, you provide too little information about your problem. You say branching is pricey. But how often does it actually happen? Maybe your problem is that compiler-generated code does branching in the common scenario?
Perhaps you could rearrange your ifs. How an if is implemented is actually compiler-dependent, but many compilers treat it in a straightforward way. That is: if - common - else - rare (jump).
Then try the following:
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
for (k = 0; k < N; k++) {
if (k != i && k != j)
{
...(calculate a, b, c, d)
if (a*a + b*b + c*c >= d*d)
{
...
} else
break;
}
} //k
} //j
} //i
EDIT:
Of course, you can drop down to the assembler level to make sure the generated code is what you expect.
I would look first at your calculate code, because that could swamp all these branching issues. Some sampling would find out for sure.
However, it looks like you're doing, for each i,j, a linear search for the first point inside a sphere. Could you have 3 arrays, one for each of the X, Y, and Z axes, and in each array store indexes of all the original points in ascending order by that axis? That could facilitate a nearest-neighbor search. Also, you might be able to use an in-cube test, rather than an in-sphere test, since you're not hunting for the closest point, but only a nearby point.
Are you sure you actually need the first if-statement? Even if it skips one calculation when k equals i or j, checking it on every iteration is costly. Also, keep in mind that if N is not a constant, the compiler probably won't be able to unroll the for loops.
Although, if it's a Cell processor, the compiler might even try to vectorize the loops.
If the for loops compile to normal iterative loops, it could be an idea to make them compare with zero instead, as the decrement operation will often do the comparison for you when it hits zero.
for (i = 0; i < N; i++) {
...can become...
for (i = N; i != 0; i--) {
Although, if "i" is used as an index or as a variable in a calculation, you might see performance degrade because of cache misses.