Nested loop logic to skip the inner loop when its index equals the outer loop index.
Used an if statement within the inner loop to the effect of:
for (i=0;i<N;i++)
for (j=0;j<N;j++)
if (j!=i)
... some code
I believe this gives me the expected results but is there a less CPU consuming method that I may not be aware of?
If you can assume that N <= i, you can split the inner loop into 2 separate for loops to reduce the number of tests:
for (i = 0; i < N; i++) {
for (j = 0; j < i; j++) {
... some code
}
/* here we have j == i, skip this one */
j++;
for (; j < N; j++) {
... same code
}
}
This results in more code but half as many tests on j. Note however that if N is a constant, the compiler might unroll the original inner loop more efficiently. Careful benchmarking is the only way to determine if this solution is worth the effort for your problem, compiler and architecture.
For completeness, this code can be simplified as:
for (i = 0; i < N; i++) {
for (j = 0; j < i; j++) {
... some code
}
/* here we have j == i, skip this one */
while (++j < N) {
... same code
}
}
Related
double A[N] = { ... }, B[N] = { ... };
for (int i = 1; i < N; i++) {
A[i] = B[i] – A[i –1];
}
Why can't this loop be parallelized using the #pragma omp parallel for construct?
The reason is pretty plain in the code. To calculate A[i] you need to calculate A[i-1] first. There are N steps where every step depends on the previous step.
In general, a loop for(int i=1; i<N; i++) is suitable for parallelism if the loop would produce the exact same result if you changed the header to for(int i=N-1; i>0; i--)
Maybe that was a bit confusing. It has nothing to do with reverse order. The point is that you should be able to do each step individually.
So I have this segment of code that was given to me.
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++)
{
if (arr[j] < arr[i])
{
temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
}
}
I am trying to calculate the number of comparison operations that would occur if the code were to run.
There's the initial comparison all the way up to i=100. so there's 101 comparisons for the outer loop. The inner loop also has 101 loops, but that comparison within will only happen 100 times due to the j=100 will not have that comparison occurring.
I've made a tries but none of been the right answer so far.
I've had 101 x (101+100) = 20301 which is not the right answer.
I've searched for this on google and came up with a question identical to this but was answering how many assignment operations that occur which I was able to answer on my own. Which btw is 25201.
I got 20201.
#include <stdio.h>
int main(void) {
int i, j;
unsigned long count;
count = 0;
for (i = 0; ++count, i < 100; ++i) {
for (j = 0; ++count, j < 100; ++j) {
++count;
}
}
(void) printf("%lu\n", count);
return 0;
}
100 comparisons on the outer loop drive 101 + 100 comparisons on the inner loop. There is one more comparison on the outer loop to detect loop termination, so:
100 * (101 + 100) + 101 = 20201.
Instrumenting the program:
outer_cmps=0;
total_inner_cmps=0;
for (int i = 0; i < 100; i++) {
++outer_cmps;
inner_cmps=0;
for (int j = 0; j < 100; j++)
{
++inner_cmps;
if (arr[j] < arr[i])
{
temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
++inner_cmps;
}
++inner_cmps;
tota_inner_cmps += inner_cmps;
}
++outer_cmps;
total_cmps = outer_cmps + total_inner_cmps;
So that would be 100*200+100+1=20101
(100 times i, which runs the j loop 100 times, which performs 1 comparisson if (arr[j] < arr[i]) per loop, and one i loop that fails when i==100and 100 times j loop that fail when j==100)
void fun(int n, int arr[])
{
int i = 0, j = 0;
for(; i < n; ++i)
while(j < n && arr[i] < arr[j])
j++;
}
The answer given is: variable j is not initialized for each value of variable i, so time complexity is O(n)
I don't quite understand it. Can anyone explain?
See the difference between your function and this (this is in O(n2) time complexity) -
void fun(int n, int arr[])
{
int i = 0, j = 0;
for(; i < n; ++i)
{
j = 0;
while(j < n && arr[i] < arr[j])
j++;
}
}
In your function the variable j is not initialized for each value of variable i. So, the inner loop runs at most n times.
Edit 1- From j=0 to j=n is maximum n iterations of inner while loop. Since you never initialize j again once j=n the inner while loop will never iterate. So at maximum (it could be less depending on the second condition arr[i] < arr[j]) you have n iterations of inner while loop once. The outer for loop will obviously iterate n times . So you have n+n=2n and not n2 even in the worst case.
Edit 2- #Kerrek SB answer to this is spot-on -
"The code finishes when j has been incremented n times. j never gets decremented, and at most n attempts are made at incrementing it (via i)"
The outer loop is executed n times.
The inner loop only continues if j < n, and since j is incremented in each step, it cannot be executed more than n times in total.
Which of these optimizations is better and in what situation? Why?
Intuitively, I am getting the feeling that loop tiling will in general
be a better optimization.
What about for the below example?
Assume a cache which can only store about 20 elements in it at any time.
Original Loop:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
a[i] += a[i]*b[j];
}
}
Loop Interchange:
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 10; j++)
{
a[j] += a[j]*b[i];
}
}
Loop Tiling:
for(int k = 0; k < 1000; k += 20)
{
for(int i = 0; i < 10; i++)
{
for(int j = k; j < min(1000, k+20); j++)
{
a[i] += a[i]*b[j];
}
}
}
The first two cases you are exposing in your question are about the same. Things would really change in the following two cases:
CASE 1:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
b[i] += a[i]*a[j];
}
}
Here you are accessing the matrix "a" as follows: a[0]*a[0], a[0]*a1, a[0]*a[2],.... In most architectures, matrix structures are stored in memory like: a[0]*a[0], a1*a[0], a[2]*a[0] (first column of first row followed by second column of first raw,....). Imagine your cache only could store 5 elements and your matrix is 6x6. The first "pack" of elements that would be stored in cache would be a[0]*a[0] to a[4]*a[0]. Your first acces would cause no cache miss so a[0][0] is stored in cache but the second yes!! a0 is not stored in cache! Then the OS would bring to cache the pack of elements a0 to a4. Then you do the third acces: a[0]*a[2] wich is out of cache again. Another cache miss!
As you can colcude, case 1 is not a good solution for the problem. It causes lots of cache misses that we can avoid changing the code for the following:
CASE 2:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
b[i] += a[i]*a[j];
}
}
Here, as you can see, we are accessing the matrix as it's stored in memory. Consequently it's much better (faster) than case 1.
About the third code you posted about loop tiling, loop tiling and also loop unrolling are optimizations that in most cases the compiler does automaticaly. Here's a very interesting post in stackoverflow explaining these two techniques;
Hope it helps! (sorry about my english, I'm not a native speaker)
Helo, I'm a bit confused about the definition of an inner loop in the case of imperfectly nested loops. Consider this code
for (i = 0; i < n; ++i)
{
for (j = 0; j <= i - 1; ++j)
/*some statement*/
p[i] = 1.0 / sqrt (x);
for (j = i + 1; j < n; ++j)
{
x = a[i][j];
for (k = 0; k <= i - 1; ++k)
/*some statement*/
a[j][i] = x * p[i];
}
}
Here, we have two loops in the same nesting level. But, in the second loop which iterates over "j" starting from j+1, there is a again another nesting level. Considering the entire loop structure, which is the inner most loop in the code ?
Both j loops are nested inside i equally, k is the inner most loop
Lol I don't know how to explain this so i'll give it my best shot I recommend using a debugger! it may help you so much you won't even know
for (i = 0; i < n; ++i)
{
//Goes in here first.. i = 0..
for (j = 0; j <= i - 1; ++j) {
//Goes here second..
//Goes inside here and gets stuck until j is greater then (i- 1) (right now i = 0)
//So (i-1) = -1 so it does this only once.
/*some statement*/
p[i] = 1.0 / sqrt (x);
}
for (j = i + 1; j < n; ++j)
{
//Goes sixth here.. etc.. ..
//when this is done.. goes to loop for (i = 0; i < n; ++i)
//Goes here third and gets stuck
//j = i which is 0 + 1.. so, j == 1
//keeps looping inside this loop until j is greater then n.. idk what is n..
//Can stay here until it hits n.. which could be a while.
x = a[i][j];
for (k = 0; k <= i - 1; ++k) {
//Goes in here fourth until k > (i-1).. i is still 0..
//So (i-1) = -1 so it does this only once
/*some statement*/
a[j][i] = x * p[i];
}
//Goes here fifth.. which goes.... to this same loop!
}
}
I'd say that k is the inner-most loop, because if you count the number of loops required to reach it from the outside, it's three loops, and that's the most out of all four of the loops in your code.