I'm trying to determine the Big O notation of the following function
public static int f10(int n)
{
    return f1(n * n * n);
}
Where f1() is given by:
public static int f1(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            sum++;
        for (int j = 0; j < n; j++)
            sum++;
        for (int j = 0; j < n; j++)
            sum++;
    }
    return sum;
}
I can see that f1 is O(n^2), but when we call f10, does this become O(n^6), because the input is cubed before f1 is called?
I understand that the complexity of f1 does not change from its own perspective, but does it change from f10's perspective of n?
Let's analyse f1():
for (int i = 0; i < n; i++)     -> n iterations
    for (int j = 0; j < n; j++) -> O(n)
    for (int j = 0; j < n; j++) -> O(n)
    for (int j = 0; j < n; j++) -> O(n)
The three inner loops run one after another, so the body of the outer loop does 3n increments, which is O(n); n iterations of an O(n) body gives O(n^2).
So f1() is O(n^2): effectively just two levels of nested loops. But because f1() is called with an argument of size n^3, that makes f10() O((n^3)^2) = O(n^6).
However, the above complexity order is theoretical. In practice, the running time may depend on how you call f10() and on what optimizations the compiler makes. A smart compiler could replace f1() with a simple O(1) arithmetic expression.
Then, having reduced f1() to an expression, the compiler could replace a call like f10(42) with its result, doing all the calculation at compile time.
Do you see what I mean? How would you simplify f1() to a single O(1) expression?
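(Not from the original posts, just a sketch of the closed form being hinted at: each outer iteration of f1 adds 3n to sum, so f1(n) returns 3n^2 and f10(n) = f1(n^3) = 3n^6. In C, with long long to dodge the overflow the original int version would hit:)

#include <stdio.h>

/* Hypothetical closed forms; the names are mine, not from the question. */
static long long f1_closed(long long n)
{
    return 3 * n * n;            /* 3n increments per outer iteration, n outer iterations */
}

static long long f10_closed(long long n)
{
    return f1_closed(n * n * n); /* = 3 * n^6 */
}

int main(void)
{
    printf("%lld\n", f10_closed(4));  /* prints 12288, i.e. 3 * 4^6 */
    return 0;
}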
The complexity of f1 is always O(n^2); that's clear. However, the complexity of f10 is indeed O(n^6), because it calls f1 with an argument of size n^3. For the sake of simplicity, imagine that f1 were inlined. The body of f10 would then look like this:
public static int f10(int n)
{
    int m = n * n * n;   // the argument that gets passed to f1
    int sum = 0;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < m; j++)
            sum++;
        for (int j = 0; j < m; j++)
            sum++;
        for (int j = 0; j < m; j++)
            sum++;
    }
    return sum;
}
Now it is easy to deduce - two levels of nested loops, each with n^3 iterations -> O(n^6). If you are still not convinced, try to see how the running time of f10 increases with increased input:
n = 1 -> 3 iterations
n = 2 -> 8 * 3 * 8 = 3 * 2^6 iterations
n = 3 -> 27 * 3 * 27 = 3 * 3^6 iterations
....
n = k -> k^3 * 3 * k^3 = 3 * k^6 iterations
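(If you want to reproduce that table, here is a quick counter; it's my own sketch in C rather than code from the thread, and it simply counts how many times sum++ runs when f1 is fed n^3:)

#include <stdio.h>

/* Counts the sum++ operations f1 performs for a given argument. */
static long long f1_count(long long n)
{
    long long sum = 0;
    for (long long i = 0; i < n; i++) {
        for (long long j = 0; j < n; j++) sum++;
        for (long long j = 0; j < n; j++) sum++;
        for (long long j = 0; j < n; j++) sum++;
    }
    return sum;
}

int main(void)
{
    for (long long k = 1; k <= 5; k++)
        printf("n = %lld -> %lld iterations (3 * n^6 = %lld)\n",
               k, f1_count(k * k * k), 3 * k * k * k * k * k * k);
    return 0;
}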
I'm generally an R user, but I am trying to use C for some lower-level cumulative sums and multiplications.
I am trying to generate a cumulative sum of eta and store the result in tmp0. However, when I output tmp0 it gives me Inf, NaN, or some arbitrarily large number. I double-checked the same cumulative sum in R and it works fine, so I am not sure why C is not handling it. Below is the code that I am using:
int i,j;
const int p = ncov, n = nin;
double accNum0[n]; //accumulate first part of likelihood sum eta_i
double accNum1[n]; //accumulate the backwards numerator
double accNum2[n]; //accumulate the forward numerator (weighted)
double tmp0 = 0;
double eta[n]; //calculate linear predictor in this step (X %*% beta)
for(i = 0; i < n; i++) {
for (j = 0; j < p; j++)
eta[i] += b[j] * x[n * j + i];
}
for (i = 0; i < n; ++i) {
tmp0 += eta[i];
}
return (tmp0);
Again, I am fairly new to C so I may be making some rookie mistakes and would greatly appreciate any (and all) suggestions!
There might also be errors in how you are initializing b or x, but one definite error is that eta is used uninitialized. Automatic (local) arrays in C are not zero-initialized, so eta[i] may start with an arbitrary value instead of the 0 you are likely expecting, and accumulating into garbage easily produces Inf, NaN, or huge numbers.
Add an initialization before accumulating into it:
for(i = 0; i < n; i++) {
eta[i] = 0;
for (j = 0; j < p; j++)
eta[i] += b[j] * x[n * j + i];
}
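(For completeness, here is a self-contained sketch of the whole computation with the fix applied; the sizes and the contents of b and x are made up just so it runs, and memset is an equivalent alternative to the explicit zeroing loop:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    const int p = 2, n = 3;                 /* made-up sizes */
    double b[] = {0.5, 2.0};                /* made-up coefficients */
    double x[] = {1, 2, 3,                  /* column 0 (length n) */
                  4, 5, 6};                 /* column 1 */
    double eta[3];
    double tmp0 = 0;

    memset(eta, 0, sizeof eta);             /* zero eta before accumulating */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++)
            eta[i] += b[j] * x[n * j + i];  /* eta = X %*% beta */

    for (int i = 0; i < n; i++)
        tmp0 += eta[i];                     /* cumulative sum of eta */

    printf("tmp0 = %f\n", tmp0);            /* 0.5*(1+2+3) + 2.0*(4+5+6) = 33 */
    return 0;
}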
So I have this segment of code that was given to me.
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++)
{
if (arr[j] < arr[i])
{
temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
}
}
I am trying to calculate the number of comparison operations that would occur if the code were to run.
The outer loop's condition is evaluated for every value of i up to and including i = 100, so that's 101 comparisons for the outer loop. The inner loop's condition is likewise evaluated 101 times per outer iteration, but the comparison inside the if only happens 100 times, because the pass where j = 100 never reaches the loop body.
I've made a few attempts but none of them have been the right answer so far.
I tried 101 x (101 + 100) = 20301, which is not the right answer.
I searched Google and found a question identical to this one, but it asked how many assignment operations occur, which I was able to answer on my own (it's 25201, by the way).
I got 20201.
#include <stdio.h>

int main(void) {
    int i, j;
    unsigned long count;

    count = 0;
    /* The comma operator bumps count every time a loop condition is evaluated,
       including the final evaluation that makes each loop terminate. */
    for (i = 0; ++count, i < 100; ++i) {
        for (j = 0; ++count, j < 100; ++j) {
            ++count;   /* stands in for the arr[j] < arr[i] comparison */
        }
    }
    (void) printf("%lu\n", count);
    return 0;
}
100 successful comparisons on the outer loop each drive 101 + 100 comparisons on the inner loop (101 evaluations of j < 100 plus 100 evaluations of arr[j] < arr[i]). Add those 100 outer comparisons plus one more that fails and terminates the outer loop, and you get:
100 * (101 + 100) + 101 = 20201.
Instrumenting the program:
/* Counters declared here so the snippet compiles as-is. */
int outer_cmps = 0;
int total_inner_cmps = 0;
int inner_cmps;
int total_cmps;

for (int i = 0; i < 100; i++) {
    ++outer_cmps;                 /* i < 100 evaluated true */
    inner_cmps = 0;
    for (int j = 0; j < 100; j++) {
        ++inner_cmps;             /* j < 100 evaluated true */
        if (arr[j] < arr[i]) {
            temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
        ++inner_cmps;             /* the arr[j] < arr[i] comparison */
    }
    ++inner_cmps;                 /* final j < 100 evaluated false */
    total_inner_cmps += inner_cmps;
}
++outer_cmps;                     /* final i < 100 evaluated false */
total_cmps = outer_cmps + total_inner_cmps;
So that would be 100 * 202 + 1 = 20201.
(Each of the 100 passes of the i loop costs one i < 100 comparison, 100 successful j < 100 comparisons, 100 if (arr[j] < arr[i]) comparisons, and one j < 100 comparison that fails when j == 100, i.e. 202 in total; the final +1 is the i < 100 comparison that fails when i == 100.)
I have an array of 768 elements, but right now only 256 samples (indices 0 to 255) are filled in. I want to spread those values over the whole array by repeating each one three times, i.e.:
[1][2][3] -> [1][1][1][2][2][2][3][3][3]
How can I do that? Is there a library function that can do this?
I don't know of a standard library function that does this.
If you want to do it in place, go from right to left (tail to head), so you never overwrite a source value before you have copied it:
int i, j;
for (i = 255, j = 767; i >= 0; i--) {
    for (int k = 0; k < 3; k++) {
        array[j--] = array[i];
    }
}
If you don't need to do it in-place, this would suffice:
for (int i = 0, j = 0; i < 256; i++) {
for (int k = 0; k < 3; k++)
new_array[j++] = array[i];
}
You can do this:
for (int i = 0; i < 768; ++i)
new_array[i] = array[i/3];
The index on the right-hand side of the assignment changes only every three steps, so each source element is copied three times.
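(Neither answer included a test, so here is a small self-contained check of both approaches on the 3-element example from the question, scaled to a 9-element destination:)

#include <stdio.h>

int main(void)
{
    int array[9] = {1, 2, 3};   /* only the first 3 slots hold samples */
    int new_array[9];

    /* Out-of-place version using the index/3 trick; done first,
       since both versions read the original samples. */
    for (int i = 0; i < 9; ++i)
        new_array[i] = array[i / 3];

    /* In-place expansion, right to left (3 samples -> 9 slots). */
    int j = 8;
    for (int i = 2; i >= 0; i--)
        for (int k = 0; k < 3; k++)
            array[j--] = array[i];

    for (int i = 0; i < 9; i++)
        printf("%d %d\n", array[i], new_array[i]);   /* both columns: 1 1 1 2 2 2 3 3 3 */
    return 0;
}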
Which of these optimizations is better, and in what situation? Why?
Intuitively, I get the feeling that loop tiling will in general be the better optimization.
What about for the example below?
Assume a cache which can only hold about 20 elements at any time.
Original Loop:
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 1000; j++)
{
a[i] += a[i]*b[j];
}
}
Loop Interchange:
for(int i = 0; i < 1000; i++)
{
for(int j = 0; j < 10; j++)
{
a[j] += a[j]*b[i];
}
}
Loop Tiling:
for(int k = 0; k < 1000; k += 20)
{
for(int i = 0; i < 10; i++)
{
for(int j = k; j < min(1000, k+20); j++)
{
a[i] += a[i]*b[j];
}
}
}
The first two versions you show in your question behave about the same. Where the choice of loop order really matters is when the access pattern has to stride through a two-dimensional matrix, as in the following two cases:
CASE 1:
for(int i = 0; i < 1000; i++)
{
    for(int j = 0; j < 1000; j++)
    {
        b[i] += a[j][i];
    }
}
Here the inner loop accesses the matrix "a" as a[0][0], a[1][0], a[2][0], ... In C, however, a matrix is stored in memory row by row: a[0][0], a[0][1], a[0][2], ..., a[1][0], a[1][1], ... Imagine your cache could only store 5 elements and your matrix were 6x6. The first "pack" of elements loaded into the cache would be a[0][0] to a[0][4]. The access to a[0][0] is served from that pack, but the next access, a[1][0], is not in it, so another pack (a[1][0] to a[1][4]) has to be loaded. The third access, a[2][0], is out of cache again. Another cache miss!
As you can conclude, CASE 1 is not a good way to write this loop: it causes lots of cache misses that we can avoid simply by interchanging the two loops:
CASE 2:
for(int j = 0; j < 1000; j++)
{
    for(int i = 0; i < 1000; i++)
    {
        b[i] += a[j][i];
    }
}
Here, as you can see, the inner loop now reads a[j][0], a[j][1], a[j][2], ..., i.e. the matrix is accessed in exactly the order it is stored in memory. Consequently it's much better (faster) than CASE 1, even though it computes the same result.
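(To see the difference yourself, here is a rough timing sketch of CASE 1 versus CASE 2; the matrix size, the clock() timing, and the function names are my own choices, not from the answer, and the gap is most visible when you compile without aggressive optimization:)

#include <stdio.h>
#include <time.h>

#define N 2000
static double a[N][N], b[N];

/* CASE 1: the inner loop strides down columns of a. */
static void case1(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            b[i] += a[j][i];
}

/* CASE 2: loops interchanged, so the inner loop walks a row by row. */
static void case2(void)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            b[i] += a[j][i];
}

int main(void)
{
    clock_t t;

    t = clock();
    case1();
    printf("CASE 1: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    t = clock();
    case2();
    printf("CASE 2: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);
    return 0;
}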
As for the third piece of code you posted: loop tiling, and also loop unrolling, are optimizations that in most cases the compiler applies automatically. There is a very interesting post on Stack Overflow explaining these two techniques.
Hope it helps! (Sorry about my English, I'm not a native speaker.)
Hello, I'm a bit confused about the definition of an inner loop in the case of imperfectly nested loops. Consider this code:
for (i = 0; i < n; ++i)
{
for (j = 0; j <= i - 1; ++j)
/*some statement*/
p[i] = 1.0 / sqrt (x);
for (j = i + 1; j < n; ++j)
{
x = a[i][j];
for (k = 0; k <= i - 1; ++k)
/*some statement*/
a[j][i] = x * p[i];
}
}
Here, we have two loops at the same nesting level. But the second loop, the one that iterates over j starting from i + 1, contains yet another nesting level. Considering the entire loop structure, which is the innermost loop in this code?
Both j loops are nested equally inside the i loop; k is the innermost loop.
I don't know how to explain this well, so I'll give it my best shot. I also recommend stepping through the code with a debugger; it will help you more than you might expect.
for (i = 0; i < n; ++i)
{
    // Goes in here first.. i = 0..
    for (j = 0; j <= i - 1; ++j) {
        // Goes here second..
        // Stays inside here until j is greater than (i - 1) (right now i = 0).
        // Since (i - 1) = -1, the condition j <= -1 is false straight away,
        // so on this first pass the body doesn't run at all.
        /*some statement*/
        p[i] = 1.0 / sqrt (x);
    }
    for (j = i + 1; j < n; ++j)
    {
        // Goes here third and stays here..
        // j = i + 1, which is 0 + 1, so j == 1.
        // Keeps looping until j reaches n.. and that could take a while,
        // depending on how big n is.
        x = a[i][j];
        for (k = 0; k <= i - 1; ++k) {
            // Goes in here fourth, until k > (i - 1).. i is still 0..
            // Again (i - 1) = -1, so this body doesn't run at all on the first pass.
            /*some statement*/
            a[j][i] = x * p[i];
        }
        // Goes here fifth.. which loops back to this same j loop (sixth)..
        // When this j loop is finally done, control goes back to for (i = 0; i < n; ++i).
    }
}
I'd say that k is the innermost loop: counting the loops you have to pass through to reach it from the outside gives three (i, the second j loop, then k itself), which is the deepest nesting among the four loops in your code.