Why does this nested loop have O(n) time complexity? - c

I have a test in computer science about complexity and I have this question:
int counter = 0;
for (int i = 2; i < n; ++i) {
    for (int j = 1; j < n; j = j * i) {
        counter++;
    }
}
My solution is O(n log n): the outer loop runs n-2 times and the inner loop runs about log base i of n times, so the total is (n-2) * log n, which is O(n log n).
But my teacher told us it's O(n), and when I ran it in CLion the counter came out at about 2*n, which is O(n). Can someone explain why it is O(n)?

Empirically, you can see that this is correct (that's around the right value for the sum of the series), for n=100 and n=1,000
If you want more intuition, you can think about the fact that for nearly all of the series, i > sqrt(n).
For example, if n = 100 then 90% of values have i > 10, and for n = 1,000 97% have i > 32.
From that point onwards, all iterations of the outer loop will have at most 2 iterations in the inner loop (since log(n) with base sqrt(n) is 2, by definition).
If n grows really large, you can also apply the same logic to show that from the cube root to the square root, log is between 2 and 3, and so on.

This would be O(nlogn) if j were incremented by i each iteration, not multiplied by it. As it is now, the number of inner-loop iterations grows much more slowly than n, which is why your teacher says O(n) and why the count you got in CLion points to O(n).

Note that it's j=j*i, not j=j*2. That means that for most values of i, the inner loop will only have two passes (j = 1 and j = i), because i*i already reaches n. For example, with n of 33, the inner loop will only have two passes when i is in [6,33).
n = 33
j = 32
j = 16
j =  8 27
j =  4  9 16 25
j =  2  3  4  5  6  7  8  9 10 11 ... 31 32
j =  1  1  1  1  1  1  1  1  1  1 ...  1  1
--------------------------------------------
i =  2  3  4  5  6  7  8  9 10 11 ... 31 32
If you think of the above as a graph, it looks like the complexity of algorithm is O( area under 1/log(n) ). I have no idea how to prove that, and calculating that integral involves the unfamiliar-to-me logarithmic integral function. But the Wikipedia page does say this function is O( n / log n ).
Let's do it experimentally.
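#include <stdio.h>

int main( void ) {
   for ( int n = 20; n <= 20000; ++n ) {
      int counter = 0;
      for ( int i = 2; i < n; ++i ) {
         for ( int j = 1; j < n; j *= i ) {
            ++counter;
         }
      }

      // Cast to double: counter / (n-1) alone would be an integer division,
      // and passing an int for %.3f is undefined behavior.
      if ( n % 1000 == 0 )
         printf( "%d: %.3f\n", n, (double)counter / (n-1) );
   }
}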
1000: 2.047
2000: 2.033
3000: 2.027
4000: 2.023
5000: 2.021
6000: 2.019
7000: 2.017
8000: 2.016
9000: 2.015
10000: 2.014
11000: 2.013
12000: 2.013
13000: 2.012
14000: 2.012
15000: 2.011
16000: 2.011
17000: 2.011
18000: 2.010
19000: 2.010
20000: 2.010
So counter is about 2n plus a little, and the extra little shrinks relative to n as n grows. So it's definitely not O( n log n ). The total looks like 2n + n / f(n), where f() produces some number ≥1. It could be that f(n) = log n, but that's pure speculation.
Whatever f(n) is, n / f(n) is at most n, so the total is at most 3n. So we can call this O( n ).

For some value of i, j will go like
1 i^1 i^2 i^3 ....
So the number of times the inner loop needs to execute is found like
log_i(n)
which would lead to the following:
log_2(n) + log_3(n) + log_4(n) + ....
But there is the stop condition j < n, which needs to be considered.
Now consider n as a number that can be written as m^2. As soon as i reaches the value m, all remaining inner loops will only run for j equal to 1 and j equal to i (because i^2 will be greater than or equal to n). In other words, there will only be 2 executions of the inner loop body for each such i.
So the total number of iterations will be:
2 * (m^2 - m) + number_of_iteration(i=2:m)
Now divide that by n which is m^2:
(2 * (m^2 - m) + number_of_iteration(i=2:m)) / m^2
gives
2 * (1 -1/m) + number_of_iteration(i=2:m) / m^2
The first part, 2 * (1 - 1/m), clearly goes towards 2 as m goes to infinity.
The second part is (at worst):
(log_2(n) + log_3(n) + log_4(n) + ... + log_m(n)) / m^2
or
(log_2(n) + log_3(n) + log_4(n) + ... + log_m(n)) / n
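The numerator has fewer than m terms and each is at most log_2(n), so the expression is less than m * log_2(n) / m^2 = 2 * log_2(m) / m. As log(x)/x goes towards zero as x goes towards infinity, the above expression will also go towards zero.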
So the full expression:
(2 * (m^2 - m) + number_of_iteration(i=2:m)) / m^2
will go towards 2 as m goes towards infinity.
In other words: The total number of iterations divided by n will go towards 2. Consequently we have O(n).
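As a rough check of the split used above, here is a minimal sketch that counts the passes directly. The function names `count_total` and `count_small` are made up for this illustration, and `long long` is used only so that j * i cannot overflow for larger n.

```c
/* Total inner-loop passes of the loop from the question for a given n. */
long long count_total(long long n) {
    long long counter = 0;
    for (long long i = 2; i < n; ++i)
        for (long long j = 1; j < n; j *= i)
            ++counter;
    return counter;
}

/* Passes contributed only by the "small" values 2 <= i < m. */
long long count_small(long long n, long long m) {
    long long counter = 0;
    for (long long i = 2; i < m; ++i)
        for (long long j = 1; j < n; j *= i)
            ++counter;
    return counter;
}
```

For n = m^2, every i from m to n-1 contributes exactly 2 passes, so count_total(n) should equal 2 * (n - m) + count_small(n, m), and count_total(n) / n should settle near 2 as n grows.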

Related

The time complexity answer to the question confuses me - n^(2/3)

I'm trying to figure out why the time complexity of this code is n^(2/3). The space complexity is log n, but I don't know how to continue the time complexity calculation (or if it's right).
int g2 (int n, int m)
{
    if (m >= n)
    {
        for (int i = 0; i < n; ++i)
            printf("#");
        return 1;
    }
    return 1 + g2 (n / 2, 4 * m);
}
int main (int n)
{
    return g2 (n, 1);
}
As long as m < n, you perform an O(1) operation: making a recursive call. You halve n and quadruple m, so after k steps, you get
n(k) = n(0) * 0.5^k
m(k) = m(0) * 4^k
You can set them equal to each other to find that
n(0) / m(0) = 8^k
Taking the log
log(n(0)) - log(m(0)) = k log(8)
or
k = log_8(n(0)) - log_8(m(0))
On the kth recursion you perform n(k) loop iterations.
You can plug k back into n(k) = n(0) * 0.5^k to estimate the number of iterations. Let's ignore m(0) for now:
n(k) = n(0) * 0.5^log_8(n(0))
Taking again the log of both sides,
log_8(n(k)) = log_8(n(0)) + log_8(0.5) * log_8(n(0))
Since log_8(0.5) = -1/3, you get
log_8(n(k)) = log_8(n(0)) * (2/3)
Taking the exponent again:
n(k) = n(0)^(2/3)
Since any positive exponent will overwhelm the O(log(n)) recursion, your final complexity is indeed O(n^(2/3)).
Let's look for a moment what happens if m(0) > 1.
n(k) = n(0) * 0.5^(log_8(n(0)) - log_8(m(0)))
Again taking the log:
log_8(n(k)) = log_8(n(0)) - 1/3 * (log_8(n(0)) - log_8(m(0)))
log_8(n(k)) = log_8(n(0)^(2/3)) + log_8(m(0)^(1/3))
So you get
n(k) = n(0)^(2/3) * m(0)^(1/3)
Or
n(k) = (m n^2)^(1/3)
Quick note on corner cases in the starting conditions:
For m > 0:
If n <= 0, n <= m is immediately true, so the recursion terminates and the loop does not run.
For m < 0:
If n <= m, the recursion terminates immediately and there is no loop. If n > m, n will converge to zero while m diverges, and the algorithm will run forever.
The only interesting case is where m == 0. Regardless of whether n is positive or negative, it will reach zero because of integer truncation, so the complexity depends on when it reaches 1:
n(0) * 0.5^k = 1
log_2(n(0)) - k = 0
So in this case, the runtime of the recursion is still O(log(n)). The loop does not run.
m starts at 1, and at each step n -> n/2 and m -> m*4 until m>n. After k steps, n_final = n/2^k and m_final = 4^k. So the final value of k is where n/2^k = 4^k, or k = log8(n).
When this is reached, the inner loop performs n_final (approximately equal to m_final) steps, leading to a complexity of O(4^k) = O(4^log8(n)) = O(4^(log4(n)/log4(8))) = O(n^(1/log4(8))) = O(n^(2/3)).
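The n^(2/3) result derived above can be checked with a small sketch. It is an iterative rewrite of g2 that returns the number of times the final for loop would run instead of printing; the name `g2_loop_iterations` is an assumption made for this illustration.

```c
/* Iterative version of g2 from the question: returns how many times the
   final for loop would run, instead of printing '#'. */
long long g2_loop_iterations(long long n, long long m) {
    while (m < n) {   /* each recursive call halves n and quadruples m */
        n /= 2;
        m *= 4;
    }
    return n > 0 ? n : 0;   /* the loop runs n times once m >= n */
}
```

When n is a power of 8 and m starts at 1, the result should be exactly n^(2/3); for example, n = 512 ends at n = 64, m = 64.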

When is the sum of series 1+1/2+1/3+1/4+......+1/n=log n and when is the same sum equal to n, i.e. 1+1/2+1/3+1/4+......+1/n=n

I'm new to understanding asymptotic analysis, while trying to find the big O notation, in a few problems it is given as log n for the same simplification of series and n for another problem.
Here are the questions:
int fun(int n)
{
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count++;
    return count;
}
T(n)=O(n)
int fun2(int n)
{
    int count = 0;
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n; j += i)
            count++;
    return count;
}
T(n)=O(n log n)
I'm really confused. Why are the complexities of these seemingly similar algorithms different?
The series formed in the two cases are different.
Time Complexity Analysis
In the first function, i starts at n and the loop for j runs n times, then i is n/2 and the loop runs n/2 times, and so on. So the time complexity will be
= n + n/2 + n/4 + n/8.......
This sum is at most 2n-1 (exactly 2n-1 when n is a power of 2), hence the time complexity is O(n).
In the second function, when i is 1 we loop over j n times; next, i is 2 and we skip one entry at a time, which means we iterate n/2 times, and so on. So the time complexity will be
= n + n/2 + n/3 + n/4........
= n (1 + 1/2 + 1/3 + 1/4 +....)
= O(nlogn)
The sum of 1 + 1/2 + 1/3... is O(logn). For solution see.
For the former, the inner loop runs approximately (exactly if n is a power of 2)
n + n/2 + n/4 + n/8 + ... + n/2^log2(n)
times. It can be factored into
n * (1 + 1/2 + 1/4 + 1/8 + ... + (1/2)^(log2 n))
The 2nd factor is called (a partial sum of) the geometric series which converges, meaning that as we approach the infinity it will approach a constant. Therefore it is θ(1); when you multiply this by n you get θ(n)
I've made an analysis of the latter algorithm just a couple days ago. The number of iterations for n in that algorithm are
ceil(n) + ceil(n / 2) + ceil(n/3) + ... + ceil(n/n)
It is quite close to a partial sum of the harmonic series multiplied by n:
n * (1 + 1/2 + 1/3 + 1/4 + ... 1/n)
Unlike the geometric series the harmonic series does not converge, but it diverges as we add more terms. The partial sums of first n terms can be bounded above and below by ln n + C, hence the time complexity of the entire algorithm is θ(n log n).
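To see the two series side by side, here is a small sketch that reuses the two functions from the question (with i and j declared in fun2) so the counts can be compared against the closed forms discussed above.

```c
/* Geometric case: n + n/2 + n/4 + ... inner iterations. */
int fun(int n) {
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count++;
    return count;
}

/* Harmonic case: ceil(n/1) + ceil(n/2) + ... + ceil(n/(n-1)) inner iterations. */
int fun2(int n) {
    int count = 0;
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n; j += i)
            count++;
    return count;
}
```

For n = 1024 (a power of 2), fun returns 2n - 1 = 2047. For n = 8, fun2 returns 8 + 4 + 3 + 2 + 2 + 2 + 2 = 23.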

Time complexity finding n primes with trial division by all preceding primes

Problem : Finding n prime numbers.
#include<stdio.h>
#include<stdlib.h>
void firstnprimes(int *a, int n){
    if (n < 1){
        printf("INVALID");
        return;
    }
    int i = 0, j, k;                 // i is the primes counter
    for (j = 2; i != n; j++){        // j is a candidate number
        for (k = 0; k < i; k++)
        {
            if (j % a[k] == 0)       // a[k] is k-th prime
                break;
        }
        if (k == i)                  // end-of-loop was reached
            a[i++] = j;              // record the i-th prime, j
    }
    return;
}
int main(){
    int n;
    scanf_s("%d",&n);
    int *a = (int *)malloc(n*sizeof(int));
    firstnprimes(a,n);
    for (int i = 0; i < n; i++)
        printf("%d\n",a[i]);
    system("pause");
    return 0;
}
My function's inner loop runs for i times (at the most), where i is the number of prime numbers below a given candidate number, and the outer loop runs for (nth prime number - 2) times.
How can I derive the complexity of this algorithm in Big O notation?
Thanks in advance.
In pseudocode your code is
firstnprimes(n) = a[:n] # array a's first n entries
where
i = 0
a = [j for j in [2..]
if is_empty( [j for p in a[:i] if (j%p == 0)] )
&& (++i) ]
(assuming the short-circuiting is_empty which returns false as soon as the list is discovered to be non-empty).
What it does is testing each candidate number from 2 and up by all its preceding primes.
Melissa O'Neill analyzes this algorithm in her widely known JFP article and derives its complexity as O( n^2 ).
Basically, each of the n primes that are produced is paired up with (is tested by) all the primes preceding it (i.e. k-1 primes, for the k th prime) and the sum of the arithmetic progression 0...(n-1) is (n-1)n/2 which is O( n^2 ); and she shows that composites do not contribute any term which is more significant than that to the overall sum, as there are O(n log n) composites on the way to n th prime but the is_empty calculation fails early for them.
Here's how it goes: with m = n log n, there will be m/2 evens, for each of which the is_empty calculation takes just 1 step; m/3 multiples of 3 with 2 steps; m/5 with 3 steps; etc.
So the total contribution of the composites, overestimated by not dealing with the multiplicities (basically, counting 15 twice, as a multiple of both 3 and 5, etc.), is:
SUM{i = 1, ..., n} (i m / p_i) // p_i is the i-th prime
= m SUM{i = 1, ..., n} (i / p_i)
= n log(n) SUM{i = 1, ..., n} (i / p_i)
< n log(n) (n / log(n)) // for n > 14,000
= n^2
The inequality can be tested at Wolfram Alpha cloud sandbox as Sum[ i/Prime[i], {i, 14000}] Log[14000.0] / 14000.0 (which is 0.99921, and diminishing for bigger n, tested up to n = 2,000,000 where it's 0.963554).
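The O( n^2 ) estimate can also be probed by counting the modulo operations directly. The sketch below is the loop from the question with a counter added; the name `count_trial_divisions` is made up for this illustration, and it returns -1 if the allocation fails.

```c
#include <stdlib.h>

/* Counts the j % a[k] operations performed while finding the first n primes
   by trial division with all preceding primes. */
long long count_trial_divisions(int n) {
    if (n < 1) return 0;
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    long long divisions = 0;
    int i = 0;                          /* number of primes found so far */
    for (int j = 2; i != n; j++) {      /* j is a candidate number */
        int k;
        for (k = 0; k < i; k++) {
            ++divisions;
            if (j % a[k] == 0) break;   /* composite: stop early */
        }
        if (k == i) a[i++] = j;         /* no divisor found: j is prime */
    }
    free(a);
    return divisions;
}
```

The primes alone account for 0 + 1 + ... + (n-1) = n(n-1)/2 of these operations, so the count is always at least that; the rest comes from the composites that fail early.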
The prime number theorem states that the number of primes less than n is asymptotically n / log n. Therefore, your inner loop will run at most on the order of i * max = n / log n * n times (assuming max = n).
Also, your outer loop runs on the order of n log n times, which bounds the total complexity by O(n / log n * n * n log n) = O(n^3). This is only a loose upper bound, because the inner loop breaks early for most composite candidates. In other words, this is not the most efficient algorithm.
Note that there are better approximations around; e.g. the n-th prime number is closer to:
n log n + n log log n - n + n log log n / log n + ...
But, since you are concerned with just big O, this approximation is good enough.
Also, there are much better algorithms for doing what you're looking to do. Look up the topic of pseudoprimes, for more information.

Sum of the sums of divisors of numbers less than or equal to N

I really need some help with this problem:
Given a positive integer N, we define xsum(N) as the sum of the sums of all positive divisors of each number less than or equal to N.
For example: xsum(6) = 1 + (1 + 2) + (1 + 3) + (1 + 2 + 4) + (1 + 5) + (1 + 2 + 3 + 6) = 33.
(xsum = sum of divisors of 1 + sum of divisors of 2 + ... + sum of divisors of 6)
Given a positive integer K, you are asked to find the lowest N that satisfies the condition: xsum(N) >= K
K is a nonzero natural number that has at most 14 digits
time limit : 0.2 sec
Obviously, brute force will fail most cases with Time Limit Exceeded. I haven't found anything better yet, so here's the code:
fscanf(fi,"%lld",&k);
i=2;
sum=1;
while(sum<k) {
    sum=sum+i+1;
    d=2;
    while(d*d<=i) {
        if(i%d==0 && d*d!=i)
            sum=sum+d+i/d;
        else
            if(d*d==i)
                sum+=d;
        d++;
    }
    i++;
}
Any better ideas?
For each number n in range [1 , N] the following applies: n is divisor of exactly roundDown(N / n) numbers in range [1 , N]. Thus for each n we add a total of n * roundDown(N / n) to the result.
long long xsum(long long N){
    long long result = 0;//long long, since K can have up to 14 digits
    for(long long i = 1 ; i <= N ; i++)
        result += (N / i) * i;//due to the integer division the two i don't cancel out
    return result;
}
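The counting argument can be checked against a direct divisor sum. This is only a sketch; `xsum_brute` and `xsum_fast` are names chosen for this illustration, and the brute-force version is far too slow for the real limits.

```c
/* Brute force: add up every divisor of every number 1..N. */
long long xsum_brute(long long N) {
    long long result = 0;
    for (long long x = 1; x <= N; x++)
        for (long long d = 1; d <= x; d++)
            if (x % d == 0)
                result += d;
    return result;
}

/* Counting argument: d divides exactly N / d of the numbers 1..N. */
long long xsum_fast(long long N) {
    long long result = 0;
    for (long long d = 1; d <= N; d++)
        result += (N / d) * d;
    return result;
}
```

Both give 33 for N = 6, matching the example in the question.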
The idea behind this algorithm can as well be used to solve the main problem (smallest N such that xsum(N) >= K) faster than a brute-force search.
The complete search can be further optimized using a rule we can derive from the above code: each term (N / i) * i is at most N, so xsum(N) <= N * N, and therefore minN >= sqrt(K). This gives us a lower bound for starting the search.
The next step is to search for an upper bound. Since the growth of xsum(N) is (approximately) quadratic, we can use this to approximate N. This optimized guessing finds the searched value pretty fast.
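//requires #include <math.h> for sqrt
long long N(long long K){
    //start with the minimum-bound of N: xsum(N) <= N * N, so N >= sqrt(K)
    long long upperN = (long long) sqrt((double) K);
    long long lowerN = upperN;
    long long tmpSum;
    //search until xsum(upperN) reaches K
    while((tmpSum = xsum(upperN)) < K){
        long long r = K - tmpSum;
        lowerN = upperN;
        upperN += (long long) sqrt((double) (r / 3)) + 1;
    }
    //Now we have an upper and a lower bound for N. xsum is increasing,
    //so binary-search for the smallest N with xsum(N) >= K
    while(lowerN < upperN){
        long long mid = lowerN + (upperN - lowerN) / 2;
        if(xsum(mid) >= K)
            upperN = mid;
        else
            lowerN = mid + 1;
    }
    return upperN;
}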

Need explanation for this code (algorithm)

The problem:
Larry is very bad at math - he usually uses a calculator, which worked well throughout college. Unfortunately, he is now stuck on a deserted island with his good buddy Ryan after a snowboarding accident. They're now trying to spend some time figuring out some good problems, and Ryan will eat Larry if he cannot answer, so his fate is up to you!
It's a very simple problem - given a number N, how many ways can K numbers less than N add up to N?
For example, for N = 20 and K = 2, there are 21 ways:
0+20
1+19
2+18
3+17
4+16
5+15
...
18+2
19+1
20+0
Input
Each line will contain a pair of numbers N and K. N and K will both be an integer from 1 to 100, inclusive. The input will terminate on 2 0's.
Output
Since Larry is only interested in the last few digits of the answer, for each pair of numbers N and K, print a single number mod 1,000,000 on a single line.
Sample Input
20 2
20 2
0 0
Sample Output
21
21
The solution code:
#include<iostream>
#include<stdlib.h>
#include<stdio.h>
using namespace std;
#define maxn 100
typedef long ss;
ss T[maxn+2][maxn+2];
void Gen() {
    ss i, j;
    for(i = 0; i<= maxn; i++)
        T[1][i] = 1;
    for(i = 2; i<= 100; i++) {
        T[i][0] = 1;
        for(j = 1; j <= 100; j++)
            T[i][j] = (T[i][j-1] + T[i-1][j]) % 1000000;
    }
}
int main() {
    //freopen("in.txt", "r", stdin);
    ss n, m;
    Gen();
    while(cin>>n>>m) {
        if(!n && !m) break;
        cout<<T[m][n]<<endl;
    }
    return 0;
}
How has this calculation been derived?
How has it come T[i][j] = (T[i][j-1] + T[i-1][j]) ?
Note: I only use n and k (lower case) to refer to some anonymous variable. I will always use N and K (upper case) to refer to N and K as defined in the question (sum and the number of portions).
Let C(n, k) be the result of n choose k, then the solution to the problem is C(N + K - 1, K - 1), with the assumption that those K numbers are non-negative (or there will be infinitely many solutions even for N = 0 and K = 2).
Since the K numbers are non-negative, and the sum N is fixed, we can think of the problem as: how many ways are there to divide N candies among K people? We can divide the candies by laying them out in a line and putting (K - 1) separators between them. The (K - 1) separators divide the candies into K portions. From another perspective, it is also like choosing (K - 1) positions among (N + K - 1) positions to put the separators in; the rest of the positions are candies. This explains why the number of ways is N + (K - 1) choose (K - 1).
Then the problem reduces to how to find the least significant digits of C(n, k). (Since the maximum of N and K is 100, as defined in maxn, we don't have to worry even if the algorithm goes up to O(n^3).)
The calculation uses the combinatorial identity C(n, k) = C(n - 1, k) + C(n - 1, k - 1) (Pascal's rule). The clever thing about the implementation is that it doesn't store C(n, k) (a table of binomial coefficients, which is a jagged array), but stores the answer for each (N, K) directly instead. The identity is actually present in T[i][j] = (T[i][j-1] + T[i-1][j]):
The first dimension is actually K, the number of portions. And the second dimension is the sum N. T[K][N] will directly store the result, and according to the mathematical result derived above, is (least significant digits of) C(N + K - 1, K - 1).
Re-writing the T[i][j] = (T[i][j-1] + T[i-1][j]) back to equivalent mathematical result:
C(i + j - 1, i - 1) = C(i + j - 2, i - 1) + C(i + j - 2, i - 2), which is correct according to the identity.
The program will fill the array row by row:
The row K = 0 is already initialized to 0, using the fact that static array is initialized to 0.
It fills the row K = 1 with 1 (there is only 1 way to divide N into 1 portion).
For the rest of the rows, it sets the case N = 0 to 1 (there is only 1 way to divide 0 into K parts - all parts are 0).
Then the rest are filled with the expression T[i][j] = (T[i][j-1] + T[i-1][j]), which refers to the previous row and to the previous element of the same row, both of which have been filled in earlier iterations.
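The claim that T[K][N] holds C(N + K - 1, K - 1) modulo 1000000 can be checked mechanically. The sketch below fills T the same way Gen() does and compares it with a plain Pascal's triangle; the names `build_tables`, `tables_agree`, `MAXN` and `MOD` are assumptions made for this illustration.

```c
#define MAXN 100
#define MOD 1000000L

static long T[MAXN + 2][MAXN + 2];          /* T[K][N], as in the solution */
static long C[2 * MAXN + 1][2 * MAXN + 1];  /* Pascal's triangle mod MOD */

/* Fills T exactly as Gen() does, and C using Pascal's rule. */
void build_tables(void) {
    for (int j = 0; j <= MAXN; j++)
        T[1][j] = 1;
    for (int i = 2; i <= MAXN; i++) {
        T[i][0] = 1;
        for (int j = 1; j <= MAXN; j++)
            T[i][j] = (T[i][j - 1] + T[i - 1][j]) % MOD;
    }
    for (int n = 0; n <= 2 * MAXN; n++) {
        C[n][0] = 1;
        for (int k = 1; k <= n; k++)   /* C[n-1][n] is 0 from static init */
            C[n][k] = (C[n - 1][k - 1] + C[n - 1][k]) % MOD;
    }
}

/* Returns 1 if T[K][N] == C(N + K - 1, K - 1) mod MOD for all 1 <= N, K <= MAXN. */
int tables_agree(void) {
    for (int K = 1; K <= MAXN; K++)
        for (int N = 1; N <= MAXN; N++)
            if (T[K][N] != C[N + K - 1][K - 1])
                return 0;
    return 1;
}
```

After build_tables(), T[2][20] is 21, the sample answer, and tables_agree() should return 1.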
Let C(x, y) be the result of x choose y; then the value of T[i][j] equals C(i - 1 + j, j).
You can prove this by induction.
Base cases:
T[1][j] = C(1 - 1 + j, j) = C(j, j) = 1
T[i][0] = C(i - 1, 0) = 1
For the induction step, use the formula (for 0<=y<=x):
C(x,y) = C(x - 1, y - 1) + C(x - 1, y)
Therefore:
C(i - 1 + j, j) = C(i-1+j - 1, j - 1) + C(i-1+j - 1, j) = C(i-1+(j-1), (j-1)) + C((i-1)-1+j, j)
Or in other words:
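T[i][j] = T[i][j-1] + T[i-1][j]
Now, as nhahtdh mentioned before, the value you are looking for is C(N + K - 1, K - 1)
which equals:
T[N+1][K-1] = C(N+1-1+K-1, K-1)
(modulo 1000000)
By the symmetry C(x, y) = C(x, x - y), this is also T[K][N] = C(K-1+N, N), which is the entry the program prints.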
This is a famous problem - you can check solution here
How many ways to drop N identical balls to K boxes.
The following algorithm is a dynamic-programming solution to your problem:
Define D[i,j] to be the number of ways j non-negative numbers can sum up to i.
0 <= i <= N
1 <= j <= K
Where D[i,1] = 1 for every i.
And where j > 1 you get:
D[i,j] = D[i,j-1] + D[i-1,j-1] +...+ D[0,j-1]
The problem is closely related to "the integer partition problem"; because the order of the numbers matters and zeros are allowed, it counts the weak compositions of N into K parts. Basically there exists a recursive computation of this count, and your solution is just the dynamic programming version of it (non-recursive and computing bottom-up for short).
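A direct implementation of the D recurrence above might look like the following sketch. It assumes 0 <= N <= 100 and 1 <= K <= 100 as in the problem statement; `count_ways`, `MAX_SUM`, `MAX_PARTS` and `MODULUS` are names invented for this illustration.

```c
#define MAX_SUM 100
#define MAX_PARTS 100
#define MODULUS 1000000L

/* D[i][j]: number of ways j non-negative numbers can sum to i (mod MODULUS).
   Assumes 0 <= N <= MAX_SUM and 1 <= K <= MAX_PARTS. */
long count_ways(int N, int K) {
    static long D[MAX_SUM + 1][MAX_PARTS + 1];
    for (int i = 0; i <= N; i++)
        D[i][1] = 1;                     /* one number: it must equal i */
    for (int j = 2; j <= K; j++) {
        for (int i = 0; i <= N; i++) {
            long sum = 0;
            /* the j-th number takes the value 'last'; the rest sum to i - last */
            for (int last = 0; last <= i; last++)
                sum = (sum + D[i - last][j - 1]) % MODULUS;
            D[i][j] = sum;
        }
    }
    return D[N][K];
}
```

This is O(N^2 K) rather than the O(N K) of the table in the question, which is still fine for limits of 100.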

Resources