Time complexity of finding n primes with trial division by all preceding primes - C

Problem: finding the first n prime numbers.

#include <stdio.h>
#include <stdlib.h>

void firstnprimes(int *a, int n)
{
    if (n < 1) {
        printf("INVALID");
        return;
    }
    int i = 0, j, k;            // i is the primes counter
    for (j = 2; i != n; j++) {  // j is a candidate number
        for (k = 0; k < i; k++) {
            if (j % a[k] == 0)  // a[k] is the k-th prime
                break;
        }
        if (k == i)             // end of loop was reached: no prime divides j
            a[i++] = j;         // record the i-th prime, j
    }
}

int main()
{
    int n;
    scanf("%d", &n);
    int *a = malloc(n * sizeof(int));
    firstnprimes(a, n);
    for (int i = 0; i < n; i++)
        printf("%d\n", a[i]);
    free(a);
    return 0;
}
My function's inner loop runs at most i times, where i is the number of primes already found (i.e., the number of primes below the given candidate number), and the outer loop runs (n-th prime - 2) times.
How can I derive the complexity of this algorithm in Big O notation?
Thanks in advance.

In pseudocode, your code is

firstnprimes(n) = a[:n]                 # array a's first n entries
  where
  i = 0
  a = [j for j in [2..]
         if is_empty( [j for p in a[:i] if (j % p == 0)] )
            && (++i) ]

(assuming a short-circuiting is_empty, which returns false as soon as the list is discovered to be non-empty).
What it does is test each candidate number, from 2 and up, by all its preceding primes.
Melissa O'Neill analyzes this algorithm in her widely known JFP article, "The Genuine Sieve of Eratosthenes", and derives its complexity as O( n^2 ).
Basically, each of the n primes that are produced is paired up with (i.e., tested by) all the primes preceding it: k-1 primes for the k-th prime. The sum of the arithmetic progression 0 + 1 + ... + (n-1) is (n-1)n/2, which is O( n^2 ). She also shows that the composites do not contribute any term more significant than that to the overall sum: there are O(n log n) composites on the way to the n-th prime, but the is_empty calculation fails early for them.
Here's how it goes: with m = n log n candidates overall, there will be m/2 evens, for each of which the is_empty calculation takes just 1 step; m/3 multiples of 3, with 2 steps; m/5 multiples of 5, with 3 steps; etc.
So the total contribution of the composites, overestimated by not dealing with the multiplicities (basically, counting 15 twice, as a multiple of both 3 and 5, etc.), is:

SUM{i = 1, ..., n} (i * m / p_i)        // p_i is the i-th prime
  = m * SUM{i = 1, ..., n} (i / p_i)
  = n log(n) * SUM{i = 1, ..., n} (i / p_i)
  < n log(n) * (n / log(n))             // for n > 14,000
  = n^2

The inequality can be tested at the Wolfram Alpha cloud sandbox as Sum[ i/Prime[i], {i, 14000}] Log[14000.0] / 14000.0, which evaluates to 0.99921 and diminishes for bigger n (tested up to n = 2,000,000, where it is 0.963554).
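If you want to see the quadratic behavior empirically, here is a minimal sketch (my own instrumented variant of the function above, not part of the original code) that counts the divisibility tests and prints their ratio to n^2; if the O(n^2) claim is right, the ratio should stay bounded as n grows:

#include <stdio.h>
#include <stdlib.h>

// same trial division by all preceding primes, with a division counter added
long long firstnprimes_count(int *a, int n)
{
    long long divs = 0;
    int i = 0, j, k;
    for (j = 2; i != n; j++) {
        for (k = 0; k < i; k++) {
            divs++;                    // one divisibility test
            if (j % a[k] == 0)
                break;
        }
        if (k == i)
            a[i++] = j;
    }
    return divs;
}

int main(void)
{
    for (int n = 1000; n <= 16000; n *= 2) {
        int *a = malloc(n * sizeof(int));
        long long divs = firstnprimes_count(a, n);
        printf("n = %6d  divisions = %12lld  divisions/n^2 = %.3f\n",
               n, divs, (double)divs / ((double)n * n));
        free(a);
    }
    return 0;
}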

The prime number theorem states that, asymptotically, the number of primes below n equals n/log n. Therefore, your inner loop will run Θ(i * max) = Θ((n / log n) * n) times (assuming max = n).
Also, your outer loop runs on the order of n log n times, making the total complexity Θ((n / log n) * n * n log n) = Θ(n^3). In other words, this is not the most efficient algorithm.
Note that there are better approximations around; e.g., the n-th prime number is closer to

n log n + n log log n - n + n log log n / log n + ...

But since you are concerned with just big O, this approximation is good enough.
Also, there are much better algorithms for doing what you're looking to do; look up the topic of pseudoprimes for more information.

Related

Why does this nested loop have O(n) time complexity?

I have a test in computer science about complexity, and I have this question:

int counter = 0;
for (int i = 2; i < n; ++i) {
    for (int j = 1; j < n; j = j * i) {
        counter++;
    }
}

My solution is O(n log n), because the first for runs n - 2 times and the second for takes log base i of n steps, so it's (n - 2) * log n, which is O(n log n).
But my teacher told us it's O(n), and when I ran it in CLion it gave me about 2*n, which is O(n). Can someone explain why it is O(n)?
Empirically, you can see that this is correct (that's around the right value for the sum of the series) for n = 100 and n = 1,000.
If you want more intuition, you can think about the fact that for nearly all of the range, i > sqrt(n).
For example, if n = 100 then 90% of the values of i satisfy i > 10, and for n = 1,000, 97% satisfy i > 32.
From that point onwards, every iteration of the outer loop contributes at most 2 iterations of the inner loop (since log base sqrt(n) of n is 2, by definition).
If n grows really large, you can apply the same logic to show that, from the cube root to the square root, the log is between 2 and 3, and so on.
This would be O(n log n) if j were incremented by i each iteration, not multiplied by it. As it is now, the j loop grows much more slowly than n, which is why your teacher and CLion report the time complexity as O(n).
Note that it's j = j*i, not j = j*2. That means that for most values of i, the inner loop makes only two passes, j = 1 and j = i. For example, with n of 33, the inner loop makes just two passes whenever i is in [6, 33), because then i^2 >= 33.
n = 33
j = 32
j = 16
j =  8 27
j =  4  9 16 25
j =  2  3  4  5  6  7  8  9 10 11 ... 31 32
j =  1  1  1  1  1  1  1  1  1  1 ...  1  1
--------------------------------------------
i =  2  3  4  5  6  7  8  9 10 11 ... 31 32
If you think of the above as a graph, it looks like the complexity of the algorithm is O( area under 1/log(x) ). I have no idea how to prove that, and calculating that integral involves the (unfamiliar to me) logarithmic integral function. But the Wikipedia page does say this function is O( n / log n ).
Let's do it experimentally.
#include <stdio.h>

int main( void )
{
    for ( int n = 20; n <= 20000; ++n ) {
        int counter = 0;
        for ( int i = 2; i < n; ++i ) {
            for ( int j = 1; j < n; j *= i ) {
                ++counter;
            }
        }
        if ( n % 1000 == 0 )
            printf( "%d: %.3f\n", n, counter / (double)(n - 1) );   // cast avoids integer division
    }
}
1000: 2.047
2000: 2.033
3000: 2.027
4000: 2.023
5000: 2.021
6000: 2.019
7000: 2.017
8000: 2.016
9000: 2.015
10000: 2.014
11000: 2.013
12000: 2.013
13000: 2.012
14000: 2.012
15000: 2.011
16000: 2.011
17000: 2.011
18000: 2.010
19000: 2.010
20000: 2.010
So the count is 2n plus a little extra, and the extra shrinks relative to n as n grows. So it's definitely not O( n log n ). The extra appears to be of the form n / f(n), where f(n) grows with n; it looks like it could be n / log(n), but that's pure speculation.
Whatever f(n) is, 2n + n/f(n) is bounded by a constant multiple of n. So we can call this O( n ).
For some value of i, j will go like

1 i^1 i^2 i^3 ....

So the number of times the inner loop executes for that i is about

log_i(n)

which would lead to the following total:

log_2(n) + log_3(n) + log_4(n) + ....
But... there is the stop condition j < n which needs to be considered.
Now consider n as a number that can be written as m^2. As soon as i reaches the value m, all remaining inner loops will only be done for j equal to 1 and j equal to i (because i^2 will be at least n). In other words, there will be only 2 executions of the inner loop.
So the total number of iterations will be:

2 * (m^2 - m) + number_of_iterations(i = 2 : m)

Now divide that by n, which is m^2:

(2 * (m^2 - m) + number_of_iterations(i = 2 : m)) / m^2

gives

2 * (1 - 1/m) + number_of_iterations(i = 2 : m) / m^2

The first part, 2 * (1 - 1/m), clearly goes towards 2 as m goes to infinity.
The second part is (at worst):

(log_2(n) + log_3(n) + log_4(n) + ... + log_m(n)) / m^2

or

(log_2(n) + log_3(n) + log_4(n) + ... + log_m(n)) / n

Since each of the m - 1 terms is at most log_2(n) = 2 log_2(m), the whole expression is at most 2 log_2(m) / m, and as log(x)/x goes towards zero as x goes towards infinity, the above expression will also go towards zero.
So the full expression:

(2 * (m^2 - m) + number_of_iterations(i = 2 : m)) / m^2

will go towards 2 as m goes towards infinity.
In other words: the total number of iterations divided by n goes towards 2. Consequently, we have O(n).

The time complexity answer to the question confuses me - n^(2/3)

I'm trying to figure out why the time complexity of this code is n^(2/3). The space complexity is log n, but I don't know how to continue the time complexity calculation (or whether it's right).
int g2 (int n, int m)
{
    if (m >= n)
    {
        for (int i = 0; i < n; ++i)
            printf("#");
        return 1;
    }
    return 1 + g2 (n / 2, 4 * m);
}

int main (int n)
{
    return g2 (n, 1);
}
As long as m < n, you perform an O(1) operation: making a recursive call. You halve n and quadruple m, so after k steps, you get
n(k) = n(0) * 0.5^k
m(k) = m(0) * 4^k
You can set them equal to each other to find that
n(0) / m(0) = 8^k
Taking the log
log(n(0)) - log(m(0)) = k log(8)
or
k = log_8(n(0)) - log_8(m(0))
On the kth recursion you perform n(k) loop iterations.
You can plug k back into n(k) = n(0) * 0.5^k to estimate the number of iterations. Let's ignore m(0) for now:
n(k) = n(0) * 0.5^log_8(n(0))
Taking again the log of both sides,
log_8(n(k)) = log_8(n(0)) + log_8(0.5) * log_8(n(0))
Since log_8(0.5) = -1/3, you get
log_8(n(k)) = log_8(n(0)) * (2/3)
Taking the exponent again:
n(k) = n(0)^(2/3)
Since any positive exponent will overwhelm the O(log(n)) recursion, your final complexity is indeed O(n^(2/3)).
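As a quick sanity check (my own sketch, not part of the original answer), you can count the loop iterations g2 actually performs and compare them against n^(2/3); the ratio oscillates but stays bounded:

#include <stdio.h>
#include <math.h>

// g2 with the printing loop replaced by an iteration count, for measurement only
long g2_count(long n, long m)
{
    if (m >= n)
        return n > 0 ? n : 0;   // the loop would run n times
    return g2_count(n / 2, 4 * m);
}

int main(void)
{
    for (long n = 1000; n <= 100000000; n *= 10) {
        long work = g2_count(n, 1);
        printf("n = %9ld  iterations = %8ld  iterations/n^(2/3) = %.3f\n",
               n, work, work / pow((double)n, 2.0 / 3.0));
    }
    return 0;
}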
Let's look for a moment what happens if m(0) > 1.
n(k) = n(0) * 0.5^(log_8(n(0)) - log_8(m(0)))
Again taking the log:
log_8(n(k)) = log_8(n(0)) - 1/3 * (log_8(n(0)) - log_8(m(0)))
log_8(n(k)) = log_8(n(0)^(2/3)) + log_8(m(0)^(1/3))
So you get
n(k) = n(0)^(2/3) * m(0)^(1/3)
Or
n(k) = (m n^2)^(1/3)
Quick note on corner cases in the starting conditions:
For m > 0: if n <= 0, then m >= n is immediately true, the recursion terminates at once, and the loop body never runs.
For m < 0: if n <= m, the recursion terminates immediately and there is no loop. If n > m, n will converge to zero while m diverges towards minus infinity, and the algorithm will run forever.
The only interesting case is m == 0. Regardless of whether n is positive or negative, n will reach zero because of integer truncation, so the complexity depends on when it reaches 1:

n(0) * 0.5^k = 1
log_2(n(0)) - k = 0

So in this case, the runtime of the recursion is still O(log(n)). The loop does not run.
m starts at 1, and at each step n -> n/2 and m -> m*4, until m >= n. After k steps, n_final = n/2^k and m_final = 4^k. So the final value of k is where n/2^k = 4^k, or k = log_8(n).
When this point is reached, the inner loop performs n_final (approximately equal to m_final) steps, leading to a complexity of O(4^k) = O(4^(log_8 n)) = O(4^(log_4(n)/log_4(8))) = O(n^(1/log_4(8))) = O(n^(2/3)).

When is the sum of the series 1 + 1/2 + 1/3 + 1/4 + ... + 1/n equal to log n, and when is a similar sum equal to n?

I'm new to asymptotic analysis. While trying to find the big-O notation, the simplification of a series like this is given as log n in a few problems, and as n in another problem.
Here are the questions:
int fun(int n)
{
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count++;
    return count;
}

T(n) = O(n)

int fun2(int n)
{
    int count = 0;
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n; j += i)
            count++;
    return count;
}

T(n) = O(n log n)
I'm really confused. Why are the complexities of these seemingly similar algorithms different?
The series formed in the two cases are different.
Time Complexity Analysis
In the first case, i starts at n and the j loop runs i times, so it runs n times, then i becomes n/2 and the loop runs n/2 times, and so on. So the time complexity will be

= n + n/2 + n/4 + n/8 + ...

The result of this sum is at most 2n - 1, hence the time complexity is O(n).
In the second case, when i is 1 we loop over j n times; next i is 2 and we skip one entry at a time, which means we iterate about n/2 times; and so on. So the time complexity will be

= n + n/2 + n/3 + n/4 + ...
= n (1 + 1/2 + 1/3 + 1/4 + ...)
= O(n log n)

The sum 1 + 1/2 + 1/3 + ... + 1/n is O(log n): it is a partial sum of the harmonic series.
For the former, the inner loop runs approximately (exactly, if n is a power of 2)

n + n/2 + n/4 + n/8 + ... + n/2^log2(n)

times. It can be factored into

n * (1 + 1/2 + 1/4 + 1/8 + ... + (1/2)^(log2 n))

The 2nd factor is (a partial sum of) the geometric series, which converges, meaning that as we approach infinity it will approach a constant. Therefore it is Θ(1); when you multiply this by n you get Θ(n).
I made an analysis of the latter algorithm just a couple of days ago. The number of iterations for n in that algorithm is

ceil(n) + ceil(n/2) + ceil(n/3) + ... + ceil(n/n)

It is quite close to a partial sum of the harmonic series multiplied by n:

n * (1 + 1/2 + 1/3 + 1/4 + ... + 1/n)

Unlike the geometric series, the harmonic series does not converge; it diverges as we add more terms. The partial sum of its first n terms lies between ln n and ln n + 1, hence the time complexity of the entire algorithm is Θ(n log n).
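To see both growth rates side by side, here is a small sketch of mine that runs the two functions above unchanged and prints fun(n)/n and fun2(n)/(n ln n); the first ratio should settle near 2 (geometric series), the second near 1 (harmonic series):

#include <stdio.h>
#include <math.h>

int fun(int n)
{
    int count = 0;
    for (int i = n; i > 0; i /= 2)
        for (int j = 0; j < i; j++)
            count++;
    return count;
}

int fun2(int n)
{
    int count = 0;
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n; j += i)
            count++;
    return count;
}

int main(void)
{
    for (int n = 1000; n <= 1000000; n *= 10) {
        printf("n = %7d  fun/n = %.3f  fun2/(n ln n) = %.3f\n",
               n, (double)fun(n) / n, (double)fun2(n) / (n * log(n)));
    }
    return 0;
}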

Finding the number of numbers which have a prime number of factors

The problem is to find the number of divisors of a number. For example, for 10 the answer is 4, since 1, 2, 5 and 10 are the numbers that divide it, i.e. its factors. The constraint is num <= 10^6.
I have implemented code for this, but got TLE!
Here is my code:
int isprime[MAX];   /* isprime[i] == 1 means i is composite */

void seive()
{
    int i, j;
    isprime[0] = isprime[1] = 1;
    for (i = 4; i < MAX; i += 2)
        isprime[i] = 1;
    for (i = 3; i * i < MAX; i += 2) {
        if (!isprime[i]) {
            for (j = i * i; j < MAX; j += 2 * i)
                isprime[j] = 1;
        }
    }
}

int main()
{
    seive();
    int t;
    long long num, cnt;
    scanf("%d", &t);
    while (t--) {
        scanf("%lld", &num);
        cnt = 0;
        for (long long j = 1; j * j <= num; j++) {
            if (num % j == 0) {
                cnt++;
                if (num / j != j)
                    cnt++;
            }
        }
        printf("%lld\n", cnt);   /* print once per query, after the divisor loop */
    }
    return 0;
}
Can somebody help me optimize it? I have also searched around for this but did not have any success. So please help, guys.
You could try computing this mathematically (I'm not sure this will be faster/easier). Basically, given the prime factorization of a number, you can calculate the number of divisors without too much trouble.
If you have an input x, decompose it into something like

x = p1^a1 * p2^a2 * ... * pn^an

Then the number of divisors is

prod(ai + 1) for i in 1 to n

I would then look at finding the smallest prime <= sqrt(x) that divides x, and dividing it out until you're left with just a prime. A sieve might still be useful, and I don't know what kind of input you will be getting.
Now consider what the above statement says: the number of divisors is the product of the exponents of the prime factorization, each increased by 1. Thus, if you only ever care whether the result is prime, you should only consider numbers which are primes or powers of primes. And within those, you then only need to consider powers p^a1 such that a1 + 1 is prime.
That should significantly cut down your search space.
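As an illustration of the formula (a sketch of mine using plain trial division, not the answer's sieve), here is a function that factorizes x and multiplies together the (exponent + 1) terms:

#include <stdio.h>

// number of divisors of x, as the product of (exponent + 1)
// over x's prime factorization
long long num_divisors(long long x)
{
    long long divisors = 1;
    for (long long p = 2; p * p <= x; p++) {
        if (x % p == 0) {
            int e = 0;                         // exponent of p in x
            while (x % p == 0) { x /= p; e++; }
            divisors *= e + 1;
        }
    }
    if (x > 1)                                 // leftover prime factor
        divisors *= 2;
    return divisors;
}

int main(void)
{
    printf("%lld\n", num_divisors(10));   // 4:  divisors 1, 2, 5, 10
    printf("%lld\n", num_divisors(64));   // 7:  64 = 2^6, and 6 + 1 = 7
    return 0;
}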
If the prime factorization of a number is:
x = p1^e1 * p2^e2 * ... * pk^ek
Then the number of divisors is:
(e1 + 1)*(e2 + 1)* ... *(ek + 1)
For this to be prime, you need all ei to be 0, except one, which needs to be a prime - 1.
This is only true for primes and powers of primes. So you need to find how many such prime powers are in [l, r]. For example, 2^6 = 64 has 6 + 1 = 7 divisors, and 7 is prime.
Now you just need to sieve enough primes fast enough. You only need to sieve those in [l, r], so an interval of size at most 10^6.
To sieve directly in this interval, remove the multiples of 2 from [l, r], and do the same for the rest of the primes. You can sieve primes up to 10^6 first and use those to do the interval sieving later, as sketched below.
You can do the necessary counting while you're sieving as well.
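Here is a minimal sketch of the interval-sieving idea (my own illustration, under the assumption that the interval is at most 10^6 wide; it only counts the primes in [l, r] — extending it to also count p^2, p^4, ... in the interval is the remaining step):

#include <stdio.h>
#include <string.h>

#define MAXR 1000000   // maximum width of the interval [l, r]

// counts primes in [l, r] by crossing out multiples of every
// p <= sqrt(r) (composite p are redundant but harmless here)
long long count_primes_in_interval(long long l, long long r)
{
    static char composite[MAXR + 1];
    memset(composite, 0, (size_t)(r - l + 1));
    for (long long p = 2; p * p <= r; p++) {
        long long start = (l + p - 1) / p * p;   // first multiple of p >= l
        if (start < p * p) start = p * p;        // don't cross out p itself
        for (long long j = start; j <= r; j += p)
            composite[j - l] = 1;
    }
    long long count = 0;
    for (long long x = (l < 2 ? 2 : l); x <= r; x++)
        if (!composite[x - l]) count++;
    return count;
}

int main(void)
{
    // pi(1000) = 168 and pi(100) = 25, so [100, 1000] holds 143 primes
    printf("%lld\n", count_primes_in_interval(100, 1000));
    return 0;
}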

Fast Prime Factorization Algorithm

I'm writing code in C that returns the number of times a positive integer can be expressed as a sum of perfect squares of two positive integers.
R(n) is the number of couples (x, y) such that x² + y² = n, where x, y, n are all nonnegative integers.
To compute R(n), I need to first find the prime factorization of n.
The problem is that I've tried a lot of prime factorization algorithms in C, but I need my code to be as fast as possible, so I would appreciate it if anyone could tell me what they consider the fastest algorithm to compute the prime factorization of a number as large as 2147483742.
What an odd limit; 2147483742 = 2^31 + 94.
As others have pointed out, for a number this small trial division by primes is most likely fast enough. Only if it isn't, you could try Pollard's rho method:
/* WARNING! UNTESTED CODE! */
long gcd(long a, long b)
{
    while (b != 0) { long t = b; b = a % b; a = t; }
    return a;
}

/* Pollard's rho with Floyd cycle-finding; assumes a 64-bit long
   so t*t cannot overflow for n up to 2^31 */
long rho(long n, long c)
{
    long t = 2;   /* tortoise: one step per round */
    long h = 2;   /* hare: two steps per round */
    long d = 1;
    while (d == 1) {
        t = (t * t + c) % n;
        h = (h * h + c) % n;
        h = (h * h + c) % n;
        d = gcd(t > h ? t - h : h - t, n);
    }
    if (d == n)
        return rho(n, c + 1);
    return d;
}
Called as rho(n, 1), this function returns a (possibly composite) factor of n; put it in a loop and call it repeatedly if you want to find all the factors of n. You'll also need a primality checker; for your limit, a Miller-Rabin test with bases 2, 7 and 61 is proven accurate and reasonably fast. You can read more about programming with prime numbers at my blog.
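Here is what such a deterministic Miller-Rabin check can look like (my own sketch, not the answer's code; it leans on unsigned 64-bit arithmetic so that squaring a value below 2^32 cannot overflow):

#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return a * b % m;   /* safe: a, b < 2^32, so a*b < 2^64 */
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e > 0) {
        if (e & 1) r = mulmod(r, b, m);
        b = mulmod(b, b, m);
        e >>= 1;
    }
    return r;
}

/* deterministic for n < 3,215,031,751 with bases 2, 7, 61 */
int is_prime(uint64_t n)
{
    if (n < 2) return 0;
    for (uint64_t p = 2; p <= 61; p++) {   /* handle tiny n and the bases */
        if (n == p) return 1;
        if (n % p == 0) return 0;
        if (p * p > n) return 1;
    }
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }
    const uint64_t bases[] = { 2, 7, 61 };
    for (int i = 0; i < 3; i++) {
        uint64_t x = powmod(bases[i], d, n);
        if (x == 1 || x == n - 1) continue;
        int witness = 1;
        for (int r = 1; r < s; r++) {
            x = mulmod(x, x, n);
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;   /* this base proves n composite */
    }
    return 1;
}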
But in any case, given such a small limit I think you are better off using trial division by primes. Anything else might be asymptotically faster but practically slower.
EDIT: This answer has received several recent upvotes, so I'm adding a simple program that does wheel factorization with a 2,3,5-wheel. Called as wheel(n), this program prints the factors of n in increasing order.
long wheel(long n)
{
    long ws[] = { 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6 };   /* gaps 2->3->5->7->11->... */
    long f = 2; int w = 0;
    while (f * f <= n) {
        if (n % f == 0) {
            printf("%ld\n", f);   /* f divides n: emit it, try f again */
            n /= f;
        } else {
            f += ws[w];
            w = (w == 10) ? 3 : (w + 1);   /* after the start-up gaps, cycle ws[3..10] */
        }
    }
    printf("%ld\n", n);   /* whatever is left is prime */
    return 0;
}
I discuss wheel factorization at my blog; the explanation is lengthy, so I won't repeat it here. For integers that fit in a long, it is unlikely that you will be able to do significantly better than the wheel function given above.
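For instance (my own quick check of the function above), factoring 84 prints 2, 2 and 3 inside the loop as they are divided out, and the leftover prime 7 after the loop:

#include <stdio.h>

long wheel(long n);   /* the function given above */

int main(void)
{
    wheel(84);   /* prints 2, 2, 3, 7 (84 = 2 * 2 * 3 * 7) */
    return 0;
}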
There's a fast way to cut down the number of candidates. This routine tries 2, then 3, then all the odd numbers not divisible by 3.
#include <math.h>

long mediumFactor(long n)
{
    if ((n % 2) == 0) return 2;
    if ((n % 3) == 0) return 3;
    long try = 5;
    long inc = 2;
    long lim = (long)sqrt((double)n);
    while (try <= lim)
    {
        if ((n % try) == 0) return try;
        try += inc;
        inc = 6 - inc;   // flip from 2 -> 4 -> 2
    }
    return 1;   // n is prime
}
The alternation of inc between 2 and 4 is carefully aligned so that it skips all even numbers and numbers divisible by 3. For this case: 5 (+2) 7 (+4) 11 (+2) 13 (+4) 17
Trials stop at sqrt(n) because at least one factor must be at or below the square root. (If both factors were > sqrt(n) then the product of the factors would be greater than n.)
The number of tries is sqrt(m)/3, where m is the highest possible number in your series. For a limit of 2147483647, that yields a maximum of 15,448 divisions in the worst case (for a prime near 2147483647), including the 2 and 3 tests.
If the number is composite, the total number of divisions is usually much less and will very rarely be more, even taking into account calling the routine repeatedly to get all the factors.
