Time complexity of a recursive algorithm - c

How can I calculate the time complexity of a recursive algorithm?
int pow1(int x, int n) {
    if (n == 0) {
        return 1;
    }
    else {
        return x * pow1(x, n - 1);
    }
}

int pow2(int x, int n) {
    if (n == 0) {
        return 1;
    }
    else if (n & 1) {
        int p = pow2(x, (n - 1) / 2);
        return x * p * p;
    }
    else {
        int p = pow2(x, n / 2);
        return p * p;
    }
}

Analyzing recursive functions (or even evaluating them) is a nontrivial task. A good introduction (in my opinion) can be found in Don Knuth's Concrete Mathematics.
However, let's analyse these examples now:
We define a function that gives us the time needed by a function. Let's say that t(n) denotes the time needed by pow1(x,n), i.e. a function of n.
Then we can conclude that t(0) = c, because if we call pow1(x,0), we only have to check whether (n == 0) and then return 1, which can be done in constant time (hence the constant c).
Now we consider the other case: n > 0. Here we obtain t(n) = d + t(n-1). That's because we again have to check n == 0, compute pow1(x, n-1) (hence the t(n-1)), and multiply the result by x. The check and the multiplication can be done in constant time (constant d); the recursive computation of pow1 needs t(n-1).
Now we can "expand" the term t(n):
t(n) =
d + t(n-1) =
d + (d + t(n-2)) =
d + d + t(n-2) =
d + d + d + t(n-3) =
... =
d + d + d + ... + t(0) =
d + d + d + ... + c
So, how long does it take until we reach t(0)? Since we start at t(n) and subtract 1 in each step, it takes n steps to reach t(n-n) = t(0). That, in turn, means we pick up the constant d a total of n times, and t(0) evaluates to c.
So we obtain:
t(n) =
...
d + d + d + ... + c =
n * d + c
So we get t(n) = n * d + c, which is in O(n).
pow2 can be analysed using the Master theorem, since we may assume that the time functions of algorithms are monotonically increasing. Now we have the time t(n) needed for the computation of pow2(x,n):
t(0) = c (constant time is needed for the computation of pow2(x,0))
For n > 0 we get
        / t((n-1)/2) + d   if n is odd   (d is a constant cost)
t(n) = <
        \ t(n/2) + d       if n is even  (d is a constant cost)
The above can be "simplified" to:
t(n) = t(floor(n/2)) + d <= t(n/2) + d (since t is monotonically increasing)
So we obtain t(n) <= t(n/2) + d, which can be solved using the Master theorem to give t(n) = O(log n) (see the section "Application to Popular Algorithms" in the Wikipedia article on the Master theorem, example "Binary Search").
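To make the two growth rates concrete, here is a small hedged sketch (my code, not part of the question) that adds a hypothetical global call counter to both functions; with x = 1 (so the result never overflows), pow1 makes n + 1 calls while pow2 makes on the order of log2(n) calls:
#include <stdio.h>

/* "calls" is a hypothetical instrumentation counter, not part of the original code */
static long calls;

int pow1(int x, int n) {
    calls++;
    if (n == 0) return 1;
    return x * pow1(x, n - 1);
}

int pow2(int x, int n) {
    calls++;
    if (n == 0) return 1;
    if (n & 1) {
        int p = pow2(x, (n - 1) / 2);
        return x * p * p;
    } else {
        int p = pow2(x, n / 2);
        return p * p;
    }
}

int main(void) {
    int n = 1000;
    calls = 0; pow1(1, n);
    printf("pow1: %ld calls\n", calls);   /* n + 1 = 1001 */
    calls = 0; pow2(1, n);
    printf("pow2: %ld calls\n", calls);   /* about log2(n): 11 calls for n = 1000 */
    return 0;
}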

Let's just start with pow1, because that's the simplest one.
You have a function where a single run is done in O(1). (Condition checking, returning, and multiplication are constant time.)
What you have left is then your recursion. What you need to do is analyze how often the function would end up calling itself. In pow1, it'll happen N times. N*O(1)=O(N).
For pow2, it's the same principle - a single run of the function runs in O(1). However, this time you're halving N every time. That means it will run log2(N) times - effectively once per bit. log2(N)*O(1)=O(log(N)).
Something which might help you is to exploit the fact that recursion can always be expressed as iteration (not always very simply, but it's possible). We can express pow1 as:
result = 1;
while (n != 0)
{
    result = result * x;
    n = n - 1;
}
Now you have an iterative algorithm instead, and you might find it easier to analyze it that way.
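For comparison, pow2 can also be rewritten iteratively; this is the usual iterative exponentiation by squaring (a sketch with a hypothetical name, not code from the question), and its loop runs once per bit of n, which again gives O(log n):
int pow2_iter(int x, int n) {
    int result = 1;
    int base = x;
    while (n != 0) {
        if (n & 1)                /* if the lowest bit of n is set... */
            result = result * base;
        base = base * base;       /* square the base for the next bit */
        n >>= 1;                  /* drop the lowest bit: about log2(n) iterations */
    }
    return result;
}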

It can be a bit complex, but I think the usual way is to use the Master theorem.

The complexity of both functions, ignoring recursion, is O(1).
For the first algorithm, pow1(x, n), the complexity is O(n), because the depth of recursion grows linearly with n.
For the second the complexity is O(log n): here we recurse approximately log2(n) times, and dropping the constant base 2 we get log n.

So I'm guessing you're raising x to the power n. pow1 takes O(n).
You never change the value of x, but you subtract 1 from n each time until it reaches 0 (at which point you just return 1). This means that you will make n recursive calls.

Related

Calculating time complexity using T(n) method?

How may I calculate the time complexity of f() using the T(n) method?
int f(int n)
{
    if (n == 1)
        return 1;
    return f(f(n - 1));
}
What have I done till now?
T(n)=T(T(n-1))=T(T(T(n-2)))=T(T(T(T(n-3))))...
Plus, I know that for every n>=1 the function always returns 1
And why would changing the last line from:
return f(f(n-1));
to:
return 1 + f(f(n-1));
change the time complexity? (Note: it will change the complexity from n to 2^n for sure.)
The time complexity changes because, due to the +1, the function does not always return 1.
For return f(f(n-1)); the outer f call is always invoked with the argument 1, which adds only one more call. The number of calls is 2*n - 1, therefore the complexity is O(n).
For return 1 + f(f(n-1)); not every call to f returns 1: f(3), for example, returns 3 and takes 7 calls. So the outer call depends on the value of n, and incrementing n doubles the number of calls. The number of calls is 2^n - 1, therefore the complexity is O(2^n). Here you can see the number of calls: https://ideone.com/1KEAkU
first version (f always returns 1):
f(4)
f(1) f(3)
     f(1) f(2)
          f(1) f(1)
You can see that the left call (the outer one) is always invoked with 1, so the tree does not continue there.
second version (f does not always return 1):
f(4)
f(3) f(3)
f(2) f(2) f(2) f(2)
f(1) f(1) f(1) f(1) f(1) f(1) f(1) f(1)
Here you can see that, because of the changed return value (f(n) == n), the outer f call doubles the number of calls.
The space complexity does not increase, because the recursion depth does not change: both f(4) trees have 4 levels, and in the second tree the left part can only execute after the right part has finished completely. E.g. for both versions of f(4), the maximum number of f calls active at the same time is 4.
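To check these counts empirically, here is a small hedged C sketch (mine, not from the answer) that counts calls for both variants; for n = 4 it should print 7 for the first version (2*4 - 1) and 15 for the second (2^4 - 1):
#include <stdio.h>

static long calls1, calls2;

/* original version: always returns 1 */
int f1(int n) {
    calls1++;
    if (n == 1) return 1;
    return f1(f1(n - 1));
}

/* modified version: returns n because of the +1 */
int f2(int n) {
    calls2++;
    if (n == 1) return 1;
    return 1 + f2(f2(n - 1));
}

int main(void) {
    int n = 4;
    f1(n);
    f2(n);
    printf("f1: %ld calls (2n-1 = %d)\n", calls1, 2 * n - 1);
    printf("f2: %ld calls (2^n-1 = %d)\n", calls2, (1 << n) - 1);
    return 0;
}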
Adding any constant C to your program would not change the complexity. In general, even adding 5 new statements that do not contain the variable n would not change the complexity either. For this program the complexity is n, since you know that this will run at most n times.
The (correct) recurrence relation for the time complexity is:
T(n) = T(n-1) + T(f(n-1))
That's because to compute f(n), first you compute f(n-1) (cost T(n-1)) and then call f with the argument f(n-1) (cost T(f(n-1))).
When f(n) always returns 1, this results in T(n) = T(n-1) + 1, which solves to T(n) = Theta(n).
When the return statement of f is changed to return 1 + f(f(n-1)), then f(n) returns n for n>=1 (by a simple proof by induction).
Then the time complexity is T(n) = T(n-1) + T(f(n-1)) = T(n-1) + T(n-1) = 2T(n-1), which solves to T(n) = Theta(2^n).
(As an amusing side note, if you change if(n==1) return 1; to if(n==1) return 0; in the second case (with the +1), then the time complexity is Theta(Fib(n)), where Fib(n) is the n-th Fibonacci number.)
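For completeness, unrolling the recurrence for the modified version makes the bound explicit: T(n) = 2T(n-1) = 4T(n-2) = ... = 2^(n-1) T(1), and since T(1) is a constant this gives T(n) = Theta(2^n).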

Run time of a function

Let function f() be:
void f(int n)
{
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n*n/i; j += i)
            printf("*");
}
According to my calculations, the run time in big-O notation should be O(n^2 log n).
The answer is O(n^2). Why is that?
I owe you an apology. I misread your code the first time around, so the initial answer I gave was incorrect. Here's a corrected answer, along with a comparison with the original answer that explains where my analysis went wrong. I hope you find this interesting - I think there's some really cool math that arises from this!
The code you've posted is shown here:
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n*n/i; j += i)
        printf("*");
To determine the runtime of this code, let's look at how much work the inner loop does across all iterations. When i = 1, the loop counts up to n^2 by ones, so it does n^2 work. When i = 2, the loop counts up to n^2 / 2 by twos, so it does n^2 / 4 work. When i = 3, the loop counts up to n^2 / 3 by threes, so it does n^2 / 9 work. More generally, the kth iteration does n^2 / k^2 work, since it counts up to n^2 / k with steps of size k.
If we sum up the work done here for i ranging from 1 to n, inclusive, we see that the runtime is
n^2 + n^2 / 4 + n^2 / 9 + n^2 / 16 + ... + n^2 / n^2
= n^2 (1 + 1/4 + 1/9 + 1/16 + 1/25 + ... + 1/n^2).
The summation here (1 + 1/4 + 1/9 + 1/16 + ...) has the (surprising!) property that, in the limit, it's exactly equal to π^2 / 6. In other words, the runtime of your code asymptotically approaches n^2 π^2 / 6, so the runtime is O(n^2). You can see this by writing a program that compares the number of actual steps against n^2 π^2 / 6 and looking at the results (a sketch of such a program follows).
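For instance, a minimal sketch of such a comparison program (mine, not the answerer's) could count the inner-loop iterations directly and divide by n^2 π^2 / 6; the ratio should approach 1 as n grows:
#include <stdio.h>

int main(void) {
    const double PI = 3.14159265358979323846;
    for (long n = 100; n <= 10000; n *= 10) {
        long long steps = 0;
        for (long i = 1; i <= n; i++)
            for (long j = 1; j <= n * n / i; j += i)
                steps++;                        /* one step per printf("*") in the original code */
        double predicted = (double)n * n * PI * PI / 6.0;
        printf("n = %5ld  steps = %10lld  ratio = %.4f\n",
               n, steps, (double)steps / predicted);
    }
    return 0;
}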
I got this wrong the first time around because I misread your code as though it were written as
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n*n/i; j += 1)
        printf("*");
In other words, I thought that the inner loop took steps of size one on each iteration rather than steps of size i. In that case, the work done by the kth iteration of the loop is n^2 / k, rather than n^2 / k^2, which gives a runtime of
n^2 + n^2/2 + n^2/3 + n^2/4 + ... + n^2/n
= n^2 (1 + 1/2 + 1/3 + 1/4 + ... + 1/n)
Here, we can use the fact that 1 + 1/2 + 1/3 + ... + 1/n is a well-known summation. The nth harmonic number is defined as H_n = 1 + 1/2 + 1/3 + ... + 1/n, and it's known that the harmonic numbers obey H_n = Θ(log n), so this version of the code runs in time O(n^2 log n). It's interesting how this change so dramatically changes the runtime of the code!
As an interesting generalization, let's suppose that you change the inner loop so that the step size is i^ε for some ε > 0 (and assuming you round up). In that case, the number of iterations on the kth time through the inner loop will be n^2 / k^(1+ε), since the upper bound on the loop is n^2 / k and you're taking steps of size k^ε. Via a similar analysis to what we've seen before, the runtime will be
n^2 + n^2 / 2^(1+ε) + n^2 / 3^(1+ε) + n^2 / 4^(1+ε) + ... + n^2 / n^(1+ε)
= n^2 (1 + 1/2^(1+ε) + 1/3^(1+ε) + 1/4^(1+ε) + ... + 1/n^(1+ε))
If you've taken a calculus course, you might recognize that the series
1 + 1/2^(1+ε) + 1/3^(1+ε) + 1/4^(1+ε) + ... + 1/n^(1+ε)
converges to some fixed limit for any ε > 0, meaning that if the step size is any positive power of i, the overall runtime will be O(n^2). This means that all of the following pieces of code have runtime O(n^2):
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n*n/i; j += i)
        printf("*");

for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n*n/i; j += i*i)
        printf("*");

for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n*n/i; j += i*(sqrt(i) + 1))
        printf("*");
The run time of the outer loop is n, and the run time of the inner loop is (n/i)^2 (not n^2/i), because we have j += i (not j++). So the total time is as follows:
∑_{i=1 to n} (n/i)^2 = n^2 ∑_{i=1 to n} (1/i)^2 < 2*n^2
So the time complexity is O(n^2).
From what I've learned in theory, the i does not affect the complexity very much. Since you have an exponential function, the log n would be neglected. Therefore, it would be considered only O(n^2) instead of the expected O(n^2 log n).
Recall that when we use big-O notation, we drop constants and low-order terms. This is because when the problem size gets sufficiently large, those terms don't matter. However, this means that two algorithms can have the same big-O time complexity, even though one is always faster than the other. For example, suppose algorithm 1 requires N^2 time, and algorithm 2 requires 10 * N^2 + N time. For both algorithms, the time is O(N^2), but algorithm 1 will always be faster than algorithm 2. In this case, the constants and low-order terms do matter in terms of which algorithm is actually faster.
However, it is important to note that constants do not matter in terms of the question of how an algorithm "scales" (i.e., how does the algorithm's time change when the problem size doubles). Although an algorithm that requires N^2 time will always be faster than an algorithm that requires 10*N^2 time, for both algorithms, if the problem size doubles, the actual time will quadruple.
When two algorithms have different big-O time complexity, the constants and low-order terms only matter when the problem size is small. For example, even if there are large constants involved, a linear-time algorithm will always eventually be faster than a quadratic-time algorithm. This is illustrated in the table below, which shows the value of 100*N (a time that is linear in N) and the value of N^2/100 (a time that is quadratic in N) for some values of N. For values of N less than 10^4, the quadratic time is smaller than the linear time. However, for all values of N greater than 10^4, the linear time is smaller.
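(The original article's table is not reproduced here; the values below, computed directly from 100*N and N^2/100, illustrate the same crossover at N = 10^4.)
N        100*N       N^2/100
10^2     10^4        10^2
10^3     10^5        10^4
10^4     10^6        10^6
10^5     10^7        10^8
10^6     10^8        10^10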
Take a look at this article for more details.

Efficiently calculating nCk mod p

I have come across this problem many times but I am unable to solve it. Some case or other always produces a wrong answer, or else the program I write is too slow. Formally, I am talking about calculating
nCk mod p, where p is a prime, n is a large number, and 1 <= k <= n.
What have I tried:
I know the recursive formulation of binomial coefficients and how to model it as a dynamic programming problem, but I feel that it is slow. The recursive formulation is C(n,k) + C(n,k-1) = C(n+1,k). I took care of the modulus while storing values in the array to avoid overflows, but I am not sure that just doing a mod p on the result will avoid all overflows.
To compute nCr, there's a simple algorithm based on the rule nCr = (n - 1)C(r - 1) * n / r:
def nCr(n, r):
    if r == 0:
        return 1
    return n * nCr(n - 1, r - 1) // r
Now in modular arithmetic we don't quite have division, but we have modular inverses which (when modding by a prime) are just as good:
def nCrModP(n, r, p):
    if r == 0:
        return 1
    return n * nCrModP(n - 1, r - 1, p) * modinv(r, p) % p
Here's one implementation of modinv on rosettacode.
Not sure what you mean by "storing values in array", but I assume the array serves as a lookup table while running, to avoid redundant calculations and speed things up. This should take care of the speed problem. Regarding the overflows - you can perform the modulo operation at any stage of computation and repeat it as much as you want - the result will be correct.
First, let's work with the case where p is relatively small.
Take the base-p expansions of n and k: write n = n_0 + n_1 p + n_2 p^2 + ... + n_m p^m and k = k_0 + k_1 p + ... + k_m p^m, where each n_i and each k_i is at least 0 but less than p. A theorem (which I think is due to Edouard Lucas) states that C(n,k) = C(n_0, k_0) * C(n_1, k_1) * ... * C(n_m, k_m) (mod p). This reduces the problem to a mod-p product of small binomial coefficients, each of which can be computed as in the "n is relatively small" case below.
Second, if n is relatively small, you can just compute binomial coefficients using dynamic programming on the formula C(n,k) = C(n-1,k-1) + C(n-1,k), reducing mod p at each step. Or do something more clever.
Third, if k is relatively small (and less than p), you should be able to compute n!/(k!(n-k)!) mod p by computing n!/(n-k)! as n * (n-1) * ... * (n-k+1), reducing modulo p after each product, then multiplying by the modular inverses of each number between 1 and k.
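As a hedged illustration of this third approach (my sketch, not code from the answer; it assumes p is prime, k < p, and p < 2^32 so the products fit in 64 bits), compute n * (n-1) * ... * (n-k+1) mod p and divide by k! using the modular inverse obtained via Fermat's little theorem (a^(p-2) ≡ a^(-1) mod p):
#include <stdint.h>
#include <stdio.h>

/* modular exponentiation: computes b^e mod m */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    b %= m;
    while (e > 0) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* C(n, k) mod p for prime p with k < p; reduces mod p after every multiplication */
static uint64_t nCk_mod_p(uint64_t n, uint64_t k, uint64_t p) {
    if (k > n) return 0;
    uint64_t num = 1, den = 1;
    for (uint64_t i = 0; i < k; i++) {
        num = num * ((n - i) % p) % p;   /* builds n * (n-1) * ... * (n-k+1) */
        den = den * ((i + 1) % p) % p;   /* builds k!                        */
    }
    return num * powmod(den, p - 2, p) % p;   /* multiply by (k!)^(-1) mod p */
}

int main(void) {
    /* example: C(10, 3) = 120, and 120 mod 13 = 3 */
    printf("%llu\n", (unsigned long long)nCk_mod_p(10, 3, 13));
    return 0;
}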

Time complexity of the recursive algorithm

Can someone please explain to me how to calculate the complexity of the following recursive code:
long bigmod(long b, long p, long m) {
    if (p == 0)
        return 1;
    else if (p % 2 == 0)
        return square(bigmod(b, p / 2, m)) % m;
    else
        return ((b % m) * bigmod(b, p - 1, m)) % m;
}
This is O(log(p)), because you are either dividing p by 2 or subtracting one and then dividing by two, so the worst case really takes O(2 * log(p)) steps - one for the division and one for the subtraction of one.
Note that in this example the worst case and average case should be the same complexity.
If you want to be more formal about it then you can write a recurrence relation and use the Master theorem to solve it. http://en.wikipedia.org/wiki/Master_theorem
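For example (my sketch of such a recurrence, writing t(p) for the time of bigmod(b, p, m) and constants c, d for the non-recursive work):
t(0) = c
t(p) = t(p/2) + d      if p is even
t(p) = t(p-1) + d      if p is odd
An odd p becomes even after one step, so p is at least halved every two calls; this gives t(p) <= t(p/2) + 2d and hence t(p) = O(log p), matching the bound above.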
It runs in O(log n)
There are no expensive operations inside the function (by that I mean nothing more expensive than squaring or taking a modulus - no looping, etc.), so we can pretty much just count the function calls.
In the best case, n is a power of two, and we will need about log2(n) calls.
In the worst case we get an odd number on every other call. This can do no more than double our calls - multiplication by a constant factor, which is no worse asymptotically: 2*f(x) is still O(f(x)).
O(log n)
It is O(log(N)) with base 2, because of the division by 2.

Moving from Linear Probing to Quadratic Probing (hash collisions)

My current implementation of a hash table uses linear probing and now I want to move to quadratic probing (and later to chaining and maybe double hashing too). I've read a few articles, tutorials, Wikipedia, etc., but I still don't know exactly what I should do.
Linear probing basically has a step of 1 and that's easy to do. When searching, inserting or removing an element from the hash table, I need to calculate a hash and for that I do this:
index = hash_function(key) % table_size;
Then, while searching, inserting or removing I loop through the table until I find a free bucket, like this:
do {
    if (/* CHECK IF IT'S THE ELEMENT WE WANT */) {
        // FOUND ELEMENT
        return;
    } else {
        index = (index + 1) % table_size;
    }
} while (/* LOOP UNTIL IT'S NECESSARY */);
As for quadratic probing, I think what I need to do is change how the "index" step size is calculated, but I don't understand how I should do that. I've seen various pieces of code, and all of them are somewhat different.
Also, I've seen some implementations of quadratic probing where the hash function is changed to accommodate that (but not all of them). Is that change really needed, or can I avoid modifying the hash function and still use quadratic probing?
EDIT:
After reading everything pointed out by Eli Bendersky below I think I got the general idea. Here's part of the code at http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx:
for (step = 1; table->table[h] != EMPTY; step++) {
    if (compare(key, table->table[h]) == 0)
        return 1;

    /* Move forward quadratically, wrap if necessary */
    h = (h + (step * step - step) / 2) % table->size;
}
There are 2 things I don't get... They say that quadratic probing is usually done using c(i) = i^2. However, in the code above, it's doing something more like c(i) = (i^2 - i)/2.
I was ready to implement this on my code but I would simply do:
index = (index + (index^index)) % table_size;
...and not:
index = (index + (index^index - index)/2) % table_size;
If anything, I would do:
index = (index + (index^index)/2) % table_size;
...because I've seen other code examples dividing by two, although I don't understand why...
1) Why is it subtracting the step?
2) Why is it dividing it by 2?
There is a particularly simple and elegant way to implement quadratic probing if your table size is a power of 2:
step = 1;
do {
    if (/* CHECK IF IT'S THE ELEMENT WE WANT */) {
        // FOUND ELEMENT
        return;
    } else {
        index = (index + step) % table_size;
        step++;
    }
} while (/* LOOP UNTIL IT'S NECESSARY */);
Instead of looking at offsets 0, 1, 2, 3, 4... from the original index, this will look at offsets 0, 1, 3, 6, 10... (the ith probe is at offset (i*(i+1))/2, i.e. it's quadratic).
This is guaranteed to hit every position in the hash table (so you are guaranteed to find an empty bucket if there is one) provided the table size is a power of 2.
Here is a sketch of a proof:
Given a table size of n, we want to show that we will get n distinct values of (i*(i+1))/2 (mod n) with i = 0 ... n-1.
We can prove this by contradiction. Assume that there are fewer than n distinct values: if so, there must be at least two distinct integer values for i in the range [0, n-1] such that (i*(i+1))/2 (mod n) is the same. Call these p and q, where p < q.
i.e. (p * (p+1)) / 2 = (q * (q+1)) / 2 (mod n)
=> (p^2 + p) / 2 = (q^2 + q) / 2 (mod n)
=> p^2 + p = q^2 + q (mod 2n)
=> q^2 - p^2 + q - p = 0 (mod 2n)
Factorise => (q - p) (p + q + 1) = 0 (mod 2n)
(q - p) = 0 is the trivial case p = q.
(p + q + 1) = 0 (mod 2n) is impossible: our values of p and q are in the range [0, n-1], and q > p, so (p + q + 1) must be in the range [2, 2n-2].
As we are working modulo 2n, we must also deal with the tricky case where both factors are non-zero, but multiply to give 0 (mod 2n):
Observe that the difference between the two factors (q - p) and (p + q + 1) is (2p + 1), which is an odd number - so one of the factors must be even, and the other must be odd.
(q - p) (p + q + 1) = 0 (mod 2n) => (q - p) (p + q + 1) is divisible by 2n. If n (and hence 2n) is a power of 2, this requires the even factor to be a multiple of 2n (because all of the prime factors of 2n are 2, whereas none of the prime factors of our odd factor are).
But (q - p) has a maximum value of n-1, and (p + q + 1) has a maximum value of 2n-2 (as noted above), so neither can be a multiple of 2n.
So this case is impossible as well.
Therefore the assumption that there are fewer than n distinct values must be false.
(If the table size is not a power of 2, the divisibility argument above falls apart.)
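As a quick empirical check of this property (my sketch, not part of the original answer), the following marks every slot visited by the triangular-number probe sequence for a power-of-2 table size and confirms that all slots are hit:
#include <stdio.h>
#include <string.h>

int main(void) {
    enum { TABLE_SIZE = 16 };            /* must be a power of 2 for full coverage */
    char visited[TABLE_SIZE];
    memset(visited, 0, sizeof visited);

    int index = 5;                       /* arbitrary starting bucket */
    for (int step = 0; step < TABLE_SIZE; step++) {
        /* offset after step probes is step*(step+1)/2, the same sequence as index += step */
        int slot = (index + step * (step + 1) / 2) % TABLE_SIZE;
        visited[slot] = 1;
    }

    int covered = 0;
    for (int i = 0; i < TABLE_SIZE; i++)
        covered += visited[i];
    printf("covered %d of %d slots\n", covered, TABLE_SIZE);   /* prints: covered 16 of 16 slots */
    return 0;
}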
You don't have to modify the hash function for quadratic probing. The simplest form of quadratic probing is really just adding consecutive squares to the calculated position instead of the linear 1, 2, 3.
There's a good resource here; the following is taken from there. This is the simplest form of quadratic probing, when the simple polynomial c(i) = i^2 is used:
h(k, i) = (h(k) + i^2) mod m
In the more general case the formula is:
h(k, i) = (h(k) + c1*i + c2*i^2) mod m
and you can pick your constants c1 and c2.
Keep in mind, however, that quadratic probing is useful only in certain cases. As the Wikipedia entry states:
Quadratic probing provides good memory caching because it preserves some locality of reference; however, linear probing has greater locality and, thus, better cache performance. Quadratic probing better avoids the clustering problem that can occur with linear probing, although it is not immune.
EDIT: Like many things in computer science, the exact constants and polynomials of quadratic probing are heuristic. Yes, the simplest form is i^2, but you may choose any other polynomial. Wikipedia gives the example with h(k,i) = (h(k) + i + i^2)(mod m).
Therefore, it is difficult to answer your "why" question. The only "why" here is why do you need quadratic probing at all? Having problems with other forms of probing and getting a clustered table? Or is it just a homework assignment, or self-learning?
Keep in mind that by far the most common collision resolution technique for hash tables is either chaining or linear probing. Quadratic probing is a heuristic option available for special cases, and unless you know what you're doing very well, I wouldn't recommend using it.

Resources