Given a snippet of code, how do you determine its complexity in general? I find myself getting very confused with Big O questions. For example, a very simple question:
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
}
}
The TA explained this with something like combinations. Like this is n choose 2 = (n(n-1))/2 = n^2 + 0.5, then remove the constant so it becomes n^2. I can put in test values and try, but how does this combination thing come in?
What if there's an if statement? How is the complexity determined?
for (int i = 0; i < n; i++) {
if (i % 2 ==0) {
for (int j = i; j < n; j++) { ... }
} else {
for (int j = 0; j < i; j++) { ... }
}
}
Then what about recursion ...
int fib(int a, int b, int n) {
if (n == 3) {
return a + b;
} else {
return fib(b, a+b, n-1);
}
}
In general, there is no way to determine the complexity of a given function.
Warning! Wall of text incoming!
1. There are very simple algorithms that no one knows whether they even halt or not.
There is no algorithm that can decide whether a given program halts or not, if given a certain input. Calculating the computational complexity is an even harder problem since not only do we need to prove that the algorithm halts but we need to prove how fast it does so.
//The Collatz conjecture states that the sequence generated by the following
// algorithm always reaches 1, for any initial positive integer. It has been
// an open problem for 70+ years now.
function col(n){
if (n == 1){
return 0;
}else if (n % 2 == 0){ //even
return 1 + col(n/2);
}else{ //odd
return 1 + col(3*n + 1);
}
}
2. Some algorithms have weird and off-beat complexities
A general "complexity determining scheme" would easily get too complicated because of these guys
//The Ackermann function. One of the first examples of a non-primitive-recursive algorithm.
function ack(m, n){
if(m == 0){
return n + 1;
}else if( n == 0 ){
return ack(m-1, 1);
}else{
return ack(m-1, ack(m, n-1));
}
}
function f(n){ return ack(n, n); }
//f(1) = 3
//f(2) = 7
//f(3) = 61
//f(4) takes longer than your wildest dreams to terminate.
3. Some functions are very simple but will confuse lots of kinds of static analysis attempts
//McCarthy's 91 function. Try guessing what it does without
// running it or reading the Wikipedia page ;)
function f91(n){
if(n > 100){
return n - 10;
}else{
return f91(f91(n + 11));
}
}
That said, we still need a way to find the complexity of stuff, right? For loops are a simple and common pattern. Take a variant of your initial example, in which the inner loop only runs up to i:
for(i=0; i<N; i++){
for(j=0; j<i; j++){
print something
}
}
Since each print something is O(1), the time complexity of the algorithm is determined by how many times we run that line. As your TA mentioned, we can count this with combinations: each execution corresponds to one pair (j, i) with j < i, and there are N choose 2 such pairs. Equivalently, the inner loop body runs 0 + 1 + ... + (N-1) times, for a total of N*(N-1)/2.
Since we disregard constants and lower-order terms we get O(N^2).
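A count like this can be checked by swapping the printed line for a counter. The following is only a small sketch in C; the name count_triangular is hypothetical and does not come from the question.

```c
/* Counts how often the inner body runs for the loop nest
   for (i = 0; i < n; i++) for (j = 0; j < i; j++) { ... } */
long count_triangular(long n) {
    long count = 0;
    for (long i = 0; i < n; i++) {
        for (long j = 0; j < i; j++) {
            count++;  /* stands in for "print something" */
        }
    }
    return count;
}
```

For example, count_triangular(4) is 0 + 1 + 2 + 3 = 6, i.e. 4*3/2, and the count keeps growing quadratically in n.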
Now for the more tricky cases we can get more mathematical. Try to create a function whose value represents how long the algorithm takes to run, given the size N of the input. Often we can construct a recursive version of this function directly from the algorithm itself, so calculating the complexity becomes the problem of putting bounds on that function. We call this function a recurrence.
For example:
function fib_like(n){
if(n <= 1){
return 17;
}else{
return 42 + fib_like(n-1) + fib_like(n-2);
}
}
It is easy to see that the running time, in terms of N, is given (up to the constant work done in each call) by
T(N) = 1 if (N <= 1)
T(N) = T(N-1) + T(N-2) otherwise
Well, T(N) is just the good old Fibonacci function. We can use induction to put some bounds on that.
For example, let's prove by induction that T(N) <= 2^N for all N (i.e., T(N) is O(2^N)).
base case: N = 0 or N = 1
T(0) = 1 <= 1 = 2^0
T(1) = 1 <= 2 = 2^1
inductive case (N > 1):
T(N) = T(N-1) + T(N-2)
applying the inductive hypothesis to T(N-1) and T(N-2)...
T(N) <= 2^(N-1) + 2^(N-2)
so...
T(N) <= 2^(N-1) + 2^(N-1)
= 2^N
(we can try doing something similar to prove the lower bound too)
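An induction proof can be spot-checked numerically before trusting it. The sketch below, in C, evaluates the recurrence directly and compares it with the claimed bound; rec_T and bound_holds are hypothetical names introduced only for this illustration.

```c
/* T(N) from the recurrence: T(N) = 1 for N <= 1, else T(N-1) + T(N-2). */
unsigned long long rec_T(int n) {
    if (n <= 1) return 1;
    return rec_T(n - 1) + rec_T(n - 2);
}

/* Returns 1 if T(k) <= 2^k for every 0 <= k <= max_n, otherwise 0.
   Assumes max_n < 63 so that the shift below does not overflow. */
int bound_holds(int max_n) {
    for (int k = 0; k <= max_n; k++) {
        if (rec_T(k) > (1ULL << k)) return 0;
    }
    return 1;
}
```

A check like bound_holds(25) only covers finitely many values, so it supports the guess but does not replace the proof.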
In most cases, having a good guess on the final runtime of the function will allow you to easily solve recurrence problems with an induction proof. Of course, this requires you to be able to guess first - only lots of practice can help you here.
As a final note, I would like to point out the Master theorem, the only commonly used rule for more difficult recurrence problems that I can think of right now. Use it when you have to deal with a tricky divide and conquer algorithm.
Also, in your "if case" example, I would solve that by cheating and splitting it into two separate loops that don't have an if inside.
for (int i = 0; i < n; i++) {
if (i % 2 ==0) {
for (int j = i; j < n; j++) { ... }
} else {
for (int j = 0; j < i; j++) { ... }
}
}
Has the same runtime as
for (int i = 0; i < n; i += 2) {
for (int j = i; j < n; j++) { ... }
}
for (int i = 1; i < n; i+=2) {
for (int j = 0; j < i; j++) { ... }
}
And each of the two parts can be easily seen to be O(N^2) for a total that is also O(N^2).
Note that I used a trick to get rid of the "if" here. There is no general rule for doing so, as shown by the Collatz algorithm example.
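To see that the two forms really do the same amount of work, one can count inner-body executions for both. This is a sketch in C with hypothetical function names (count_with_if, count_split), and the "..." bodies are assumed to be O(1).

```c
/* Inner-body executions of the original loop with the if/else. */
long count_with_if(long n) {
    long count = 0;
    for (long i = 0; i < n; i++) {
        if (i % 2 == 0) {
            for (long j = i; j < n; j++) count++;
        } else {
            for (long j = 0; j < i; j++) count++;
        }
    }
    return count;
}

/* The same count after splitting into two loops without an if. */
long count_split(long n) {
    long count = 0;
    for (long i = 0; i < n; i += 2)
        for (long j = i; j < n; j++) count++;
    for (long i = 1; i < n; i += 2)
        for (long j = 0; j < i; j++) count++;
    return count;
}
```

For n = 4 both give 4 + 1 + 2 + 3 = 10; the split only reorders the iterations, it does not add or remove any.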
In general, deciding algorithm complexity is theoretically impossible.
However, one cool and code-centric method for doing it is to actually just think in terms of programs directly. Take your example:
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
}
}
Now we want to analyze its complexity, so let's add a simple counter that counts the number of executions of the inner line:
int counter = 0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
counter++;
}
}
Because the System.out.println line doesn't really matter, let's remove it:
int counter = 0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
counter++;
}
}
Now that we have only the counter left, we can obviously simplify the inner loop out:
int counter = 0;
for (int i = 0; i < n; i++) {
counter += n;
}
... because we know that the increment is run exactly n times. And now we see that counter is incremented by n exactly n times, so we simplify this to:
int counter = 0;
counter += n * n;
And we emerge with the (correct) O(n^2) complexity :) It's there in the code :)
Let's look at how this works for a recursive Fibonacci calculator:
int fib(int n) {
if (n < 2) return 1;
return fib(n - 1) + fib(n - 2);
}
Change the routine so that it returns the number of iterations spent inside it instead of the actual Fibonacci numbers:
int fib_count(int n) {
if (n < 2) return 1;
return fib_count(n - 1) + fib_count(n - 2);
}
It's still Fibonacci! :) So we know now that the recursive Fibonacci calculator is of complexity O(F(n)) where F is the Fibonacci number itself.
Ok, let's look at something more interesting, say simple (and inefficient) mergesort:
void mergesort(Array a, int from, int to) {
if (from >= to - 1) return;
int m = (from + to) / 2;
/* Recursively sort halves */
mergesort(a, from, m);
mergesort(a, m, to);
/* Then merge */
Array b = new Array(to - from);
int i = from;
int j = m;
int ptr = 0;
while (i < m || j < to) {
if (i == m || (j < to && a[j] < a[i])) {
b[ptr] = a[j++];
} else {
b[ptr] = a[i++];
}
ptr++;
}
for (i = from; i < to; i++)
a[i] = b[i - from];
}
Because we are not interested in the actual result but the complexity, we change the routine so that it actually returns the number of units of work carried out:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
/* Recursively sort halves */
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
/* Then merge */
Array b = new Array(to - from);
int i = from;
int j = m;
int ptr = 0;
while (i < m || j < to) {
if (i == m || (j < to && a[j] < a[i])) {
b[ptr] = a[j++];
} else {
b[ptr] = a[i++];
}
ptr++;
count++;
}
for (i = from; i < to; i++) {
count++;
a[i] = b[i - from];
}
return count;
}
Then we remove those lines that do not actually impact the counts and simplify:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
/* Recursively sort halves */
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
/* Then merge */
count += to - from;
/* Copy the array */
count += to - from;
return count;
}
Still simplifying a bit:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
count += (to - from) * 2;
return count;
}
We can now actually dispense with the array:
int mergesort(int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
int count = 0;
count += mergesort(from, m);
count += mergesort(m, to);
count += (to - from) * 2;
return count;
}
We can now see that actually the absolute values of from and to do not matter any more, but only their distance, so we modify this to:
int mergesort(int d) {
if (d <= 1) return 1;
int count = 0;
count += mergesort(d / 2);
count += mergesort(d / 2);
count += d * 2;
return count;
}
And then we get to:
int mergesort(int d) {
if (d <= 1) return 1;
return 2 * mergesort(d / 2) + d * 2;
}
Here d on the first call is the size of the array to be sorted, so you have the recurrence for the complexity M(x) (it is in plain sight on the second line of the function body :)
M(x) = 2(M(x/2) + x)
and this you need to solve in order to get a closed-form solution. The easiest way is to guess the solution M(x) = x log x and substitute it into the right side:
2 (x/2 log(x/2) + x)
= x log(x/2) + 2x
= x (log x - log 2 + 2)
= x (log x - C), where C = log 2 - 2
and then verify that this is asymptotically equivalent to the left side:
(x log x - Cx) / (x log x) = 1 - [Cx / (x log x)] = 1 - [C / log x] --> 1 - 0 = 1 as x grows.
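The recurrence can also be evaluated directly and compared with a closed form. The sketch below is in C; merge_cost mirrors the last simplified routine, and merge_cost_closed is a hypothetical helper encoding the formula M(2^k) = 2^k * (2k + 1), which is an assumption here but follows by induction from M(2d) = 2*M(d) + 4d.

```c
/* The cost recurrence read off the simplified routine:
   M(d) = 1 for d <= 1, otherwise 2*M(d/2) + 2*d (integer division). */
long long merge_cost(long long d) {
    if (d <= 1) return 1;
    return 2 * merge_cost(d / 2) + 2 * d;
}

/* Closed form for d = 2^k: M(2^k) = 2^k * (2k + 1).
   Assumes k is small enough that the result fits in long long. */
long long merge_cost_closed(int k) {
    return (1LL << k) * (2LL * k + 1);
}
```

For d = 16 this gives 144 = 16 * 9. In terms of d the closed form is d * (2 log2(d) + 1), which grows like d log d up to a constant factor.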
Even though this is an overgeneralization, I like to think of Big-O in terms of lists, where the length of the list is N items.
Thus, if you have a for-loop that iterates over everything in the list, it is O(N). In your code, you have one line that (in isolation, all by itself) is O(N).
for (int i = 0; i < n; i++) {
If you have a for loop nested inside another for loop, and you perform an operation on each item in the list that requires you to look at every item in the list, then you are doing an operation N times for each of N items, thus O(N^2). In your example above you do, in fact, have another for loop nested inside your for loop. So you can think about it as if each for loop is O(N), and then, because they are nested, multiply them together for a total value of O(N^2).
Conversely, if you are just doing a quick operation on a single item then that would be O(1). There is no 'list of length n' to go over, just a single one-time operation. To put this in context, in your example above, the operation:
if (i % 2 ==0)
is O(1). What is important isn't the 'if', but the fact that checking whether a single item is equal to another item is a quick operation on a single item. Like before, the if statement is nested inside your outer for loop. However, because it is O(1), you are multiplying everything by '1', and so there is no noticeable effect on your final calculation of the run time of the entire function.
For logs, and dealing with more complex situations (like this business of counting up to j or i, and not just n again), I would point you towards a more elegant explanation here.
I like to keep two things in mind with Big-O notation: the worst-case bound, which is what is usually quoted, and the average-case bound, which is what normally ends up happening. It also helps me to remember that Big-O notation is trying to approximate run time as a function of N, the size of the input.
The TA explained this with something like combinations. Like this is n choose 2 = (n(n-1))/2 = n^2 + 0.5, then remove the constant so it becomes n^2. I can put in test values and try, but how does this combination thing come in?
As I said, normal big-O is the worst-case scenario. You can try to count the number of times that each line gets executed, but it is simpler to just look at the first example and say that there are two loops over the length of n, one embedded in the other, so it is n * n. If they were one after another, it'd be n + n, equaling 2n. Since it's an approximation, you just say n or linear.
What if there's an if statement? How is the complexity determined?
This is where having average case and best case helps me a lot in organizing my thoughts. In the worst case, you ignore the if and say n^2. In the average case, for your example, you have a loop over n, with another loop over part of n that happens half of the time. This gives you n * n/x/2 (the x is whatever fraction of n gets looped over in your embedded loops). This gives you n^2/(2x), so you'd get n^2 just the same. This is because it's an approximation.
I know this isn't a complete answer to your question, but hopefully it sheds some kind of light on approximating complexities in code.
As has been said in the answers above mine, it is clearly not possible to determine this for all snippets of code; I just wanted to add the idea of using average case Big-O to the discussion.
For the first snippet, it's just n^2 because you perform n operations n times. If j was initialized to i, or went up to i, the explanation you posted would be more appropriate but as it stands it is not.
For the second snippet, you can easily see that half of the time the first one will be executed, and the second will be executed the other half of the time. Depending on what's in there (hopefully it's dependent on n), you can rewrite the equation as a recursive one.
The recursive equations (including the third snippet) can be written as such: the third one would appear as
T(n) = T(n-1) + 1
Which we can easily see is O(n).
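The recurrence T(n) = T(n-1) + 1 can be made concrete by counting calls instead of computing the result. The following C sketch uses a hypothetical name, fib_calls, and drops the a and b arguments since they do not influence how many calls are made.

```c
/* Number of calls made by fib(a, b, n) from the question, for n >= 3.
   Like the original, this never reaches its base case when n < 3. */
long fib_calls(long n) {
    if (n == 3) return 1;          /* base case: one call, no recursion */
    return 1 + fib_calls(n - 1);   /* T(n) = T(n-1) + 1 */
}
```

fib_calls(10) is 8, and in general the count is n - 2 for n >= 3, which is linear in n.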
Big-O is just an approximation: it doesn't say how long an algorithm takes to execute, it just says something about how much longer it takes when the size of its input grows.
So if the input is size N and the algorithm evaluates an expression of constant complexity, O(1), N times, the complexity of the algorithm is linear: O(N). If the expression has linear complexity, the algorithm has quadratic complexity: O(N*N).
Some expressions have exponential complexity, O(2^N), or logarithmic complexity, O(log N). For an algorithm with loops and recursion, multiply the complexities of each level of loop and/or recursion. In terms of complexity, looping and recursion are equivalent. If an algorithm has different complexities at different stages, choose the highest complexity and ignore the rest. And finally, all constant complexities are considered equivalent: O(5) is the same as O(1), O(5*N) is the same as O(N).
The answer is O(n^6), but I am not quite sure how to get there. Trying small numbers shows that g raises n to the power of 3, so k = n^3 and thus k^2 = n^6 (I think), but how do I show it mathematically? Specifically, we were taught a method that uses a new function T(n), but I'm not sure how to apply it here. I'd appreciate any help, thanks.
int g(int n)
{
if (n <= 1) return 1;
return 8 * g(n / 2);
}
void f3(int n)
{
int k = g(n);
for (int i = 2; i < k * k; ++i)
{ printf("*"); }
}
Let's first analyze the function g(n):
g(n) = 8 * g(n/2)
if you eliminate the recursion, this breaks down to
g(n) = 8^log_2(n)
and eliminating the logarithm yields:
g(n) = n^3
Now k*k is n^3*n^3 = n^6, so the loop prints about n^6 asterisks (n^6 - 2, to be exact, since i starts at 2). This results in a time complexity of O(n^6).
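The closed form for g can be checked by running it. This sketch keeps g from the question (widened to long long, which is an assumption made to delay overflow) and adds a hypothetical helper, f3_prints, that returns how many asterisks f3 would print instead of printing them.

```c
/* g from the question, widened to long long to delay overflow. */
long long g(long long n) {
    if (n <= 1) return 1;
    return 8 * g(n / 2);
}

/* Number of asterisks f3 would print: the loop runs for i = 2 .. k*k - 1. */
long long f3_prints(long long n) {
    long long k = g(n);
    long long kk = k * k;
    return kk > 2 ? kk - 2 : 0;
}
```

g(8) is 512 = 8^3, so f3_prints(8) is 8^6 - 2 = 262142. When n is not a power of two, integer division makes g(n) the cube of the largest power of two not exceeding n (g(5) is 64), which is still within a constant factor of n^3.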
int dup_chk(int a[], int length)
{
int i = length;
while (i > 0)
{
i--;
int j = i -1;
while (j >= 0)
{
if (a[i] == a[j])
{
return 1;
}
j--;
}
}
return 0;
}
So what I think I know is the following:
line 1 is just 1.
First while loop is N+1.
i--; is N times since its inside the first while loop.
j = i -1; is also N.
Second while loop is (N+1)N = N^2+N since its a while loop within a while loop
if statement: ???
j--; is N(N) = N^2
return 0; is 1
I'm really new to calculating the time complexity of algorithms so I'm not even sure if what I think I know is completely right.
But what is messing with me is the if statement, I do not know how to calculate that (and what if there is an else after it as well?)
EDIT: The grand total is equal to (3/2)N^2 + (5/2)N + 3
I understand that this function is O(N^2) but don't quite get how the grand total was calculated.
Usually such accurate analysis of time complexity is not required. It suffices to know it in terms of Big-O. However, I did some calculations for my own curiosity.
If your concern is just a worst case analysis to obtain the time complexity, consider an array with only unique elements. In such a scenario:
The return 1 statement never executes. The inner while loop executes N(N-1)/2 times (summation i-1 from 1 to N), and three things happen - the while condition is checked (and evaluates to true), the if condition is checked (and evaluates to false) and the variable j is decremented. Therefore, the number of operations is 3N(N-1)/2.
The outer while loop executes N times, and there are three statements apart from the condition check - i is decremented, j is assigned, and the inner while condition fails N times. That is 4N more operations.
Outside all loops, there are three more statements. Initialisation of i, the while condition fails once, and then the return statement. Add 3 more to our tally.
(3/2)N^2 - (3/2)N + 4N + 3.
That's (3/2)N^2 + (5/2)N + 3. There is your 'grand total'.
To repeat myself, this calculation is completely unnecessary for all practical purposes.
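For the curious, the tally can be reproduced by instrumenting the function. The C sketch below uses a hypothetical name, dup_chk_ops, and assumes the counting model described above: every condition check, assignment, decrement and return counts as one operation.

```c
/* Returns the number of operations dup_chk performs on a[0..length-1],
   counting each condition check, assignment, decrement and return as 1. */
long dup_chk_ops(const int a[], int length) {
    long ops = 0;
    int i = length;
    ops++;                          /* initialisation of i */
    while (ops++, i > 0) {          /* outer condition check */
        i--;
        ops++;
        int j = i - 1;
        ops++;
        while (ops++, j >= 0) {     /* inner condition check */
            ops++;                  /* the if comparison */
            if (a[i] == a[j]) {
                ops++;              /* return 1 */
                return ops;
            }
            j--;
            ops++;
        }
    }
    ops++;                          /* return 0 */
    return ops;
}
```

For three distinct elements this gives 24, matching (3/2)*9 + (5/2)*3 + 3, while an input such as {5, 5} stops after only 7 operations because the duplicate is found immediately.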
Maybe this can help you understand what goes wrong in your code. I have added some printouts that make it easier to see what happens. I think this should be sufficient to find your error.
int dup_chk(int a[], int length)
{
int j = 0;
int i = length;
char stringa[30];
printf("Before first while loop j = %d and i = %d \n", j, i);
while (i > 0)
{
i--;
j = i - 1;
printf("\tIn first while loop j = %d and i = %d\n", j, i);
while (j >= 0)
{
printf("\t\tIn second while loop j = %d and i = %d\n", j, i);
if (a[i] == a[j])
{
printf("\t\tIn if statement j = %d and i = %d\n", j, i);
return 1;
}
j--;
printf("\t\tEnd of second while loop j = %d and i = %d\n", j, i);
}
}
printf("After first while loop j = %d and i = %d \n", j, i);
printf("Press any key to finish the program and close the window\n");
return 0;
}
I would also recommend stepping through your code in a debugger to understand better what goes on.
The if check is executed as many times as the inner while loop iterates.
The return 1 is by definition only executed once max. It appears you assume there are no duplicates in the input (ie. worst case), in which case the return 1 statement never executes.
You'll eventually get a feel for what parts of the code you can ignore, so you won't need to calculate this "grand total", and just realize there are two nested loops that each traverse the array - ie. O(N^2).
int dup_chk(int a[], int length)
{
int i = length;
while (i > 0) // Outer loop
{
i--;
int j = i -1;
while (j >= 0) // Inner loop
{
if (a[i] == a[j])
{
return 1;
}
j--;
}
}
return 0;
}
The above program is exactly your code with two comments I took the liberty to add.
Let's consider the worst-case scenario (because that's what everyone cares / is worried about). If you look carefully, you will observe that for every value of i (after the decrement), the Inner loop executes i times, which is at most n - 1. Thus if your Outer loop executes n times, the Inner loop will execute at most n * (n - 1) times in total (the exact count is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2).
n * (n - 1) yields n^2 - n in general algebra. Now, n^2 increases in leaps and bounds (as compared to n) as you go on increasing the value of n. Asymptotic notation lets us consider the factor which will have the greatest impact on the number of steps to be executed. Thus, we can ignore n and say that this program has a worst-case running time of O(n^2).
That's the beauty and simplicity of the Big-O notation. - Quoting Jonathan Leffler from the comments above.
Thorough evaluation:
This program has a special feature: it terminates if a pair (a[I], a[J]) of equal values is found. Assume that we know I and J (we will see later what if there is no such pair).
The outer loop is executed for all I <= i < L, hence L-I times. Each time, the inner loop is executed for all 0 <= j < i, hence i times, except for the last pass (i = I): we have J <= j < I hence I-J iterations.
We assume that the "cost" of a loop is of the form a N + b, where a is the cost of a single iteration and b some constant overhead.
Now for the inner loop, which is run L-I times with decreasing numbers of iterations, using the "triangular numbers" formula, the cost is
a (L-1 + L-2 + ... I+1 + I-J) + b (L - I) = a ((L-1)L/2 - I(I+1)/2 + I-J) + b (L-I)
to which we add the cost of the outer loop to get
a ((L-1)L/2 - I(I+1)/2 + I-J) + b (L-I) + c
(where b is a different constant than above).
In general, this function is quadratic in L, but if a pair is found quickly (say I = L-3), it becomes linear; in the best case (I = L-1,J = L-2), it is even the constant a + b + c.
The worst case occurs when the pair is found last (I = 1, J = 0), which is virtually equivalent to no pair found. Then we have
a (L-1)L/2 + b (L - 1) + c
obviously O(L²).
I have this piece of code and I would like to find its time complexity. I am preparing for interviews and I think this one is a bit tough.
int foo (int n)
{
int sum = 0;
int k, i, j;
int t = 2;
for (i=n/2; i>0; i/=2)
{
for(j=0; j<i; j++)
{
for(k=0; k<log2(t-1); k++)
{
sum += bar(sum);
// bar time-complexity for all inputs is O(1)
}
}
t = pow(2, i);
}
}
I don't know why but I am unable to bound this expression and find a complexity.
Any help on how to resolve this ?
Since you've shown no progress, I'll give you top-level hints:
let log_t = log2(t). What is log_t as a function of i?
Note that the outer loop is executed log2(n) times.
How many total times is the j loop executed?
Pick a sample value of n, such as 32. For each value of i, how many times is the sum += statement executed? Can you generalize an equation for this based on n?
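Let's write it down. In the first pass t = 2, so log2(t - 1) = 0 and the k loop does nothing. In every later pass t was set from the previous, twice as large, value of i, so log2(t) = 2i: the k loop runs about 2i times and the j loop i times. The total is
(n/4)*(n/2) + (n/8)*(n/4) + ... + 1*2
= 2 * [(n/4)^2 + (n/8)^2 + ... + 1]
< 2 * (n/4)^2 * (1 + 1/4 + 1/16 + ...) = n^2/6
This yields O(n^2).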
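A bound like this can be sanity-checked by counting how often the innermost statement runs. The C sketch below is a hypothetical helper (count_foo is not from the question). It tracks log2(t) as an integer instead of calling pow and log2, which assumes exact arithmetic and sidesteps the overflow that pow(2, i) would cause in an int for large i.

```c
/* Counts executions of "sum += bar(sum)" in foo(n).
   log_t holds log2(t); for t = 2^m the k loop runs m times when m >= 2
   (because m-1 < log2(2^m - 1) < m) and 0 times when m == 1. */
long long count_foo(long long n) {
    long long count = 0;
    long long log_t = 1;                 /* t starts at 2 = 2^1 */
    for (long long i = n / 2; i > 0; i /= 2) {
        long long k_iters = (log_t >= 2) ? log_t : 0;
        count += i * k_iters;            /* j loop runs i times */
        log_t = i;                       /* t = pow(2, i) */
    }
    return count;
}
```

count_foo(32) is 170 and count_foo(64) is 682, so doubling n roughly quadruples the count, which is the signature of quadratic growth.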
I have to print the prime numbers between two limits m and n, t times.
I created a variable t, and two pointers n and m that point to reserved blocks of memory for t integer values.
I use pointers instead of arrays to make operations faster.
The outer for loop iterates over the test cases.
The inner for loop prints the primes from m[i] to n[i].
Code
#include <stdio.h>
#include <stdlib.h>
int is_prime(int);
int main(void) {
int t;
scanf("%d", &t); /* read t before it is used to size the allocations */
int *n = malloc(sizeof(int) * t);
int *m = malloc(sizeof(int) * t);
for (int i = 0; i < t; i++) { /* index with i only; also advancing m and n would skip elements */
scanf("%d %d", &m[i], &n[i]);
for (int j = m[i]; j <= n[i]; j++) {
if (is_prime(j)) {
printf("%d\n", j);
}
}
if (i < t - 1) printf("\n");
}
return 0;
}
int is_prime(int num)
{
if (num <= 1) return 0;
if (num % 2 == 0 && num > 2) return 0;
for(int i = 3; i < num / 2; i+= 2){
if (num % i == 0)
return 0;
}
return 1;
}
Problem: http://www.spoj.com/problems/PRIME1/
The code compiles correctly on http://ideone.com, but I'm getting a "time limit exceeded" error when I try to submit it on SPOJ. How can I reduce the execution time of this prime number generator?
As @Carcigenicate suggests, you're exceeding the time limit because your prime generator is too slow; and it's too slow since you're using an inefficient algorithm.
Indeed, you should not simply test each consecutive number for primality (which, by the way, you're also doing inefficiently), but rather rule out multiple values at once using known primes (and perhaps additional primes which you compute). For example, you don't need to check multiples of 5 and 10 (other than the actual value 5) for primality, since you know that 5 divides them. So just "mark" the multiples of various primes as irrelevant.
... and of course, that's just for getting you started, there are all sort of tricks you could use for optimization - algorithmic and implementation-related.
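The "mark the multiples" idea can be sketched as a segmented sieve over just the requested range. The C code below is an illustration under stated assumptions, not the accepted solution for the SPOJ problem: count_primes_in_range is a hypothetical helper, it counts rather than prints, and it assumes hi is around 10^9 at most with a range narrow enough to allocate one byte per number.

```c
#include <stdlib.h>

/* Counts the primes in [lo, hi] by crossing out multiples inside the range.
   Returns -1 if the working buffer cannot be allocated. */
long long count_primes_in_range(long long lo, long long hi) {
    if (hi < 2 || lo > hi) return 0;
    if (lo < 2) lo = 2;
    long long width = hi - lo + 1;
    char *composite = calloc((size_t)width, 1);
    if (composite == NULL) return -1;

    for (long long p = 2; p * p <= hi; p++) {
        /* p may itself be composite; crossing out its multiples is then
           redundant but harmless, and keeps the sketch short. */
        long long start = ((lo + p - 1) / p) * p;  /* first multiple >= lo */
        if (start < p * p) start = p * p;          /* never cross out p itself */
        for (long long q = start; q <= hi; q += p)
            composite[q - lo] = 1;
    }

    long long count = 0;
    for (long long k = 0; k < width; k++)
        if (!composite[k]) count++;
    free(composite);
    return count;
}
```

For instance, count_primes_in_range(1, 10) is 4 (2, 3, 5 and 7). Printing the primes instead of counting them only requires replacing the final loop.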
I know that you are looking for algorithm improvements, but the following technical optimizations might help:
If you are using Visual Studio, you can use alloca instead of malloc, so that n and m go in the stack instead of the heap.
You can also try to rewrite your algorithm using arrays instead of pointers to put n and m in the stack.
If you want to keep using pointers, use the __restrict keyword after the asterisks, which tells the compiler that the two pointers never refer to the same memory.
You can even do it without using pointers or arrays
#include <stdio.h>
#include<math.h>
int is_prime(long n){
if (n == 2)
return 1; /* 2 is the only even prime, so test it before the evenness check */
if (n < 2 || n % 2 == 0)
return 0;
for (long i = 3; i <= sqrt(n); i += 2) {
if(n % i == 0)
return 0;
}
return 1;
}
int main() {
int t;
scanf("%d",&t);
while(t--) {
long n, m;
scanf("%ld %ld",&n,&m);
for (long i = n; i <= m; i++) {
if (is_prime(i) == 1)
printf("%ld\n",i);
}
}
return 0;
}
There are several ways to improve the primality check for an integer n. Here are a few that you might find useful.
Reduce the number of checks: A well-known fact is that if n has a nontrivial factorization, say n = a * b, then it has a divisor between 2 and sqrt(n). (The proof is quite easy: either a = b = sqrt(n), or a < sqrt(n) < b, or b < sqrt(n) < a. Whichever case we fall in, there is a factor of n no greater than sqrt(n).)
Use a Sieve of Eratosthenes: This discards candidates that have already been disqualified as multiples of smaller primes (see Sieve of Eratosthenes (Wikipedia)).
Use probabilistic algorithms: For large numbers, the most efficient way to check for primality nowadays is a probabilistic test. It is a bit more complex to implement but far more efficient. You can find a few of these techniques here (Wikipedia).
Approach 1:
C(n,r) = n! / ((n-r)! r!)
Approach 2:
In the book Combinatorial Algorithms by Wilf, I have found this:
C(n,r) can be written as C(n-1,r) + C(n-1,r-1).
e.g.
C(7,4) = C(6,4) + C(6,3)
= C(5,4) + C(5,3) + C(5,3) + C(5,2)
. .
. .
. .
. .
After solving
= C(4,4) + C(4,1) + 3*C(3,3) + 3*C(3,1) + 6*C(2,1) + 6*C(2,2)
As you can see, the final solution doesn't need any multiplication. In every form C(n,r), either n==r or r==1.
Here is the sample code i have implemented:
int foo(int n,int r)
{
if(n==r) return 1;
if(r==1) return n;
return foo(n-1,r) + foo(n-1,r-1);
}
See output here.
In the approach 2, there are overlapping sub-problems where we are calling recursion to solve the same sub-problems again. We can avoid it by using Dynamic Programming.
I want to know which is the better way to calculate C(n,r)?.
Both approaches will save time, but the first one is very prone to integer overflow.
Approach 1:
This approach will generate result in shortest time (in at most n/2 iterations), and the possibility of overflow can be reduced by doing the multiplications carefully:
long long C(int n, int r) {
if(r > n - r) r = n - r; // because C(n, r) == C(n, n - r)
long long ans = 1;
int i;
for(i = 1; i <= r; i++) {
ans *= n - r + i;
ans /= i;
}
return ans;
}
This code starts multiplying the numerator from the smaller end, and as the product of any k consecutive integers is divisible by k!, there will be no divisibility problem. The possibility of overflow is still there, though. Another useful trick may be dividing n - r + i and i by their GCD before doing the multiplication and division (and overflow may still occur).
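The GCD trick could look roughly like the following C sketch. The names binom_gcd and gcd_ll are hypothetical; the idea is that once the common factor is removed, the remaining denominator must divide the running answer, so the division can happen before the multiplication and intermediate values stay smaller.

```c
/* Greatest common divisor by the Euclidean algorithm. */
static long long gcd_ll(long long a, long long b) {
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* C(n, r) with each factor reduced by its GCD before multiplying.
   Overflow is still possible once the result itself exceeds long long. */
long long binom_gcd(int n, int r) {
    if (r < 0 || r > n) return 0;
    if (r > n - r) r = n - r;        /* C(n, r) == C(n, n - r) */
    long long ans = 1;
    for (int i = 1; i <= r; i++) {
        long long num = n - r + i;
        long long den = i;
        long long g = gcd_ll(num, den);
        num /= g;
        den /= g;
        ans = ans / den * num;       /* den divides ans since gcd(num, den) == 1 */
    }
    return ans;
}
```

binom_gcd(7, 4) is 35, and binom_gcd(40, 20) is 137846528820, well within the long long range.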
Approach 2:
In this approach, you'll be actually building up the Pascal's Triangle. The dynamic approach is much faster than the recursive one (the first one is O(n^2) while the other is exponential). However, you'll need to use O(n^2) memory too.
# define MAX 100 // assuming we need first 100 rows
long long triangle[MAX + 1][MAX + 1];
void makeTriangle() {
int i, j;
// initialize the first row
triangle[0][0] = 1; // C(0, 0) = 1
for(i = 1; i < MAX; i++) {
triangle[i][0] = 1; // C(i, 0) = 1
for(j = 1; j <= i; j++) {
triangle[i][j] = triangle[i - 1][j - 1] + triangle[i - 1][j];
}
}
}
long long C(int n, int r) {
return triangle[n][r];
}
Then you can look up any C(n, r) in O(1) time.
If you need a particular C(n, r) (i.e. the full triangle is not needed), then the memory consumption can be made O(n) by overwriting the same row of the triangle, top to bottom.
# define MAX 100
long long row[MAX + 1];
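long long C(int n, int r) {
int i, j;
// reset the row so that repeated calls do not accumulate old values
for(j = 0; j <= MAX; j++) row[j] = 0;
// initialize by the first row
row[0] = 1; // this is the value of C(0, 0)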
for(i = 1; i <= n; i++) {
for(j = i; j > 0; j--) {
// from the recurrence C(n, r) = C(n - 1, r - 1) + C(n - 1, r)
row[j] += row[j - 1];
}
}
return row[r];
}
The inner loop is started from the end to simplify the calculations. If you start it from index 0, you'll need another variable to store the value being overwritten.
I think your recursive approach should work efficiently with DP. But it will start giving problems once the constraints increase. See http://www.spoj.pl/problems/MARBLES/
Here is the function which I use in online judges and coding contests. It works quite fast.
long combi(int n,int k)
{
long ans=1;
k=k>n-k?n-k:k;
int j=1;
for(;j<=k;j++,n--)
{
if(n%j==0)
{
ans*=n/j;
}else
if(ans%j==0)
{
ans=ans/j*n;
}else
{
ans=(ans*n)/j;
}
}
return ans;
}
It is an efficient implementation for your Approach #1
Your recursive approach is fine, but using DP with it will avoid solving the same subproblems again. Now, since we already have two conditions:
nCr(n,r) = nCr(n-1,r-1) + nCr(n-1,r);
nCr(n,0)=nCr(n,n)=1;
Now we can easily build a DP solution by storing our subresults in a 2-D array-
int dp[max][max];
//Initialise array elements with zero
int nCr(int n, int r)
{
if(n==r) return dp[n][r] = 1; //Base Case
if(r==0) return dp[n][r] = 1; //Base Case
if(r==1) return dp[n][r] = n;
if(dp[n][r]) return dp[n][r]; // Using Subproblem Result
return dp[n][r] = nCr(n-1,r) + nCr(n-1,r-1);
}
Now if you want to optimise further, getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive.
The fastest method I know is Vladimir's method. One avoids division altogether by decomposing nCr into prime factors. As Vladimir says, you can do this pretty efficiently using the sieve of Eratosthenes. Also, use Fermat's little theorem to calculate nCr mod MOD (where MOD is a prime number).
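The Fermat's little theorem part can be sketched as follows in C. This is an illustration only: the names ncr_mod and pow_mod are hypothetical, MOD is assumed to be the prime 1000000007, and r is assumed to be smaller than MOD so that the denominator is invertible. The modular inverse of the denominator is computed as den^(MOD-2) mod MOD.

```c
#include <stdint.h>

static const uint64_t MOD = 1000000007ULL;  /* assumed prime modulus */

/* base^exp mod MOD by repeated squaring. */
static uint64_t pow_mod(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

/* C(n, r) mod MOD: numerator and denominator are accumulated modulo MOD,
   then the division is replaced by multiplying with the modular inverse. */
uint64_t ncr_mod(unsigned n, unsigned r) {
    if (r > n) return 0;
    if (r > n - r) r = n - r;
    uint64_t num = 1, den = 1;
    for (unsigned i = 1; i <= r; i++) {
        num = num * ((n - r + i) % MOD) % MOD;
        den = den * (i % MOD) % MOD;
    }
    return num * pow_mod(den, MOD - 2) % MOD;
}
```

ncr_mod(7, 4) is 35. All intermediate products stay below MOD^2, which fits in 64 bits, so no overflow occurs.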
Using dynamic programming you can easily find nCr. Here is the solution:
package com.practice.competitive.maths;
import java.util.Scanner;
public class NCR1 {
public static void main(String[] args) {
try (Scanner scanner = new Scanner(System.in)) {
int testCase = scanner.nextInt();
while (testCase-- > 0) {
int n = scanner.nextInt();
int r = scanner.nextInt();
int[][] combination = combination();
System.out.println(combination[n][r]%1000000007);
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static int[][] combination() {
int combination[][] = new int[1001][1001];
for (int i = 0; i < 1001; i++)
for (int j = 0; j <= i; j++) {
if (j == 0 || j == i)
combination[i][j] = 1;
else
combination[i][j] = combination[i - 1][j - 1] % 1000000007 + combination[i - 1][j] % 1000000007;
}
return combination;
}
}
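unsigned long long nCr(int n, int r)
{
// n and r are the inputs; assumes 0 <= r <= n
unsigned long long ans = 1,a=1,b=1;
int k = r,i=0;
if (r > (n-r))
k = n-r;
for (i = n ; k >=1 ; k--,i--)
{
a *= i;
b *= k;
if (a%b == 0)
{
a = (a/b);
b=1;
}
}
ans = a/b;
return ans;
}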