I am studying for an exam and I came across some problems I need to address, dealing with base cases:
I am converting from code to a recurrence relation, not the other way around.
Example 1:
if(n==1) return 0;
Now the recurrence relation for that piece of code is: T(1) = 0
How did I get that?
By looking at n==1, we see this is a comparison with a value > 0, which is doing some form of work, so we write "T(1)"; the return 0; isn't doing any work, so we say "= 0".
=> T(1) = 0;
Example 2:
if(n==0) return n+1*2;
Analyzing: n==0 means we aren't doing any work, so T(0); but return n+1*2; is doing work, so "= 1".
=> T(0) = 1;
What I want to know is whether this is the correct way of analyzing a piece of code like that to come up with a recurrence relation base case?
I am unsure about these, which I came up with on my own to exhaust the possibilities of base cases:
Example 3: if(n==m-2) return n-1; //answer: T(1) = 1; ?
Example 4: if(n!=2) return n; //answer: T(1) = 1; ?
Example 5: if(n/2==0) return 1; //answer: T(1) = 1; ?
Example 6: if(n<2) return; //answer: T(1) = 0; ?
It's hard to analyze base cases outside of the context of the code, so it might be helpful if you posted the entire function. However, I think your confusion is arising from the assumption that T(n) always represents "work". I'm guessing that you are taking a class on complexity and you have learned about recurrence relations as a method for expressing the complexity of a recursive function.
T(n) is just a function: you plug in a number n (usually a positive integer) and you get out a number T(n). Just like any other function, T(n) means nothing on its own. However, we often use a function with the notation T(n) to express the amount of time required by an algorithm to run on an input of size n. These are two separate concepts: (1) a function T(n) and the various ways to represent it, such as a recurrence relation, and (2) the number of operations required to run an algorithm.
Let me give an example.
int factorial(int n)
{
    if (n > 0)
        return n * factorial(n - 1);
    else
        return 1;
}
Let's see if we can write some function F(n) that represents the output of the code. Well, F(n) = n*F(n-1), with F(0) = 1. Why? Clearly from the code, the result of F(0) is 1. For any other value of n, the result is F(n) = n*F(n-1). That recurrence relation is a very convenient way to express the output of a recursive function. Of course, I could just as easily say that F(n) = n! (the factorial operator), which is also correct. That's a non-recurrence expression of the same function. Notice that I haven't said anything about the run time of the algorithm or how much "work" it is doing. I'm just writing a mathematical expression for what the code outputs.
Dealing with the run time of a function is a little trickier, since you have to decide what you mean by "work" or "an operation." Let's suppose that we don't count "return" as an operation, but we do count multiplication as an operation and we count a conditional (the if statement) as an operation. Under these assumptions, we can try to write a recurrence relation for a function T(n) that describes how much work is done when the input n is given to the function. (Later, I'll make a comment about why this is a poor question.) For n = 0, we have a conditional (the if statement) and a return, nothing else. So T(0) = 1. For any other n > 0, we have a conditional, a multiply, and however many operations the recursive call factorial(n-1) requires, which is T(n-1). So the total for n is:
T(n) = 1 (conditional) + 1 (multiply) + T(n-1) = 2 + T(n-1),
T(0) = 1.
We could write T(n) as a recurrence relation: T(n) = 2 + T(n-1), T(0) = 1. Of course, this is also just the function T(n) = 1 + 2n. Again, I want to stress that these are two very different functions. F(n) is describing the output of the function when n is the input. T(n) is describing how much work is done when n is the input.
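To make the distinction concrete, here is a small sketch (mine, not part of the original answer) that instruments the factorial above with an operation counter under the stated cost model (1 per conditional, 1 per multiply, 0 per return), so you can check it against T(n) = 1 + 2n. The names factorial_counted and ops are hypothetical.

#include <stdio.h>

static long ops = 0;   /* operations counted so far */

int factorial_counted(int n) {
    ops++;                      /* the conditional */
    if (n > 0) {
        ops++;                  /* the multiply */
        return n * factorial_counted(n - 1);
    }
    return 1;
}

int main(void) {
    for (int n = 0; n <= 5; n++) {
        ops = 0;
        factorial_counted(n);
        printf("n=%d  counted=%ld  formula 1+2n=%d\n", n, ops, 1 + 2 * n);
    }
    return 0;
}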
Now, the T(n) that I just described is bad in terms of complexity theory. The reason is that complexity theory isn't about describing how much work is required to compute functions that take only a single integer as an argument. In other words, we aren't looking at the work required for a function of the form F(n). We want something more general: how much work is required to perform an algorithm on an input of size n. For example, MergeSort is an algorithm for sorting a list of objects. It requires roughly n log(n) operations to run MergeSort on a list of n items. Notice that MergeSort isn't doing anything with the number n, rather, it operates on a list of size n. In contrast, our factorial function F(n) isn't operating on an input of size n: presumably n is an integer type, so it is probably 32-bits or 64-bits or something, no matter its value. Or you can get picky and say that its size is the minimum number of bits to describe it. In any case, n is the input, not the size of the input.
When you are answering these questions, it is important to be very clear about whether they want a recurrence relation that describes the output of the function, or a recurrence relation that describes the run time of the function.
A friend from another college came to me with this challenge. I was unable to help him, and became rather troubled as I can't understand the purpose of the function he was supposed to decipher. Has anyone ever seen anything like this?
I've tried visualising the data, but couldn't really draw any meaningful conclusions from the graph:
(blue is the return value of mistery, orange is the difference between the last and the current return. The scale is logarithmic for easier reading.)
int mistery(int n){
    int i, j, res = 0;
    for(i = n / 2; i <= n; i++){
        for(j = 2; j <= n; j *= 2){
            res += n / 2;
        }
    }
    return res;
}
It seems like random code, but I've actually seen the worksheet where this appears. Any input is welcome.
Thanks
The increment added to the result variable on each iteration of the inner loop depends only on function parameter n, which the function does not modify. The result is therefore the product of the increment (n/2) with the total number of inner-loop iterations (supposing that does not overflow).
So how many loop iterations are there? Consider first the outer loop. If the lower bound of i were 0, then the inclusive upper bound of n would yield n+1 iterations. But we are skipping the first n/2 iterations (0 ... (n/2)-1), so that's n+1-(n/2). All the divisions here are integer divisions, and with integer division it is true for all m that m = (m/2) + ((m+1)/2). We can use that to rewrite the number of outer-loop iterations as ((n+1)/2) + 1, or (n+3)/2 (still using integer division).
Now consider the inner loop. The index variable j starts at 2 and is doubled at every iteration until it exceeds n. This yields a number of iterations equal to the floor of the base-2 logarithm of n. The overall number of iterations is therefore
(n+3)/2 * floor(log2(n))
(supposing that we can assume an exact result when n is a power of 2). The combined result, then, is
((n+3)/2) * (n/2) * floor(log2(n))
where the divisions are still integer divisions, and are performed before the multiplications. This explains why the function's derivative spikes at power-of-2 arguments: the number of inner-loop iterations (the floor(log2(n)) term) increases by one at each of those points.
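Here is a quick sketch (mine, not the answerer's) that checks the closed form against the posted function; mistery() is reproduced from the question and closed_form() is a hypothetical helper implementing ((n+3)/2) * (n/2) * floor(log2(n)) with integer division.

#include <stdio.h>

int mistery(int n){
    int i, j, res = 0;
    for(i = n / 2; i <= n; i++)
        for(j = 2; j <= n; j *= 2)
            res += n / 2;
    return res;
}

int closed_form(int n){
    int lg = 0;
    for(int j = 2; j <= n; j *= 2)   /* floor(log2(n)) for n >= 2 */
        lg++;
    return ((n + 3) / 2) * (n / 2) * lg;
}

int main(void){
    for(int n = 2; n <= 1000; n++)
        if(mistery(n) != closed_form(n))
            printf("mismatch at n=%d\n", n);
    return 0;   /* prints nothing if the closed form matches */
}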
We haven't any context from which to guess the purpose of the function, and I don't recognize it as a well-known function, but we can talk a bit about its properties. For example,
it grows asymptotically faster than n^2, and asymptotically slower than n^3. In fact,
it has a form reminiscent of those that tend to appear in computational asymptotic bounds, so perhaps it is an estimate or bound for the memory or time that some computation will require. Also,
it is strictly increasing for n > 0. That might not be immediately clear because of the involvement of integer division, but note that n and n + 3 have opposite parity, so whenever n increases by one, exactly one of n/2 and (n+3)/2 increases by one, while the other does not change (and the logarithmic term is non-decreasing).
as already discussed, it has sudden jumps at powers of 2. This can be viewed in terms of the number of iterations of the inner loop, or, alternatively, in terms of the involvement of the floor() function in the single-equation form.
The polynomial part of the function is related to the equation for the sum of the integers from 1 to n.
The logarithmic part of the function is related to the number of significant bits in the binary representation of n.
This is a classic problem, but I am curious whether it is possible to do better under these conditions.
Problem: Suppose we have a sorted array of length 4*N, that is, each element is repeated 4 times. Note that N can be any natural number. Also, each element in the array is subject to the constraint 0 < A[i] < 190*N. Are there 4 elements in the array (repetition allowed) such that A[i] + A[j] + A[k] + A[m] = V, where V can be any positive integer? It is not required to actually find the 4 elements that satisfy the condition; just showing that it can be done for a given array and V is enough.
Ex : A = [1,1,1,1,4,4,4,4,5,5,5,5,11,11,11,11]
V = 22
This is true because, 11 + 5 + 5 + 1 = 22.
My attempt:
Instead of "4sum" I first tried k-sum, but this proved pretty difficult so I instead went for this variation. The first solution I came to was rather naive O(n^2). However, given these constraints, I imagine that we can do better. I tried some dynamic programming methods and divide and conquer, but that didn't quite get me anywhere. To be specific, I am not sure how to cleverly approach this in a way where I can "eliminate" portions of the array without having to explicitly check values against all or almost all permutations.
Make a vector S0 of length 256N where S0[x]=1 if x appears in A.
Perform a convolution of S0 with itself to produce a new vector S1 of length 512N. S1[x] is nonzero iff x is the sum of 2 numbers in A.
Perform a convolution of S1 with itself to make a new vector S2. S2[x] is nonzero iff x is the sum of 4 numbers in A.
Check S2[V] to get your answer.
Convolution can be performed in O(N log N) time using FFT convolution (http://www.dspguide.com/ch18/2.htm) or similar techniques.
Since only a constant number of such convolutions are performed (two here), the total complexity is O(N log N).
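To make the pipeline concrete, here is a minimal sketch (mine, not the answerer's). For clarity the convolutions are done directly, which is O(range^2); swapping conv() for an FFT-based convolution, as suggested above, is what gives O(N log N). The names four_sum_exists and conv are hypothetical.

#include <stdlib.h>
#include <string.h>

/* out[x] = 1 iff x = u + v with p[u] and q[v] both set.
 * A placeholder for FFT convolution; out must hold lp + lq - 1 entries. */
static void conv(const char *p, int lp, const char *q, int lq, char *out){
    memset(out, 0, lp + lq - 1);
    for (int u = 0; u < lp; u++)
        if (p[u])
            for (int v = 0; v < lq; v++)
                if (q[v]) out[u + v] = 1;
}

int four_sum_exists(const int *A, int len, int maxval, int V){
    int l0 = maxval + 1;
    char *S0 = calloc(l0, 1);        /* S0[x] = 1 iff x appears in A */
    char *S1 = calloc(2 * l0, 1);    /* sums of two elements         */
    char *S2 = calloc(4 * l0, 1);    /* sums of four elements        */
    for (int i = 0; i < len; i++) S0[A[i]] = 1;
    conv(S0, l0, S0, l0, S1);
    conv(S1, 2 * l0 - 1, S1, 2 * l0 - 1, S2);
    int ok = (V >= 0 && V <= 4 * l0 - 4) ? S2[V] : 0;
    free(S0); free(S1); free(S2);
    return ok;
}

For the example in the question, passing the largest value in A (or the 190*N bound) as maxval, four_sum_exists(A, 16, 11, 22) returns 1.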
Given a range a to b, and a number k, find all the k-prime numbers between a and b [both inclusive].
Definition of k-prime : A number is a k-prime if it has exactly k distinct prime factors.
E.g., for a=4, b=10, k=2 the answer is 2, since the prime factors of 6 are [2,3] and the prime factors of 10 are [2,5].
Now here's my attempt
#include <stdio.h>
#include <stdlib.h>
int main(){
    int numOfInp;                         //number of test cases (only the first one is read below)
    scanf("%d",&numOfInp);
    int a,b,k;
    scanf("%d %d %d",&a,&b,&k);
    int *arr;
    arr = (int*)calloc(b+1,sizeof(int));  //arr[x] = count of distinct prime factors of x found so far
    int i=2,j=2,count=0;                  //count = how many k-primes lie in [a,b]
    while(i<=b){
        if(arr[i]==0){                    //i has no smaller prime factor, so i is prime: mark its multiples
            for(j=i;j<=b;j=j+i){
                arr[j]++;
            }
        }
        if(i>=a && arr[i]==k)
            count++;
        i++;
    }
    printf("%d\n",count);
    free(arr);
    return 0;
}
This problem is taken from Codechef
Here's what I've done: I take an array of size b+1, and for each number starting from 2, I do the following.
For 2 check if arr[2] is 0, then arr[2]++,arr[4]++,arr[6]++ .... so on.
For 3 check if arr[3] is 0, then arr[3]++,arr[6]++,arr[9]++ .... so on.
Since arr[4] is not zero, leave it.
In the end, the value arr[i] will give me the answer, i.e. arr[2] is 1, hence 2 is a 1-prime number; arr[6] is 2, hence 6 is a 2-prime number.
Questions:
What is the complexity of this code, and can it be done in O(n)?
Am I using Dynamic Programming here?
The algorithm you are using is known as the Sieve of Eratosthenes. It is a well-known algorithm for finding prime numbers. Now to answer your questions:
1a) What is the complexity of this code
The complexity of your code is O(n log(log n)).
For an input of a and b, the complexity of your code is O(b log log b). The runtime is due to the fact that you first mark b/2 numbers, then b/3, then b/5, and so on. So your runtime is b * (1/2 + 1/3 + 1/5 + 1/7 + 1/11 + ... + 1/prime_closest_to_b). What we have there is a prime harmonic series, which grows asymptotically as ln(ln(b+1)) (see here).
Asymptotically the upper bound is:
O(b * (1/2 + 1/3 + 1/5 + 1/7 + ...)) = O(b) * O(log(log(b+1))) = O(b log(log b))
1b) Can it be done in O(n)
This is tricky. I would say that for all practical purposes an O(n log log n) algorithm is going to be about as good as any O(n) algorithm, since log(log(n)) grows really, really slowly.
Now, if my life depended on it I would try to see if I can find a method to generate all numbers up to n in a way where every operation generates a unique number and tells me how many unique prime divisors it has.
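One concrete way to get genuine O(n) that matches that description is a linear ("Euler") sieve, where every composite is generated exactly once from its smallest prime factor. This is my own sketch of that idea, not something from the answer; count_kprimes and omega are hypothetical names.

#include <stdlib.h>

/* Count the k-primes in [a, b] in O(b) time with a linear sieve:
 * omega[x] = number of distinct prime factors of x. */
int count_kprimes(int a, int b, int k){
    int n = b, np = 0, count = 0;
    int  *omega  = calloc(n + 1, sizeof(int));
    int  *primes = malloc((n + 1) * sizeof(int));
    char *comp   = calloc(n + 1, 1);          /* composite flags */

    for (int i = 2; i <= n; i++){
        if (!comp[i]){                        /* i is prime */
            primes[np++] = i;
            omega[i] = 1;
        }
        for (int j = 0; j < np && (long long)i * primes[j] <= n; j++){
            int x = i * primes[j];
            comp[x] = 1;
            if (i % primes[j] == 0){          /* primes[j] already divides i */
                omega[x] = omega[i];
                break;                        /* keeps every composite visited once */
            }
            omega[x] = omega[i] + 1;          /* one new distinct prime factor */
        }
    }
    for (int i = a; i <= b; i++)
        if (omega[i] == k) count++;

    free(omega); free(primes); free(comp);
    return count;
}

For the example in the question, count_kprimes(4, 10, 2) returns 2.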
2) Am I using Dynamic Programming here?
Definition of dynamic programming from wikipedia says:
Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems
The definition is quite broad, so it is unfortunately open to interpretation. I would say that this isn't dynamic programming, because you aren't breaking down your problem into smaller and smaller sub-problems and using the results from those sub-problems to find the final answer.
I came across this post, which reports the following interview question:
Given two arrays of numbers, find if each of the two arrays has the same set of integers? Suggest an algo which can run faster than NlogN without extra space?
The best that I can think of is the following:
(a) sort each array, and then (b) have two pointers moving along the two arrays and check if you find different values ... but step (a) already has NlogN complexity :(
(a) scan the shortest array and put values into a map, and then (b) scan the second array and check if you find a value that is not in the map ... here we have linear complexity, but we use extra space
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
You can try a probabilistic approach by choosing a commutative function for accumulation (eg, addition or XOR) and a parametrized hash function.
unsigned addition(unsigned a, unsigned b);
unsigned hash(int n, int h_type);

unsigned hash_set(int* a, int num, int h_type){
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
This way, the number of tries needed before the probability of a false positive falls below a certain threshold does not depend on the number of elements, so the whole check is linear.
EDIT: In the general case the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible whether they are surely different or whether there is a chance of them being equivalent, in which case a slow deterministic method should be used. The final average complexity will be O(n), but the worst-case scenario will have the complexity of the deterministic method.
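For illustration, here is a hypothetical fill-in of the sketch above: a parametrized multiplicative hash plus a prefilter that compares the accumulated hashes of two arrays under several h_type values. The constants and the names probably_same_set / tries are my own choices, not the answerer's.

#include <stdint.h>

unsigned addition(unsigned a, unsigned b){ return a + b; }

unsigned hash(int n, int h_type){
    /* multiply by a different odd constant per h_type, then mix the high bits down */
    uint32_t x = (uint32_t)n * (2654435761u + 2u * (uint32_t)h_type);
    return x ^ (x >> 16);
}

unsigned hash_set(int* a, int num, int h_type){
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}

/* 0 = surely different, 1 = possibly equal (run the slow deterministic check) */
int probably_same_set(int* a, int* b, int n, int tries){
    for (int t = 0; t < tries; t++)
        if (hash_set(a, n, t) != hash_set(b, n, t))
            return 0;
    return 1;
}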
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
I'll assume that the integers in question are of fixed size (eg. 32 bit).
Then, radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
In case of unbounded integers, I believe (but cannot prove, even if it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
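For reference, a minimal sketch (mine, not the answerer's) of the binary quicksort idea for non-negative fixed-width ints: partition on the current bit, then recurse on each half with the next lower bit, so the recursion depth, and hence the extra space, is bounded by the word size.

static void swap_ints(int *x, int *y){ int t = *x; *x = *y; *y = t; }

/* In-place binary MSD radix sort ("binary quicksort") for non-negative ints.
 * Call as binary_radix_sort(a, 0, n - 1, 30) for 32-bit non-negative values. */
void binary_radix_sort(int *a, int lo, int hi, int bit){
    if (hi <= lo || bit < 0) return;
    int i = lo, j = hi;
    while (i <= j){                                 /* partition on the current bit */
        while (i <= j && !((a[i] >> bit) & 1)) i++;
        while (i <= j &&  ((a[j] >> bit) & 1)) j--;
        if (i < j) swap_ints(&a[i], &a[j]);
    }
    binary_radix_sort(a, lo, j, bit - 1);           /* the zero-bit half */
    binary_radix_sort(a, i, hi, bit - 1);           /* the one-bit half  */
}

After sorting both arrays this way, one linear scan comparing them element by element answers the question.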
A special (not harder) case is when one array holds 1, 2, ..., n. This was discussed many times:
How to tell if an array is a permutation in O(n)?
Algorithm to determine if array contains n...n+m?
mathoverflow
and despite many tries no deterministic solutions using O(1) space and O(n) time were shown. Either you can cheat the requirements in some way (reuse input space, assume integers are bounded) or use probabilistic test.
Probably this is an open problem.
Here is a co-RP algorithm:
In linear time, iterate over the first array (A), building the polynomial Pa = (A[0] - x)(A[1] - x)...(A[n-1] - x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwartz-Zippel lemma, P(Pa(r) - Pb(r) = 0 | Pa != Pb) < (n/4n). So the probability that we accept the two arrays as equivalent when they are not is < (1/4).
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need are a prime of the appropriate size (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (the difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
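Here is a rough sketch of what that looks like in code (my own illustration, combining sshannin's check with the finite-field suggestion; the prime, the RNG, and the names probably_equal / eval_poly are all my choices).

#include <stdint.h>
#include <stdlib.h>

#define P 2147483647ULL   /* 2^31 - 1, a Mersenne prime */

/* Evaluate (a[0]-r)(a[1]-r)...(a[n-1]-r) mod P. */
static uint64_t eval_poly(const int *a, int n, uint64_t r){
    uint64_t prod = 1;
    for (int i = 0; i < n; i++){
        uint64_t ai = (uint64_t)((a[i] % (long long)P + (long long)P) % (long long)P);
        prod = prod * ((ai + P - r) % P) % P;
    }
    return prod;
}

/* 1 = probably the same multiset, 0 = definitely different.
 * Repeat with fresh r values to drive the error probability down. */
int probably_equal(const int *a, const int *b, int n){
    uint64_t r = (((uint64_t)rand() << 16) ^ (uint64_t)rand()) % P;
    return eval_poly(a, n, r) == eval_poly(b, n, r);
}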
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
In the algebraic decision tree model, there are known Omega(NlogN) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you do clever bit manipulations/hashing type approaches, you cannot do better than NlogN.
For instance, if you used only comparisons, you cannot do better than NlogN.
You can break the O(n*log(n)) barrier if you have some restrictions on the range of numbers. But it's not possible to do this if you cannot use any extra memory (you need really silly restrictions to be able to do that).
I would also like to note that even O(n log(n)) with sorting is not trivial if you have an O(1) space limit, as merge sort uses O(n) space and quicksort (which is not even strictly O(n log(n)) in the worst case) needs O(log(n)) space for the stack. You have to use heapsort or smoothsort.
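To illustrate the point, here is a minimal in-place heapsort sketch (mine, not the answerer's): O(n log n) comparisons with O(1) extra space, unlike merge sort's O(n) buffer or quicksort's recursion stack.

static void sift_down(int *a, int root, int end){
    while (2 * root + 1 <= end){
        int child = 2 * root + 1;                 /* left child */
        if (child + 1 <= end && a[child] < a[child + 1]) child++;
        if (a[root] >= a[child]) return;          /* heap property restored */
        int t = a[root]; a[root] = a[child]; a[child] = t;
        root = child;
    }
}

void heapsort_inplace(int *a, int n){
    for (int i = n / 2 - 1; i >= 0; i--)          /* build a max-heap */
        sift_down(a, i, n - 1);
    for (int end = n - 1; end > 0; end--){        /* repeatedly move the max to the end */
        int t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, 0, end - 1);
    }
}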
Some companies like to ask questions which cannot be solved, and I think it is a good practice: as a programmer you have to know both what's possible and how to code it, and also know what the limits are so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use:
Algorithm to tell if two arrays have identical members
For each integer i, check that the numbers of occurrences of i in the two arrays are either both zero or both nonzero, by iterating over the arrays.
Since the number of integers is constant the total runtime is O(n).
No, I wouldn't do this in practice.
I was just wondering whether there is a way you could hash the cumulative contents of both arrays and compare the results, assuming the hashing function doesn't produce collisions for two differing patterns.
Why not find the sum, product, and XOR of all the elements of one array and compare them with the corresponding values for the other array?
The XOR of the elements of both arrays may give zero in a case like
2,2,3,3
1,1,2,2
But what if you compare whether the XORs of the elements of the two arrays are equal? Consider this:
10,3
12,5
Here the XOR of both arrays is the same: (10^3) = (12^5) = 9,
but their sums and products are different. I think two different sets of elements cannot have the same sum, product, and XOR!
This can be analysed by simple bit-value examination.
Is there anything wrong with this approach?
I'm not sure that I correctly understood the problem, but if you are interested in the integers that are in both arrays:
If N >>>>> 2^SizeOf(int) (the number of bits in an integer: 16, 32, 64), there is one solution:
a = Array(N); //length(a) = N;
b = Array(M); //length(b) = M;
//x86-64: integers consist of 64 bits.
//First, compare b against the elements of a stored in the region we are about to reuse as a bitmap.
for i := 0 to 2^64 / 64 - 1 do //very big, but CONST
    for k := 0 to M - 1 do
        if a[i] = b[k] then doSomething; //detected
//Then reuse that region of a as a bitmap of the values stored in the rest of a.
for i := 2^64 / 64 to N - 1 do
    if not isSetBit(a[a[i] div 64], a[i] mod 64) then
        setBit(a[a[i] div 64], a[i] mod 64);
//Finally, look up each value of b in the bitmap.
for i := 0 to M - 1 do
    if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; //detected
O(N), without additional structures.
All I know is that comparison-based sorting cannot possibly be faster than O(NlogN), so we can eliminate most of the "common" comparison-based sorts. I was thinking of doing a bucket sort. Perhaps if this question were asked in an interview, the best response would be to first clarify what sort of data those integers represent. For example, if they represent a person's age, then we know that the range of values is limited, and we can use bucket sort in O(n). However, this will not be in place....
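A minimal sketch of that bucket/counting idea (mine, not the poster's), assuming a small known value range such as ages: it is O(n + range) time but needs O(range) extra space, so it only meets the time bound by relaxing the no-extra-space condition. The name same_multiset_small_range is hypothetical.

#include <stdlib.h>

/* 1 if a and b (both of length n, values in [0, range)) contain the same
 * multiset of values, 0 otherwise. */
int same_multiset_small_range(const int *a, const int *b, int n, int range){
    int *cnt = calloc(range, sizeof(int));
    int same = 1;
    for (int i = 0; i < n; i++) cnt[a[i]]++;      /* count values from a */
    for (int i = 0; i < n; i++) cnt[b[i]]--;      /* cancel them with b  */
    for (int v = 0; v < range; v++)
        if (cnt[v] != 0) { same = 0; break; }
    free(cnt);
    return same;
}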
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the log of the entries in the arrays. If that is also the same, then you have the same entries in the array.
A continued fraction is a series of divisions of this kind:
depth 1:  1 + 1/s
depth 2:  1 + 1/(1 + 1/s)
depth 3:  1 + 1/(1 + 1/(1 + 1/s))
...
The depth is an integer, but s is a floating point number.
What would be an optimal algorithm (performance-wise) to calculate the result for such a fraction with large depth?
Hint: unroll each of these formulas using basic algebra. You will see a pattern emerge.
I'll show you the first steps so it becomes obvious:
f(2,s) = 1+1/s = (s+1)/s
f(3,s) = 1+1/f(2,s) = 1+(s/(s+1)) = (1*(s+1) + s)/(s+1) = (2*s + 1) / (s + 1)
/* You multiply the first "1" by denominator */
f(4,s) = 1+1/f(3,s) = 1+(s+1)/(2s+1) = (1*(2*s+1) + (s+1))/(2*s+1) = (3*s + 2) / (2*s + 1)
f(5,s) = 1+1/f(4,s) = 1+(2s+1)/(3s+2) = (1*(3*s+2) + (2s+1))/(3*s+2) = (5*s + 3) / (3*s + 2)
...
Hint2: if you don't see the obvious pattern emerging from the above, the most optimal algorithm would involve calculating Fibonacci numbers (so you'd need to google for an optimal Fibonacci number generator).
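Spelling out the pattern in code (my own sketch, not DVK's): with Fibonacci numbers F(1)=1, F(2)=1, F(3)=2, ..., the unrolled fraction is f(d,s) = (F(d)*s + F(d-1)) / (F(d-1)*s + F(d-2)), so one pass over d plus a single division suffices. The name cont_frac is hypothetical.

/* f(d,s) for d >= 2, following the unrolled pattern above.
 * For large d the value converges to the golden ratio regardless of s. */
double cont_frac(int d, double s){
    double f_prev2 = 0, f_prev1 = 1, f_cur = 1;   /* F(0), F(1), F(2) */
    for (int i = 3; i <= d; i++){
        f_prev2 = f_prev1;
        f_prev1 = f_cur;
        f_cur   = f_prev1 + f_prev2;
    }
    return (f_cur * s + f_prev1) / (f_prev1 * s + f_prev2);
}

For example, cont_frac(5, 2.0) gives (5*2 + 3)/(3*2 + 2) = 13.0/8 = 1.625, matching f(5,s) above.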
I'd like to elaborate a bit on DVK's excellent answer. I'll stick with his notation f(d,s) to denote the sought value for depth d.
If you calculate the value f(d,s) for large d, you'll notice that the values converge as d increases.
Let φ=f(∞,s). That is, φ is the limit as d approaches infinity, and is the continued fraction fully expanded. Note that φ contains a copy of itself, so that we can write φ=1+1/φ. Multiplying both sides by φ and rearranging, we get the quadratic equation
φ^2 - φ - 1 = 0
which can be solved to get
φ = (1 + √5)/2.
This is the famous golden ratio.
You'll find that f(d,s) is very close to φ as d gets large.
But wait. There's more!
As DVK pointed out, the formula for f(d,s) involves terms from the Fibonacci sequence. In particular, it involves ratios of successive terms of the Fibonacci sequence. There is a closed form expression for the nth term of the sequence, namely
(φ^n - (1-φ)^n)/√5.
Since the magnitude of 1-φ is less than one, (1-φ)^n gets small as n gets large, so a good approximation for the nth Fibonacci term is φ^n/√5. And getting back to DVK's formula, the ratio of successive terms in the Fibonacci sequence will tend to φ^(n+1)/φ^n = φ.
So that's a second way of getting to the fact that the continued fraction in this question evaluates to φ.
Smells like tail recursion(recursion(recursion(...))).
(In other words - loop it!)
I would start with calculating 1/s, which we will call a.
Then use a for-loop, because if you use recursion in C you may experience a stack overflow.
Since this is homework I won't give much code, but if you start with a simple loop for depth 1, then keep increasing it until you get to 4, you can then generalize it to n times.
Since you are always going to be dividing 1/s, and division is expensive, doing it just one time will help with performance.
I expect that if you work it out, you can actually find a pattern that will help you optimize further.
You may find an article such as this: http://www.b-list.org/weblog/2006/nov/05/programming-tips-learn-optimization-strategies/, to be helpful.
I am assuming by performance-wise you mean that you want it to be fast, regardless of memory used, btw.
You may find that if you cache the values you calculate at each step, you can reuse them rather than redoing an expensive calculation.
I personally would do 4-5 steps by hand, writing out the equations and results of each step, and see if any pattern emerges.
Update:
GCC has added tail-call optimization, and I had never noticed it, since out of habit I try to limit recursion heavily in C. But this answer has a nice quick explanation of the different optimizations GCC performs at each optimization level.
http://answers.yahoo.com/question/index?qid=20100511111152AAVHx6s