Big-O notation for a pairing operation - arrays

I'm having a hard time wrapping my head around the big-O notation for a pairing operation. The question is pretty simple: generate all possible pairs for a given array of numbers.
My first guess is to have a nested for/foreach loop and generate the pairs. This is easy enough, and I get that for each of the n numbers I analyze the other n numbers, which gives me a complexity of n^2.
Now, if I try to optimize this further and say that (1,4) is the same as (4,1), then for a sorted array like 1,2,3,4,5 I only run the pairing operation in a nested for loop this way:
for (i = 0; i < count; i++) {
    for (j = i + 1; j < count; j++) {
        // generate the pair (i-th element, j-th element)
    }
}
In this case, I run through the array fewer than n^2 times. For a sample size of 7 numbers, I ran through the loop 21 times to generate the pairs. I know that this cannot be a log-n operation, and I'm tempted to say that this operation converges to n^2, but I don't remember enough from my math or theory classes to answer this definitively. How do I go about this problem?
Context: I had an interview with a similar question, and this grew out of an argument with my friend about whether a pairing operation over a list can ever be better than n^2.

You are correct that you're doing fewer than n^2 operations. The question is how many fewer operations you are doing.
Let's think about how many pairs there are in the array. If each of the n numbers can be paired with (n - 1) other numbers, the total number of pairs possible is n(n - 1). Each iteration of the original for loop generates one of these pairs, so the total number of pairs you generate is n^2 - n, which is O(n^2).
Now, what about if you eliminate duplicate pairs by saying that (1, 4) and (4, 1) are the same? In this case, note that half of the pairs you generate are going to be extraneous - you'll generate each pair twice. This means that the number of pairs is (n^2 - n) / 2. This expression is less than n^2, but notice that it is still O(n^2) because big-O notation ignores constants.
In other words - you are right that you are generating fewer than n^2 pairs, but the total number of pairs you're creating is still O(n^2).
More generally - if you ever decrease the total amount of work that an algorithm does by some constant factor (say, you cut the work in half, or cut the work by a factor of 100), you have not changed the big-O runtime of the algorithm. Big-O completely ignores constants. In order to decrease the big-O runtime, you need to decrease the total amount of work by an amount that is more than a constant; say, by a factor of n, log n, etc.
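If it helps to see the numbers, here is a tiny counting sketch (purely illustrative, not from the question) that runs the deduplicated loop and compares the iteration count against n(n-1)/2 and n^2:
#include <stdio.h>

int main(void) {
    for (int n = 1; n <= 8; n++) {
        int pairs = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                pairs++;                       /* one distinct pair per iteration */
        printf("n=%d  pairs=%d  n(n-1)/2=%d  n^2=%d\n", n, pairs, n * (n - 1) / 2, n * n);
    }
    return 0;
}
For n = 7 it prints 21, matching the count in the question; the pair count is exactly n(n-1)/2, which is still on the order of n^2.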
Hope this helps!

Remember that big-O notation involves an implied multiplicative constant. So your complexity is still O(n^2) if your run-time is <= k·n^2 for some constant k as n -> infinity.

It's still O(n^2) since now you have exactly half the pairs that you had before introducing the order requirement. Dividing by two does not change the Big O.

Related

What is the correct time complexity for the following code?

I just learned time complexity and I'm trying to calculate the theta for this code:
for (i = 2; i < n; i = i + 1) {
    for (j = 1; j < n; j = j * i) {
        count++;
    }
}
I thought that it's n*log(n), because the first loop's complexity is n and the second loop's is log(n), but I've been told the answer is n.
Can someone tell me what the correct answer is and explain why?
In the inner loop, j starts at 1 and on each cycle it is multiplied by i, so it takes the values 1 = i^0, i^1, i^2, i^3, etc. The iteration stops when j == i^k for the integer k such that i^(k-1) <= n < i^k. That takes k+1 iterations.
Logarithms with base > 1 are strictly increasing functions over the positive real numbers, so the relations above are preserved if we take the base-i logarithm of each term: k-1 <= log_i(n) < k. With a little algebra, we can then get k+1 <= log_i(n) + 2. Since k+1 is the number of iterations, and every inner-loop iteration has the same, constant cost, that gives us that the cost of the inner loop for a given value of i is O(log_i(n)).
The overall cost of the loop nest, then, is bounded by sum_{i=2..n} O(log_i(n)). That can be written in terms of the natural logarithm as sum_{i=2..n} O(log_e(n) / log_e(i)). Dropping the 'e' subscript and factoring, we can reach O((log n) * sum_{i=2..n} 1/log(i)). And that's one answer.
But for comparison with the complexity of other algorithms, we would like a simpler formulation, even if it's a little looser. By observing that 1/log(i) decreases, albeit slowly, as i increases, we can see that one slightly looser bound would be O((log n) * sum_{i=2..n} 1/log(2)) = O((log n) * (n-1) / log(2)) = O(n log n). Thus, we can conclude that O(n log n) is an asymptotic bound.
Is there a tighter bound with similarly simple form? Another answer claims O(n), but it seems to be based on a false premise, or else its reasoning is unclear to me. There may be a tighter bound expression than O(n log n), but I don't think O(n) is a bound.
Update:
Thanks to #SupportUkraine, we can say that the performance is indeed bounded by O(n). Here's an argument inspired by their comment:
We can observe that for all i greater than sqrt(n), the inner loop body will execute exactly twice, contributing O(n) inner-loop iterations in total.
For each of the remaining sqrt(n) outer-loop iterations (having i < sqrt(n)), the number of inner-loop iterations is bounded by O(log_i(n)), which is at most O(log(n)). These contribute O(sqrt(n) * log(n)) iterations in total.
Thus, the whole loop nest costs O(sqrt(n) * log(n)) + O(n). But sqrt(n) * log(n) grows more slowly than n, so this is O(n).
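If you want to see the bound concretely, here is a small counting sketch of exactly the loop from the question (the output format is mine); the printed ratio to n settles near a small constant, consistent with O(n):
#include <stdio.h>

int main(void) {
    for (long long n = 10; n <= 1000000; n *= 10) {
        long long count = 0;
        for (long long i = 2; i < n; i = i + 1)
            for (long long j = 1; j < n; j = j * i)
                count++;
        /* The ratio stays near a small constant, consistent with O(n). */
        printf("n=%lld  iterations=%lld  ratio=%.3f\n", n, count, (double)count / n);
    }
    return 0;
}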
The second loop isn't O(log n) because the multiplier i keeps increasing; it's O(log_i n). This causes the number of repetitions of the inner loop to be inversely proportional to i, so the inner loops average out to the same number of iterations as the outer loop, making the whole thing O(n).

Can I sort an array in O(n) if all values are positive and have at most k digits?

Suppose I have an array A[1 ... n] with no range given for its values apart from the fact that they are positive. If I know they have up to k digits, is it possible to sort the array in O(n)?
All examples I have come across for O(n) sorting give an upper bound for the values in the array. If this is a duplicate, please let me know.
This depends on whether k is a constant or not.
If your numbers have k digits each, then you do have a bound on the numbers, since they can't be any bigger than 10^k - 1. You could therefore use radix sort to sort the integers. The runtime of radix sort is O((n + b) log_b U), where n is the number of numbers to sort, b is the base of your radix sort, and U is the maximum value that you're sorting. In your case, that works out to
O((n + b) log_b 10^k) = O(k(n + b)).
This is where the "it depends" comes in. If k is some fixed number that never changes - say, it's always 137 or something like that - then the above expression reduces to O(n + b), and picking b to be any constant (say, base-2 for your radix sort) gives a runtime of O(n). On the other hand, if k can vary (say, the numbers are allowed to be as big as you'd like them to be, and then after seeing the numbers you work out what k is), then the above expression can't be simplified beyond O(kn) because k is a parameter to the algorithm.
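To make the radix-sort idea concrete, here is a minimal LSD radix-sort sketch in base 10 (the function name, the MAXN bound, and the sample data are mine, not from the question); it makes k counting-sort passes over the n numbers:
#include <stdio.h>
#include <string.h>

#define MAXN 1000

/* One stable counting-sort pass per decimal digit, least significant first. */
static void radix_sort(int a[], int n, int k) {
    int buf[MAXN];
    for (int d = 0, exp = 1; d < k; d++, exp *= 10) {
        int cnt[10] = {0};
        for (int i = 0; i < n; i++) cnt[(a[i] / exp) % 10]++;
        for (int i = 1; i < 10; i++) cnt[i] += cnt[i - 1];            /* prefix sums -> positions */
        for (int i = n - 1; i >= 0; i--) buf[--cnt[(a[i] / exp) % 10]] = a[i];  /* stable placement */
        memcpy(a, buf, n * sizeof a[0]);
    }
}

int main(void) {
    int a[] = {329, 457, 657, 839, 436, 720, 355};
    int n = sizeof a / sizeof a[0];
    radix_sort(a, n, 3);                       /* k = 3 digits */
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}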
Hope this helps!
If k <= n, you can, otherwise it's not guaranteed.
If I give you n = 4 and k = 5, where would you place an element 3?

Is it possible to do 3-sum/4-sum...k-sum better than O(n^2) with these conditions? - Tech Interview

This is a classic problem, but I am curious whether it is possible to do better under these conditions.
Problem: Suppose we have a sorted array of length 4*N, that is, each element is repeated 4 times. Note that N can be any natural number. Also, each element in the array is subject to the constraint 0 < A[i] < 190*N. Are there 4 elements in the array such that A[i] + A[j] + A[k] + A[m] = V, where V can be any positive integer; note we must use exactly 4 elements and they can be repeated. It is not necessarily a requirement to find the 4 elements that satisfy the condition, rather, just showing it can be done for a given array and V is enough.
Ex : A = [1,1,1,1,4,4,4,4,5,5,5,5,11,11,11,11]
V = 22
This is true because, 11 + 5 + 5 + 1 = 22.
My attempt:
Instead of "4sum" I first tried k-sum, but this proved pretty difficult so I instead went for this variation. The first solution I came to was rather naive O(n^2). However, given these constraints, I imagine that we can do better. I tried some dynamic programming methods and divide and conquer, but that didn't quite get me anywhere. To be specific, I am not sure how to cleverly approach this in a way where I can "eliminate" portions of the array without having to explicitly check values against all or almost all permutations.
Make a vector S0 of length 256N where S0[x] = 1 if x appears in A (and 0 otherwise).
Perform a convolution of S0 with itself to produce a new vector S1 of length 512N. S1[x] is nonzero iff x is the sum of 2 numbers in A.
Perform a convolution of S1 with itself to make a new vector S2. S2[x] is nonzero iff x is the sum of 4 numbers in A.
Check S2[V] to get your answer.
Convolution can be performed in O(N log N) time using FFT convolution (http://www.dspguide.com/ch18/2.htm) or similar techniques.
Since at most 4 such convolutions are performed, the total complexity is O(N log N).
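Here is a sketch of that pipeline in C, under the question's constraint that values are O(N) (the helper names and sample data are mine; a real implementation would use a well-tested FFT routine). It builds the 0/1 indicator S0, squares it twice with FFT convolution, and checks S2[V]:
#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Recursive radix-2 FFT (invert = 1 gives the inverse transform); n must be a power of two. */
static void fft(double complex *a, int n, int invert) {
    if (n == 1) return;
    double complex *even = malloc(n / 2 * sizeof *even);
    double complex *odd = malloc(n / 2 * sizeof *odd);
    for (int i = 0; i < n / 2; i++) { even[i] = a[2 * i]; odd[i] = a[2 * i + 1]; }
    fft(even, n / 2, invert);
    fft(odd, n / 2, invert);
    double ang = 2 * M_PI / n * (invert ? -1 : 1);
    for (int k = 0; k < n / 2; k++) {
        double complex w = cexp(I * ang * k);
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
        if (invert) { a[k] /= 2; a[k + n / 2] /= 2; }
    }
    free(even);
    free(odd);
}

/* out[x] = sum_y in[y] * in[x - y], computed with one FFT, pointwise squaring, and one inverse FFT. */
static void self_convolve(const double *in, int len, double *out, int outlen) {
    int sz = 1;
    while (sz < 2 * len) sz <<= 1;
    double complex *f = calloc(sz, sizeof *f);
    for (int i = 0; i < len; i++) f[i] = in[i];
    fft(f, sz, 0);
    for (int i = 0; i < sz; i++) f[i] *= f[i];
    fft(f, sz, 1);
    for (int i = 0; i < outlen && i < sz; i++) out[i] = creal(f[i]);
    free(f);
}

int main(void) {
    int A[] = {1, 1, 1, 1, 4, 4, 4, 4, 5, 5, 5, 5, 11, 11, 11, 11};
    int n = sizeof A / sizeof A[0], V = 22, maxv = 0;
    for (int i = 0; i < n; i++) if (A[i] > maxv) maxv = A[i];

    double *S0 = calloc(maxv + 1, sizeof *S0);       /* S0[x] = 1 iff x appears in A */
    for (int i = 0; i < n; i++) S0[A[i]] = 1.0;

    double *S1 = calloc(2 * maxv + 1, sizeof *S1);   /* S1[x] > 0 iff x is a sum of 2 values */
    self_convolve(S0, maxv + 1, S1, 2 * maxv + 1);
    for (int i = 0; i <= 2 * maxv; i++) S1[i] = (S1[i] > 0.5) ? 1.0 : 0.0;

    double *S2 = calloc(4 * maxv + 1, sizeof *S2);   /* S2[x] > 0 iff x is a sum of 4 values */
    self_convolve(S1, 2 * maxv + 1, S2, 4 * maxv + 1);

    printf("%s\n", (V <= 4 * maxv && S2[V] > 0.5) ? "possible" : "not possible");
    free(S0); free(S1); free(S2);
    return 0;
}
For the example array and V = 22 this prints "possible" (11 + 5 + 5 + 1 = 22).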

Searching through a partially sorted array in O(lgn)

I'm having a hard time solving this problem.
A[1..n] is an array of real numbers which is partially sorted:
There are some p, q (1 <= p <= q <= n) such that:
A[1] <= ... <= A[p]
A[p] >= ... >= A[q]
A[q] <= ... <= A[n]
How can we find a value in this array in O(lgn)?
(You can assume that the value exists in the array)
Make 3 binary searches: from 1 to p, p to q and q to n. The complexity is still O(logn).
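A minimal sketch of that approach, assuming p and q are already known and using 0-based indices (the helper names and sample data are mine): binary-search the two ascending runs normally and the middle run with the comparison flipped.
#include <stdio.h>

/* Binary search in A[lo..hi]; asc selects ascending or descending order. Returns an index or -1. */
static int bsearch_range(const double A[], int lo, int hi, double x, int asc) {
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (A[mid] == x) return mid;
        int go_right = asc ? (A[mid] < x) : (A[mid] > x);
        if (go_right) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}

/* A[0..p] ascending, A[p..q] descending, A[q..n-1] ascending. */
static int find_value(const double A[], int n, int p, int q, double x) {
    int idx = bsearch_range(A, 0, p, x, 1);
    if (idx < 0) idx = bsearch_range(A, p, q, x, 0);
    if (idx < 0) idx = bsearch_range(A, q, n - 1, x, 1);
    return idx;
}

int main(void) {
    double A[] = {1, 3, 7, 9, 8, 4, 2, 5, 6, 10};    /* p = 3, q = 6 */
    printf("%d\n", find_value(A, 10, 3, 6, 4.0));    /* prints 5 */
    return 0;
}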
Since we don't know p and q:
You cannot solve this problem in logn time. Assume a case where you have a sorted list of positive numbers with one zero mixed in (p+1=q and A[q]=0). This situation satisfies all the criteria you mentioned. Now, the problem of finding where that zero is located cannot be solved in sub O(n) time. Therefore your problem cannot be solved in O(logn) time.
Despite the "buried zero" worst case already pointed out, I would still recommend implementing an algorithm that can often speed things up, depending on p,q. For example, suppose that you have n numbers, and each increasing and decreasing region has size at least k. Then if you check 2^m elements in your array, including the first and last element and the rest of the elements as equally spaced as possible, starting with m=2 and then iteratively increasing m by 1, eventually you will reach m when you find 3 pairs of consecutive elements (A,B),(C,D),(E,F) from left-to-right out of the 2^m elements that you have checked, which satisfy A < B, C > D, E < F (some pairs may share elements). If my back-of-the-envelope calculation is correct, then the worst-case m you will need to achieve this will have you checking no more than 4n/k elements, so e.g. if k=100 you are much faster than checking all n elements. Then you know everything before A and everything after F are increasing sequences, and you can binary search through them. Now, if m got big enough that you checked at least sqrt(n) elements, then you can finish up by doing a brute-force search between A and F and the overall running time will be O(n/k + sqrt(n)). On the other hand, if the final m had you check fewer than sqrt(n) elements, then you can further increase m until you have checked sqrt(n) elements. Then there will be 2 pairs of consecutive checked elements (A,B),(C,D) that satisfy A < B, C > D, and there will also be 2 pairs of consecutive checked elements (W,X),(Y,Z) later in the array that satisfy W > X, Y < Z. Then everything before A is increasing, everything between D and W is decreasing, and everything after Z is increasing. So you can binary search these 3 regions in the array. The remaining part of the array that you haven't entirely searched through has size O(sqrt(n)), so you can use brute-force search the unchecked regions and the overall running time is O(sqrt(n)). Thus the bound O(n/k + sqrt(n)) holds in general. I have a feeling this is worst-case optimal, but I don't have a proof.
It's solvable in O(log^2 n).
1. If at the midpoint the slope is decreasing, we're in the p..q range.
2. If at the midpoint the slope is increasing, we're either in the 1..p or the q..n range.
3. Perform a binary search in the 1..midpoint and midpoint..n ranges to seek a value where the slope is decreasing. It will be found in only one of the ranges. Now we know in which of the 1..p and q..n subranges the midpoint is located.
4. Repeat the process from (1) for the subrange with the peaks until hitting the p..q range.
5. Find the peaks in the subranges by applying the algorithm in Divide and conquer algorithm applied in finding a peak in an array.
6. Perform 3 binary searches in the ranges 1..p, p..q, q..n.
==> Overall complexity is O(log^2 n).

Find if two arrays contain the same set of integers without extra space and faster than NlogN

I came across this post, which reports the following interview question:
Given two arrays of numbers, find if each of the two arrays have the same set of integers? Suggest an algo which can run faster than NlogN without extra space?
The best that I can think of is the following:
(a) sort each array, and then (b) move two pointers along the two arrays and check if you find different values... but step (a) already has NlogN complexity :(
(a) scan the shortest array and put its values into a map, and then (b) scan the second array and check if you find a value that is not in the map... here we have linear complexity, but I use extra space
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
You can try a probabilistic approach by choosing a commutative function for accumulation (e.g., addition or XOR) and a parametrized hash function.
// Commutative accumulator (e.g. addition or XOR): the order of the elements must not matter.
unsigned addition(unsigned a, unsigned b);
// Hash function parametrized by h_type, which selects a member of the hash family.
unsigned hash(int n, int h_type);
// Accumulate the hashes of all elements; equal sets yield equal results.
unsigned hash_set(int* a, int num, int h_type) {
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
In this way, the number of tries before you decide that the probability of a false positive is below a certain threshold will not depend on the number of elements, so it will be linear.
EDIT: In the general case the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible if they are surely different or if there is a chance of them being equivalent, in which case a slow deterministic method should be used. The final average complexity will be O(n), but the worst-case scenario will have the complexity of the deterministic method.
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
I'll assume that the integers in question are of fixed size (e.g. 32-bit).
Then radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
In the case of unbounded integers, I believe (but cannot prove, even though it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
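A sketch of such a binary radix quicksort (MSD radix exchange on the bits, partitioning in place); it assumes non-negative fixed-width integers, and the names and sample data are mine:
#include <stdio.h>

/* Partition a[lo..hi] on the given bit (zeros left, ones right), then recurse on the next lower bit. */
static void binary_radix_sort(unsigned a[], int lo, int hi, int bit) {
    if (lo >= hi || bit < 0) return;
    int i = lo, j = hi;
    while (i <= j) {
        if (((a[i] >> bit) & 1u) == 0) {
            i++;
        } else {
            unsigned tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            j--;
        }
    }
    binary_radix_sort(a, lo, j, bit - 1);       /* zeros half */
    binary_radix_sort(a, i, hi, bit - 1);       /* ones half */
}

int main(void) {
    unsigned a[] = {13, 2, 9, 30, 2, 7, 21};
    int n = sizeof a / sizeof a[0];
    binary_radix_sort(a, 0, n - 1, 31);
    for (int i = 0; i < n; i++) printf("%u ", a[i]);
    printf("\n");
    return 0;
}
To compare two arrays, sort both this way and then scan them together element by element.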
A special (and no harder) case is when one array holds 1, 2, ..., n. This was discussed many times:
How to tell if an array is a permutation in O(n)?
Algorithm to determine if array contains n...n+m?
mathoverflow
and despite many tries no deterministic solution using O(1) space and O(n) time has been shown. Either you can cheat the requirements in some way (reuse the input space, assume the integers are bounded) or use a probabilistic test.
Probably this is an open problem.
Here is a co-RP algorithm:
In linear time, iterate over the first array (A), building the polynomial Pa = (A[0] - x)(A[1] - x)...(A[n-1] - x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwartz-Zippel Lemma, P(Pa(r) - Pb(r) = 0 | Pa != Pb) < (n/4n). So the probability that we accept the two arrays as equivalent when they are not is < (1/4).
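A small sketch of that test; to avoid overflow it evaluates both polynomials modulo a fixed prime (the finite-field variant a later answer mentions) instead of over [0...4n], and the prime, names, and sample data are illustrative:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define P 1000000007ULL   /* a prime well above the value range; illustrative choice */

/* Evaluate Pa(r) = (a[0]-r)(a[1]-r)...(a[n-1]-r) mod P; assumes non-negative values for brevity. */
static unsigned long long eval_poly(const int a[], int n, unsigned long long r) {
    unsigned long long prod = 1;
    for (int i = 0; i < n; i++) {
        unsigned long long term = ((unsigned long long)a[i] % P + P - r) % P;
        prod = (prod * term) % P;
    }
    return prod;
}

/* One-sided probabilistic test: equal multisets always pass; unequal ones pass with probability <= n/P. */
static int probably_same_multiset(const int a[], const int b[], int n) {
    unsigned long long r = (unsigned long long)rand() % P;
    return eval_poly(a, n, r) == eval_poly(b, n, r);
}

int main(void) {
    srand((unsigned)time(NULL));
    int a[] = {3, 1, 4, 1, 5};
    int b[] = {1, 5, 1, 4, 3};
    printf("%s\n", probably_same_multiset(a, b, 5) ? "probably same" : "different");
    return 0;
}
Repeating the test with independent random values of r drives the false-positive probability down geometrically.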
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need is a prime of the appropriate size (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (the difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
In the algebraic decision tree model, there are known Omega(NlogN) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you do clever bit manipulations/hashing type approaches, you cannot do better than NlogN.
For instance, if you used only comparisons, you cannot do better than NlogN.
You can break the O(n*log(n)) barrier if you have some restrictions on the range of numbers. But it's not possible to do this if you cannot use any extra memory (you need really silly restrictions to be able to do that).
I would also like to note that even O(n log(n)) with sorting is not trivial if you have an O(1) space limit, as merge sort uses O(n) space and quicksort (which is not even strictly O(n log(n))) needs O(log(n)) space for the stack. You have to use heapsort or smoothsort.
Some companies like to ask questions which cannot be solved and I think it is a good practice, as a programmer you have to know both what's possible and how to code it and also know what are the limits so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use:
Algorithm to tell if two arrays have identical members
For each integer i check that the number of occurrences of i in the two arrays are either both zero or both nonzero, by iterating over the arrays.
Since the number of integers is constant the total runtime is O(n).
No, I wouldn't do this in practice.
I was just wondering whether there is a way you could hash the cumulative contents of both arrays and compare the hashes, assuming the hashing function doesn't produce collisions from two differing patterns.
Why not find the sum, product, and xor of all the elements of one array and compare them with the corresponding values for the other array?
The xor of the elements of both arrays may give zero in a case like
2,2,3,3
1,1,2,2
but what if you compare the xors of the elements of the two arrays for equality?
Consider this:
10,3
12,5
Here the xor of both arrays is the same: (10^3) = (12^5) = 9.
But their sum and product are different. I think two different sets of elements cannot have the same sum, product, and xor!
This can be analysed by simple bit-value examination.
Is there anything wrong with this approach?
I'm not sure that I correctly understood the problem, but if you are interested in integers that are in both arrays:
If N >> 2^SizeOf(int) (the count of bits in an integer: 16, 32, 64), there is one solution:
a = Array(N); //length(a) = N
b = Array(M); //length(b) = M
//x86-64: integers consist of 64 bits, so there are 2^64 possible values.
//Step 1: compare the first 2^64/64 elements of a against b directly, before their slots are reused.
for i := 0 to 2^64 / 64 - 1 do //very big, but a CONSTANT
    for k := 0 to M - 1 do
        if a[i] = b[k] then doSomething; //detected
//Step 2: reuse that prefix of a as a bit set and mark the values of the remaining elements of a.
for i := 2^64 / 64 to N - 1 do
    if not isSetBit(a[a[i] div 64], a[i] mod 64) then
        setBit(a[a[i] div 64], a[i] mod 64);
//Step 3: look up each value of b in the bit set.
for i := 0 to M - 1 do
    if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; //detected
O(N), without additional structures.
All I know is that comparison-based sorting cannot possibly be faster than O(NlogN), so we can eliminate most of the "common" comparison-based sorts. I was thinking of doing a bucket sort. Perhaps if this question were asked in an interview, the best response would first be to clarify what sort of data those integers represent. For example, if they represent a person's age, then we know that the range of int values is limited, and we can use bucket sort in O(n). However, this will not be in place...
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the log of the entries in the arrays. If that is also the same, then you have the same entries in the array.
