Searching through a partially sorted array in O(lg n)

I'm having a hard time solving this problem.
A[1..n] is an array of real numbers which is partially sorted:
There are some p, q (1 <= p <= q <= n) such that:
A[1] <= ... <= A[p]
A[p] >= ... >= A[q]
A[q] <= ... <= A[n]
How can we find a value in this array in O(lg n)?
(You can assume that the value exists in the array)

Make 3 binary searches: from 1 to p, from p to q (with the comparison reversed, since that run is descending), and from q to n. The complexity is still O(log n).
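If p and q were known, those three searches could be sketched as follows (a minimal 0-indexed Python version; the function and parameter names are illustrative, and the middle run's comparison is flipped because it is descending):

```python
def search_with_pq(A, x, p, q):
    # A[0..p] ascending, A[p..q] descending, A[q..n-1] ascending (0-indexed).
    def bsearch(lo, hi, ascending):
        # Binary search in A[lo..hi] inclusive; direction given by `ascending`.
        while lo <= hi:
            mid = (lo + hi) // 2
            if A[mid] == x:
                return mid
            if (A[mid] < x) == ascending:
                lo = mid + 1   # target lies to the right
            else:
                hi = mid - 1   # target lies to the left
        return -1

    for lo, hi, asc in ((0, p, True), (p, q, False), (q, len(A) - 1, True)):
        i = bsearch(lo, hi, asc)
        if i != -1:
            return i
    return -1
```

Three binary searches over fixed ranges keep the total at O(log n).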
Since we don't know p and q:
You cannot solve this problem in O(log n) time. Consider a sorted list of positive numbers with a single zero mixed in (p + 1 = q and A[q] = 0). This situation satisfies all the criteria you mentioned, yet finding where that zero is located cannot be done in sub-O(n) time, since it could sit anywhere. Therefore your problem cannot be solved in O(log n) time.

Despite the "buried zero" worst case already pointed out, I would still recommend implementing an algorithm that can often speed things up, depending on p and q. For example, suppose that you have n numbers, and each increasing and decreasing region has size at least k. Check 2^m elements in your array, including the first and last element and the rest of the elements as equally spaced as possible, starting with m = 2 and then iteratively increasing m by 1. Eventually you will reach an m at which you find 3 pairs of consecutive checked elements (A,B), (C,D), (E,F), from left to right, which satisfy A < B, C > D, E < F (some pairs may share elements).

If my back-of-the-envelope calculation is correct, the worst-case m needed to achieve this will have you checking no more than 4n/k elements, so e.g. if k = 100 you are much faster than checking all n elements. You then know that everything before A and everything after F are increasing sequences, and you can binary search through them.

Now, if m got big enough that you checked at least sqrt(n) elements, you can finish up by doing a brute-force search between A and F, and the overall running time will be O(n/k + sqrt(n)).

On the other hand, if the final m had you check fewer than sqrt(n) elements, you can further increase m until you have checked sqrt(n) elements. Then there will be 2 pairs of consecutive checked elements (A,B), (C,D) that satisfy A < B, C > D, and 2 pairs of consecutive checked elements (W,X), (Y,Z) later in the array that satisfy W > X, Y < Z. Everything before A is increasing, everything between D and W is decreasing, and everything after Z is increasing, so you can binary search these 3 regions. The remaining part of the array that you haven't entirely searched through has size O(sqrt(n)), so you can brute-force search the unchecked regions, and the overall running time is O(sqrt(n)).
Thus the bound O(n/k + sqrt(n)) holds in general. I have a feeling this is worst-case optimal, but I don't have a proof.

It's solvable in O(log² n):

1. If the slope at the midpoint is decreasing, we're in the p..q range.
2. If the slope at the midpoint is increasing, we're either in the 1..p or the q..n range.
3. Perform a binary search in the 1..midpoint and midpoint..n ranges to look for a point where the slope is decreasing. It will be found in only one of the two ranges; now we know in which of the 1..p and q..n subranges the midpoint is located.
4. Repeat the process from (1) for the subrange containing the peaks until hitting the p..q range.
5. Find the peaks bounding the subranges by applying the algorithm in Divide and conquer algorithm applied in finding a peak in an array.
6. Perform 3 binary searches in the ranges 1..p, p..q, q..n.

==> Overall complexity is O(log² n).
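The peak-finding step referenced above can be sketched with a standard divide-and-conquer binary search (a minimal Python illustration; the function name and inclusive index bounds are assumptions):

```python
def find_peak(A, lo, hi):
    # Binary search for a local peak in A[lo..hi] (inclusive): an index
    # whose value is >= its neighbours within the range.
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < A[mid + 1]:
            lo = mid + 1   # slope is increasing here: a peak lies to the right
        else:
            hi = mid       # A[mid] >= A[mid + 1]: peak at mid or to the left
    return lo
```

Each iteration halves the range, giving the O(log n) step the overall O(log² n) bound relies on.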

Related

Can I sort an array in O(n) if all values are positive and with digits <= k?

Suppose I have an array A[1 ... n], with no range given for its values apart from the fact that they are positive. If I know they have up to k digits, is it possible to sort the array in O(n)?
All examples I have come across for O(n) sorting assume an upper bound on the values in the array. If this is a duplicate, please let me know.
This depends on whether k is a constant or not.
If your numbers have k digits each, then you do have a bound on the numbers, since they can't be any bigger than 10^k - 1. You could therefore use radix sort to sort the integers. The runtime of radix sort is O((n + b) log_b U), where n is the number of numbers to sort, b is the base of your radix sort, and U is the maximum value that you're sorting. In your case, that works out to
O((n + b) log_b 10^k) = O(k(n + b)).
This is where the "it depends" comes in. If k is some fixed number that never changes - say, it's always 137 or something like that - then the above expression reduces to O(n + b), and picking b to be any constant (say, base-2 for your radix sort) gives a runtime of O(n). On the other hand, if k can vary (say, the numbers are allowed to be as big as you'd like them to be, and then after seeing the numbers you work out what k is), then the above expression can't be simplified beyond O(kn) because k is a parameter to the algorithm.
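As an illustration of the analysis above, here is a minimal LSD radix sort in Python (the `base` parameter plays the role of b; function and parameter names are illustrative):

```python
def radix_sort(nums, base=10):
    # LSD radix sort for non-negative integers: one stable bucket pass
    # per digit, so numbers with up to k digits in `base` need k passes.
    if not nums:
        return nums
    max_val = max(nums)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // exp) % base].append(x)   # stable per-digit pass
        nums = [x for bucket in buckets for x in bucket]
        exp *= base
    return nums
```

With fixed k and constant base this is O(n) overall; with growing k the k passes show up as the O(kn) term.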
Hope this helps!
If k <= n, you can, otherwise it's not guaranteed.
If I give you n = 4 and k = 5, where would you place an element 3?

How can I find an element in an m-sorted array, where the rest of the array is zeros and m is not given

Given an array with n elements. The first m elements in the array are sorted (without duplicates) and different from zero. The rest of the array is zeros.
It should be noted that m is not known - it could be anything, but it is known that n >> m.
Now, given x: if it exists in the array, I should return its index; otherwise, report that it does not exist.
Now the thing is that I have to find an algorithm that does so.
A trivial answer would be to scan the array as long as the next element is not zero, with O(m) time complexity, or simply a modified version of binary search, which would take O(log n) time.
Apart from these two solutions I have no clue. It has been hinted that we can find x in O(log m) time by first finding m in O(log m) time; then I could do binary search on the first m elements... but otherwise I have no clue!
You can find an upper bound on m in O(log m) time as follows (using the fact that the first m elements do not contain 0):

i = 1
while i <= n and A[i] != 0 do
    i = 2*i
return min(i, n)

This gives you an upper bound i on m with m <= i <= 2m (capped at n). All you have to do then is a binary search on the first i elements of your array to find x.
Each operation can be done in O(log m) time, so the total complexity is O(log m).
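A runnable 0-indexed Python version of this approach (exponential search for the boundary, a binary search to pin down m exactly, then an ordinary binary search for x; the function name and zero-sentinel layout are assumptions matching the question):

```python
import bisect

def find_in_m_sorted(A, x):
    # A[0..m-1] sorted ascending and nonzero; A[m..n-1] all zeros; m unknown.
    n = len(A)
    if n == 0 or A[0] == 0:
        return -1
    # Exponential search: double i until A[i] == 0 or i runs past the end.
    i = 1
    while i < n and A[i] != 0:
        i *= 2
    # m now lies in (i // 2, min(i, n)]; binary search for the first zero.
    lo, hi = i // 2, min(i, n)   # A[lo] != 0; A[hi] is 0 or past the end
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] != 0:
            lo = mid + 1
        else:
            hi = mid
    m = lo
    # Ordinary binary search for x in the sorted prefix A[0..m-1].
    j = bisect.bisect_left(A, x, 0, m)
    return j if j < m and A[j] == x else -1
```

All three phases run over ranges of size O(m), so the total stays O(log m). Pinning down m exactly also avoids binary searching over trailing zeros, which would break the sorted-prefix invariant.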

Is it possible to do 3-sum/4-sum...k-sum better than O(n^2) with these conditions?

This is a classic problem, but I am curious whether it is possible to do better under these conditions.
Problem: Suppose we have a sorted array of length 4*N, that is, each element is repeated 4 times. Note that N can be any natural number. Also, each element in the array is subject to the constraint 0 < A[i] < 190*N. Are there 4 elements in the array such that A[i] + A[j] + A[k] + A[m] = V, where V can be any positive integer? Note we must use exactly 4 elements, and they can be repeated. It is not necessarily a requirement to find the 4 elements that satisfy the condition; rather, just showing that it can be done for a given array and V is enough.
Ex : A = [1,1,1,1,4,4,4,4,5,5,5,5,11,11,11,11]
V = 22
This is true because, 11 + 5 + 5 + 1 = 22.
My attempt:
Instead of "4sum" I first tried k-sum, but this proved pretty difficult so I instead went for this variation. The first solution I came to was rather naive O(n^2). However, given these constraints, I imagine that we can do better. I tried some dynamic programming methods and divide and conquer, but that didn't quite get me anywhere. To be specific, I am not sure how to cleverly approach this in a way where I can "eliminate" portions of the array without having to explicitly check values against all or almost all permutations.
Make a vector S0 of length 256N where S0[x] = 1 if x appears in A.
Perform a convolution of S0 with itself to produce a new vector S1 of length 512N. S1[x] is nonzero iff x is the sum of 2 numbers in A.
Perform a convolution of S1 with itself to make a new vector S2. S2[x] is nonzero iff x is the sum of 4 numbers in A.
Check S2[V] to get your answer.
Convolution can be performed in O(N log N) time using FFT convolution (http://www.dspguide.com/ch18/2.htm) or similar techniques.
Since only a constant number of such convolutions is performed, the total complexity is O(N log N).
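A minimal Python sketch of the steps above, using a plain O(N²) convolution for clarity (a real implementation would substitute FFT-based convolution to get O(N log N); sizing the indicator vector to max(A) rather than 256N, and the function names, are assumptions):

```python
def convolve(a, b):
    # Naive convolution: out[c] = sum of a[i] * b[j] over all i + j == c.
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                if y:
                    out[i + j] += x * y
    return out

def four_sum_exists(A, V):
    # S0[x] = 1 iff x appears in A. Repeats are fine: every value occurs
    # 4 times in A, so any value may be used up to 4 times in the sum.
    S0 = [0] * (max(A) + 1)
    for x in A:
        S0[x] = 1
    S1 = convolve(S0, S0)   # S1[x] > 0 iff x is a sum of 2 values from A
    S2 = convolve(S1, S1)   # S2[x] > 0 iff x is a sum of 4 values from A
    return bool(V < len(S2) and S2[V] > 0)
```

The value bound 0 < A[i] < 190*N is what keeps the vectors O(N) long, which is exactly why the FFT version beats O(N²).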

How to find the most frequent number and its frequency in an array in range L,R most efficiently?

Let's say we are given an array A[] of length N, and we have to answer Q queries, each consisting of two integers L, R. We have to find a number among A[L]..A[R] whose frequency is at least (R-L+1)/2. If no such number exists, we print "No such number".
I could think of only an O(Q*(R-L)) approach: running a frequency counter over A[L]..A[R] to obtain the most frequent number, then counting its frequency.
But more optimization is needed.
Constraints: 1 <= N <= 3*10^5, 1 <= Q <= 10^5, 1 <= L <= R <= N
I know an O((N + Q) * sqrt(N)) solution:
Let's call a number heavy if it occurs at least B times in the array. There are at most N / B heavy numbers in the array.
If the query segment is "short" (R - L + 1 < 2 * B), we can answer it in O(B) time (by simply iterating over all elements of the range).
If the query segment is "long" (R - L + 1 >= 2 * B), a frequent element must be heavy. We can iterate over all heavy numbers and check whether at least one of them fits (to do that, we can precompute prefix sums of the number of occurrences of each heavy element and find the number of its occurrences in a [L, R] segment in constant time).
If we set B = C * sqrt(N) for some constant C, this solution runs in O((N + Q) * sqrt(N)) time and uses O(N * sqrt(N)) memory. With a properly chosen C, it may fit into the time and memory limits.
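A minimal Python sketch of this decomposition (0-indexed inclusive ranges; the function names and short-segment handling are assumptions):

```python
from collections import Counter

def preprocess(A, B):
    # Prefix occurrence counts for every heavy value (appearing >= B times).
    heavy = [v for v, c in Counter(A).items() if c >= B]
    prefix = {}
    for v in heavy:
        p = [0] * (len(A) + 1)
        for i, x in enumerate(A):
            p[i + 1] = p[i] + (x == v)
        prefix[v] = p
    return heavy, prefix

def query(A, heavy, prefix, L, R, B):
    # Find a value occurring >= (R - L + 1) / 2 times in A[L..R], else None.
    need = (R - L + 1) / 2
    if R - L + 1 < 2 * B:                  # short segment: scan it directly
        v, c = Counter(A[L:R + 1]).most_common(1)[0]
        return v if c >= need else None
    for v in heavy:                        # long segment: answer must be heavy
        if prefix[v][R + 1] - prefix[v][L] >= need:
            return v
    return None
```

Short segments cost O(B) per query, long ones O(N/B), giving the stated O((N + Q) * sqrt(N)) bound at B ≈ sqrt(N).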
There is also a randomized solution which runs in O(N + Q * log N * k) time.
Let's store a vector of position of occurrences for each unique element in the array. Now we can find the number of occurrences of a fixed element in a fixed range in O(log N) time (two binary searches over the vector of occurrences).
For each query, we'll do the following:
Pick a random element from the segment.
Check the number of its occurrences in O(log N) time, as described above.
If it's frequent enough, we are done. Otherwise, we pick another random element and do the same.
If a frequent element exists, the probability of not picking it is no more than 1/2 per trial. If we do k trials, the probability of not finding it is (1/2)^k.
With a proper choice of k (so that O(k * log N) per query is fast enough and (1 / 2) ^ k is reasonably small), this solution should pass.
Both solutions are easy to code (the first just needs prefix sums, the second only uses a vector of occurrences and binary search). If I had to code one of them, I'd pick the latter (the former can be more painful to squeeze into the time and memory limits).
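The randomized variant can be sketched as follows (0-indexed inclusive ranges; the trial count k and the names are illustrative):

```python
import bisect
import random
from collections import defaultdict

def build_occurrences(A):
    # Positions of every value, in increasing order.
    occ = defaultdict(list)
    for i, x in enumerate(A):
        occ[x].append(i)
    return occ

def majority_query(A, occ, L, R, k=64):
    # Look for a value occurring >= (R - L + 1) / 2 times in A[L..R].
    # Each trial succeeds with probability >= 1/2 when such a value exists,
    # so the failure probability after k trials is at most (1/2)**k.
    need = (R - L + 1) / 2
    for _ in range(k):
        x = A[random.randint(L, R)]
        pos = occ[x]
        # Two binary searches count occurrences of x inside [L, R].
        cnt = bisect.bisect_right(pos, R) - bisect.bisect_left(pos, L)
        if cnt >= need:
            return x
    return None
```

Note the answer is probabilistic: a frequent element can be missed with probability at most (1/2)^k, while a None on a segment with no frequent element is always correct.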

fast algorithm of finding sums in array

I am looking for a fast algorithm.
I have an int array of size n; the goal is to find all patterns in the array such that
x1 + x2 = x3, where x1, x2, x3 are different elements of the array.
For example, for an int array of size 3, [1, 2, 3], there's only one possibility: 1 + 2 = 3 (treating 1+2 and 2+1 as the same).
I am thinking about using pairs and hash maps to make the algorithm fast (the fastest one I have now is still O(n^2)).
Please share your idea for this problem, thank you
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2), but we've only proved that it can't be faster than O(n lg n) for most models of computation.
If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.
Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).
Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)
Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).
Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x^2 + x^3 + x^4 + x^5 + x^9.
Now, remember from high school algebra that the coefficient of x^c in the product of two polynomials is the sum, over all a, b with a + b = c, of the first polynomial's coefficient for x^a times the second's coefficient for x^b. So, if we consider q = p^2, the coefficient of x^(2j) (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets (i, j, k) which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about coefficients with values > 1.
We can then use a Fast Fourier Transform to find p^2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.
Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check whether there's a triple 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_i) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.
This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which is maybe possibly O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
Partially based on this answer.
It has to be at least O(n^2), as there are n(n-1)/2 different sums to check against the other members. You have to compute all of those, because any pair's sum may equal any other member (start with one example and permute the elements to convince yourself that all must be checked). Or look at the Fibonacci numbers for something concrete.
So computing all those sums and looking up members in a hash table gives amortized O(n^2). Or use an ordered tree if you need the best worst case.
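That hash-table approach can be sketched as follows (a minimal Python version that returns index triples, so "different elements" means different positions; the names are illustrative):

```python
from collections import defaultdict

def sum_triples(arr):
    # All index triples (i, j, k) with i < j, arr[i] + arr[j] == arr[k],
    # and k distinct from both i and j.
    index_of = defaultdict(list)
    for k, v in enumerate(arr):
        index_of[v].append(k)
    out = []
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            for k in index_of[arr[i] + arr[j]]:   # O(1) expected lookup
                if k != i and k != j:
                    out.append((i, j, k))
    return out
```

The pair loop is O(n^2) with O(1) expected lookups, matching the amortized bound above; with many repeated values the output itself can be larger than that.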
You essentially need to find all the different sums of value pairs, so I don't think you're going to do better than O(n^2). But you can optimize by sorting the list and removing duplicate values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list.
