2D peak finding binary search - arrays

Taken from MIT 6.006: To find a peak in a 2D array, where a number is a peak if it is >= all of its neighbours:
Pick middle column j = m/2
Find global maximum on column j at (i, j)
Compare (i, j − 1),(i, j),(i, j + 1)
Pick left columns if (i, j − 1) > (i, j). Similarly for right.
(i, j) is a 2D-peak if neither condition holds
Solve the new problem with half the number of columns.
When you have a single column, find global maximum and you're done.
I understand why it might find a peak, but I think it only finds a peak on half of the array if it exists
(1) Using binary search here confuses me since the 2D array is not sorted, and each time you halve you are essentially saying there can be no peak on the left (which is not confirmed?)
(2) It finds the maximum element in the middle column. This ignores the possibility of a peak formed from non-maximal numbers, or that you can have more than one 1D peak in that column
(3) They compare numbers to the left and right of the max of the middle column. This discounts that there may be elements in the left and right columns that are larger than the max but not adjacent
Can someone explain to me why this algorithm is correct, hopefully by addressing (1), (2) and (3)?

each time you halve you are essentially saying there can be no peak on the left
Ah, no, we're saying that there is a peak on the right. There can be peaks on the left too, but we don't need to find every peak.
To prove that there is a peak on the (without loss of generality) right, consider the following "gradient ascent" algorithm:
Start at an arbitrary number.
While the current number has at least one greater neighbor, go to an arbitrary greater neighbor.
This algorithm never cycles because the current number only increases. This algorithm hence terminates because there are finitely many numbers. When the algorithm terminates, it has found a peak.
Consider what happens if (i, j) has the maximum value in its column and we start gradient ascent at (i, j). Either (i, j) is a peak (great!), or we move to a greater number in one of the adjacent columns. In the latter case, this number is greater than the maximum in column j, hence greater than every number in column j. Therefore, gradient ascent will never reenter the column, and thus it will never enter the columns on the other side, implying the existence of a peak on the desired side.
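The column-halving procedure from the question can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the name find_peak_2d and the 0-based (row, column) return value are assumptions made here. The returned cell is the maximum of its column and is checked against its left and right neighbours, so it is a peak; the gradient-ascent argument above is what guarantees that the loop always returns.

```python
def find_peak_2d(grid):
    """Return (row, col) of one peak in a non-empty rectangular grid.

    A peak is a cell that is >= all of its up/down/left/right neighbours.
    Hypothetical helper written for illustration only.
    """
    rows, cols = len(grid), len(grid[0])
    lo, hi = 0, cols - 1                      # columns still under consideration
    while lo <= hi:
        j = (lo + hi) // 2                    # middle column
        # Row index of the global maximum of column j.
        i = max(range(rows), key=lambda r: grid[r][j])
        if j > 0 and grid[i][j - 1] > grid[i][j]:
            hi = j - 1                        # a peak exists in the left columns
        elif j < cols - 1 and grid[i][j + 1] > grid[i][j]:
            lo = j + 1                        # a peak exists in the right columns
        else:
            return i, j                       # neither neighbour is greater
    raise AssertionError("unreachable for a non-empty grid")
```

Each iteration scans one column (n rows) and halves the number of candidate columns, so the sketch runs in O(n log m) time for an n-by-m grid.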

The idea of using binary search to find peak elements in a matrix is very good. But used alone, it returns only one peak element (and this value is generally not the maximum).
You can use the following code to find all the peak elements with a divide-and-conquer recursion over each column. Note that it checks every cell, so it is not a binary search in the strict sense.
def Div(lst, low, high, c):
    # Recursively visit every row index in [low, high] of column c.
    if low > high:
        return ""
    mid = (low + high) // 2
    v = lst[mid][c]
    # Strict peak test; neighbours outside the matrix are skipped.
    is_peak = ((mid == 0 or v > lst[mid - 1][c]) and
               (mid == len(lst) - 1 or v > lst[mid + 1][c]) and
               (c == 0 or v > lst[mid][c - 1]) and
               (c == len(lst[mid]) - 1 or v > lst[mid][c + 1]))
    here = str(v) + " " if is_peak else ""
    return here + Div(lst, low, mid - 1, c) + Div(lst, mid + 1, high, c)

def peak(lst, c):
    # All strict peaks in column c, as a space-separated string.
    return Div(lst, 0, len(lst) - 1, c)

lst = [[0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 4, 0],
       [0, 2, 0, 3, 0, 0],
       [0, 0, 5, 0, 6, 0],
       [0, 0, 0, 0, 0, 0]]
for i in range(0, 5):
    print(peak(lst, i))
output:
2
1 5
3
4 6
Time complexity: O(mn), since every cell of the matrix is visited once.

Related

program to find the number of exchanges that will happen until k minutes are passed

There are n runners running on a circular track. Whenever one runner passes another, they exchange gems. We are given an array of the time (in minutes) each runner takes to complete the circular track, and an integer k. We need to find the number of exchanges that will happen until k minutes have passed.
[EDIT 1]: I thought along the lines of using hcf of all the numbers for meeting point, but could not get through. Any help would be nice.
What we're trying to do here is to count the number of times a runner passes another runner. First of all, we can sort the runners by their speed, this will come in handy later. For the ith runner, we can easily compute the number of laps they will run, which I will refer to as L(i). For two runners i and j where i is faster than j, the number of times i will pass j is floor(L(i) - L(j)). Our solution is the sum of this value for all of the pairs of i and j where i > j.
If the limit for n is small enough, you can just loop over all these pairs and sum up the values in O(n^2) time. But if n is large, this will be too slow. If we simply wanted to compute the sum of L(i) - L(j) for all of i > j without the floor function, we can do this in linear time using prefix sums.
If our runners are numbered from 0 to n - 1 in increasing order of their speed, then for each value of i, the sum of L(i) - L(j) over all values of j less than i is equal to L(i) * i - P(i - 1), where P(j) is the precomputed value of the sum L(0) + L(1) + L(2) + ... + L(j). Now we need to deal with the floor function. For two real numbers x and y where x > y, floor(x - y) is equal to floor(x) - floor(y) if the fractional part of x is greater than or equal to the fractional part of y, and floor(x) - floor(y) - 1 otherwise.
So if we need to compute the number of times a runner i passes another runner, we can first compute the value using the prefix sum technique described above with the floor of each L value, and then subtract the number of j values where the fractional part of L(j) is greater than the fractional part of L(i). Finding the number of j values with the fractional part of L(j) greater than the fractional part of L(i) is basically inversion counting on real numbers, which can be done with a binary indexed tree.
The final complexity is O(n log n).
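A minimal Python sketch of the approach described above, under the answer's own model that runner i passes runner j exactly floor(L(i) - L(j)) times. The function name count_exchanges is made up for illustration, and exact fractions are used (an assumption of this sketch) so that comparing fractional parts is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import floor

def count_exchanges(times, k):
    """Sum of floor(L(i) - L(j)) over all pairs, where L = k / lap_time."""
    laps = sorted(Fraction(k, t) for t in times)      # slowest runner first
    fracs = [x - floor(x) for x in laps]              # fractional parts
    # Coordinate-compress the fractional parts for the binary indexed tree.
    rank = {f: r + 1 for r, f in enumerate(sorted(set(fracs)))}
    tree = [0] * (len(rank) + 1)

    def add(pos):
        while pos < len(tree):
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):                                  # inserted items with rank <= pos
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    total = 0
    floor_prefix = 0                                  # sum of floor(L(j)) for j < i
    for i, (x, f) in enumerate(zip(laps, fracs)):
        fl = floor(x)
        # Earlier runners whose fractional part is strictly larger than f.
        larger = i - prefix(rank[f])
        total += fl * i - floor_prefix - larger
        floor_prefix += fl
        add(rank[f])
    return total
```

For example, with lap times [1, 2, 3] and k = 6 the lap counts are 6, 3 and 2, and the pairwise values 1 + 4 + 3 give 8.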

How to find the most frequent number and its frequency in an array in range L,R most efficiently?

Let's say we are given an array A[] of length N and we have to answer Q queries, each consisting of two integers L, R. We have to find the number among A[L] to A[R] whose frequency is at least (R-L+1)/2. If no such number exists, we have to print "No such number".
I could think of only O(Q*(R-L)) approach of running a frequency counter and first obtaining the most frequent number in the array from L to R. Then count its frequency.
But more optimization is needed.
Constraints: 1 <= N <= 3*10^5, 1 <= Q <= 10^5, 1 <= L <= R <= N
I know an O((N + Q) * sqrt(N)) solution:
Let's call a number heavy if it occurs at least B times in the array. There are at most N / B heavy numbers in the array.
If the query segment is "short" (R - L + 1 < 2 * B), we can answer it in O(B) time (by simply iterating over all elements of the range).
If the query segment is "long" (R - L + 1 >= 2 * B), a frequent element must be heavy. We can iterate over all heavy numbers and check if at least one of them fits (to do that, we can precompute prefix sums of the number of occurrences for each heavy element and find the number of its occurrences in a [L, R] segment in constant time).
If we set B = C * sqrt(N) for some constant C, this solution runs in O((N + Q) * sqrt(N)) time and uses O(N * sqrt(N)) memory. With a properly chosen C, it may fit into the time and memory limits.
There is also a randomized solution which runs in O(N + Q * log N * k) time.
Let's store a vector of position of occurrences for each unique element in the array. Now we can find the number of occurrences of a fixed element in a fixed range in O(log N) time (two binary searches over the vector of occurrences).
For each query, we'll do the following:
Pick a random element from the segment
Check the number of its occurrences in O(log N) time as described above
If it's frequent enough, we are done. Otherwise, we pick another random element and do the same
If a frequent element exists, the probability of not picking it is no more than 1 / 2 for each trial. If we do it k times, the probability of not finding it is at most (1 / 2) ^ k.
With a proper choice of k (so that O(k * log N) per query is fast enough and (1 / 2) ^ k is reasonably small), this solution should pass.
Both solutions are easy to code (the first just needs prefix sums, the second only uses a vector of occurrences and binary search). If I had to code one of them, I'd pick the latter (the former can be more painful to squeeze into the time and memory limits).
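The randomized solution might look roughly like this in Python. The names build_positions and frequent_in_range, the default of 40 trials, and the optional rng parameter are all choices made for this sketch; "at least (R-L+1)/2" is checked as 2 * count >= length to avoid rounding.

```python
import random
from bisect import bisect_left, bisect_right
from collections import defaultdict

def build_positions(a):
    """Map each value to the sorted list of 0-based indices where it occurs."""
    positions = defaultdict(list)
    for idx, value in enumerate(a):
        positions[value].append(idx)
    return positions

def frequent_in_range(a, positions, left, right, trials=40, rng=random):
    """Return a value occurring at least half the time in A[left..right]
    (1-based, inclusive), or None if no candidate was found in `trials` tries."""
    lo, hi = left - 1, right - 1
    length = hi - lo + 1
    for _ in range(trials):
        candidate = a[rng.randint(lo, hi)]            # random element of the segment
        occ = positions[candidate]
        # Two binary searches count the occurrences inside [lo, hi].
        count = bisect_right(occ, hi) - bisect_left(occ, lo)
        if 2 * count >= length:
            return candidate
    return None
```

Usage: call positions = build_positions(A) once, then frequent_in_range(A, positions, L, R) per query. A None result is wrong with probability at most (1 / 2) ^ trials when a frequent element does exist.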

Finding the probability that two items are compared. (hints please)

I'm attempting to solve the following problem (from Prof. Jeff Erickson's notes). The algorithm below takes in an unsorted array A and returns the k-th smallest element in the array. Partition does what its name implies via the standard quicksort method, given the pivot returned by Random, and returns the new index of the pivot; Random is assumed to return a uniformly random integer between 1 and n in linear time. We are to find the exact probability that this algorithm compares the i-th smallest and j-th smallest elements in the input array.
QuickSelect(A[1..n],k):
    r <-- Partition(A[1..n],Random(n))
    if k < r:
        return QuickSelect(A[1..r-1],k)
    else if k > r:
        return QuickSelect(A[r+1..n],k-r)
    else:
        return A[k]
Now, I can see that the probability of the first if statement being true is (n-k)/n, the probability of the second block being true is (k-1)/n, and the probability of executing the else statement is 1/n. I also know that (assuming i < j) the probability of i < r < j is (j-i-1)/n which guarantees that the two elements are never compared. On the other hand, if i==r or j==r, then i and j are guaranteed to be compared. The part that really trips me up is what happens if r < i or j < r, because whether or not i and j are compared depends on the value of k (whether or not we are able to recursively call QuickSelect).
Any hints and/or suggestions would be greatly appreciated. This is for homework, so I would rather not have full solutions given to me so that I may actually learn a bit. Thanks in advance!
As has already been mentioned, the Monte Carlo method is a simple solution for a fast (in the sense of implementation) approximation.
There is a way to compute the exact probability using dynamic programming.
Here we will assume that all elements in the array are distinct and A[i] < A[j].
Let P(i, j, k, n) denote the probability that the i-th and j-th elements are compared while selecting the k-th element in an n-element array.
Each value of r in 1..n is equally likely, with probability 1/n. Also note that these events are disjoint and together they cover the whole event space.
Let us look carefully at each possible value of r.
If r = 1..i-1 then i and j fall into the same part and the probability of their comparison is P(i-r, j-r, k-r, n-r) if k > r and 0 otherwise.
If r = i the probability is 1.
If r = i+1..j-1 the probability is 0.
If r = j the probability is 1 and if r = j+1..n the probability is P(i, j, k, r-1) if k < r and 0 otherwise.
So the full recurrence is P(i, j, k, n) = 1/n * (2 + sum for r = 1..min(i, k)-1 of P(i-r, j-r, k-r, n-r) + sum for r = max(j, k)+1..n of P(i, j, k, r-1))
Finally for n = 2 (for i and j to be different) the only possible Ps are P(1, 2, 1, 2) and P(1, 2, 2, 2) and both equal 1 (no matter what r is equal to there will be a comparison)
Time complexity is O(n^5), space complexity is O(n^4). It is also possible to optimize the calculations and bring the time complexity down to O(n^4). Also, as we only consider A[i] < A[j] and i, j, k <= n, the multiplicative constant is 1/8. So it would be possible to compute any value for n up to 100 in a couple of minutes using the straightforward algorithm described, or up to 300 with the optimized one.
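The recurrence can be evaluated directly with memoization. The sketch below is one possible transcription, not the answerer's code: the name compare_prob is hypothetical, exact fractions are used instead of floats, and it assumes 1 <= i < j <= n and 1 <= k <= n. The n = 2 base case needs no special handling because both sums are empty there.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def compare_prob(i, j, k, n):
    """P(i, j, k, n): probability that ranks i and j are compared
    while selecting rank k from n distinct elements."""
    total = Fraction(2)                        # r == i and r == j each contribute 1
    # Pivot rank r below both i and k: recurse on the right part.
    for r in range(1, min(i, k)):
        total += compare_prob(i - r, j - r, k - r, n - r)
    # Pivot rank r above both j and k: recurse on the left part of size r - 1.
    for r in range(max(j, k) + 1, n + 1):
        total += compare_prob(i, j, k, r - 1)
    # Every other pivot rank contributes 0.
    return total / n
```

For instance, compare_prob(1, 3, 2, 3) evaluates to 2/3: only the pivots r = 1 and r = 3 lead to a comparison, and both sums are empty.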
Note that two positions are only compared if one of them is the pivot. So the best way to look at this is to look at the sequence of chosen pivots.
Suppose the k-th smallest element is between i and j. Then i and j are not compared if and only if an element between them is selected as a pivot before i or j are. What is the probability that this happens?
Now suppose the k-th smallest element is after j. i and j are not compared if and only if an element between i+1 and k (excluding j) is selected as a pivot before i or j are. What is the probability that this happens?

Finding continuous subsequence that minimizes the average of the rest of the array?

Suppose there's an integer array arr[0..n-1]. Find a subsequence sub[i..j] (i > 0 and j < n - 1) such that the rest of the array has the smallest average.
Example:
arr[5] = {5,1,7,8,2};
Remove {7,8}, the array becomes {5, 1, 2} which has average 2.67 (smallest possible).
I thought this is a modification of the Longest Increasing Subsequence but couldn't figure it out.
Let's find the minimum average using binary search.
Suppose that the sum of all elements is S.
For a given x, let's check whether there exist i and j such that the average of all elements outside i..j is less than or equal to x.
To do that, subtract x from every element of arr. Now we need to check whether there exist i and j such that the sum of all elements outside i..j is less than or equal to zero. The sum of all elements in the shifted array is S' = S - x * n, so we want i and j such that the sum from i to j is greater than or equal to S'. To do that, find the subarray with the largest sum (restricted to i > 0 and j < n - 1). This can be done using Kadane's elegant algorithm: https://en.wikipedia.org/wiki/Maximum_subarray_problem
When to terminate the binary search? When S' minus the maximum subarray sum is zero (or close enough).
Time complexity: O(n log w), where w is the precision of the binary search.
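A rough Python sketch of this binary search, with a few assumptions of its own: the name min_remaining_average and the fixed 60 halving steps are illustrative, the array is assumed to have at least three elements, and the removed block must be non-empty and lie strictly inside the array.

```python
def min_remaining_average(arr, iterations=60):
    """Smallest achievable average after removing one non-empty block
    arr[i..j] with 0 < i <= j < n - 1 (requires len(arr) >= 3)."""
    n = len(arr)

    def feasible(x):
        # After subtracting x, is there an interior block whose removal
        # leaves a sum <= 0, i.e. an average of the rest <= x?
        shifted = [v - x for v in arr]
        total = sum(shifted)                   # this is S' = S - x * n
        best = cur = shifted[1]
        for v in shifted[2:n - 1]:             # Kadane over interior elements only
            cur = max(v, cur + v)
            best = max(best, cur)
        return total - best <= 0

    lo, hi = min(arr), max(arr)                # the answer lies in this range
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

On the example from the question, min_remaining_average([5, 1, 7, 8, 2]) converges to about 2.67, which corresponds to removing {7, 8}.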

Searching through a partially sorted array in O(lgn)

I'm having a hard time solving this problem.
A[1..n] is an array of real numbers which is partially sorted:
There are some p, q (1 <= p <= q <= n) such that:
A[1] <= ... <= A[p]
A[p] >= ... >= A[q]
A[q] <= ... <= A[n]
How can we find a value in this array in O(lgn)?
(You can assume that the value exists in the array)
Make 3 binary searches: from 1 to p, p to q and q to n. The complexity is still O(logn).
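If p and q were known, the three searches could be sketched like this in Python. Both function names are hypothetical, and the sketch uses 0-based indices, so p and q here are the 0-based positions of the turning points.

```python
def binary_search(a, lo, hi, target, ascending=True):
    """Search the inclusive range a[lo..hi], which is sorted in the given
    direction; return an index holding target, or -1."""
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        # In an ascending run, move right when a[mid] is too small;
        # in a descending run, move right when a[mid] is too large.
        if (a[mid] < target) == ascending:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def search_three_runs(a, p, q, target):
    """a[0..p] ascending, a[p..q] descending, a[q..] ascending (0-based)."""
    runs = ((0, p, True), (p, q, False), (q, len(a) - 1, True))
    for lo, hi, ascending in runs:
        idx = binary_search(a, lo, hi, target, ascending)
        if idx != -1:
            return idx
    return -1
```

For a = [1, 3, 5, 4, 2, 6, 7] with p = 2 and q = 4, search_three_runs(a, 2, 4, 4) finds index 3 in the descending run.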
Since we don't know p and q:
You cannot solve this problem in log n time. Assume a case where you have a sorted list of positive numbers with one zero mixed in (p+1=q and A[q]=0). This situation satisfies all the criteria you mentioned. Now, the problem of finding where that zero is located cannot be solved in sub-linear time. Therefore your problem cannot be solved in O(log n) time.
Despite the "buried zero" worst case already pointed out, I would still recommend implementing an algorithm that can often speed things up, depending on p,q. For example, suppose that you have n numbers, and each increasing and decreasing region has size at least k. Then if you check 2^m elements in your array, including the first and last element and the rest of the elements as equally spaced as possible, starting with m=2 and then iteratively increasing m by 1, eventually you will reach an m at which you find 3 pairs of consecutive elements (A,B),(C,D),(E,F) from left-to-right out of the 2^m elements that you have checked, which satisfy A < B, C > D, E < F (some pairs may share elements). If my back-of-the-envelope calculation is correct, then the worst-case m you will need to achieve this will have you checking no more than 4n/k elements, so e.g. if k=100 you are much faster than checking all n elements.
Then you know everything before A and everything after F are increasing sequences, and you can binary search through them. Now, if m got big enough that you checked at least sqrt(n) elements, then you can finish up by doing a brute-force search between A and F and the overall running time will be O(n/k + sqrt(n)).
On the other hand, if the final m had you check fewer than sqrt(n) elements, then you can further increase m until you have checked sqrt(n) elements. Then there will be 2 pairs of consecutive checked elements (A,B),(C,D) that satisfy A < B, C > D, and there will also be 2 pairs of consecutive checked elements (W,X),(Y,Z) later in the array that satisfy W > X, Y < Z. Then everything before A is increasing, everything between D and W is decreasing, and everything after Z is increasing. So you can binary search these 3 regions in the array. The remaining part of the array that you haven't entirely searched through has size O(sqrt(n)), so you can brute-force search the unchecked regions and the overall running time is O(sqrt(n)).
Thus the bound O(n/k + sqrt(n)) holds in general. I have a feeling this is worst-case optimal, but I don't have a proof.
It's solvable in O(log^2 n).
1. If at the midpoint the slope is decreasing, we're in the p..q range.
2. If at the midpoint the slope is increasing, we're either in the 1..p or in the q..n range.
3. Perform a binary search in the 1..midpoint and midpoint..n ranges to look for a value where the slope is decreasing. It will be found in only one of the ranges. Now we know in which of the 1..p and q..n subranges the midpoint is located.
4. Repeat the process from (1) for the subrange with the peaks until hitting the p..q range.
5. Find the peaks in the subranges by applying the algorithm from "Divide and conquer algorithm applied in finding a peak in an array".
6. Perform 3 binary searches in the ranges 1..p, p..q, q..n.
==> Overall complexity is O(log^2 n).

Resources