Find maximum subarray using binary search

You are given two arrays arr1 and arr2, and you want to find the maximum length of a subarray that is a (contiguous) subarray of both arr1 and arr2.
For example:
arr1 = [3,2,1,4,5]
arr2 = [1,2,3,4,3,2,1]
the answer is 3 (the common subarray [3,2,1]).
This problem has a binary search solution with O(n log n) complexity.
Does anyone know how to approach it?

I am going to give a general idea; someone can edit the answer if the wording is unclear.
How do we know that binary search is applicable here? Binary-search on the length of the common subarray, taking mid = l + (r - l)/2 as the candidate length.
The key observation is monotonicity: if the two arrays have a common subarray of length L, they also have a common subarray of every length smaller than L (any sub-window of it works), and if they have no common subarray of length L, they cannot have one of any length greater than L. So implementing the binary search is simple: check whether a common subarray of length mid exists. If yes, a longer common subarray may also exist, so store the current satisfied length as the answer and set l = mid + 1 to try larger lengths. If at some iteration no common subarray of length mid exists, there is no point increasing the length, so try smaller lengths with r = mid - 1.
Writing the code in C++:
int l = 1, r = min(array1.size(), array2.size()); // candidate lengths from 1 up to the shorter array
int answer = -1;
while (l <= r)
{
    int mid = l + (r - l) / 2;
    if (check(array1, array2, mid)) // does a common subarray of length mid exist?
    {
        answer = mid;
        l = mid + 1;
    }
    else
        r = mid - 1;
}
cout << answer << "\n";
Now the question is: given a length L and the two arrays, how do we check whether a common subarray of length L exists in both? For that you need hashing, which assigns a numerical value to an array so that two arrays can be compared efficiently: two equal arrays get the same hash value, and (ideally) different arrays get different hashes. Various hashing methods exist, but as you may have guessed, two different arrays can still end up with the same hash; this is known as a collision. We can reduce the probability of collisions by using a strong hash function. One such method is a rolling hash, which lets you update the hash of a sliding window in O(1) per step; look up "rolling hash" for the general idea.
Now, in each check of mid inside the binary search: compute the rolling hash of every subarray of length mid in the first array and store the values in a data structure such as a hash table (average lookup time O(1)) or a set (lookup time logarithmic). Then compute the rolling hash of every subarray of length mid in the second array, and for each one check whether that hash value was already stored for the first array. If you find a match, a common subarray of length mid exists, so return true to the binary search; if you scan every window of length mid in the second array without finding a stored hash, return false.
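Here is a minimal sketch of such a check function in Python (my own illustration; the polynomial base and modulus are assumed values, and a production version should use double hashing or verify matches explicitly, since a single rolling hash can still collide):

def check(a, b, length, base=911382323, mod=(1 << 61) - 1):
    # Return True if some subarray of the given length occurs in both a and b,
    # comparing windows by their polynomial rolling hashes.
    if length > min(len(a), len(b)):
        return False
    top = pow(base, length - 1, mod)  # weight of the element leaving the window

    def window_hashes(arr):
        h = 0
        for i, x in enumerate(arr):
            if i >= length:
                h = (h - arr[i - length] * top) % mod  # drop the outgoing element
            h = (h * base + x) % mod                   # absorb the incoming element
            if i >= length - 1:
                yield h

    seen = set(window_hashes(a))  # hashes of all windows of the first array
    return any(h in seen for h in window_hashes(b))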
Assuming you use a hash table as the data structure, the total time complexity is
O( (array1.size() + array2.size()) * log( min(array1.size(), array2.size()) ) )
because the binary search runs log(min(array1.size(), array2.size())) iterations, and in each iteration you traverse both arrays, computing rolling hashes and doing hash-table lookups, i.e. O(array1.size() + array2.size()) work.

Related

Maximum adjacent product sum (Interview question)

We have an array of integers, where the integer at each position is that position's value. Each time a position is selected, you earn its value multiplied by the values of its adjacent positions (left and right). After a position has been selected, it is removed from the array and its left and right neighbours become adjacent to each other.
If a position has no adjacent position on some side, assume a value of 1 for that side. For example, if there is only a single position left and you select it, its value is multiplied by 1 on both sides.
Find the maximum amount that can be earned after selecting all positions.
I have implemented a dynamic programming approach using the following recurrence relation. First, observe that if at some step of the process described in the question we multiply arr[position_p] and arr[position_q], then every position between position_p and position_q, if any, must already have been chosen.
For simplicity, let array indices start from 1, and let positions 0 and n+1 contain the value 1 in accordance with the question, where n is the number of elements in the array.
So we need to select positions p+1 to q-1 in an order that maximizes the amount. This gives the recurrence:
If f(p,q) is the maximum amount obtained by choosing only from positions p+1 to q-1, then
f(p,q) = max( f(p,k) + f(k,q) + arr[p] * arr[k] * arr[q] ) for k between p and q (excluding p and q),
where k is the last position chosen from positions p+1 to q-1 before either p or q is used.
And here is the Python implementation:
import numpy as np

n = int(input("Enter the no. of inputs : "))
arr = [1]
arr = arr + list(map(int, input("Enter the list : ").split()))
arr.append(1)
# matrix created to memoize values instead of recomputing
mat = np.zeros((n + 2, n + 2), dtype="i8")
# Bottom-up dynamic programming approach
for row in range(n + 1, -1, -1):
    for column in range(row + 2, n + 2):
        # This initialization to zero may not work when there are negative integers in the list.
        max_sum = 0
        # Recurrence relation:
        # mat[row][column] holds the maximum product sum from indices row+1 to column-1,
        # with arr[row] and arr[column] as the boundary values of the sub-array.
        # If column <= row + 1, there are no elements between them, so mat[row][column] stays zero.
        for k in range(row + 1, column):
            max_sum = max(max_sum, mat[row][k] + mat[k][column] + (arr[row] * arr[k] * arr[column]))
        mat[row][column] = max_sum
print(mat[0][n + 1])
I was given this question in a programming interview round some time back. Though my solution seems to work, it has O(n^3) time complexity and O(n^2) space complexity.
Can I do better? What about the case when all values in the array are positive (the original question assumes this)? Any help on reducing the space complexity is also appreciated.
Thank you.
Edit:
Though this is no proof, as suggested by @risingStark I have seen the same question on LeetCode as well, where all the correct algorithms seem to use O(n^2) space and run in O(n^3) time for the general-case solution.

Maximize number of inversion count in array

We are given an unsorted array A of integers (duplicates allowed) of possibly large size N. We can count the number of pairs with indices i < j for which A[i] < A[j]; let's call this X.
We can change at most one element of the array, with a cost equal to the absolute difference (for instance, if we replace the element at index k with the new number K, the cost Y is |A[k] - K|).
We can only replace this element with other values found in the array.
We want to find the minimum possible value of X + Y.
Some examples:
[1,2,2] should return 1 (change the 1 to 2 such that the array becomes [2,2,2])
[2,2,3] should return 1 (change the 3 to 2)
[2,1,1] should return 0 (because no changes are necessary)
[1,2,3,4] should return 6 (this is already the minimum possible value)
[4,4,5,5] should return 3 (this can be accomplished by changing the first 4 into a 5, or the last 5 into a 4)
The number of pairs can be found with a naive O(n²) solution, here in Python:
def calc_x(arr):
    n = len(arr)
    cnt = 0
    for i in range(n):
        for j in range(i + 1, n):
            if arr[j] > arr[i]:
                cnt += 1
    return cnt
A brute-force solution is easily written, for example:
def f(arr):
    best_val = calc_x(arr)
    used = set(arr)
    for i, v in enumerate(arr):
        for replacement in used:
            if replacement == v:
                continue
            arr2 = arr[:i] + [replacement] + arr[i+1:]  # replace arr[i], keep the rest
            y = abs(replacement - v)
            x = calc_x(arr2)
            best_val = min(best_val, x + y)
    return best_val
We can count, for each element, the number of items to the right of it that are larger than it in O(n log n), using for instance an AVL tree or some variation on merge sort.
However, we still have to search which element to change and what improvement it can achieve.
This was given as an interview question and I would like some hints or insights as how to solve this problem efficiently (data structures or algorithm).
Definitely go for an O(n log n) complexity when counting inversions.
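For concreteness, such a counter might look like this with a Fenwick tree (binary indexed tree); this sketch is my own illustration of the standard technique, not code from the answer:

def count_x(arr):
    # Count pairs i < j with arr[i] < arr[j], using a Fenwick tree
    # over the coordinate-compressed values seen so far.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(arr)))}
    tree = [0] * (len(rank) + 1)

    def add(i):
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def smaller_seen(i):  # how many already-seen values have rank <= i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    x = 0
    for v in arr:
        x += smaller_seen(rank[v] - 1)  # earlier elements strictly below v
        add(rank[v])
    return x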
We can see that when you change the value at index k, you can either:
1) increase it, which can reduce the number of counted pairs with bigger elements but increase the number of counted pairs with smaller elements
2) decrease it (and the opposite happens)
Let's try not to count x every time you change a value. What do you need to know?
In case 1):
You have to know how many elements on the left are smaller than your new value v, and how many elements on the right are bigger than it. You can compute these pretty easily in O(n). So what is your x now? You can count it with the following formula:
prev_val - your previous value
prev_x - the x you counted at the beginning of your program
prev_l - number of elements on the left smaller than prev_val
prev_r - number of elements on the right bigger than prev_val
v - new value
l - number of elements on the left smaller than v
r - number of elements on the right bigger than v
new_x = prev_x + r + l - prev_l - prev_r
In the second case you do pretty much the opposite.
Right now you get something like O(n^3) instead of O(n^3 log n), which is probably still bad. Unfortunately that's all I have come up with for now. I'll definitely tell you if I come up with something better.
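A direct translation of this idea into Python might look like the following (my own sketch; calc_x is the O(n^2) counter from the question):

def f_cubic(arr):
    prev_x = calc_x(arr)  # computed once
    best = prev_x
    values = set(arr)     # replacements must come from the array
    for k, prev_val in enumerate(arr):
        left, right = arr[:k], arr[k+1:]
        prev_l = sum(1 for a in left if a < prev_val)
        prev_r = sum(1 for a in right if a > prev_val)
        for v in values:
            l = sum(1 for a in left if a < v)   # O(n) per candidate value
            r = sum(1 for a in right if a > v)
            new_x = prev_x + r + l - prev_l - prev_r
            best = min(best, new_x + abs(prev_val - v))
    return best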
EDIT: What about the memory limit? Is there any? If not, you can, for each element in the array, build two sets containing the elements before and after it. Then you can find the number of smaller/bigger elements in O(log n), making the time complexity O(n^2 log n).
EDIT 2: We can also check, for every possible value v, which element would be best to change to v. We make two sets and add/erase elements from them while scanning the array, giving O(n^2 log n) time without using too much space. The algorithm would be:
1) determine every value v that any element could be changed to, and calculate x
2) for each possible value v:
   make two sets, and push all elements into the second one
   for each element e in the array:
     add the previous element (if there is one) to the first set and erase e from the second set, then count the number of bigger/smaller elements in sets 1 and 2 and calculate the new x
EDIT 3: Instead of making two sets, you could go with a prefix sum per value. That is O(n^2) already, but I think we can go even better than this. (One possible realization is sketched below.)
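One way the O(n^2) prefix-sum idea could be realized (my own interpretation of EDIT 3, not the answerer's code; calc_x is again the counter from the question):

def f_quadratic(arr):
    n = len(arr)
    prev_x = calc_x(arr)  # O(n^2) here; an O(n log n) counter also works
    # prev_l[k] / prev_r[k]: pairs lost by removing arr[k]'s old value
    prev_l = [sum(1 for a in arr[:k] if a < arr[k]) for k in range(n)]
    prev_r = [sum(1 for a in arr[k+1:] if a > arr[k]) for k in range(n)]
    best = prev_x
    for v in set(arr):             # candidate new value
        smaller_left = 0           # running count of elements left of k that are < v
        bigger_right = sum(1 for a in arr if a > v)
        for k in range(n):
            if arr[k] > v:
                bigger_right -= 1  # arr[k] leaves the "right side"
            new_x = prev_x + smaller_left + bigger_right - prev_l[k] - prev_r[k]
            best = min(best, new_x + abs(arr[k] - v))
            if arr[k] < v:
                smaller_left += 1  # arr[k] joins the "left side"
    return best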

Possibly simpler O(n) solution to find the Sub-array of length K (or more) with the maximum average

I saw this question on a coding competition site.
Suppose you are given an array of n integers and an integer k (n <= 10^5, 1 <= k <= n). How do you find the (contiguous) subarray with the maximum average whose length is at least k?
There's an O(n) solution presented in a research paper (arxiv.org/abs/cs/0207026), linked in a duplicate SO question. I'm posting this as a separate question since I think I have a similar method with a simpler explanation. Do you think there's anything wrong with the logic in my solution below?
Here's the logic:
Start with the window range [i,j] = [0, K-1]. Then iterate over the remaining elements.
For every next element j, update the prefix sum.** Now we have a choice: we can use the full range [i,j], or discard the range [i:j-k] and keep [j-k+1:j] (i.e. keep the latest K elements). Choose the range with the higher average (the prefix sums make this O(1)).
Keep track of the max average at every step.
Return the max average at the end.
** I calculate the prefix sum as I iterate over the array. The prefix sum at i is the cumulative sum of the first i elements of the array.
Code:
def findMaxAverage(nums, k):
    prefix = [0]
    for i in range(k):
        prefix.append(float(prefix[-1] + nums[i]))
    mavg = prefix[-1] / k
    lbound = -1
    for i in range(k, len(nums)):
        prefix.append(prefix[-1] + nums[i])
        cavg = (prefix[i+1] - prefix[lbound+1]) / (i - lbound)
        altavg = (prefix[i+1] - prefix[i-k+1]) / k
        if altavg > cavg:
            lbound = i - k
            cavg = altavg
        mavg = max(mavg, cavg)
    return mavg
Consider k = 3 and the sequence
3,0,0,2,0,1,3
The output of your program is 1.3333333333333333: it has found the subarray 0,1,3, but the best possible subarray is 2,0,1,3, with average 1.5.
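A quick brute-force check of this counterexample (my own snippet, using the findMaxAverage from the question):

def brute_max_average(nums, k):
    # try every subarray of length >= k
    best = float("-inf")
    for i in range(len(nums)):
        for j in range(i + k, len(nums) + 1):
            best = max(best, sum(nums[i:j]) / (j - i))
    return best

nums = [3, 0, 0, 2, 0, 1, 3]
print(findMaxAverage(nums, 3))     # 1.3333333333333333
print(brute_max_average(nums, 3))  # 1.5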

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size n*k+b, where n elements occur k times each and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element, then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice in linear time, as in quicksort. Let the median element be M.
After the recursive step you have an array where all elements smaller than M are to the left of the first occurrence of M, all the M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S < M or S >= M. Recurse on either the left slice or the right slice.
So you are doing a quicksort, but descending into only one side of each partition. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time is still O(N).
Clarification: Let's say n=20 and k=10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, say, 7 times. I find the median element, say it is 1111. If S < 1111, then the index of the leftmost occurrence of 1111 will be less than 11*10; if S >= 1111, the index will be equal to 11*10.
Full example: n = 4, k = 3, Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}.
After the first recursive step I find that the median element is 3 and the array looks something like {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3, and 6 is a multiple of k=3, so every element there must occur exactly 3 times. So S >= 3. Recurse on the right side. And so on.
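A sketch of this idea in Python (my own illustration; the median here is found by sorting for clarity, and a real implementation would swap in a linear-time selection such as median of medians to get the O(n) bound):

def find_b_element(arr, k):
    # Find S, the element whose count is not a multiple of k.
    sub = list(arr)
    while True:
        m = sorted(sub)[len(sub) // 2]  # median value of the current slice
        smaller = [x for x in sub if x < m]
        equal = [x for x in sub if x == m]
        larger = [x for x in sub if x > m]
        if len(smaller) % k:            # count not a multiple of k => S is here
            sub = smaller
        elif len(equal) % k:            # the median itself is S
            return m
        else:
            sub = larger                # otherwise S is to the right

print(find_b_element([2, 2, 2, 3, 3, 3, 1, 1], 3))  # -> 1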
An idea using cyclic groups.
To guess the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set. Otherwise it is clear.
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log(max N)), where max N is the maximal value in the array, but because the number of bits is usually constant, the solution is linear in the array size.
No hashing; memory usage is O(log k * log max N).
Example implementation:
from random import randint, shuffle
from functools import reduce

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for i in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10  # 10 bits suffice for values up to 1000
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the most significant bit down
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

print("Answer: ", solve(generate_test_data(10, 15, 13), 3))
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
Example: with n = 1000000 and h = 10, z = 3.98, i.e. z = 4.
The time to reach a node in that case is O(h * log(z)). Assuming h and z are "constant" (since N = n*k, log(z) = log(n^(1/h)) = log((N/k)^(1/h)) = constant, by properly choosing h based on k), you can then say that O(h * log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates):
result = (sum(set(arr)) * k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
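To spell that out: under XOR, each element occurring an even number of times cancels, and the odd count b leaves the sought element itself. A quick illustration (my own snippet):

from functools import reduce
from operator import xor

arr = [7] * 4 + [5] * 4 + [9] * 3  # n = 2, k = 4 (even), b = 3 (odd)
print(reduce(xor, arr))            # -> 9; the even-count copies cancel pairwise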

Finding difference less than the average in an unsorted array?

I need to find 2 elements in an unsorted array such that the difference between them is less than or equal to (maximum - minimum)/(number of elements in the array), in O(n).
I know the max and min values.
Can anyone think of something?
Thank you!
Step 1: use bucket sort, but don't sort the individual buckets.
It should be pretty obvious what to do from here, and how to size the buckets:
Number of buckets = 2n.
Values in bucket k: (min + k*((max-min)/2n)) <= value < (min + (k+1)*((max-min)/2n)), for 0 <= k < 2n.
Range of each bucket = (max-min)/2n.
Assign each element to its bucket; don't sort inside the buckets.
If any bucket has more than one element, the maximum possible difference between them is (max-min)/2n. Hence you have your answer.
If any two consecutive buckets each have at least one element, the maximum difference between those elements is ((max-min)/2n)*2 = (max-min)/n. Hence you have your answer.
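A small Python sketch of this bucketing argument (my own illustration; it returns None if the scan finds neither case, which the answer argues cannot happen for a valid input):

def pair_within_avg_gap(arr):
    # Place each element into one of 2n half-open buckets of width
    # (max - min) / (2n). Two elements in one bucket, or in adjacent
    # non-empty buckets, differ by at most (max - min) / n.
    n = len(arr)
    lo, hi = min(arr), max(arr)
    if lo == hi:
        return arr[0], arr[1]  # all elements equal: any pair works
    width = (hi - lo) / (2 * n)
    buckets = [None] * (2 * n)
    for x in arr:
        idx = min(int((x - lo) / width), 2 * n - 1)  # clamp x == hi into the last bucket
        if buckets[idx] is not None:
            return buckets[idx], x  # two elements share a bucket
        buckets[idx] = x
    prev = None
    for b in buckets:
        if b is not None and prev is not None:
            return prev, b          # two adjacent non-empty buckets
        prev = b
    return None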
The correct question should be:
In an array A = [a0, a1, ..., an-1], find two elements a, b such that |a - b| <= (M - m)/n, where M = max(A) and m = min(A).
The solution I'll suggest uses quickselect, with O(n) time complexity in expectation. Its actual worst case is O(n^2). This is a trade-off: most of the time it is O(n), and it demands only O(1) extra space (if quickselect is implemented iteratively and my pseudo-code is written with a while loop instead of recursion).
Main idea:
At each iteration we find the median using quickselect. If |max - medianValue| > |medianValue - min|, we know we should search the left side of the array: both sides contain the same number of elements, but the median value is closer to the minimum, so there must be elements with smaller differences between them on that side. Otherwise we search the right side.
Each time we do that, the median becomes the new maximum or minimum of the chosen sub-array.
We continue the search, each time dividing the array's size by 2.
Proof of the expected runtime:
Assume each iteration over m elements takes c*m + d in expectation. Then the total cost is
(c*n + d) + (c*n/2 + d) + (c*n/4 + d) + ... <= c*n*(1 + 1/2 + 1/4 + ...) + d*log2(n) <= 2*c*n + d*log2(n),
which is O(n) in expectation.
Pseudo-code using recursion:
run(arr):
    M = max(arr)
    m = min(arr)
    return findPairBelowAverageDiff(arr, 0, arr.length, m, M)

findPairBelowAverageDiff(arr, start, end, min, max):
    if start + 1 < end:
        medianPos = start + (end - start) / 2
        // place the median of arr[start:end] at medianPos, as in quickselect
        quickSelect(arr, start, medianPos, end)
        if max - arr[medianPos] > arr[medianPos] - min:
            return findPairBelowAverageDiff(arr, start, medianPos, min, arr[medianPos])
        else:
            return findPairBelowAverageDiff(arr, medianPos, end, arr[medianPos], max)
    else:
        return (arr[start], arr[start + 1])
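For what it's worth, here is a runnable Python version of this scheme (my own sketch; it uses numpy.partition as the quickselect step and assumes the array has at least two elements):

import numpy as np

def find_pair_below_average_diff(arr):
    a = np.asarray(arr, dtype=float)
    start, end = 0, len(a)
    lo, hi = a.min(), a.max()
    while end - start > 2:
        mid = start + (end - start) // 2
        # quickselect step: the element at index mid ends up in its
        # sorted position within a[start:end]
        a[start:end] = np.partition(a[start:end], mid - start)
        if hi - a[mid] > a[mid] - lo:
            end, hi = mid + 1, a[mid]  # search the lower half; the median is the new max
        else:
            start, lo = mid, a[mid]    # search the upper half; the median is the new min
    return a[start], a[start + 1]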
