We have an array of N positive elements and can perform M operations on it. In each operation we have to select a contiguous subarray of length W and increase each of its elements by 1. Each element of the array can be increased at most K times.
We have to perform these operations such that the minimum element in the array is maximized.
1 <= N, W <= 10^5
1 <= M, K <= 10^5
Time limit: 1 sec
I can think of an O(n^2) solution, but it exceeds the time limit. Can somebody provide an O(n log n) or better solution for this?
P.S. This is an interview question.
It was asked in a Google interview, and I solved it using a sliding window, a heap and range-increment logic. I will solve the problem in 3 parts:
Find the minimum of every subarray of size W. This can be done in O(n) with a sliding window backed by a monotonic deque. The minimum of every window is inserted into a min-heap together with its bounds, as 3 values: [array_value, left_index, right_index]
Now make an auxiliary array of size N initialised to 0. Pop from the heap M times, and for each pop perform 3 tasks:
value, left_index, right_index = heap.pop() # theoretical function to pop minimum
Increment the value by 1.
Increment the auxiliary array by 1 at left_index and decrement it by 1 at right_index+1.
Insert this triple back into the heap [with the incremented value and the same indexes].
After performing M operations traverse the given array with auxiliary array and add the cumulative sum till index 'i' to element at index 'i' in array.
Return minimum of array.
Time Complexity
O(N) <- for minimum element in every window + building heap.
O(M*logN) <- Extracting and inserting into heap.
O(N) <- For traversing to add cumulative sum.
So the overall time is O(N + M*logN + N), which is O(N + M*logN).
Space Complexity
O(N) <- Extra array + heap.
A few things above can easily be optimised. For example, only left_index needs to be stored in the heap, since right_index = left_index + W - 1.
My Code
from heapq import heappop, heappush
from collections import deque


def find_maximised_minimum(arr, n, m, k):
    """
    arr -> Array, n -> Size of array
    m -> increment operations that can be performed
    k -> window size (W in the problem statement)
    """
    heap = []
    q = deque()
    # sliding window + heap building
    for i in range(k):
        while q and arr[q[-1]] > arr[i]:
            q.pop()
        q.append(i)
    for i in range(k, n):
        heappush(heap, [arr[q[0]], i - k, i - 1])
        while q and q[0] <= i - k:
            q.popleft()
        while q and arr[q[-1]] > arr[i]:
            q.pop()
        q.append(i)
    heappush(heap, [arr[q[0]], n - k, n - 1])
    # auxiliary array
    temp = [0 for i in range(n)]
    # performing M increment operations
    while m:
        top = heappop(heap)
        temp[top[1]] += 1
        # when the window ends at the last index there is nothing to decrement
        if top[2] + 1 < n:
            temp[top[2] + 1] -= 1
        top[0] += 1
        heappush(heap, top)
        m -= 1
    # finding cumulative sum
    sumi = 0
    for i in range(n):
        sumi += temp[i]
        arr[i] += sumi
    print(min(arr))


if __name__ == '__main__':
    # find_maximised_minimum([1, 2, 3, 4, 5, 6], 6, 5, 2)
    # find_maximised_minimum([73, 77, 60, 100, 94, 24, 31], 7, 9, 1)
    # find_maximised_minimum([24, 41, 100, 70, 97, 89, 38, 68, 41, 93], 10, 6, 5)
    # find_maximised_minimum([88, 36, 72, 72, 37, 76, 83, 18, 76, 54], 10, 4, 3)
    find_maximised_minimum([98, 97, 23, 13, 27, 100, 75, 42], 8, 5, 1)
What if we kept a copy of the array sorted ascending, pointing each element to its original index? Think about the order of priority when incrementing the elements. Also, does the final order of operations matter?
Once the lowest element reaches the next lowest element, what must then be incremented? And if we apply k operations to any one element does it matter in which w those increments were applied?
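For comparison, here is a minimal sketch of a different technique from the ones discussed above: binary search on the answer combined with a difference-array sweep. It is an illustration, not the method from this thread. Two assumptions to note: it ignores the per-element cap K entirely, and it requires W <= N. The names max_min_after_ops and feasible are made up for this example.

```python
def max_min_after_ops(arr, m, w):
    """Largest minimum reachable with at most m window increments.

    Assumption: the per-element cap K from the question is NOT enforced,
    and w <= len(arr).
    """
    n = len(arr)

    def feasible(target):
        # ends[j] = number of increments whose window stops covering at index j
        ends = [0] * (n + 1)
        active = 0  # increments currently covering index i
        used = 0    # operations spent so far
        for i in range(n):
            active -= ends[i]
            need = target - (arr[i] + active)
            if need > 0:
                used += need
                if used > m:
                    return False
                # latest window start that still covers i and fits in the array
                start = min(i, n - w)
                active += need
                ends[start + w] += need
        return True

    lo, hi = min(arr), min(arr) + m
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(max_min_after_ops([1, 2, 3, 4, 5, 6], 5, 2))
```

Each feasibility check is a single O(N) pass and the binary search needs O(log M) checks, so the sketch runs in O(N log M) under the stated assumptions.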
You are given an integer array A of size n and an integer M. You have to distribute the elements of A into M groups such that the maximum group sum is as small as possible and the elements allocated to any group are contiguous. Write a program to determine the maximum sum of elements among all the groups.
The brute force would be to generate all possible splits of the given array into M sub-arrays and check each of them. But this would take a time complexity of O(n^3).
Here I used dynamic programming with O(n*M) states, which is the space complexity. Each state loops over up to n split points, so the time complexity is O(n^2*M).
Try all the possible partitions using recursion. Let dp[ind][k] be the minimum of the maximum sums for the part of the array starting at index 'ind' with 'k' partitions still to be made. Hence the possible states will be a partition at every index from 'ind' to n. The minimum of the maximum sums over all those states will be our answer.
To avoid recomputing states that were already visited, we memoize the results.
Below is the implementation of the above approach:
from sys import maxsize
from sys import setrecursionlimit
setrecursionlimit(10**6)


def max_sum_groups_min(arr, n, ind, M, dp):
    # If M segments have been divided
    if M == -1:
        # If we are at the end
        if ind == n:
            return -maxsize
        # If we do not reach the end, return a maximum number
        # that can never be smaller than an existing answer
        return maxsize
    # If at the end, but M segments are not formed
    if ind == n:
        return maxsize
    # If it's already visited: Memoization
    # (None is used instead of False so that a stored 0 still counts as visited)
    if dp[ind][M] is not None:
        return dp[ind][M]
    ans = maxsize
    cur = 0
    # Iterate and try to break at every index to create a segment
    for i in range(ind, n):
        # sum of elements in current segment
        cur += arr[i]
        # Find the maximum of all segments after adding and
        # find minimum in all possible combinations
        ans = min(ans, max(max_sum_groups_min(arr, n, i + 1, M - 1, dp), cur))
    # Return the answer by memoizing it
    dp[ind][M] = ans
    return ans


n = int(input("Enter array size: "))
arr = list(map(int, input("Enter list of elements: ").split()))
group = int(input("Enter the number of groups: "))
# keeping None in order to track states that are not visited yet
dp = [[None for _ in range(group)] for _ in range(n)]
print(max_sum_groups_min(arr, n, 0, group - 1, dp))
For example: n = 8, arr = [1, 2, 3, 4, 5, 6, 7, 8], M = 4
Here the given array can be divided into 4 groups as:
{1, 2, 3, 4}, {5, 6}, {7}, {8}
in which the 2nd group has the largest sum (11) among all the groups, and 11 is the minimum of that maximum over all possible divisions.
Thanks,
Jayanth.
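As an alternative to the DP above, the same quantity can be found with a binary search on the answer plus a greedy feasibility check. This is a different technique, shown only as a sketch. It assumes all elements are non-negative and that the number of groups does not exceed the array length. The function names are made up for this example.

```python
def min_largest_group_sum(arr, groups):
    """Smallest possible value of the largest contiguous-group sum.

    Assumptions: non-negative elements, 1 <= groups <= len(arr).
    """

    def groups_needed(limit):
        # Greedily pack elements left to right without exceeding `limit`
        count, running = 1, 0
        for x in arr:
            if running + x > limit:
                count += 1
                running = x
            else:
                running += x
        return count

    # The answer lies between the largest element and the total sum
    lo, hi = max(arr), sum(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if groups_needed(mid) <= groups:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_largest_group_sum([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

If the greedy pass needs fewer than M groups for some limit, any group with more than one element can be split further without raising the maximum, which is why "at most M groups" is enough for the check. The running time is O(n log S), where S is the sum of the array.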
I have a working declustering algorithm that I would like to speed up using numpy. Given an array a, the consecutive differences diffa are obtained. Each consecutive difference is then compared against a threshold value t_c, which produces an array of 0's and 1's (False and True). Because diffa has one element fewer than a, the counting scheme is slightly modified. First, the size of each run of 0's or 1's is calculated as array cl_size. If the run contains 0's, then the size of the cluster is the run length plus one; if the run contains 1's, then the size of the cluster is the run length minus one. Below is an example that I would like to adapt for a much larger dataset.
import numpy as np

thresh = 21
a = np.array([1, 2, 5, 10, 20, 40, 70, 71, 72, 74, 100, 130, 160, 171, 200, 201])
diffa = np.diff(a)
print(diffa)
# >> [ 1  3  5 10 20 30  1  1  2 26 30 30 11 29  1]

def get_cluster_statistics(array, t_c, func_kw='all'):
    """ This function separates clusters of datapoints such that the number
    of clusters and the number of events in each cluster can be known. """
    # GET CONSECUTIVE DIFFERENCES
    ts_dif = np.diff(array)
    # GET BOOLEAN ARRAY MASK OF 0's AND 1's FOR TIMES ABOVE THRESHOLD T_C
    bool_mask = np.array(ts_dif > t_c) * 1
    # COPY BOOLEAN ARRAY MASK (a plain [:] slice would only be a view)
    bm_arr = bool_mask.copy()
    # SPLIT CLUSTERS INTO SUB-ARRAYS
    res = np.split(bm_arr, np.where(abs(np.diff(bm_arr)) != 0)[0] + 1)
    print(res)
    # >> [array([0, 0, 0, 0, 0]), array([1]), array([0, 0, 0]), array([1, 1, 1]), array([0]), array([1]), array([0])]
    # GET SIZE OF EACH SUB-ARRAY CLUSTER
    cl_size = np.array([res[idx].size for idx in range(len(res))])
    print(cl_size)
    # >> [5 1 3 3 1 1 1]
    # CHOOSE BETWEEN CHECKING ANY OR ALL VALUES OF SUB-ARRAYS (check timeit)
    func = dict(zip(['all', 'any'], [np.all, np.any]))[func_kw]
    # INITIALIZE EMPTY OUTPUT LIST
    ans = []
    # CHECK EACH SPLIT SUB-ARRAY IN RES
    for idx in range(len(res)):
        # print("res[%d] = %s" %(idx, res[idx]))
        if func(res[idx] == 1):
            term = [1 for jdx in range(cl_size[idx]-1)]
            # cl_size[idx] = cl_size[idx]-1
            ans.append(term)
        elif func(res[idx] == 0):
            # cl_size[idx] = cl_size[idx]+1
            term = [cl_size[idx]+1]
            ans.append(term)
    print(ans)
    # >> [[6], [], [4], [1, 1], [2], [], [2]]
    # FLATTEN THE RAGGED LIST (np.sum(ans) fails on ragged input in recent NumPy)
    out = [size for term in ans for size in term]
    print(out)
    # >> [6, 4, 1, 1, 2, 2]

get_cluster_statistics(a, thresh, 'any')
After this, I apply Counter from the collections module to count the frequency of clusters of various sizes.
I am not sure how, but I think there is a more efficient numpy solution, specifically for the section of code under # CHECK EACH SPLIT SUB-ARRAY IN RES. Any help would be appreciated.
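One possible vectorised version is sketched below. It computes the run lengths of the boolean mask without np.split and then expands them with np.repeat. It is a sketch under the assumption that the input has at least two elements; the function name cluster_sizes is made up for this example.

```python
import numpy as np


def cluster_sizes(array, t_c):
    """Cluster sizes from run lengths of the mask (diff > t_c).

    Assumption: `array` has at least two elements.
    """
    mask = np.diff(array) > t_c
    # Indices where a new run of equal mask values begins
    boundaries = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    lengths = np.diff(np.concatenate((starts, [mask.size])))
    is_gap = mask[starts]  # True for runs of differences above the threshold
    # A run of L small differences is one cluster of L + 1 events;
    # a run of L large differences contributes L - 1 isolated events of size 1
    sizes = np.where(is_gap, 1, lengths + 1)
    repeats = np.where(is_gap, lengths - 1, 1)
    return np.repeat(sizes, repeats)


a = np.array([1, 2, 5, 10, 20, 40, 70, 71, 72, 74, 100, 130, 160, 171, 200, 201])
print(cluster_sizes(a, 21))
```

On the example array with a threshold of 21 this reproduces the sizes 6, 4, 1, 1, 2, 2 from the loop version, and the result can be passed to Counter as before.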
I want to change the order in arr if the current element is bigger than the next one.
How do I modify the code so that it works?
arr = [5, 22, 29, 39, 19, 51, 78, 96, 84]
i = 0
while (i < arr.size-1)
  if arr[i].to_i < arr[i+1].to_i
    arr[i]
  elsif arr[i].to_i > arr[i + 1].to_i
    arr[i+1], arr[i] = arr[i], arr[i+1]
  end
  puts arr[i]
  i += 1
end
Returns: [5, 22, 29, 39, 19, 51, 78, 96, 84]
Expected: [5, 19, 22, 29, 39, 51, 78, 84, 96]
You can use any sorting algorithm, depending on the size of the array (n):
For Bubble Sort, the time complexity is O(n^2)
For Merge Sort, the time complexity is O(n log n)
For Counting Sort, the time complexity is O(n + r), where r is the range of the values, so the numbers in the array should lie in a limited range such as 0 up to 10^6
Bubble Sort: Each iteration compares adjacent pairs and moves the maximum element to the last position. The second iteration puts the second maximum element in the second-to-last position, and so on until the array is sorted.
Iterate (n-1) times [to find (n-1) max numbers]
    Iterate (n-idx-1) times to swap a pair of numbers if the first number is greater than the next number
    If no swap happened in the inner loop, the array is already sorted, so break the outer loop
Ruby Code:
def bubble_sort(arr)
  n = arr.size
  (n-1).times do |idx|
    swapped = false
    (n-idx-1).times do |i|
      if arr[i] > arr[i+1]
        arr[i], arr[i+1] = arr[i+1], arr[i]
        swapped = true
      end
    end
    break unless swapped
  end
  arr
end

p bubble_sort([5, 22, 29, 39, 19, 51, 78, 96, 84])
Merge Sort: It uses a divide and conquer strategy, i.e. if you know the two halves of an array are sorted, then you can sort the whole array in O(n) by merging them with two pointers.
For instance,
#first half : [4,5,7,9]
#second half : [1,2,10,15]
1. Take two pointers l and r assigned to the starting index of both halves, i.e. 0
2. Iterate over l and r up to their lengths to consume both arrays
    if element at first_half[l] < second_half[r]
        Put first_half[l] in result_array
        Increment l pointer
    else
        Put second_half[r] in result_array
        Increment r pointer
This merge operation takes O(n) to combine two sorted halves into one sorted array.
Now, if we divide the whole array into two halves recursively, we get a binary tree of height log(n), and each level takes O(n) to merge its subproblems (halves), resulting in O(n log n) time complexity.
The base case is: a single-element array is always sorted
Ruby Code:
def merge(left_sorted, right_sorted)
  res = []
  left_size, right_size = left_sorted.size, right_sorted.size
  l = r = 0
  loop do
    break if r == right_size and l == left_size # break if both halves processed
    if r == right_size or (l < left_size and left_sorted[l] < right_sorted[r])
      res << left_sorted[l]; l += 1
    else
      res << right_sorted[r]; r += 1
    end
  end
  res
end

def merge_sort(arr)
  size = arr.size
  return arr if size <= 1 # base case
  mid = arr.size/2 - 1
  left_half, right_half = arr[0..mid], arr[mid+1..-1]
  left_sorted = merge_sort(left_half)
  right_sorted = merge_sort(right_half)
  return merge(left_sorted, right_sorted)
end

p merge_sort([5, 22, 29, 39, 19, 51, 78, 96, 84])
Counting Sort: It works in O(n + r) by counting how often each number appears in the array, where r is the range of the values, so it is a good fit when the numbers lie in a limited range such as (0..10^6)
Keep a count of each number of the array in count_array.
Iterate from min_element to max_element of the array, and put each element in sorted_array as many times as it appeared, i.e. while its count > 0
Ruby Code:
def counting_sort(arr)
  min, max = arr.min, arr.max
  count_arr = [0] * (max - min + 1) # initialize count_array with all 0s
  arr.each do |num|
    count_arr[num - min] += 1
  end
  res = []
  size = count_arr.size
  size.times do |i|
    count_arr[i].times do
      res << i + min
    end
  end
  res
end

p counting_sort([5, 22, 29, 39, 19, 51, 78, 96, 84])
Notice that as you sort you are rearranging the array. Don't modify it, use it as a reference and place the sorted items in a new array.
If you want to study algorithms, use C or C++.
def bubble_sort(array)
  sorted = array.dup
  i = 0
  l = sorted.length
  while i < (l - 1)
    j = 0
    while j < l - i - 1
      if sorted[j] > sorted[j + 1]
        tmp = sorted[j]
        sorted[j] = sorted[j + 1]
        sorted[j + 1] = tmp
      end
      j += 1
    end
    i += 1
  end
  sorted
end

puts bubble_sort([5, 22, 29, 39, 19, 51, 78, 96, 84])
I have a sorted array, for example
[0, 0, 3, 6, 7, 8, 8, 8, 10, 11, 13]
and I want the longest sub-array in which the difference between the largest and the smallest element is at most k.
Here, let's say k = 1, so the longest sub-array is [7, 8, 8, 8] with length = 4.
As another example, consider [0, 0, 0, 3, 6, 9, 12, 12, 12, 12] with k = 3. Here the longest sub-array is [9, 12, 12, 12, 12] with length = 5.
So far, I have used a binary search algorithm, O(n log n), which iterates over indices 0 .. n - 1 and for each one finds the rightmost index that satisfies the condition.
Is there a linear time algorithm to do this?
Yes, there is a linear time algorithm. You can use the two pointers technique. Here is the pseudocode:
R = 0
res = 0
for L = 0 .. N - 1:
    while R < N and a[R] - a[L] <= k:
        R += 1
    res = max(res, R - L)
It has O(n) time complexity because L and R only ever move forward and each of them can be incremented at most n times.
Why is this algorithm correct? For a fixed L, R is the index of the first element of the array such that a[R] - a[L] > k. That's why R - 1 is the index of the last element that fits. The length of [L, R - 1] subarray is exactly R - L. The resulting subarray is obtained by iterating over all possible values of L, that is, all possibilities are checked. That's why it always finds correct answer.
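The pseudocode translates almost line by line into runnable Python. The sketch below assumes the input list is already sorted in ascending order; the function name longest_window is made up for this example.

```python
def longest_window(a, k):
    """Length of the longest sub-array of sorted `a` with max - min <= k."""
    best = 0
    r = 0
    for l in range(len(a)):
        # Advance r to the first index whose value is more than k above a[l]
        while r < len(a) and a[r] - a[l] <= k:
            r += 1
        best = max(best, r - l)
    return best


print(longest_window([0, 0, 3, 6, 7, 8, 8, 8, 10, 11, 13], 1))
print(longest_window([0, 0, 0, 3, 6, 9, 12, 12, 12, 12], 3))
```

The two calls use the arrays from the question and give lengths 4 and 5.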
I'm building a decision tree algorithm. The sorting is very expensive in this algorithm because for every split I need to sort each column. So at the beginning - even before tree construction I'm presorting variables - I'm creating a matrix so for each column in the matrix I save its ranking. Then when I want to sort the variable in some split I don't actually sort it but use the presorted ranking array. The problem is that I don't know how to do it in a space efficient manner.
A naive solution is below. It handles only 1 variable (v) and 1 split (split_ind).
import numpy as np
v = np.array([60,70,50,10,20,0,90,80,30,40])
sortperm = v.argsort() #1 sortperm = array([5, 3, 4, 8, 9, 2, 0, 1, 7, 6])
rankperm = sortperm.argsort() #2 rankperm = array([6, 7, 5, 1, 2, 0, 9, 8, 3, 4])
split_ind = np.array([3,6,4,8,9]) # this is my split (random)
# split v and sortperm
v_split = v[split_ind] # v_split = array([10, 90, 20, 30, 40])
rankperm_split = rankperm[split_ind] # rankperm_split = array([1, 9, 2, 3, 4])
vsorted_dummy = np.ones(10)*-1 #3 allocate "empty" array[N]
vsorted_dummy[rankperm_split] = v_split
vsorted = vsorted_dummy[vsorted_dummy!=-1] # vsorted = array([ 10., 20., 30., 40., 90.])
Basically I have 2 questions:
Is double sorting necessary to create ranking array? (#1 and #2)
In line #3 I'm allocating array[N]. This is very inefficient in terms of space, because even if the split size n << N I have to allocate the whole array. The problem here is how to calculate rankperm_split. In the example the original rankperm_split = [1,9,2,3,4], while it should really be [1,5,2,3,4]. This problem can be reformulated as follows: I want to create a "dense" integer array that has a maximum gap of 1 and keeps the ranking of the array intact.
UPDATE
I think that second point is the key here. This problem can be redefined as
A[N] - array of size N
B[N] - array of size N
I want to transform array A to array B so that:
Ranking of the elements stays the same (for each pair i,j if A[i] < A[j] then B[i] < B[j])
Array B contains only elements from 1 to N, and each element is unique.
A few examples of this transformation:
[3,4,5] => [1,2,3]
[30,40,50] => [1,2,3]
[30,50,40] => [1,3,2]
[3,4,50] => [1,2,3]
A naive implementation (with sorting) can be defined like this (in Python)
def remap(a):
    a_ = sorted(a)
    b = [a_.index(e)+1 for e in a]
    return b
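On the first question, one argsort is enough to build the ranking: the ranks are the inverse of the sort permutation, and that inverse can be written by scattering 0..n-1 through it. The sketch below shows this, along with a dense remap that only allocates arrays of the split size n. Note that the remap still sorts the split, so it answers the space concern but not the sorting cost. The function names are made up for this example.

```python
import numpy as np


def ranks_from_single_sort(values):
    """Rank of each element (0-based) using one argsort plus a scatter."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(values.size, dtype=np.intp)
    # order[j] is the index of the j-th smallest element, so its rank is j
    ranks[order] = np.arange(values.size)
    return ranks


def dense_remap(a):
    """Map `a` to ranks 1..n, preserving the relative order of its elements."""
    return ranks_from_single_sort(a) + 1


v = np.array([60, 70, 50, 10, 20, 0, 90, 80, 30, 40])
rankperm = ranks_from_single_sort(v)
split_ind = np.array([3, 6, 4, 8, 9])
print(dense_remap(rankperm[split_ind]))
```

For the split in the question this turns [1, 9, 2, 3, 4] into the dense [1, 5, 2, 3, 4], using only arrays of length 5.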