Minimum set of numbers that sum to at least K - arrays

Given a list of n objects, write a function that outputs the minimum set of numbers that sum to at least K. FOLLOW UP: can you beat O(n ln n)?
If some element is >= K on its own, the minimum set has just one element; don't we just have to traverse the array and find such an element?
Otherwise, for O(n lg n), I understand we first sort the array and then look for pairs or triplets that sum to >= K.
But what if we don't find such a small combination and have to go for bigger sets: won't this problem then be the same as the N-sum problem?

Here's a linear algorithm that uses linear-time median finding as a subroutine:
Findsum(A, K) {
    Let n be the length of A.
    Let M be the median element of A, found in linear time.
    Let L be the elements of A less than M.
    Let U be the elements of A greater than M.
    Let E be the elements of A equal to M.
    If the sum of the elements in U is at least K,
        Return Findsum(U, K).
    Else, if the sum of the elements in U and E is at least K,
        Return U together with enough elements of E that the sum is at least K.
    Else,
        Return U and E together with Findsum(L, K - sum(U) - sum(E)).
}
Each recursive call is done on a list at most half the size of A and all other steps take at most linear time, so this algorithm takes linear time overall.
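For concreteness, here is a minimal Python sketch of this algorithm (the name find_sum is mine). It assumes positive numbers and returns None when the whole list falls short of K; to keep it short it partitions around a random pivot, which gives expected linear time, whereas a true linear-time median would make it worst-case linear:

import random

def find_sum(a, k):
    # Minimum-size subset of a (positive numbers) summing to at least k,
    # or None if even the whole list falls short.
    if k <= 0:
        return []
    if sum(a) < k:
        return None
    m = random.choice(a)          # expected-linear stand-in for the true median
    upper = [x for x in a if x > m]
    equal = [x for x in a if x == m]
    lower = [x for x in a if x < m]
    su, se = sum(upper), sum(equal)
    if su >= k:
        return find_sum(upper, k)
    if su + se >= k:
        taken, total = [], su
        for x in equal:           # take just enough copies of the pivot value
            taken.append(x)
            total += x
            if total >= k:
                break
        return upper + taken
    # All of upper and equal are needed; recurse on lower for the remainder.
    return upper + equal + find_sum(lower, k - su - se)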

This is very different from the N-sum problem because it requires that the set add up to at least K instead of exactly K.
It can be done in O(n ln n) by sorting the list and taking elements from the maximum downwards until the sum reaches K. You can scan the list first to eliminate the easy cases where a single number is >= K or where the sum of all members is < K. You could also compute the average value of the list and, sometimes, only sort the "upper" half of the list. These optimizations don't improve the O(n ln n) bound, though.
The sorting can be done using an index array (or list of integers), so the original values or objects don't need to be moved.
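A minimal Python sketch of this O(n ln n) baseline, sorting an index array so the original list is never moved (the function name is mine):

def min_set_indices(a, k):
    # Indices of a minimum-size subset of a summing to at least k, or None.
    order = sorted(range(len(a)), key=lambda i: a[i], reverse=True)
    picked, total = [], 0
    for i in order:
        picked.append(i)
        total += a[i]
        if total >= k:
            return picked
    return None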

Related

The most efficient way to find the position (indices) of m smallest elements in a list of n unique numbers without changing the list

Here's a look at my attempt to solve this (algorithm):
position := 1
i := 2
k := []
FOR b = 1, b <= m, b++
    WHILE i <= n DO
        IF i in k THEN i++
        IF position in k THEN position++
        IF A[i] < A[position] THEN position := i
        i++
    RESET i := 2
    ADD position to k
    RESET position := 1
Now this works, but the complexity for this would be at least n^4 and I would like something way better. Any help would be greatly appreciated. Thank you!
It's not clear to me why you think your algorithm's time complexity is O(n^4); the main loop iterates m times, the inner loop iterates O(n) times, and the `i in k` and `position in k` tests in that inner loop are O(m) if implemented as a linear search, or O(log m) if implemented as binary search. That gives a time complexity of O(m^2 n) or O(mn log m) respectively. Those are both better than O(n^4) because m ≤ n.
The problem can indeed be solved more efficiently. The main idea is to use a priority queue to keep track of the indices of the current m smallest array elements. Initialise the priority queue with the first m indices. Then for each other index, insert it into the priority queue and then remove whichever index has the largest array value from the priority queue. Finally, when you reach the end of the array, poll the indices from the priority queue to get them in order.
The length of the priority is always at most m + 1, so if you use a heap as a priority queue, then the insertion and remove-largest operations each take O(log m) time. Those are done for (n - m) array elements. The initial stage of inserting m indices into the heap, and the final stage of polling m elements from the heap, take O(m log m) time. This makes the overall time complexity O(m log m + (n - m) log m), which simplifies to O(n log m).
You make a max-heap of size m out of the first m indexes of the array. When comparing indexes for maintaining the heap, compare the corresponding elements of the array.
Then, for each other index:
If it is not smaller (by element) than the largest in the max-heap, discard it;
Otherwise, remove the largest index from the max-heap and put the new one in.
Throughout the whole process, the max-heap will always contain the indices of the m smallest elements seen. Total complexity is O(n * log m).
Theoretically, it's better for large m -- O(n) -- to make an array of all indices and just use quickselect, again comparing them by the corresponding elements, but this is worse in the worst case and m is usually small compared to n when this algorithm is needed.
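A minimal Python sketch of the max-heap approach described above (heapq is a min-heap, so values are negated; the function name is mine):

import heapq

def m_smallest_positions(a, m):
    # Indices of the m smallest elements of a, in ascending order of value,
    # in O(n log m) time, without modifying a. Assumes 1 <= m <= len(a).
    heap = [(-a[i], i) for i in range(m)]   # (negated value, index) max-heap
    heapq.heapify(heap)
    for i in range(m, len(a)):
        if a[i] < -heap[0][0]:              # beats the largest current candidate
            heapq.heapreplace(heap, (-a[i], i))
    out = [heapq.heappop(heap)[1] for _ in range(m)]
    return out[::-1]                        # popped largest-first, so reverse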

The kth smallest number in two arrays, one sorted the other unsorted

There is already an answer for two sorted arrays. However, in my question one of the arrays is unsorted.
Suppose X[1..n] and Y[1..m] where n < m. X is sorted and Y is unsorted. What is an efficient algorithm to find the kth smallest number of X ∪ Y?
MinHeap can be used to find the kth smallest number in an unsorted array. However, here one of these arrays is sorted. I can think of:
1. Build a `MinHeap` for `Y`
2. i = 1, j = 1
3. x1 = extract min from `Y`
4. x2 = X[i]
5. if j == k: return min(x1, x2)
6. if x1 < x2: j++; goto 3
7. else: j++; i++; goto 4
Is it efficient and correct?
There is no way around scanning all of Y. That takes O(m), so you can't do better than O(m).
However quickselect has average performance O(m). Basically that algorithm is just to do a quicksort, except that you ignore all partitions that don't have your final answer in them.
Given that n < m we can simply join one array to the other and do quickselect.
Note that the average performance is good, but the worst case performance is quadratic. To fix that, if you do not make progress sufficiently quickly you can switch over to the same median of medians algorithm that gives quicksort guaranteed performance (albeit with bad constants). If you're not familiar with it, that's the one where you divide the array into groups of 5, find the median of each group, and then repeat until you are down to 1 element. Then use that element as the pivot for the whole array.
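A minimal Python sketch of this answer, with a randomized pivot (expected O(n + m); swapping in the median-of-medians pivot gives the worst-case guarantee; the function names are mine):

import random

def kth_smallest(x, y, k):
    # kth (1-based) smallest of x + y via quickselect, expected linear time.
    # Assumes 1 <= k <= len(x) + len(y).
    def select(arr, k):
        pivot = random.choice(arr)
        lo = [v for v in arr if v < pivot]
        eq = [v for v in arr if v == pivot]
        hi = [v for v in arr if v > pivot]
        if k <= len(lo):
            return select(lo, k)
        if k <= len(lo) + len(eq):
            return pivot
        return select(hi, k - len(lo) - len(eq))
    return select(list(x) + list(y), k)     # x's sortedness isn't even needed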
Make a max-heap of the smallest k items from the sorted array (X); since X is sorted, those are just its first k items, so this takes O(k) time. (If X has fewer than k items, start with all of X and let items from Y fill the heap up to size k.)
For each item in the unsorted array (Y), if it's smaller than the largest item in the heap (the root), then remove the root from the heap and add the new item. Worst case for this is O(m log k).
When you're done, the kth smallest number will be at the top of the heap.
Whereas the worst case for the second part is O(m log k), average case is much better because typically a small percentage of items have to be inserted in the heap.
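A minimal Python sketch of this heap answer (again negating values so heapq acts as a max-heap; the function name is mine):

import heapq

def kth_smallest_heap(x_sorted, y, k):
    # kth (1-based) smallest of x_sorted + y in O(k + m log k) worst case.
    # Assumes 1 <= k <= len(x_sorted) + len(y).
    heap = [-v for v in x_sorted[:k]]   # smallest k items of X (all of X if n < k)
    heapq.heapify(heap)
    for v in y:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            heapq.heapreplace(heap, -v)
    return -heap[0]                     # the largest of the k smallest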

Number of ways to choose elements from array if the sum of the group is at least K

Problem: we are given integers N and K and an array of size N, such that 1 <= N <= 36 and every integer in the array is <= 10^13. We have to count how many different ways we can take elements from the array such that the sum of those elements is at least K.
Here is one example: N = 4, K=6, array = {1,2,5,4}
The answer is 9, because we can take elements from the array in nine different ways whose sum is at least K: (first and third); (second and third); (first, second and third); (second and fourth); (first, second and fourth); (first, third and fourth); (second, third and fourth); (first, second, third and fourth); (third and fourth).
My idea is that with bit-masks we can search over all combinations, choosing for each element whether to take it or not, but that has complexity O(2^N), and in our case N <= 36, which is too slow.
For this problem, you can use a 'meet in the middle' trick similar to the one usually used to solve the subset sum problem in time O(N*2^(N/2)) (and you will have the same complexity here).
First, compute the 2^(N/2) possible subset sums of the first N/2 elements, and store them. Do the same with the last N/2 elements.
Now sort both sets in increasing order. Sorting n elements takes time O(n log n), so here this is an O(N*2^(N/2)) cost. Let's call those two sorted sets F and L (for First and Last).
Then, do the following:
    Set res = 0
    for i from 0 to 2^(N/2) - 1:
        find the minimum j_i in {0, ..., 2^(N/2) - 1} such that F[i] + L[j_i] >= K
        (use binary, i.e. dichotomic, search for this)
        if such a j_i exists, increase res by 2^(N/2) - j_i
    return res
The idea is that for each subset of the first N/2 elements, you count the ways to choose a subset of the last N/2 elements such that the total sum is at least K. To do that, you only need the lowest subset sum of the last elements that achieves this: every subset whose sum is at least as big combines with the initial subset to a total of at least K, and counting those is easy since you sorted the array of possible sums.
P.S.: a marginal optimisation is possible by using the fact that the sequence of minimal j_i such that F[i] + L[j_i] >= K is non-increasing, so the binary searches can be replaced by a single backwards scan.
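A minimal Python sketch of the whole scheme (the function name is mine); on the example above, array {1,2,5,4} with K = 6, it returns 9:

from bisect import bisect_left

def count_subsets_at_least(a, k):
    # Number of subsets of a whose sum is >= k, in O(N * 2^(N/2)) time.
    def subset_sums(xs):
        sums = [0]
        for x in xs:
            sums += [s + x for s in sums]
        return sums

    half = len(a) // 2
    first = subset_sums(a[:half])
    last = sorted(subset_sums(a[half:]))
    res = 0
    for s in first:
        j = bisect_left(last, k - s)    # minimal j with s + last[j] >= k
        res += len(last) - j
    return res

# count_subsets_at_least([1, 2, 5, 4], 6) == 9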

Find 3 numbers in 3 sorted arrays with sum equal to some value

Assume we have three arrays of length N which contain arbitrary numbers of type long. Then we are given a number M (of the same type), and our mission is to pick three numbers A, B and C, one from each array (in other words, A should be picked from the first array, B from the second and C from the third), so that the sum A + B + C = M.
Question: could we pick all three numbers and end up with time complexity of O(N^2)?
Illustration:
Arrays are:
1) 6 5 8 3 9 2
2) 1 9 0 4 6 4
3) 7 8 1 5 4 3
And M we've been given is 19.
Then our choice would be 8 from first, 4 from second and 7 from third.
This can be done in O(1) space and O(N^2) time.
First let's solve a simpler problem: given two arrays A and B, pick one element from each so that their sum is equal to a given number K.
Sort both arrays, which takes O(NlogN).
Take pointers i and j so that i points to the start of array A and j points to the end of B.
Find the sum A[i] + B[j] and compare it with K:
    if A[i] + B[j] == K, we have found the pair A[i] and B[j];
    if A[i] + B[j] < K, we need to increase the sum, so increment i;
    if A[i] + B[j] > K, we need to decrease the sum, so decrement j.
This process of finding the pair after sorting takes O(N).
Now let's take the original problem. We've got a third array, call it C. The algorithm now is:
    foreach element x in C
        find a pair A[i], B[j] from A and B such that A[i] + B[j] = M - x
    end for
The outer loop runs N times and each iteration does O(N) work, making the entire algorithm O(N^2).
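A minimal Python sketch of this answer (the function name is mine; sorting copies here for brevity costs O(N) extra space, while sorting in place would keep the O(1) bound). On the arrays from the question it finds 3 + 9 + 7 = 19:

def find_triple(a, b, c, m):
    # One element from each of a, b, c summing to m, in O(N^2) time; None if none.
    a, b = sorted(a), sorted(b)
    for x in c:
        target = m - x
        i, j = 0, len(b) - 1
        while i < len(a) and j >= 0:
            s = a[i] + b[j]
            if s == target:
                return a[i], b[j], x
            if s < target:
                i += 1
            else:
                j -= 1
    return None

# find_triple([6,5,8,3,9,2], [1,9,0,4,6,4], [7,8,1,5,4,3], 19) -> (3, 9, 7)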
You can reduce it to the similar problem with two arrays, which is well known and has a simple O(n) solution (iterating from both ends).
Sort all arrays.
Try each number A from the first array once.
Find if the last two arrays can give us numbers B and C, such that B + C = M - A.
Steps 2 and 3 multiplied give us O(n^2) complexity.
The other solutions are already better, but here's my O(n^2) time and O(n) memory solution anyway.
Insert all elements of array C into a hashtable. (time complexity O(n), space O(n))
Take all pairs (a,b), a from A and b from B (time complexity O(n^2)).
For each pair, check if M-(a+b) exists in the hashtable (expected complexity O(1) per query).
So, the overall time complexity is O(n^2), and a space complexity of O(n) for the hashtable.
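A minimal Python sketch of this hash-based answer (the function name is mine):

def find_triple_hash(a, b, c, m):
    # O(N^2) time, O(N) space: hash C, then scan all (a, b) pairs.
    in_c = set(c)
    for x in a:
        for y in b:
            if m - (x + y) in in_c:
                return x, y, m - (x + y)
    return None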
Hash the last list. This takes O(N) on that particular list, but it is dominated by the next phase.
The next phase is to create a "matrix" of sums of the first two lists, and look up each sum's matching complement in the hash. Creating the matrix is O(N*N), while each hash lookup is constant time.
1. Store A[i]+B[j] for every pair (i,j) in another structure D, organized as a hash. The complexity of this step is O(N*N).
    construct a hash named D
    for i = 1 to n
        for j = 1 to n
            insert A[i]+B[j] into D
2. For each C[i] in the array C, find whether M-C[i] exists in D. The complexity of this step is O(N).
    for i = 1 to n
        check if M - C[i] is in D
I have a solution. Insert all elements from one of the lists into a hash table; this takes O(n) time.
Once that is complete, go over all pairs from the remaining two arrays and see if M minus their sum is present in the hash table.
Because hash lookup is constant time, we get a quadratic time in total.
Using this approach you save the time spent on sorting.
Another idea is if you know the max size of each element, you can use a variation of bucket sort and do it in nlogn time.
At the cost of O(N^2) space, but still using O(N^2) time, one could handle four arrays, by computing all possible sums from the first two arrays, and all possible residues from the last two, sort the lists (possible in linear time since they are all of type 'long', whose number of bits is independent of N), and then seeing if any sum equals any residue.
Sorting all 3 arrays and using binary search seems a better approach. Once the arrays are sorted, one should definitely go for binary search rather than linear search: it takes O(log n) rather than O(n).
Hash table is also a viable option.
The combination of hash and sort can bring the time down, but at the cost of O(N^2) space.
I have another O(N^2) time complexity, O(N) additional space complexity solution.
First, sort the three arrays; this step is O(N*log(N)). Then, for each element Ai in A, consider the two arrays V = Ai + B and W = Ai + C, meaning each element of V is the element in that position of B plus Ai, and similarly for W. Both are sorted because B and C are.
Now look for a pair V[j], W[k] such that V[j] + W[k] = M + Ai (the target is M + Ai because Ai is counted twice). Since V and W are sorted, this is the two-array pair problem again and takes O(N) with the two-pointer scan from the first answer.
Therefore, total complexity is O(N^2).
Sort the three arrays. Then initialize three indices:
    i pointing to the first element of A,
    j pointing to the last element of B, and
    k pointing to the first element of C.
While i, j, k are within the limits of their respective arrays A, B, C:
    If A[i]+B[j]+C[k] == M, return.
    If A[i]+B[j]+C[k] < M, increment i if A[i] <= C[k], otherwise increment k.
    If A[i]+B[j]+C[k] > M, decrement j.
This should run in O(n).
How about:
    for a in A
        for b in B
            store (a, b) in a hash keyed by a+b
    for c in C
        if M-c is in the hash
            print a, b, c
The idea is to hash all possible pair sums from A and B. Then, for every element in C, see if the residual sum is present in the hash.
