Getting smallest subarray that covers all valid entries in a circular array

Let's say I have a circular array with some valid and invalid entries i.e.
array = [0,0,1,0,1,0,0,0,0,0,1,1]
I want to find the smallest subarray here that covers all 1s. If this were not a circular array the smallest subarray would be size 10 because it would start with the first 1 and end with the last 1 (inclusive), i.e.
[0,0,1,0,1,0,0,0,0,0,1,1]
<----------------->
However, as it is a circular array, then I can reduce the subarray size to size 7 i.e.
[0,0,1,0,1,0,0,0,0,0,1,1]
--------> <---
My idea is to keep track of 4 pointers: when traversing the array, the smallest start position would be array[2], because that is the first "1" entry, and the last position would be array[11], so that window would have size 10. My other two pointers would start at array[9] and end at array[4], but how would I know when to stop at array[4] and start at array[9]?

Since you're allowing only a single range, consider all adjacent pairs of 1's.
Call their indices q and p. Each such pair represents the interval from array[p] to the end of the array, wrapping around to the start and back to array[q].
It's not hard to see that you want to find such a pair where p - q is a maximum. This corresponds to the smallest covering wrap-around interval; its size is len(array) - (p - q) + 1.
The single additional case is the "non-wrapping" one from the leftmost 1 at q to the rightmost at p. This interval has size p - q + 1. (The +1 is the same on both sides, so comparing p - q against len(array) - max gap, as the code below does, still picks the smaller interval.)
All the rest is to arrange the code nicely. Here's one idea:
a = [0,0,1,0,1,0,0,0,0,0,1,1]

def find_min_window(arr):
    # Find and remember the leftmost 1
    q0 = p = q = -1
    for i in range(len(arr)):
        if arr[i]:
            q = q0 = i
            break
    # Handle the case of no 1's at all.
    if q == -1:
        return None
    # Find max gap between adjacent pairs of 1's, also the rightmost 1
    max_gap = 0
    a = b = p0 = q0  # Remember rightmost 1 so far.
    while True:
        # Advance p to next 1.
        p = q + 1
        while p < len(arr) and arr[p] == 0:
            p += 1
        # If we scanned off the end of the array, we're done.
        if p == len(arr):
            break
        # Found a 1 at arr[p]. Update rightmost.
        p0 = p
        # Check the gap
        gap = p - q
        if gap > max_gap:
            (a, b, max_gap) = (p, q, gap)
        # Move on to the next pair.
        q = p
    # Return the non-wrapping case or the wrapping case with largest gap
    return (q0, p0) if p0 - q0 < len(arr) - max_gap else (a, b)

print(find_min_window(a))
This arrangement has the advantage that it scans the array just once.

Do an initial count of how many 1s should be in the window:
N = sum(X)
Calculate a new array by concatenating X with itself:
Y = X + X
Use a non-circular search to find the smallest window in Y that contains N 1s. Say IX1 is the start of the window and IX2 is the end; then the window size is
WinSize = IX2 - IX1
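For illustration, here is a minimal Python sketch of that doubling idea (function and variable names are mine, not from the answer):
def smallest_circular_window(X):
    # A window of Y = X + X with length <= len(X) that holds all N ones
    # corresponds to a circular window of X covering every 1.
    N = sum(X)                      # number of 1s the window must cover
    if N == 0:
        return 0
    Y = X + X
    best = len(X)
    ones = 0
    start = 0
    for end, v in enumerate(Y):
        ones += v
        while ones == N:            # shrink from the left while all 1s are still covered
            best = min(best, end - start + 1)
            ones -= Y[start]
            start += 1
    return best

print(smallest_circular_window([0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1]))   # -> 7
For the example array above this prints 7, matching the wrap-around window discussed in the question.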

Yes, set a pointer for the first 1 and one for the last 1; there is no need for 4 pointers.
Find the first 1 by traversing from the start of the array, and the last 1 by traversing from the end.
The length obtained this way covers the minimum (non-wrapping) window, since to count all the 1s you need to include the index of the last 1.
int start = 0, last = 0;
for (int i = 0; i < length; i++) {
    if (a[i] == 1) { start = i; break; }
}
for (int j = length - 1; j >= 0; j--) {
    if (a[j] == 1) { last = j; break; }
}
int windowLength = last - start + 1;
For your array above this gives start = 2, last = 11 and windowLength = 10.


Recover original array from all subsets

You are given all subset sums of an array. You are then supposed to recover the original array from the subset sums provided.
Every element in the original array is guaranteed to be non-negative and less than 10^5. There are no more than 20 elements in the original array. The original array is also sorted. The input is guaranteed to be valid.
Example 1
If the subset sums provided are this:
0 1 5 6 6 7 11 12
We can quickly deduce that the size of the original array is 3 since there are 8 (2^3) subsets. The output (i.e original array) for the above input is this:
1 5 6
Example 2
Input:
0 1 1 2 8 9 9 10
Output:
1 1 8
What I Tried
Since all elements are guaranteed to be non-negative, the largest integer in the input must be the total of the array. However, I am not sure how to proceed from there. By logic, I thought that the next (2^2 - 1) largest subset sums must each include all except one element of the array.
However, the above logic does not work when the original array is this:
1 1 8
That's why I am stuck and unsure how to proceed.
Say S is the subset sum array and A is the original array. I'm assuming S is sorted.
|A| = log2(|S|)
S[0] = 0
S[1] = A[0]
S[2] = A[1]
S[3] = EITHER A[2] OR A[0] + A[1].
In general, S[i] for i >= 3 is either an element of A or a combination of the elements of A that you've already encountered. When processing S, skip once per combination of known elements of A that generate a given number, add any remaining numbers to A. Stop when A gets to the right size.
E.g., if A=[1,2,7,8,9] then S will include [1,2,1+2=3,...,1+8=9, 2+7=9,9,...]. When processing S we skip over two 9s because of 1+8 and 2+7, then see a third 9 which we know must belong to A.
E.g., if S=[0,1,1,2,8,9,9,10] then we know A has 3 elements, that the first 2 elements of A are [1,1], when we get to 2 we skip it because 1+1=2, we append 8 and we're done because we have 3 elements.
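Here is a small Python sketch of this skip-per-combination processing, assuming S is the full sorted list of subset sums (the function name and bookkeeping are mine):
import math
from collections import Counter

def recover(S):
    # Walk the sorted subset sums, skipping any sum that is already explained
    # by a combination of elements we have placed in A.
    n = round(math.log2(len(S)))
    A = []
    composite = Counter()           # sums formed by combining elements already in A
    sums = [0]                      # all subset sums of the elements found so far
    for s in S[1:]:                 # S[0] is the empty-subset sum 0
        if len(A) == n:
            break
        if composite[s] > 0:
            composite[s] -= 1       # s is explained by known elements; skip it
        else:
            new_sums = [t + s for t in sums]
            for t in new_sums[1:]:  # every new sum except s itself is "composite"
                composite[t] += 1
            sums += new_sums
            A.append(s)             # s must be a new element of A
    return A

print(recover([0, 1, 1, 2, 8, 9, 9, 10]))   # -> [1, 1, 8]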
Here's an easy algorithm that doesn't require finding which subset sums to a given number.
S ← input sequence
X ← empty sequence
While S has a non-zero element:
    d ← second smallest element of S (the smallest one is always zero)
    Insert d in X
    N ← empty sequence
    While S is not empty:
        z ← smallest element of S
        Remove both z and z+d from S (if S does not contain z+d, it's an error; remove only one instance of both z and z+d if there are several).
        Insert z in N.
    S ← N
Output X.
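A direct Python transcription of this pairing procedure might look like the following sketch (it assumes the input really is a valid multiset of 2^n subset sums):
from collections import Counter

def recover_array(subset_sums):
    S = sorted(subset_sums)
    A = []
    while any(x != 0 for x in S):       # "while S has a non-zero element"
        d = S[1]                        # second smallest; the smallest is always 0
        A.append(d)
        counts = Counter(S)
        N = []
        for z in S:                     # repeatedly take the smallest remaining z
            if counts[z] == 0:
                continue                # already removed as some earlier z' + d
            counts[z] -= 1
            counts[z + d] -= 1          # its partner z + d must exist in a valid input
            N.append(z)
        S = sorted(N)
    return A

print(recover_array([0, 1, 1, 2, 8, 9, 9, 10]))   # -> [1, 1, 8]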
I revisited this question a few years later and finally managed to solve it! The approach I used to tackle this problem is the same as what Dave devised earlier. Dave gave a pretty concrete explanation, so I'll just add some details and append my commented C++ code so that it's a bit clearer.
Excluding the empty set, the two smallest elements in S have to be the two smallest elements in A. This is because every element is guaranteed to be non-negative. Knowing the values of A[0] and A[1], we have something tangible to work with and build bottom-up.
Following that, any new element in S can either be a summation of the previous elements we have confirmed to be in A, or it can be an entirely new element of A (i.e. S[3] = A[0] + A[1] or S[3] = A[2]). To keep track of this, we can use a frequency table such as an unordered_map<int, int> in C++. We then repeat this process for S[4], S[5]... to continue filling up A.
To prune our search space, we can stop the moment the size of A corresponds with the size implied by S (i.e. |A| = log2(|S|)). This helps us drastically cut unnecessary computation and runtime.
#include <bits/stdc++.h>
using namespace std;
typedef vector<int> vi;

int main () {
    int n; cin >> n;
    vi S, A, sums;
    unordered_map<int, int> freq;
    for (int i = 0; i < (int) pow(2.0, n); i++) {
        int a; cin >> a;
        S.push_back(a);
    }
    sort(S.begin(), S.end());
    // edge cases
    A.push_back(S[1]);
    if (n == 1) { for (auto v : A) cout << v << "\n"; return 0; }
    A.push_back(S[2]);
    if (n == 2) { for (auto v : A) cout << v << "\n"; return 0; }
    sums.push_back(0); sums.push_back(S[1]); sums.push_back(S[2]);
    sums.push_back(S[1] + S[2]);
    freq[S[1] + S[2]]++; // IMPT: we only need frequency of composite elements
    for (int i = 3; i < S.size(); i++) {
        if (A.size() == n) break; // IMPT: prune the search space
        // has to be a new element in A
        if (freq[S[i]] == 0) {
            // compute the new subset sums with the addition of a new element
            vi newsums = sums;
            for (int j = 0; j < sums.size(); j++) {
                int y = sums[j] + S[i];
                newsums.push_back(y);
                if (j != 0) freq[y]++; // IMPT: coz we only need frequency of composite elements
            }
            // update A and subset sums
            sums = newsums;
            A.push_back(S[i]);
        } else {
            // has to be a summation of the previous elements in A
            freq[S[i]]--;
        }
    }
    for (auto v : A) cout << v << "\n";
}

For an array A [0...N-1] how many times must I decrement elements before no two non-zero elements will have difference > M?

An array of size N A [0...N-1] contains some positive integers. What is the minimum number of times that I need to decrement some element so that no two elements (A[i] and A[j] , i != j, A[i]>0, A[j]>0) have difference > M ?
My approach so far :
for (int i = N-1; i >= 0; i--)
{
    for (int j = 0; j <= i-1; j++)
    {
        while (A[i] - A[j] > M)
        {
            A[i]--;
            ans++;
        }
    }
}
But this is not the correct solution.
For example,
A = {3, 2, 1} and M = 0
The optimal solution is to decrement A[2] once and A[0] once.
That makes the array A = {2, 2, 0}.
Since A[2] = 0, we can ignore it, as we only care about non-zero elements.
But this code produces ans = 3.
What is a solution to do it ?
This can be done in O(N log N) time, or O(N) time if the array is presorted.
In pseudo-code:
Given: A : array of ints, M = max difference
sort(A); // O(N log N) time
int start = end = 0; // this is a subsequence that we will move through the array
int sum_before = 0; // sum of all elements before our subsequence
int sum_after = sum_all(A); // sum of all elements after our subsequence -- O(N) time
int best_answer = sum_after; // we could always decrement everything to zero
for (start = 0; start < A.length; ++start)
{
    int maxval = A[start] + M; // A is sorted, so this never gets smaller
    // extend end to find the longest subsequence starting
    // at A[start] that we don't have to change
    while (end < A.length && A[end] <= maxval)
    {
        // we can increment end at most A.length times, so
        // this loop is O(N) in total for all iterations
        sum_after -= A[end];
        ++end;
    }
    // if we leave everything between start and end unchanged
    // then we'll need to decrement everything before to zero and
    // everything after down to maxval
    int current_answer = sum_before + sum_after - (A.length - end) * maxval;
    best_answer = min(best_answer, current_answer);
    // next subsequence excludes A[start] -- it goes into the "before" sum
    sum_before += A[start];
}
Note that in the end, after all decreases are done, the list will contain these kinds of elements:
0-valued elements, which need not be taken into account (they are removed)
a minimum value min (which may occur one or multiple times)
elements in between min and min + M.
If the final minimum value (min) would be known in advance, the following procedure could be applied:
decrease all elements larger than min + M until they are equal to min + M.
decrease all elements smaller than min until they become 0 (i.e. eliminate them).
Now, the problem is that we don't know in advance what will be the minimum value in the end.
However, a key observation is that the final minimum value (min) will for sure be one of the initial values in the input list (otherwise we would get a sub-optimal solution).
Thus, to solve this task, you can take each value in the input list as a candidate for the final minimum value (min), apply for each the procedure described above, and, finally, select the best of the solutions generated in this way.
Example. For the input list {1, 2, 3} and M = 0:
Each number in the input list will be a candidate for min (indices below are 1-based).
min = 1. Then we need to decrement A[2] once, and A[3] twice (3 operations). The resulting set is {1, 1, 1}.
min = 2. We need to decrement A[1], in order to eliminate it, and A[3] once in order to make it inside the allowed range (2 operations). The resulting set is {2, 2}.
min = 3. We need to eliminate A[1] by decrementing it once and A[2], by decrementing it twice (3 operations). The resulting set is {3}.
The best alternative among the ones above is to make min = 2 (which required 2 decrement operations). The resulting set is {2, 2}.
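A minimal Python sketch of this candidate-minimum idea, kept at O(N^2) for clarity (the function name is mine):
def min_decrements(A, M):
    best = sum(A)                        # worst case: decrement everything to zero
    for m in A:                          # try every value as the final minimum
        cost = 0
        for x in A:
            if x < m:
                cost += x                # eliminate it (decrement down to 0)
            elif x > m + M:
                cost += x - (m + M)      # cap it at m + M
        best = min(best, cost)
    return best

print(min_decrements([3, 2, 1], 0))      # -> 2
For the {3, 2, 1} example with M = 0 this prints 2, the optimal answer from the question.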

How do you reorganize an array within O(n) runtime & O(1) space complexity?

I'm a 'space-complexity' neophyte and was given a problem.
Suppose I have an array of arbitrary integers:
[1,0,4,2,1,0,5]
How would I reorder this array to have all the zeros at one end:
[1,4,2,1,5,0,0]
...and compute the count of non-zero integers (in this case: 5)?
... in O(n) runtime with O(1) space complexity?
I'm not good at this.
My background is more environmental engineering than computer science so I normally think in the abstract.
I thought I could do a sort, then count the non-zero integers.
Then I thought I could merely do an element-by-element copy as I re-arrange the array.
Then I thought of something like a bubble sort, swapping neighboring elements until the zeroes reach the end.
I thought I could save on the space complexity by shifting array members' addresses, since the array pointer points to the array with offsets to its members.
Everything I come up with either improves the runtime at the expense of space complexity, or vice versa.
What's the solution?
Two-pointer approach will solve this task and keep within the time and memory constraints.
Start by placing one pointer at the end, another at the start of the array. Then decrement the end pointer until you see the first non-zero element.
Now the main loop:
If the start pointer points to zero, swap it with the value pointed
by the end pointer; then decrement the end pointer.
Always increment the start pointer.
Finish when start pointer becomes greater than or equal to the end
pointer.
Finally, return the position of the start pointer - that's the number of nonzero elements.
This is Swift code for the smart answer provided by kfx:
func putZeroesToLeft(inout nums: [Int]) {
    guard var firstNonZeroIndex: Int = (nums.enumerate().filter { $0.element != 0 }).first?.index else { return }
    for index in firstNonZeroIndex..<nums.count {
        if nums[index] == 0 {
            swap(&nums[firstNonZeroIndex], &nums[index])
            firstNonZeroIndex += 1
        }
    }
}
Time complexity
There are 2 simple (not nested) loops, each repeated at most n times (where n is the length of the input array), so the time is O(n).
Space complexity
Besides the input array, we only use the firstNonZeroIndex int variable, so the space is definitely a constant: O(1).
As indicated by the other answers, the idea is to have two pointers, p and q, one pointing at the end of the array (specifically at the first nonzero entry from behind) and the other pointing at the beginning of the array. Scan the array with q, each time you hit a 0, swap elements pointed to by p and q, increment p and decrement q (specifically, make it point to the next nonzero entry from behind); iterate as long as p < q.
In C++, you could do something like this:
#include <vector>
#include <utility> // std::swap

void rearrange(std::vector<int>& v) {
    int p = 0, q = v.size() - 1;
    // make q point to the right position
    while (q >= 0 && !v[q]) --q;
    while (p < q) {
        if (!v[p]) { // found a zero element
            std::swap(v[p], v[q]);
            while (q >= 0 && !v[q]) --q; // make q point to the right position
        }
        ++p;
    }
}
Start at the far end of the array and work backwards. First scan until you hit a nonzero (if any). Keep track of the location of this nonzero. Keep scanning. Whenever you encounter a zero -- swap. Otherwise increase the count of nonzeros.
A Python implementation:
def consolidateAndCount(nums):
    count = 0
    # first locate last nonzero
    i = len(nums) - 1
    while nums[i] == 0:
        i -= 1
        if i < 0:
            # no nonzeros encountered
            return 0
    count = 1  # since a nonzero was encountered
    for j in range(i-1, -1, -1):
        if nums[j] == 0:
            # move to end
            nums[j], nums[i] = nums[i], nums[j]  # swap is constant space
            i -= 1
        else:
            count += 1
    return count
For example:
>>> nums = [1,0,4,2,1,0,5]
>>> consolidateAndCount(nums)
5
>>> nums
[1, 5, 4, 2, 1, 0, 0]
The suggested answers with 2 pointers and swapping change the order of the non-zero array elements, which conflicts with the example provided. (The question doesn't state that restriction explicitly, though, so maybe it is irrelevant.)
Instead, go through the list from left to right and keep track of the number of 0s encountered so far.
Set counter = 0 (zeros encountered so far).
In each step, do the following:
Check if the current element is 0 or not.
If the current element is 0, increment the counter.
Otherwise, move the current element by counter to the left.
Go to the next element.
When you reach the end of the list, overwrite the values from array[end-counter] to the end of the array with 0s.
The number of non-zero integers is the size of the array minus the counted zeros.
This algorithm has O(n) time complexity, since we go through the whole array at most twice (in the worst case of an all-0 array; the update scheme could be tweaked to make only a single pass). It only uses one additional variable for counting, which satisfies the O(1) space constraint.
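A short Python sketch of this order-preserving counting approach (names are mine):
def move_zeros_stable(arr):
    zeros = 0
    for i, x in enumerate(arr):
        if x == 0:
            zeros += 1                   # one more zero seen so far
        else:
            arr[i - zeros] = x           # shift the element left by the zeros seen
    for i in range(len(arr) - zeros, len(arr)):
        arr[i] = 0                       # fill the tail with the counted zeros
    return len(arr) - zeros              # number of non-zero elements

nums = [1, 0, 4, 2, 1, 0, 5]
print(move_zeros_stable(nums), nums)     # -> 5 [1, 4, 2, 1, 5, 0, 0]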
Iterate over the array with an index i, maintaining a count of zeros encountered so far (say zero_count).
At each step, look at the element at index i + zero_count: if it is 0, increment zero_count and do not advance i.
Otherwise, copy the value at index i + zero_count to the current index i and advance i.
Terminate the loop when i + zero_count reaches the array length.
Set the remaining array elements to 0.
Pseudo code:
zero_count = 0;
i = 0;
while (i + zero_count < arr.length) {
    if (arr[i + zero_count] == 0) {
        zero_count++;
    } else {
        arr[i] = arr[i + zero_count];
        i++;
    }
}
while (i < arr.length) {
    arr[i] = 0;
    i++;
}
Additionally, this also preserves the order of the non-zero elements in the array.
You can actually solve a more general problem, called the Dutch national flag problem, which is used in quicksort. It partitions an array into 3 parts according to a given mid value: first all numbers less than mid, then all numbers equal to mid, and then all numbers greater than mid.
For this task, treat 0 as if it were infinity and pick infinity as the mid value: every non-zero element then falls into the "less than mid" region, and every zero ends up in the "equal to mid" region at the end of the array (a small sketch specialized to this problem follows after the pseudocode).
The pseudocode given by the above link:
procedure three-way-partition(A : array of values, mid : value):
    i ← 0
    j ← 0
    n ← size of A - 1
    while j ≤ n:
        if A[j] < mid:
            swap A[i] and A[j]
            i ← i + 1
            j ← j + 1
        else if A[j] > mid:
            swap A[j] and A[n]
            n ← n - 1
        else:
            j ← j + 1
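For illustration, here is a small Python sketch of the three-way partition specialized to this problem, with zero playing the role of mid treated as infinity, so the "greater than mid" branch never fires (names are mine):
def partition_zeros_last(A):
    i, j, n = 0, 0, len(A) - 1
    while j <= n:
        if A[j] != 0:                    # "A[j] < mid": non-zero values go to the front
            A[i], A[j] = A[j], A[i]
            i += 1
            j += 1
        else:                            # "A[j] == mid": zeros collect at the end
            j += 1
    return i                             # count of non-zero elements

nums = [1, 0, 4, 2, 1, 0, 5]
print(partition_zeros_last(nums), nums)  # -> 5 [1, 4, 2, 1, 5, 0, 0]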

Median of 5 sorted arrays

I am trying to find the solution for the median of 5 sorted arrays. This was an interview question.
The solution I could think of was to merge the 5 arrays and then find the median [O(l+m+n+o+p)].
I know that for 2 sorted arrays of the same size we can do it in log(2n) [by comparing the medians of both arrays, then throwing out one half of each array and repeating the process]. Finding the median of a sorted array is constant time, so I think this is not log(n)? What is the time complexity for this?
1] Is there a similar solution for 5 arrays? What if the arrays are of the same size, is there a better solution then?
2] I assume since this was asked for 5, there would be some solution for N sorted arrays?
Thanks for any pointers.
Some clarification/questions I asked back to the interviewer:
Are the arrays of same length
=> No
I guess there would be an overlap in the values of arrays
=> Yes
As an exercise, I think the logic for 2 arrays doesn't extend. Here is an attempt:
Applying the above logic of 2 arrays to, say, 3 arrays:
[3,7,9] [4,8,15] [2,3,9] ... medians 7,8,3
throw elements [3,7,9] [4,8] [3,9] ... medians 7,6,6
throw elements [3,7] [8] [9] ... medians 5,8,9
throw elements [7] [8] [9] ... median = 8 ... This doesn't seem to be correct?
The merge of the sorted elements => [2,3,4,7,8,9,15] => expected median = 7
(This is a generalization of your idea for two arrays.)
If you start by looking at the five medians of the five arrays, obviously the overall median must be between the smallest and the largest of the five medians.
Proof goes something like this: If a is the min of the medians, and b is the max of the medians, then each array has less than half of its elements less than a and less than half of its elements greater than b. Result follows.
So in the array containing a, throw away numbers less than a; in the array containing b, throw away numbers greater than b... But only throw away the same number of elements from both arrays.
That is, if a is j elements from the start of its array, and b is k elements from the end of its array, you throw away the first min(j,k) elements from a's array and the last min(j,k) elements from b's array.
Iterate until you are down to 1 or 2 elements total.
Each of these operations (i.e., finding median of a sorted array and throwing away k elements from the start or end of an array) is constant time. So each iteration is constant time.
Each iteration throws away (more than) half the elements from at least one array, and you can only do that log(n) times for each of the five arrays... So the overall algorithm is log(n).
[Update]
As Himadri Choudhury points out in the comments, my solution is incomplete; there are a lot of details and corner cases to worry about. So, to flesh things out a bit...
For each of the five arrays R, define its "lower median" as R[n/2-1] and its "upper median" as R[n/2], where n is the number of elements in the array (and arrays are indexed from 0, and division by 2 rounds down).
Let "a" be the smallest of the lower medians, and "b" be the largest of the upper medians. If there are multiple arrays with the smallest lower median and/or multiple arrays with the largest upper median, choose a and b from different arrays (this is one of those corner cases).
Now, borrowing Himadri's suggestion: Erase all elements up to and including a from its array, and all elements down to and including b from its array, taking care to remove the same number of elements from both arrays. Note that a and b could be in the same array; but if so, they could not have the same value, because otherwise we would have been able to choose one of them from a different array. So it is OK if this step winds up throwing away elements from the start and end of the same array.
Iterate as long as you have three or more arrays. But once you are down to just one or two arrays, you have to change your strategy to be exclusive instead of inclusive; you only erase up to but not including a and down to but not including b. Continue like this as long as both of the remaining one or two arrays has at least three elements (guaranteeing you make progress).
Finally, you will reduce to a few cases, the trickiest of which is two arrays remaining, one of which has one or two elements. Now, if I asked you: "Given a sorted array plus one or two additional elements, find the median of all elements", I think you can do that in constant time. (Again, there are a bunch of details to hammer out, but the basic idea is that adding one or two elements to an array does not "push the median around" very much.)
It should be pretty straightforward to apply the same idea to 5 arrays.
First, convert the question to a more general one: finding the Kth element in N sorted arrays.
Find the (K/N)th element in each sorted array with binary search, say K1, K2 ... KN.
Kmin = min(K1 ... KN), Kmax = max(K1 ... KN)
Throw away all elements less than Kmin or larger than Kmax; say X elements have been thrown away.
Now repeat the process by finding the (K - X)th element in the sorted arrays with the remaining elements.
You don't need to do a complete merge of the 5 arrays. You can run the merge step until you have produced (l+m+n+o+p)/2 elements; then you have the median value.
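A quick Python sketch of this partial-merge idea, using the lazy k-way merge from heapq (names are mine):
import heapq
from itertools import islice

def median_by_partial_merge(arrays):
    total = sum(len(a) for a in arrays)
    merged = heapq.merge(*arrays)            # lazy k-way merge; we stop at the middle
    if total % 2:
        return next(islice(merged, total // 2, None))    # single middle element
    lo, hi = islice(merged, total // 2 - 1, total // 2 + 1)
    return (lo + hi) / 2                     # average of the two middle elements

print(median_by_partial_merge([[3, 7, 9], [4, 8, 15], [2, 3, 9]]))   # -> 7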
Finding the kth element in a list of sorted lists can be done by binary search.
from bisect import bisect_left
from bisect import bisect_right

def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in multiple sorted list

    parameters
    ==========
    givenPiles are list of sorted list
    count is the total number of elements
    k is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    #print('finding k=', k, 'count=', count)
    for pileidx, pivotpile in enumerate(givenPiles):
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx] + ends[pileidx]) >> 1
            midval = pivotpile[mid]
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile, midval)
                smaller_right_count += bisect_right(pile, midval)
            #print('check midval', midval, smaller_count, k, smaller_right_count)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid + 1
    return -1

def medianOfPiles(givenPiles, count=None):
    '''
    Find statistical median

    Parameters:
    givenPiles are list of sorted list
    '''
    if not givenPiles:
        return -1  # cannot find median
    if count is None:
        count = 0
        for pile in givenPiles:
            count += len(pile)
    # get mid floor
    target_mid = count >> 1
    midval = kthOfPiles(givenPiles, target_mid, count)
    if 0 == (count & 1):
        midval += kthOfPiles(givenPiles, target_mid - 1, count)
        midval /= 2
    return '%.1f' % round(midval, 1)
The code above gives the correct statistical median as well.
Coupling this binary search with patience sort gives a valuable technique.
The median-of-medians algorithm for selecting a pivot is also worth mentioning; it gives an approximate value, which I think is different from what is being asked here.
Use heapq to keep each list's minimum candidates.
Prerequisite: N sorted lists, each of length K.
Complexity: O(NK log N)
import heapq

class Solution:
    def f1(self, AS):
        # Merge everything with a heap, then take the median of the merged list.
        def f(A):
            n = len(A)
            m = n // 2
            if n % 2:
                return A[m]
            else:
                return (A[m - 1] + A[m]) / 2
        res = []
        q = []
        for i, A in enumerate(AS):
            q.append([A[0], i, 0])
        heapq.heapify(q)
        N, K = len(AS), len(AS[0])
        while len(res) < N * K:
            mn, i, ii = heapq.heappop(q)
            res.append(mn)
            if ii < K - 1:
                heapq.heappush(q, [AS[i][ii + 1], i, ii + 1])
        return f(res)

    def f2(self, AS):
        # Same idea, but stop as soon as the middle element(s) have been popped.
        q = []
        for i, A in enumerate(AS):
            q.append([A[0], i, 0])
        heapq.heapify(q)
        N, K = len(AS), len(AS[0])
        n = N * K
        m = n // 2
        m1 = m2 = float('-inf')
        k = 0
        while k < n:
            mn, i, ii = heapq.heappop(q)
            k += 1
            if k == m:           # m-th smallest (0-based index m - 1)
                m1 = mn
            elif k == m + 1:     # (m + 1)-th smallest (0-based index m)
                m2 = mn
                return m2 if n % 2 else (m1 + m2) / 2
            if ii < K - 1:
                heapq.heappush(q, [AS[i][ii + 1], i, ii + 1])
        return 'should not go here'

Find shortest subarray containing all elements

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of numbers in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?
Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no range [0, k < j'] that contains all interesting numbers, so we don't need to worry about those further. Now find the largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1

i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++
    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++
        If j - i < minRange:
            minRange = j - i
            (minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).
This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.
Say the array has n elements, and the set has m elements.
Sort the array, noting the reverse index (position in the original array)
// O(n log n) time
for each element in the given set:
    find it in the sorted array
    // O(m log n) time - log n for binary search, m times
    keep track of the minimum and maximum original index over the found elements
min - max defines your range
Total time complexity: O((m+n) log n)
This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):
def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)
    sub_A = []
    sub_size = -1
    dict_b = {}
    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})
    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size
    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break
        if i in dict_b:
            dict_b[i] -= 1
            if dict_b[i] <= 0:
                del(dict_b[i])
        index += 1
    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]
Here's how I solved this problem in linear time using collections.Counter objects
from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing
    stream_counts = Counter(stream)
    search_counts = Counter(search)
    minimal_subsequence = None
    start = 0
    end = 0
    subsequence_counts = Counter()
    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from the stream
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1
        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]
Java solution
(Subarray is assumed to be a simple holder with int fields start and end.)
static Subarray findSmallestSubarray(List<String> paragraph, Set<String> keywords) {
    Subarray result = new Subarray(-1, -1);
    Map<String, Integer> keyWordFreq = new HashMap<>();
    int numKeywords = keywords.size();
    // slide the window to contain all the keywords, starting with [0,0]
    for (int left = 0, right = 0; right < paragraph.size(); right++) {
        // expand right to contain all the keywords
        String currRight = paragraph.get(right);
        if (keywords.contains(currRight)) {
            keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
        }
        // the loop is entered when all the keywords are present in the current window;
        // contract left while all the keywords are still present
        while (keyWordFreq.size() == numKeywords) {
            String currLeft = paragraph.get(left);
            if (keywords.contains(currLeft)) {
                // remove from the map if it's the last occurrence so that the loop exits
                if (keyWordFreq.get(currLeft).equals(1)) {
                    // now check if the current subarray is the smallest
                    if ((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)) {
                        result = new Subarray(left, right);
                    }
                    keyWordFreq.remove(currLeft);
                } else {
                    // otherwise just reduce the frequency
                    keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
                }
            }
            left++;
        }
    }
    return result;
}

// Example usage:
List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
Set<String> keywords = new HashSet<>(Arrays.asList("a", "b"));
Subarray result = findSmallestSubarray(paragraph, keywords);
