Find the maximum-length subarray satisfying 2 * min > max

This was an interview question I was recently asked at Adobe:
In an array, find the maximum length subarray with the condition 2 * min > max, where min is the minimum element of the subarray, and max is the maximum element of the subarray.
Does anyone have an approach better than O(n^2)?
Of course, we can't sort, as a subarray is required.
Below is my O(n^2) approach:
int maxLength = Integer.MIN_VALUE;
for (int i = 0; i < A.length - 1; i++) {
    for (int j = i + 1; j < A.length; j++) {
        int min = findMin(A, i, j);
        int max = findMax(A, i, j);
        if (2 * min > max && j - i + 1 > maxLength) {
            maxLength = j - i + 1;
        }
    }
}
Does anybody know an O(n) solution?

Let A[i…j] be the subarray consisting of A[i], A[i+1], … A[j].
Observations:
If A[i…j] doesn't satisfy the criterion, then neither does A[i…(j+1)], because 2·min(A[i…(j+1)]) ≤ 2·min(A[i…j]) ≤ max(A[i…j]) ≤ max(A[i…(j+1)]). So you can abort your inner loop as soon as you find a j for which the condition is not satisfied.
If we've already found a subarray of length L that meets the criterion, then there's no need to consider any subarray with length ≤ L. So you can start your inner loop with j = i + maxLength rather than j = i + 1. (Of course, you'll need to initialize maxLength to 0 rather than Integer.MIN_VALUE.)
Combining the above, we have:
int maxLength = 0;
for (int i = 0; i < A.length; ++i) {
    for (int j = i + maxLength; j < A.length; ++j) {
        if (findMin(A, i, j) * 2 > findMax(A, i, j)) {
            // success -- now let's look for a longer subarray:
            maxLength = j - i + 1;
        } else {
            // failure -- keep looking for a subarray this length:
            break;
        }
    }
}
It may not be obvious at first glance, but the inner loop now goes through a total of only O(n) iterations, because j can take each value at most once. (For example, suppose i is 3 and maxLength is 5, so j starts at 8. If A[3…8] meets the criterion, we increment maxLength until we find a subarray that doesn't meet the criterion. Once that happens, we progress from A[i…(i+maxLength)] to A[(i+1)…((i+1)+maxLength)], which means the new inner loop starts with a greater j than where the previous one left off.)
We can make this more explicit by refactoring a bit to model A[i…j] as a sliding-and-potentially-expanding window: incrementing i removes an element from the left edge of the window, incrementing j adds an element to the right edge of the window, and there's never any need to increment i without also incrementing j:
int maxLength = 0;
int i = 0, j = 0;
while (j < A.length) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
        ++j;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
        ++j;
    }
}
or, if you prefer:
int maxLength = 0;
int i = 0;
for (int j = 0; j < A.length; ++j) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
    }
}
Since in your solution the inner loop iterates a total of O(n^2) times, and you've stated that your solution runs in O(n^2) time, we could argue that, since the above has the inner loop iterate only O(n) times, the above must run in O(n) time.
The problem is, that premise is really very questionable; you haven't indicated how you would implement findMin and findMax, but the straightforward implementation would take O(j−i) time, such that your solution actually runs in O(n^3) rather than O(n^2). So if we reduce the number of inner-loop iterations from O(n^2) to O(n), that just brings the total time complexity down from O(n^3) to O(n^2).
But, as it happens, it is possible to calculate the min and max of these subarrays in amortized O(1) time and O(n) extra space, using "Method 3" at https://www.geeksforgeeks.org/sliding-window-maximum-maximum-of-all-subarrays-of-size-k/. (Hat-tip to גלעד ברקן for pointing this out.) The way it works is, you maintain two deques, minseq for calculating min and maxseq for calculating max. (I'll only explain minseq; maxseq is analogous.) At any given time, the first element (head) of minseq is the index of the min element in A[i…j]; the second element of minseq is the index of the min element after the first element; and so on. (So, for example, if the subarray is [80,10,30,60,50] starting at index #2, then minseq will be [3,4,6], those being the indices of the subsequence [10,30,50].) Whenever you increment i, you check if the old value of i is the head of minseq (meaning that it's the current min); if so, you remove the head. Whenever you increment j, you repeatedly check if the tail of minseq is the index of an element that's greater or equal to the element at j; if so, you remove the tail and repeat. Once you've removed all such tail elements, you add j to the tail. Since each index is added to and removed from the deque at most once, this bookkeeping has a total cost of O(n).
That gives you overall O(n) time, as desired.
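Putting the pieces together, here is a sketch of the whole O(n) approach in Python, combining the non-shrinking window with the "Method 3" deque bookkeeping described above (the function name is mine):

```python
from collections import deque

def longest_valid_subarray(A):
    # minq's front holds the index of the window minimum; maxq's front
    # holds the index of the window maximum (the "Method 3" deques).
    minq, maxq = deque(), deque()
    best = 0
    i = 0  # left edge of the window
    for j in range(len(A)):
        # push j, keeping minq increasing and maxq decreasing by value
        while minq and A[minq[-1]] >= A[j]:
            minq.pop()
        minq.append(j)
        while maxq and A[maxq[-1]] <= A[j]:
            maxq.pop()
        maxq.append(j)
        if 2 * A[minq[0]] > A[maxq[0]]:
            best = j - i + 1        # success: window [i..j] is valid
        else:
            # failure: slide the window right without shrinking it
            if minq[0] == i:
                minq.popleft()
            if maxq[0] == i:
                maxq.popleft()
            i += 1
    return best
```

Each index enters and leaves each deque at most once, so the whole thing is O(n).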

There's a simple O(n log n)-time, O(n)-space solution, since we know the length of the window is bounded: binary search for the window size. For each candidate window size we iterate over the array once, and we make O(log n) such traversals. If the window is too large, we won't find a solution and try a window half the size; otherwise we try a window halfway between this and the last successful window size. (To update the min and max in the sliding window we can use the deque-based method described above.)
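A Python sketch of that idea (function names are mine); the fixed-size feasibility check uses the standard monotonic-deque sliding-window min/max, and works because any sub-window of a valid window is itself valid, so feasibility is monotone in the window size:

```python
from collections import deque

def window_of_size_exists(A, L):
    # O(n) check: does any window of length L satisfy 2*min > max?
    minq, maxq = deque(), deque()
    for j in range(len(A)):
        while minq and A[minq[-1]] >= A[j]:
            minq.pop()
        minq.append(j)
        while maxq and A[maxq[-1]] <= A[j]:
            maxq.pop()
        maxq.append(j)
        # evict indices that fell out of the window [j-L+1 .. j]
        if minq[0] <= j - L:
            minq.popleft()
        if maxq[0] <= j - L:
            maxq.popleft()
        if j >= L - 1 and 2 * A[minq[0]] > A[maxq[0]]:
            return True
    return False

def longest_by_binary_search(A):
    lo, hi = 0, len(A)  # invariant: size lo is feasible, sizes > hi are not
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if window_of_size_exists(A, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```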

Here's an algorithm in O(n lg k) time, where n is the length of the array and k is the length of the maximum subarray having 2 * min > max.
Let A be the array. Let's start with the following invariant: for j between 0 and length(A), SA(j) is either empty or satisfies 2 * min > max. It is extremely easy to initialize: take the empty subarray. (Note that SA(j) may be empty because A[j] may be zero or negative: no subarray containing such an element can satisfy the condition, since min <= 0 implies 2 * min <= min <= max.)
The algorithm is: for each j, we set SA(j) = SA(j-1) + A[j]. But if A[j] >= 2 * min(SA(j-1)), then the invariant is broken. To restore the invariant, we have to remove all the elements e from SA(j) that meet A[j] >= 2 * e. In the same way, the invariant is broken if 2 * A[j] <= max(SA(j-1)). To restore the invariant, we have to remove all the elements e from SA(j) that meet 2 * A[j] <= e.
On the fly, we keep track of the longest SA(j) found and return it.
Hence the algorithm:
SA(0) <- A[0..0]                  # empty subarray
ret <- SA(0)
for j in 0..length(A)-1:
    SA(j) <- SA(j-1) + A[j]       # SA(-1) is the empty subarray
    if A[j] >= 2 * min(SA(j-1)):
        i <- the last index having A[j] >= 2 * A[i]
        SA(j) <- A[i+1..j+1]      # j+1 excluded
    else if 2 * A[j] <= max(SA(j-1)):
        i <- the last index having 2 * A[j] <= A[i]
        SA(j) <- A[i+1..j+1]      # j+1 excluded
    if length(SA(j)) > length(ret):
        ret <- SA(j)
return ret
The question is: how do we find the last index i having A[j] >= 2 * A[i]? If we iterate over SA(j-1), we need at most k steps, and then the time complexity will be O(n k) (we start at j-1 and look for the last value that keeps the invariant).
But there is a better solution. Imagine we have a min-heap that stores the elements of SA(j-1) along with their positions. The first element of the heap is the minimum of SA(j-1); let i0 be its index. We can remove all elements from the start of SA(j-1) to i0 included. Now, are we sure that 2 * A[i] > A[j] for all remaining i? No: there may be more elements that are too small. Hence we remove the elements one after the other until the invariant is restored.
We'll need a max-heap too, to handle the other situation, 2 * A[j] <= max(SA(j-1)).
The easiest approach is to create an ad hoc queue that has the following operations:
add(v): add an element v to the queue
remove_until_min_gt(v): remove elements from start of the queue until the minimum is greater than v
remove_until_max_lt(v): remove elements from start of the queue until the maximum is less than v
maximum: get the maximum of the queue
minimum: get the minimum of the queue
With two heaps, maximum and minimum are O(1), but the other operations are O(lg k).
Here is a Python implementation that keeps track of the start and end indices of the queue:
import heapq

class Queue:
    def __init__(self):
        self._i = 0  # start in A
        self._j = 0  # end in A
        self._minheap = []
        self._maxheap = []

    def add(self, value):
        # store the value and the indices in both heaps
        heapq.heappush(self._minheap, (value, self._j))
        heapq.heappush(self._maxheap, (-value, self._j))
        # update the index in A
        self._j += 1

    def remove_until_min_gt(self, v):
        return self._remove_until(self._minheap, lambda x: x > v)

    def remove_until_max_lt(self, v):
        return self._remove_until(self._maxheap, lambda x: -x < v)

    def _remove_until(self, heap, check):
        while heap and not check(heap[0][0]):
            j = heapq.heappop(heap)[1]
            if self._i < j + 1:
                self._i = j + 1  # update the start index
        # remove front elements before the start index
        # there may remain elements before the start index in the heaps,
        # but the first element is after the start index.
        while self._minheap and self._minheap[0][1] < self._i:
            heapq.heappop(self._minheap)
        while self._maxheap and self._maxheap[0][1] < self._i:
            heapq.heappop(self._maxheap)

    def minimum(self):
        return self._minheap[0][0]

    def maximum(self):
        return -self._maxheap[0][0]

    def __repr__(self):
        ns = [v for v, _ in self._minheap]
        return f"Queue({ns})"

    def __len__(self):
        return self._j - self._i

    def from_to(self):
        return self._i, self._j

def find_min_twice_max_subarray(A):
    queue = Queue()
    best_len = 0
    best = (0, 0)
    for v in A:
        queue.add(v)
        if 2 * v <= queue.maximum():
            # restore the invariant: remove every element e with e >= 2*v
            queue.remove_until_max_lt(2 * v)
        elif v >= 2 * queue.minimum():
            # restore the invariant: remove every element e with e <= v/2
            queue.remove_until_min_gt(v / 2)
        if len(queue) > best_len:
            best_len = len(queue)
            best = queue.from_to()
    return best
Every element of A is pushed onto the queue once and popped at most once, and each heap operation costs O(lg k); hence the O(n lg k) time complexity.
Here's a test.
import random
A = [random.randint(-10, 20) for _ in range(25)]
print(A)
# [18, -4, 14, -9, 8, -6, 12, 13, -7, 7, -2, 14, 7, 9, -9, 9, 20, 19, 14, 13, 14, 14, 2, -8, -2]
print(A[slice(*find_min_twice_max_subarray(A))])
# [20, 19, 14, 13, 14, 14]
Obviously, if there was a way to find the start index that restores the invariant in O(1), we would have a time complexity of O(n). (This reminds me of how the KMP algorithm finds the best new start in a string-matching problem, but I don't know if it is possible to create something similar here.)


Data Structure Question on Arrays - How to find the best of array given conditions

I am new and learning Data structure and algorithm, I need help to solve this question
The best of an array having N elements is defined as the sum of the best of all its elements. The best of element A[i] is defined in the following manner:
a: The best of element A[i] is 1 if A[i-1] < A[i] < A[i+1]
b: The best of element A[i] is 2 if A[i] > A[j] for j ranging from 0 to N-1,
and A[i] < A[h] for h ranging from i+1 to N-1
Write a program to find the best of the array.
Note: A[0] and A[N-1] are excluded when finding the best of the array, and all elements are unique.
Input - 2,1,3,9,20,7,8
Output - 3
The best of element 3 is 2 and of 9 is 1. For the rest of the elements it is 0. Hence 2 + 1 = 3.
This is what I tried so far -
public static void main(String[] args) {
    int[] A = {2, 1, 3, 9, 20, 7, 8};
    int result = 0;
    for (int i = 1; i < A.length - 2; i++) {
        if (A[i-1] < A[i] && A[i] < A[i+1]) {
            result += 1;
        } else if (A[i] > A[j] && A[i] < A[h]) {  // j, h not yet defined
            result += 2;
        } else {
            result += 0;
        }
    }
}
Note how the phrase:
A[i]> A[j] for j ranging from 0 to n-1
simply means: If the current element is not the Minimum of the array. Hence, if you find the minimum at the beginning, this condition can be changed into a much simpler and lightweight condition:
Let m be the minimum of the array, then if A[i] > m
So you don't need to do a linear search on every iteration, which reduces the time complexity.
Now you have the problem at a complexity of O(N^2), which can be reduced further.
Regarding
and A[i]<A[h] for h ranging from i+1 to N-1
Get the maximum element from 2 to N-1. Then at every iteration, check if the current element is less than the maximum. If so, consider it while composing the score, otherwise, that means the current element is the maximum, in this case, re-calculate the maximum element from i+1 to N-1.
The worst-case scenario is when the maximum is always at index i, which happens when the array is already sorted in descending order.
The best-case scenario is when the maximum is always the last element, in which case the overall complexity is reduced to O(N).
Regarding
A[i-1]<A[i]<A[i+1]
This is straightforward: you simply compare the elements residing at those three indices at every iteration.
Implementation
Before anything, the following are important notes:
The result you've got in your example isn't correct, as elements 3 and 9 both fulfill both conditions, so each should score either 1 or 2; they cannot be one with a score of 1 and the other with a score of 2. Hence the overall score should be either 1 + 1 = 2 or 2 + 2 = 4.
I implemented this algorithm in Java (although I prefer Python), as I could guess it from your code snippet.
import java.util.Arrays;

public class ArrayBest {
    private static int[] findMinMax(Integer[] B) {
        // find minimum and the maximum: Time Complexity O(n log(n))
        Integer[] b = Arrays.copyOf(B, B.length);
        Arrays.sort(b);
        return new int[]{b[0], b[B.length - 1]};
    }

    public static int find(Integer[] A) {
        // Exclude the first and last elements
        int N = A.length;
        Integer[] B = Arrays.copyOfRange(A, 1, N - 1);
        N -= 2;
        // find minimum and the maximum: Time Complexity O(n log(n))
        // min at index 0, and max at index 1
        int[] minmax = findMinMax(B);
        int result = 0;
        // start the search
        for (int i = 0; i < N - 1; i++) {
            // start with the first condition: the easier one
            if (i != 0 && B[i-1] < B[i] && B[i] < B[i+1]) {
                result += 1;
            } else if (B[i] != minmax[0]) { // equivalent to A[i] > A[j] : j in [0, N-1]
                if (B[i] < minmax[1]) { // if it is less than the maximum
                    result += 2;
                } else { // it is the maximum --> re-calculate the max over the range (i+1, N)
                    int[] minmax_ = findMinMax(Arrays.copyOfRange(B, i + 1, N));
                    minmax[1] = minmax_[1];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[] A = {2, 1, 3, 9, 7, 20, 8};
        int res = ArrayBest.find(A);
        System.out.println(res);
    }
}
Ignoring the first sort, the best case scenario is when the last element is the maximum (i.e, at index N-1), hence time complexity is O(N).
The worst case scenario, is when the array is already sorted in a descending order, so the current element that is being processed is always the maximum, hence at each iteration the maximum should be found again. Consequently, the time complexity is O(N^2).
The average case scenario depends on the probability of how the elements are distributed in the array. In other words, the probability that the element being processed at the current iteration is the maximum.
Although it requires more study, my initial guess is as follows:
The probability of any i.i.d. element being the maximum is simply 1/N, and that is at the very beginning; but as we are searching over (i+1, N-1), N will be decreasing, so the probability will go like: 1/N, 1/(N-1), 1/(N-2), ..., 1. Counting the outer loop, we can write the average complexity as O(N (1/N + 1/(N-1) + 1/(N-2) + ... + 1)) = O(N (1 + 1/2 + 1/3 + ... + 1/N)), whose asymptotic upper bound (by the Harmonic series) is approximately O(N log(N)).

I need help in optimization of an array problem that has been asked in interview last month

Given 2 arrays a and b, find the max value of b[i] + b[j] + b[k] (i != j != k), where the following condition is satisfied:
a[i] > a[j] > a[k]
I can calculate it by using 3 loops. I was asked to optimize the problem during the interview but was unable to do it.
Is there any optimized method better than O(n^3)?
Thanks
You can probably get away with an O(n log n) solution by sorting B (n log n time) and tracking which indexes are used, checking A for the property A[i] > A[j] > A[k]. Note that since the only relationship between i, j and k is that they are not equal, their order doesn't seem to matter. (Also, it might be that i and k can be equal, but we'll ignore that for now.)
Pseudocode:
function findSum(A, B):
    declare array B_with_index
    // O(n)
    for index 0 to |B|-1:
        insert (B[index], index) into B_with_index
    // O(n log n)
    sort B_with_index by its first value, descending
    // O(n)
    for index 0 to |B|-3:
        b_1 := B_with_index[index]
        b_2 := B_with_index[index+1]
        b_3 := B_with_index[index+2]
        // Note that we're only checking for pairwise inequality here.
        // This is because i, j and k have no relationship
        // other than inequality. E.g., if A[i] < A[j], we can
        // just swap the definitions of i and j.
        if A[b_1.index] != A[b_2.index] and A[b_2.index] != A[b_3.index]
                and A[b_1.index] != A[b_3.index] then
            return b_1.value + b_2.value + b_3.value
    return error
This relies on the fact that if you sort an array such as [1, 5, 25, 2, -5, 50], the maximum sum will be at the start: [50, 25, 5, 2, 1, -5]. Then all you have to do is make sure the indexes involved have the right property within A.
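Here's a Python sketch of that idea (the function name is mine). The three chosen indices need pairwise-distinct A-values so the labels i, j, k can be assigned to give A[i] > A[j] > A[k]; as the answer itself hedges, the greedy scan over consecutive sorted triples is a heuristic, not a proven-optimal search:

```python
def max_triple_sum(A, B):
    # Sort indices by B-value, descending; the first consecutive triple
    # whose A-values are pairwise distinct gives the candidate answer
    # (labels i, j, k can always be assigned so that A[i] > A[j] > A[k]).
    order = sorted(range(len(B)), key=lambda t: B[t], reverse=True)
    for t in range(len(order) - 2):
        i, j, k = order[t], order[t + 1], order[t + 2]
        if A[i] != A[j] and A[j] != A[k] and A[i] != A[k]:
            return B[i] + B[j] + B[k]
    return None  # no valid triple among consecutive candidates
```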

Selection sort: What is n-1?

#include <stdio.h>

void selectionSort(int arr[], int n);

static void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

int main(void) {
    int arr[] = { 64, 25, 12, 22, 11 };
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    return 0;
}

void selectionSort(int arr[], int n) {
    int i, j, min_idx;
    // One by one move boundary of unsorted subarray
    for (i = 0; i < n - 1; i++) {
        // Find the minimum element in unsorted array
        min_idx = i;
        for (j = i + 1; j < n; j++)
            if (arr[j] < arr[min_idx])
                min_idx = j;
        // Swap the found minimum element with the first element
        swap(&arr[min_idx], &arr[i]);
    }
}
I have seen this C code for the sorting algorithm called selection sort. But my question is about the selectionSort function.
Why is the condition in the first for loop i < n - 1, whereas in the second loop it is j < n?
What does i < n - 1 do exactly, and why the different bound in the second loop? Can you please explain this code to me like I'm in sixth grade of elementary school? Thank you.
The first loop only has to iterate up to index n-2 (thus i < n-1) because the second for loop checks indices i+1 up to n-1 (thus j < n). If i took the value n-1, the inner loop would start at j = n, fail the test j < n immediately, and do nothing; that final outer iteration would only swap arr[n-1] with itself, which is useless work.
You could think of this implementation of selection sort as moving from left to right through the array, always leaving a sorted array on its left. That's why the second for loop starts visiting elements from index i+1.
You could find many resources online to visualize how selection sort works, e.g., Selection sort in Wikipedia
The implementation on Wikipedia is annotated and explains it.
/* advance the position through the entire array */
/* (could do i < aLength-1 because single element is also min element) */
for (i = 0; i < aLength-1; i++)
Selection sort works by finding the smallest element and swapping it in place. When there's only one unsorted element left it is the smallest unsorted element and it is at the end of the sorted array.
For example, let's say we have {3, 5, 1}.
i = 0   {3, 5, 1}   // swap 3 and 1
         ^
i = 1   {1, 5, 3}   // swap 5 and 3
            ^
i = 2   {1, 3, 5}   // swap 5 and... 5?
               ^
For three elements we only need two swaps. For n elements we only need n-1 swaps.
It's an optimization which might improve performance a bit on very small arrays, but otherwise inconsequential in an O(n^2) algorithm like selection sort.
Why in the first for loop, the condition is i < n-1? But in the second loop is j < n?
The loop condition for the inner loop is j < n because the index of the last element to be sorted is n - 1, so that when j >= n we know that it is past the end of the data.
The loop condition for the outer loop could have been i < n, but observe that no useful work would then be done on the iteration when i took the value n - 1. The initial value of j in the inner loop is i + 1, which in that case would be n. Thus no iterations of the inner loop would be performed.
But no useful work is not the same as no work at all. On every outer-loop iteration in which i took the value n - 1, some bookkeeping would be performed, and arr[i] would be swapped with itself. Stopping the outer loop one iteration sooner avoids that guaranteed-useless extra work.
All of this is directly related to the fact that no work needs to be expended to sort a one-element array.
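As a sketch of the same point in Python (names are mine), here is a selection sort that counts swaps, showing that n-1 outer iterations suffice:

```python
def selection_sort(arr):
    # Sort in place, counting swaps: the outer loop runs n-1 times,
    # because a one-element unsorted suffix is already in place.
    n = len(arr)
    swaps = 0
    for i in range(n - 1):  # i = n-1 would only swap arr[n-1] with itself
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
        swaps += 1
    return swaps
```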
Here is the logic of these nested loops:
for each position i in the array:
    find the smallest element of the slice starting at this position and extending to the end of the array
    swap that smallest element with the element at position i
The smallest element of the 1 element slice at the end of the array is obviously already in place, so there is no need to run the last iteration of the outer loop. That's the reason for the outer loop to have a test i < n - 1.
Note however that there is a nasty pitfall in this test: if instead of int we use size_t for the type of the index and count of elements in the array, which is more correct as arrays can have more elements nowadays than the range of type int, i < n - 1 would be true for i = 0 and n = 0 because n - 1 is not negative but the largest size_t value which is huge. In other words, the code would crash on an empty array.
It would be safer to write:
for (i = 0; i + 1 < n; i++) { ...

For an array A [0...N-1] how many times must I decrement elements before no two non-zero elements will have difference > M?

An array of size N A [0...N-1] contains some positive integers. What is the minimum number of times that I need to decrement some element so that no two elements (A[i] and A[j] , i != j, A[i]>0, A[j]>0) have difference > M ?
My approach so far:
for (int i = N-1; i >= 0; i--) {
    for (int j = 0; j <= i-1; j++) {
        while (A[i] - A[j] > M) {
            A[i]--;
            ans++;
        }
    }
}
But this is not the correct solution. For example, take
A = {3 2 1} and M = 0
The optimal solution is decrementing A[2] once and A[0] once.
That makes the array A = {2 2 0}.
Since A[2] is now 0, we can ignore it, as we only care about non-zero elements.
But this code produces ans = 3.
What is a correct way to do it?
This can be done in O(N log N) time, or O(N) time if the array is presorted.
In pseudo-code:
Given: A : array of ints, M = max difference

sort(A);                      // O(N log N) time
int start = 0, end = 0;       // a subsequence that we will move through the array
int sum_before = 0;           // sum of all elements before our subsequence
int sum_after = sum_all(A);   // sum of all elements after our subsequence -- O(N) time
int best_answer = sum_after;  // we could always decrement everything to zero
for (start = 0; start < A.length; ++start) {
    int maxval = A[start] + M;  // A is sorted, so this never gets smaller
    // extend end to find the longest subsequence starting
    // at A[start] that we don't have to change
    while (end < A.length && A[end] <= maxval) {
        // we can increment end at most A.length times, so
        // this loop is O(N) in total over all iterations
        sum_after -= A[end];
        ++end;
    }
    // if we leave everything between start and end unchanged
    // then we'll need to decrement everything before to zero and
    // everything after to maxval
    int current_answer = sum_before + sum_after - (A.length - end) * maxval;
    best_answer = min(best_answer, current_answer);
    // the next subsequence excludes A[start] -- it goes into the "before" sum
    sum_before += A[start];
}
Note that in the end, after all decreases are done, the list will contain these kinds of elements:
0-valued elements, which need not to be taken into account (they are removed)
a minimum value min (which may occur one or multiple times)
elements in between min and min + M.
If the final minimum value (min) would be known in advance, the following procedure could be applied:
decrease all elements larger than min + M until they are equal to min + M.
decrease all elements smaller than min until they become 0 (i.e. eliminate them).
Now, the problem is that we don't know in advance what will be the minimum value in the end.
However, a key observation is that the final minimum value (min) will for sure be one of the initial values in the input list (otherwise we would get a sub-optimal solution).
Thus, to solve this task, you can take each value in the input list as a candidate for the final minimum value (min), apply for each the procedure described above, and, finally, select the best of the solutions generated in this way.
Example. For the {1, 2, 3} input list and M = 0.
Each number in the input set will be a candidate for min.
min = 1. Then we need to decrement the 2 once and the 3 twice (3 operations). The resulting set is {1, 1, 1}.
min = 2. We need to decrement the 1 once, in order to eliminate it, and the 3 once in order to bring it inside the allowed range (2 operations). The resulting set is {2, 2}.
min = 3. We need to eliminate the 1 by decrementing it once, and the 2 by decrementing it twice (3 operations). The resulting set is {3}.
The best alternative among the ones above is to make min = 2 (which required 2 decrement operations). The resulting set is {2, 2}.
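The candidate-minimum procedure above can be transcribed directly; this naive version is O(N^2) (the function name is mine):

```python
def min_decrements_by_candidate_min(A, M):
    # Try every input value as the final minimum m: eliminate smaller
    # values, cap larger ones at m + M, and keep the cheapest outcome.
    best = sum(A)  # fallback: decrement everything to zero
    for m in A:
        cost = 0
        for v in A:
            if v < m:
                cost += v            # decrement to zero (eliminate)
            elif v > m + M:
                cost += v - (m + M)  # decrement into the allowed range
        best = min(best, cost)
    return best
```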

Find the Smallest Integer Not in a List

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the data structure can be mutated in place and supports random access, then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially, and for every index, write the value at that index to the index specified by the value, recursively placing any value at that location to its place and throwing away values > N. Then go through the array again looking for the spot where the value doesn't match the index -- that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values' worth of temporary space.
# Pass 1: move every value to the position given by its value
for cursor in range(N):
    target = array[cursor]
    while target < N and target != array[target]:
        new_target = array[target]
        array[target] = target
        target = new_target
# Pass 2: find the first location where the index doesn't match the value
for cursor in range(N):
    if array[cursor] != cursor:
        return cursor
return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; lets say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
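A Python sketch of steps 1 through 4 (the function name is mine):

```python
def smallest_missing(numbers):
    n = len(numbers)
    seen = [False] * n          # seen[x] is True iff x occurs in the list
    for x in numbers:
        if x < n:               # values >= n can never be the answer
            seen[x] = True
    for i, present in enumerate(seen):
        if not present:
            return i
    return n                    # the list is a permutation of 0 .. n-1
```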
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number reduces to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete, look at the number of items in the lower partition.
Is this value equal to the value used for creating the partition?
If so, then the gap is in the higher partition: continue with the quicksort, ignoring the lower partition.
Otherwise, the gap is in the lower partition: continue with the quicksort, ignoring the higher partition.
This saves a large number of computations.
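Here's one way the idea might look in Python (names are mine): the candidate range is repeatedly halved, and the remaining values are filtered down with each step, using a set for the duplicate removal the answer mentions:

```python
def smallest_missing_partition(values):
    # Candidate answers lie in [0, n]: a list of n numbers cannot
    # contain every number in 0 .. n.
    n = len(values)
    lo, hi = 0, n                       # invariant: the answer is in [lo, hi]
    vals = [v for v in values if v <= n]
    while lo < hi:
        mid = (lo + hi) // 2
        # distinct values in [lo, mid] -- the "remove duplicates" step
        lower = {v for v in vals if lo <= v <= mid}
        if len(lower) == mid - lo + 1:
            lo = mid + 1                # the lower range is full; gap above
            vals = [v for v in vals if v > mid]
        else:
            hi = mid                    # something in the lower range is missing
            vals = list(lower)
    return lo
```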
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
    if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
If the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space-efficient method when all values are distinct, you can do it in O(k) space and O(k*log(N)*N) time. There's no data movement, and all operations are elementary (adding, subtracting).
set U = N; L = 0
1. Partition the number space [L, U) into k regions, where region i covers L + (i/k)*(U-L) up to L + ((i+1)/k)*(U-L).
2. Count how many numbers (count{i}) fall in each region. (N*k steps)
3. Find the first region (h) that isn't full. That means count{h} < upper_limit{h} - lower_limit{h}. (k steps)
4. If region h has width 1, its lower bound is your answer.
5. Set U = upper_limit{h}; L = lower_limit{h}.
6. Goto 1.
This can be improved using hashing (thanks to Nic for this idea):
1. Partition the number space as above: region i covers L + (i/k)*(U-L) up to L + ((i+1)/k)*(U-L).
2. Increment count{j} using j = (number - L) / region_width (if L <= number < U).
3. Find the first region (h) that doesn't have region_width elements in it.
4. If region h has width 1, its lower bound is your answer.
5. Set U = the maximum value in region h; L = the minimum value in region h.
6. Goto 1.
This will run in O(log(N)*N).
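A rough Python sketch of the region-counting loop above (my own illustration; it assumes distinct non-negative integers and rescans the list each round with k counters):

```python
def smallest_missing(nums, k=16):
    # Narrow [lo, hi) to the first sub-range that is not full,
    # rescanning the list on each round. O(k) extra space.
    # Assumes the values are distinct non-negative integers.
    lo, hi = 0, len(nums) + 1          # the answer is at most len(nums)
    while hi - lo > 1:
        counts = [0] * k
        width = (hi - lo + k - 1) // k  # ceiling division
        for x in nums:
            if lo <= x < hi:
                counts[(x - lo) // width] += 1
        for i, c in enumerate(counts):
            region_lo = lo + i * width
            region_hi = min(region_lo + width, hi)
            if c < region_hi - region_lo:   # region not full: gap here
                lo, hi = region_lo, region_hi
                break
    return lo
```

Because the values are distinct and the range is one larger than the list, at least one region per round must be under-full, so the loop always makes progress.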
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] != list[i-1] + 1:
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64)   // might take a while :-)
    mask_clear_all (bitmask)
    for i = 1 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X             // one scan, Θ(n) total
    hashtable.put(i, i);     // O(1) expected per insert
low = 0;
while (hashtable.get(low) != null)   // at most n+1 probes
    low++;
print low;
The worst case is when the array holds n elements {0, 1, ..., n-1}, in which case the answer is found at n, still keeping it O(n).
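In Python the same idea is a few lines (a sketch using the built-in set, which is a hash table under the hood):

```python
def smallest_missing(nums):
    seen = set(nums)      # build the hash set: expected O(n)
    low = 0
    while low in seen:    # at most n+1 constant-time probes
        low += 1
    return low
```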
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0L
    while target < 2L**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        if not target_found and the_answer is None:
            the_answer = target
        target += 1L
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bounded by the size of the answer, which is bounded by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, take the square root of the size of the list. For a 1GB list, that's N = 11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its maximum size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
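Here is one way the two passes might look in Python (my sketch; the bucket width and bounds are my assumptions, and as in the answer it relies on the values being distinct):

```python
import math

def smallest_missing(nums):
    # Pass 1: count values per sqrt-sized bucket; the first bucket
    # that is not full must contain the smallest gap.
    # Pass 2: bitmap only that bucket. O(n) time, O(sqrt(n)) space.
    # Assumes distinct values, so a "full" count means no gap.
    n = len(nums)
    width = max(1, math.isqrt(n) + 1)
    limit = n + 1                       # the answer is at most n
    nbuckets = (limit + width - 1) // width
    counts = [0] * nbuckets
    for x in nums:
        if 0 <= x < limit:
            counts[x // width] += 1
    for b in range(nbuckets):
        lo = b * width
        hi = min(lo + width, limit)
        if counts[b] < hi - lo:         # not full: the gap is in here
            present = [False] * (hi - lo)
            for x in nums:
                if lo <= x < hi:
                    present[x - lo] = True
            return lo + present.index(False)
```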
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while (i < Array.Length)
{
    if (Array[i] == i + 1)
    {
        i++;
    }
    if (i < Array.Length)
    {
        if (Array[i] <= Array.Length)
        {   // Swap
            int temp = Array[i];
            int AnoTemp = Array[temp - 1];
            Array[temp - 1] = temp;
            Array[i] = AnoTemp;
        }
        else
            i++;
    }
}

for (int j = 0; j < Array.Length; j++)
{
    if (Array[j] > Array.Length)
    {
        Console.WriteLine(j + 1);
        j = Array.Length;
    }
    else if (j == Array.Length - 1)
        Console.WriteLine("Not Found !!");
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array, throwing away duplicates, zeros, and negative numbers while summing up the rest, tracking the maximum positive number, and keeping the unique positive numbers in a Map.
2- Compute the expected sum as max * (max + 1) / 2.
3- Find the difference between the sums calculated in steps 1 and 2.
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
    if (A == null || A.length == 0) {
        throw new IllegalArgumentException();
    }
    int sum = 0;
    Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
    int max = A[0];
    for (int i = 0; i < A.length; i++) {
        if (A[i] < 0) {
            continue;
        }
        if (uniqueNumbers.get(A[i]) != null) {
            continue;
        }
        if (A[i] > max) {
            max = A[i];
        }
        uniqueNumbers.put(A[i], true);
        sum += A[i];
    }
    int completeSum = (max * (max + 1)) / 2;
    for (int j = 1; j <= Math.min((completeSum - sum), max); j++) {
        if (uniqueNumbers.get(j) == null) { // O(1)
            return j;
        }
    }
    // All-negative case
    if (uniqueNumbers.isEmpty()) {
        return 1;
    }
    return 0;
}
As Stephen C smartly pointed out, the answer must be a number no larger than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is: for a candidate value, subtract it from each element of the array and count the negative results (i.e. the elements smaller than the candidate). If that count is smaller than the candidate, the gap lies below it; otherwise it lies at or above it.
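A possible Python sketch of this counting-based binary search (my illustration; it assumes distinct values, and each probe is a full O(n) scan, giving O(n log n) regardless of input order):

```python
def smallest_missing(nums):
    # Binary search on the answer: if fewer than mid values fall in
    # [0, mid), some value below mid is absent.
    # Assumes the values are distinct.
    lo, hi = 0, len(nums)          # the answer lies in [0, len(nums)]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        below = sum(1 for x in nums if 0 <= x < mid)
        if below < mid:
            hi = mid - 1           # gap below mid
        else:
            lo = mid               # [0, mid) is fully present
    return lo
```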
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum = 0
i = 0
do forever {
    if i == N then leave   /* Processed entire array */
    if array[i] == LowNum {
        LowNum++
        i = 0
    }
    else {
        i++
    }
}
display LowNum
The worst case is n*N comparisons with n = N (that is, O(N^2)), but in practice n is highly likely to be a small number (e.g. 1).
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, the second part should actually be replaced by sum(list), which is where the O(n) comes from. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
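As a tiny Python illustration of the corrected formula (assuming exactly one number is missing from the sequence 1..n+1):

```python
def missing_from_sequence(nums):
    # Exactly one number is missing from 1..len(nums)+1, so the
    # expected total minus the actual sum is the missing value.
    n = len(nums) + 1
    return n * (n + 1) // 2 - sum(nums)
```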
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }

int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
    int m = n;
    for (int i = 0; i < m;) {
        if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
            m--;
            SWAP (a[i], a[m]);
            continue;
        }
        if (a[i] > i) {
            SWAP (a[i], a[a[i]]);
            continue;
        }
        i++;
    }
    return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m, or a[i] < i, or a[i] == a[a[i]], we know that m is the wrong output and must be at least one element lower. So by decrementing m and swapping a[i] with a[m], we can recurse.
If this is not true but a[i] > i, then knowing that a[i] != a[a[i]], we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i, in which case we can increment i, knowing that all the values up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
  requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
  modifies A;
{
  // Pass 1, move every value to the position of its value
  var N := A.Length;
  var cursor := 0;
  while (cursor < N)
  {
    var target := A[cursor];
    while (0 <= target < N && target != A[target])
    {
      var new_target := A[target];
      A[target] := target;
      target := new_target;
    }
    cursor := cursor + 1;
  }
  // Pass 2, find first location where the index doesn't match the value
  cursor := 0;
  while (cursor < N)
  {
    if (A[cursor] != cursor)
    {
      return cursor;
    }
    cursor := cursor + 1;
  }
  return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
    BitSet bitset = new BitSet(values.size() + 1);
    for (int i : values) {
        if (i >= 0 && i <= values.size()) {
            bitset.set(i);
        }
    }
    return bitset.nextClearBit(0);
}
def solution(A):
    index = 0
    target = []
    A = [x for x in A if x >= 0]
    if len(A) == 0:
        return 1
    maxi = max(A)
    if maxi <= len(A):
        maxi = len(A)
    target = ['X' for x in range(maxi + 1)]
    for number in A:
        target[number] = number
    count = 1
    while count < maxi + 1:
        if target[count] == 'X':
            return count
        count += 1
    return target[count - 1] + 1
Got 100% for the above solution.
1) Filter out negatives and zero
2) Sort / distinct
3) Visit the array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
    int result = 1;
    boolean found = false;
    A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
    //System.out.println(Arrays.toString(A));
    for (int i = 0; i < A.length; i++) {
        result = i + 1;
        if (result != A[i]) {
            found = true;
            break;
        }
    }
    if (!found && result == A.length) {
        // result is larger than the max element in the array
        result++;
    }
    return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
    unordered_set<int> fre;
    // Store each positive number in the hash set.
    for (int i = 0; i < nums.size(); i += 1) {
        if (nums[i] > 0)
            fre.insert(nums[i]);
    }
    int i = 1;
    // Iterate from 1 up to the size of the set, checking
    // for the occurrence of 'i'.
    for (auto it = fre.begin(); it != fre.end(); ++it) {
        if (fre.find(i) == fre.end())
            return i;
        i += 1;
    }
    return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];

function findSmallest(a) {
    var m = 0;
    for (var i = 1; i <= a.length; i++) {
        var j = 0;
        m = 1;
        while (j < a.length) {
            if (i === a[j]) {
                m++;
            }
            j++;
        }
        if (m === 1) {
            return i;
        }
    }
    return a.length + 1; // every value 1..length is present
}

console.log(findSmallest(a));
Hope this helps for someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime

# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1

test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]

print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
    A.sort()
    j = 1
    for i, elem in enumerate(A):
        if j < elem:
            break
        elif j == elem:
            j += 1
            continue
        else:
            continue
    return j
This can help:
0- A is [5, 3, 2, 7];
1- Define B with Length = A.Length + 1; (O(1))
2- Initialize B's cells with 1; (O(n))
3- For each item in A: if (item < B.Length) then B[item] = -1; (O(n))
4- The answer is the smallest index in B such that B[index] != -1. (O(n))
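A short Python sketch of these four steps (my illustration; the marker array is sized A.Length + 1 so the answer always fits):

```python
def smallest_missing(a):
    # b[i] stays 1 unless the value i occurs in a; the first index
    # still holding 1 is the answer, which is at most len(a).
    b = [1] * (len(a) + 1)
    for item in a:
        if 0 <= item < len(b):
            b[item] = -1          # mark value as seen
    return b.index(1)
```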
