Rare elements in arrays - arrays

We are given a sorted array A[1..n] of n integers. We say an element x ∈ A is rare if it occurs
in A strictly less than n/10 times. That is, x is rare if there is some index 1 <= i <= n such that
A[i] = x, and there are strictly less than n/10 distinct indices j for which A[j] = x. Our goal
is to find a rare element, or output that A contains no rare elements.
Input: A sorted array A[1..n] of n integers.
Output: A rare element x ∈ A, or the output “No rare element exists in A.”
Is there an O(log n) time algorithm for the rare element problem? What is it?
The recurrence T(n) = 10 T(n/10) + O(1) gives O(n) time, which is not good enough for me.

Yes, it's possible to do it in O(log n). I assume that you already have the array in memory. Otherwise it's impossible to do it faster than O(n), because you need to read the array at least once.
Let step be the largest integer that is strictly less than n/10. If step is equal to zero, there are obviously no rare elements.
Consider the following algorithm:
int start = 1;
while (true) {
    if (n - start + 1 <= step) {
        // At most step elements remain, so A[start] occurs less than n/10 times.
        OutputRare(A[start]); Exit;
    }
    int next_index = start + step;
    if (A[start] != A[next_index]) {
        OutputRare(A[start]); Exit;
    }
    // Here we need to find the smallest index, starting from start, of an
    // element that is not equal to A[start]. If no such element exists,
    // the function returns -1.
    next_index = FindFirstNonEqual(A[start], start);
    if (next_index == -1) {
        // There are no rare elements
        Exit;
    }
    start = next_index;
}
This algorithm either returns a rare element or increases start by at least step + 1 (the run of equal values beginning at start reaches past A[start + step]). Since step is about n/10, start is increased at most about 10 times. FindFirstNonEqual can be implemented using binary search, so the total complexity is O(10 log n) = O(log n).
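The loop above can be sketched in Python as follows. This is only an illustration under a few assumptions: the list is 0-indexed, the name `find_rare` is made up for this example, and `bisect_right` from the standard library plays the role of FindFirstNonEqual.

```python
from bisect import bisect_right


def find_rare(a):
    """Return a rare element of the sorted list a, or None if there is none."""
    n = len(a)
    step = (n - 1) // 10  # largest integer strictly less than n/10
    if step <= 0:
        return None  # an element would have to occur zero times to be rare
    start = 0  # index of the first occurrence of the current value
    while start < n:
        # If the run of a[start] ends before index start + step, the value
        # occurs at most `step` times, i.e. strictly less than n/10 times.
        if n - start <= step or a[start + step] != a[start]:
            return a[start]
        # Binary search for the first index holding a different value.
        start = bisect_right(a, a[start], lo=start)
    return None
```

Each pass of the loop skips a run of more than `step` equal values, so the binary search runs only about ten times.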

Related

Data Structure Question on Arrays - How to find the best of array given conditions

I am new to data structures and algorithms, and I need help solving this question.
The best of an array having N elements is defined as the sum of the best of all elements of the array. The best of element A[i] is defined in the following manner:
a: The best of element A[i] is 1 if, A[i-1]<A[i]<A[i+1]
b: The best of element A[i] is 2 if, A[i]> A[j] for j ranging from 0 to n-1
and A[i]<A[h] for h ranging from i+1 to N-1
Write program to find best of array
Note- A[0] and A[N-1] are excluded to find best of array, all elements are unique
Input - 2,1,3,9,20,7,8
Output - 3
The best of element 3 is 2 and the best of 9 is 1. For the remaining elements it is 0. Hence 2 + 1 = 3.
This is what I tried so far -
public static void main (String [] args) {
int [] A = {2,1,3,9,20,7,8};
int result = 0;
for(int i=1; i<A.length-2; i++) {
if(A[i-1] < A[i] && A[i]< A[i+1] ) {
result += 1;
}else if(A[i]>A[j] && A[i]<A[h]){
result +=2;
}else {
result+=0;
}
}
}
Note how the phrase:
A[i]> A[j] for j ranging from 0 to n-1
simply means that the current element is not the minimum of the array. Hence, if you find the minimum at the beginning, this condition can be replaced by a much simpler and cheaper one:
Let m be the minimum of the array; then check whether A[i] > m.
So you don't need to do a linear search in every iteration, which lowers the time complexity.
Now you have a problem with a complexity of O(N^2), which can be reduced further.
Regarding
and A[i]<A[h] for h ranging from i+1 to N-1
Get the maximum element from 2 to N-1. Then, at every iteration, check whether the current element is less than the maximum. If so, count it towards the score; otherwise the current element is the maximum, and in that case re-calculate the maximum over i+1 to N-1.
The worst case is when the maximum is always at index i, i.e. when the array is already sorted in descending order.
The best case is when the maximum is the last element, in which case the overall complexity is reduced to O(N).
Regarding
A[i-1]<A[i]<A[i+1]
This is straightforward: you simply compare the elements at those three indices at every iteration.
Implementation
Before anything, the following are important notes:
The result you've got in your example isn't correct as elements 3 and 9 both fulfill both conditions, so each should score either 1 or 2, but cannot be one with score of 1 and another with score of 2. Hence the overall score should be either 1+1 = 2 or 2 + 2 = 4.
I implemented this algorithm in Java (although I prefer Python), as I could guess it from your code snippet.
import java.util.Arrays;
public class ArrayBest {
private static int[] findMinMax(Integer [] B) {
// find minimum and the maximum: Time Complexity O(n log(n))
Integer[] b = Arrays.copyOf(B, B.length);
Arrays.sort(b);
return new int []{b[0], b[B.length-1]};
}
public static int find(Integer [] A) {
// Exclude the first and last elements
int N = A.length;
Integer [] B = Arrays.copyOfRange(A, 1, N-1);
N -= 2;
// find minimum and the maximum: Time Complexity O(n log(n))
// min at index 0, and max at index 1
int [] minmax = findMinMax(B);
int result = 0;
// start the search
for (int i=0; i<N-1; i++) {
// start with first condition : the easier
if (i!=0 && B[i-1]<B[i] && B[i]<B[i+1]) {
result += 1;
}else if (B[i] != minmax[0]) { // Equivalent to A[i]> A[j] : j in [0, N-1]
if (B[i] < minmax[1]) { // if it is less than the maximum
result += 2;
}else { // it is the maximum --> re-calculate the max over the range (i+1, N)
int [] minmax_ = findMinMax(Arrays.copyOfRange(B, i+1, N));
minmax[1] = minmax_[1];
}
}
}
return result;
}
public static void main(String[] args) {
Integer [] A = {2,1,3,9,7,20,8};
int res = ArrayBest.find(A);
System.out.println(res);
}
}
Ignoring the first sort, the best case scenario is when the last element is the maximum (i.e, at index N-1), hence time complexity is O(N).
The worst case scenario, is when the array is already sorted in a descending order, so the current element that is being processed is always the maximum, hence at each iteration the maximum should be found again. Consequently, the time complexity is O(N^2).
The average case scenario depends on the probability of how the elements are distributed in the array. In other words, the probability that the element being processed at the current iteration is the maximum.
Although it requires more study, my initial guess is as follows:
The probability that any one of N i.i.d. elements is the maximum is simply 1/N at the very beginning, but as we are searching over (i+1, N-1), N decreases, so the probabilities go like 1/N, 1/(N-1), 1/(N-2), ..., 1. Counting the outer loop, we can write the average complexity as O(N (1/N + 1/(N-1) + 1/(N-2) + ... + 1)) = O(N (1 + 1/2 + 1/3 + ... + 1/N)), whose asymptotic upper bound (from the harmonic series) is O(N log(N)).
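For comparison, here is a sketch under a different reading of the question. It is an assumption, not something the question states: j ranges over 0..i-1, so an element scores 2 when it is greater than everything before it and less than everything after it, and condition b is checked before condition a. Under that reading the asker's expected output of 3 comes out, and prefix maxima plus suffix minima give O(N) time. The name `best_of_array` is made up for this example.

```python
def best_of_array(a):
    """Score of the array under the assumed reading described above."""
    n = len(a)
    # prefix_max[i] = max of a[0..i-1]; suffix_min[i] = min of a[i+1..n-1]
    prefix_max = [float("-inf")] * n
    for i in range(1, n):
        prefix_max[i] = max(prefix_max[i - 1], a[i - 1])
    suffix_min = [float("inf")] * n
    for i in range(n - 2, -1, -1):
        suffix_min[i] = min(suffix_min[i + 1], a[i + 1])

    total = 0
    for i in range(1, n - 1):  # first and last elements are excluded
        if prefix_max[i] < a[i] < suffix_min[i]:
            total += 2  # assumed condition b
        elif a[i - 1] < a[i] < a[i + 1]:
            total += 1  # condition a
    return total


print(best_of_array([2, 1, 3, 9, 20, 7, 8]))  # 3 scores 2, 9 scores 1
```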

Algorithm for deciding if a,b,c exist in an array so that a+b+c = z? [duplicate]

I'm a little stuck finding an efficient algorithm for the following problem. The algorithm has to decide whether there are 3 elements a, b and c in an array such that a+b+c is equal to a given number z.
The naive way would be to try out the combinations, of course, but asymptotically the time needed would be too large.
Finding a and b in an array such that their sum is z is much easier: sort the given array in ascending order and check for every element a whether z-a exists. But I'm not sure how I'd implement it for the 3-element problem and how much time would be needed.
Any help is much appreciated!
Edit: a,b,c and z are integers.
The approach is very similar to finding a and b with sum z.
First sort the array. Then fix a at position i and check whether two elements with sum z-a exist in positions i + 1 to n.
Since you have an O(n) algorithm to check whether a sum z can be formed from a and b in a sorted array, we only need to extend it: fix a and check whether two other elements produce the remaining sum. This gives an overall running time of O(n^2).
From here
// returns true if there is triplet with sum equal
// to 'sum' present in A[]. Also, prints the triplet
bool find3Numbers(int A[], int arr_size, int sum)
{
int l, r;
/* Sort the elements */
sort(A, A+arr_size);
/* Now fix the first element one by one and find the
other two elements */
for (int i=0; i<arr_size-2; i++)
{
// To find the other two elements, start two index
// variables from two corners of the array and move
// them toward each other
l = i + 1; // index of the first element in the
// remaining elements
r = arr_size-1; // index of the last element
while (l < r)
{
if( A[i] + A[l] + A[r] == sum)
{
printf("Triplet is %d, %d, %d", A[i],
A[l], A[r]);
return true;
}
else if (A[i] + A[l] + A[r] < sum)
l++;
else // A[i] + A[l] + A[r] > sum
r--;
}
}
// If we reach here, then no triplet was found
return false;
}
I suppose I should've written a short comment as an answer, but I don't have enough reputation for it ... So here goes nothing!
The best algorithm I can come up with right now is O(n^2), to explain this algorithm better we shall start with the a+b = z in O(n) case (or O(nlgn) if it wasn't sorted)
First of all, iterate over {a} and find {b} such that a+b = z. Naively, iterating over all b costs O(n) per {a}, leading to an O(n^2) solution. However, if you iterate over {a} in increasing order, the matching {b} can only decrease. We can make use of this to reduce the time complexity, as in this pseudocode:
for a = first element, b = last element; a != last; a = next a
    while ( ( b != first element ) and (a + b > z) )
        b = previous element of b
    if a + b == z
        return true
Note that {b} only goes through the whole list once over the entire loop, so the total cost is O(n) (amortized O(1) per {a}).
Now we can apply this principle back to the original problem, we could iterate through {a}, and apply this O(n) algorithm to {b, c} to find {z-a}, the total complexity is O(n*n = n^2).
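A minimal Python sketch of this idea, assuming integer inputs and three elements taken from distinct positions; the name `has_triplet_with_sum` is made up for this example.

```python
def has_triplet_with_sum(values, z):
    """Return True if three elements at distinct positions sum to z."""
    a = sorted(values)  # O(n log n), dominated by the O(n^2) scan below
    n = len(a)
    for i in range(n - 2):
        # Two-pointer scan over a[i+1..n-1] looking for the sum z - a[i].
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == z:
                return True
            if s < z:
                lo += 1  # need a larger sum
            else:
                hi -= 1  # need a smaller sum
    return False


print(has_triplet_with_sum([1, 4, 45, 6, 10, 8], 22))  # 4 + 8 + 10
```

The inner scan moves each pointer at most n times, which is where the O(n) per fixed a comes from.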
Hopefully there is a solution with a lower complexity, I don't think O(n^2) is impressive but I just can't come up with a better one.

Finding kth smallest number from n sorted arrays

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e the combined array formed by merging all the n sorted arrays)
I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two.
This has logarithmic time complexity.
After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO.
Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)
Can somebody explain to me a step-by-step solution (which, in my opinion, should take logarithmic time, i.e. O(log(n1) + log(n2) + ... + log(nN)), where n1, n2, ..., nN are the lengths of the n arrays) that starts from the more specific cases and moves on to the more general one?
I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.
Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to be able to generalize it.
Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.
PS: Do you think this can be further generalized to the case of unsorted arrays?
PPS: It's not a homework problem, I am just preparing for interviews.
This doesn't generalize the links, but does solve the problem:
Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
Identify the largest remaining array A. If more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
If k <= L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element k-L).
When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.
Because you're always guaranteed to remove at least half of one array, in N iterations you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2. That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
It cannot be done in less than O(n) time. Proof sketch: if it could, the algorithm would have to skip at least one array entirely. Obviously, one array can arbitrarily change the value of the kth element.
I have a relatively simple O(n*log(n)*log(m)) where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.
Consider the simple case where you have n arrays each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n), see Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.
Now the problem is how to extend this to longer sorted arrays. Here is the algorithm: Find the median of each array. Build the list of tuples (median, length of array / 2) and sort it by median. Walk through it keeping a running sum of the lengths until you reach a sum greater than k. You now have a pair of medians such that you know the kth element is between them. Now for each median we know whether the kth element is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.
Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step
Finds the medians of the arrays, O(1) each, so O(n) total
Sorts the medians, O(n log n)
Walks through the sorted list, O(n)
Slices the arrays, O(1) each, so O(n) total
that is O(n) + O(n log n) + O(n) + O(n) = O(n log n). We must repeat this until the longest array has length 1, which takes log m steps, for a total of O(n*log(n)*log(m)).
You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we only have one array, then the best algorithm will have to compare at least once with each element for a total of O(m). If there were a faster solution for n unsorted arrays, then we could implement selection by splitting our single array into n parts. Since we just proved selection is O(m), we are stuck.
You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration you could reject the second half of the array with the largest middle element if k is less than sum of mid indexes of all arrays. Alternately, you could reject the first half of the array with the smallest middle element if k is greater than sum of mid indexes of all arrays, adjust k. Keep doing this until you have all but one array reduced to 0 in length. The answer is kth element of the last array which wasn't stripped to 0 elements.
Run-time analysis:
You get rid of half of one array in each iteration. But to determine which array is going to be reduced, you spend time linear in the number of arrays. Assuming each array has the same length, the run time is going to be c*c*log(n), where c is the number of arrays and n is the length of each array.
There exists a generalization that solves the problem in O(N log k) time; see the question here.
Old question, but none of the answers were good enough. So I am posting the solution using sliding window technique and heap:
import java.util.List;
import java.util.PriorityQueue;
class Node {
    int elementIndex;
    int arrayIndex;
    public Node(int elementIndex, int arrayIndex) {
        this.elementIndex = elementIndex;
        this.arrayIndex = arrayIndex;
    }
}
public class KthSmallestInMSortedArrays {
    public int findKthSmallest(List<Integer[]> lists, int k) {
        int ans = 0;
        // Min-heap ordered by the value each node currently points at.
        // Integer.compare avoids the overflow that subtraction can cause.
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) ->
                Integer.compare(lists.get(a.arrayIndex)[a.elementIndex],
                                lists.get(b.arrayIndex)[b.elementIndex]));
        for (int i = 0; i < lists.size(); i++) {
            Integer[] arr = lists.get(i);
            // Skip missing and empty arrays: they have no head element.
            if (arr != null && arr.length > 0) {
                pq.add(new Node(0, i));
            }
        }
        int count = 0;
        while (!pq.isEmpty()) {
            Node curr = pq.poll();
            ans = lists.get(curr.arrayIndex)[curr.elementIndex];
            if (++count == k) {
                break;
            }
            // Re-insert the node only if its array still has elements left.
            if (++curr.elementIndex < lists.get(curr.arrayIndex).length) {
                pq.offer(curr);
            }
        }
        return ans;
    }
}
The maximum number of elements that we need to access here is O(K) and there are M arrays. So the effective time complexity will be O(K*log(M)).
This would be the code. O(k*log(m))
public int findKSmallest(int[][] A, int k) {
PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
for (int i = 0; i < A.length; i++)
queue.offer(new int[] { i, 0 });
int ans = 0;
while (!queue.isEmpty() && --k >= 0) {
int[] el = queue.poll();
ans = A[el[0]][el[1]];
if (el[1] < A[el[0]].length - 1) {
el[1]++;
queue.offer(el);
}
}
return ans;
}
If k is not that large, we can maintain a min-priority queue: loop over the heads of the sorted arrays, take the smallest element each time and enqueue it. When the size of the queue reaches k, we have the k smallest elements.
Maybe we can regard the n sorted arrays as buckets and then try the bucket sort method.
This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list, but only keep k elements in the combined list from merge to merge. This has the advantage of only using O(k) space, with a complexity somewhat better than merge sort's O(n log n). That is, in practice it should run slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This kind of complexity is not so bad.
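The truncated-merge idea can be sketched like this in Python. It is only an illustration: the helper names `merge_keep_k` and `kth_smallest_by_truncated_merge` are made up, k is taken to be 1-based, and `None` is returned when fewer than k elements exist.

```python
def merge_keep_k(left, right, k):
    """Merge two sorted lists, keeping only the k smallest elements."""
    out = []
    i = j = 0
    while len(out) < k and (i < len(left) or j < len(right)):
        # Take from left when right is exhausted or left's head is not larger.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


def kth_smallest_by_truncated_merge(arrays, k):
    """Fold the sorted arrays together, never holding more than k elements."""
    combined = []
    for arr in arrays:
        combined = merge_keep_k(combined, arr, k)
    return combined[k - 1] if 1 <= k <= len(combined) else None


print(kth_smallest_by_truncated_merge([[1, 5, 9], [2, 3], [], [4, 10]], 4))
```

Each merge costs O(k), so with m arrays the total is O(m * k) time and O(k) extra space; the heap-based answers above trade this for O(k log m).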
It can be done by doing binary search in each array, while calculating the number of smaller elements.
I used the bisect_left and bisect_right to make it work for non-unique numbers as well,
from bisect import bisect_left
from bisect import bisect_right
def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in multiple sorted list
    parameters
    ==========
    givenPiles are list of sorted list
    count is the total number of
    k is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    #print('finding k=', k, 'count=', count)
    for pileidx,pivotpile in enumerate(givenPiles):
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx]+ends[pileidx])>>1
            midval = pivotpile[mid]
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile,midval)
                smaller_right_count += bisect_right(pile,midval)
            #print('check midval', midval,smaller_count,k,smaller_right_count)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid+1
    return -1
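Please find below C# code to find the k-th smallest element in the union of two sorted arrays. Time complexity: O(log k)
// k is 0-based; start indices are inclusive and end indices are exclusive.
public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
{
    // if (k >= m+n) exception
    if (start1 == end1)
    {
        // array1 is exhausted, so the answer is in array2
        return array2[start2 + k];
    }
    if (start2 == end2)
    {
        return array1[start1 + k];
    }
    if (k == 0)
    {
        return Math.Min(array1[start1], array2[start2]);
    }
    // Try to discard about half of the k elements that precede the answer.
    int mid = (k + 1) / 2;
    int sub1 = Math.Min(mid, end1 - start1);
    int sub2 = Math.Min(mid, end2 - start2);
    if (array1[start1 + sub1 - 1] < array2[start2 + sub2 - 1])
    {
        // The first sub1 elements of array1 all come before the k-th element.
        return findKthElement(k - sub1, array1, start1 + sub1, end1, array2, start2, end2);
    }
    else
    {
        // The first sub2 elements of array2 all come before the k-th element.
        return findKthElement(k - sub2, array1, start1, end1, array2, start2 + sub2, end2);
    }
}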

Finding the maximum subsequence binary sets that have an equal number of 1s and 0s

I found the following problem on the internet, and would like to know how I would go about solving it:
You are given an array containing 0s and 1s. Find an O(n)-time and O(1)-space algorithm to find the maximum subsequence which has an equal number of 1s and 0s.
Examples:
10101010 -
The longest sub sequence that satisfies the problem is the input itself
1101000 -
The longest sub sequence that satisfies the problem is 110100
Update.
I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)
Let's sum up the easy case again, to get it out of the way:
Find the longest prefix of the bit-string containing
an equal number of 1s and 0s of the
array.
This is trivial: A simple counter is needed, counting how many more 1s we have than 0s, and iterating the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for. )
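The prefix case can be sketched in a few lines of Python. The function name `longest_balanced_prefix` is made up for this example, and the input is assumed to be a sequence of integers 0 and 1.

```python
def longest_balanced_prefix(bits):
    """Length of the longest prefix with as many 1s as 0s."""
    balance = 0  # number of 1s minus number of 0s seen so far
    best = 0
    for length, bit in enumerate(bits, start=1):
        balance += 1 if bit == 1 else -1
        if balance == 0:
            best = length  # the last time the counter hits zero wins
    return best


print(longest_balanced_prefix([1, 1, 0, 1, 0, 0, 0]))  # prefix 110100
```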
Now let's switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.
After some back-and-forth, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, and no candidate start position dominates (i.e. always gives better solutions than) any other, so we can't throw any of them away (O(N) space). At any step we must also be able to select the best start (the one with an equal number of 1s and 0s up to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily so: we can select the candidate based on the running sum of 1s (+1) and 0s (-1), which has magnitude at most N, and we can store the first position at which we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).
Failing to spot this trick, I had replaced a fast but wrong algorithm with a slow but sure one (since correct algorithms are preferable to wrong ones!):
Build an array A with the accumulated number of 1s from the start to that position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1])), i.e. it is valid if its length is double the number of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * (A[6] - A[2]) = 4.
Plain old double loop:
maxSubsLength = 0
for i = 1 to N - 1
    for j = i + 1 to N
        if isValid(i, j) ... #maintain maxSubsLength
    end
end
This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!
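A runnable Python sketch of the double loop, without the branch-and-bound shortcut. Two details are assumptions made for this example: the prefix array gets an extra leading 0 so that A[i - 1] is defined for i = 1, and the function name `longest_balanced_quadratic` is made up.

```python
def longest_balanced_quadratic(bits):
    """Length of the longest contiguous run with as many 1s as 0s, O(n^2)."""
    n = len(bits)
    # ones[p] = number of 1s among the first p bits (ones[0] = 0)
    ones = [0] * (n + 1)
    for p in range(1, n + 1):
        ones[p] = ones[p - 1] + bits[p - 1]

    def is_valid(i, j):
        # 1-based, inclusive: valid if the length is twice the number of 1s
        return j - i + 1 == 2 * (ones[j] - ones[i - 1])

    best = 0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            if is_valid(i, j):
                best = max(best, j - i + 1)
    return best


print(longest_balanced_quadratic([0, 0, 1, 0, 0, 1, 0, 0, 1]))  # e.g. bits 3..6
```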
Strictly speaking, the answer is that no such algorithm exists because the language of strings consisting of an equal number of zeros and ones is not regular.
Of course everyone ignores the fact that storing an integer of magnitude n is O(log n) in space and treats it as O(1) in space. :-) Pretty much all big-O's, including time ones, are full of (or rather empty of) missing log n factors, or equivalently, they assume n is bounded by the size of a machine word, which means you're really looking at a finite problem and everything is O(1).
New solution:
Suppose that for an n-bit input array we have an array of size 2*n to keep bit positions. Each array element must be large enough to hold the maximum position number. For a 256-bit input array, a 256x2 array of bytes is needed (a byte is enough to hold 255, the maximum position).
Moving from the first position of the bit array, we put the position into the array, starting from the middle of the array (index n), using these rules:
1. Increment the index if we passed a "1" bit and decrement it when we passed a "0" bit
2. When we meet an already initialized array element, don't change it, and remember the difference between positions (current minus the one taken from the array element) - this is the size of a local maximum sequence.
3. Every time we meet a local maximum, compare it with the global maximum and update the latter if it is smaller.
For example: bit sequence is 0,0,0,1,0,1
initial array index is n
set arr[n] = 0 (position)
bit 0 -> index--
set arr[n-1] = 1
bit 0 -> index--
set arr[n-2] = 2
bit 0 -> index--
set arr[n-3] = 3
bit 1 -> index++
arr[n-2] already contains 2 -> thus, local max seq is [3,2] becomes abs. maximum
will not overwrite arr[n-2]
bit 0 -> index--
arr[n-3] already contains 3 -> thus, local max seq is [4,3] is not abs. maximum
bit 1 -> index++
arr[n-2] already contains 2 -> thus, local max seq is [5,2] is abs. max
Thus, we pass through the whole bit array only once.
Does this solve the task?
input:
n - number of bits
a[n] - input bit-array
track_pos[2*n+1] = {0,}; /* the running index ranges over 0..2n */
ind = n;
glob_seq_size = 0;
/* start from position 1 since zero has
   meaning track_pos[x] is not initialized */
for (i = 1; i <= n+1; i++) {
    if (track_pos[ind]) {
        seq_size = i - track_pos[ind];
        if (glob_seq_size < seq_size) {
            /* store as interm. result */
            glob_seq_size = seq_size;
            glob_pos_from = track_pos[ind];
            glob_pos_to = i - 1;
        }
    } else {
        track_pos[ind] = i;
    }
    /* the extra iteration i = n+1 only checks the balance after the last bit */
    if (i <= n) {
        if (a[i-1])
            ind++;
        else
            ind--;
    }
}
output:
glob_seq_size - length of maximum sequence
glob_pos_from - start position of max sequence (1-based)
glob_pos_to - end position of max sequence (1-based, inclusive)
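The same first-occurrence idea, written as a small Python sketch. This is a re-expression under assumptions rather than the answerer's exact code: it stores prefix lengths (with -1 meaning "not seen yet") instead of 1-based positions, and the name `longest_balanced_run` is made up. It uses O(n) extra space, so it does not meet the O(1)-space requirement of the question.

```python
def longest_balanced_run(bits):
    """Return (start, length) of the longest run with equal 0s and 1s."""
    n = len(bits)
    # first_seen[b + n] = shortest prefix length whose balance equals b
    first_seen = [-1] * (2 * n + 1)
    balance = n  # offset so that indices stay within 0..2n
    first_seen[balance] = 0  # the empty prefix has balance 0
    best_start, best_len = 0, 0
    for length in range(1, n + 1):
        balance += 1 if bits[length - 1] else -1
        if first_seen[balance] == -1:
            first_seen[balance] = length
        elif length - first_seen[balance] > best_len:
            # Equal balances at two prefix lengths mean the bits between
            # them contain as many 1s as 0s.
            best_len = length - first_seen[balance]
            best_start = first_seen[balance]
    return best_start, best_len


start, size = longest_balanced_run([0, 0, 0, 1, 0, 1])
print(start, size)  # 0-based start index and length of the run 0,1,0,1
```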
In this thread ( http://discuss.techinterview.org/default.asp?interview.11.792102.31 ), poster A.F. has given an algorithm that runs in O(n) time and uses O(sqrt(n log n)) bits.
Brute force: start with the maximum length of the array and count the 0's and 1's. If the count of 0's equals the count of 1's, you are finished. Otherwise reduce the search length by 1 and run the check on all subsequences of the reduced length, and so on. Stop when the length reaches 0.
As was pointed out by user "R..", there is no solution, strictly speaking, unless you ignore the "log n" space complexity. In the following, I will consider that the array length fits in a machine register (e.g. a 64-bit word) and that a machine register has size O(1).
The important point to notice is that if there are more 1's than 0's, then the maximum subsequence that you are looking for necessarily includes all the 0's, and that many 1's. So here is the algorithm:
Notations: the array has length n, indices are counted from 0 to n-1.
First pass: count the number of 1's (c1) and 0's (c0). If c1 = c0 then your maximal subsequence is the entire array (end of algorithm). Otherwise, let d be the digit which appears less often (d = 0 if c0 < c1, otherwise d = 1).
Compute m = min(c0, c1) * 2. This is the size of the subsequence you are looking for.
Second pass: scan the array to find the index j of the first occurrence of d.
Compute k = max(j, n - m). The subsequence starts at index k and has length m.
Note that there could be several solutions (several subsequences of maximal length which match the criterion).
In plain words: assuming that there are more 1's than 0's, then I consider the smallest subsequence which contains all the 0's. By definition, that subsequence is surrounded by bunches of 1's. So I just grab enough 1's from the sides.
Edit: as was pointed out, this does not work... The "important point" is actually wrong.
Try something like this:
/* bit(n) is a macro that returns the nth bit, 0 or 1. len is number of bits */
int c[2] = {0,0};
int d, i, a, b, p;
for(i=0; i<len; i++) c[bit(i)]++;
d = c[1] < c[0];
if (c[d] == 0) return; /* all bits identical; fail */
for(i=0; bit(i)!=d; i++);
a = b = i;
for(p=0; i<len; i++) {
p += 2*bit(i)-1;
if (!p) b = i;
}
if (a == b) { /* account for case where we need bits before the first d */
b = len - 1;
a -= abs(p);
}
printf("maximal subsequence consists of bits %d through %d\n", a, b);
Completely untested but modulo stupid mistakes it should work. Based on my reply to Thomas's answer which failed in certain cases.
New Solution:
Space complexity of O(1) and time complexity O(n^2)
int iStart = 0, iEnd = 0;
int[] arrInput = { 1, 0, 1, 1, 1,0,0,1,0,1,0,0 };
for (int i = 0; i < arrInput.Length; i++)
{
int iCurrEndIndex = i;
int iSum = 0;
for (int j = i; j < arrInput.Length; j++)
{
iSum = (arrInput[j] == 1) ? iSum+1 : iSum-1;
if (iSum == 0)
{
iCurrEndIndex = j;
}
}
if ((iEnd - iStart) < (iCurrEndIndex - i))
{
iEnd = iCurrEndIndex;
iStart = i;
}
}
I am not sure whether the array you are referring to is an int array of 0's and 1's or a bit array.
If it's about a bit array, here is my approach:
int isEvenBitCount(int n)
{
//n ... //Decimal equivalent of the input binary sequence
int cnt1 = 0, cnt0 = 0;
while(n){
if(n&0x01) { printf("1 "); cnt1++;}
else { printf("0 "); cnt0++; }
n = n>>1;
}
printf("\n");
return cnt0 == cnt1;
}
int main()
{
int i = 40, j = 25, k = 35;
isEvenBitCount(i)?printf("-->Yes\n"):printf("-->No\n");
isEvenBitCount(j)?printf("-->Yes\n"):printf("-->No\n");
isEvenBitCount(k)?printf("-->Yes\n"):printf("-->No\n");
}
With the use of bitwise operations, the time complexity is almost O(1) as well.

Find the Smallest Integer Not in a List

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the data structure can be mutated in place and supports random access, then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and, for every index, write the value at that index to the index specified by the value, recursively moving any value found at that location to its own place and throwing away values >= N. Then go through the array again looking for the first spot where the value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values' worth of temporary space.
# The wrapper name is illustrative; N is the length of the list.
def first_missing_in_place(array):
    N = len(array)
    # Pass 1, move every value to the position of its value
    for cursor in range(N):
        target = array[cursor]
        while target < N and target != array[target]:
            new_target = array[target]
            array[target] = target
            target = new_target
    # Pass 2, find first location where the index doesn't match the value
    for cursor in range(N):
        if array[cursor] != cursor:
            return cursor
    return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
1. Find the length of the list; let's say it is N.
2. Allocate an array of N booleans, initialized to all false.
3. For each number X in the list, if X is less than N, set the X'th element of the array to true.
4. Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
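The four steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the original answer: the function name `smallest_missing` is made up for the example, and a plain list of booleans stands in for the bitmap.

```python
def smallest_missing(numbers):
    """Return the smallest non-negative integer that is not in `numbers`."""
    n = len(numbers)                      # step 1: length of the list
    seen = [False] * n                    # step 2: N booleans, all false
    for x in numbers:                     # step 3: mark every value below N
        if 0 <= x < n:
            seen[x] = True
    for i, present in enumerate(seen):    # step 4: first false entry
        if not present:
            return i
    return n                              # the list is a permutation of 0..N-1


print(smallest_missing([3, 0, 1, 7, 1]))
```

Values of N or more are simply skipped, which is why the boolean array never needs more than N entries.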
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
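The figures for the list size and the chance of seeing a zero can be checked with a few lines of Python. This sketch assumes, as the answer implicitly does, that the entries are spread uniformly over all 64-bit values; the variable names are illustrative.

```python
GIB = 2**30                       # 1GB of RAM, taken as 2^30 bytes
max_entries = GIB // 8            # each 64-bit integer occupies 8 bytes
possible_values = 2**64

# Each entry is zero with probability 1 / 2^64, so with max_entries entries
# the chance that zero appears at all is roughly max_entries / 2^64.
one_in = possible_values // max_entries

print(max_entries)
print(one_in)
```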
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete, look at the number of items in the lower partition.
Is this value equal to the value used for creating the partition?
If so, the gap is in the higher partition: continue with the quicksort, ignoring the lower partition.
Otherwise, the gap is in the lower partition: continue with the quicksort, ignoring the higher partition.
This saves a large number of computations.
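One way to read the partitioning idea is sketched below in Python. It is not the answerer's code: it builds new lists instead of partitioning in place, removes duplicates up front with a set, picks the pivot at random, and the function name is hypothetical. Because only one side of each partition is kept, the expected total work is linear.

```python
import random


def smallest_missing_by_partition(numbers):
    """Smallest non-negative integer absent from `numbers` (non-negative ints)."""
    candidates = list(set(numbers))   # remove duplicates up front
    low = 0                           # every integer below `low` is known to be present
    while candidates:
        pivot = random.choice(candidates)
        lower = [x for x in candidates if x < pivot]
        upper = [x for x in candidates if x > pivot]
        if len(lower) == pivot - low:
            # [low, pivot) is completely filled, so the gap is above the pivot
            low = pivot + 1
            candidates = upper
        else:
            # something in [low, pivot) is missing, so discard the upper part
            candidates = lower
    return low
```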
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
    if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
If the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space-efficient method, assuming all values are distinct, you can do it in O(k) space and O(k*log(N)*N) time. It's space efficient, no data is moved, and all operations are elementary (adding and subtracting).
1. Set U = N; L = 0.
2. First partition the number space into k regions, like this:
0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
3. Find how many numbers (count{i}) are in each region. (N*k steps)
4. Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
5. If h - count{h-1} = 1, you've got your answer.
6. Set U = count{h}; L = count{h-1}.
7. Go to 2.
This can be improved using hashing (thanks to Nic for this idea).
1. Same as above.
2. First partition the number space into k regions, like this:
L + (i/k)*(U-L) -> L + ((i+1)/k)*(U-L)
3. Increment count{j} using j = (number - L)/k (if L < number < U).
4. Find the first region (h) that doesn't have k elements in it.
5. If count{h} = 1, h is your answer.
6. Set U = maximum value in region h; L = minimum value in region h.
This will run in O(log(N)*N).
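A rough Python sketch of the region-counting idea follows. It is an interpretation, not the answerer's code: it assumes all values are distinct (as the answer requires), splits the current range into k equal-width regions, keeps only k counters, and narrows to the first region that is not full. The function name and the default k are made up for the example.

```python
def smallest_missing_k_regions(numbers, k=16):
    """Smallest missing non-negative integer, assuming `numbers` are distinct."""
    low, high = 0, len(numbers) + 1       # the answer lies in [low, high)
    while high - low > 1:
        width = -(-(high - low) // k)     # ceiling division: size of each region
        counts = [0] * k                  # O(k) extra space
        for x in numbers:                 # one pass over the list per round
            if low <= x < high:
                counts[(x - low) // width] += 1
        for region in range(k):
            start = low + region * width
            end = min(start + width, high)
            if counts[region] < end - start:
                low, high = start, end    # first region with a hole in it
                break
    return low
```

Each round shrinks the range by about a factor of k, so the list is scanned roughly log_k(N) times and never modified.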
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
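def smallest_not_in_list(values):
    values = sorted(values)
    if not values or values[0] != 0:
        return 0
    for i in range(1, len(values)):
        # A jump of more than 1 is a gap; equal neighbours are just duplicates
        if values[i] > values[i-1] + 1:
            return values[i-1] + 1
    if values[-1] == 2**64 - 1:
        raise ValueError("No gaps")
    return values[-1] + 1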
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all (bitmask)
    for i = 0 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are stored, run a counter from 0 until we find the lowest number that is missing. A reasonably good hash will hash and store in constant time, and retrieve in constant time.
for every i in X // One scan, Θ(n)
    hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(low) <> null) // at most n+1 times
    low++;
print low;
The worst case is when the array has n elements and they are {0, 1, ... n-1}, in which case the answer will be obtained at n, still keeping it O(n).
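In Python the same idea fits in a few lines, with the built-in `set` playing the role of the hash table. This is only a sketch; the function name is illustrative.

```python
def smallest_missing_hashed(numbers):
    seen = set(numbers)     # one scan; each insert is O(1) on average
    low = 0
    while low in seen:      # at most n + 1 membership tests
        low += 1
    return low
```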
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0
    while target < 2**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        if not target_found and the_answer is None:
            the_answer = target
        target += 1
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
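The two passes might look like this in Python. This is a sketch under stated assumptions rather than the answerer's implementation: it uses integer division by the bucket size as the "hash" (instead of a square root of each value), it assumes the values are distinct so that a full counter really means a full bucket, and all names are hypothetical.

```python
import math


def smallest_missing_sqrt_buckets(numbers):
    """Smallest missing non-negative integer, assuming `numbers` are distinct."""
    n = len(numbers)
    bucket_size = math.isqrt(n) + 1        # about sqrt(N) values per bucket
    bucket_count = n // bucket_size + 1    # enough buckets to cover 0..N
    counts = [0] * bucket_count

    # Hashing part: count how many values fall into each bucket.
    for x in numbers:
        if 0 <= x <= n:
            counts[x // bucket_size] += 1

    # The first bucket that is not full contains the answer.
    first_open = next(b for b, c in enumerate(counts) if c < bucket_size)
    base = first_open * bucket_size

    # Bitmap part: mark which values of that one bucket are present.
    seen = [False] * bucket_size
    for x in numbers:
        if base <= x < base + bucket_size:
            seen[x - base] = True
    return base + seen.index(False)
```

Both auxiliary arrays have about sqrt(N) entries, which is where the O(sqrt(n)) space figure comes from.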
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
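As a small illustration of the summing trick, assuming the list holds the integers 1..n with exactly one of them missing and no duplicates (the function name is made up for the example):

```python
def single_missing(numbers):
    n = len(numbers) + 1            # the complete series would be 1..n
    total = n * (n + 1) // 2        # sum of the complete series
    for x in numbers:
        total -= x                  # subtract each value that is present
    return total                    # what remains is the missing number


print(single_missing([1, 2, 3, 5, 6]))
```

If more than one number can be missing, or values can repeat, the remainder no longer identifies a single value and this shortcut does not apply.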
// Method header is assumed; the original snippet only included the body.
static void PrintSmallestMissingPositive(int[] Array)
{
    int i = 0;
    while (i < Array.Length)
    {
        int value = Array[i];
        // Swap only if value belongs at index value - 1 and that slot doesn't hold it yet
        if (value >= 1 && value <= Array.Length && Array[value - 1] != value)
        {
            Array[i] = Array[value - 1];
            Array[value - 1] = value;
        }
        else
            i++;
    }
    for (int j = 0; j < Array.Length; j++)
    {
        // First slot that doesn't hold its own number gives the answer
        if (Array[j] != j + 1)
        {
            Console.WriteLine(j + 1);
            return;
        }
    }
    Console.WriteLine("Not Found !!");
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
    if (A == null || A.length == 0) {
        throw new IllegalArgumentException();
    }
    long sum = 0; // long avoids overflow for large inputs
    Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
    int max = A[0];
    for (int i = 0; i < A.length; i++) {
        if (A[i] <= 0) { // skip zeros and negatives, as described in step 1
            continue;
        }
        if (uniqueNumbers.get(A[i]) != null) {
            continue;
        }
        if (A[i] > max) {
            max = A[i];
        }
        uniqueNumbers.put(A[i], true);
        sum += A[i];
    }
    //All negative case
    if (uniqueNumbers.isEmpty()) {
        return 1;
    }
    long completeSum = ((long) max * (max + 1)) / 2;
    for (int j = 1; j <= Math.min(completeSum - sum, max); j++) {
        if (uniqueNumbers.get(j) == null) { //O(1)
            return j;
        }
    }
    // Every number in 1..max is present, so the answer is the next one
    return max + 1;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
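A possible reading of this binary search is sketched below in Python. It searches over candidate answers 0..N rather than over array positions; each probe counts how many elements fall in the lower half of the remaining range, which is the same information as subtracting and checking the sign. The sketch assumes the values are distinct (duplicates would inflate the counts), and the function name is hypothetical. It needs O(N log N) time and O(1) extra space, and it does not modify the list.

```python
def smallest_missing_binary_search(numbers):
    """Smallest missing non-negative integer, assuming `numbers` are distinct."""
    low, high = 0, len(numbers)          # the answer lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        # Count the elements that fall in the lower half [low, mid]
        count = sum(1 for x in numbers if low <= x <= mid)
        if count == mid - low + 1:
            low = mid + 1                # lower half is full, look above it
        else:
            high = mid                   # a value in [low, mid] is missing
    return low
```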
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
    if i == N then leave /* Processed entire array */
    if array[i] == LowNum {
        LowNum++
        i=0
    }
    else {
        i++
    }
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out that I had left some intermediate steps in my head.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
    int m = n;
    for (int i = 0; i < m;) {
        /* The duplicate test only applies when a[i] is not already in place */
        if (a[i] >= m || a[i] < i || (a[i] != i && a[i] == a[a[i]])) {
            m--;
            SWAP (a[i], a[m]);
            continue;
        }
        if (a[i] > i) {
            /* Fix the index first: the macro would re-evaluate a[a[i]] after a[i] changes */
            size_t j = a[i];
            SWAP (a[i], a[j]);
            continue;
        }
        i++;
    }
    return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m, or if a[i] < i, or if a[i] != i and a[i] == a[a[i]] (a duplicate), we know that m is the wrong output and must be at least one element lower. So, decrementing m and swapping a[i] with a[m], we can recurse.
If this is not true but a[i] > i, then knowing that a[i] != a[a[i]], we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i, in which case we can increment i knowing that all the values up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
  requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
  modifies A;
{
  // Pass 1, move every value to the position of its value
  var N := A.Length;
  var cursor := 0;
  while (cursor < N)
  {
    var target := A[cursor];
    while (0 <= target < N && target != A[target])
    {
      var new_target := A[target];
      A[target] := target;
      target := new_target;
    }
    cursor := cursor + 1;
  }
  // Pass 2, find first location where the index doesn't match the value
  cursor := 0;
  while (cursor < N)
  {
    if (A[cursor] != cursor)
    {
      return cursor;
    }
    cursor := cursor + 1;
  }
  return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
    BitSet bitset = new BitSet(values.size() + 1);
    for (int i : values) {
        if (i >= 0 && i <= values.size()) {
            bitset.set(i);
        }
    }
    return bitset.nextClearBit(0);
}
def solution(A):
    index = 0
    target = []
    A = [x for x in A if x >= 0]
    if len(A) == 0:
        return 1
    maxi = max(A)
    if maxi <= len(A):
        maxi = len(A)
    target = ['X' for x in range(maxi+1)]
    for number in A:
        target[number] = number
    count = 1
    while count < maxi+1:
        if target[count] == 'X':
            return count
        count += 1
    return target[count-1] + 1
Got 100% for the above solution.
1) Filter negative and zero
2) Sort/distinct
3) Visit array
Complexity: O(N) or O(N * log(N))
Using Java 8:
public int solution(int[] A) {
    int result = 1;
    boolean found = false;
    A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
    //System.out.println(Arrays.toString(A));
    for (int i = 0; i < A.length; i++) {
        result = i + 1;
        if (result != A[i]) {
            found = true;
            break;
        }
    }
    if (!found && result == A.length) {
        //result is larger than max element in array
        result++;
    }
    return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
    unordered_set<int> fre;
    // storing each positive number in a hash.
    for(int i = 0; i < nums.size(); i += 1)
    {
        if(nums[i] > 0)
            fre.insert(nums[i]);
    }
    int i = 1;
    // Iterating from 1 to size of the set and checking
    // for the occurrence of 'i'
    for(auto it = fre.begin(); it != fre.end(); ++it)
    {
        if(fre.find(i) == fre.end())
            return i;
        i += 1;
    }
    return i;
}
A solution in basic JavaScript:
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
    var m = 0;
    for (var i = 1; i <= a.length; i++) {
        var j = 0; m = 1;
        while (j < a.length) {
            if (i === a[j]) {
                m++;
            }
            j++;
        }
        if (m === 1) {
            return i;
        }
    }
    // Every value in 1..a.length is present, so the answer is the next one
    return a.length + 1;
}
console.log(findSmallest(a))
Hope this helps someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
    A.sort()
    j = 1
    for i, elem in enumerate(A):
        if j < elem:
            break
        elif j == elem:
            j += 1
            continue
        else:
            continue
    return j
This can help:
0- A is [5, 3, 2, 7];
1- Define B with Length = A.Length; (O(1))
2- Initialize B cells with 1; (O(n))
3- For each item in A:
if (item < B.Length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1, or B.Length if there is no such index (O(n))

Resources