I was going through some interview questions and came across this one:
You are given the heights of n towers and a value k. You have to either increase or decrease the height of every tower by k. You need to minimize the difference between the heights of the tallest and the shortest tower and output this difference.
I think the answer will be (maxheight - k) - (minheight + k).
I have tried it on some test cases and it runs fine.
But I am not sure; I think I am missing something. Am I?
m7thon's answer explains the problem with your solution, so I'll just explain how you can actually solve this . . .
The big thing to observe is that for any given tower, if you choose to increase its height from h_i to h_i + k, then you might as well increase the height of all shorter towers: that won't affect the maximum (because if h_j < h_i, then h_j + k < h_i + k), and may help by increasing the minimum. Conversely, if you choose to decrease the height of a tower from h_i to h_i − k, then you might as well decrease the heights of all taller towers.
So while there are 2^n possible ways to choose which towers should be increased vs. decreased, we can actually ignore most of these. Some tower will be the tallest tower that we increase the height of; for all shorter towers, we will increase their height as well, and for all taller towers, we will decrease their height. So there are only n interesting ways to choose which towers should be increased vs. decreased: one for each tower's chance to be the tallest tower that we increase the height of.
[Pedantic note #1: You may notice that it's also valid to decrease the heights of all towers, in which case there's no such tower. But that's equivalent to increasing the heights of all towers — whether we add k to every height or subtract k from every height, either way we're not actually changing the max-minus-min.]
[Pedantic note #2: I've only mentioned "shorter towers" and "taller towers", but it's also possible that multiple towers have the same initial height. But this case isn't really important, because we might as well increase them all or decrease them all — there's no point increasing some and decreasing others. So the approach described here still works fine.]
So, let's start by sorting the original heights and numbering them in increasing order, so that h_1 is the original height of the originally-shortest tower and h_n is the original height of the originally-tallest tower.
For each i, try the possibility that the ith-shortest tower is the tallest tower that we increase the height of; that is, try the possibility that we increase h_1 through h_i and decrease h_(i+1) through h_n. There are two groups of cases:
If i < n, then the final height of the finally-shortest tower is min(h_1 + k, h_(i+1) − k), and the final height of the finally-tallest tower is max(h_i + k, h_n − k). The final difference in this case is the latter minus the former.
If i = n, then we've increased the heights of all towers equally, so the final difference is just h_n − h_1.
We then take the least difference from all n of these possibilities.
Here's a Java method that implements this (assuming int-valued heights; note that h_i is arr[i-1] and h_(i+1) is arr[i]):
private static int doIt(final int[] arr, final int k) {
java.util.Arrays.sort(arr);
final int n = arr.length;
int result = arr[n - 1] - arr[0];
for (int i = 1; i < n; ++i) {
final int min = Math.min(arr[0] + k, arr[i] - k);
final int max = Math.max(arr[n - 1] - k, arr[i - 1] + k);
result = Math.min(result, max - min);
}
return result;
}
Note that I've pulled the i = n case before the loop, for convenience.
Let's say you have three towers of heights 1, 4 and 7, and k = 3. According to your reasoning, the optimal minimum difference is (7 - 3) - (1 + 3) = 0. But what do you do with the tower of height 4? You either need to increase or decrease its height, so the minimum difference you can achieve is in fact 3 in this example.
Even if you are allowed to keep a tower at its height, then the example 1, 5, 7 will disprove your hypothesis.
I know this does not solve the actual minimization problem, but it does show that it is not as simple as you thought. I hope this answers your question "Am I missing something?".
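A quick brute force over all 2^n sign choices confirms this (a small sketch of my own, not part of the original answers):

from itertools import product

def brute_min_diff(heights, k):
    # Try every +k / -k assignment and keep the smallest spread.
    best = float('inf')
    for signs in product((+1, -1), repeat=len(heights)):
        adjusted = [h + s * k for h, s in zip(heights, signs)]
        best = min(best, max(adjusted) - min(adjusted))
    return best

print(brute_min_diff([1, 4, 7], 3))  # 3, not 0
print(brute_min_diff([1, 5, 7], 3))  # 2, still not 0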
I assume this came from GfG.
ruakh's answer is perhaps the best I've found online, and it'll work for most cases. But for the practice problem on GfG, there are a few cases which can cause the minimum to go below 0, and the question doesn't allow any height to be < 0.
So for that you'll need an additional check; the rest of it is pretty much entirely inspired by ruakh's answer:
class Solution {
int getMinDiff(int[] arr, int n, int k) {
Arrays.sort(arr);
int ans = arr[n-1] - arr[0];
int smallest = arr[0] + k, largest = arr[n-1]-k;
for(int i = 0; i < n-1; i++){
int min = Math.min(smallest, arr[i+1]-k);
int max = Math.max(largest, arr[i]+k);
if(min < 0) continue;
ans = Math.min(ans, max-min);
}
return ans;
}
}
I also went in for 0-based indexing for the heights to make it more obvious, but maybe that's subjective.
Edit: one case where the < 0 check is important is when the array is
8 1 5 4 7 5 7 9 4 6 and k is 5. The expected answer for this is 8; without the < 0 check, you'd get 7.
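To double-check that case, here is the same logic transcribed into Python (my own transcription, purely to verify the numbers):

def get_min_diff(arr, k):
    # Same idea as the Java above: try each split point and skip splits
    # that would push the smallest height below zero.
    arr = sorted(arr)
    ans = arr[-1] - arr[0]
    for i in range(len(arr) - 1):
        lo = min(arr[0] + k, arr[i + 1] - k)
        hi = max(arr[-1] - k, arr[i] + k)
        if lo < 0:
            continue
        ans = min(ans, hi - lo)
    return ans

print(get_min_diff([8, 1, 5, 4, 7, 5, 7, 9, 4, 6], 5))  # 8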
A bit late here. These guys have already explained the problem and given you the solution; however, I have prepared this code myself. It is not the best code to follow, but it gives a clear understanding of what can be done to achieve this using brute force.
set = list(map(int, input().split()))
k = int(input())
min = 999999999
for index in range(2**len(set)):
    binary = []  # probably should have used an integer-to-binary function here
    while index != 0:
        remainder = index % 2
        index //= 2
        binary.append(remainder)
    while len(binary) != len(set):
        binary.append(0)
    binary.reverse()
    separateset = []
    flag = 0
    for i in range(len(binary)):
        if binary[i] == 0:
            separateset.append(set[i] + k)
        elif binary[i] == 1 and set[i] - k >= 0:
            separateset.append(set[i] - k)
        else:
            flag = 1
            break
    if flag == 0:
        separateset.sort()
        if min > separateset[-1] - separateset[0]:
            min = separateset[-1] - separateset[0]
print(min)
This is achieved by identifying all the possible subsets of the set variable, with just some modifications. If the digit at position i in binary is 0, the value at that index in the set is increased by k; otherwise, if the digit is 1 and set[i] - k >= 0, the value at that index is decreased by k. (You can swap which digit means +k and which means -k; it doesn't matter, as long as you get all possible combinations of +k and -k.) The condition set[i] - k >= 0 has to hold because a negative height wouldn't make sense, and if it is violated, flag becomes 1 and we break. But if the flag is 0, it means all the heights are positive, so separateset is sorted and min stores the difference between the tallest and shortest tower if it beats the current minimum. This min ultimately holds the minimum of all the differences.
Step 1: Decrease all the heights by k and sort them in non-decreasing order.
Step 2: We need to increase some subset of heights by 2*k (as they were decreased by k in step 1, so to effectively increase their heights by k we need to add 2*k).
Step 3: Clearly, if we increase the i-th height without increasing the (i-1)-th, it will not be useful, as the minimum is still the same and the maximum may also increase!
Step 4: So consider all prefixes with 2*k added to each element of the prefix, then calculate and update the (max - min) value.
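A short Python sketch of those four steps (my own illustration; it ignores the non-negative-height constraint discussed in an earlier answer):

def min_diff_prefix(heights, k):
    # Step 1: decrease everything by k and sort.
    h = sorted(x - k for x in heights)
    n = len(h)
    best = h[-1] - h[0]  # raising the whole array (or nothing) by 2*k changes nothing
    # Steps 2-4: try raising every prefix h[0..i] by 2*k.
    for i in range(n - 1):
        lo = min(h[0] + 2 * k, h[i + 1])
        hi = max(h[i] + 2 * k, h[-1])
        best = min(best, hi - lo)
    return best

print(min_diff_prefix([1, 4, 7], 3))  # 3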
Let me know which scenario I am missing here:
class Solution:
def getMinDiff(self, arr, n, k):
for i in range(n):
if arr[i] < k: arr[i] += k
else: arr[i] -= k
result = max(arr) - min(arr)
print('NewArr: {}\nresult: {}'.format(arr, result))
return result
C++ code :
int getMinDiff(int arr[], int n, int k) {
// code here
sort(arr , arr+n);
int result = arr[n - 1] - arr[0];
for (int i = 1; i < n; ++i) {
int min_ = min(arr[0] + k, arr[i] - k);
int max_ = max(arr[n - 1] - k, arr[i - 1] + k);
result = min(result, max_ - min_);
}
return result;
}
First you are going to need to find the average height of the towers.
Let's say the heights are 3, 7, 17, 25, 45 and k = 5.
The average will be (3 + 7 + 17 + 25 + 45) / 5 = 97 / 5 = 19.4.
Now we will try to bring every building closer to the average height.
For the tower of height 3 we have to add 5 three times, making its height 3 + (3*5) = 18 (18 is closer to the average than 23).
For height 7 we will add 5 two times: 7 + (2*5) = 17 (17 is closer than 22).
Similarly 25 will become 25 - 5 = 20,
and 45 will become 45 - (5*5) = 20.
The heights become 18, 17, 17, 20, 20.
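A small Python sketch of that idea (my own illustration; note this answer's variant allows adding or subtracting k more than once):

heights, k = [3, 7, 17, 25, 45], 5
avg = sum(heights) / len(heights)                      # 19.4
# Move each height to the multiple-of-k step nearest the average.
adjusted = [h + round((avg - h) / k) * k for h in heights]
print(adjusted)                                        # [18, 17, 17, 20, 20]
print(max(adjusted) - min(adjusted))                   # 3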
This approach works on GfG practice, Problem link: https://practice.geeksforgeeks.org/problems/minimize-the-heights/0/
Approach :
Find max, min element from the array. O(n). Take the average, avg = (max_element + min_element)/2.
Iterate over the array again, and for each element, check if it is less than avg or greater.
If the current element arr[i] is less than avg, then add "k" to a[i], i.e a[i] = a[i] + k.
If the current element arr[i] is greater than or equal to avg, then subtract k from a[i], i.e a[i] = a[i] - k;
Find out the minimum and maximum element again from the modified array.
Return the min(max1-min1, max2-min2), where (max1, min1) = max and min elements initially before the array was modified, and (max2, min2) are the max and min elements after doing modification.
Entire Code can be found here: https://ide.geeksforgeeks.org/56qHOM0EOA
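A minimal Python sketch of the steps above (my own illustration, not the code behind the link):

def get_min_diff_avg(arr, k):
    lo, hi = min(arr), max(arr)
    avg = (lo + hi) / 2
    # Raise everything below the average, lower everything at or above it.
    modified = [x + k if x < avg else x - k for x in arr]
    return min(hi - lo, max(modified) - min(modified))

print(get_min_diff_avg([1, 4, 7], 3))                       # 3
print(get_min_diff_avg([8, 1, 5, 4, 7, 5, 7, 9, 4, 6], 5))  # 8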
I am a bit confused by a sliding window problem. The problem statement is relatively simple: given a parameter k, where k is the size of the window, find the largest sum possible.
For example
array[] = {4,2,1,7,8,1,2,7,1,0}, k = 3: the largest contiguous subarray is {7,8,1}, whose sum is 16.
I have the code here, but I do not understand one line.
public static int findMaxSumSubarray(int[] arr, int k)
{
int currentSum = 0;
int max = Integer.MIN_VALUE;
for(int i = 0; i<arr.length;++i)
{
currentSum += arr[i];
/*
lets take an array example let arr = {4,2,1,7,8,1,2,7,1,0}
when i is at k-1 then we have effectively covered a valid subset, lets say
k = 3, when i moves to the third element which is 1, a valid subset has been
covered
*/
if(i >= k - 1)
{
max = Math.max(max,currentSum);
currentSum -= arr[i - (k-1)];
}
}
return max;
}
The line I don't quite understand is currentSum -= arr[i - (k-1)];
Can someone please provide a small desk check / example of what is happening here? It would be greatly appreciated.
My understanding:
Let's take our previous array = {4,2,1,7,8,1,2,7,1,0}.
We iterate until the element at index 2; we get 4+2+1, so we have covered a valid window size. Once this is done, we reach the if statement,
and since i is now greater than or equal to k-1, we execute this block.
currentSum -= arr[i - (k-1)]; What is happening here?
Apologies in advance for the formatting, and thank you to all those who answer.
if(i >= k - 1)
This check is to make sure the sum variable always holds the sum of exactly k elements.
If the window size is k, then all good. The size becomes k+1 when moving to the next i. Now the sum covers k+1 elements, so you remove the first element of the previous window from the sum to make it hold the sum of k elements again. See the walkthrough below for k = 3.
As you can see, for every next window of size k, we have to remove the previous window's first element, which is done by currentSum -= arr[i - (k-1)];
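Here is that walkthrough as a small desk check in Python on the sample array (my own transcription of the Java loop above):

arr, k = [4, 2, 1, 7, 8, 1, 2, 7, 1, 0], 3
current_sum, best = 0, float('-inf')
for i, value in enumerate(arr):
    current_sum += value                  # the head of the window enters
    if i >= k - 1:                        # a full window arr[i-k+1 .. i] is covered
        best = max(best, current_sum)
        current_sum -= arr[i - (k - 1)]   # the tail leaves before the window slides
print(best)  # 16, from the window [7, 8, 1]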
In an unsorted array of positive integers, how do you find the farthest smaller element on the right side of each element in the most efficient way?
Ex:
Input: 6 3 1 8 2 9 7
Output: 2 2 -1 7 -1 7 -1
Explanation:
For 6, the smaller elements on its right side are [3, 1, 2]. Since the last of these, 2, is the farthest from 6, it is the answer for 6.
Likewise for the others.
If no such number exists, the answer is -1.
One idea is:
let us call the original array A
keep an array min[] of size n of which min[i] means the minimum value of the sub-array A[i..n-1]. Obviously, min[i] <= min[i+1].
now move from right to left on A. At every index i, do a binary search on min[i+1..n-1] to find the farthest smaller value.
Java code:
int[] A = { 6, 2, 3, 10, 1, 8 };
int n = A.length;
// calculate min: min[i] is the minimum value of the sub-array A[i..n-1]
int[] min = new int[n];
min[n - 1] = A[n - 1];
for (int i = n - 2; i >= 0; i--)
    min[i] = Math.min(A[i], min[i + 1]);
// calculate results
int[] results = new int[n];
results[n - 1] = -1;
for (int i = n - 2; i >= 0; i--) {
    int left = i; // after the binary search, A[left] would be the answer
    int right = n - 1;
    while (left < right) {
        int mid = left + (right - left + 1) / 2;
        if (min[mid] < A[i])
            left = mid;
        else
            right = mid - 1;
    }
    // check the index the binary search converged to
    if (min[left] < A[i])
        results[i] = min[left];
    else
        results[i] = -1;
}
Space complexity: O(n).
Time complexity: O(n log n) in all cases.
Compared to the solution from @vivek_23, the above algorithm is better in the following worst case:
Imagine the case of A with n elements as follows:
A = [n/2, n/2, ..., n/2, 1, 2, ..., n/2]
If we use the stack solution suggested by @vivek_23,
at the step of finding the farthest smaller element of any of the first n/2 elements (which are all valued n/2), st1 will be [1, 2, ..., n/2];
for each element valued n/2, all st1 elements except n/2 will be transferred to st2 in order to find the farthest smaller element, n/2 - 1. After that, all elements in st2 are transferred back to st1. This results in a worst-case cost of O(n) per element. As there are n/2 such elements, the total worst-case time is O(n^2).
The basic idea behind the getting the answer quickly is to use a stack when moving from right to left in the array.
We insert an element in the stack only in one of the below 2 conditions,
If the stack is empty.
If the current top element in the stack is greater than the current element in iteration.
This ensures correct results, since any later element that is greater than the current element will anyway also be greater than the element currently at the top of the stack, and that top element also wins on the farthest criterion.
So, insert into the stack only if the current element is less than the current top.
However, it's quite possible that the current element has many elements in the stack smaller than itself. So we need to go deeper into the stack until we find an element greater than the current one.
Implementation(in java):
int[] arr = {6,3,1,8,2,9,7};
Stack<Integer> st1 = new Stack<Integer>();
Stack<Integer> st2 = new Stack<Integer>();
List<Integer> result = new ArrayList<>();
for(int i=arr.length-1;i>=0;--i){
while(!st1.isEmpty() && arr[(int)st1.peek()] < arr[i]){
st2.push(st1.pop());
}
if(st2.isEmpty()) result.add(-1);
else result.add(arr[(int)st2.peek()]);
while(!st2.isEmpty()) st1.push(st2.pop());
if(st1.isEmpty() || arr[(int)st1.peek()] > arr[i]){
st1.push(i);
}
}
The above implementation follows the explanation provided.
We use 2 stacks to not lose the popped data as it would be useful for future elements.
So, we revert all back to the main stack once the answer is found for the current element.
Demo: https://ideone.com/0oAasu
Note: You can directly store elements in the stack instead of indices to make it simpler.
Update: This solution's complexity is indeed O(n^2) when the array has its elements arranged in [n/2, n/2, ..., n/2, 1, 2, ..., n/2] fashion, for an array of size 10^5 or more. See Quynh Tran's answer for a better solution.
// Brute force: for each element, scan everything to its right and
// remember the last (farthest) smaller value seen.
int arr[] = {6, 3, 1, 8, 2, 9, 7};
for (int i = 0; i < arr.length; i++) {
    int min = -1;
    for (int j = i + 1; j < arr.length; j++) {
        if (arr[j] < arr[i]) {
            min = arr[j];
        }
    }
    arr[i] = min;  // overwrite position i with its answer
}
Interview question from a friend:
Given an unsorted integer array, how many numbers cannot be found using binary search?
For example, in [2, 3, 4, 1, 5], only the number 1 can't be found using binary search, hence count = 1.
In [4, 2, 1, 3, 5], 4 and 2 are not searchable: binarySearch(arr, num) lands on a value that is not equal to num.
The expected run time is O(n).
I can't think of an algorithm that can achieve O(n) time :(
I thought about building min and max arrays; however, that wouldn't work, as a subarray can mess it up again.
I already knew the O(n log n) approach; it was obvious: just call binary search for each number and check.
I believe this code works fine. It does one single walk of each value in the list, so it is O(n).
function CountUnsearchable(list, lo = 0, hi = list.length - 1, minValue = -Infinity, maxValue = Infinity) {
    if (lo > hi) return 0;  // empty range
    let mid = Math.floor((lo + hi) / 2);
    let midPoint = list[mid];
    let lowerCount = CountUnsearchable(list, lo, mid - 1, minValue, Math.min(midPoint, maxValue));
    let upperCount = CountUnsearchable(list, mid + 1, hi, Math.max(minValue, midPoint), maxValue);
    let midPointUnsearchable = (midPoint < minValue || midPoint > maxValue) ? 1 : 0;
    return lowerCount + upperCount + midPointUnsearchable;
}
It works, because we walk the tree a bit like we would in a binary search, except at each node we take both paths, and simply track the maximum value that could have led us to take this path, and the minimum value that could have led us to take this path. That makes it simple to look at the current value and answer the question of whether it can be found via a binary search.
Try to create the following function:
def count_unsearchable(some_list, min_index=None, max_index=None, min_value=None, max_value=None):
"""How many elements of some_list are not searchable in the
range from min_index to max_index, assuming that by the time
we arrive our values are known to be in the range from
min_value to max_value. In all cases None means unbounded."""
pass #implementation TBD
It is possible to implement this function in a way that runs in time O(n). The reason why it is faster than the naive approach is that you are only making the recursive calls once per range, instead of once per element in that range.
Idea: Problem can be reworded as - find the count of numbers in the array which are greater than all numbers to their left and smaller than all numbers to their right. Further simplified, find the count of numbers which are greater than the max number to their left and smaller than the minimum number to their right.
Code: Java 11 | Time/Space: O(n)/O(n)
int binarySearchable(int[] nums) {
    var n = nums.length;
    var count = 0;
    var maxToLeft = new int[n + 1];
    maxToLeft[0] = Integer.MIN_VALUE;
    var minToRight = new int[n + 1];
    minToRight[n] = Integer.MAX_VALUE;
    for (var i = 1; i < n + 1; i++) {
        maxToLeft[i] = Math.max(maxToLeft[i - 1], nums[i - 1]);
        minToRight[n - i] = Math.min(minToRight[n + 1 - i], nums[n - i]);
    }
    for (var i = 0; i < n; i++)
        if (nums[i] >= maxToLeft[i + 1] && nums[i] <= minToRight[i + 1])
            count++;
    return count;
}
TopCoder problem: https://community.topcoder.com/stat?c=problem_statement&pm=5869&rd=8078
Video explanation: https://www.youtube.com/watch?v=blICHR_ocDw
LeetCode discuss: https://leetcode.com/discuss/interview-question/352743/Google-or-Onsite-or-Guaranteed-Binary-Search-Numbers
Consider the following question:
Given a 2D array of unsigned integers and a maximum length n, find a path in that matrix that is not longer than n and which maximises the sum. The output should consist of both the path and the sum.
A path consists of neighbouring integers that are either all in the same row, or in the same column, or down a diagonal in the down-right direction.
For example, consider the following matrix and a given path length limit of 3:
1 2 3 4 5
2 1 2 2 1
3 4 5* 6 5
3 3 5 10* 5
1 2 5 7 15*
The optimal path would be 5 + 10 + 15 (the nodes marked with *).
Now, upon seeing this problem, a Dynamic Programming solution immediately seems most appropriate, given its similarity to other problems like Min Cost Path or Maximum Sum Rectangular Submatrix. The issue is that in order to correctly solve this problem, you need to start building up the paths from every integer (node) in the matrix, not just start the path at the top left and end it at the bottom right.
I was initially thinking of an approach similar to the solution for Maximum Sum Rectangular Submatrix, in which I could store each possible path from every node (with path length less than n, only going right/down), but the only way I can envision that approach is by making recursive calls down and right from each node, which would seem to defeat the purpose of DP. Also, I need to be able to store the max path.
Another possible solution I was thinking about was somehow adapting a longest path search and running it from each int in the graph where each int is like an edge weight.
What would be the most efficient way to find the max path?
The challenge here is to avoid to sum the same nodes more than once. For that you could apply the following algorithm:
Algorithm
For each of the 3 directions (down, down+right, right) perform steps 2 and 3:
Determine the number of lines that exist in this direction. For the downward direction, this is the number of columns. For the rightward direction, this is the number of rows. For the diagonal direction, this is the number of diagonal lines, i.e. the sum of the number of rows and columns minus 1.
For each line do:
Determine the first node on that line (call it the "head"), and also set the "tail" to that same node. These two references refer to the end points of the "current" path. Also set both the sum and path-length to zero.
For each head node on the current line perform the following bullet points:
Add the head node's value to the sum and increase the path length
If the path length is larger than the allowed maximum, subtract the tail's value from the sum, and set the tail to the node that follows it on the current line
Whenever the sum is greater than the greatest sum found so far, remember it together with the path's location.
Set the head to the node that follows it on the current line
At the end return the greatest sum and the path that generated this sum.
Code
Here is an implementation in basic JavaScript:
function maxPathSum(matrix, maxLen) {
var row, rows, col, cols, line, lines, dir, dirs, len,
headRow, headCol, tailRow, tailCol, sum, maxSum, path;
rows = matrix.length;
cols = matrix[0].length;
maxSum = -1;
dirs = 3; // Number of directions that paths can follow
if (maxLen == 1 || cols == 1)
dirs = 1; // Only need to check downward directions
for (dir = 1; dir <= dirs; dir++) {
// Number of lines in this direction to try paths on
lines = [cols, rows, rows + cols - 1][dir-1];
for (line = 0; line < lines; line++) {
sum = 0;
len = 0;
// Set starting point depending on the direction
headRow = [0, line, line >= rows ? 0 : line][dir-1];
headCol = [line, 0, line >= rows ? line - rows : 0][dir-1];
tailRow = headRow;
tailCol = headCol;
// Traverse this line
while (headRow < rows && headCol < cols) {
// Lengthen the path at the head
sum += matrix[headRow][headCol];
len++;
if (len > maxLen) {
// Shorten the path at the tail
sum -= matrix[tailRow][tailCol];
tailRow += dir % 2;
tailCol += dir >> 1;
}
if (sum > maxSum) {
// Found a better path
maxSum = sum;
path = '(' + tailRow + ',' + tailCol + ') - '
+ '(' + headRow + ',' + headCol + ')';
}
headRow += dir % 2;
headCol += dir >> 1;
}
}
}
// Return the maximum sum and the string representation of
// the path that has this sum
return { maxSum, path };
}
// Sample input
var matrix = [
[1, 2, 3, 4, 5],
[2, 1, 2, 2, 1],
[3, 4, 5, 6, 5],
[3, 3, 5, 10, 5],
[1, 2, 5, 7, 15],
];
var best = maxPathSum(matrix, 3);
console.log(best);
Some details about the code
Be aware that row/column indexes start at 0.
The way the head and tail coordinates are incremented is based on the binary representation of the dir variable: it takes these three values (binary notation): 01, 10, 11
You can then take the first bit to indicate whether the next step in the direction is on the next column (1) or not (0), and the second bit to indicate whether it is on the next row (1) or not (0). You can depict it like this, where 00 represents the "current" node:
00 10
01 11
So we have this meaning to the values of dir:
01: walk along the column
10: walk along the row
11: walk diagonally
The code uses >>1 for extracting the first bit, and % 2 for extracting the last bit. That operation will result in a 0 or 1 in both cases, and is the value that needs to be added to either the column or the row.
The following expression creates a 1D array and takes one of its values on-the-fly:
headRow = [0, line, line >= rows ? 0 : line][dir-1];
It is short for:
switch (dir) {
case 1:
headRow = 0;
break;
case 2:
headRow = line;
break;
case 3:
if (line >= rows)
headRow = 0
else
headRow = line;
break;
}
Time and space complexity
The head will visit each node exactly once per direction. The tail will visit fewer nodes. The number of directions is constant, and the max path length value does not influence the number of head visits, so the time complexity is:
Θ(rows * columns)
There are no additional arrays used in this algorithm, just a few primitive variables. So the additional space complexity is:
Θ(1)
which both are the best you could hope for.
Is it Dynamic Programming?
In a DP solution you would typically use some kind of tabulation or memoization, possibly in the form of a matrix, where each sub-result found for a particular node is input for determining the result for neighbouring nodes.
Such solutions could need Θ(rows*columns) extra space. But this problem can be solved without such (extensive) space usage. When looking at one line at a time (a row, a column or a diagonal), the algorithm has some similarities with Kadane's algorithm:
One difference is that here the choice to extend or shorten the path/subarray is not dependent on the matrix data itself, but on the given maximum length. This is also related to the fact that here all values are guaranteed to be non-negative, while Kadane's algorithm is suitable for signed numbers.
Just like with Kadane's algorithm the best solution so far is maintained in a separate variable.
Another difference is that here we need to look in three directions. But that just means repeating the same algorithm in those three directions, while carrying over the best solution found so far.
This is a very basic use of Dynamic Programming, since you don't need the tabulation or memoization techniques here. We only keep the best results in the variables sum and maxSum. That cannot be viewed as tabulation or memoization, which typically keep track of several competing results that must be compared at some point. See this interesting answer on the subject.
Use F[i][j][k] as the max path sum where the path has length k and ends at position (i, j).
F[i][j][k] can be computed from F[i-1][j][k-1] and F[i][j-1][k-1].
The answer would be the maximum value of F.
To retrieve the max path, use another table G[i][j][k] to store the last step of F[i][j][k], i.e. it comes from (i-1,j) or (i,j-1).
The constraints are that the path can only be created by going down or to the right in the matrix.
Solution complexity O(N * M * L) where:
N: number of rows
M: number of columns
L: max length of the path
// dp, maxPath, matrix, N, M, INF and DOWN/RIGHT are assumed to be defined elsewhere;
// dp is initialised to -INF, meaning "not computed yet".
int solve(int x, int y, int l) {
    if (x > N || y > M) { return -INF; }
    if (l == 1) { return matrix[x][y]; }
    if (dp[x][y][l] != -INF) { return dp[x][y][l]; } // if cached before, return the answer
    int option1 = solve(x + 1, y, l - 1); // take a step down
    int option2 = solve(x, y + 1, l - 1); // take a step right
    maxPath[x][y][l] = (option1 > option2) ? DOWN : RIGHT; // to trace the path
    return dp[x][y][l] = max(option1, option2) + matrix[x][y];
}
example: solve(3,3,3): max path sum starting from (3,3) with length 3 ( 2 steps)
An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the data structure can be mutated in place and supports random access, then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially, and for every index, write the value at that index to the index specified by the value, recursively placing any value at that target location into its own place and throwing away values > N. Then go through the array again looking for the spot where the value doesn't match the index; that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values' worth of temporary space.
def smallest_missing(array):  # wrapped in a function so the returns are valid
    N = len(array)
    # Pass 1, move every value to the position of its value
    for cursor in range(N):
        target = array[cursor]
        while target < N and target != array[target]:
            new_target = array[target]
            array[target] = target
            target = new_target
    # Pass 2, find first location where the index doesn't match the value
    for cursor in range(N):
        if array[cursor] != cursor:
            return cursor
    return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
Find the length of the list; let's say it is N.
Allocate an array of N booleans, initialized to all false.
For each number X in the list, if X is less than N, set the X'th element of the array to true.
Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
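A minimal Python sketch of those four steps (my own illustration; a production version would use a bitset as noted above):

def smallest_missing_nonneg(numbers):
    n = len(numbers)                      # step 1
    seen = [False] * n                    # step 2: the "array of N booleans"
    for x in numbers:                     # step 3
        if x < n:
            seen[x] = True
    for i, present in enumerate(seen):    # step 4
        if not present:
            return i
    return n

print(smallest_missing_nonneg([0, 3, 1, 5, 2]))  # 4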
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number reduces to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known up front. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
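A rough Python sketch of this partition idea (my own illustration; it builds new lists at each step rather than partitioning in place as described above):

def smallest_missing_partition(values):
    lo = 0                                  # smallest still-possible answer
    candidates = [v for v in values if v >= 0]
    while candidates:
        pivot = candidates[len(candidates) // 2]
        below = {v for v in candidates if lo <= v < pivot}   # dedupe while partitioning
        if len(below) < pivot - lo:
            # Some value in [lo, pivot) is missing: discard the upper partition.
            candidates = list(below)
        else:
            # lo..pivot are all present: the gap must be above the pivot.
            lo = pivot + 1
            candidates = [v for v in candidates if v > pivot]
    return lo

print(smallest_missing_partition([0, 1, 2, 5, 3, 3]))  # 4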
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
If the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
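A small Python sketch of the radix-sort-then-scan idea (my own illustration, using a byte-at-a-time LSD radix sort):

def smallest_missing_radix(nums, bits=64, chunk=8):
    # LSD radix sort, one byte per pass: O(n) for fixed-width integers.
    for shift in range(0, bits, chunk):
        buckets = [[] for _ in range(1 << chunk)]
        for x in nums:
            buckets[(x >> shift) & ((1 << chunk) - 1)].append(x)
        nums = [x for bucket in buckets for x in bucket]
    # Scan the sorted values for the first gap, starting from 0.
    expected = 0
    for x in nums:
        if x > expected:
            return expected
        if x == expected:
            expected += 1
    return expected

print(smallest_missing_radix([3, 0, 1, 0, 5]))  # 2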
For a space-efficient method, assuming all values are distinct, you can do it in O(k) space and O(k * log(N) * N) time. It's space efficient, there's no data moving, and all operations are elementary (adding, subtracting).

1. Set U = N; L = 0.
2. Partition the number space into k regions, like this: 0 -> (1/k)*(U-L) + L, 0 -> (2/k)*(U-L) + L, 0 -> (3/k)*(U-L) + L, ..., 0 -> (U-L) + L.
3. Find how many numbers (count{i}) are in each region. (N*k steps)
4. Find the first region h that isn't full, i.e. count{h} < upper_limit{h}. (k steps)
5. If h - count{h-1} = 1, you've got your answer.
6. Set U = count{h}; L = count{h-1}.
7. Go to step 2.

This can be improved using hashing (thanks to Nic for this idea):

1. Same as before.
2. Partition the number space into k regions, like this: L + (i/k)*(U-L) -> L + ((i+1)/k)*(U-L).
3. Increment count{j} using j = (number - L)/k (if L < number < U).
4. Find the first region h that doesn't have k elements in it.
5. If count{h} = 1, h is your answer.
6. Set U = the maximum value in region h, L = the minimum value in region h.

This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    list.sort()
    if list[0] != 0:
        return 0
    for i in range(1, len(list)):
        if list[i] > list[i - 1] + 1:   # strict gap; duplicates just repeat the previous value
            return list[i - 1] + 1
    if list[-1] == 2**64 - 1:
        raise AssertionError("No gaps")
    return list[-1] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
bitmask = mask_make(2^64) // might take a while :-)
mask_clear_all (bitmask)
for i = 1 to list.last:
mask_set (bitmask, list[i])
for i = 0 to 2^64 - 1:
if mask_is_clear (bitmask, i):
return i
assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are inserted, run a counter from 0 until we find the lowest one missing. A reasonably good hash will hash and store in constant time, and retrieve in constant time.

for every i in X                       // one scan, Θ(n) in total
    hashtable.put(i, i)                // O(1) per insert
low = 0
while (hashtable.get(low) != null)     // at most n+1 iterations
    low++
print low

The worst case is when the array's n elements are {0, 1, ..., n-1}, in which case the answer is found at n, still keeping it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
the_answer = None
target = 0L
while target < 2L**64:
target_found = False
for item in source_list:
if item == target:
target_found = True
if not target_found and the_answer is None:
the_answer = target
target += 1L
return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, take the square root of the size of the list. For a 1GB list, that's N = 11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
// 'Array' here is the input int[]. Place each value v in 1..Array.Length into slot v-1,
// then scan for the first slot whose value doesn't match.
int i = 0;
while (i < Array.Length)
{
    // In range and not already sitting in its home slot? Swap it home.
    if (Array[i] >= 1 && Array[i] <= Array.Length && Array[Array[i] - 1] != Array[i])
    {
        int temp = Array[i];
        Array[i] = Array[temp - 1];
        Array[temp - 1] = temp;
    }
    else
    {
        i++;
    }
}
for (int j = 0; j < Array.Length; j++)
{
    if (Array[j] != j + 1)
    {
        Console.WriteLine(j + 1);
        break;
    }
    if (j == Array.Length - 1)
        Console.WriteLine("Not Found !!");
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positives, zeros, and negative numbers, while summing up the rest, getting the maximum positive number as well, and keeping the unique positive numbers in a Map.
2- Compute the expected sum as max * (max + 1) / 2.
3- Find the difference between the sums calculated in steps 1 and 2.
4- Loop again from 1 to the minimum of [sum difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if, for the list 1,2,3,5,6, the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So by decrementing m and swapping a[i] with a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
index = 0
target = []
A = [x for x in A if x >=0]
if len(A) ==0:
return 1
maxi = max(A)
if maxi <= len(A):
maxi = len(A)
target = ['X' for x in range(maxi+1)]
for number in A:
target[number]= number
count = 1
while count < maxi+1:
if target[count] == 'X':
return count
count +=1
return target[count-1] + 1
Got 100% for the above solution.
1)Filter negative and Zero
2)Sort/distinct
3)Visit array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
unordered_set<int> fre;
// storing each positive number in a hash.
for(int i = 0; i < nums.size(); i +=1)
{
if(nums[i] > 0)
fre.insert(nums[i]);
}
int i = 1;
// Iterating from 1 to size of the set and checking
// for the occurrence of 'i'
for(auto it = fre.begin(); it != fre.end(); ++it)
{
if(fre.find(i) == fre.end())
return i;
i +=1;
}
return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
var m = 0;
for(i=1;i<=a.length;i++) {
j=0;m=1;
while(j < a.length) {
if(i === a[j]) {
m++;
}
j++;
}
if(m === 1) {
return i;
}
}
}
console.log(findSmallest(a))
Hope this helps for someone.
With Python it is not the most efficient, but it is correct:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
MIN = 0
MAX = 1000000
possible_results = range(MIN, MAX)
for i in possible_results:
next_value = (i + 1)
if next_value not in A:
return next_value
return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
A.sort()
j = 1
for i, elem in enumerate(A):
if j < elem:
break
elif j == elem:
j += 1
continue
else:
continue
return j
this can help:
0- A is [5, 3, 2, 7];
1- Define B with Length = A.Length; (O(1))
2- Initialize B's cells with 1; (O(n))
3- For each item in A:
if (item < B.Length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1 (O(n))
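A tiny Python rendering of those steps (my own illustration; it also covers the case where every cell of B gets marked):

A = [5, 3, 2, 7]
B = [1] * len(A)            # steps 1 and 2
for item in A:              # step 3
    if item < len(B):
        B[item] = -1
# Step 4: smallest index still holding 1; if none, the answer is len(B).
answer = B.index(1) if 1 in B else len(B)
print(answer)               # 0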