How do I perform a duplicate check in code? - loops

This might be a no-brainer to some, but I'm trying to check if there are any duplicate values in my code.
To be clearer, I am creating 5 Integer variables, each of which is assigned a random number when it is created. Let's say they're named i1, i2, i3, i4, i5.
I want to run a loop that checks them against each other to make sure there are no duplicates. If there are, I'll re-randomize the second Integer being checked (e.g. if (i1 == i4) { i4.rand(); }). That's to make sure i1 doesn't need to be re-checked against all the previously checked values, and to avoid getting stuck in a long loop until a different number is found.
This is what I'm thinking if it was an entire if else statement : if (i1 == i2), if (i1 == i3), if (i1 == i4), if (i1 == i5), if (i2 == i3), if (i2 == i4), if (i2 == i5), if (i3 == i4), if (i3 == i5), if (i4 == i5)
I know I can probably do it "manually" by creating lots of if / else statements, but is there a better way to do it? It probably isn't very feasible if I increase my Integer limit to 20 and I have to if / else my way through 20 value checks. I know there is, but I just can't remember. Search on Google is turning up nothing (maybe I'm searching for the wrong keywords), which is why I'm asking over here at StackOverflow.
All I want to know is how do I do it, theory-wise (how would you check for duplicates in theory?). The answer doesn't necessarily need to be a workable function.
If you want to create demo code in the programming language I'm using for this problem, it's Excel VBA. But I think this information would apply theory-wise to a lot of other programming languages, so feel free to write in javascript/jQuery, C++, C#, etc. Just remember to comment!

You are looking for a Set:
Set<Integer> hs = new HashSet<Integer>();
hs.add(i1);
if (!hs.add(i2)) {
    randomize(i2);
}
Hope this helps. Let me know, if you have any questions.
The above is just a concept of what to do.
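To apply that logic to your code, it would look like this:
Set<Integer> hs = new HashSet<Integer>();
for (int count = 0; count < Array.length; count++) { // the values are stored in Array; loop over them
    int dataToInsert = Array[count];
    while (!hs.add(dataToInsert)) { // add() returns false when the value is already in the set
        dataToInsert = randomize(dataToInsert);
    }
    Array[count] = dataToInsert; // keep the value that was finally accepted
}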

Here is a simple way to get your k integers, assuming you want to generate them in the range from 1 to N:
Generate an integer from 1:N
Generate an integer from 1:N-1
Generate an integer from 1:N-2
...
Generate an integer from 1:N-(k-1)
Now interpret these as the position of the integer that you generated (in the set of total available integers for that number) and construct your real integer.
Example, N = 5, k=4
3
1
2
2
i1 = 3
i2 = 1
i3 = 4 (the available integers are 2 4 5)
i4 = 5
Note that this requires the minimum amount of random number generations.
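As a rough illustration of the position-based scheme above, here is a minimal Python sketch. The function names draw_distinct and from_positions are made up for this example, and the list of available integers is kept explicitly, which is simple but not the most memory-efficient choice for a large N.

```python
import random

def draw_distinct(n, k):
    """Draw k distinct integers from 1..n using exactly k random draws."""
    available = list(range(1, n + 1))
    result = []
    for _ in range(k):
        # Pick a position among the integers that are still available.
        pos = random.randrange(len(available))
        result.append(available.pop(pos))
    return result

def from_positions(n, positions):
    """Turn 1-based positions (as in the worked example) into actual integers."""
    available = list(range(1, n + 1))
    return [available.pop(p - 1) for p in positions]

# Replays the example with N = 5 and the positions 3, 1, 2, 2.
print(from_positions(5, [3, 1, 2, 2]))
print(draw_distinct(20, 5))
```

Because every draw is taken only from the values that are still free, no draw ever has to be repeated.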

To be clear, what you are attempting is the wrong approach. Theoretically, checking for duplicates and "re-randomizing" when one is found, could execute for an infinitely long time because existing integers could continuously be chosen.
What you should be doing is constructing the collection of integers in such a way that there will be no duplicates in the first place. Dennis Jaheruddin's answer does this. Alternatively, if you have a specific set of integers to choose from (like 1-20), and you simply want them in a random order, you should use a shuffling algorithm. In any event, you should start by searching for existing implementations of these in your language, since it has almost certainly been done before.
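For the shuffling suggestion, a minimal Python sketch might look like the following. It relies on the standard library's random.shuffle and random.sample; the helper names random_order and pick_distinct and the range 1-20 are only assumptions for illustration.

```python
import random

def random_order(low, high):
    """Return every integer in low..high exactly once, in random order."""
    values = list(range(low, high + 1))
    random.shuffle(values)  # shuffles the list in place
    return values

def pick_distinct(low, high, k):
    """Return k distinct integers from low..high without any re-rolling."""
    return random.sample(range(low, high + 1), k)

print(random_order(1, 20))
print(pick_distinct(1, 20, 5))
```

Both helpers finish after a fixed amount of work, so the unbounded re-randomizing loop never arises.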

What you could do is loop over the List<int> and, for each element x at index i, loop while list.Take(i).Contains(x) (that is, while x already appears among the i elements before it) and replace x with a new random number.
If you simply wanted a relatively inexpensive check that a given List<int> is full of unique numbers, however, you could do something like:
bool areAllUnique = list.Count() == list.Distinct().Count();
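The same two ideas can be sketched in Python. This is only an illustration: all_unique and reroll_duplicates are hypothetical names, and draw stands for whatever random number source you use.

```python
import random

def all_unique(values):
    """True when no value occurs more than once."""
    return len(set(values)) == len(values)

def reroll_duplicates(values, draw):
    """Replace each value that already occurs earlier in the list with a new draw."""
    result = list(values)
    for i in range(len(result)):
        # Compare only against the elements before index i.
        while result[i] in result[:i]:
            result[i] = draw()
    return result

fixed = reroll_duplicates([4, 4, 7, 4, 9], lambda: random.randint(1, 20))
print(fixed, all_unique(fixed))
```

Note that the while loop can run for a long time if the range of draw() is not much larger than the list.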

2 ways I can think of.
1: Looping over all the values in your set and comparing each one to what you're adding.
2: Creating a simplistic version of a hash map:
var set
var map_size
create_set(n):
    set <- array of size n of empty lists
    map_size <- n
add_number(num_to_add):
    if num_to_add not in set[num_to_add % map_size]:
        add num_to_add to set[num_to_add % map_size]
        return success
    else:
        return failure
populate_set():
    loop 5 times:
        i <- random_number()
        while(add_number(i) == failure):
            i <- random_number()
This way, each time you add a number, instead of checking against every other number in your set, you're only checking against at most [max value of integer] / [map size] values. And on average [number of elements in set] / [map size] (I think, correct me if I'm wrong) values.
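A minimal Python sketch of the bucket idea above follows. BucketSet and its method names are made up for illustration, and in practice Python's built-in set already does this job.

```python
import random

class BucketSet:
    """A tiny hash set: values are spread over buckets by value % bucket count."""

    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]

    def add(self, num):
        """Return True if num was added, False if it was already present."""
        bucket = self.buckets[num % len(self.buckets)]
        if num in bucket:  # only this one bucket is searched
            return False
        bucket.append(num)
        return True

def populate(count, low, high, size=8):
    """Collect count distinct random integers from low..high."""
    seen = BucketSet(size)
    values = []
    while len(values) < count:
        candidate = random.randint(low, high)
        if seen.add(candidate):
            values.append(candidate)
    return values

print(populate(5, 1, 100))
```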

Try
ArrayList<Integer> list = new ArrayList<Integer>();
while (list.size() < 5)
{
    int i = (int) (Math.random() * max); // the cast is required: Math.random() returns a double
    if (!list.contains(i))
    {
        list.add(i);
    }
}
and you'll get 5 different Integers in list.

Pseudo-code:
1. Create an empty set S.
2. Generate a pseudo-random number r.
3. If r is in S, go to 2. Else, go to 4.
4. Add r to S.
5. If there are still variables to initialize, go to 2.
Exemplary implementation in Java:
public static void main(String[] args)
{
System.out.println(getUniqueRandoms(5, 10));
}
public static Set<Integer> getUniqueRandoms(int howMany, int max)
{
final Set<Integer> uniqueRandoms = new HashSet<Integer>(howMany);
while (uniqueRandoms.size() < howMany)
{
uniqueRandoms.add((int) (Math.random() * max));
}
return uniqueRandoms;
}
Output:
[8, 2, 5, 6, 7]
If you would like to have them in array, not in Set, just call toArray() on your Set.

In R it is pretty simple...
i <- as.integer(runif(5, 1, 10))
for(l in seq_along(i)){
while(any(i[l]==i[-l])) # checks each against all the other
i[l] <- as.integer(runif(1, 1, 10))
}
However, in R there is the function sample, which picks random elements from a given vector without duplicates (although you can choose to allow them)
> sample(1:10, 5)
[1] 2 5 1 9 6
> sample(1:10, 5)
[1] 3 5 8 2 1
> sample(1:10, 5)
[1] 8 3 5 9 4
> sample(1:10, 5)
[1] 1 8 9 10 5

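HashSet<int> set = new HashSet<int>(); // C#; ElementAt() needs "using System.Linq;"
for (int i = 0; i < 5; i++)
{
    int x;
    do
    {
        x = random(); // placeholder for your own random number generator
    }
    while (!set.Add(x)); // Add() returns false if x is already in the set
}
int i1 = set.ElementAt(0),
    i2 = set.ElementAt(1),
    i3 = set.ElementAt(2),
    i4 = set.ElementAt(3),
    i5 = set.ElementAt(4);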

Related

Minimum Size Subarray Sum with sorting

The Minimum Size Subarray Sum problem:
given an array of n positive integers and a positive integer s, find the minimal length of a subarray of which the sum ≥ s. If there isn't one, return 0 instead.
For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.
The following is my solution:
public int minSubArrayLen(int s, int[] nums) {
long sum = 0;
int a = 0;
if (nums.length < 1)
return 0;
Arrays.sort(nums);
for (int i = nums.length-1; i >= 0; i--) {
sum += nums[i];
a++;
if (sum>=s)
break;
}
if (sum < s) {
return 0;
}
return a;
}
This solution was not accepted because it did not pass the following test case:
697439
[5334,6299,4199,9663,8945,3566,9509,3124,6026,6250,7475,5420,9201,9501,38,5897,4411,6638,9845,161,9563,8854,3731,5564,5331,4294,3275,1972,1521,2377,3701,6462,6778,187,9778,758,550,7510,6225,8691,3666,4622,9722,8011,7247,575,5431,4777,4032,8682,5888,8047,3562,9462,6501,7855,505,4675,6973,493,1374,3227,1244,7364,2298,3244,8627,5102,6375,8653,1820,3857,7195,7830,4461,7821,5037,2918,4279,2791,1500,9858,6915,5156,970,1471,5296,1688,578,7266,4182,1430,4985,5730,7941,3880,607,8776,1348,2974,1094,6733,5177,4975,5421,8190,8255,9112,8651,2797,335,8677,3754,893,1818,8479,5875,1695,8295,7993,7037,8546,7906,4102,7279,1407,2462,4425,2148,2925,3903,5447,5893,3534,3663,8307,8679,8474,1202,3474,2961,1149,7451,4279,7875,5692,6186,8109,7763,7798,2250,2969,7974,9781,7741,4914,5446,1861,8914,2544,5683,8952,6745,4870,1848,7887,6448,7873,128,3281,794,1965,7036,8094,1211,9450,6981,4244,2418,8610,8681,2402,2904,7712,3252,5029,3004,5526,6965,8866,2764,600,631,9075,2631,3411,2737,2328,652,494,6556,9391,4517,8934,8892,4561,9331,1386,4636,9627,5435,9272,110,413,9706,5470,5008,1706,7045,9648,7505,6968,7509,3120,7869,6776,6434,7994,5441,288,492,1617,3274,7019,5575,6664,6056,7069,1996,9581,3103,9266,2554,7471,4251,4320,4749,649,2617,3018,4332,415,2243,1924,69,5902,3602,2925,6542,345,4657,9034,8977,6799,8397,1187,3678,4921,6518,851,6941,6920,259,4503,2637,7438,3893,5042,8552,6661,5043,9555,9095,4123,142,1446,8047,6234,1199,8848,5656,1910,3430,2843,8043,9156,7838,2332,9634,2410,2958,3431,4270,1420,4227,7712,6648,1607,1575,3741,1493,7770,3018,5398,6215,8601,6244,7551,2587,2254,3607,1147,5184,9173,8680,8610,1597,1763,7914,3441,7006,1318,7044,7267,8206,9684,4814,9748,4497,2239]
The expected answer is 132 but my output was 80.
Does anyone have any idea what went wrong with my algorithm/code?
I will simply explain the flaw in the logic rather than give the correct logic for the problem statement.
You are taking the numbers in one specific sequence (sorted) and adding them up for comparison. It can easily be the case that a different selection of numbers reaches the sum.
For example [2,3,1,2,4,3] and s = 7.
Based on your logic
Step 1-> Sort the numbers and you get [1,2,2,3,3,4]
Step 2-> You pick the last 2 numbers (3,4) to get your sum 7
Let's change the sum to 8
From Step 2-> You get 3+3+4 = 10, so you break out of the loop. After this step you return a = 3
The flaw here is that 4+3+1 also makes 8, something your logic skips.
In the same way, 3+3+2 is also a possible way to reach 8.
Sorting the array is the first flaw in the logic itself. A subarray is a contiguous run of the existing arrangement, and sorting changes that arrangement, so the elements you add up are generally not adjacent in the input. That is why your code returns 80 for the failing test case: the 80 largest values are enough to reach the sum, but they are scattered across the array, and the shortest contiguous run needs 132 elements.
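The answer above deliberately stops at the flaw. For readers who want to see one way of respecting the original order, here is a hedged Python sketch of a sliding-window scan. This is a commonly used technique for this problem when all numbers are positive, not something taken from the posts above, and the name min_sub_array_len simply mirrors the asker's method.

```python
def min_sub_array_len(s, nums):
    """Length of the shortest contiguous run with sum >= s, or 0 if none exists.

    Assumes every number in nums is positive.
    """
    best = len(nums) + 1  # sentinel: longer than any real window
    window_sum = 0
    left = 0
    for right, value in enumerate(nums):
        window_sum += value  # grow the window to the right
        while window_sum >= s:
            # The window is valid: record it, then try to shrink it from the left.
            best = min(best, right - left + 1)
            window_sum -= nums[left]
            left += 1
    return 0 if best > len(nums) else best

print(min_sub_array_len(7, [2, 3, 1, 2, 4, 3]))
```

The array is never sorted, so every window that is measured is a genuine subarray of the input.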

Divide times in two boxes and find the minimum difference

Started to learn recursion and I am stuck with this simple problem. I believe that there are more optimized ways to do this but first I'm trying to learn the bruteforce approach.
I have bag A and bag B and have n items each one with some time (a float with two decimal places). The idea is to distribute the items by the two bags and obtain the minimum difference in the two bags. The idea is to try all possible outcomes.
I thought only in one bag (lets say bag A) since the other bag will contain all the items that are not in the bag A and therefore the difference will be the absolute value of total times sum - 2 * sum of the items time that are in the bag A.
I'm calling my recursive function like this:
min = total_time;
recursive(0, items_number - 1, 0);
And the code for the function is this:
void recursive(int index, int step, float sum) {
sum += items_time[index];
float difference = fabs(total_time - 2 * sum);
if (min > difference) {
min = difference;
}
if (!(min == 0.00 || step == 1 || sum > middle_time)) {
int i;
for (i = 0; i < items_number; i++) {
if (i != index) {
recursive(i, step - 1, sum);
}
}
}
}
Imagine I have 4 items with the times 1.23, 2.17 , 2.95 , 2.31
I'm getting the result 0.30. I believe that this is the correct result, but I'm almost certain that if it is, it is pure chance, because if I try bigger cases the program stops after a while, probably because the recursion tree gets too big.
Can someone point me in some direction?
Okay, after the clarification, let me (hopefully) point you to a direction:
Let's assume that you know what n is, as mentioned in "n items". In your example there are 2n = 4 items, making n = 2. Let's pick another n, let it be 3 this time, and our times shall be:
1.00
2.00
3.00
4.00
5.00
6.00
Now, we can already tell what the answer is; what you had said is all correct, optimally each of the bags will have their n = 3 times summed up to middle_time, which is 21 / 2 = 10.5 in this case. Since integers may never sum up to numbers with decimal points, 10.5 : 10.5 may never be achieved in this example, but 10 : 11 can, and you can have 10 through 6.00 + 3.00 + 1.00 (3 elements), so... yeah, the answer is simply 1.
How would you let a computer calculate it? Well; recall what I said at the beginning:
Let us assume that you know what n is.
In that case a naive programmer would probably simply put all those inside 2 or 3 nested for loops. 2 if he/she knew that the other half will be determined when you pick a half (by simply fixing the very first element in our group, since that element is to be included in one of the groups), like you also know; 3 if he/she didn't know that. Let's make it with 2:
...
float difference;
int i;
for ( i = 1; i < items_number; i++ ) {
    int j;
    for ( j = i + 1; j < items_number; j++ ) {
        // group 0, i, j: recompute the sum from scratch for every group
        sum = items_time[0] + items_time[i] + items_time[j];
        difference = fabs( total_time - 2 * sum );
        if ( min > difference ) {
            min = difference;
        }
    }
}
...
Let me comment on the code a little for faster understanding: on the first cycle, it will add up the 0th time, the 1st time and the 2nd time, as you may see; then it will do the same check you had made (calculate the difference and compare it with min). Let us call this the 012 group. The next group that will be checked will be 013, then 014, then 015; then 023, and so on... Each possible combination that splits the 6 into two 3s will be checked.
This operation shouldn't be at all tiresome for the computer. Even with this simple approach, the maximum number of tries will be the number of combinations of 3 you could have with 6 unique elements, divided by 2. In maths, people denote this as C(6, 3), which evaluates to (6 * 5 * 4) / (3 * 2 * 1) = 20; divided by 2, so it's 10.
My guess is that the computer wouldn't make it a problem even if n was 10, making the amount of combinations as high as C(20, 10) / 2 = 92 378. It would, however, be a problem for you to write down 9 nested for loops by hand...
Anyway, the good thing is, you can recursively nest these loops. Here I will end my guidance. Since you apparently are studying for the recursion already, it wouldn't be good for me to offer a solution at this point. I can assure you that it is do-able.
Also the version I have made on my end can do it within a second for up to items_number = 22, without having made any optimizations; simply with brute force. That makes 352 716 combinations, and my machine is just a simple Windows tablet...
Your problem is called the Partition Problem. It is NP-hard, and after some point it will take a very long time to complete: the tree gets exponentially bigger as the number of cases to test grows.
The partition problem is well known and well documented on the internet. Optimized solutions exist.
Your approach is not the naive brute-force approach, which would just walk through the list of items and put each one into bag A and bag B recursively, choosing the case with the minimum difference, for example:
double recurse(double arr[], int n, double l, double r)
{
double ll, rr;
if (n == 0) return fabs(l - r);
ll = recurse(arr + 1, n - 1, l + *arr, r);
rr = recurse(arr + 1, n - 1, l, r + *arr);
if (ll > rr) return rr;
return ll;
}
(This code is very naive: it doesn't quit early on clearly non-optimal cases, and it also wastes time by calculating every case twice with bags A and B swapped. It is brute force, however.)
Your maximum recursion depth is the number of items n, and the recursive function is called 2^(n+1) - 1 times in total, 2^n of which are the final n == 0 calls.
In your code, you can put the same item into a bag over and over:
for (i = 0; i < items_number; i++) {
if (i != index) {
recursive(i, step - 1, sum);
}
}
This loop prevents you from treating the current item, but will happily treat items that have been put into the bag in earlier recursions for a second (or third) time. If you want to use that approach, you must keep a state of which item is in which bag.
Also, I don't understand your step. You start with step - 1 and stop recursion when step == 1. That means you are considering n - 2 items. I understand that the other items are in the other bag, but that's a weird condition that won't let you find the solution to, say, {8.0, 2.4, 2.4, 2.8}.
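To make the "each item is decided exactly once" idea concrete, here is a small Python sketch of the brute-force walk. It is an illustration under assumptions rather than the asker's intended design: min_difference is a hypothetical name, the bags are allowed to hold different numbers of items, and the first item is pinned to bag A so mirrored splits are not tried twice.

```python
def min_difference(times):
    """Smallest |sum(bag A) - sum(bag B)| over all ways to split the items."""
    total = sum(times)
    best = total  # worst case: everything ends up in one bag

    def place(index, sum_a):
        nonlocal best
        if index == len(times):
            # Every item has been assigned; bag B holds total - sum_a.
            best = min(best, abs(total - 2 * sum_a))
            return
        place(index + 1, sum_a + times[index])  # item goes into bag A
        place(index + 1, sum_a)                 # item goes into bag B

    if times:
        place(1, times[0])  # item 0 is fixed in bag A
    return best

print(round(min_difference([1.23, 2.17, 2.95, 2.31]), 2))
print(round(min_difference([8.0, 2.4, 2.4, 2.8]), 2))
```

Because the recursion only ever moves forward through the indices, no item can be added to a bag twice, and the recursion depth is bounded by the number of items.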

Calculate all possibilities to get N using values from a given set [duplicate]

This question already has answers here:
Algorithm to find elements best fitting in a particular amount
(5 answers)
how do you calculate the minimum-coin change for transaction?
(3 answers)
Closed 9 years ago.
So here is the problem:
Given input = [100 80 66 25 4 2 1], I need to find the best combination to give me 50.
Looking at this, the best would be 25+25 = 50, so I need 2 elements from the array.
Other combinations include 25+4+4+4+4+4+4+1 and 25+4+4+4+4+4+2+2+1.. etc etc
I need to find all the possibilities which gives me the sum on a value I want.
EDIT: As well as the best possibility (one with least number of terms)
Here is what I have done thus far:
First I build a new array (a simple for loop which cycles through all elements and stores them in a new temp array), discarding all elements higher than my target (so for input 50, the elements 100, 80, 66 are higher, so I discard them and my new array is [25 4 2 1]). Then, from this, I need to check combinations.
The first thing I do is a simple if statement checking if any array elements EXACTLY match the number I want. So if I want 50, I check if 50 is in the array, if not, I need to find combinations.
My problem is, I'm not entirely sure how to find every single combination. I have been struggling trying to come up with an algorithm for a while but I always just end up getting stumped.
Any help/tips would be much appreciated.
PS - we can assume the array is always sorted in order from LARGEST to SMALLEST value.
This is the kind of problem that dynamic programming is meant to solve.
Create an array with indices 1 to 50. Set each entry to -1. For each element of your input array that is at most 50, set the entry at that index to 0, since it needs no additions. Then, for each integer n = 2 to 50, consider every way to split n into two addends a + b whose entries are not -1; the number of additions required for n is the minimum, over those splits, of the two addends' entries added together plus 1. At the end, read the entry at index 50.
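A hedged Python sketch of a table-based approach along these lines is shown below. It differs slightly from the description above: it counts the number of terms rather than the number of additions, and it extends a smaller sum by one input value at a time instead of trying every split. The name fewest_terms is made up for this example.

```python
def fewest_terms(values, target):
    """Return a shortest list of values (repeats allowed) summing to target, or None."""
    INF = float("inf")
    best = [0] + [INF] * target   # best[t] = fewest terms that sum to t
    last = [None] * (target + 1)  # last[t] = a value that ends one optimal sum for t
    for t in range(1, target + 1):
        for v in values:
            if v <= t and best[t - v] + 1 < best[t]:
                best[t] = best[t - v] + 1
                last[t] = v
    if best[target] == INF:
        return None
    # Walk backwards through the table to rebuild the combination.
    combo = []
    t = target
    while t > 0:
        combo.append(last[t])
        t -= last[t]
    return combo

print(fewest_terms([100, 80, 66, 25, 4, 2, 1], 50))
```

The table has one entry per sum from 0 to the target, so the work grows with target times the number of input values instead of exponentially.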
Edit: Due to a misinterpretation of the question, I first answered with an efficient way to calculate the number of possibilities (instead of the possibilities themselves) to get N using values from a given set. That solution can be found at the bottom of this post as a reference for other people, but first I'll give a proper answer to your questions.
Generate all possibilities, count them and give the shortest one
When generating a solution, you consider each element from the input array and ask yourself "should I use this in my solution or not?". Since we don't know the answer until after the calculation, we'll just have to try out both using it and not using it, as can be seen in the recursion step in the code below.
Now, to avoid duplicates and misses, we need to be a bit careful with the parameters for the recursive call. If we use the current element, we should also allow it to be used in the next step, because the element may be used as many times as needed. Therefore, the first parameter in this recursive call is i. However, if we decide to not use the element, we should not allow it to be used in the next step, because that would be a duplicate of the current step. Therefore, the first parameter in this recursive call is i+1.
I added an optional bound (from "branch and bound") to the algorithm that will stop expanding the current partial solution if it is known that this solution will never be shorter than the shortest solution found so far.
package otherproblems;
import java.util.Deque;
import java.util.LinkedList;
public class GeneratePossibilities
{
// Input
private static int n = 50;
// If the input array is sorted ascending, the shortest solution is
// likely to be found somewhere at the end.
// If the input array is sorted descending, the shortest solution is
// likely to be found somewhere in the beginning.
private static int[] input = {100, 80, 66, 25, 4, 2, 1};
// Shortest possibility
private static Deque<Integer> shortest;
// Number of possibilities
private static int numberOfPossibilities;
public static void main(String[] args)
{
calculate(0, n, new LinkedList<Integer>());
System.out.println("\nAbove you can see all " + numberOfPossibilities +
" possible solutions,\nbut this one's the shortest: " + shortest);
}
public static void calculate(int i, int left, Deque<Integer> partialSolution)
{
// If there's nothing left, we reached our target
if (left == 0)
{
System.out.println(partialSolution);
if (shortest == null || partialSolution.size() < shortest.size())
shortest = new LinkedList<Integer>(partialSolution);
numberOfPossibilities++;
return;
}
// If we overshot our target, by definition we didn't reach it
// Note that this could also be checked before making the
// recursive call, but IMHO this gives a cleaner recursion step.
if (left < 0)
return;
// If there are no values remaining, we didn't reach our target
if (i == input.length)
return;
// Uncomment the next two lines if you don't want to keep generating
// possibilities when you know it can never be a better solution than
// the one you have now.
// if (shortest != null && partialSolution.size() >= shortest.size())
// return;
// Pick value i. Note that we are allowed to pick it again,
// so the argument to calculate(...) is i, not i+1.
partialSolution.addLast(input[i]);
calculate(i, left-input[i], partialSolution);
// Don't pick value i. Note that we are not allowed to pick it later
// either, so the argument to calculate(...) is i+1, not i.
partialSolution.removeLast();
calculate(i+1, left, partialSolution);
}
}
Calculate the number of possibilities efficiently
This is a nice example of dynamic programming. What you need to do is figure out how many possibilities there are to form the number x, using value y as the last addition and using only values smaller than or equal to y. This gives you a recursive formula that you can easily translate to a solution using dynamic programming. I'm not quite sure how to write down the mathematics here, but since you weren't interested in them anyway, here's the code to solve your question :)
import java.util.Arrays;
public class Possibilities
{
public static void main(String[] args)
{
// Input
int[] input = {100, 80, 66, 25, 4, 2, 1};
int n = 50;
// Prepare input
Arrays.sort(input);
// Allocate storage space
long[][] m = new long[n+1][input.length];
for (int i = 1; i <= n; i++)
for (int j = 0; j < input.length; j++)
{
// input[j] cannot be the last value used to compose i
if (i < input[j])
m[i][j] = 0;
// If input[j] is the last value used to compose i,
// it must be the only value used in the composition.
else if (i == input[j])
m[i][j] = 1;
// If input[j] is the last value used to compose i,
// we need to know the number of possibilities in which
// i - input[j] can be composed, which is the sum of all
// entries in column m[i-input[j]].
// However, to avoid counting duplicates, we only take
// combinations that are composed of values equal or smaller
// to input[j].
else
for (int k = 0; k <= j; k++)
m[i][j] += m[i-input[j]][k];
}
// Nice output of intermediate values:
int digits = 3;
System.out.printf(" %"+digits+"s", "");
for (int i = 1; i <= n; i++)
System.out.printf(" %"+digits+"d", i);
System.out.println();
for (int j = 0; j < input.length; j++)
{
System.out.printf(" %"+digits+"d", input[j]);
for (int i = 1; i <= n; i++)
System.out.printf(" %"+digits+"d", m[i][j]);
System.out.println();
}
// Answer:
long answer = 0;
for (int i = 0; i < input.length; i++)
answer += m[n][i];
System.out.println("\nThe number of possibilities to form "+n+
" using the numbers "+Arrays.toString(input)+" is "+answer);
}
}
This is the integer knapsack problem, which is one of the most common NP-complete problems out there; if you are into algorithm design/study, check those out. To find the best, I think you have no choice but to compute them all and keep the smallest one.
For the correct solution there is a recursive algorithm that is pretty simple to put together.
import org.apache.commons.lang.ArrayUtils;
import java.util.*;
public class Stuff {
private final int target;
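private final int[] steps;
// Cache for the memoization that is left as an exercise; solveForN() does not use it yet.
private final Map<Integer, List<Integer>> memoize;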
public Stuff(int N, int[] steps) {
this.target = N;
this.steps = Arrays.copyOf(steps, steps.length);
Arrays.sort(this.steps);
ArrayUtils.reverse(this.steps);
this.memoize = new HashMap<Integer, List<Integer>>(N);
}
public List<Integer> solve() {
return solveForN(target);
}
private List<Integer> solveForN(int N) {
if (N == 0) {
return new ArrayList<Integer>();
} else if (N > 0) {
List<Integer> temp, min = null;
for (int i = 0; i < steps.length; i++) {
temp = solveForN(N - steps[i]);
if (temp != null) {
temp.add(steps[i]);
if (min == null || min.size() > temp.size()) {
min = temp;
}
}
}
return min;
} else {
return null;
}
}
}
It is based on the fact that to "get to N" you have to have come from N - steps[0], or N - steps[1], ...
Thus you start from your target total N and subtract one of the possible steps, and do it again until you are at 0 (return a List to specify that this is a valid path) or below (return null so that you cannot return an invalid path).
The complexity of this correct solution is exponential! Which is REALLY bad! Every call branches M ways and the recursion can go N / s levels deep, so it is something like O(M^(N/s)), where M is the size of the steps array and s is its smallest value.
To get a solution to this problem in less time than that you will have to use a heuristic (approximation) and you will always have a certain probability to have the wrong answer.
You can make your own implementation faster by memoizing the shortest combination seen so far for all targets (so you do not need to recompute solveForN(N) if you already did). This approach is called Dynamic Programming. I will let you do that on your own (very fun stuff and really not that complicated).
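For readers who want to compare notes after trying it themselves, here is one possible memoized version as a Python sketch. It uses functools.lru_cache from the standard library as the cache; shortest_combination and solve are hypothetical names, and the structure is only loosely modelled on solveForN above.

```python
from functools import lru_cache

def shortest_combination(target, steps):
    """Return a shortest list of steps (repeats allowed) summing to target, or None."""
    steps = tuple(steps)

    @lru_cache(maxsize=None)  # each sub-target is solved only once
    def solve(n):
        if n == 0:
            return ()
        best = None
        for step in steps:
            if step <= n:
                rest = solve(n - step)
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = rest + (step,)
        return best

    result = solve(target)
    return None if result is None else list(result)

print(shortest_combination(50, [100, 80, 66, 25, 4, 2, 1]))
```

Tuples are used for the cached results so that a stored answer cannot be modified by a later caller, which is a risk when a cache hands out mutable lists.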
Constraints of this solution : You will only find the solution if you guarantee that the input array (steps) is sorted in descending order and that you go through it in that order.
Here is a link to the general Knapsack problem if you also want to look approximation solutions: http://en.wikipedia.org/wiki/Knapsack_problem
You need to solve each sub-problem and store the solution. For example:
1 can only be 1. 2 can be 2 or 1+1. 4 can be 4 or 2+2 or 2+1+1 or 1+1+1+1. So you take each sub-solution and store it, so when you see 25=4+4+4+4+4+4+1, you already know that each 4 can also be represented as one of the 3 combinations.
Then you have to sort the digits and check to avoid duplicate patterns since, for example, (2+2)+(2+2)+(2+2)+(1+1+1+1)+(1+1+1+1)+(1+1+1+1) == (2+1+1)+(2+1+1)+(2+1+1)+(2+1+1)+(2+1+1)+(2+1+1). Six 2's and twelve 1's in both cases.
Does that make sense?
Recursion should be the easiest way to solve this (Assuming you really want to find all the solutions to the problem). The nice thing about this approach is, if you want to just find the shortest solution, you can add a check on the recursion and find just that, saving time and space :)
Assuming an element i of your array is part of the solution, you can solve the subproblem of finding the elements that sums to n-i. If we add an ordering to our solution, for example the numbers in the sum must be from the greater to the smallest, we have a way to find unique solutions.
This is a recursive solution in C#, it should be easy to translate it in java.
public static void RecursiveSum(int n, int index, List<int> lst, List<int> solution)
{
    // Target reached: print this solution once and stop recursing
    if (n == 0)
    {
        Console.WriteLine("");
        foreach (int j in solution)
        {
            Console.Write(j + " ");
        }
        return;
    }
    for (int i = index; i < lst.Count; i++)
    {
        if (n - lst[i] >= 0)
        {
            List<int> tmp = new List<int>(solution);
            tmp.Add(lst[i]);
            // Pass i (not i + 1) so the same value may be used again
            RecursiveSum(n - lst[i], i, lst, tmp);
        }
    }
}
You call it with
RecursiveSum(N,0,list,new List<int>());
where N is the sum you are looking for, 0 shouldn't be changed, list is your list of allowed numbers, and the last parameter shouldn't be changed either.
The problem you pose is interesting but very complex. I'd approach this by using something like OptaPlanner(formerly Drools Planner). It's difficult to describe a full solution to this problem without spending significant time, but with optaplanner you can also get "closest fit" type answers and can have incremental "moves" that would make solving your problem more efficient. Good luck.
This is a solution in python: Ideone link
# Python 2 syntax (print statements, integer division with /)
# Start of tsum function
def tsum(currentSum, total, input, record, n):
    if total == N:
        # Print every value that is marked as used, one per line
        for i in range(0, n):
            if record[i]:
                print input[i]
        print ""
        return
    for i in range(currentSum, n):
        if total + input[i] > N:
            continue
        # Skip duplicates: a repeated value is only used if the previous copy is used
        if i > 0 and input[i] == input[i-1] and not record[i-1]:
            continue
        record[i] = 1
        tsum(i+1, total+input[i], input, record, n)
        record[i] = 0
# end of function
# Below portion will be main() in Java
record = []
N = 5
input = [3, 2, 2, 1, 1]
temp = list(set(input))
newlist = input
for i in range(0, len(temp)):
    val = N/temp[i]
    for j in range(0, val-input.count(temp[i])):
        newlist.append(temp[i])
# above logic was to create a newlist/input i.e [3, 2, 2, 1, 1, 1, 1, 1]
# This new list contains the maximum number of elements <= N
# for e.g appended three 1's as sum of new three 1's + existing two 1's <= N(5) where as
# did not append another 2 as 2+2+2 > N(5) or 3 as 3+3 > N(5)
l = len(input)
for i in range(0, l):
    record.append(0)
print "all possibilities to get N using values from a given set:"
tsum(0, 0, input, record, l)
OUTPUT for the set [3, 2, 2, 1, 1], using a small set and a small N for demo purposes; it works for higher values of N as well. Each combination is shown on one line here; the program prints one number per line, with a blank line after each combination.
For N = 5
all possibilities to get N using values from a given set:
3 2
3 1 1
2 2 1
2 1 1 1
1 1 1 1 1
For N = 3
all possibilities to get N using values from a given set:
3
2 1
1 1 1
Isn't this just a search problem? If so, just search breadth-first.
abstract class Numbers {
abstract int total();
public static Numbers breadthFirst(int[] numbers, int total) {
List<Numbers> stack = new LinkedList<Numbers>();
if (total == 0) { return new Empty(); }
stack.add(new Empty());
while (!stack.isEmpty()) {
Numbers nums = stack.remove(0);
for (int i : numbers) {
if (i > 0 && total - nums.total() >= i) {
Numbers more = new SomeNumbers(i, nums);
if (more.total() == total) { return more; }
stack.add(more);
}
}
}
return null; // No answer.
}
}
class Empty extends Numbers {
int total() { return 0; }
public String toString() { return "empty"; }
}
class SomeNumbers extends Numbers {
final int total;
final Numbers prev;
SomeNumbers(int n, Numbers prev) {
this.total = n + prev.total();
this.prev = prev;
}
int total() { return total; }
public String toString() {
if (prev.getClass() == Empty.class) { return "" + total; }
return prev + "," + (total - prev.total());
}
}
What about using the greedy algorithm n times (n is the number of elements in your array), each time popping the largest element off the list. E.g. (in some random pseudo-code language):
array = [70 30 25 4 2 1]
value = 50
sort(array, descending)
solutions = [] // array of arrays
while length of array is non-zero:
tmpValue = value
thisSolution = []
for each i in array:
while tmpValue >= i:
tmpValue -= i
thisSolution.append(i)
solutions.append(thisSolution)
array.pop_first() // remove the largest entry from the array
If run with the set [70 30 25 4 2 1] and 50, it should give you a solutions array like this:
[[30 4 4 4 4 4]
[30 4 4 4 4 4]
[25 25]
[4 4 4 4 4 4 4 4 4 4 4 4 2]
[2 ... ]
[1 ... ]]
Then simply pick the element from the solutions array with the smallest length.
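As a runnable illustration of the pseudo-code above, here is a Python sketch. One detail is an assumption added for this example: an attempt is only kept when it hits the target exactly, whereas the pseudo-code appends every attempt. The name repeated_greedy is made up.

```python
def repeated_greedy(values, target):
    """Run the greedy pass once per starting element and collect the exact hits."""
    remaining = sorted(values, reverse=True)
    solutions = []
    while remaining:
        left = target
        picked = []
        for v in remaining:
            # Take the current value as often as it still fits.
            while left >= v:
                left -= v
                picked.append(v)
        if left == 0:  # assumption: keep only attempts that reach the target
            solutions.append(picked)
        remaining.pop(0)  # drop the largest entry and try again
    return solutions

candidates = repeated_greedy([70, 30, 25, 4, 2, 1], 50)
print(min(candidates, key=len))
```

With the values [70, 30, 25, 4, 3, 1] and a target of 10, the shortest attempt this produces has four terms (4 4 1 1 or 3 3 3 1), although 3 + 3 + 4 needs only three, so the approach is a heuristic rather than an exact method.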
Update: The comment is correct that this does not generate the correct answer in all cases. The reason is that greedy isn't always right. The following recursive algorithm should always work:
array = [70, 30, 25, 4, 3, 1]
def findSmallest(value, array):
    minSolution = []
    tmpArray = list(array)
    while len(tmpArray):
        elem = tmpArray.pop(0)
        tmpValue = value
        cnt = 0
        while tmpValue >= elem:
            cnt += 1
            tmpValue -= elem
            subSolution = findSmallest(tmpValue, tmpArray)
            if tmpValue == 0 or subSolution:
                if not minSolution or len(subSolution) + cnt < len(minSolution):
                    minSolution = subSolution + [elem] * cnt
    return minSolution
print findSmallest(10, array)
print findSmallest(50, array)
print findSmallest(49, array)
print findSmallest(55, array)
Prints:
[3, 3, 4]
[25, 25]
[3, 4, 4, 4, 4, 30]
[30, 25]
The invariant is that the function returns either the smallest set for the value passed in, or an empty set. It can then be used recursively with all possible values of the previous numbers in the list. Note that this is O(n!) in complexity, so it's going to be slow for large values. Also note that there is plenty of room for optimization here.
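One possible optimization is to stop recomputing the same sub-targets. The sketch below is a swapped-in technique (classic bottom-up dynamic programming), not the recursive algorithm above. It assumes the numbers are positive integers and that each may be reused any number of times, as in the recursive version. The name fewest_terms is made up for this illustration.

```python
def fewest_terms(value, numbers):
    """Return a shortest list of terms from `numbers` (repeats allowed)
    that sums to `value`, or None if no combination exists."""
    # best[v] holds a shortest combination summing to v, or None if unreachable.
    best = [None] * (value + 1)
    best[0] = []
    for v in range(1, value + 1):
        for n in numbers:
            if n <= v and best[v - n] is not None:
                candidate = best[v - n] + [n]
                if best[v] is None or len(candidate) < len(best[v]):
                    best[v] = candidate
    return best[value]


print(fewest_terms(50, [70, 30, 25, 4, 3, 1]))
```

This runs in time proportional to value times the number of candidates, at the cost of memory proportional to value.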
I made a small program to help with one solution. Personally, I believe the best would be a deterministic mathematical solution, but right now I lack the caffeine to even think on how to implement it. =)
Instead, I went with a SAR approach. Stop and Reverse is a technique used in stock trading (http://daytrading.about.com/od/stou/g/SAR.htm), and is heavily used to calculate optimal curves with a minimum of inference. The Wikipedia entry for Parabolic SAR goes like this:
'The Parabolic SAR is calculated almost independently for each trend
in the price. When the price is in an uptrend, the SAR emerges below
the price and converges upwards towards it. Similarly, on a
downtrend, the SAR emerges above the price and converges
downwards.'
I adapted it to your problem. I start with a random value from your series. Then the code enters a finite number of iterations.
I pick another random value from the series stack.
If the new value plus the stack sum is less than the target, then the value is added; if greater, then it is subtracted.
I can go on for as long as I want until I satisfy the condition (stack sum = target), or abort if the cycle can't find a valid solution.
If successful, I record the stack and the number of iterations. Then I redo everything.
Some EXTREMELY crude code follows. Please forgive the hastiness. Oh, and it's in C#. =)
Again, it does not guarantee that you'll obtain the optimal path; it's a brute force approach. It can be refined; detect if there's a perfect match for a target hit, for example.
public static class SAR
{
//I'm considering Optimal as the smallest signature (number of members).
// Once set, all future signatures must be same or smaller.
private static Random _seed = new Random();
private static List<int> _domain = new List<int>() { 100, 80, 66, 24, 4, 2, 1 };
public static void SetDomain(string domain)
{
_domain = domain.Split(',').ToList<string>().ConvertAll<int>(a => Convert.ToInt32(a));
_domain.Sort();
}
public static void FindOptimalSAR(int value)
{
// I'll skip some obvious tests. For example:
// If there is no odd number in domain, then
// it's impossible to find a path to an odd
// value.
//Determining a max path run. If the count goes
// over this, it's useless to continue.
int _maxCycle = 10;
//Determining a maximum number of runs.
int _maxRun = 1000000;
int _run = 0;
int _domainCount = _domain.Count;
List<int> _currentOptimalSig = new List<int>();
List<String> _currentOptimalOps = new List<string>();
do
{
List<int> currSig = new List<int>();
List<string> currOps = new List<string>();
int _cycle = 0;
int _cycleTot = 0;
bool _OptimalFound = false;
do
{
int _cursor = _seed.Next(_domainCount);
currSig.Add(_cursor);
if (_cycleTot < value)
{
currOps.Add("+");
_cycleTot += _domain[_cursor];
}
else
{
// Your situation doesn't allow for negative
// numbers. Otherwise, just enable the two following lines.
// currOps.Add("-");
// _cycleTot -= _domain[_cursor];
}
if (_cycleTot == value)
{
_OptimalFound = true;
break;
}
_cycle++;
} while (_cycle < _maxCycle);
if (_OptimalFound)
{
_maxCycle = _cycle;
_currentOptimalOps = currOps;
_currentOptimalSig = currSig;
Console.Write("Optimal found: ");
for (int i = 0; i < currSig.Count; i++)
{
Console.Write(currOps[i]);
Console.Write(_domain[currSig[i]]);
}
Console.WriteLine(".");
}
_run++;
} while (_run < _maxRun);
}
}
And this is the caller:
String _Domain = "100, 80, 66, 25, 4, 2, 1";
SAR.SetDomain(_Domain);
Console.WriteLine("SAR for Domain {" + _Domain + "}");
do
{
Console.Write("Input target value: ");
int _parm = (Convert.ToInt32(Console.ReadLine()));
SAR.FindOptimalSAR(_parm);
Console.WriteLine("Done.");
} while (true);
This is my result after 100k iterations for a few targets, given a slightly modified series (I switched 25 for 24 for testing purposes):
SAR for Domain {100, 80, 66, 24, 4, 2, 1}
Input target value: 50
Optimal found: +24+24+2.
Done.
Input target value: 29
Optimal found: +4+1+24.
Done.
Input target value: 75
Optimal found: +2+2+1+66+4.
Optimal found: +4+66+4+1.
Done.
Now with your original series:
SAR for Domain {100, 80, 66, 25, 4, 2, 1}
Input target value: 50
Optimal found: +25+25.
Done.
Input target value: 75
Optimal found: +25+25+25.
Done.
Input target value: 512
Optimal found: +80+80+66+100+1+80+25+80.
Optimal found: +66+100+80+100+100+66.
Done.
Input target value: 1024
Optimal found: +100+1+80+80+100+2+100+2+2+2+25+2+100+66+25+66+100+80+25+66.
Optimal found: +4+25+100+80+100+1+80+1+100+4+2+1+100+1+100+100+100+25+100.
Optimal found: +80+80+25+1+100+66+80+80+80+100+25+66+66+4+100+4+1+66.
Optimal found: +1+100+100+100+2+66+25+100+66+100+80+4+100+80+100.
Optimal found: +66+100+100+100+100+100+100+100+66+66+25+1+100.
Optimal found: +100+66+80+66+100+66+80+66+100+100+100+100.
Done.
Cons: It is worth mentioning again: This algorithm does not guarantee that you will find the optimal values. It makes a brute-force approximation.
Pros: Fast. 100k iterations may initially seem a lot, but the algorithm starts ignoring long paths after it detects more and more optimized paths, since it lessens the maximum allowed number of cycles.

Finding contiguous ranges in arrays

You are given an array of integers. You have to output the largest range so that all numbers in the range are present in the array. The numbers might be present in any order. For example, suppose that the array is
{2, 10, 3, 12, 5, 4, 11, 8, 7, 6, 15}
Here we find two (nontrivial) ranges for which all the integers in these ranges are present in the array, namely [2,8] and [10,12]. Out of these [2,8] is the longer one. So we need to output that.
When I was given this question, I was asked to do this in linear time and without using any sorting. I thought that there might be a hash-based solution, but I couldn't come up with anything.
Here's my attempt at a solution:
void printRange(int arr[])
{
int n=sizeof(arr)/sizeof(int);
int size=2;
int tempans[2];
int answer[2];// the range is stored in another array
for(int i =0;i<n;i++)
{
if(arr[0]<arr[1])
{
answer[0]=arr[0];
answer[1]=arr[1];
}
if(arr[1]<arr[0])
{
answer[0]=arr[1];
answer[1]=arr[0];
}
if(arr[i] < answer[1])
size += 1;
else if(arr[i]>answer[1]) {
initialize tempans to new range;
size2=2;
}
else {
initialize tempans to new range
}
}
//I have to check when the count becomes equal to the diff of the range
I am stuck at this part... I can't figure out how many tempanswer[] arrays should be used.
I think that the following solution will work in O(n) time using O(n) space.
Begin by putting all of the entries in the array into a hash table. Next, create a second hash table which stores elements that we have "visited," which is initially empty.
Now, iterate across the array of elements one at a time. For each element, check if the element is in the visited set. If so, skip it. Otherwise, count up from that element upward. At each step, check if the current number is in the main hash table. If so, continue onward and mark the current value as part of the visited set. If not, stop. Next, repeat this procedure, except counting downward. This tells us the number of contiguous elements in the range containing this particular array value. If we keep track of the largest range found this way, we will have a solution to our problem.
The runtime complexity of this algorithm is O(n). To see this, note that we can build the hash table in the first step in O(n) time. Next, when we begin scanning the array to find the largest range, each range scanned takes time proportional to the length of that range. Since the total sum of the lengths of the ranges is the number of elements in the original array, and since we never scan the same range twice (because we mark each number that we visit), this second step takes O(n) time as well, for a net runtime of O(n).
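A minimal Python sketch of the approach described above, assuming the input is a plain list of integers. The function name longest_range and the (low, high) return shape are choices made for this illustration, not part of the original answer.

```python
def longest_range(values):
    """Return (low, high) of the longest run of consecutive integers
    present in `values`, or None for an empty input."""
    present = set(values)   # the main hash table
    visited = set()         # numbers already counted as part of some range
    best = None
    for v in values:
        if v in visited:
            continue
        visited.add(v)
        # Count upward from v while the next number is present.
        high = v
        while high + 1 in present:
            high += 1
            visited.add(high)
        # Count downward from v in the same way.
        low = v
        while low - 1 in present:
            low -= 1
            visited.add(low)
        if best is None or high - low > best[1] - best[0]:
            best = (low, high)
    return best


print(longest_range([2, 10, 3, 12, 5, 4, 11, 8, 7, 6, 15]))  # (2, 8)
```

Python integers do not overflow, so the overflow edge case mentioned for the Java version does not arise in this sketch.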
EDIT: If you're curious, I have a Java implementation of this algorithm, along with a much more detailed analysis of why it works and why it has the correct runtime. It also explores a few edge cases that aren't apparent in the initial description of the algorithm (for example, how to handle integer overflow).
Hope this helps!
The solution could use BitSet:
public static void detect(int []ns) {
BitSet bs = new BitSet();
for (int i = 0; i < ns.length; i++) {
bs.set(ns[i]);
}
int begin = 0;
int setpos = -1;
while((setpos = bs.nextSetBit(begin)) >= 0) {
begin = bs.nextClearBit(setpos);
System.out.print("[" + setpos + " , " + (begin - 1) + "]");
}
}
Sample I/O:
detect(new int[] {2,10, 3, 12, 5,4, 11, 8, 7, 6, 15} );
[2,8] [10,12] [15,15]
Here is the solution in Java:
public class Solution {
public int longestConsecutive(int[] num) {
int longest = 0;
Map<Integer, Boolean> map = new HashMap<Integer, Boolean>();
for(int i = 0; i< num.length; i++){
map.put(num[i], false);
}
int l, k;
for(int i = 0;i < num.length;i++){
if(map.containsKey(num[i]-1) || map.get(num[i])) continue;
map.put(num[i], true);
l = 0; k = num[i];
while (map.containsKey(k)){
l++;
k++;
}
if(longest < l) longest = l;
}
return longest;
}
}
Other approaches here.
The above answer by template will work, but you don't need a hash table. Hashing could take a long time depending on what algorithm you use. You can ask the interviewer if there's a maximum value the integers can take, then create an array of that size. Call it exist[]. Then scan through arr and mark exist[arr[i]] = 1. Then iterate through exist[], keeping track of four variables: the size of the largest range so far, the beginning of the largest range so far, the size of the current range, and the beginning of the current range. When you see exist[i] = 0, compare the current range values with the largest range values and update the largest range values if needed.
If there's no max value then you might have to go with the hashing method.
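As a rough illustration of the exist[] idea, here is a Python sketch. It assumes every value is a non-negative integer no larger than a known max_value. The function name and the (start, size) return value are made up for this example.

```python
def longest_range_bounded(values, max_value):
    """Return (start, size) of the longest run of consecutive integers,
    assuming 0 <= v <= max_value for every v in `values`."""
    # One extra False slot at the end acts as a sentinel that closes the last run.
    exist = [False] * (max_value + 2)
    for v in values:
        exist[v] = True

    best_start, best_size = 0, 0
    cur_start, cur_size = 0, 0
    for i, flag in enumerate(exist):
        if flag:
            if cur_size == 0:
                cur_start = i
            cur_size += 1
        else:
            # A gap ends the current run; keep it if it beats the best so far.
            if cur_size > best_size:
                best_start, best_size = cur_start, cur_size
            cur_size = 0
    return best_start, best_size


print(longest_range_bounded([2, 10, 3, 12, 5, 4, 11, 8, 7, 6, 15], 15))  # (2, 7)
```

The trade-off is that the work and memory grow with max_value rather than with the number of elements.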
Actually, considering that we're only sorting integers and therefore a comparison sort is NOT necessary, you can just sort the array using a Radix- or BucketSort and then iterate through it.
Simple and certainly not what the interviewer wanted to hear, but correct nonetheless ;)
A Haskell implementation of Grigor Gevorgyan's solution, from another who didn't get a chance to post before the question was marked as a duplicate...(simply updates the hash and the longest range so far, while traversing the list)
import qualified Data.HashTable.IO as H
import Control.Monad.Random
f list = do
h <- H.new :: IO (H.BasicHashTable Int Int)
g list (0,[]) h where
g [] best h = return best
g (x:xs) best h = do
m <- H.lookup h x
case m of
Just _ -> g xs best h
otherwise -> do
(xValue,newRange) <- test
H.insert h x xValue
g xs (maximum [best,newRange]) h
where
test = do
m1 <- H.lookup h (x-1)
m2 <- H.lookup h (x+1)
case m1 of
Just x1 -> case m2 of
Just x2 -> do H.insert h (x-1) x2
H.insert h (x+1) x1
return (x,(x2 - x1 + 1,[x1,x2]))
Nothing -> do H.insert h (x-1) x
return (x1,(x - x1 + 1,[x,x1]))
Nothing -> case m2 of
Just x2 -> do H.insert h (x+1) x
return (x2,(x2 - x + 1,[x,x2]))
Nothing -> do return (x,(1,[x]))
rnd :: (RandomGen g) => Rand g Int
rnd = getRandomR (-100,100)
main = do
values <- evalRandIO (sequence (replicate (1000000) rnd))
f values >>= print
Output:
*Main> main
(10,[40,49])
(5.30 secs, 1132898932 bytes)
I read a lot of solutions on multiple platforms to this problem and one got my attention, as it solves the problem very elegantly and it is easy to follow.
The backbone of this method is to create a set/hash, which takes O(n) time; from there, every access to the set/hash is O(1). As O-notation omits constant factors, this algorithm can still be described overall as O(n).
def longestConsecutive(self, nums):
    nums = set(nums) # Create hash, O(n)
    best = 0
    for x in nums:
        if x - 1 not in nums: # Optimization
            y = x + 1 # Get possible next number
            while y in nums: # If the next number is in set/hash
                y += 1 # keep counting
            best = max(best, y - x) # counting done, update best
    return best
It's straightforward if you run through it with simple numbers. The optimization step is just a short-circuit that makes sure you only start counting when that specific number is the beginning of a sequence.
All Credits to Stefan Pochmann.
Very short solution using Javascript sparse array feature:
O(n) time using O(n) additional space.
var arr = [2, 10, 3, 12, 5, 4, 11, 8, 7, 6, 15];
var a = [];
var count = 0, max_count = 0;
for (var i=0; i < arr.length; i++) a[arr[i]] = true;
for (i = 0; i < a.length; i++) {
count = (a[i]) ? count + 1 : 0;
max_count = Math.max(max_count, count);
}
console.log(max_count); // 7
A quick way to do it (PHP) :
$tab = array(14,12,1,5,7,3,4,10,11,8);
asort($tab);
$tab = array_values($tab);
$tab_contiguous = array();
$i=0;
foreach ($tab as $key => $val) {
$tab_contiguous[$i][] = $tab[$key];
if (isset($tab[$key+1])) {
if($tab[$key] + 1 != $tab[$key+1])
$i++;
}
}
echo(json_encode($tab_contiguous));

Find the Smallest Integer Not in a List

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.
# Pass 1, move every value to the position of its value
for cursor in range(N):
    target = array[cursor]
    while target < N and target != array[target]:
        new_target = array[target]
        array[target] = target
        target = new_target

# Pass 2, find first location where the index doesn't match the value
for cursor in range(N):
    if array[cursor] != cursor:
        return cursor
return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
1. Find the length of the list; let's say it is N.
2. Allocate an array of N booleans, initialized to all false.
3. For each number X in the list, if X is less than N, set the X'th element of the array to true.
4. Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
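The steps above can be sketched in Python as follows. This is a minimal illustration that uses a plain list of booleans rather than a packed bitmap; the function name smallest_missing is an assumption of this example.

```python
def smallest_missing(numbers):
    """Return the smallest non-negative integer that is not in `numbers`."""
    n = len(numbers)
    seen = [False] * n            # one flag per candidate 0 .. n-1
    for x in numbers:
        if 0 <= x < n:            # values >= n can never be the answer
            seen[x] = True
    for i, flag in enumerate(seen):
        if not flag:
            return i              # first candidate that never appeared
    return n                      # the list is a permutation of 0 .. n-1


print(smallest_missing([3, 0, 1]))  # 2
```

In a language with fixed-width integers, the seen list would typically be replaced by a bitset to get down to N bits.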
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than or equal to N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater or equal to N ... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type of (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
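For anyone who wants to check the arithmetic, here is a short Python sketch. It assumes 1GB means 2^30 bytes and that each number occupies exactly 8 bytes; the variable names are made up for this example.

```python
GIB = 2**30                     # bytes in 1GB (binary gigabyte, assumed here)
max_items = GIB // 8            # each 64-bit integer takes 8 bytes
possible_values = 2**64         # number of distinct unsigned 64-bit integers
odds = possible_values // max_items

print(max_items)        # 134217728
print(possible_values)  # 18446744073709551616
print(odds)             # 137438953472
```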
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete look at the number of items in the lower partition
Is this value equal to the value used for creating the partition?
If so then it implies that the gap is in the higher partition.
Continue with the quicksort, ignoring the lower partition
Otherwise the gap is in the lower partition
Continue with the quicksort, ignoring the higher partition
This saves a large number of computations.
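The partitioning idea can be sketched in Python as follows. This is an illustration under assumptions rather than the answer's exact procedure: it builds new lists instead of partitioning in place, removes duplicates up front with set(), uses the middle element as the pivot (so linear running time is typical rather than guaranteed), and assumes the values are non-negative integers. The function name is hypothetical.

```python
def smallest_missing_by_partition(numbers):
    """Return the smallest non-negative integer not in `numbers`,
    discarding the partition that cannot contain the gap."""
    items = list(set(numbers))   # remove duplicates first
    base = 0                     # every value below `base` is known to be present
    while items:
        pivot = items[len(items) // 2]
        lower = [x for x in items if x < pivot]
        higher = [x for x in items if x > pivot]
        if len(lower) == pivot - base:
            # base .. pivot-1 are all present, and so is the pivot itself:
            # the gap must be in the higher partition.
            base = pivot + 1
            items = higher
        else:
            # Something in base .. pivot-1 is missing: ignore the higher partition.
            items = lower
    return base


print(smallest_missing_by_partition([3, 1, 0, 5, 2, 7]))  # 4
```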
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
    if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
If the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space efficient method and all values are distinct you can do it in space O( k ) and time O( k*log(N)*N ). It's space efficient and there's no data moving and all operations are elementary (adding subtracting).
set U = N; L=0
First partition the number space in k regions. Like this:
0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
Find how many numbers (count{i}) are in each region. (N*k steps)
Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
if h - count{h-1} = 1 you've got your answer
set U = count{h}; L = count{h-1}
goto 2
This can be improved using hashing (thanks to Nic for this idea).
same
First partition the number space in k regions. Like this:
L + (i/k)->L + (i+1/k)*(U-L)
inc count{j} using j = (number - L)/k (if L < number < U)
find first region (h) that doesn't have k elements in it
if count{h} = 1 h is your answer
set U = maximum value in region h L = minimum value in region h
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] != list[i-1] + 1:
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all (bitmask)
    for i = 0 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X // one scan, Θ(n)
    hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(low) <> null) // at most n+1 times
    low++;
print low;
The worst case is when the array holds n elements and they are {0, 1, ..., n-1}, in which case the answer is found at n, which still keeps it O(n).
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0L
    while target < 2L**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        if not target_found and the_answer is None:
            the_answer = target
        target += 1L
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, take the square root of the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
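A rough Python sketch of the two passes described above. It is a simplification under stated assumptions: the values are distinct and non-negative, and the bucket index is computed by integer division by the bucket width rather than by a square-root hash. The function name is made up for this illustration.

```python
import math


def smallest_missing_two_pass(numbers):
    """Return the smallest non-negative integer not in `numbers`,
    using O(sqrt(n)) extra space. Assumes distinct, non-negative values."""
    n = len(numbers)
    width = math.isqrt(n) + 1        # bucket width, roughly sqrt(n)
    buckets = n // width + 1         # enough buckets to cover 0 .. n

    # Pass 1 (hashing part): count how many values fall into each bucket.
    counts = [0] * buckets
    for x in numbers:
        if x < buckets * width:
            counts[x // width] += 1

    # The first bucket that is not full contains the answer.
    b = next(i for i, c in enumerate(counts) if c < width)
    base = b * width

    # Pass 2 (bitmap part): mark the values that fall inside that bucket.
    seen = [False] * width
    for x in numbers:
        if base <= x < base + width:
            seen[x - base] = True
    return base + seen.index(False)


print(smallest_missing_two_pass([0, 1, 2, 3, 4, 5, 6, 7, 8, 10]))  # 9
```

With duplicates in the input, a bucket can look full while still hiding a gap, so the counting pass would need a different fullness test.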
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while ( i < Array.Length)
{
if (Array[i] == i + 1)
{
i++;
}
if (i < Array.Length)
{
if (Array[i] <= Array.Length)
{//SWap
int temp = Array[i];
int AnoTemp = Array[temp - 1];
Array[temp - 1] = temp;
Array[i] = AnoTemp;
}
else
i++;
}
}
for (int j = 0; j < Array.Length; j++)
{
if (Array[j] > Array.Length)
{
Console.WriteLine(j + 1);
j = Array.Length;
}
else
if (j == Array.Length - 1)
Console.WriteLine("Not Found !!");
}
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
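One way to read this is as a binary search over the candidate answer rather than over the array. The sketch below assumes the values are distinct and non-negative, and it counts the elements not exceeding a midpoint instead of subtracting. It takes O(N log N) time and O(1) extra space. The function name is made up for this illustration.

```python
def smallest_missing_bisect(numbers):
    """Binary search on the answer. Assumes distinct, non-negative values."""
    lo, hi = 0, len(numbers)      # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        # If exactly mid + 1 values are <= mid, then 0 .. mid are all present.
        at_most_mid = sum(1 for x in numbers if x <= mid)
        if at_most_mid == mid + 1:
            lo = mid + 1          # the gap is above mid
        else:
            hi = mid              # the gap is at mid or below
    return lo


print(smallest_missing_bisect([0, 1, 3, 4]))  # 2
```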
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
if i == N then leave /* Processed entire array */
if array[i] == LowNum {
LowNum++
i=0
}
else {
i++
}
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: Sorry, I guess I was thinking too fast last night. Anyway, the second part should actually be replaced by sum(list), which is where the O(n) comes from. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out the fact that I was putting some middle pieces in my mind.
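A small Python sketch of the corrected formula, assuming the list holds n distinct integers taken from 1 .. n+1 with exactly one value missing (the function name is hypothetical):

```python
def single_missing(numbers):
    """Return the one value missing from 1 .. n+1, where n = len(numbers)."""
    n = len(numbers)
    expected_total = (n + 2) * (n + 1) // 2   # sum of 1 .. n+1
    return expected_total - sum(numbers)


print(single_missing([1, 2, 3, 5, 6]))  # 4
```

If more than one number is missing, or the list contains duplicates, the result is not meaningful.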
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
    index = 0
    target = []
    A = [x for x in A if x >=0]
    if len(A) ==0:
        return 1
    maxi = max(A)
    if maxi <= len(A):
        maxi = len(A)
    target = ['X' for x in range(maxi+1)]
    for number in A:
        target[number]= number
    count = 1
    while count < maxi+1:
        if target[count] == 'X':
            return count
        count +=1
    return target[count-1] + 1
Got 100% for the above solution.
1)Filter negative and Zero
2)Sort/distinct
3)Visit array
Complexity: O(N) or O(N * log(N))
using Java8
public int solution(int[] A) {
    int result = 1;
    boolean found = false;
    A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
    //System.out.println(Arrays.toString(A));
    for (int i = 0; i < A.length; i++) {
        result = i + 1;
        if (result != A[i]) {
            found = true;
            break;
        }
    }
    if (!found && result == A.length) {
        //result is larger than max element in array
        result++;
    }
    return result;
}
An unordered_set can be used to store all the positive numbers. We then count up from 1 to the size of the set and return the first number that does not occur in it.
int firstMissingPositive(vector<int>& nums) {
    unordered_set<int> fre;
    // storing each positive number in a hash.
    for(int i = 0; i < nums.size(); i +=1)
    {
        if(nums[i] > 0)
            fre.insert(nums[i]);
    }
    int i = 1;
    // Iterating from 1 to size of the set and checking
    // for the occurrence of 'i'
    for(auto it = fre.begin(); it != fre.end(); ++it)
    {
        if(fre.find(i) == fre.end())
            return i;
        i +=1;
    }
    return i;
}
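The same hash-set idea can be sketched in Python with the built-in set type. The function name is just illustrative, and this is a minimal version of the approach, not a tuned implementation.

```python
def first_missing_positive(nums):
    # Keep only the positive values; membership tests on a set are O(1) on average.
    seen = {x for x in nums if x > 0}
    candidate = 1
    # The loop runs at most len(seen) + 1 times, so the whole thing is O(N) on average.
    while candidate in seen:
        candidate += 1
    return candidate
```

It trades O(N) extra memory for not having to sort or modify the input.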
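Solution in basic JavaScript (O(N^2) time, no extra memory):
var a = [1, 3, 6, 4, 1, 2];
function findSmallest(a) {
  // Try each candidate 1..a.length in turn.
  for (var i = 1; i <= a.length; i++) {
    var found = false;
    for (var j = 0; j < a.length; j++) {
      if (i === a[j]) {
        found = true;
        break;
      }
    }
    if (!found) {
      return i;
    }
  }
  // Every value 1..a.length is present, so the answer is the next one.
  return a.length + 1;
}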
console.log(findSmallest(a))
Hope this helps someone.
In Python. This is not the most efficient approach (each 'not in' test scans the whole list), but it is correct:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    # Only reached if every value in 1..MAX is present in A.
    return MAX + 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
A sort-based alternative (note that it sorts the input list in place):
def solution(A):
    A.sort()
    j = 1
    for elem in A:
        if j < elem:
            # Gap found: j is missing.
            break
        elif j == elem:
            j += 1
    return j
This can help:
0- A is [5, 3, 2, 7];
1- Define B with Length = A.Length; (O(1))
2- Initialize B cells with 1; (O(n))
3- For each item in A:
if (0 <= item < B.Length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1 (O(n)); if every cell is -1, the answer is B.Length.
For the example A, only B[2] and B[3] get marked, so the answer is 0.
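The steps above can be sketched in Python. This assumes the intent of step 3 is to mark B[item] only when item is a valid index of B, and that the answer is B.Length when every cell ends up marked; both are my reading of the steps rather than something the answer spells out. The function name is made up, and booleans stand in for the 1 / -1 markers.

```python
def smallest_missing(a):
    # Steps 1-2: one marker per possible answer 0..len(a)-1, all unmarked.
    marked = [False] * len(a)
    # Step 3: mark every value that is a valid index; ignore the rest.
    for item in a:
        if 0 <= item < len(marked):
            marked[item] = True
    # Step 4: the smallest unmarked index is the answer.
    for index, is_marked in enumerate(marked):
        if not is_marked:
            return index
    # Assumed fallback: all of 0..len(a)-1 are present.
    return len(marked)
```

Values outside the index range can be ignored because a list of n items can cover at most the n values 0..n-1, so the answer never exceeds n.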

Resources