Algorithm to find the duplicate numbers in an array ---Fastest Way - c

I need the fastest and simplest algorithm that finds the duplicate numbers in an array; it should also report how many times each duplicate occurs.
Eg: if the array is {2,3,4,5,2,4,6,2,4,7,3,8,2}
I should be able to know that there are four 2's, two 3's and three 4's.

Make a hash table where the key is the array item and the value is a counter of how many times that item has occurred in the array. This is an efficient way to do it (expected linear time), but probably not the fastest.
Something like this (in pseudocode). You will find plenty of hash map implementations for C by googling.
hash_map = create_new_hash_map()
for item in array {
if hash_map.contains_key(item){
counter = hash_map.get(item)
} else {
counter = 0
}
counter = counter + 1
hash_map.put(item, counter)
}
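The pseudocode above maps almost directly onto Python's `collections.Counter`. This is a minimal sketch rather than a tuned implementation; the function name `count_duplicates` is made up for illustration.

```python
from collections import Counter

def count_duplicates(values):
    """Return {value: count} for every value that occurs more than once."""
    counts = Counter(values)  # one pass over the input, hash-based
    return {value: n for value, n in counts.items() if n > 1}

print(count_duplicates([2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2]))
# prints {2: 4, 3: 2, 4: 3}
```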

This can be solved elegantly using Linq:
public static void Main(string[] args)
{
List<int> list = new List<int> { 2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2 };
var grouping = list
.GroupBy(x => x)
.Select(x => new { Item = x.Key, Count = x.Count()});
foreach (var item in grouping)
Console.WriteLine("Item {0} has count {1}", item.Item, item.Count);
}
Internally it probably uses hashing to partition the list, but the code hides the internal details - here we are only telling it what to calculate. The compiler / runtime is free to choose how to calculate it, and optimize as it sees fit. Thanks to Linq this same code will run efficiently whether run on a list in memory, or if the list is in a database. In real code you should use this, but I guess you want to know how it works internally.
A more imperative approach that demonstrates the actual algorithm is as follows:
List<int> list = new List<int> { 2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2 };
Dictionary<int, int> counts = new Dictionary<int, int>();
foreach (int item in list)
{
if (!counts.ContainsKey(item))
{
counts[item] = 1;
}
else
{
counts[item]++;
}
}
foreach (KeyValuePair<int, int> item in counts)
Console.WriteLine("Item {0} has count {1}", item.Key, item.Value);
Here you can see that we iterate over the list only once, keeping a count for each item we see on the way. This would be a bad idea if the items were in a database though, so for real code, prefer to use the Linq method.

Here's a C version that reads the numbers from the command line; its running time is linear in the length of the input (beware, the number of parameters on the command line is limited...) but it should give you an idea of how to proceed:
#include <stdio.h>
#include <stdlib.h> /* atoi */
int main ( int argc, char **argv ) {
int dups[10] = { 0 };
int i;
for ( i = 1 ; i < argc ; i++ ) {
int v = atoi(argv[i]);
if ( v >= 0 && v < 10 ) /* ignore values that do not fit in dups[] */
dups[v]++;
}
for ( i = 0 ; i < 10 ; i++ )
printf("%d: %d\n", i, dups[i]);
return 0;
}
example usage:
$ gcc -o dups dups.c
$ ./dups 0 0 3 4 5
0: 2
1: 0
2: 0
3: 1
4: 1
5: 1
6: 0
7: 0
8: 0
9: 0
caveats:
if you also plan to count the number of 10s, 11s, and so on, the dups[] array must be bigger
reading from an array of integers and determining the positions of the duplicates is left as an exercise

The more you tell us about the input arrays, the faster we can make the algorithm. For example, with single-digit numbers as in your example, creating an array of 10 elements (indexed 0 to 9) and incrementing the element whose index equals each number you read is likely to be faster than hashing. (I say likely because I haven't done any measurements and won't.)
I agree with most respondents that hashing is probably the right approach for the most general case, but it's always worth thinking about whether yours is a special case.
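A minimal sketch of the special case described above, assuming every value is an integer in a known small range. The function name `count_small_range` and the `upper` parameter are illustrative choices, not part of the original answer.

```python
def count_small_range(values, upper=10):
    """Count occurrences of integers known to lie in range(upper)."""
    counts = [0] * upper
    for v in values:
        if not 0 <= v < upper:
            raise ValueError(f"{v} is outside 0..{upper - 1}")
        counts[v] += 1  # the value itself is the index, no hashing needed
    return counts

counts = count_small_range([2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2])
for value, n in enumerate(counts):
    if n > 1:
        print(value, n)
```

Whether this actually beats a hash table depends on the data and the language, so it is worth measuring before relying on it.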

If you know the lower and upper bounds, and they are not too far apart, this would be a good place to use a Radix Sort. Since this smells of homework, I'm leaving it to the OP to read the article and implement the algorithm.

If you don't want to use a hash table or something like that, just sort the array and then count the number of occurrences; something like the following should work (it assumes the array is not empty):
Arrays.sort(array);
lastOne = array[0];
count = 0;
for (i = 0; i < array.length; i++)
{
if (array[i] == lastOne)
count++;
else
{
print(lastOne + " has " + count + " occurrences");
lastOne = array[i];
count = 1;
}
}
print(lastOne + " has " + count + " occurrences");

If the range of the numbers is known and small, you could use an array to keep track of how many times you've seen each (this is a bucket sort in essence). If the range is big, you can sort the array and then count duplicates, as they will be next to each other.

option 1: hash it.
option 2: sort it and then count consecutive runs.
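Option 2 can be sketched in a few lines with `itertools.groupby`, which groups consecutive equal items of an already sorted sequence. The function name `run_counts` is an illustrative choice.

```python
from itertools import groupby

def run_counts(values):
    """Sort, then count each run of equal neighbours: O(n log n) overall."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted(values))]

print(run_counts([2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2]))
# prints [(2, 4), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1), (8, 1)]
```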

You can use a hash table with each element value as a key: insert a key with a count of 1 the first time you see it, and increment the count each time the key already exists.

Using hash tables / associative arrays / dictionaries (all the same thing but the terminology changes between programming environments) is the way to go.
As an example in python:
numberList = [1, 2, 3, 2, 1, ...]
countDict = {}
for value in numberList:
    countDict[value] = countDict.get(value, 0) + 1
# Now countDict maps each value to its count
Similar constructions exist in most programming languages.

> I need the fastest and simple algorithm which finds the duplicate numbers in an array, also should be able to know the number of duplicates.
I think the fastest algorithm is counting the duplicates in an array:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#include <assert.h>
typedef int arr_t;
typedef unsigned char dup_t;
const dup_t dup_t_max=UCHAR_MAX;
dup_t *count_duplicates( arr_t *arr, arr_t min, arr_t max, size_t arr_len ){
assert( min <= max );
dup_t *dup = calloc( max-min+1, sizeof(dup[0]) );
for( size_t i=0; i<arr_len; i++ ){
assert( min <= arr[i] && arr[i] <= max && dup[ arr[i]-min ] < dup_t_max );
dup[ arr[i]-min ]++;
}
return dup;
}
int main(void){
arr_t arr[] = {2,3,4,5,2,4,6,2,4,7,3,8,2};
size_t arr_len = sizeof(arr)/sizeof(arr[0]);
arr_t min=0, max=16;
dup_t *dup = count_duplicates( arr, min, max, arr_len );
printf( " value count\n" );
printf( " -----------\n" );
for( size_t i=0; i<(size_t)(max-min+1); i++ ){
if( dup[i] ){
printf( "%5i %5i\n", (int)(i+min), (int)(dup[i]) );
}
}
free(dup);
}
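Note: You cannot use this algorithm on every array: it needs the minimum and maximum values in advance, allocates max-min+1 counters, and with dup_t defined as unsigned char it can count at most UCHAR_MAX occurrences of each value.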

The code first sorts the array and then moves unique elements to the front, keeping track of the number of elements. It's slower than using bucket sort, but more convenient.
#include <stdio.h>
#include <stdlib.h>
static int cmpi(const void *p1, const void *p2)
{
int i1 = *(const int *)p1;
int i2 = *(const int *)p2;
return (i1 > i2) - (i1 < i2);
}
size_t make_unique(int values[], size_t count, size_t *occ_nums)
{
if(!count) return 0;
qsort(values, count, sizeof *values, cmpi);
size_t top = 0;
int prev_value = values[0];
if(occ_nums) occ_nums[0] = 1;
size_t i = 1;
for(; i < count; ++i)
{
if(values[i] != prev_value)
{
++top;
values[top] = prev_value = values[i];
if(occ_nums) occ_nums[top] = 1;
}
else if(occ_nums) ++occ_nums[top]; /* occ_nums may be NULL */
}
return top + 1;
}
int main(void)
{
int values[] = { 2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2 };
size_t occ_nums[sizeof values / sizeof *values];
size_t unique_count = make_unique(
values, sizeof values / sizeof *values, occ_nums);
size_t i = 0;
for(; i < unique_count; ++i)
{
printf("number %i occurred %u time%s\n",
values[i], (unsigned)occ_nums[i], occ_nums[i] > 1 ? "s": "");
}
}

There is an "algorithm" that I use all the time to find duplicate lines in a file in Unix:
sort file | uniq -d
If you implement the same strategy in C, then it is very difficult to beat it with a fancier strategy such as hash tables. Call a sorting algorithm, and then call your own function to detect duplicates in the sorted list. The sorting algorithm takes O(n*log(n)) time and the uniq function takes linear time. (Southern Hospitality makes a similar point, but I want to emphasize that what he calls "option 2" seems both simpler and faster than the more popular hash tables suggestion.)
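As a rough illustration of the same sort-then-scan strategy (in Python rather than C, and with the made-up name `sorted_duplicates`), the linear pass only has to compare each element with its neighbour:

```python
def sorted_duplicates(values):
    """Return each value that occurs more than once, like `sort | uniq -d`."""
    ordered = sorted(values)                      # O(n log n)
    dups = []
    for prev, cur in zip(ordered, ordered[1:]):   # O(n) scan of neighbours
        if prev == cur and (not dups or dups[-1] != cur):
            dups.append(cur)                      # report each duplicate once
    return dups

print(sorted_duplicates([2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2]))
# prints [2, 3, 4]
```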

Counting sort is the answer to the above question. If you look at the algorithm for counting sort, you will find that it keeps an array holding the count of each element i present in the original array.

Here is another solution but it takes O(nlogn) time.
Use Divide and Conquer approach to sort the given array using merge sort.
During combine step in merge sort, find the duplicates by comparing the elements in the two sorted sub-arrays.
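One way to read the idea above is to run merge sort over (value, count) pairs and add the counts whenever the combine step meets the same value in both halves. The sketch below is an interpretation under that assumption, with the hypothetical name `merge_count`.

```python
def merge_count(values):
    """Merge sort over (value, count) pairs; equal values are combined while merging."""
    if len(values) <= 1:
        return [(v, 1) for v in values]
    mid = len(values) // 2
    left, right = merge_count(values[:mid]), merge_count(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            merged.append(left[i])
            i += 1
        elif left[i][0] > right[j][0]:
            merged.append(right[j])
            j += 1
        else:  # same value in both halves: add the counts
            merged.append((left[i][0], left[i][1] + right[j][1]))
            i += 1
            j += 1
    return merged + left[i:] + right[j:]

print(merge_count([2, 3, 4, 5, 2, 4, 6, 2, 4, 7, 3, 8, 2]))
# prints [(2, 4), (3, 2), (4, 3), (5, 1), (6, 1), (7, 1), (8, 1)]
```

Each half holds distinct values in sorted order, so a value can match at most one entry on the other side.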

Related

Generate list of remaining numbers

Given a number n, and an array with size m where m<n. Provided that each number in the array is between 0 and n-1 (inclusive), I want to get as efficiently as possible the list of n-m numbers from 0 to n-1 which aren't in the array.
That's how I'm doing it (in pseudocode), but it feels rather inefficient and I'm wondering if there's a better way:
int[] remaining (int[] assigned) {
Set<int> s
int[n-m] remaining
add each int in assigned to s
for(i = 0 to n-1)
if(not s.contains(i)) remaining.add(i);
}
This isn't any particular computer language but it should be illustrative. We'll assume that accessing an array is of course O(1) and that adding to or checking a set is O(log(n)), as it would be for an AVL-based set. So basically I'm trying to get this in linear time, instead of O(n·log n) as it is now, but if the initial array isn't sorted I don't know how to go about it, or if it's even possible.
copy the array into a hashmap H. This takes O(m).
for i from 0 to n-1
if(H.ispresent(i) == FALSE)
output i
This for loop takes O(n).
As n>=m the overall complexity is O(n)
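The hash-based steps above can be sketched with a Python `set`, whose membership test takes expected constant time. The function name `remaining` mirrors the question's pseudocode, and `n` is passed explicitly here as an assumption about the interface.

```python
def remaining(assigned, n):
    """Return the numbers in range(n) that do not appear in `assigned`."""
    present = set(assigned)                           # O(m) expected
    return [i for i in range(n) if i not in present]  # O(n) expected

print(remaining([3, 0, 5], 7))
# prints [1, 2, 4, 6]
```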
I think this would be a little faster (also pseudocode):
int[] remaining (int[] assigned) {
int[n] all // initialised to 0
int[n-m] remaining
for(i = 0 to m-1)
all[assigned[i]]=-1 // mark every assigned number
int counter=0
for(i = 0 to n-1)
if (all[i]!=-1) { // i was never marked, so it is missing
remaining[counter]=i
counter++
}
return remaining
}
The bitset (bit array) idea:
#include <iostream>
#include <fstream>
#include <bitset>
const int SIZE = 10; // for example
int main() {
std::bitset<SIZE> bs;
int i;
std::ifstream fin("numbers.txt");
while (fin >> i)
bs.set(i);
fin.close();
for (i = 0; i < SIZE; ++i)
if (!bs[i])
std::cout << i << '\n';
return 0;
}
If you have to find 1 or 2 missing numbers, you can always use the sum and/or the product of the numbers to figure out the missing numbers. If it is more than 2
Code for using a BitSet in Java to find the missing numbers.
public List<Integer> findMissingNumbers(List<Integer> input,int maxNum){
/* You can also iterate through the list and find maxNum later. The BitSet grows as needed.
*/
if(input==null || input.size()==0)
return null;
BitSet existSet=new BitSet(maxNum);
for(int val:input){
existSet.set(val);
}
List<Integer> missingNum=new ArrayList<Integer>();
// nextClearBit(i) returns the first unset index >= i.
// Assumption: the valid values are 0..maxNum-1, so stop at maxNum.
for(int i=existSet.nextClearBit(0);i<maxNum;i=existSet.nextClearBit(i+1)){
missingNum.add(i);
}
return missingNum;
}

Sort an increasing array

The pseudo codes:
S = {};
Loop 10000 times:
u = unsorted_fixed_size_array_producer();
S = sort(S + u);
I need an efficient implementation of sort that takes a sorted array and an unsorted one and sorts them together. We know that after a few iterations size(S) will be much bigger than size(u); that's a prior.
Update: There's another prior: the size of u is known, say 10 or 20, and the number of loop iterations is also known.
Update: I implemented the algorithm that #Dukelnig advised in C https://gist.github.com/blackball/bd7e5619a1e83bd985a3 which fits for my needs. Thanks!
Sort u, then merge S and u.
Merging simply involves iterating through two sorted arrays at the same time, and picking the smaller element and incrementing that iterator at each step.
The running time is O(|u| log |u| + |S|).
This is very similar to the merge step of merge sort, so the argument that it produces a sorted array can be derived from there.
Some Java code for merge, derived from Wikipedia: (the C code wouldn't look all that different)
static void merge(int S[], int u[], int newS[])
{
int iS = 0, iu = 0;
for (int j = 0; j < S.length + u.length; j++)
if (iS < S.length && (iu >= u.length || S[iS] <= u[iu]))
newS[j] = S[iS++]; // Increment iS after using it as an index
else
newS[j] = u[iu++]; // Increment iu after using it as an index
}
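For comparison, the whole loop from the question can be sketched in Python with `heapq.merge`, which lazily merges already sorted inputs. The batches below are made-up stand-ins for `unsorted_fixed_size_array_producer()`, and `add_batch` is a hypothetical helper name.

```python
import heapq

def add_batch(S, u):
    """Return a new sorted list containing sorted S plus the unsorted batch u."""
    return list(heapq.merge(S, sorted(u)))  # sort the small batch, then merge

S = []
for batch in ([5, 1, 9], [22, 13], [0, 10, 11, 15]):
    S = add_batch(S, batch)
print(S)  # prints [0, 1, 5, 9, 10, 11, 13, 15, 22]
```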
This can also be done in-place (in S, assuming it has enough additional space) by going from the back.
Here's some working Java code that does this:
static void mergeInPlace(int S[], int SLength, int u[])
{
int iS = SLength-1, iu = u.length-1;
for (int j = SLength + u.length - 1; j >= 0; j--)
if (iS >= 0 && (iu < 0 || S[iS] >= u[iu]))
S[j] = S[iS--];
else
S[j] = u[iu--];
}
public static void main(String[] args)
{
int[] S = {1,5,9,13,22, 0,0,0,0}; // 4 additional spots reserved here
int[] u = {0,10,11,15};
mergeInPlace(S, 5, u);
// prints [0, 1, 5, 9, 10, 11, 13, 15, 22]
System.out.println(Arrays.toString(S));
}
To reduce the number of comparisons, we can also use binary search (although the time complexity would remain the same - this can be useful when comparisons are expensive).
// returns the index of the first element in S before SLength greater than value,
// or returns SLength if no such element exists
static int binarySearch(int S[], int SLength, int value) { ... }
static void mergeInPlaceBinarySearch(int S[], int SLength, int u[])
{
int iS = SLength-1;
int iNew = SLength + u.length - 1;
for (int iu = u.length-1; iu >= 0; iu--)
{
if (iS >= 0)
{
int index = binarySearch(S, iS+1, u[iu]);
for ( ; iS >= index; iS--)
S[iNew--] = S[iS];
}
S[iNew--] = u[iu];
}
// assert (iS == iNew): all of u has been placed, so the rest of S is already in position
for ( ; iS >= 0; iS--)
S[iNew--] = S[iS];
}
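The `binarySearch` helper above is left elided. As a separate illustration, Python's standard `bisect_right` plays the same role (index of the first element greater than the value). Note the differences: this sketch builds a new list front to back instead of merging in place from the back, and `merge_with_bisect` is an invented name.

```python
from bisect import bisect_right

def merge_with_bisect(S, u):
    """Merge sorted S with sorted u, locating each insertion point by binary search."""
    result, start = [], 0
    for value in u:
        # index of the first element of S (from `start` on) greater than value
        idx = bisect_right(S, value, start)
        result.extend(S[start:idx])  # copy the whole block that precedes value
        result.append(value)
        start = idx
    result.extend(S[start:])
    return result

print(merge_with_bisect([1, 5, 9, 13, 22], [0, 10, 11, 15]))
# prints [0, 1, 5, 9, 10, 11, 13, 15, 22]
```

The number of comparisons is about |u| log |S|, while the copying is still linear in |S| + |u|.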
If S doesn't have to be an array
The above assumes that S has to be an array. If it doesn't, something like a binary search tree might be better, depending on how large u and S are.
The running time would be O(|u| log |S|) - just substitute some values to see which is better.
If you really really have to use a literal array for S at all times, then the best approach would be to individually insert the new elements into the already sorted S. I.e. basically use the classic insertion sort technique for each element in each new batch. This will be expensive in a sense that insertion into an array is expensive (you have to move the elements), but that's the price of having to use an array for S.
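A small sketch of that per-element insertion, using `bisect.insort` from the Python standard library. The helper name `insert_batch` is hypothetical.

```python
from bisect import insort

def insert_batch(S, u):
    """Insert each element of u into the sorted list S, in place."""
    for value in u:
        insort(S, value)  # binary search for the slot, then an O(len(S)) shift
    return S

print(insert_batch([1, 5, 9, 13, 22], [15, 0, 11, 10]))
# prints [0, 1, 5, 9, 10, 11, 13, 15, 22]
```

The shift is what makes each insertion linear in the size of S, which is the cost the paragraph above refers to.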
So if the size of S is much more than the size of u, isn't what you want simply an efficient sort for a mostly sorted array? Traditionally this would be insertion sort. But you will only know the real answer by experimentation and measurement - try different algorithms and pick the best one. Without actually running your code (and perhaps more importantly, with your data), you cannot reliably predict performance, even with something as well studied as sorting algorithms.
Say we have a big sorted list of size n and a little sorted list of size k.
Binary search, starting from the end (position n-1, n-2, n-4, etc.), for the insertion point of the largest element of the smaller list. Shift the tail end of the larger list k elements to the right, insert the largest element of the smaller list, then repeat with the next-largest element.
So if we have the lists [1,2,4,5,6,8,9] and [3,7], we will do:
[1,2,4,5,6, , ,8,9]
[1,2,4,5,6, ,7,8,9]
[1,2, ,4,5,6,7,8,9]
[1,2,3,4,5,6,7,8,9]
But I would advise you to benchmark just concatenating the lists and sorting the whole thing before resorting to interesting merge procedures.

Calculate all possibilities to get N using values from a given set [duplicate]

So here is the problem:
Given input = [100 80 66 25 4 2 1], I need to find the best combination to give me 50.
Looking at this, the best would be 25+25 = 50, so I need 2 elements from the array.
Other combinations include 25+4+4+4+4+4+4+1 and 25+4+4+4+4+4+2+2+1.. etc etc
I need to find all the possibilities which gives me the sum on a value I want.
EDIT: As well as the best possibility (one with least number of terms)
Here is what I have done thus far:
First build a new array (a simple for loop that cycles through all elements and stores them in a new temp array), discarding all elements higher than my target (so for input 50, the elements 100, 80, 66 are higher, so discard them, and my new array is [25 4 2 1]). Then, from this, I need to check combinations.
The first thing I do is a simple if statement checking if any array elements EXACTLY match the number I want. So if I want 50, I check if 50 is in the array, if not, I need to find combinations.
My problem is, I'm not entirely sure how to find every single combination. I have been struggling trying to come up with an algorithm for a while but I always just end up getting stumped.
Any help/tips would be much appreciated.
PS - we can assume the array is always sorted in order from LARGEST to SMALLEST value.
This is the kind of problem that dynamic programming is meant to solve.
Create an array with indices 1 to 50. Set each entry to -1 (meaning "not reachable yet"). For each element that is in your input array, set the entry at that index to 0 (no additions needed). Then, for each integer n = 2 to 50, try every way to split n into two addends whose entries are both non-negative. The number of additions required for n is the sum of the two addends' entries plus 1; keep the minimum over all splits. At the end, read the entry at index 50.
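A minimal dynamic-programming sketch for the "fewest terms" part of the question. It uses the common one-value-at-a-time formulation (extend a best sum for `t - v` by one value `v`) rather than the two-addend split described above, and the names `fewest_terms`, `best` and `last` are illustrative.

```python
def fewest_terms(values, target):
    """Return a shortest list of values (repeats allowed) summing to target, or None."""
    best = [None] * (target + 1)   # best[t] = fewest terms needed to reach t
    last = [None] * (target + 1)   # last[t] = a value used last in such a sum
    best[0] = 0
    for t in range(1, target + 1):
        for v in values:
            if 0 < v <= t and best[t - v] is not None:
                if best[t] is None or best[t - v] + 1 < best[t]:
                    best[t], last[t] = best[t - v] + 1, v
    if best[target] is None:
        return None
    terms = []
    while target > 0:              # walk back through the recorded choices
        terms.append(last[target])
        target -= last[target]
    return terms

print(fewest_terms([100, 80, 66, 25, 4, 2, 1], 50))  # prints [25, 25]
```

The table has target + 1 entries and each is filled by one pass over the values, so the work is proportional to target times the number of values.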
Edit: Due to a misinterpretation of the question, I first answered with an efficient way to calculate the number of possibilities (instead of the possibilities themselves) to get N using values from a given set. That solution can be found at the bottom of this post as a reference for other people, but first I'll give a proper answer to your questions.
Generate all possibilities, count them and give the shortest one
When generating a solution, you consider each element from the input array and ask yourself "should I use this in my solution or not?". Since we don't know the answer until after the calculation, we'll just have to try out both using it and not using it, as can be seen in the recursion step in the code below.
Now, to avoid duplicates and misses, we need to be a bit careful with the parameters for the recursive call. If we use the current element, we should also allow it to be used in the next step, because the element may be used as many times as possible. Therefore, the first parameter in this recursive call is i. However, if we decide to not use the element, we should not allow it to be used in the next step, because that would be a duplicate of the current step. Therefore, the first parameter in this recursive call is i+1.
I added an optional bound (from "branch and bound") to the algorithm, which will stop expanding the current partial solution if it is known that this solution will never be shorter than the shortest solution found so far.
package otherproblems;
import java.util.Deque;
import java.util.LinkedList;
public class GeneratePossibilities
{
// Input
private static int n = 50;
// If the input array is sorted ascending, the shortest solution is
// likely to be found somewhere at the end.
// If the input array is sorted descending, the shortest solution is
// likely to be found somewhere in the beginning.
private static int[] input = {100, 80, 66, 25, 4, 2, 1};
// Shortest possibility
private static Deque<Integer> shortest;
// Number of possibilities
private static int numberOfPossibilities;
public static void main(String[] args)
{
calculate(0, n, new LinkedList<Integer>());
System.out.println("\nAbove you can see all " + numberOfPossibilities +
" possible solutions,\nbut this one's the shortest: " + shortest);
}
public static void calculate(int i, int left, Deque<Integer> partialSolution)
{
// If there's nothing left, we reached our target
if (left == 0)
{
System.out.println(partialSolution);
if (shortest == null || partialSolution.size() < shortest.size())
shortest = new LinkedList<Integer>(partialSolution);
numberOfPossibilities++;
return;
}
// If we overshot our target, by definition we didn't reach it
// Note that this could also be checked before making the
// recursive call, but IMHO this gives a cleaner recursion step.
if (left < 0)
return;
// If there are no values remaining, we didn't reach our target
if (i == input.length)
return;
// Uncomment the next two lines if you don't want to keep generating
// possibilities when you know it can never be a better solution than
// the one you have now.
// if (shortest != null && partialSolution.size() >= shortest.size())
// return;
// Pick value i. Note that we are allowed to pick it again,
// so the argument to calculate(...) is i, not i+1.
partialSolution.addLast(input[i]);
calculate(i, left-input[i], partialSolution);
// Don't pick value i. Note that we are not allowed to pick it after
// all, so the argument to calculate(...) is i+1, not i.
partialSolution.removeLast();
calculate(i+1, left, partialSolution);
}
}
Calculate the number of possibilities efficiently
This is a nice example of dynamic programming. What you need to do is figure out how many possibilities there are to form the number x, using value y as the last addition and using only values smaller than or equal to y. This gives you a recursive formula that you can easily translate to a solution using dynamic programming. I'm not quite sure how to write down the mathematics here, but since you weren't interested in them anyway, here's the code to solve your question :)
import java.util.Arrays;
public class Possibilities
{
public static void main(String[] args)
{
// Input
int[] input = {100, 80, 66, 25, 4, 2, 1};
int n = 50;
// Prepare input
Arrays.sort(input);
// Allocate storage space
long[][] m = new long[n+1][input.length];
for (int i = 1; i <= n; i++)
for (int j = 0; j < input.length; j++)
{
// input[j] cannot be the last value used to compose i
if (i < input[j])
m[i][j] = 0;
// If input[j] is the last value used to compose i,
// it must be the only value used in the composition.
else if (i == input[j])
m[i][j] = 1;
// If input[j] is the last value used to compose i,
// we need to know the number of possibilities in which
// i - input[j] can be composed, which is the sum of all
// entries in column m[i-input[j]].
// However, to avoid counting duplicates, we only take
// combinations that are composed of values equal or smaller
// to input[j].
else
for (int k = 0; k <= j; k++)
m[i][j] += m[i-input[j]][k];
}
// Nice output of intermediate values:
int digits = 3;
System.out.printf(" %"+digits+"s", "");
for (int i = 1; i <= n; i++)
System.out.printf(" %"+digits+"d", i);
System.out.println();
for (int j = 0; j < input.length; j++)
{
System.out.printf(" %"+digits+"d", input[j]);
for (int i = 1; i <= n; i++)
System.out.printf(" %"+digits+"d", m[i][j]);
System.out.println();
}
// Answer:
long answer = 0;
for (int i = 0; i < input.length; i++)
answer += m[n][i];
System.out.println("\nThe number of possibilities to form "+n+
" using the numbers "+Arrays.toString(input)+" is "+answer);
}
}
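The same count can also be obtained with a one-dimensional table. This is a compact alternative sketch, not a translation of the Java above: it introduces the values one at a time so that reorderings of the same combination are not counted twice, and it assumes the values are distinct positive integers. The name `count_combinations` is made up.

```python
def count_combinations(values, target):
    """Count the multisets of values (repeats allowed, order ignored) summing to target."""
    ways = [0] * (target + 1)
    ways[0] = 1                          # one way to make 0: use nothing
    for v in values:                     # outer loop over values, so each
        for t in range(v, target + 1):   # combination is counted once
            ways[t] += ways[t - v]
    return ways[target]

print(count_combinations([1, 2, 4], 4))
# prints 4  (4, 2+2, 2+1+1, 1+1+1+1)
```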
This is the integer knapsack problem, which is one of the most common NP-complete problems out there; if you are into algorithm design/study, check those out. To find the best combination I think you have no choice but to compute them all and keep the smallest one.
For the correct solution there is a recursive algorithm that is pretty simple to put together.
import org.apache.commons.lang.ArrayUtils;
import java.util.*;
public class Stuff {
private final int target;
private final int[] steps;
public Stuff(int N, int[] steps) {
this.target = N;
this.steps = Arrays.copyOf(steps, steps.length);
Arrays.sort(this.steps);
ArrayUtils.reverse(this.steps);
// memoization table not implemented here; see the note on dynamic programming below
}
public List<Integer> solve() {
return solveForN(target);
}
private List<Integer> solveForN(int N) {
if (N == 0) {
return new ArrayList<Integer>();
} else if (N > 0) {
List<Integer> temp, min = null;
for (int i = 0; i < steps.length; i++) {
temp = solveForN(N - steps[i]);
if (temp != null) {
temp.add(steps[i]);
if (min == null || min.size() > temp.size()) {
min = temp;
}
}
}
return min;
} else {
return null;
}
}
}
It is based on the fact that to "get to N" you have to have come from N - steps[0], or N - steps[1], ...
Thus you start from your target total N and subtract one of the possible steps, and do it again until you are at 0 (return a List to specify that this is a valid path) or below (return null so that you cannot return an invalid path).
The complexity of this correct solution is exponential! Which is REALLY bad! Something like O(k^M) where M is the size of the steps array and k a constant.
To get a solution to this problem in less time than that you will have to use a heuristic (approximation) and you will always have a certain probability to have the wrong answer.
You can make your own implementation faster by memoizing the shortest combination found so far for every target (so you do not need to recompute solveForN(N) if you already did). This approach is called dynamic programming. I will let you do that on your own (very fun stuff and really not that complicated).
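For readers who want to see the shape of that memoization, here is a hedged Python sketch using `functools.lru_cache`. It is not the Java class above, and the names `shortest_combination` and `solve` are invented for illustration.

```python
from functools import lru_cache

def shortest_combination(steps, target):
    """Return a shortest tuple of steps (repeats allowed) summing to target, or None."""
    steps = tuple(s for s in steps if s > 0)   # ignore non-positive steps

    @lru_cache(maxsize=None)
    def solve(n):
        if n == 0:
            return ()
        best = None
        for s in steps:
            if s <= n:
                sub = solve(n - s)   # answered from the cache after the first call
                if sub is not None and (best is None or len(sub) + 1 < len(best)):
                    best = sub + (s,)
        return best

    return solve(target)

print(shortest_combination([100, 80, 66, 25, 4, 2, 1], 50))  # prints (25, 25)
```

Each target from 0 to N is solved once, so the work is roughly N times the number of steps; the recursion depth grows with N, which matters for very large targets.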
Constraints of this solution : You will only find the solution if you guarantee that the input array (steps) is sorted in descending order and that you go through it in that order.
Here is a link to the general Knapsack problem if you also want to look approximation solutions: http://en.wikipedia.org/wiki/Knapsack_problem
You need to solve each sub-problem and store the solution. For example:
1 can only be 1. 2 can be 2 or 1+1. 4 can be 4 or 2+2 or 2+1+1 or 1+1+1+1. So you take each sub-solution and store it, so when you see 25=4+4+4+4+4+4+1, you already know that each 4 can also be represented as one of the 3 combinations.
Then you have to sort the digits and check to avoid duplicate patterns since, for example, (2+2)+(2+2)+(2+2)+(1+1+1+1)+(1+1+1+1)+(1+1+1+1) == (2+1+1)+(2+1+1)+(2+1+1)+(2+1+1)+(2+1+1)+(2+1+1). Six 2's and twelve 1's in both cases.
Does that make sense?
Recursion should be the easiest way to solve this (Assuming you really want to find all the solutions to the problem). The nice thing about this approach is, if you want to just find the shortest solution, you can add a check on the recursion and find just that, saving time and space :)
Assuming an element i of your array is part of the solution, you can solve the subproblem of finding the elements that sums to n-i. If we add an ordering to our solution, for example the numbers in the sum must be from the greater to the smallest, we have a way to find unique solutions.
This is a recursive solution in C#, it should be easy to translate it in java.
public static void RecursiveSum(int n, int index, List<int> lst, List<int> solution)
{
// Target reached: print this solution exactly once and stop recursing.
if (n == 0)
{
Console.WriteLine("");
foreach (int j in solution)
{
Console.Write(j + " ");
}
return;
}
for (int i = index; i < lst.Count; i++)
{
if (n - lst[i] >= 0)
{
List<int> tmp = new List<int>(solution);
tmp.Add(lst[i]);
RecursiveSum(n - lst[i], i, lst, tmp);
}
}
}
You call it with
RecursiveSum(N,0,list,new List<int>());
where N is the sum you are looking for, 0 shouldn't be changed, list is your list of allowed numbers, and the last parameter shouldn't be changed either.
The problem you pose is interesting but very complex. I'd approach this by using something like OptaPlanner(formerly Drools Planner). It's difficult to describe a full solution to this problem without spending significant time, but with optaplanner you can also get "closest fit" type answers and can have incremental "moves" that would make solving your problem more efficient. Good luck.
This is a solution in Python:
# Start of tsum function
def tsum(start, total, input, record, n):
    if total == N:
        # Print the selected elements one per line, then a blank line
        for i in range(0, n):
            if record[i]:
                print(input[i])
        print("")
        return
    for i in range(start, n):
        if total + input[i] > N:
            continue
        # Skip a repeated value unless the previous copy was picked
        # (this avoids printing the same solution twice)
        if i > 0 and input[i] == input[i-1] and not record[i-1]:
            continue
        record[i] = 1
        tsum(i + 1, total + input[i], input, record, n)
        record[i] = 0
# end of function
# Below portion will be main() in Java
record = []
N = 5
input = [3, 2, 2, 1, 1]
temp = list(set(input))
newlist = input  # same list object, so appending below also extends input
for i in range(0, len(temp)):
    val = N // temp[i]
    for j in range(0, val - input.count(temp[i])):
        newlist.append(temp[i])
input.sort(reverse=True)  # keep equal values adjacent for the duplicate check
# above logic was to create a newlist/input i.e [3, 2, 2, 1, 1, 1, 1, 1]
# This new list contains the maximum number of elements <= N
# for e.g appended three 1's as sum of new three 1's + existing two 1's <= N(5) where as
# did not append another 2 as 2+2+2 > N(5) or 3 as 3+3 > N(5)
l = len(input)
for i in range(0, l):
    record.append(0)
print("all possibilities to get N using values from a given set:")
tsum(0, 0, input, record, l)
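OUTPUT for the set [3, 2, 2, 1, 1], using a small set and a small N for demonstration purposes; it works for larger N as well. Each solution is printed one element per line (the blank lines that separate the solutions are not shown here).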
For N = 5
all possibilities to get N using values from a given set:
3
2
3
1
1
2
2
1
2
1
1
1
1
1
1
1
1
For N = 3
all possibilities to get N using values from a given set:
3
2
1
1
1
1
Isn't this just a search problem? If so, just search breadth-first.
abstract class Numbers {
abstract int total();
public static Numbers breadthFirst(int[] numbers, int total) {
List<Numbers> stack = new LinkedList<Numbers>();
if (total == 0) { return new Empty(); }
stack.add(new Empty());
while (!stack.isEmpty()) {
Numbers nums = stack.remove(0);
for (int i : numbers) {
if (i > 0 && total - nums.total() >= i) {
Numbers more = new SomeNumbers(i, nums);
if (more.total() == total) { return more; }
stack.add(more);
}
}
}
return null; // No answer.
}
}
class Empty extends Numbers {
int total() { return 0; }
public String toString() { return "empty"; }
}
class SomeNumbers extends Numbers {
final int total;
final Numbers prev;
SomeNumbers(int n, Numbers prev) {
this.total = n + prev.total();
this.prev = prev;
}
int total() { return total; }
public String toString() {
if (prev.getClass() == Empty.class) { return "" + total; }
return prev + "," + (total - prev.total());
}
}
What about using the greedy algorithm n times (n is the number of elements in your array), each time popping the largest element off the list. E.g. (in some random pseudo-code language):
array = [70 30 25 4 2 1]
value = 50
sort(array, descending)
solutions = [] // array of arrays
while length of array is non-zero:
    tmpValue = value
    thisSolution = []
    for each i in array:
        while tmpValue >= i:
            tmpValue -= i
            thisSolution.append(i)
    solutions.append(thisSolution)
    array.pop_first() // remove the largest entry from the array
If run with the set [70 30 25 4 2 1] and 50, it should give you a solutions array like this:
[[30 4 4 4 4 4]
[30 4 4 4 4 4]
[25 25]
[4 4 4 4 4 4 4 4 4 4 4 4 2]
[2 ... ]
[1 ... ]]
Then simply pick the element from the solutions array with the smallest length.
Update: The comment is correct that this does not generate the correct answer in all cases. The reason is that greedy isn't always right. The following recursive algorithm should always work:
array = [70, 30, 25, 4, 3, 1]
def findSmallest(value, array):
    minSolution = []
    tmpArray = list(array)
    while len(tmpArray):
        elem = tmpArray.pop(0)
        tmpValue = value
        cnt = 0
        # Try using elem once, twice, ... and solve the rest with the smaller elements
        while tmpValue >= elem:
            cnt += 1
            tmpValue -= elem
            subSolution = findSmallest(tmpValue, tmpArray)
            if tmpValue == 0 or subSolution:
                if not minSolution or len(subSolution) + cnt < len(minSolution):
                    minSolution = subSolution + [elem] * cnt
    return minSolution
print(findSmallest(10, array))
print(findSmallest(50, array))
print(findSmallest(49, array))
print(findSmallest(55, array))
Prints:
[3, 3, 4]
[25, 25]
[3, 4, 4, 4, 4, 30]
[30, 25]
The invariant is that the function returns either the smallest set for the value passed in, or an empty set. It can then be used recursively with all possible values of the previous numbers in the list. Note that this is O(n!) in complexity, so it's going to be slow for large values. Also note that there are numerous optimization potentials here.
I made a small program to help with one solution. Personally, I believe the best would be a deterministic mathematical solution, but right now I lack the caffeine to even think on how to implement it. =)
Instead, I went with a SAR approach. Stop and Reverse is a technique used in stock trading (http://daytrading.about.com/od/stou/g/SAR.htm), and is heavily used to calculate optimal curves with a minimum of inference. The Wikipedia entry for Parabolic SAR goes like this:
'The Parabolic SAR is calculated almost independently for each trend
in the price. When the price is in an uptrend, the SAR emerges below
the price and converges upwards towards it. Similarly, on a
downtrend, the SAR emerges above the price and converges
downwards.'
I adapted it to your problem. I start with a random value from your series. Then the code enters a finite number of iterations.
I pick another random value from the series stack.
If the new value plus the stack sum is less than the target, the value is added; if it is greater, the value is subtracted.
I can go on for as much as I want until I satisfy the condition (stack sum = target), or abort if the cycle can't find a valid solution.
If successful, I record the stack and the number of iterations. Then I redo everything.
Some EXTREMELY crude code follows. Please forgive the hastiness. Oh, and it's in C#. =)
Again, it does not guarantee that you'll obtain the optimal path; it's a brute-force approach. It can be refined, for example by detecting whether there's a perfect match for a target hit.
public static class SAR
{
//I'm considering Optimal as the smallest signature (number of members).
// Once set, all future signatures must be same or smaller.
private static Random _seed = new Random();
private static List<int> _domain = new List<int>() { 100, 80, 66, 24, 4, 2, 1 };
public static void SetDomain(string domain)
{
_domain = domain.Split(',').ToList<string>().ConvertAll<int>(a => Convert.ToInt32(a));
_domain.Sort();
}
public static void FindOptimalSAR(int value)
{
// I'll skip some obvious tests. For example:
// If there is no odd number in domain, then
// it's impossible to find a path to an odd
// value.
//Determining a max path run. If the count goes
// over this, it's useless to continue.
int _maxCycle = 10;
//Determining a maximum number of runs.
int _maxRun = 1000000;
int _run = 0;
int _domainCount = _domain.Count;
List<int> _currentOptimalSig = new List<int>();
List<String> _currentOptimalOps = new List<string>();
do
{
List<int> currSig = new List<int>();
List<string> currOps = new List<string>();
int _cycle = 0;
int _cycleTot = 0;
bool _OptimalFound = false;
do
{
int _cursor = _seed.Next(_domainCount);
currSig.Add(_cursor);
if (_cycleTot < value)
{
currOps.Add("+");
_cycleTot += _domain[_cursor];
}
else
{
// Your situation doesn't allow for negative
// numbers. Otherwise, just enable the two following lines.
// currOps.Add("-");
// _cycleTot -= _domain[_cursor];
}
if (_cycleTot == value)
{
_OptimalFound = true;
break;
}
_cycle++;
} while (_cycle < _maxCycle);
if (_OptimalFound)
{
_maxCycle = _cycle;
_currentOptimalOps = currOps;
_currentOptimalSig = currSig;
Console.Write("Optimal found: ");
for (int i = 0; i < currSig.Count; i++)
{
Console.Write(currOps[i]);
Console.Write(_domain[currSig[i]]);
}
Console.WriteLine(".");
}
_run++;
} while (_run < _maxRun);
}
}
And this is the caller:
String _Domain = "100, 80, 66, 25, 4, 2, 1";
SAR.SetDomain(_Domain);
Console.WriteLine("SAR for Domain {" + _Domain + "}");
do
{
Console.Write("Input target value: ");
int _parm = (Convert.ToInt32(Console.ReadLine()));
SAR.FindOptimalSAR(_parm);
Console.WriteLine("Done.");
} while (true);
This is my result after 100k iterations for a few targets, given a slightly modified series (I switched 25 for 24 for testing purposes):
SAR for Domain {100, 80, 66, 24, 4, 2, 1}
Input target value: 50
Optimal found: +24+24+2.
Done.
Input target value: 29
Optimal found: +4+1+24.
Done.
Input target value: 75
Optimal found: +2+2+1+66+4.
Optimal found: +4+66+4+1.
Done.
Now with your original series:
SAR for Domain {100, 80, 66, 25, 4, 2, 1}
Input target value: 50
Optimal found: +25+25.
Done.
Input target value: 75
Optimal found: +25+25+25.
Done.
Input target value: 512
Optimal found: +80+80+66+100+1+80+25+80.
Optimal found: +66+100+80+100+100+66.
Done.
Input target value: 1024
Optimal found: +100+1+80+80+100+2+100+2+2+2+25+2+100+66+25+66+100+80+25+66.
Optimal found: +4+25+100+80+100+1+80+1+100+4+2+1+100+1+100+100+100+25+100.
Optimal found: +80+80+25+1+100+66+80+80+80+100+25+66+66+4+100+4+1+66.
Optimal found: +1+100+100+100+2+66+25+100+66+100+80+4+100+80+100.
Optimal found: +66+100+100+100+100+100+100+100+66+66+25+1+100.
Optimal found: +100+66+80+66+100+66+80+66+100+100+100+100.
Done.
Cons: It is worth mentioning again: This algorithm does not guarantee that you will find the optimal values. It makes a brute-force approximation.
Pros: Fast. 100k iterations may initially seem a lot, but the algorithm starts ignoring long paths after it detects more and more optimized paths, since it lessens the maximum allowed number of cycles.

Weighted random number generation

I would like to generate weighted random numbers in an exact manner. Let me explain what I mean by exact with an example: my input array is [1, 2, 3] and the weights are also [1, 2, 3]. In that case I expect to see 1 once, 2 twice and 3 three times, like 3 -> 2 -> 3 -> 1 -> 3 -> 2...
I am implementing random number generation with rand() to get a value in the range [0, sum_of_weights), where sum_of_weights = 1 + 2 + 3 = 6 for the example above. I searched for existing solutions on the Internet, but the result is not what I want. Sometimes I get 2 more than twice and no 1 in the sequence. It's still weighted, but it doesn't give exactly the number of occurrences I expected.
I am not sure what's wrong with my code below. Am I doing something wrong, or should I try something totally different? Thanks for your answers.
int random_t (int items[], int items_weight[], int number_of_items)
{
double random_weight;
double sum_of_weight = 0;
int i;
/* Calculate the sum of weights */
for (i = 0; i < number_of_items; i++) {
sum_of_weight += items_weight[i];
}
/* Choose a random number in the range [0,1) */
srand(time(NULL));
double g = rand() / ( (double) RAND_MAX + 1.0 );
random_weight = g * sum_of_weight;
/* Find a random number wrt its weight */
int temp_total = 0;
for (i = 0; i < number_of_items; i++)
{
temp_total += items_weight[i];
if (random_weight < temp_total)
{
return items[i];
}
}
return -1; /* Oops, we could not find a random number */
}
I also tried something different (the code is below). It worked for my case, but integer overflow and the extensive use of static variables make it problematic.
You pass the input array on the first call, then pass NULL on later calls to keep working with the same array. It is a little bit similar to strtok() usage.
int random_w(int *arr, int weights[], int size)
{
int selected, i;
int totalWeight;
double ratio;
static long int total;
static long int *eachTotal = NULL;
static int *local_arr = NULL;
static double *weight = NULL;
if (arr != NULL)
{
free(eachTotal);
free(weight);
eachTotal = (long int*) calloc(size, sizeof(long));
weight = (double*) calloc(size, sizeof(double));
total = 0;
totalWeight = 0;
local_arr = arr;
for (i = 0; i < size; i++)
{
totalWeight += weights[i];
}
for (i = 0; i < size; i++)
{
weight[i] = (double)weights[i] / totalWeight;
}
srand(time(NULL));
}
while (1)
{
selected = rand() % size;
ratio = (double)(eachTotal[selected])/(double)(total+1);
if (ratio < weight[selected])
{
total++;
eachTotal[selected]++;
return local_arr[selected];
}
}
}
Is this what you want?
# Weights: one 1, two 2s, three 3s
>>> import random
>>> vals = [1] * 1 + [2] * 2 + [3] * 3
>>> random.shuffle(vals)
>>> vals
[2, 3, 1, 2, 3, 3]
Edit: Whoops, for some reason my mind replaced the C tag with the Python one. Regardless, I think what you want is not "weighted" random number generators, but a shuffle. This ought to help.
When you say you didn't get "exactly" the number of values you expected for each weighted value, how many runs are you talking? If you only did six runs of any random process, I wouldn't expect you to be able to definitively say anything was working or not. Your code may work fine. Try running it a million times and check the results then. Or maybe you actually want what Nathon is talking about, a preweighted list of values, which you can then randomly shuffle and still have the exact weights you're looking for.
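The "run it many times" suggestion can be sketched like this. It is a rough Python port of the cumulative-weight selection from the question's random_t, not the asker's actual code; `weighted_pick`, the seed 12345 and the trial count are arbitrary choices for illustration. The generator is seeded once, outside the function; calling srand(time(NULL)) on every call, as the question's code does, reseeds with the same value for all calls made within the same second.

```python
import random
from collections import Counter

def weighted_pick(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    threshold = rng.random() * sum(weights)   # uniform in [0, sum_of_weights)
    running_total = 0
    for item, weight in zip(items, weights):
        running_total += weight
        if threshold < running_total:
            return item
    return items[-1]   # fallback, not expected to be reached with positive weights

rng = random.Random(12345)    # seeded once, not on every call
trials = 200_000
counts = Counter(weighted_pick([1, 2, 3], [1, 2, 3], rng) for _ in range(trials))
for item in sorted(counts):
    # observed frequency; should approach 1/6, 2/6 and 3/6 as trials grows
    print(item, counts[item] / trials)
```

Over a handful of draws the counts will wander; only the long-run frequencies follow the weights.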
You can sample from a multinomial distribution. Your universe of random samples (or "urn of balls in a bucket") is {1, 2, 3} and the probabilities ("weights") of observing each is, respectively, {1/6, 2/6, 3/6}.
For demonstration purposes, a Perl script can give you a list of observations of labeled balls with these probabilities:
#!/usr/bin/perl
use strict;
use warnings;
use Math::Random qw(random_multinomial);
use Data::Dumper;
my $events = 10;
my @probabilities = qw(0.167 0.333 0.5);
my @observations = random_multinomial($events, @probabilities);
print Dumper \@observations;
For 10 events, a single trial will return something like:
$VAR1 = 1;
$VAR2 = 2;
$VAR3 = 7;
This means you have (from this single trial) one 1-labeled event, two 2-labeled events, and seven 3-labeled events.
If you repeat the trial, you may get a different distribution of 1, 2 and 3-labeled events.
You can trivially build a list from this to the equivalent {1, 2, 2, 3, 3, 3, 3, 3, 3, 3} list.
Just randomly shuffle this second list to get your weighted, observed list of random numbers.
If you want to have the sample frequencies be completely deterministic, I think
the way to go is generate an array that has the proper number of occurrences
for each value, then do a random shuffle (which preserves the frequencies)
and take successive elements of the shuffled array as your random sequence.
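A minimal sketch of that idea, in Python for brevity: the generator below expands the weights into a pool, shuffles it, hands out the elements one by one, and reshuffles once the pool is used up. The name `exact_weighted_cycle` and the reshuffle-when-exhausted behaviour are my assumptions, not something stated above.

```python
import random
from itertools import islice

def exact_weighted_cycle(items, weights, rng=random):
    """Yield items forever; each block of sum(weights) draws has exact counts."""
    # Expand the weights into a pool, e.g. [1, 2, 2, 3, 3, 3].
    pool = [item for item, weight in zip(items, weights) for _ in range(weight)]
    while True:
        rng.shuffle(pool)     # shuffling preserves the frequencies
        yield from pool       # hand out the whole block, then reshuffle

gen = exact_weighted_cycle([1, 2, 3], [1, 2, 3])
print(list(islice(gen, 6)))   # some ordering of 1, 2, 2, 3, 3, 3
```

Within each block of six draws the counts are exact; only the order is random.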
OK, my answer will sound like a hack, but short of writing your own distribution, maybe you can map a uniform distribution and leverage Boost (check out http://www.boost.org/doc/libs/1_44_0/doc/html/boost_random/reference.html#boost_random.reference.distributions)
so following your example:
1 -> 1
2,3 ->2
4,5,6 ->3
7,8,9,10 ->4 (etc...)
then generate a random number between 1 and 10 with Boost's uniform_int distribution and return the mapped element.
Here is an example of generating the numbers; you would then need to map the results:
#include <iostream>
#include <boost/random.hpp>
#include <time.h>
using namespace std;
using namespace boost;
int main ( ) {
uniform_int<> distribution(1, 10) ; // inclusive range, matching the 1..10 mapping above
mt19937 engine;
engine.seed(time(NULL));
variate_generator<mt19937, uniform_int<> > myrandom (engine, distribution);
cout << myrandom() << endl;
}
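The mapping step that the example leaves out could look roughly like this. It is sketched in Python rather than Boost, and `make_mapper` and `lookup` are hypothetical names: cumulative weights are built once, and a binary search turns a uniform draw from 1 to 10 into the mapped element.

```python
import bisect
import random
from itertools import accumulate

def make_mapper(items, weights):
    """Return a function mapping r in [1, sum(weights)] to an item."""
    cumulative = list(accumulate(weights))   # [1, 3, 6, 10] for weights 1, 2, 3, 4
    def lookup(r):
        # First cumulative total that is >= r identifies the item.
        return items[bisect.bisect_left(cumulative, r)]
    return lookup

lookup = make_mapper([1, 2, 3, 4], [1, 2, 3, 4])
print([lookup(r) for r in range(1, 11)])   # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(lookup(random.randint(1, 10)))       # one weighted draw
```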

Algorithm to determine if array contains n...n+m?

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:
Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true, {1,3,1} would return false, and {1,2,4} would return false.)
The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.
Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)
edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this?
The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?
Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.
You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))
The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not in the right place if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.
And, yes, that runs in O(N) although it may look like O(N ^ 2)
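To see why the swapping stays linear, here is a small Python sketch of the steps above that also counts the swaps. It assumes the values have already been shifted and range-checked so that they all lie in 0..N-1; `has_duplicate` and the swap counter are illustrative additions.

```python
def has_duplicate(a):
    """Detect a duplicate in a list whose values all lie in range(len(a)).

    The list is reordered in place. Returns (found_duplicate, swap_count).
    """
    swaps = 0
    for i in range(len(a)):
        while a[i] != i:                 # a[i] is not where it would be if sorted
            target = a[i]                # index where this value belongs
            if a[target] == target:      # that slot already holds the same value
                return True, swaps
            a[i], a[target] = a[target], a[i]
            swaps += 1                   # each swap puts one value in its final slot
    return False, swaps

print(has_duplicate([4, 3, 2, 1, 0]))   # (False, 2)
print(has_duplicate([1, 2, 0, 1]))      # (True, 2)
```

Every swap moves one value into its final position and it is never moved again, so there can be at most N-1 swaps in total, however the inner loop looks.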
Note to everyone (stuff collected from comments)
This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).
Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)
New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):
So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.
In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.
I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:
[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6]
[2, 2, 3, 3, 4, 7, 7]
I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:
[1, 3, 3, 4, 4, 5, 8, 8]
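These counterexamples are easy to check mechanically. A short Python snippet (the helper name `sums` is mine) compares the sum and the sum of squares against the genuine ranges:

```python
def sums(seq):
    """Return (sum, sum of squares) for a sequence of numbers."""
    return sum(seq), sum(x * x for x in seq)

reference = list(range(1, 8))             # [1, 2, 3, 4, 5, 6, 7]
for candidate in ([1, 1, 4, 5, 5, 6, 6], [2, 2, 3, 3, 4, 7, 7]):
    print(candidate, sums(candidate) == sums(reference))            # True for both

print(sums([1, 3, 3, 4, 4, 5, 8, 8]) == sums(list(range(1, 9))))    # True
```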
Simpler approach (modifying hazzen's idea):
An integer array of length m contains all the numbers from n to n+m-1 exactly once iff
every array element is between n and n+m-1
there are no duplicates
(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)
If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):
For all array indexes i from 0 to m-1 do
If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
While j is not equal to i
If array[i] is equal to array[j] => RETURN FALSE ("duplicate found")
Swap array[i] with array[j]
If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
Recalculate j = array[i] - n
RETURN TRUE
I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.
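For reference, the pseudocode could be written in Python roughly as follows. This is a sketch, not a tested production routine: the function name `is_consecutive_range` is made up, m is taken from the list length, and the range check is repeated after every swap so that a value freshly swapped into position i is never used as an index before being validated.

```python
def is_consecutive_range(array, n):
    """True if array holds n, n+1, ..., n+m-1 exactly once each (m = len(array)).

    The list is reordered in place; only a few extra variables are used.
    """
    m = len(array)
    for i in range(m):
        while True:
            if not n <= array[i] < n + m:
                return False             # value out of range
            j = array[i] - n             # 0-based position in the sorted array
            if j == i:
                break                    # array[i] is already in its place
            if array[j] == array[i]:
                return False             # duplicate found
            array[i], array[j] = array[j], array[i]
    return True

print(is_consecutive_range([2, 3, 4], 2))   # True
print(is_consecutive_range([1, 3, 1], 1))   # False
print(is_consecutive_range([1, 2, 4], 1))   # False
```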
By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.
We take this observation for granted and try to check if the array contains [0,m).
Find the first node that's not in its correct position, e.g.
0 1 2 3 7 5 6 8 4 ; the original dataset (after the renaming we discussed)
^
`---this is position 4 and the 7 shouldn't be here
Swap that number into where it should be. i.e. swap the 7 with the 8:
0 1 2 3 8 5 6 7 4 ;
| `--------- 7 is in the right place.
`--------------- this is now the 'current' position
Now we repeat this. Looking again at our current position we ask:
"is this the correct number for here?"
If not, we swap it into its correct place.
If it is in the right place, we move right and do this again.
Following this rule again, we get:
0 1 2 3 4 5 6 7 8 ; 4 and 8 were just swapped
This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).
If there are dupes, we'll notice it as soon as there is an attempt to swap a number backwards in the list.
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
Simpler method:
Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.
Step 2, iterate through the list, keep track of the lowest and highest items.
Step 3, does (highest - lowest) equal m - 1? If so, return true.
Any one-pass algorithm requires Omega(n) bits of storage.
Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs
A A' - yes
B A' - no
since it cannot distinguish the first case from the second.
Q.E.D.
Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2 or something like that) we can just sum up the numbers and square of difference to mean to see if the sum matches the equation (mn + m(m-1)/2) and the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.
EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.
If there is a duplicate, a term in the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
Here's a working solution in O(n)
This is using the pseudocode suggested by Hazzen plus some of my own ideas. It works for negative numbers as well and doesn't require any sum-of-the-squares stuff.
function testArray($nums, $n, $m) {
// check the sum. PHP offers this array_sum() method, but it's
// trivial to write your own. O(n) here.
if (array_sum($nums) != ($m * ($m + 2 * $n - 1) / 2)) {
return false; // checksum failed.
}
for ($i = 0; $i < $m; ++$i) {
// check if the number is in the proper range
if ($nums[$i] < $n || $nums[$i] >= $n + $m) {
return false; // value out of range.
}
while (($shouldBe = $nums[$i] - $n) != $i) {
if ($nums[$shouldBe] == $nums[$i]) {
return false; // duplicate
}
$temp = $nums[$i];
$nums[$i] = $nums[$shouldBe];
$nums[$shouldBe] = $temp;
}
}
return true; // huzzah!
}
var_dump(testArray(array(1, 2, 3, 4, 5), 1, 5)); // true
var_dump(testArray(array(5, 4, 3, 2, 1), 1, 5)); // true
var_dump(testArray(array(6, 4, 3, 2, 0), 1, 5)); // false - out of range
var_dump(testArray(array(5, 5, 3, 2, 1), 1, 5)); // false - checksum fail
var_dump(testArray(array(5, 4, 3, 2, 5), 1, 5)); // false - dupe (here already caught by the checksum)
var_dump(testArray(array(-2, -1, 0, 1, 2), -2, 5)); // true
Awhile back I heard about a very clever sorting algorithm from someone who worked for the phone company. They had to sort a massive number of phone numbers. After going through a bunch of different sort strategies, they finally hit on a very elegant solution: they just created a bit array and treated the offset into the bit array as the phone number. They then swept through their database with a single pass, changing the bit for each number to 1. After that, they swept through the bit array once, spitting out the phone numbers for entries that had the bit set high.
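As I understand the story, the trick looks something like the Python sketch below. It assumes the numbers are distinct non-negative integers with a known upper bound; `bitmap_sort` and `max_value` are names I made up for illustration.

```python
def bitmap_sort(numbers, max_value):
    """Sort distinct non-negative ints <= max_value using one bit per possible value."""
    bits = bytearray(max_value // 8 + 1)
    for x in numbers:                          # first pass: set the bit for each number
        bits[x >> 3] |= 1 << (x & 7)
    return [v for v in range(max_value + 1)    # second pass: emit the set bits in order
            if bits[v >> 3] & (1 << (v & 7))]

print(bitmap_sort([9, 2, 7, 0, 15, 16], 16))   # [0, 2, 7, 9, 15, 16]
```

Note that a repeated number simply sets the same bit again, so duplicates collapse into one entry.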
Along those lines, I believe that you can use the data in the array itself as a meta data structure to look for duplicates. Worst case, you could have a separate array, but I'm pretty sure you can use the input array if you don't mind a bit of swapping.
I'm going to leave out the n parameter for time being, b/c that just confuses things - adding in an index offset is pretty easy to do.
Consider:
for i = 0 to m
if (a[a[i]]==a[i]) return false; // we have a duplicate
while (a[a[i]] > a[i]) swapArrayIndexes(a[i], i)
sum = sum + a[i]
next
if sum = (n+m-1)*m return true else return false
This isn't O(n) - probably closer to O(n Log n) - but it does provide for constant space and may provide a different vector of attack for the problem.
If we want O(n), then using an array of bytes and some bit operations will provide the duplication check with an extra n/8 bytes of memory used (that is, n/32 ints, assuming 32 bit ints, of course).
EDIT: The above algorithm could be improved further by adding the sum check to the inside of the loop, and check for:
if sum > (n+m-1)*m return false
that way it will fail fast.
Assuming you know only the length of the array and you are allowed to modify the array it can be done in O(1) space and O(n) time.
The process has two straightforward steps.
1. "modulo sort" the array. [5,3,2,4] => [4,5,2,3] (O(2n))
2. Check that each value's neighbor is one higher than itself (modulo) (O(n))
All told you need at most 3 passes through the array.
The modulo sort is the 'tricky' part, but the objective is simple. Take each value in the array and store it at its own address (modulo length). This requires one pass through the array, looping over each location 'evicting' its value by swapping it to its correct location and moving in the value at its destination. If you ever move in a value which is congruent to the value you just evicted, you have a duplicate and can exit early.
Worst case, it's O(2n).
The check is a single pass through the array comparing each value with its next highest neighbor. Always O(n).
Combined algorithm is O(n)+O(2n) = O(3n) = O(n)
Pseudocode from my solution:
foreach(values[])
while(values[i] not congruent to i)
to-be-evicted = values[i]
evict(values[i]) // swap to its 'proper' location
if(values[i]%length == to-be-evicted%length)
return false; // a 'duplicate' arrived when we evicted that number
end while
end foreach
foreach(values[])
if((values[i]+1)%length != values[i+1]%length)
return false
end foreach
I've included a Java proof of concept below. It's not pretty, but it passes all the unit tests I made for it. I call these a 'StraightArray' because they correspond to the poker hand of a straight (contiguous sequence ignoring suit).
public class StraightArray {
static int evict(int[] a, int i) {
int t = a[i];
a[i] = a[t%a.length];
a[t%a.length] = t;
return t;
}
static boolean isStraight(int[] values) {
for(int i = 0; i < values.length; i++) {
while(values[i]%values.length != i) {
int evicted = evict(values, i);
if(evicted%values.length == values[i]%values.length) {
return false;
}
}
}
for(int i = 0; i < values.length-1; i++) {
int n = (values[i]%values.length)+1;
int m = values[(i+1)]%values.length;
if(n != m) {
return false;
}
}
return true;
}
}
Hazzen's algorithm implementation in C
#include<stdio.h>
#define swapxor(a,i,j) a[i]^=a[j];a[j]^=a[i];a[i]^=a[j];
int check_ntom(int a[], int n, int m) {
int i = 0, j = 0;
for(i = 0; i < m; i++) {
if(a[i] < n || a[i] >= n+m) return 0; //invalid entry
j = a[i] - n;
while(j != i) {
if(a[i]==a[j]) return -1; //bucket already occupied. Dupe.
swapxor(a, i, j); //faster bitwise swap
if(a[i] < n || a[i] >= n+m) return 0; //[NEW] invalid entry, checked before it is used as an index
j = a[i] - n;
}
}
return 200; //OK
}
int main() {
int n=5, m=5;
int a[] = {6, 5, 7, 9, 8};
int r = check_ntom(a, n, m);
printf("%d", r);
return 0;
}
Edit: change made to the code to eliminate illegal memory access.
bool determineContinuousArray(int *arr, int len)
{
// Suppose the array is like below:
//int arr[10] = {7,11,14,9,8,100,12,5,13,6};
//int len = sizeof(arr)/sizeof(int);
int n = arr[0];
int *result = new int[len];
for(int i=0; i< len; i++)
result[i] = -1;
for (int i=0; i < len; i++)
{
int cur = arr[i];
int hold ;
if ( arr[i] < n){
n = arr[i];
}
while(true){
if ( cur - n >= len){
cout << "array index out of range: meaning this is not a valid array" << endl;
return false;
}
else if ( result[cur - n] != cur){
hold = result[cur - n];
result[cur - n] = cur;
if (hold == -1) break;
cur = hold;
}else{
cout << "found duplicate number " << cur << endl;
return false;
}
}
}
cout << "this is a valid array" << endl;
for(int j=0 ; j< len; j++)
cout << result[j] << "," ;
cout << endl;
return true;
}
def test(a, n, m):
    seen = [False] * m
    for x in a:
        if x < n or x >= n+m:
            return False
        if seen[x-n]:
            return False
        seen[x-n] = True
    return False not in seen
print(test([2, 3, 1], 1, 3))
print(test([1, 3, 1], 1, 3))
print(test([1, 2, 4], 1, 3))
Note that this only makes one pass through the first array, not considering the linear search involved in not in. :)
I also could have used a python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.
Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.
If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.
var sum = m * (m + 2 * n - 1) / 2;
That works for any number, positive or negative, even if n is a decimal.
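A quick way to convince yourself is to compare the formula with a brute-force sum. The snippet below is just such a check in Python; `range_sum` is a name chosen here.

```python
def range_sum(n, m):
    """Sum of the m terms n, n+1, ..., n+m-1."""
    return m * (m + 2 * n - 1) / 2

for n, m in [(1, 5), (-2, 5), (0.5, 3)]:
    brute_force = sum(n + k for k in range(m))
    print(n, m, range_sum(n, m) == brute_force)   # True each time
```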
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
O(1) indicates constant space which does not change by the number of n. It does not matter if it is 1 or 2 variables as long as it is a constant number. Why are you saying it is more than O(1) space? If you are calculating the sum of n numbers by accumulating it in a temporary variable, you would be using exactly 1 variable anyway.
Commenting in an answer because the system does not allow me to write comments yet.
Update (in reply to comments): in this answer I meant O(1) space wherever "space" or "time" was omitted. The quoted text is part of an earlier answer, to which this is a reply.
Given this -
Write a method that takes an int array of size m ...
I suppose it is fair to conclude there is an upper limit for m, equal to the value of the largest int (2^32 being typical). In other words, even though m is not specified as an int, the fact that the array can't have duplicates implies there can't be more than the number of values you can form out of 32 bits, which in turn implies m is limited to be an int also.
If such a conclusion is acceptable, then I propose to use a fixed space of (2^33 + 2) * 4 bytes = 34,359,738,376 bytes = 34.4GB to handle all possible cases. (Not counting the space required by the input array and its loop).
Of course, for optimization, I would first take m into account, and allocate only the actual amount needed, (2m+2) * 4 bytes.
If this is acceptable for the O(1) space constraint - for the stated problem - then let me proceed to an algorithmic proposal... :)
Assumptions: array of m ints, positive or negative, none greater than what 4 bytes can hold. Duplicates are handled. First value can be any valid int. Restrict m as above.
First, create an int array of length 2m-1, ary, and provide three int variables: left, diff, and right. Notice that makes 2m+2...
Second, take the first value from the input array and copy it to position m-1 in the new array. Initialize the three variables.
set ary[m-1] = nthVal // n=0
set left = diff = right = 0
Third, loop through the remaining values in the input array and do the following for each iteration:
set diff = nthVal - ary[m-1]
if (diff > m-1 + right || diff < 1-m + left) return false // out of bounds
if (ary[m-1+diff] != null) return false // duplicate
set ary[m-1+diff] = nthVal
if (diff>left) left = diff // constrains left bound further right
if (diff<right) right = diff // constrains right bound further left
I decided to put this in code, and it worked.
Here is a working sample using C#:
public class Program
{
static bool puzzle(int[] inAry)
{
var m = inAry.Count();
var outAry = new int?[2 * m - 1];
int diff = 0;
int left = 0;
int right = 0;
outAry[m - 1] = inAry[0];
for (var i = 1; i < m; i += 1)
{
diff = inAry[i] - inAry[0];
if (diff > m - 1 + right || diff < 1 - m + left) return false;
if (outAry[m - 1 + diff] != null) return false;
outAry[m - 1 + diff] = inAry[i];
if (diff > left) left = diff;
if (diff < right) right = diff;
}
return true;
}
static void Main(string[] args)
{
var inAry = new int[3]{ 2, 3, 4 };
Console.WriteLine(puzzle(inAry));
inAry = new int[13] { -3, 5, -1, -2, 9, 8, 2, 3, 0, 6, 4, 7, 1 };
Console.WriteLine(puzzle(inAry));
inAry = new int[3] { 21, 31, 41 };
Console.WriteLine(puzzle(inAry));
Console.ReadLine();
}
}
note: this comment is based on the original text of the question (it has been corrected since)
If the question is posed exactly as written above (and it is not just a typo) and for array of size n the function should return (True/False) if the array consists of the numbers 1...n+1,
... then the answer will always be false because the array with all the numbers 1...n+1 will be of size n+1 and not n. hence the question can be answered in O(1). :)
Counter-example for XOR algorithm.
(can't post it as a comment)
@popopome
For a = {0, 2, 7, 5} it returns true (meaning that a is a permutation of the range [0, 4)), but it must return false in this case (a is obviously not a permutation of [0, 4)).
Another counter example: {0, 0, 1, 3, 5, 6, 6} -- all values are in range but there are duplicates.
I may have implemented popopome's idea (or the tests) incorrectly, so here is the code:
bool isperm_popopome(int m; int a[m], int m, int n)
{
/** O(m) in time (single pass), O(1) in space,
no restrictions on n,
no overflow,
a[] may be readonly
*/
int even_xor = 0;
int odd_xor = 0;
for (int i = 0; i < m; ++i)
{
if (a[i] % 2 == 0) // is even
even_xor ^= a[i];
else
odd_xor ^= a[i];
const int b = i + n;
if (b % 2 == 0) // is even
even_xor ^= b;
else
odd_xor ^= b;
}
return (even_xor == 0) && (odd_xor == 0);
}
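For anyone who wants to reproduce the two counter-examples without a C compiler, here is a line-by-line Python port of the function above (m is taken from the list length; the name `isperm_xor` is mine):

```python
def isperm_xor(a, n):
    """Python port of the parity-split XOR test; m is taken as len(a)."""
    even_xor = 0
    odd_xor = 0
    for i, value in enumerate(a):
        if value % 2 == 0:       # fold the array value into its parity bucket
            even_xor ^= value
        else:
            odd_xor ^= value
        b = i + n                # fold in the expected value for this position
        if b % 2 == 0:
            even_xor ^= b
        else:
            odd_xor ^= b
    return even_xor == 0 and odd_xor == 0

print(isperm_xor([0, 2, 7, 5], 0))            # True, although it is not a permutation of [0, 4)
print(isperm_xor([0, 0, 1, 3, 5, 6, 6], 0))   # True, despite the duplicates
print(isperm_xor([3, 1, 0, 2], 0))            # True, a genuine permutation
```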
A C version of b3's pseudo-code
(to avoid misinterpretation of the pseudo-code)
Counter example: {1, 1, 2, 4, 6, 7, 7}.
int pow_minus_one(int power)
{
return (power % 2 == 0) ? 1 : -1;
}
int ceil_half(int n)
{
return n / 2 + (n % 2);
}
bool isperm_b3_3(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
doesn't use n
possible overflow in sum
a[] may be readonly
*/
int altsum = 0;
int mina = INT_MAX;
int maxa = INT_MIN;
for (int i = 0; i < m; ++i)
{
const int v = a[i] - n + 1; // [n, n+m-1] -> [1, m] to deal with n=0
if (mina > v)
mina = v;
if (maxa < v)
maxa = v;
altsum += pow_minus_one(v) * v;
}
return ((maxa-mina == m-1)
&& ((pow_minus_one(mina + m-1) * ceil_half(mina + m-1)
- pow_minus_one(mina-1) * ceil_half(mina-1)) == altsum));
}
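The same counter-example can be checked with a Python port of the C function. This is a sketch under the assumption that the sequence is tested with n = 1 (the answer does not say which n was used); `isperm_altsum` and the nested helpers are my names.

```python
def isperm_altsum(a, n):
    """Python port of the alternating-sum test; m is taken as len(a)."""
    def sign(k):
        return 1 if k % 2 == 0 else -1

    def ceil_half(k):
        return k // 2 + k % 2

    m = len(a)
    shifted = [x - n + 1 for x in a]        # map [n, n+m-1] onto [1, m]
    lo, hi = min(shifted), max(shifted)
    altsum = sum(sign(v) * v for v in shifted)
    # Closed form of sum((-1)^k * k) for k = lo .. lo+m-1.
    expected = (sign(lo + m - 1) * ceil_half(lo + m - 1)
                - sign(lo - 1) * ceil_half(lo - 1))
    return hi - lo == m - 1 and altsum == expected

print(isperm_altsum([1, 2, 3, 4, 5, 6, 7], 1))   # True, a genuine range
print(isperm_altsum([1, 1, 2, 4, 6, 7, 7], 1))   # True, despite the duplicates
```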
In Python:
def ispermutation(iterable, m, n):
    """Whether iterable and the range [n, n+m) have the same elements.
    pre-condition: there are no duplicates in the iterable
    """
    for i, elem in enumerate(iterable):
        if not n <= elem < n+m:
            return False
    return i == m-1
print(ispermutation([1, 42], 2, 1) == False)
print(ispermutation(range(10), 10, 0) == True)
print(ispermutation((2, 1, 3), 3, 1) == True)
print(ispermutation((2, 1, 3), 3, 0) == False)
print(ispermutation((2, 1, 3), 4, 1) == False)
print(ispermutation((2, 1, 3), 2, 1) == False)
It is O(m) in time and O(1) in space. It does not take into account duplicates.
Alternate solution:
def ispermutation(iterable, m, n):
    """Same as above.
    pre-condition: assert(len(list(iterable)) == m)
    """
    return all(n <= elem < n+m for elem in iterable)
MY CURRENT BEST OPTION
def uniqueSet( array )
check_index = 0;
check_value = 0;
min = array[0];
array.each_with_index{ |value,index|
check_index = check_index ^ ( 1 << index );
check_value = check_value ^ ( 1 << value );
min = value if value < min
}
check_index = check_index << min;
return check_index == check_value;
end
O(n) and Space O(1)
I wrote a script to brute force combinations that could fail that and it didn't find any.
If you have an array which contravenes this function do tell. :)
@J.F. Sebastian
It's not a true hashing algorithm. Technically, it's a highly efficient packed boolean array of "seen" values.
ci = 0, cv = 0
[5,4,3]{
i = 0
v = 5
1 << 0 == 000001
1 << 5 == 100000
0 ^ 000001 = 000001
0 ^ 100000 = 100000
i = 1
v = 4
1 << 1 == 000010
1 << 4 == 010000
000001 ^ 000010 = 000011
100000 ^ 010000 = 110000
i = 2
v = 3
1 << 2 == 000100
1 << 3 == 001000
000011 ^ 000100 = 000111
110000 ^ 001000 = 111000
}
min = 3
000111 << 3 == 111000
111000 === 111000
The point of this being mostly that in order to "fake" most of the problem cases, one uses duplicates to do so. In this system, XOR penalises you for using the same value twice: it treats the value as if it had been used 0 times.
The caveats here are, of course:
both the input array length and the maximum array value are limited by the maximum value of $x for which ( 1 << $x > 0 )
ultimate effectiveness depends on how your underlying system implements the abilities to:
shift 1 bit n places left.
xor 2 registers. ( where 'registers' may, depending on implementation, span several registers )
edit
Noted, the above statements seem confusing. Assume a perfect machine, where an "integer" is a register with infinite precision that can still perform a ^ b in O(1) time.
But failing these assumptions, one has to start asking about the algorithmic complexity of simple math.
How complex is 1 == 1? Surely that should be O(1) every time, right?
What about 2^32 == 2^32?
O(1)? 2^33 == 2^33? Now you've got a question of register size and the underlying implementation.
Fortunately XOR and == can be done in parallel, so if one assumes infinite precision and a machine designed to cope with infinite precision, it is safe to assume XOR and == take constant time regardless of their value (because the register has infinite width, it will have infinite 0 padding). Obviously this doesn't exist. But also, changing 000000 to 000100 is not increasing memory usage.
Yet on some machines, ( 1 << 32 ) << 1 will consume more memory, but how much is uncertain.
A C version of Kent Fredric's Ruby solution
(to facilitate testing)
Counter-example (for C version): {8, 33, 27, 30, 9, 2, 35, 7, 26, 32, 2, 23, 0, 13, 1, 6, 31, 3, 28, 4, 5, 18, 12, 2, 9, 14, 17, 21, 19, 22, 15, 20, 24, 11, 10, 16, 25}. Here n=0, m=35. This sequence misses 34 and has two 2.
It is an O(m) in time and O(1) in space solution.
Out-of-range values are easily detected in O(n) in time and O(1) in space, therefore tests are concentrated on in-range (means all values are in the valid range [n, n+m)) sequences. Otherwise {1, 34} is a counter example (for C version, sizeof(int)==4, standard binary representation of numbers).
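The main difference between the C and Ruby versions:
the << operator is limited by the finite sizeof(int) in C: shifting by the width of int or more is undefined behavior, and on common hardware (e.g. x86) the shift count is taken modulo the width, so high bits appear to wrap around,
but in Ruby numbers will grow to accommodate the result, e.g.,
Ruby: 1 << 100 # -> 1267650600228229401496703205376
C: int n = 100; 1 << n // -> 16 (observed on such hardware; not guaranteed by the standard)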
In Ruby: check_index ^= 1 << i; is equivalent to check_index.setbit(i). The same effect could be implemented in C++: vector<bool> v(m); v[i] = true;
bool isperm_fredric(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
no restriction on n,
?overflow?
a[] may be readonly
*/
int check_index = 0;
int check_value = 0;
int min = a[0];
for (int i = 0; i < m; ++i) {
check_index ^= 1 << i;
check_value ^= 1 << (a[i] - n); //
if (a[i] < min)
min = a[i];
}
check_index <<= min - n; // min and n may differ e.g.,
// {1, 1}: min=1, but n may be 0.
return check_index == check_value;
}
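The difference between the two behaviors can be explored in Python, whose integers are unbounded like Ruby's. The sketch below is an illustration, not the original author's code: the name isperm_bitmask and the width parameter are assumptions, and width=32 merely emulates the "shift count modulo 32" behavior commonly seen on x86 (which the C standard does not guarantee). It assumes every value is at least n.

```python
def isperm_bitmask(a, n, width=None):
    """Bitmask check; width=None means unbounded integers (Ruby-like)."""

    def shl(x, k):
        if width is None:
            return x << k
        # Emulate a fixed-width register: mask the shift count and the result.
        return (x << (k % width)) & ((1 << width) - 1)

    check_index = 0
    check_value = 0
    lowest = a[0]
    for i, value in enumerate(a):
        check_index ^= shl(1, i)
        check_value ^= shl(1, value - n)
        lowest = min(lowest, value)
    check_index = shl(check_index, lowest - n)
    return check_index == check_value


print(isperm_bitmask([5, 4, 3], 3))          # the walkthrough array
print(isperm_bitmask([1, 34], 1))            # unbounded integers: rejected
print(isperm_bitmask([1, 34], 1, width=32))  # emulated 32-bit int: accepted
```

With unbounded integers the masks need one bit per element, so the check is exact for in-range input but its space usage grows with m rather than staying constant.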
Values of the above function were tested against the following code:
bool *seen_isperm_trusted = NULL;
bool isperm_trusted(int m; int a[m], int m, int n)
{
/** O(m) in time, O(m) in space */
for (int i = 0; i < m; ++i) // could be memset(s_i_t, 0, m*sizeof(*s_i_t));
seen_isperm_trusted[i] = false;
for (int i = 0; i < m; ++i) {
if (a[i] < n or a[i] >= n + m)
return false; // out of range
if (seen_isperm_trusted[a[i]-n])
return false; // duplicates
else
seen_isperm_trusted[a[i]-n] = true;
}
return true; // a[] is a permutation of the range: [n, n+m)
}
Input arrays are generated with:
void backtrack(int m; int a[m], int m, int nitems)
{
/** generate all permutations with repetition for the range [0, m) */
if (nitems == m) {
(void)test_array(a, nitems, 0); // {0, 0}, {0, 1}, {1, 0}, {1, 1}
}
else for (int i = 0; i < m; ++i) {
a[nitems] = i;
backtrack(a, m, nitems + 1);
}
}
The Answer from "nickf" does not work if the array is unsorted
var_dump(testArray(array(5, 3, 1, 2, 4), 1, 5)); //gives "duplicates" !!!!
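Also your formula to compute sum([n...n+m-1]) looks incorrect....
The expression (m(m+1)/2 - n(n-1)/2) is the sum of n...m, i.e. it is correct when m is the upper bound; when m is the number of elements, the sum of n...n+m-1 is m(2n+m-1)/2.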
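Which closed form applies depends on whether m is the number of elements or the upper bound of the range. The following sketch compares both against a brute-force sum; the function names are made up for this illustration.

```python
def sum_by_count(n, m):
    """Sum of the m consecutive integers n, n+1, ..., n+m-1."""
    return m * (2 * n + m - 1) // 2


def sum_by_upper_bound(n, m):
    """Sum of n, n+1, ..., m, where m is the upper bound (assumes n <= m)."""
    return m * (m + 1) // 2 - n * (n - 1) // 2


# Both describe 3 + 4 + 5 + 6, once as "4 numbers starting at 3"
# and once as "from 3 up to 6".
print(sum_by_count(3, 4), sum_by_upper_bound(3, 6), sum(range(3, 7)))
```

The two expressions agree only when n = 1, which is the case in the var_dump example above.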
An array contains N numbers, and you want to determine whether two of the
numbers sum to a given number K. For instance, if the input is 8,4, 1,6 and K is 10,
the answer is yes (4 and 6). A number may be used twice. Do the following.
a. Give an O(N^2) algorithm to solve this problem.
b. Give an O(N log N) algorithm to solve this problem. (Hint: Sort the items first.
After doing so, you can solve the problem in linear time.)
c. Code both solutions and compare the running times of your algorithms.
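A possible sketch of parts (a) and (b), assuming "a number may be used twice" means that an element may be paired with itself. The function names are hypothetical.

```python
def has_pair_sum_quadratic(nums, k):
    """O(N^2): try every pair, including an element paired with itself."""
    for i in range(len(nums)):
        for j in range(i, len(nums)):
            if nums[i] + nums[j] == k:
                return True
    return False


def has_pair_sum_sorted(nums, k):
    """O(N log N): sort, then close in from both ends in linear time."""
    ordered = sorted(nums)
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:                 # lo == hi lets one element be used twice
        total = ordered[lo] + ordered[hi]
        if total == k:
            return True
        if total < k:
            lo += 1                 # need a larger sum
        else:
            hi -= 1                 # need a smaller sum
    return False


print(has_pair_sum_quadratic([8, 4, 1, 6], 10), has_pair_sum_sorted([8, 4, 1, 6], 10))
```

For the example input both functions report True (4 + 6); timing them on large random inputs covers part (c).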
Product of m consecutive numbers is divisible by m! [ m factorial ]
so in one pass you can compute the product of the m numbers, also compute m!, and check whether the product modulo m! is zero at the end of the pass
I might be missing something but this is what comes to my mind ...
something like this in python
my_list1 = [9,5,8,7,6]
my_list2 = [3,5,4,7]
def consecutive(my_list):
    count = 0
    prod = fact = 1
    for num in my_list:
        prod *= num
        count +=1
        fact *= count
    # the divisibility check happens once, after the whole pass
    if not prod % fact:
        return 1
    else:
        return 0
print(consecutive(my_list1))
print(consecutive(my_list2))
HotPotato ~$ python m_consecutive.py
1
0
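Divisibility by m! is a necessary condition for m consecutive positive integers, but it is not a sufficient one. The sketch below re-implements the same test under a hypothetical name and shows two non-consecutive inputs that pass it.

```python
def divisible_by_factorial(values):
    """Return True if the product of the values is divisible by len(values)!."""
    prod = fact = 1
    for count, num in enumerate(values, start=1):
        prod *= num
        fact *= count
    return prod % fact == 0


print(divisible_by_factorial([9, 5, 8, 7, 6]))  # consecutive: passes
print(divisible_by_factorial([3, 5, 4, 7]))     # not consecutive: fails
print(divisible_by_factorial([2, 4]))           # not consecutive, yet 8 % 2 == 0
print(divisible_by_factorial([1, 1, 6]))        # duplicates, yet 6 % 6 == 0
```

The last two calls return True, so the test needs to be combined with other checks (for example on the minimum and maximum) before it can rule out holes or duplicates.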
I propose the following:
Choose a finite set of prime numbers P_1,P_2,...,P_K, and compute the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.
For example for a sequence of 17 elements, modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.
Combining the test using several bases we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo primality tests.
S_i is an int array of size P_i, initially filled with 0, i=1..K
M is the length of the input sequence
Mn = INT_MAX
Mx = INT_MIN
for x in the input sequence:
for i in 1..K: S_i[x % P_i]++ // count occurrences mod Pi
Mn = min(Mn,x) // update min
Mx = max(Mx,x) // and max
if Mx-Mn != M-1: return False // Check bounds
for i in 1..K:
// Check profile mod P_i
Q = M / P_i
R = M % P_i
Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
if this test fails, return False
return True
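The pseudo-code above can be sketched in Python as follows. The function name and the default set of primes are assumptions made for this illustration; with a small set of primes the test stays probabilistic, as described.

```python
def profile_check(seq, primes=(2, 3, 5, 7)):
    """Residue-profile test for 'seq is a permutation of a consecutive range'."""
    m = len(seq)
    if m == 0:
        return False
    counts = [[0] * p for p in primes]
    mn, mx = seq[0], seq[0]
    for x in seq:
        for bucket, p in zip(counts, primes):
            bucket[x % p] += 1          # count occurrences mod p
        mn = min(mn, x)
        mx = max(mx, x)
    if mx - mn != m - 1:                # check bounds
        return False
    for bucket, p in zip(counts, primes):
        q, r = divmod(m, p)
        for j in range(p):
            # The first r residues starting at mn occur q + 1 times, the rest q times.
            expected = q + 1 if j < r else q
            if bucket[(mn + j) % p] != expected:
                return False
    return True


print(profile_check(list(range(10, 27))))                  # 17 consecutive values
print(profile_check([0, 0, 1, 3, 5, 6, 6], primes=(2,)))   # fooled when only 2 is used
print(profile_check([0, 0, 1, 3, 5, 6, 6]))                # caught once 3 is added
```

The second call shows why several bases are needed: modulo 2 alone the duplicate-laden array has the same profile as a valid one.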
Any contiguous array [ n, n+1, ..., n+m-1 ] can be mapped onto a 'base' interval [ 0, 1, ..., m-1 ] using the modulo operator. For each i in the array, there is exactly one i%m in the base interval and vice versa.
Any contiguous array also has a 'span' m (maximum - minimum + 1) equal to its size.
Using these facts, you can create an "encountered" boolean array of same size containing all falses initially, and while visiting the input array, put their related "encountered" elements to true.
This algorithm is O(n) in space, O(n) in time, and checks for duplicates.
def contiguous( values )
#initialization
encountered = Array.new( values.size, false )
min, max = nil, nil
visited = 0
values.each do |v|
index = v % encountered.size
if( encountered[ index ] )
return "duplicates";
end
encountered[ index ] = true
min = v if min == nil or v < min
max = v if max == nil or v > max
visited += 1
end
if ( max - min + 1 != values.size ) or visited != values.size
return "hole"
else
return "contiguous"
end
end
tests = [
[ false, [ 2,4,5,6 ] ],
[ false, [ 10,11,13,14 ] ] ,
[ true , [ 20,21,22,23 ] ] ,
[ true , [ 19,20,21,22,23 ] ] ,
[ true , [ 20,21,22,23,24 ] ] ,
[ false, [ 20,21,22,23,24+5 ] ] ,
[ false, [ 2,2,3,4,5 ] ]
]
tests.each do |t|
result = contiguous( t[1] )
if( t[0] != ( result == "contiguous" ) )
puts "Failed Test : " + t[1].to_s + " returned " + result
end
end
I like Greg Hewgill's idea of Radix sorting. To find duplicates, you can sort in O(N) time given the constraints on the values in this array.
For an in-place O(1) space O(N) time that restores the original ordering of the list, you don't have to do an actual swap on that number; you can just mark it with a flag:
//Java: assumes all numbers in arr are positive (the sign is used as the "seen" flag)
boolean checkArrayConsecutiveRange(int[] arr) {
// find min/max
int min = arr[0]; int max = arr[0];
for (int i=1; i<arr.length; i++) {
min = (arr[i] < min ? arr[i] : min);
max = (arr[i] > max ? arr[i] : max);
}
if (max-min != arr.length-1) return false; // arr.length consecutive values span arr.length-1
// flag and check
boolean ret = true;
for (int i=0; i<arr.length; i++) {
int targetI = Math.abs(arr[i])-min;
if (arr[targetI] < 0) {
ret = false;
break;
}
arr[targetI] = -arr[targetI];
}
for (int i=0; i<arr.length; i++) {
arr[i] = Math.abs(arr[i]);
}
return ret;
}
Storing the flags inside the given array is kind of cheating, and doesn't play well with parallelization. I'm still trying to think of a way to do it without touching the array in O(N) time and O(log N) space. Checking against the sum and against the sum of least squares (arr[i] - arr.length/2.0)^2 feels like it might work. The one defining characteristic we know about a 0...m array with no duplicates is that it's uniformly distributed; we should just check that.
Now if only I could prove it.
I'd like to note that the solution above involving factorial takes O(N) space to store the factorial itself. N! > 2^N (for N >= 4), which takes more than N bits to store.
Oops! I got caught up in a duplicate question and did not see the already identical solutions here. And I thought I'd finally done something original! Here is a historical archive of when I was slightly more pleased:
Well, I have no certainty if this algorithm satisfies all conditions. In fact, I haven't even validated that it works beyond a couple test cases I have tried. Even if my algorithm does have problems, hopefully my approach sparks some solutions.
This algorithm, to my knowledge, works in constant memory and scans the array three times. Perhaps an added bonus is that it works for the full range of integers, if that wasn't part of the original problem.
I am not much of a pseudo-code person, and I really think the code might simply make more sense than words. Here is an implementation I wrote in PHP. Take heed of the comments.
function is_permutation($ints) {
/* Gather some meta-data. These scans can
be done simultaneously */
$lowest = min($ints);
$length = count($ints);
$max_index = $length - 1;
$sort_run_count = 0;
/* I do not have any proof that running this sort twice
will always completely sort the array (of course only
intentionally happening if the array is a permutation) */
while ($sort_run_count < 2) {
for ($i = 0; $i < $length; ++$i) {
$dest_index = $ints[$i] - $lowest;
if ($i == $dest_index) {
continue;
}
if ($dest_index > $max_index) {
return false;
}
if ($ints[$i] == $ints[$dest_index]) {
return false;
}
$temp = $ints[$dest_index];
$ints[$dest_index] = $ints[$i];
$ints[$i] = $temp;
}
++$sort_run_count;
}
return true;
}
So there is an algorithm that takes O(n^2) that does not require modifying the input array and takes constant space.
First, assume that you know n and m. This is a linear operation, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1 and all the rest are in [n, n+m). Given that, we can reduce the problem to having an array with elements in [0, m).
Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or less steps. If there is a duplicate element, then there exists one node that is not reachable from itself at all.
So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:
for (int start_index= 0; start_index<m; ++start_index)
{
int steps= 1;
int current_element_index= arr[start_index];
while (steps<m+1 && current_element_index!=start_index)
{
current_element_index= arr[current_element_index];
++steps;
}
if (steps>m)
{
return false;
}
}
return true;
You can optimize this by storing additional information:
Record sum of the length of the cycle from each element, unless the cycle visits an element before that element, call it sum_of_steps.
For every element, only step m-sum_of_steps nodes out. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.
This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.
ciphwn has it right. It is all to do with statistics. What the question is asking, in statistical terms, is whether or not the sequence of numbers forms a discrete uniform distribution. A discrete uniform distribution is one where all values of a finite set of possible values are equally probable. Fortunately there are some useful formulas to determine if a discrete set is uniform. Firstly, the mean of the set (a..b) is (a+b)/2 and the variance is (n^2 - 1)/12, where n = b - a + 1 is the number of values. Next, determine the variance of the given set:
variance = sum [i=1..n] (f(i) - mean)^2 / n
and then compare with the expected variance. This will require two passes over the data, once to determine the mean and again to calculate the variance.
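A sketch of this two-pass check using exact rational arithmetic is shown below; the function names are hypothetical and the input is assumed to be non-empty. Matching mean and variance is a necessary condition only, and the last call shows an array with duplicates that still matches.

```python
from fractions import Fraction


def mean_and_variance(values):
    """Exact population mean and variance (two passes over the data)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def looks_uniform(values):
    """Compare span, mean and variance with those of the range min..max."""
    n = len(values)
    a, b = min(values), max(values)
    mean, variance = mean_and_variance(values)
    return (b - a + 1 == n
            and mean == Fraction(a + b, 2)
            and variance == Fraction(n * n - 1, 12))


print(looks_uniform([3, 1, 2, 0]))              # permutation of 0..3
print(looks_uniform([0, 0, 1, 3]))              # wrong mean: rejected
print(looks_uniform([0, 2, 2, 3, 3, 4, 7, 7]))  # duplicates, same mean and variance as 0..7
```

The last array has sum 28 and sum of squares 140, exactly like 0..7, so it is accepted even though it is not a permutation; higher moments or another check would be needed to reject it.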
References:
uniform discrete distribution
variance
Here is a solution in O(N) time and O(1) extra space for finding duplicates :-
public static boolean check_range(int arr[],int n,int m) {
for(int i=0;i<m;i++) {
arr[i] = arr[i] - n;
if(arr[i]>=m)
return(false);
}
System.out.println("In range");
int j=0;
while(j<m) {
System.out.println(j);
if(arr[j]<m) {
if(arr[arr[j]]<m) {
int t = arr[arr[j]];
arr[arr[j]] = arr[j] + m;
arr[j] = t;
if(j==arr[j]) {
arr[j] = arr[j] + m;
j++;
}
}
else return(false);
}
else j++;
}
return(true); // every value was placed without hitting an occupied slot
}
Explanation:-
Bring each number into the range [0, m-1] with arr[i] = arr[i] - n; if it is out of range, return false.
For each j, check whether arr[arr[j]] is unoccupied, that is, whether it holds a value less than m.
If so, swap(arr[j], arr[arr[j]]) and add m to the value placed at the target index to signal that it is occupied.
If arr[j] = j, simply add m and increment j.
If arr[arr[j]] >= m, it is occupied, so the current value is a duplicate; return false.
If arr[j] >= m, skip it.
