Finding array indices for "divide-and-conquer" algorithm? - arrays

I have to implement a divide-and-conquer algorithm in C++ for a max function, which returns the maximum value in an array. I understand the algorithm and have designed the function already, but I am running into issues with the array indices.
In pseudocode, here is my function:
def max(array, startIdx, endIdx)
    // if there is only one element, return it
    if startIdx = endIdx
        return array[startIdx];
    leftHigh = max(array, startIdx, endIdx/2);
    rightHigh = max(array, endIdx/2 + 1, endIdx);
    return maximum of leftHigh and rightHigh;
However, I run into an issue with these values for the recursive call parameters. The following paragraph demonstrates what I found when I mentally stepped through the algorithm:
The simplest case is an array of 4 elements. The first call to max will take index parameters 0, 3, and will make the calls with parameters 0, 1 and 2, 3. The first recursive call will result in calls with 0, 0 and 1, 1 which will terminate correctly. However, the second recursive call will result in calls with 2, 1 and 2, 3. The first eventually results in overstepping the array bounds and the second results in an infinite loop since those parameters have already been used.
I have tried messing with it, for example using (startIdx, endIdx/2 -1) for the first bounds and (endIdx/2, endIdx) for the second bounds, and this fixes the second branch of recursive calls but messes up the first.
Is there a way to find these indices resulting in the correct behavior? I appreciate the help.

It should be
leftHigh = max(array, startIdx, (startIdx + endIdx)/2);
rightHigh = max(array, (startIdx + endIdx)/2 + 1, endIdx);

If I give you two numbers a < c, how would you characterise any number in between the two? I.e., what can we say about b when a < b < c?
a < b < c
0 < b - a < c - a
You are picking b = c/2. Which we can see only meets the above criteria in some cases:
b = c / 2
and b > a
then c / 2 > a
So we can see that your method works only as long as a < c/2. In your case a = startIdx and c = endIdx, so your algorithm works only while startIdx < endIdx / 2.
Consider this finding carefully:
a < b < c
0 < b - a < c - a (subtract a from all parts)
If b is half-way between a and c, then what is its value? In this case, how does (b - a) relate to (c - a)?
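One way to follow that hint: if b is halfway between a and c, then b - a = (c - a)/2, so b = a + (c - a)/2. The sketch below compares that midpoint with the endIdx/2 split from the question; the helper names `midpoint` and `naive_split` are made up for this illustration.

```python
def midpoint(start, end):
    # Halfway point measured from start, not from zero.
    return start + (end - start) // 2


def naive_split(end):
    # The split used in the question: it ignores start entirely.
    return end // 2


# The sub-range 2..3 that breaks the recursion in the question:
start, end = 2, 3
print(midpoint(start, end))  # 2, inside the range [2, 3]
print(naive_split(end))      # 1, outside the range [2, 3]
```

With integer division, start + (end - start) // 2 equals (start + end) // 2 for non-negative indices, so this agrees with the fix given in the first answer.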

Suggested solution in Python:
def maximum(arr, left, right):
    # Base case: a range of one element is its own maximum.
    if left >= right:
        return arr[left]
    mid = (left + right) // 2  # pay attention to the midpoint!
    left_max = maximum(arr, left, mid)
    right_max = maximum(arr, mid + 1, right)
    return max(left_max, right_max)

print(maximum([1, 8, 2, 9, 3, 15, 5, 3, 2], 0, 8))
OUTPUT
15

Related

How to generate all possible binary sequences of a set length --given set portions of the sequence

This began as a simple exercise to practice some Python programming, but I have been unable to fix the flaw in my logic!
The goal is simple. Given an array of a predetermined size, I need to build all possible binary permutations of the array size. An additional complication is that specific array indices will be already set, thus avoiding the need to generate a number of permutations.
To give a concrete example:
n=3
arr = [None]*n
arr[0] = 0
The resultant list of binary permutations should be: [0, 0, 0],[0, 0, 1],[0, 1, 0],[0, 1, 1]
My code is returning: [0, 0, 0],[0, 0, 1],[0, 1, 1]
So I can clearly see at which point it is not creating further permutations (and this problem is only exacerbated with larger array sizes). Unfortunately, I do not understand why it is not generating that last permutation.
This is the code I am using to generate and print the binary strings:
def arrayPrint(arr, n):
    for i in range(0, n):
        print(arr[i], end=" ")
    print()

def binaryStringGen(n, arr, i):
    if i == n:
        arrayPrint(arr, n)
        return
    if (arr[i] == None):
        arr[i] = 0
        binaryStringGen(n, arr, i + 1)
        arr[i] = 1
        binaryStringGen(n, arr, i + 1)
    else:
        binaryStringGen(n, arr, i + 1)
Any help in understanding this would be greatly appreciated!
You must add arr[i] = None right before your else branch:
    if (arr[i] == None):
        arr[i] = 0
        binaryStringGen(n, arr, i + 1)
        arr[i] = 1
        binaryStringGen(n, arr, i + 1)
        arr[i] = None # <= add this
Otherwise when you backtrack from binaryStringGen and later call again binaryStringGen with the same value of i, arr[i] will still be set to 1 and your code will enter the else branch.
More generally when you're writing recursive functions that have side effects on your arguments be sure that these are properly set back after the function terminates.
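As a minimal sketch of that advice, here is a variant that collects the results in a list instead of printing them, and restores each free slot before returning. The function name `binary_strings` and the `out` accumulator are assumptions made for this example, not part of the original code.

```python
def binary_strings(n, arr, i=0, out=None):
    """Collect every completion of arr, where None marks a free slot."""
    if out is None:
        out = []
    if i == n:
        out.append(list(arr))  # copy, because arr keeps being mutated
        return out
    if arr[i] is None:
        for bit in (0, 1):
            arr[i] = bit
            binary_strings(n, arr, i + 1, out)
        arr[i] = None  # undo the side effect before returning to the caller
    else:
        binary_strings(n, arr, i + 1, out)
    return out


print(binary_strings(3, [0, None, None]))
# [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1]]
```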

Recursion inside loops

My professor gave us an assignment on recursion. The question is as follows:
Problem 1:
You are given scales for weighing loads. On the left side lies a single stone of known weight W < 2^N. You own a set of N different weights, weighing 1, 2, 4, ..., 2^(N-1) units of mass respectively. Determine how many possible ways there are of placing some weights on the sides of the scales, so as to balance them (put them in a state of equilibrium).
The solution was also given
#include <stdio.h>

int N;

int no_ways(int W, int index) {
    if (!W)
        return 1;
    if (index == N)
        return 0;
    int ans = 0, i;
    for (i = 0; i * (1 << index) <= W; i++)
        ans += no_ways(W - i * (1 << index), index + 1);
    return ans;
}

int main(void) {
    int W;
    scanf("%d%d", &N, &W);
    printf("%d\n", no_ways(W, 0));
    return 0;
}
I understand how the base conditions are tested; however, I could not understand the recursive call inside the for loop and how the value of index differs in each recursive call.
Any easier approach or help in understanding this program?
PS: I am new to recursion and this seemed to be way too complex for me to understand
In order to understand recursion, I recommend you pick some very small input values and visualize it pen & paper style:
// Example Input:
W = 2, N = 2
// first call
no_ways(2, 0)
  // loop0: i = 0 to 2
  // first recursive call
  no_ways(2, 1)
    // loop1: i = 0 to 1
    no_ways(2, 2)
      = 0 (index == N)
    no_ways(0, 2)
      = 1 (W == 0)
    = 1 (sum of recursions in loop1)
  no_ways(1, 1)
    // loop2: i = 0 to 0
    no_ways(1, 2)
      = 0 (index == N)
    = 0 (sum of recursions in loop2)
  no_ways(0, 1)
    = 1 (W == 0)
  = 2 (sum of recursions in loop0)
As you can see, the sequence of recursive calls and the collection of results becomes fairly complex even with this very small input, but I hope it's still readable to you.
As David C. Rankin mentioned in the comments, this algorithm is not really good. It will always reach recursion depth (number of nested calls) of N for any W > 0, even though it would be possible to detect early, when a specific recursion path is unable to produce any non-zero result.
The algorithm is written such that, as index increases, only W values divisible by 2^index are solvable.
So (for example) any recursion of the first function call, where W is an odd number, will never lead to any result other than 0, since all weights with index > 0 are even number weights.
The code you present solves a different problem than the one you described. So either you made a mistake in your description or the solution is wrong.
The problem your code solves has a central condition like this:
You own a set of weights, weighing 1, 2, 4, ..., 2^(N-1) units of mass respectively, with an arbitrary number of each weight
and additionally you can place those weights on one side of the scales only (the one opposite to the stone).
This allows, for example, balancing the scales with a 2-unit stone in two ways, as the answer by grek40 shows: one solution is a single weight of 2, the other one is two weights of 1 unit each.
Here is how your code achieves it.
The parameter W to the function no_ways represents the unbalanced (part of the) weight of your stone, and the parameter index denotes the smallest weight you can use. So to find all possible solutions we call no_ways(W,0), which corresponds to balancing the total weight of W with all available weights.
The two base cases are 'there's no unbalanced weight left', which means we found a solution, so we return 1; and 'we exhausted the allowed range of weights', which means we cannot find a solution, so we return 0.
Then we try to expand a partial solution by trying to add the lightest available weights to the scales. The lightest weight is (1 << index), which is 2^index, so we multiply it by increasing values of i and subtract it from W; this is done with:
for (i = 0; i * (1 << index) <= W; i++)
(W - i * (1 << index), )
and we try to balance the remaining W - i * (1 << index) with the next available weight (defined by the next value of index) by calling:
for (i = 0; i * (1 << index) <= W; i++)
no_ways(W - i * (1 << index), index + 1)
Finally we accumulate the number of solutions found by summing the results:
for (i = 0; i * (1 << index) <= W; i++)
ans += no_ways(W - i * (1 << index), index + 1);
The sum is returned up the recursion so that at the top level we get a number of all solutions found.
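For readers who prefer to experiment interactively, the same counting logic can be sketched in Python. This is a transliteration under one assumption: n is passed as a parameter here instead of being the global N used in the C code.

```python
def no_ways(w, index, n):
    """Count ways to write w as a sum of i_k * 2**k for k in index..n-1, i_k >= 0."""
    if w == 0:
        return 1  # nothing left to balance: one solution found
    if index == n:
        return 0  # ran out of weight sizes with weight still unbalanced
    unit = 1 << index  # the lightest weight still allowed
    total = 0
    i = 0
    while i * unit <= w:
        # Use i copies of this weight, then balance the rest with heavier ones.
        total += no_ways(w - i * unit, index + 1, n)
        i += 1
    return total


print(no_ways(2, 0, 2))  # 2
print(no_ways(5, 0, 3))  # 4
```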
I have modified your code a bit to build and print an explicit representation of each solution found. It consists of an array int stack[] and a variable top, which indicates the free position in the stack. Initially top==0, the stack is empty.
Whenever the for() loop devises a decrement to W it puts the value onto the stack and advances the top pointer, so that recursive calls build up a solution in the array. On return from the recursion we decrement top so that a new iteration of for() can put a new value at the same place.
When we have a new solution, the whole stack is printed.
Here is the code:
int stack[15];
int top = 0;
void print_stack() {
int k;
for (k = 0; k < top; k ++)
printf(" %d", stack[k]);
printf("\n");
}
int N;
int no_ways(int W, int index) {
if (!W) {
print_stack();
return 1;
}
if (index == N)
return 0;
int ans = 0, i;
for (i = 0; i * (1 << index) <= W; i++) {
stack[top ++] = i * (1 << index);
ans += no_ways(W - i * (1 << index), index + 1);
top --;
}
return ans;
}
You can easily find all lines I added — those are lines containing 'stack' or 'top'.
For the case W=2, N=2 investigated by grek40 the code prints:
0 2
2
2
The last line shows there are two solutions found: the first is a weight of 2 obtained with one 2-unit weight and zero 1-unit weights (correct) and the other one is TWO one-unit weights.
Here are the results for W=5 and N=3:
1 0 4
1 4
3 2
5
4
These are the solutions: 5=1+4 (correct), 5=1+2*2 (with a weight of 2 units used twice), 5=3*1+2 (with a weight of 1 unit used thrice) and 5=5*1 (with five one-unit weights).
Solutions found in total: 4.
I have tested the code in an online compiler/debugger at https://www.onlinegdb.com/
EDIT
For solving the problem as you stated it, that is:
having precisely one weight equal to 1 unit, one weight equal to 2 units and so on through powers of 2 up to one weight of 2^(N-1) units, which can be placed on both sides of the scales, find a balance
you could modify the solution as follows.
Every weight can be placed either on the same plate where the stone is, thus adding a weight, or on the opposite one, thus (partially) reducing the weight – or can be left alone, out of the scales. The aim is to get a zero unbalanced weight. This corresponds to satisfying an equation like
W + s_0×1 + s_1×2 + s_2×4 + ... + s_(N-1)×2^(N-1) = 0
by choosing appropriately each s_n term equal to -1, 0 or 1.
That can be achieved with a simple modification of the code:
int no_ways(int W, int index) {
if (!W)
return 1;
if (index == N)
return 0;
int ans = 0, i;
for (i = -1; i <= 1; i++) // i equals -1, 0 or 1
ans += no_ways(W + i * (1 << index), index + 1);
return ans;
}
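A Python sketch of the same modification makes small cases easy to check by hand. The name `balance_ways` and the explicit n parameter are assumptions for this example; the logic mirrors the C function above.

```python
def balance_ways(w, index, n):
    """Count sign choices s_k in {-1, 0, 1} with w + sum(s_k * 2**k) == 0."""
    if w == 0:
        return 1
    if index == n:
        return 0
    unit = 1 << index
    # s = -1: weight opposite the stone, s = 0: unused, s = 1: next to the stone.
    return sum(balance_ways(w + s * unit, index + 1, n) for s in (-1, 0, 1))


print(balance_ways(1, 0, 2))  # 2: either 1 alone, or 2 against stone plus 1
```

For W = 1 and N = 2 the two solutions are s_0 = -1, s_1 = 0 and s_0 = 1, s_1 = -1.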

Finding permutations of Array without for loops

I saw this interview question on a LinkedIn group here
To summarise, if I have an array
[1,2,3,4,5]
and input
3
I require the output
[1,2,3], [3,2,1], [2,3,1], [2,1,3], [1,3,2], [3,1,2], [2,3,4], [4,3,2],...
In no particular order.
I have been thinking about this one for a while now. I have come up with various different ways of solving but all methods use for-loops.
I think it's clear that in order to eliminate loops it must be recursive.
I thought I got close to doing this recursively partitioning the array and joining elements, but with great frustration I ended up requiring another for loop.
I'm beginning to think this is impossible (which it can't be, otherwise why the interview question?).
Any ideas or links? The number of possible outputs should be 5PN (permutations of N items chosen from 5), where N is the input.
The following recursive algorithm will attempt to print every subset of {1,.., n}. These subsets are in one-to-one correspondence with the numbers between 0 and 2^n-1 via the following bijection: to an integer x between 0 and 2^n-1, associate the set that contains 1 if the first bit of x is set to one, 2 if the second bit of x is set to one, and so on.
void print_all_subsets (int n, int m, int x) {
    if (x==pow(2,n)) {
        return;
    }
    else if (x has m bits set to one) {
        print the set corresponding to x;
    }
    print_all_subsets(n,m,x+1);
}
You need to call it with n = 5 (in your case), m=3 (in your case), and x = 0.
Then you need to implement the two functions "print the set corresponding to x" and "x has m bits set to one" without for loops... but this is easily done using again recursion.
However, I think this is more of a challenge -- there is no point in completely eliminating for-loops, what makes sense is just to use them in a smart way.
Your first thought is right. Every loop can be replaced with recursion. In some languages (for example Scheme), loops are actually implemented with recursion. So just start with any solution, and keep on turning loops into recursion. Eventually you will be done.
Here is a working solution in Python.
def subsets_of_size(array, size, start=0, prepend=None):
    if prepend is None:
        prepend = []  # Standard Python precaution with modifiable defaults.
    if 0 == size:
        return [[] + prepend]  # Array with one thing. The + forces a copy.
    elif len(array) < start + size:
        return []  # Array with no things.
    else:
        answer = subsets_of_size(array, size, start=start + 1, prepend=prepend)
        prepend.append(array[start])
        answer = answer + subsets_of_size(array, size - 1, start=start + 1, prepend=prepend)
        prepend.pop()
        return answer

print(subsets_of_size([1, 2, 3, 4, 5], 3))
I don't think the point is to avoid for-loops altogether, but rather to use them in an optimal way.
For that, there is Heap's algorithm. Below, from the wiki: http://en.wikipedia.org/wiki/Heap%27s_algorithm
procedure generate(n : integer, A : array of any):
    if n = 1 then
        output(A)
    else
        for i := 0; i < n; i += 1 do
            generate(n - 1, A)
            if n is even then
                swap(A[i], A[n - 1])
            else
                swap(A[0], A[n-1])
            end if
        end for
    end if
define listPermutations:
    input: int p_l , int[] prevP , int atElement , int[] val , int nextElement
    output: list
    if nextElement > length(val) OR atElement == p_l OR contains(prevP , val[nextElement])
        return EMPTY
    list result
    int[] tmp = copy(prevP)
    tmp[atElement] = val[nextElement]
    add(result , tmp)
    //create the next permutation stub with the last sign different to this sign
    //(node with the same parent)
    addAll(result , listPermutations(p_l , tmp , atElement , val , nextElement + 1))
    //create the next permutation stub with an additional sign
    //(child node of the current permutation)
    addAll(result , listPermutations(p_l , tmp , atElement + 1 , val , 0))
    return result

//this will return the permutations for your example input:
listPermutations(3 , new int[3] , 0 , int[]{1 , 2 , 3 , 4 , 5} , 0)
Basic idea: all permutations of a given number of elements form a tree, where the root node is the empty permutation and every child node has one more element than its parent. Now all the algorithm has to do is traverse this tree level by level until the level equals the required length of the permutation, and list all nodes on that level.
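The tree idea can be sketched in Python without any loop statement, replacing each would-be loop with a recursive helper. This is an illustration rather than a translation of the pseudocode above; the names `k_permutations`, `_pick_each` and `_prepend_all` are invented for it.

```python
def k_permutations(items, k):
    """Return all ordered selections of k elements from items, recursion only."""
    if k == 0:
        return [[]]  # the root of the tree: the empty permutation
    return _pick_each(items, k, 0)


def _pick_each(items, k, i):
    # Recursion over i stands in for "for i in range(len(items))".
    if i == len(items):
        return []
    rest = items[:i] + items[i + 1:]  # everything except the chosen element
    children = _prepend_all(items[i], k_permutations(rest, k - 1), 0)
    return children + _pick_each(items, k, i + 1)


def _prepend_all(head, tails, j):
    # Recursion over j stands in for a loop that prefixes head to every tail.
    if j == len(tails):
        return []
    return [[head] + tails[j]] + _prepend_all(head, tails, j + 1)


print(len(k_permutations([1, 2, 3, 4, 5], 3)))  # 60, i.e. 5P3
```

Recursion depth grows with the number of results per level, so this is only practical for small inputs like the one in the question.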
You could use recursion here, and every time you call an inner level, you give it the location it is in the array and when it returns it return an increased location. You'd be using one while loop for this.
Pseudo code:
int[] input = [1,2,3,4,5];
int level = 3;
int PrintArrayPermutation(int level, int location, string base)
{
if (level == 0)
{
print base + input[location];
return location + 1;
}
while (location <= input.Length)
{
location =
PrintArrayPermutation(level - 1, location, base + input[location]);
}
}
This is a very basic outline of my idea.
Here are two recursive functions in JavaScript. The first is the combinatorial choose function to which we apply the second function, permuting each result (permutator is adapted from the SO user, delimited's, answer here: Permutations in JavaScript?)
function c(n,list){
var result = [];
function _c(p,r){
if (p > list.length)
return
if (r.length == n){
result = result.concat(permutator(r));
} else {
var next = list[p],
_r = r.slice();
_r.push(next)
_c(p+1,_r);
_c(p+1,r);
}
}
_c(0,[])
return result;
}
function permutator(inputArr) {
var results = [];
function permute(arr, memo) {
var cur, memo = memo || [];
function _permute (i,arr,l){
if (i == l)
return
cur = arr.splice(i,1);
if (arr.length === 0){
results.push(memo.concat(cur));
}
permute(arr.slice(), memo.concat(cur));
arr.splice(i, 0, cur[0]);
_permute(i + 1,arr,l)
}
_permute(0,arr,arr.length);
return results;
}
return permute(inputArr);
}
Output:
console.log(c(3,[1,2,3,4,5]))
[[1,2,3],[1,3,2],[2,1,3]...[4,5,3],[5,3,4],[5,4,3]]

Find the Smallest Integer Not in a List

An interesting interview question that a colleague of mine uses:
Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?
FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?
FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory
CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.
If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.
def smallest_missing(array):
    N = len(array)
    # Pass 1, move every value to the position of its value
    for cursor in range(N):
        target = array[cursor]
        while target < N and target != array[target]:
            new_target = array[target]
            array[target] = target
            target = new_target
    # Pass 2, find first location where the index doesn't match the value
    for cursor in range(N):
        if array[cursor] != cursor:
            return cursor
    return N
Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
1. Find the length of the list; let's say it is N.
2. Allocate an array of N booleans, initialized to all false.
3. For each number X in the list, if X is less than N, set the X'th element of the array to true.
4. Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.
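The four steps above can be sketched in a few lines of Python. This assumes the input contains only non-negative integers, as the answer states; the function name `smallest_missing_bool` is made up for the example.

```python
def smallest_missing_bool(numbers):
    n = len(numbers)              # step 1
    present = [False] * n         # step 2
    for x in numbers:             # step 3: values >= n cannot be the answer
        if x < n:
            present[x] = True
    for i, seen in enumerate(present):  # step 4
        if not seen:
            return i
    return n                      # the list was a permutation of 0 .. n-1


print(smallest_missing_bool([3, 0, 1, 7]))  # 2
```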
In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.
This is how / why the algorithm works.
Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of finding the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater than or equal to N, because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.
The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type of (at least) ceiling(log2(N)) bits.
To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.
By contrast, the algorithm presented above simply requires N bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.
Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.
1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 2^64 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.
In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.
As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.
You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.
On the first partition phase, remove duplicates.
Once the partitioning is complete, look at the number of items in the lower partition.
Is this value equal to the value used for creating the partition?
    If so, the gap is in the higher partition. Continue with the quicksort, ignoring the lower partition.
    Otherwise the gap is in the lower partition. Continue with the quicksort, ignoring the higher partition.
This saves a large number of computations.
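A rough Python sketch of this partition-and-discard idea follows. It makes several simplifying assumptions that are not in the answer: duplicates are removed with a set up front, the pivot is simply the middle element, and partitions are built as new lists rather than in place. The name `smallest_missing_partition` is also invented here.

```python
def smallest_missing_partition(values):
    items = list(set(values))  # simplification: drop duplicates before partitioning
    lo = 0                     # every integer below lo is known to be present
    while items:
        pivot = items[len(items) // 2]
        lower = [v for v in items if v < pivot]
        higher = [v for v in items if v > pivot]
        if len(lower) == pivot - lo:
            # [lo, pivot) is completely filled and pivot itself is present,
            # so the gap must lie above the pivot.
            lo = pivot + 1
            items = higher
        else:
            # Some value in [lo, pivot) is missing; discard the upper part.
            items = lower
    return lo


print(smallest_missing_partition([0, 1, 3, 4, 1]))  # 2
```

Because only one side is kept after each partition, the expected work is linear for well-behaved pivots, in the same way as quickselect.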
To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.
for i in [0..2^64):
    if i not in list: return i
print "no 64-bit integers are missing"
Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.
if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.
For a space-efficient method, if all values are distinct, you can do it in O(k) space and O(k*log(N)*N) time. It's space efficient, no data is moved, and all operations are elementary (adding, subtracting).
1. Set U = N; L = 0.
2. First partition the number space into k regions, like this:
   0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
3. Find how many numbers (count{i}) are in each region. (N*k steps)
4. Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
5. If h - count{h-1} = 1 you've got your answer.
6. Set U = count{h}; L = count{h-1}.
7. Go to 2.
This can be improved using hashing (thanks to Nic for this idea).
1. Same as above.
2. First partition the number space into k regions, like this:
   L + (i/k)->L + (i+1/k)*(U-L)
3. Increment count{j} using j = (number - L)/k (if L < number < U).
4. Find the first region (h) that doesn't have k elements in it.
5. If count{h} = 1, h is your answer.
6. Set U = maximum value in region h; L = minimum value in region h.
This will run in O(log(N)*N).
I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).
In terms of an algorithm, something like this would do it:
def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] != list[i-1] + 1:
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1
Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)
I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.
The algorithm for that one would be along the lines of:
def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all (bitmask)
    for i = 0 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
Sort the list, look at the first and second elements, and start going up until there is a gap.
We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.
for every i in X // one scan, Θ(n)
    hashtable.put(i, i); // O(1)
low = 0;
while (hashtable.get(low) <> null) // at most n+1 times
    low++;
print low;
The worst case is when the array has n elements and they are {0, 1, ... n-1}; the answer is then found at n, still keeping it O(n).
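In Python the hash table can be a built-in set, which gives expected constant-time insertion and lookup. A minimal sketch, with the function name `smallest_missing_hashed` chosen only for this example:

```python
def smallest_missing_hashed(numbers):
    seen = set(numbers)   # one pass over the list, expected O(n)
    low = 0
    while low in seen:    # runs at most n + 1 times
        low += 1
    return low


print(smallest_missing_hashed([0, 1, 2, 4, 5]))  # 3
```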
You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.
For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.
Here it is as a Python function:
def smallest_missing_uint64(source_list):
    the_answer = None
    target = 0
    while target < 2**64:
        target_found = False
        for item in source_list:
            if item == target:
                target_found = True
        if not target_found and the_answer is None:
            the_answer = target
        target += 1
    return the_answer
This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.
Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.
Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to its max size defines your new search space.
Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.
This will be completed in O(n) time and O(sqrt(n)) space.
(*You could use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)
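The two-pass idea can be sketched in Python as follows. Two things here are assumptions rather than part of the answer: the bucket is chosen by integer division with a fixed width instead of a square root, and the list is assumed to contain no duplicates (with duplicates a bucket could look full while still missing a value). The function name is hypothetical.

```python
import math


def smallest_missing_two_pass(values):
    """Assumes distinct, non-negative integers."""
    n = len(values)
    width = math.isqrt(n) + 1            # bucket width, roughly sqrt(n)
    counts = [0] * (n // width + 1)      # buckets covering 0 .. n
    # Hashing part: count how many values fall into each bucket.
    for v in values:
        if v <= n:
            counts[v // width] += 1
    # The first bucket that is not full contains the answer.
    bucket = next(b for b, c in enumerate(counts) if c < width)
    base = bucket * width
    # Bitmap part: mark the values that fall inside that one bucket.
    seen = [False] * width
    for v in values:
        if base <= v < base + width:
            seen[v - base] = True
    return base + seen.index(False)


print(smallest_missing_two_pass([0, 2, 1, 4, 5, 7]))  # 3
```

Both auxiliary arrays have about sqrt(n) entries, which is where the O(sqrt(n)) space figure comes from.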
Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.
int i = 0;
while ( i < Array.Length)
{
if (Array[i] == i + 1)
{
i++;
}
if (i < Array.Length)
{
if (Array[i] <= Array.Length)
{//SWap
int temp = Array[i];
int AnoTemp = Array[temp - 1];
Array[temp - 1] = temp;
Array[i] = AnoTemp;
}
else
i++;
}
}
for (int j = 0; j < Array.Length; j++)
{
if (Array[j] > Array.Length)
{
Console.WriteLine(j + 1);
j = Array.Length;
}
else
if (j == Array.Length - 1)
Console.WriteLine("Not Found !!");
}
}
Here's my answer written in Java:
Basic Idea:
1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.
2- Compute the sum as max * (max+1)/2.
3- Find the difference between the sums calculated at steps 1 & 2
4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.
public static int solution(int[] A) {
if (A == null || A.length == 0) {
throw new IllegalArgumentException();
}
int sum = 0;
Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
int max = A[0];
for (int i = 0; i < A.length; i++) {
if(A[i] < 0) {
continue;
}
if(uniqueNumbers.get(A[i]) != null) {
continue;
}
if (A[i] > max) {
max = A[i];
}
uniqueNumbers.put(A[i], true);
sum += A[i];
}
int completeSum = (max * (max + 1)) / 2;
for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
if(uniqueNumbers.get(j) == null) { //O(1)
return j;
}
}
//All negative case
if(uniqueNumbers.isEmpty()) {
return 1;
}
return 0;
}
As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.
The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.
I like the "guess zero" approach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:
LowNum=0
i=0
do forever {
    if i == N then leave /* Processed entire array */
    if array[i] == LowNum {
        LowNum++
        i=0
    }
    else {
        i++
    }
}
display LowNum
The worst case is n*N with n=N, but in practice n is highly likely to be a small number (e.g. 1).
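For anyone who wants to try the loop above, here is a minimal Python sketch of it. It assumes the goal is the smallest non-negative integer not present in the array; the function name lowest_missing is made up for this example.

```python
def lowest_missing(array):
    """Smallest non-negative integer absent from array (restart-the-scan approach)."""
    low_num = 0
    i = 0
    while i < len(array):
        if array[i] == low_num:
            # The current guess is present: try the next guess and rescan from the start.
            low_num += 1
            i = 0
        else:
            i += 1
    return low_num


print(lowest_missing([1, 2, 3]))
print(lowest_missing([0, 1, 3]))
```

Each successful guess restarts the scan, which is where the n*N worst case comes from.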
I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by:
(n+2)(n+1)/2-(n+1)n/2
EDIT: Sorry, I guess I was thinking too fast last night. Anyway, the second part should actually be replaced by sum(list), which is where the O(n) comes from. The formula reveals the idea behind it: for n sequential integers starting at 1, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.
Thanks for pointing out that I had left some of the intermediate steps in my head.
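A small Python sketch of that idea, under the assumption that the list holds n distinct values taken from 1..n+1 with exactly one value missing (the function name is hypothetical):

```python
def missing_from_run(values):
    """Return the one value of 1..n+1 that is absent from values (n = len(values))."""
    n = len(values)
    expected = (n + 2) * (n + 1) // 2  # sum of 1..n+1
    return expected - sum(values)      # the O(n) part is this sum


print(missing_from_run([1, 2, 3, 5, 6]))
```

If the list can contain duplicates or values outside 1..n+1, the difference no longer identifies the missing number.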
Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:
#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
int m = n;
for (int i = 0; i < m;) {
if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
m--;
SWAP (a[i], a[m]);
continue;
}
if (a[i] > i) {
SWAP (a[i], a[a[i]]);
continue;
}
i++;
}
return m;
}
m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".
This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.
If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.
Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.
The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)
The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.
method AntsAasma(A: array<int>) returns (M: int)
requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
modifies A;
{
// Pass 1, move every value to the position of its value
var N := A.Length;
var cursor := 0;
while (cursor < N)
{
var target := A[cursor];
while (0 <= target < N && target != A[target])
{
var new_target := A[target];
A[target] := target;
target := new_target;
}
cursor := cursor + 1;
}
// Pass 2, find first location where the index doesn't match the value
cursor := 0;
while (cursor < N)
{
if (A[cursor] != cursor)
{
return cursor;
}
cursor := cursor + 1;
}
return N;
}
Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.
Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):
int smallestMissingValue(List<Integer> values) {
BitSet bitset = new BitSet(values.size() + 1);
for (int i : values) {
if (i >= 0 && i <= values.size()) {
bitset.set(i);
}
}
return bitset.nextClearBit(0);
}
def solution(A):
    index = 0
    target = []
    A = [x for x in A if x >=0]
    if len(A) ==0:
        return 1
    maxi = max(A)
    if maxi <= len(A):
        maxi = len(A)
    target = ['X' for x in range(maxi+1)]
    for number in A:
        target[number]= number
    count = 1
    while count < maxi+1:
        if target[count] == 'X':
            return count
        count +=1
    return target[count-1] + 1
Got 100% for the above solution.
1) Filter out negatives and zero
2) Sort and remove duplicates
3) Walk the array
Complexity: O(N * log(N)), dominated by the sort
Using Java 8:
public int solution(int[] A) {
int result = 1;
boolean found = false;
A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
//System.out.println(Arrays.toString(A));
for (int i = 0; i < A.length; i++) {
result = i + 1;
if (result != A[i]) {
found = true;
break;
}
}
if (!found && result == A.length) {
//result is larger than max element in array
result++;
}
return result;
}
An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.
int firstMissingPositive(vector<int>& nums) {
unordered_set<int> fre;
// storing each positive number in a hash.
for(int i = 0; i < nums.size(); i +=1)
{
if(nums[i] > 0)
fre.insert(nums[i]);
}
int i = 1;
// Iterating from 1 to size of the set and checking
// for the occurrence of 'i'
for(auto it = fre.begin(); it != fre.end(); ++it)
{
if(fre.find(i) == fre.end())
return i;
i +=1;
}
return i;
}
Solution through basic javascript
var a = [1, 3, 6, 4, 1, 2];
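function findSmallest(a) {
    var m = 0;
    for (var i = 1; i <= a.length; i++) {
        var j = 0;
        m = 1;
        while (j < a.length) {
            if (i === a[j]) {
                m++;
            }
            j++;
        }
        if (m === 1) {
            return i;
        }
    }
    // Every value 1..a.length is present, so the answer is the next integer.
    return a.length + 1;
}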
console.log(findSmallest(a))
Hope this helps for someone.
With python it is not the most efficient, but correct
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime
# write your code in Python 3.6
def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)
    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1
test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
def solution(A):
    A.sort()
    j = 1
    for i, elem in enumerate(A):
        if j < elem:
            break
        elif j == elem:
            j += 1
            continue
        else:
            continue
    return j
This can help:
0- A is [5, 3, 2, 7];
1- Define B with length = A.Length; (O(1))
2- Initialize B cells with 1; (O(n))
3- For each item in A:
if (0 <= item < B.Length) then B[item] = -1 (O(n))
4- The answer is the smallest index in B such that B[index] != -1 (O(n))
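The steps above can be sketched in Python roughly as follows. This is only an illustration: it uses booleans instead of 1 and -1, assumes the goal is the smallest non-negative integer not in A, and returns len(a) when every index is marked, a case the steps do not spell out. The function name is made up.

```python
def smallest_absent(a):
    """Smallest non-negative integer not in a, using a marker list of the same length."""
    present = [False] * len(a)
    for item in a:
        # Only values that can be an index of the marker list matter.
        if 0 <= item < len(a):
            present[item] = True
    for index, seen in enumerate(present):
        if not seen:
            return index
    return len(a)  # 0..len(a)-1 are all present


print(smallest_absent([5, 3, 2, 7]))
```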

Algorithm to determine if array contains n...n+m?

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:
Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false.)
The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.
Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)
edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this?
The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?
Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.
You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))
The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not in the right place if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.
And, yes, that runs in O(N) although it may look like O(N ^ 2)
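Here is a rough Python sketch of the steps above, assuming every value has already been checked to lie in [n, n + len(values)) and that modifying the list is acceptable. The name has_duplicate and the default n=1 are choices made for this example only.

```python
def has_duplicate(values, n=1):
    """True if values contains a repeated number.

    Precondition (not checked here): every value lies in [n, n + len(values)).
    The list is rearranged in place.
    """
    for i in range(len(values)):
        # Keep moving whatever sits at position i to the slot it would occupy if sorted.
        while values[i] - n != i:
            target = values[i] - n
            if values[target] == values[i]:
                return True  # the proper slot already holds the same number
            values[i], values[target] = values[target], values[i]
    return False


print(has_duplicate([5, 4, 3, 2, 1]))
print(has_duplicate([1, 3, 1]))
```

Every swap puts at least one number into its final slot, so the total number of swaps is bounded by the length of the list even though the loops are nested.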
Note to everyone (stuff collected from comments)
This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).
Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)
New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):
So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.
In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.
I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:
[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6]
[2, 2, 3, 3, 4, 7, 7]
I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:
[1, 3, 3, 4, 4, 5, 8, 8]
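These counterexamples are easy to re-check. The following Python snippet (the helper name signature is made up) computes the length, sum and sum of squares of each sequence so they can be compared with the genuine runs 1..7 and 1..8:

```python
def signature(seq):
    """Length, sum and sum of squares of a sequence."""
    return len(seq), sum(seq), sum(x * x for x in seq)


length_7 = [
    [1, 2, 3, 4, 5, 6, 7],
    [1, 1, 4, 5, 5, 6, 6],
    [2, 2, 3, 3, 4, 7, 7],
]
for seq in length_7:
    print(seq, signature(seq))

# The length-8 example also shares min and max with the genuine run 1..8.
fake = [1, 3, 3, 4, 4, 5, 8, 8]
real = list(range(1, 9))
print(signature(fake) == signature(real), min(fake) == min(real), max(fake) == max(real))
```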
Simpler approach (modifying hazzen's idea):
An integer array of length m contains all the numbers from n to n+m-1 exactly once iff
every array element is between n and n+m-1
there are no duplicates
(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)
If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):
For all array indexes i from 0 to m-1 do
If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
While j is not equal to i
If list[i] is equal to list[j] => RETURN FALSE ("duplicate found")
Swap list[i] with list[j]
Recalculate j = array[i] - n
RETURN TRUE
I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.
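A possible Python rendering of this approach, as a sketch only. One thing is added that the pseudocode does not state: the value swapped into position i is range-checked before it is used as an index, since otherwise an out-of-range value could be used to index the list. The function name is hypothetical.

```python
def is_consecutive_run(array, n):
    """True if array is a permutation of n..n+m-1 (m = len(array)). Modifies array."""
    m = len(array)
    for i in range(m):
        if not n <= array[i] < n + m:
            return False  # value out of range found
        j = array[i] - n  # 0-based position of array[i] in the sorted run
        while j != i:
            if array[i] == array[j]:
                return False  # duplicate found
            array[i], array[j] = array[j], array[i]
            # Added check: the value that just arrived at i has not been validated yet.
            if not n <= array[i] < n + m:
                return False
            j = array[i] - n
    return True


print(is_consecutive_run([2, 3, 4], 2))
print(is_consecutive_run([1, 3, 1], 1))
print(is_consecutive_run([1, 2, 4], 1))
```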
By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.
We take this observation for granted and try to check if the array contains [0,m).
Find the first node that's not in its correct position, e.g.
0 1 2 3 7 5 6 8 4 ; the original dataset (after the renaming we discussed)
^
`---this is position 4 and the 7 shouldn't be here
Swap that number into where it should be. i.e. swap the 7 with the 8:
0 1 2 3 8 5 6 7 4 ;
| `--------- 7 is in the right place.
`--------------- this is now the 'current' position
Now we repeat this. Looking again at our current position we ask:
"is this the correct number for here?"
If not, we swap it into its correct place.
If it is in the right place, we move right and do this again.
Following this rule again, we get:
0 1 2 3 4 5 6 7 8 ; 4 and 8 were just swapped
This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).
If there are dupes, we'll notice it as soon as there is an attempt to swap a number backwards in the list.
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
Simpler method:
Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.
Step 2, iterate through the list, keep track of the lowest and highest items.
Step 3, does (highest - lowest) equal m - 1? If so, return true.
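A sketch of these three steps in Python. Step 1 is done here with a set, which takes O(m) extra space, so this does not settle the O(1) question; it only shows the min/max test. Note that m distinct consecutive values span exactly m - 1. The function name and the handling of an empty list are assumptions of this example.

```python
def is_run(values):
    """True if values holds len(values) distinct consecutive integers."""
    if not values:
        return False  # arbitrary convention for the empty list
    # Step 1: duplicates (uses O(m) extra space via a set).
    if len(set(values)) != len(values):
        return False
    # Steps 2 and 3: m distinct integers are consecutive iff max - min == m - 1.
    return max(values) - min(values) == len(values) - 1


print(is_run([2, 3, 4]), is_run([1, 3, 1]), is_run([1, 2, 4]))
```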
Any one-pass algorithm requires Omega(n) bits of storage.
Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs
A A' - yes
B A' - no
since it cannot distinguish the first case from the second.
Q.E.D.
Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2 or something like that) we can just sum up the numbers and square of difference to mean to see if the sum matches the equation (mn + m(m-1)/2) and the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.
EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.
If there is a duplicate, a term in the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
Here's a working solution in O(n)
This is using the pseudocode suggested by Hazzen plus some of my own ideas. It works for negative numbers as well and doesn't require any sum-of-the-squares stuff.
function testArray($nums, $n, $m) {
// check the sum. PHP offers this array_sum() method, but it's
// trivial to write your own. O(n) here.
if (array_sum($nums) != ($m * ($m + 2 * $n - 1) / 2)) {
return false; // checksum failed.
}
for ($i = 0; $i < $m; ++$i) {
// check if the number is in the proper range
if ($nums[$i] < $n || $nums[$i] >= $n + $m) {
return false; // value out of range.
}
while (($shouldBe = $nums[$i] - $n) != $i) {
if ($nums[$shouldBe] == $nums[$i]) {
return false; // duplicate
}
$temp = $nums[$i];
$nums[$i] = $nums[$shouldBe];
$nums[$shouldBe] = $temp;
}
}
return true; // huzzah!
}
var_dump(testArray(array(1, 2, 3, 4, 5), 1, 5)); // true
var_dump(testArray(array(5, 4, 3, 2, 1), 1, 5)); // true
var_dump(testArray(array(6, 4, 3, 2, 0), 1, 5)); // false - out of range
var_dump(testArray(array(5, 5, 3, 2, 1), 1, 5)); // false - checksum fail
var_dump(testArray(array(5, 4, 3, 2, 5), 1, 5)); // false - dupe
var_dump(testArray(array(-2, -1, 0, 1, 2), -2, 5)); // true
Awhile back I heard about a very clever sorting algorithm from someone who worked for the phone company. They had to sort a massive number of phone numbers. After going through a bunch of different sort strategies, they finally hit on a very elegant solution: they just created a bit array and treated the offset into the bit array as the phone number. They then swept through their database with a single pass, changing the bit for each number to 1. After that, they swept through the bit array once, spitting out the phone numbers for entries that had the bit set high.
Along those lines, I believe that you can use the data in the array itself as a meta data structure to look for duplicates. Worst case, you could have a separate array, but I'm pretty sure you can use the input array if you don't mind a bit of swapping.
I'm going to leave out the n parameter for time being, b/c that just confuses things - adding in an index offset is pretty easy to do.
Consider:
for i = 0 to m
if (a[a[i]]==a[i]) return false; // we have a duplicate
while (a[a[i]] > a[i]) swapArrayIndexes(a[i], i)
sum = sum + a[i]
next
if sum = (n+m-1)*m return true else return false
This isn't O(n) - probably closer to O(n Log n) - but it does provide for constant space and may provide a different vector of attack for the problem.
If we want O(n), then using a bit array and some bit operations will provide the duplication check with an extra n/8 bytes of memory used (that is, n/32 ints, assuming 32 bit ints, of course).
EDIT: The above algorithm could be improved further by adding the sum check to the inside of the loop, and check for:
if sum > (n+m-1)*m return false
that way it will fail fast.
Assuming you know only the length of the array and you are allowed to modify the array it can be done in O(1) space and O(n) time.
The process has two straightforward steps.
1. "modulo sort" the array. [5,3,2,4] => [4,5,2,3] (O(2n))
2. Check that each value's neighbor is one higher than itself (modulo) (O(n))
All told you need at most 3 passes through the array.
The modulo sort is the 'tricky' part, but the objective is simple. Take each value in the array and store it at its own address (modulo length). This requires one pass through the array, looping over each location 'evicting' its value by swapping it to its correct location and moving in the value at its destination. If you ever move in a value which is congruent to the value you just evicted, you have a duplicate and can exit early.
Worst case, it's O(2n).
The check is a single pass through the array examining each value with its next highest neighbor. Always O(n).
Combined algorithm is O(n)+O(2n) = O(3n) = O(n)
Pseudocode from my solution:
foreach(values[])
while(values[i] not congruent to i)
to-be-evicted = values[i]
evict(values[i]) // swap to its 'proper' location
if(values[i]%length == to-be-evicted%length)
return false; // a 'duplicate' arrived when we evicted that number
end while
end foreach
foreach(values[])
if((values[i]+1)%length != values[i+1]%length)
return false
end foreach
I've included the java code proof of concept below, it's not pretty, but it passes all the unit tests I made for it. I call these a 'StraightArray' because they correspond to the poker hand of a straight (contiguous sequence ignoring suit).
public class StraightArray {
static int evict(int[] a, int i) {
int t = a[i];
a[i] = a[t%a.length];
a[t%a.length] = t;
return t;
}
static boolean isStraight(int[] values) {
for(int i = 0; i < values.length; i++) {
while(values[i]%values.length != i) {
int evicted = evict(values, i);
if(evicted%values.length == values[i]%values.length) {
return false;
}
}
}
for(int i = 0; i < values.length-1; i++) {
int n = (values[i]%values.length)+1;
int m = values[(i+1)]%values.length;
if(n != m) {
return false;
}
}
return true;
}
}
Hazzen's algorithm implementation in C
#include<stdio.h>
#define swapxor(a,i,j) a[i]^=a[j];a[j]^=a[i];a[i]^=a[j];
int check_ntom(int a[], int n, int m) {
int i = 0, j = 0;
for(i = 0; i < m; i++) {
if(a[i] < n || a[i] >= n+m) return 0; //invalid entry
j = a[i] - n;
while(j != i) {
if(a[i]==a[j]) return -1; //bucket already occupied. Dupe.
swapxor(a, i, j); //faster bitwise swap
j = a[i] - n;
if(a[i]>=n+m) return 0; //[NEW] invalid entry
}
}
return 200; //OK
}
int main() {
int n=5, m=5;
int a[] = {6, 5, 7, 9, 8};
int r = check_ntom(a, n, m);
printf("%d", r);
return 0;
}
Edit: change made to the code to eliminate illegal memory access.
boolean determineContinuousArray(int *arr, int len)
{
// Suppose the array is like below:
//int arr[10] = {7,11,14,9,8,100,12,5,13,6};
//int len = sizeof(arr)/sizeof(int);
int n = arr[0];
int *result = new int[len];
for(int i=0; i< len; i++)
result[i] = -1;
for (int i=0; i < len; i++)
{
int cur = arr[i];
int hold ;
if ( arr[i] < n){
n = arr[i];
}
while(true){
if ( cur - n >= len){
cout << "array index out of range: meaning this is not a valid array" << endl;
return false;
}
else if ( result[cur - n] != cur){
hold = result[cur - n];
result[cur - n] = cur;
if (hold == -1) break;
cur = hold;
}else{
cout << "found duplicate number " << cur << endl;
return false;
}
}
}
cout << "this is a valid array" << endl;
for(int j=0 ; j< len; j++)
cout << result[j] << "," ;
cout << endl;
return true;
}
def test(a, n, m):
    seen = [False] * m
    for x in a:
        if x < n or x >= n+m:
            return False
        if seen[x-n]:
            return False
        seen[x-n] = True
    return False not in seen
print(test([2, 3, 1], 1, 3))
print(test([1, 3, 1], 1, 3))
print(test([1, 2, 4], 1, 3))
Note that this only makes one pass through the first array, not considering the linear search involved in not in. :)
I also could have used a python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.
Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.
If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.
var sum = m * (m + 2 * n - 1) / 2;
That works for any number, positive or negative, even if n is a decimal.
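A quick way to convince yourself of the formula for integer inputs is to compare it with a brute-force sum. This Python sketch uses integer division, so unlike the expression above it is only meant for integer n; the function name is made up.

```python
def run_sum(n, m):
    """Sum of the m integers n, n+1, ..., n+m-1."""
    # m * (m + 2n - 1) is always even, so the integer division is exact.
    return m * (m + 2 * n - 1) // 2


# Compare against a direct summation for a range of starting points and lengths.
for n in range(-5, 6):
    for m in range(0, 8):
        assert run_sum(n, m) == sum(range(n, n + m))

print(run_sum(1, 5), run_sum(-2, 5))
```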
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
O(1) indicates constant space which does not change by the number of n. It does not matter if it is 1 or 2 variables as long as it is a constant number. Why are you saying it is more than O(1) space? If you are calculating the sum of n numbers by accumulating it in a temporary variable, you would be using exactly 1 variable anyway.
Commenting in an answer because the system does not allow me to write comments yet.
Update (in reply to comments): in this answer I meant O(1) space wherever "space" or "time" was omitted. The quoted text is part of an earlier answer, to which this is a reply.
Given this -
Write a method that takes an int array of size m ...
I suppose it is fair to conclude there is an upper limit for m, equal to the value of the largest int (2^32 being typical). In other words, even though m is not specified as an int, the fact that the array can't have duplicates implies there can't be more than the number of values you can form out of 32 bits, which in turn implies m is limited to be an int also.
If such a conclusion is acceptable, then I propose to use a fixed space of (2^33 + 2) * 4 bytes = 34,359,738,376 bytes = 34.4GB to handle all possible cases. (Not counting the space required by the input array and its loop).
Of course, for optimization, I would first take m into account, and allocate only the actual amount needed, (2m+2) * 4 bytes.
If this is acceptable for the O(1) space constraint - for the stated problem - then let me proceed to an algorithmic proposal... :)
Assumptions: array of m ints, positive or negative, none greater than what 4 bytes can hold. Duplicates are handled. First value can be any valid int. Restrict m as above.
First, create an int array of length 2m-1, ary, and provide three int variables: left, diff, and right. Notice that makes 2m+2...
Second, take the first value from the input array and copy it to position m-1 in the new array. Initialize the three variables.
set ary[m-1] = nthVal // n=0
set left = diff = right = 0
Third, loop through the remaining values in the input array and do the following for each iteration:
set diff = nthVal - ary[m-1]
if (diff > m-1 + right || diff < 1-m + left) return false // out of bounds
if (ary[m-1+diff] != null) return false // duplicate
set ary[m-1+diff] = nthVal
if (diff>left) left = diff // constrains left bound further right
if (diff<right) right = diff // constrains right bound further left
I decided to put this in code, and it worked.
Here is a working sample using C#:
public class Program
{
static bool puzzle(int[] inAry)
{
var m = inAry.Count();
var outAry = new int?[2 * m - 1];
int diff = 0;
int left = 0;
int right = 0;
outAry[m - 1] = inAry[0];
for (var i = 1; i < m; i += 1)
{
diff = inAry[i] - inAry[0];
if (diff > m - 1 + right || diff < 1 - m + left) return false;
if (outAry[m - 1 + diff] != null) return false;
outAry[m - 1 + diff] = inAry[i];
if (diff > left) left = diff;
if (diff < right) right = diff;
}
return true;
}
static void Main(string[] args)
{
var inAry = new int[3]{ 2, 3, 4 };
Console.WriteLine(puzzle(inAry));
inAry = new int[13] { -3, 5, -1, -2, 9, 8, 2, 3, 0, 6, 4, 7, 1 };
Console.WriteLine(puzzle(inAry));
inAry = new int[3] { 21, 31, 41 };
Console.WriteLine(puzzle(inAry));
Console.ReadLine();
}
}
note: this comment is based on the original text of the question (it has been corrected since)
If the question is posed exactly as written above (and it is not just a typo) and for array of size n the function should return (True/False) if the array consists of the numbers 1...n+1,
... then the answer will always be false because the array with all the numbers 1...n+1 will be of size n+1 and not n. hence the question can be answered in O(1). :)
Counter-example for XOR algorithm.
(can't post it as a comment)
@popopome
For a = {0, 2, 7, 5} it returns true (meaning that a is a permutation of the range [0, 4)), but it must return false in this case (a is obviously not a permutation of [0, 4)).
Another counter-example: {0, 0, 1, 3, 5, 6, 6} -- all values are in range but there are duplicates.
I may have implemented popopome's idea (or the tests) incorrectly, so here is the code:
bool isperm_popopome(int m; int a[m], int m, int n)
{
/** O(m) in time (single pass), O(1) in space,
no restrictions on n,
no overflow,
a[] may be readonly
*/
int even_xor = 0;
int odd_xor = 0;
for (int i = 0; i < m; ++i)
{
if (a[i] % 2 == 0) // is even
even_xor ^= a[i];
else
odd_xor ^= a[i];
const int b = i + n;
if (b % 2 == 0) // is even
even_xor ^= b;
else
odd_xor ^= b;
}
return (even_xor == 0) && (odd_xor == 0);
}
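The two counter-examples can also be replayed with a direct Python port of the function above. This is just a transliteration for non-negative inputs (Python's % behaves differently from C's for negative numbers); the name isperm_xor is made up.

```python
def isperm_xor(a, n):
    """Port of the even/odd XOR check; claims a is a permutation of [n, n+len(a))."""
    even_xor = 0
    odd_xor = 0
    for i, value in enumerate(a):
        if value % 2 == 0:
            even_xor ^= value
        else:
            odd_xor ^= value
        b = i + n  # the value a sorted run would hold at position i
        if b % 2 == 0:
            even_xor ^= b
        else:
            odd_xor ^= b
    return even_xor == 0 and odd_xor == 0


print(isperm_xor([3, 1, 0, 2], 0))           # a genuine permutation of [0, 4)
print(isperm_xor([0, 2, 7, 5], 0))           # not a permutation, still accepted
print(isperm_xor([0, 0, 1, 3, 5, 6, 6], 0))  # duplicates, still accepted
```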
A C version of b3's pseudo-code
(to avoid misinterpretation of the pseudo-code)
Counter example: {1, 1, 2, 4, 6, 7, 7}.
int pow_minus_one(int power)
{
return (power % 2 == 0) ? 1 : -1;
}
int ceil_half(int n)
{
return n / 2 + (n % 2);
}
bool isperm_b3_3(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
doesn't use n
possible overflow in sum
a[] may be readonly
*/
int altsum = 0;
int mina = INT_MAX;
int maxa = INT_MIN;
for (int i = 0; i < m; ++i)
{
const int v = a[i] - n + 1; // [n, n+m-1] -> [1, m] to deal with n=0
if (mina > v)
mina = v;
if (maxa < v)
maxa = v;
altsum += pow_minus_one(v) * v;
}
return ((maxa-mina == m-1)
and ((pow_minus_one(mina + m-1) * ceil_half(mina + m-1)
- pow_minus_one(mina-1) * ceil_half(mina-1)) == altsum));
}
In Python:
def ispermutation(iterable, m, n):
    """Whether iterable and the range [n, n+m) have the same elements.
    pre-condition: there are no duplicates in the iterable
    """
    for i, elem in enumerate(iterable):
        if not n <= elem < n+m:
            return False
    return i == m-1
print(ispermutation([1, 42], 2, 1) == False)
print(ispermutation(range(10), 10, 0) == True)
print(ispermutation((2, 1, 3), 3, 1) == True)
print(ispermutation((2, 1, 3), 3, 0) == False)
print(ispermutation((2, 1, 3), 4, 1) == False)
print(ispermutation((2, 1, 3), 2, 1) == False)
It is O(m) in time and O(1) in space. It does not take into account duplicates.
Alternate solution:
def ispermutation(iterable, m, n):
    """Same as above.
    pre-condition: assert(len(list(iterable)) == m)
    """
    return all(n <= elem < n+m for elem in iterable)
MY CURRENT BEST OPTION
def uniqueSet( array )
check_index = 0;
check_value = 0;
min = array[0];
array.each_with_index{ |value,index|
check_index = check_index ^ ( 1 << index );
check_value = check_value ^ ( 1 << value );
min = value if value < min
}
check_index = check_index << min;
return check_index == check_value;
end
O(n) and Space O(1)
I wrote a script to brute force combinations that could fail that and it didn't find any.
If you have an array which contravenes this function do tell. :)
@J.F. Sebastian
It's not a true hashing algorithm. Technically, it's a highly efficient packed boolean array of "seen" values.
ci = 0, cv = 0
[5,4,3]{
i = 0
v = 5
1 << 0 == 000001
1 << 5 == 100000
0 ^ 000001 = 000001
0 ^ 100000 = 100000
i = 1
v = 4
1 << 1 == 000010
1 << 4 == 010000
000001 ^ 000010 = 000011
100000 ^ 010000 = 110000
i = 2
v = 3
1 << 2 == 000100
1 << 3 == 001000
000011 ^ 000100 = 000111
110000 ^ 001000 = 111000
}
min = 3
000111 << 3 == 111000
111000 === 111000
The point of this is mostly that, in order to "fake" most of the problem cases, one uses duplicates. In this system, XOR penalises you for using the same value twice and treats it as if you had used it 0 times.
The caveats here being of course:
both input array length and maximum array value are limited by the maximum value for $x in ( 1 << $x > 0 )
ultimate effectiveness depends on how your underlying system implements the abilities to:
shift 1 bit n places left.
xor 2 registers. ( where 'registers' may, depending on implementation, span several registers )
edit
Noted, above statements seem confusing. Assuming a perfect machine, where an "integer" is a register with Infinite precision, which can still perform a ^ b in O(1) time.
But failing these assumptions, one has to start asking the algorithmic complexity of simple math.
How complex is 1 == 1 ?, surely that should be O(1) every time right?.
What about 2^32 == 2^32 .
O(1)? 2^33 == 2^33? Now you've got a question of register size and the underlying implementation.
Fortunately XOR and == can be done in parallel, so if one assumes infinite precision and a machine designed to cope with infinite precision, it is safe to assume XOR and == take constant time regardless of their value (because it has infinite width, it will have infinite 0 padding). Obviously this doesn't exist. But also, changing 000000 to 000100 is not increasing memory usage.
Yet on some machines , ( 1 << 32 ) << 1 will consume more memory, but how much is uncertain.
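For experimenting with the idea outside Ruby, here is a Python port of uniqueSet, offered as a sketch. Python integers also grow without bound, so the behaviour should match the Ruby version for non-negative values; the price is that the two masks need as many bits as the largest value in the array, which is why the O(1) space claim depends on the "perfect machine" assumption discussed above. The function name and the empty-list convention are choices made for this example.

```python
def unique_set(array):
    """True if array holds len(array) distinct consecutive non-negative integers."""
    if not array:
        return True  # arbitrary convention for the empty array
    check_index = 0
    check_value = 0
    low = min(array)
    for index, value in enumerate(array):
        check_index ^= 1 << index   # one bit per position: 0, 1, ..., m-1
        check_value ^= 1 << value   # toggle the bit for each value seen
    # Shift the position mask so that bit 0 lines up with the smallest value.
    return (check_index << low) == check_value


print(unique_set([5, 4, 3]))
print(unique_set([1, 3, 1]))
print(unique_set([1, 2, 4]))
```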
A C version of Kent Fredric's Ruby solution
(to facilitate testing)
Counter-example (for C version): {8, 33, 27, 30, 9, 2, 35, 7, 26, 32, 2, 23, 0, 13, 1, 6, 31, 3, 28, 4, 5, 18, 12, 2, 9, 14, 17, 21, 19, 22, 15, 20, 24, 11, 10, 16, 25}. Here n=0, m=35. This sequence misses 34 and has two 2.
It is an O(m) in time and O(1) in space solution.
Out-of-range values are easily detected in O(n) in time and O(1) in space, therefore tests are concentrated on in-range (means all values are in the valid range [n, n+m)) sequences. Otherwise {1, 34} is a counter example (for C version, sizeof(int)==4, standard binary representation of numbers).
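The main difference between the C and Ruby versions:
the << operator effectively wraps around in C due to a finite sizeof(int) (strictly speaking, shifting by the width of the type or more is undefined behaviour in C; the result shown below is what a typical x86 build produces),
but in Ruby numbers will grow to accommodate the result, e.g.,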
Ruby: 1 << 100 # -> 1267650600228229401496703205376
C: int n = 100; 1 << n // -> 16
In Ruby: check_index ^= 1 << i; is equivalent to check_index.setbit(i). The same effect could be implemented in C++: vector<bool> v(m); v[i] = true;
bool isperm_fredric(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
no restriction on n,
?overflow?
a[] may be readonly
*/
int check_index = 0;
int check_value = 0;
int min = a[0];
for (int i = 0; i < m; ++i) {
check_index ^= 1 << i;
check_value ^= 1 << (a[i] - n); //
if (a[i] < min)
min = a[i];
}
check_index <<= min - n; // min and n may differ e.g.,
// {1, 1}: min=1, but n may be 0.
return check_index == check_value;
}
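For comparison, here is a minimal Python sketch of the same XOR idea. Python integers grow like Ruby's, so the shift cannot wrap. The function name is_permutation_xor is made up for illustration, and unlike the C version it takes the minimum of the array as the base instead of a separate n parameter. It assumes a non-empty list of ints. Note that the bit mask itself needs m bits, so this is only "O(1) space" under the infinite-register assumption discussed above.

```python
def is_permutation_xor(values):
    """Return True if values is a permutation of [min, min + len(values)).

    Hypothetical helper; assumes a non-empty list of ints.
    """
    m = len(values)
    base = min(values)            # first pass: find the lower bound
    expected = (1 << m) - 1       # m low bits set, one per expected value
    seen = 0
    for v in values:              # second pass: toggle one bit per value
        seen ^= 1 << (v - base)   # never negative because base is the minimum
    return seen == expected


print(is_permutation_xor([5, 3, 1, 2, 4]))
print(is_permutation_xor([1, 34]))
```

With m values and m expected bits, every bit can only end up set if each value in the range occurs exactly once, so a toggled-twice duplicate always leaves a gap.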
Values of the above function were tested against the following code:
bool *seen_isperm_trusted = NULL;
bool isperm_trusted(int m; int a[m], int m, int n)
{
/** O(m) in time, O(m) in space */
for (int i = 0; i < m; ++i) // could be memset(s_i_t, 0, m*sizeof(*s_i_t));
seen_isperm_trusted[i] = false;
for (int i = 0; i < m; ++i) {
if (a[i] < n || a[i] >= n + m) // || instead of `or`, which needs <iso646.h> in C
return false; // out of range
if (seen_isperm_trusted[a[i]-n])
return false; // duplicates
else
seen_isperm_trusted[a[i]-n] = true;
}
return true; // a[] is a permutation of the range: [n, n+m)
}
Input arrays are generated with:
void backtrack(int m; int a[m], int m, int nitems)
{
/** generate all permutations with repetition for the range [0, m) */
if (nitems == m) {
(void)test_array(a, nitems, 0); // {0, 0}, {0, 1}, {1, 0}, {1, 1}
}
else for (int i = 0; i < m; ++i) {
a[nitems] = i;
backtrack(a, m, nitems + 1);
}
}
The answer from "nickf" does not work if the array is unsorted:
var_dump(testArray(array(5, 3, 1, 2, 4), 1, 5)); //gives "duplicates" !!!!
Also, your formula to compute sum([n...n+m-1]) looks incorrect.
With max = n+m-1, the correct formula is (max(max+1)/2 - n(n-1)/2), which simplifies to m*n + m(m-1)/2.
An array contains N numbers, and you want to determine whether two of the
numbers sum to a given number K. For instance, if the input is 8, 4, 1, 6 and K is 10,
the answer is yes (4 and 6). A number may be used twice. Do the following.
a. Give an O(N^2) algorithm to solve this problem.
b. Give an O(N log N) algorithm to solve this problem. (Hint: Sort the items first.
After doing so, you can solve the problem in linear time.)
c. Code both solutions and compare the running times of your algorithms.
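A minimal sketch of part (b), assuming Python and a hypothetical function name has_pair_sum: sort first, then walk two indices inward from both ends. Because a number may be used twice, the two indices are allowed to meet.

```python
def has_pair_sum(numbers, k):
    """Return True if two entries (possibly the same one used twice) sum to k.

    Hypothetical helper: O(N log N) for the sort plus a linear scan.
    """
    items = sorted(numbers)
    lo, hi = 0, len(items) - 1
    while lo <= hi:               # lo == hi means one number used twice
        total = items[lo] + items[hi]
        if total == k:
            return True
        if total < k:
            lo += 1               # need a larger sum
        else:
            hi -= 1               # need a smaller sum
    return False


print(has_pair_sum([8, 4, 1, 6], 10))
```

For the example input, the scan ends on the pair (4, 6).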
The product of m consecutive numbers is divisible by m! (m factorial),
so in one pass you can compute the product of the m numbers, also compute m!, and check whether the product modulo m! is zero at the end of the pass.
I might be missing something but this is what comes to my mind ...
something like this in python
my_list1 = [9,5,8,7,6]
my_list2 = [3,5,4,7]
def consecutive(my_list):
    count = 0
    prod = fact = 1
    for num in my_list:
        prod *= num
        count += 1
        fact *= count
    # the check runs once, after the whole pass
    if not prod % fact:
        return 1
    else:
        return 0
print(consecutive(my_list1))
print(consecutive(my_list2))
HotPotato ~$ python m_consecutive.py
1
0
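One caveat worth checking: divisibility by m! is a necessary condition, but I don't think it is sufficient. A small sketch (Python 3.8+ for math.prod; the helper name passes_factorial_test is made up for illustration) with a non-consecutive list that also passes:

```python
from math import factorial, prod


def passes_factorial_test(values):
    """Hypothetical helper: True if the product is divisible by len(values)!."""
    return prod(values) % factorial(len(values)) == 0


print(passes_factorial_test([9, 5, 8, 7, 6]))  # consecutive values
print(passes_factorial_test([2, 4, 6, 8]))     # not consecutive: 384 % 24 == 0
```

So the test can reject some bad inputs, but it would need to be combined with another check (for example min/max bounds plus a duplicate test) to accept only consecutive ones.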
I propose the following:
Choose a finite set of prime numbers P_1,P_2,...,P_K, and compute the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.
For example for a sequence of 17 elements, modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.
Combining the test using several bases we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo primality tests.
S_i is an int array of size P_i, initially filled with 0, i=1..K
M is the length of the input sequence
Mn = INT_MAX
Mx = INT_MIN
for x in the input sequence:
    for i in 1..K: S_i[x % P_i]++ // count occurrences mod P_i
    Mn = min(Mn,x) // update min
    Mx = max(Mx,x) // and max
if Mx-Mn != M-1: return False // Check bounds
for i in 1..K:
    // Check profile mod P_i
    Q = M / P_i
    R = M % P_i
    Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
    if this test fails, return False
return True
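The pseudocode above could be sketched in Python roughly as follows. The function name looks_contiguous and the default list of primes are my own choices for illustration, not part of the proposal; with a small prime set this is the probabilistic variant, so a True result only means "no mismatch found".

```python
def looks_contiguous(seq, primes=(2, 3, 5, 7, 11, 13)):
    """Residue-profile test; hypothetical helper, primes chosen arbitrarily."""
    m = len(seq)
    if m == 0:
        return False
    counts = [[0] * p for p in primes]   # S_i: occurrences of each residue mod P_i
    lo = hi = seq[0]
    for x in seq:                        # single pass over the input
        for c, p in zip(counts, primes):
            c[x % p] += 1                # Python's % is non-negative for p > 0
        lo = min(lo, x)
        hi = max(hi, x)
    if hi - lo != m - 1:                 # bounds check
        return False
    for c, p in zip(counts, primes):
        q, r = divmod(m, p)
        for j in range(p):
            expected = q + 1 if j < r else q
            if c[(lo + j) % p] != expected:
                return False             # profile mod p does not match
    return True


print(looks_contiguous(list(range(20, 37))))  # 17 consecutive values
print(looks_contiguous([1, 1, 3, 4]))
```

The second call has the right bounds but fails the profile modulo 2, which is the kind of case the extra bases are meant to catch.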
Any contiguous array [ n, n+1, ..., n+m-1 ] can be mapped onto a 'base' interval [ 0, 1, ..., m-1 ] using the modulo operator. For each i in the interval, there is exactly one i%m in the base interval and vice versa.
Any contiguous array also has a 'span' (maximum - minimum + 1) equal to its size m.
Using these facts, you can create an "encountered" boolean array of same size containing all falses initially, and while visiting the input array, put their related "encountered" elements to true.
This algorithm is O(n) in space, O(n) in time, and checks for duplicates.
def contiguous( values )
#initialization
encountered = Array.new( values.size, false )
min, max = nil, nil
visited = 0
values.each do |v|
index = v % encountered.size
if( encountered[ index ] )
return "duplicates";
end
encountered[ index ] = true
min = v if min == nil or v < min
max = v if max == nil or v > max
visited += 1
end
if ( max - min + 1 != values.size ) or visited != values.size
return "hole"
else
return "contiguous"
end
end
tests = [
[ false, [ 2,4,5,6 ] ],
[ false, [ 10,11,13,14 ] ] ,
[ true , [ 20,21,22,23 ] ] ,
[ true , [ 19,20,21,22,23 ] ] ,
[ true , [ 20,21,22,23,24 ] ] ,
[ false, [ 20,21,22,23,24+5 ] ] ,
[ false, [ 2,2,3,4,5 ] ]
]
tests.each do |t|
result = contiguous( t[1] )
if( t[0] != ( result == "contiguous" ) )
puts "Failed Test : " + t[1].to_s + " returned " + result
end
end
I like Greg Hewgill's idea of Radix sorting. To find duplicates, you can sort in O(N) time given the constraints on the values in this array.
For an in-place algorithm in O(1) space and O(N) time that restores the original ordering of the list, you don't have to do an actual swap; you can just mark each number with a flag:
//Java: assumes all numbers in arr > 0, so a negative sign can serve as the flag
boolean checkArrayConsecutiveRange(int[] arr) {
    // find min/max
    int min = arr[0]; int max = arr[0];
    for (int i=1; i<arr.length; i++) {
        min = (arr[i] < min ? arr[i] : min);
        max = (arr[i] > max ? arr[i] : max);
    }
    // arr.length consecutive values span exactly arr.length - 1
    if (max-min != arr.length-1) return false;
    // flag and check
    boolean ret = true;
    for (int i=0; i<arr.length; i++) {
        int targetI = Math.abs(arr[i])-min;
        if (arr[targetI] < 0) {
            // slot already flagged: this value is a duplicate
            ret = false;
            break;
        }
        arr[targetI] = -arr[targetI];
    }
    // restore the original values
    for (int i=0; i<arr.length; i++) {
        arr[i] = Math.abs(arr[i]);
    }
    return ret;
}
Storing the flags inside the given array is kind of cheating, and doesn't play well with parallelization. I'm still trying to think of a way to do it without touching the array in O(N) time and O(log N) space. Checking against the sum and against the sum of least squares (arr[i] - arr.length/2.0)^2 feels like it might work. The one defining characteristic we know about a 0...m array with no duplicates is that it's uniformly distributed; we should just check that.
Now if only I could prove it.
I'd like to note that the solution above involving factorial takes O(N) space to store the factorial itself. Since N! > 2^N (for N >= 4), it takes more than N bits to store.
Oops! I got caught up in a duplicate question and did not see the already identical solutions here. And I thought I'd finally done something original! Here is a historical archive of when I was slightly more pleased:
Well, I have no certainty if this algorithm satisfies all conditions. In fact, I haven't even validated that it works beyond a couple test cases I have tried. Even if my algorithm does have problems, hopefully my approach sparks some solutions.
This algorithm, to my knowledge, works in constant memory and scans the array three times. Perhaps an added bonus is that it works for the full range of integers, if that wasn't part of the original problem.
I am not much of a pseudo-code person, and I really think the code might simply make more sense than words. Here is an implementation I wrote in PHP. Take heed of the comments.
function is_permutation($ints) {
/* Gather some meta-data. These scans can
be done simultaneously */
$lowest = min($ints);
$length = count($ints);
$max_index = $length - 1;
$sort_run_count = 0;
/* I do not have any proof that running this sort twice
will always completely sort the array (of course only
intentionally happening if the array is a permutation) */
while ($sort_run_count < 2) {
for ($i = 0; $i < $length; ++$i) {
$dest_index = $ints[$i] - $lowest;
if ($i == $dest_index) {
continue;
}
if ($dest_index > $max_index) {
return false;
}
if ($ints[$i] == $ints[$dest_index]) {
return false;
}
$temp = $ints[$dest_index];
$ints[$dest_index] = $ints[$i];
$ints[$i] = $temp;
}
++$sort_run_count;
}
return true;
}
So there is an algorithm that takes O(n^2) time, does not require modifying the input array, and takes constant space.
First, assume that you know n and m. Finding them is a linear operation, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1 and all the rest are in [n, n+m). Given that, we can reduce the problem to having an array with elements in [0, m).
Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or fewer steps. If there is a duplicate element, then there exists one node that is not reachable from itself at all.
So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:
for (int start_index= 0; start_index<m; ++start_index)
{
int steps= 1;
int current_element_index= arr[start_index];
while (steps<m+1 && current_element_index!=start_index)
{
current_element_index= arr[current_element_index];
++steps;
}
if (steps>m)
{
return false;
}
}
return true;
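For anyone who wants to experiment with it, here is the same cycle walk as a self-contained Python sketch. The function name is_permutation_by_cycles is made up, and it assumes the values have already been reduced to [0, m) as described above (it returns False otherwise rather than indexing out of bounds).

```python
def is_permutation_by_cycles(arr):
    """Hypothetical helper: O(m^2) time, O(1) extra space, input is not modified."""
    m = len(arr)
    if any(v < 0 or v >= m for v in arr):
        return False                     # not reduced to [0, m)
    for start in range(m):
        steps = 1
        current = arr[start]
        # follow the links until we return to start or exceed m steps
        while steps <= m and current != start:
            current = arr[current]
            steps += 1
        if steps > m:
            return False                 # start is not on a cycle: duplicate exists
    return True


print(is_permutation_by_cycles([1, 2, 3, 0, 5, 6, 7, 4]))
print(is_permutation_by_cycles([1, 1]))
```

The first input is two 4-cycles; in the second, index 0 is never reached again, so the walk runs out of steps.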
You can optimize this by storing additional information:
Record sum of the length of the cycle from each element, unless the cycle visits an element before that element, call it sum_of_steps.
For every element, only step m-sum_of_steps nodes out. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.
This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.
ciphwn has it right. It is all to do with statistics. What the question is asking, in statistical terms, is whether or not the sequence of numbers forms a discrete uniform distribution. A discrete uniform distribution is where all values of a finite set of possible values are equally probable. Fortunately there are some useful formulas to determine if a discrete set is uniform. Firstly, the mean of the set (a..b) is (a+b)/2 and the variance is (n^2 - 1)/12, where n = b-a+1. Next, determine the variance of the given set:
variance = (1/n) * sum [i=1..n] (f(i)-mean)^2
and then compare with the expected variance. This will require two passes over the data, once to determine the mean and again to calculate the variance.
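A rough Python sketch of that two-pass comparison, using exact fractions to avoid rounding. The name matches_uniform_moments is made up, the input is assumed to be a non-empty list of ints, and I am treating a match only as a necessary condition; whether it is sufficient is not established here.

```python
from fractions import Fraction


def matches_uniform_moments(values):
    """Hypothetical helper: compare mean and variance with those of a..b."""
    n = len(values)
    a, b = min(values), max(values)
    if b - a + 1 != n:                       # span must equal the size
        return False
    mean = Fraction(sum(values), n)          # first pass: mean
    if mean != Fraction(a + b, 2):
        return False
    # second pass: population variance, computed exactly
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance == Fraction(n * n - 1, 12)


print(matches_uniform_moments([20, 21, 22, 23]))
print(matches_uniform_moments([1, 1, 4, 4]))
```

The second list has the right span and mean but a variance of 9/4 instead of 5/4, so it is rejected.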
References:
uniform discrete distribution
variance
Here is a solution in O(N) time and O(1) extra space for finding duplicates :-
public static boolean check_range(int arr[],int n,int m) {
for(int i=0;i<m;i++) {
arr[i] = arr[i] - n;
if(arr[i]<0 || arr[i]>=m) // out of range on either side
return(false);
}
System.out.println("In range");
int j=0;
while(j<m) {
System.out.println(j);
if(arr[j]<m) {
if(arr[arr[j]]<m) {
int t = arr[arr[j]];
arr[arr[j]] = arr[j] + m;
arr[j] = t;
if(j==arr[j]) {
arr[j] = arr[j] + m;
j++;
}
}
else return(false);
}
else j++;
}
return(true); // every slot was claimed exactly once
}
Explanation:-
Bring each number into the range [0, m-1] with arr[i] = arr[i] - n; if it falls out of range, return false.
For each j, check whether arr[arr[j]] is unoccupied, that is, whether it has a value less than m.
If so, swap(arr[j], arr[arr[j]]) and add m to the value now stored in its home slot to signal that the slot is occupied.
If arr[j] == j, simply add m and increment j.
If arr[arr[j]] >= m, the slot is already occupied, so the current value is a duplicate; return false.
If arr[j] >= m, skip it.
