Given an unordered list of integers, return a value not present in the list - c

I have an algorithm problem that I came across at work but have not been able to come up with a satisfactory solution for. I have browsed this forum, and the closest problem I found is How to find a duplicate element in an array of shuffled consecutive integers?
I have a list of N integers, each in the range 1-M (M > N), and the list is unsorted. I want a function that takes this list as input and returns a value in the range 1-M that is not present in the list. The list contains no duplicates. I was hoping for an O(N) solution without using additional space.
UPDATE: the function cannot change the original list L.
For instance, N = 5, M = 10
List (L): 1, 2, 4, 8, 3
then f(L) = 5
To be honest, I don't care if it returns an element other than 5, as long as it meets the constraints above.
The only solution I have come up with so far uses an additional array of M elements: walk through the input list and set the corresponding array element to 1 for each value present, then iterate over this array and return the index of the first element with value 0. As you can see, this uses O(M) additional space and has complexity 2*O(N).
Any help would be appreciated.
Thanks for the help everyone. The stack overflow community is definitely super helpful.
To give everyone a little more context on the problem I am trying to solve:
I have a set of M tokens that I give out to clients (one per client). When a client is done with its token, the token is returned to my pile. As you can see, the tokens are originally given out in sorted order.
So, with M = 3 tokens (the free pile is shown in angle brackets):
client1: 1 <2,3>
client2: 2 <3>
client1 returns: 1 <1,3>
client3: 3 <1>
Now the question is how to give client4 token 1. I could at this stage give client4 token 2 and sort the list; I am not sure whether that would help. In any case, if I come up with a nice clean solution, I will be sure to post it.
I just realised I might have confused everyone: I do not have the list of free tokens with me when I am called. I could statically maintain such a list, but I would rather not.

You can do divide and conquer. Basically, given the range 1..m, do quicksort-style swapping with m/2 as the pivot. If fewer than m/2 elements are <= m/2 (i.e. fall in the first half of the range), then a number is missing in the first half; recursively find it there. Otherwise, there is a missing number in the second half. Complexity: n + n/2 + n/4 + ... = O(n).
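def findmissing(x, startIndex, endIndex, minVal, maxVal):
    # Returns a value in minVal..maxVal that is absent from x[startIndex..endIndex]
    # (inclusive), assuming the slice holds distinct values from that range and
    # fewer of them than the range has values.
    if startIndex > endIndex:              # empty slice: the whole range is free
        return minVal
    pivot = (minVal + maxVal) // 2
    i = startIndex
    j = endIndex
    while True:
        while i <= j and x[i] <= pivot:    # skip values already on the low side
            i += 1
        while i <= j and x[j] > pivot:     # skip values already on the high side
            j -= 1
        if i >= j:
            break
        x[i], x[j] = x[j], x[i]            # swap a misplaced pair
    k = i                                  # x[startIndex:k] now holds the values <= pivot
    if (k - startIndex) < (pivot - minVal + 1):
        return findmissing(x, startIndex, k - 1, minVal, pivot)
    else:
        return findmissing(x, k, endIndex, pivot + 1, maxVal)
For example, findmissing([1, 2, 4, 8, 3], 0, 4, 1, 10) returns 5.
The recursion stops at an empty slice, where every value in minVal..maxVal is missing. Note that the function reorders x in place.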

You can have O(N) time and space. Since the list holds N distinct values, at least one of the N+1 values 1..N+1 must be absent, so make an array of N+1 elements and ignore numbers larger than N+1.
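A minimal Python sketch of this bounded-array idea follows. The function name find_absent is hypothetical. It only reads the list, so it also respects the UPDATE in the question.

```python
from typing import List


def find_absent(values: List[int]) -> int:
    """Return some value in 1..N+1 that does not occur in values (N = len(values)).

    N values cannot cover all N+1 candidates 1..N+1, so one is always free.
    O(N) time, O(N) extra space; the input list is not modified.
    """
    n = len(values)
    seen = [False] * (n + 2)        # seen[v] for v in 1..n+1; index 0 unused
    for v in values:
        if 1 <= v <= n + 1:         # values above N+1 cannot hide the answer
            seen[v] = True
    for candidate in range(1, n + 2):
        if not seen[candidate]:
            return candidate
    raise AssertionError("unreachable: N values cannot fill N+1 slots")
```

With the list from the question, find_absent([1, 2, 4, 8, 3]) returns 5. The value 8 is simply ignored.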
If M is large compared to N, say M > 2N, generate a random number in 1..M and check whether it is absent from the list in O(N) time and O(1) space. Since at most N of the M values are taken, the probability of finding a solution in a single pass is at least 1 - N/M > 1/2. Therefore (geometric distribution) the expected number of passes is at most 2, giving an average complexity of O(N).
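Below is a sketch of the randomized variant. It assumes the values in L are distinct and lie in 1..M. The name find_absent_random is hypothetical. The optional rng parameter accepts any object with a randint method, such as random.Random, and exists only to make runs reproducible.

```python
import random


def find_absent_random(values, m, rng=random):
    """Las Vegas search for a value in 1..m that is not in values.

    Each attempt scans the list once (O(N) time, O(1) extra space). With
    distinct values and m >= 2 * len(values), an attempt succeeds with
    probability at least 1/2, so the expected number of attempts is <= 2.
    """
    if m <= len(values):
        # With distinct values this means 1..m may be full; refuse to loop forever.
        raise ValueError("need M > N for a free value to exist")
    while True:
        candidate = rng.randint(1, m)   # uniform over 1..m inclusive
        if candidate not in values:     # linear scan of the list, no extra memory
            return candidate
```

For example, find_absent_random([1, 2, 4, 8, 3], 10) returns one of 5, 6, 7, 9 or 10. It never modifies the list.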
If M is N+1 or N+2, use the approach described here.

Can you do something like a counting sort? Create an array of size M, then go through the list once (N steps) and, for each value i in the list, set the array element at index i-1 to one. After that single pass, search indices 0 to M-1 until you find the first element with a zero value.
That should be O(N+M).
Array A of size M: [0=0, 1=0, 2=0, 3=0, 4=0, 5=0, 6=0, 7=0, 8=0, 9=0]
After looping through the N elements: [0=1, 1=1, 2=1, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0]
Searching indices 0 to M-1 finds that index 4 is zero, therefore 5 (4+1) is the first integer not in L.

After reading your update, I think you are making this overly complex. First, let me list what I understand from your question.
You need to give a token to the client regardless of its order. Quoting from your original post:
"For instance N = 5, M = 10, List (L): 1, 2, 4, 8, 3, then f(L) = 5. To be honest, I don't care if it returns an element other than 5, as long as it meets the constraints above."
Second, you are already maintaining a list of M tokens.
A client fetches a token and, after using it, returns it to you.
Given these two points, why don't you implement a TokenPool?
Implement your list of M tokens as a queue.
Whenever a client asks for a token, take one from the queue, i.e. remove it from the queue. This way your queue always holds exactly the tokens that have not been given away, and each request is O(1).
Whenever a client is done with its token, it returns the token to you; add it back to the queue. Again O(1).
In the whole implementation you never have to loop through a list. All you have to do is generate the tokens and insert them into the queue.
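Here is a minimal sketch of the TokenPool idea, assuming tokens are the integers 1..M. The class and method names (TokenPool, acquire, release) are hypothetical, and collections.deque stands in for the queue.

```python
from collections import deque


class TokenPool:
    """Hypothetical token pool: the deque holds exactly the tokens not handed out."""

    def __init__(self, m: int) -> None:
        self._free = deque(range(1, m + 1))   # generate tokens 1..M once

    def acquire(self) -> int:
        if not self._free:
            raise RuntimeError("all tokens are in use")
        return self._free.popleft()           # O(1): take the oldest free token

    def release(self, token: int) -> None:
        self._free.append(token)              # O(1); no check for double release
```

The following replays the question's scenario with M = 3. Create pool = TokenPool(3). client1 gets pool.acquire() == 1 and client2 gets 2. client1 then calls pool.release(1), client3 gets 3, and client4 gets token 1.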

Related

Search for a specific permutation in a permutation subset with constraints

I am searching for one permutation P = p1...pn from the following subset S.
S is defined by the labels L:
L1...Lk, where each Li contains the values pi...pj.
The inverse of P must have at most k-1 adjacent decreasing elements, with k <= n.
Example:
n := 4
k := 2
L1 := 1,2
L2 := 3,4
L := L1,L2,L1,L2
S := 1324,1423,2314,2413
one solution would be P := 1342
P := 3142 would not be a solution, because it has 2 adjacent decreasing elements, but at most 1 is allowed since k = 2.
Is there an algorithm to find such a P in the S defined by L?
Currently I use brute force to find a permutation P, but it very quickly becomes unusably slow.
So each of L1, ..., Lk is a consecutive set of elements. At each place where Li is immediately followed by Lj in the definition of L, one of three things is true:
1. i < j, in which case it is ascending.
2. i = j, in which case it could be ascending or descending.
3. i > j, in which case it must be descending.
By counting the number of places where case 3 is true, we get the minimum number of descents already forced by the definition of L.
Next, for each Li we can write down a pattern of len(Li)-1 symbols, each either ; or ,. A ; means that elements of other Ljs lie between two consecutive members of Li. A , means that the two Li elements are adjacent, so their order may produce a descent. We want to know: "For each possible number of descents within Li, how many permutations of Li have that number of descents?"
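The counting of forced descents and the per-label patterns described above can be sketched in Python. This assumes L is given as a list of label numbers, one per position (for the question's example, [1, 2, 1, 2]). The function name is hypothetical.

```python
def forced_descents_and_patterns(labels):
    """Return (forced, patterns) for a label sequence such as [1, 2, 1, 2].

    forced counts adjacent positions whose label strictly decreases (case 3).
    patterns[label] has len(label) - 1 symbols: ',' when two consecutive
    members of the label sit next to each other, ';' otherwise.
    """
    forced = sum(1 for a, b in zip(labels, labels[1:]) if a > b)
    last_pos = {}        # label -> position of its most recent member
    patterns = {}
    for pos, lab in enumerate(labels):
        patterns.setdefault(lab, "")
        if lab in last_pos:
            patterns[lab] += "," if last_pos[lab] == pos - 1 else ";"
        last_pos[lab] = pos
    return forced, patterns
```

For the example L := L1,L2,L1,L2 this gives one forced descent, and both labels get the pattern ";".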
We will think of building the permutations as follows:
The first element goes at position 0.
The second element goes to position 0 or 1. (If at 0, the first element is moved.)
The third element goes to position 0, 1, or 2.
etc
A descent is when the next element is smaller than the previous, at a transition matching a ,.
We actually will want the following data structure for later use:
cache[Li] gives:
  by how many elements are chosen:
    by the last element chosen:
      by the number of descents we will add:
        how many ways of finishing this permutation
So we can write a recursive function that takes:
The pattern for Li.
How many elements have been chosen.
What index was last chosen.
It then returns a dictionary mapping descents to count of ways to finish the permutation for Li.
Memoize that and we get our desired data structure.
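Here is a memoized sketch of that recursive function, using the three arguments listed above. One assumption is built in: "what index was last chosen" is read as the position where the previous insert went. Under that reading, a new insert creates a descent exactly when it lands at or before that position and the corresponding pattern symbol is ",". The result is a tuple indexed by number of descents rather than a dictionary, so that functools.lru_cache can cache it. The name ways is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways(pattern: str, chosen: int, last: int) -> tuple:
    """Ways to finish one label's arrangement, indexed by descents still to add.

    pattern: the ';'/',' string for the label (length = label size - 1)
    chosen:  how many inserts have been made so far (>= 1)
    last:    position at which the previous insert went
    Returns t with t[d] = number of ways to finish with exactly d more descents.
    """
    size = len(pattern) + 1
    if chosen == size:
        return (1,)
    totals = [0] * (size - chosen + 1)       # at most size - chosen more descents
    for pos in range(chosen + 1):            # the next insert goes to 0..chosen
        # landing at or before the last insert puts a smaller value right after
        # a larger one; it only counts as a descent under a ',' transition
        d = 1 if (pattern[chosen - 1] == "," and pos <= last) else 0
        for extra, count in enumerate(ways(pattern, chosen + 1, pos)):
            totals[extra + d] += count
    return tuple(totals)
```

ways(",,", 1, 0) gives (1, 4, 1), the Eulerian numbers for three adjacent elements. ways(";;", 1, 0) gives (6, 0, 0), because no two members of that label are adjacent.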
Now we'll repeat the idea. We want:
cache2[i] gives:
  by number of descents to use:
    how many permutations of L[i], L[i+1], ..., L[k] meet it.
Again we can write a recursive function using cache to calculate this, and we can memoize it to get cache2.
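Descents inside different labels are independent, so cache2 is a convolution of the per-label distributions. The sketch below assumes those distributions are tuples indexed by number of descents (the values of cache[Li][1][0]). It uses 0-based label indices instead of the answer's 1-based ones. suffix_distributions is a hypothetical name.

```python
def suffix_distributions(dists):
    """dists[i][d] = ways to arrange label i alone with d internal descents.

    Returns suffix with suffix[i][d] = ways to arrange labels i, i+1, ...
    together with d descents in total; suffix[len(dists)] == (1,).
    """
    suffix = [None] * (len(dists) + 1)
    suffix[len(dists)] = (1,)                # no labels left: one empty way
    for i in range(len(dists) - 1, -1, -1):
        a, b = dists[i], suffix[i + 1]
        out = [0] * (len(a) + len(b) - 1)
        for d1, x in enumerate(a):           # descents inside label i
            for d2, y in enumerate(b):       # descents in the remaining labels
                out[d1 + d2] += x * y
        suffix[i] = tuple(out)
    return suffix
```

As a hand-checked example, suffix_distributions([(1, 4, 1), (2,)])[0] is (2, 8, 2).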
And NOW we can reverse the process.
1. We know how many descents came from the definition of L.
2. We know the distribution of remaining descents from cache2[1], so we can randomly pick how many descents there will be among L1...Lk, subject to our condition.
3. For each Li in turn, we can look at cache[Li][1][0] and cache2[i+1] to decide, with the correct probability, how many descents there will be within Li.
4. For each Li we can look at how many descents we want to end up with, its pattern, and cache[Li] to pick a random sequence of inserts producing the right pattern. The first insert is always at 0. After that you always know the size, where the last insert was, and how many descents are left. So for each possible next insert, you determine whether it counts as a descent and the number of ways to finish from there. Whether it counts depends on both the pattern and whether the insert lands before the last insert. Then you choose the next insert randomly with the right probability.
5. For each Li we turn the pattern of inserts into the list of values in order. (I will explain this step more.)
6. We can now follow the pattern of L and fill in all of the values.
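Step 4 for a single label can be sketched as follows, under several assumptions. The sketch counts completions with a small memoized helper instead of reusing cache. It treats a new insert at or before the previous insert's position as a descent when the pattern symbol is ",". It picks each next insert with probability proportional to the number of ways to finish. sample_inserts and its parameters are hypothetical names.

```python
import random
from functools import lru_cache


def sample_inserts(pattern, target, rng=random):
    """Pick insert positions for one label, uniformly among the sequences
    that produce exactly `target` descents under the label's ';'/',' pattern."""
    size = len(pattern) + 1

    def descent(chosen, pos, last):
        # 1 if this insert creates a descent, 0 otherwise
        return 1 if pattern[chosen - 1] == "," and pos <= last else 0

    @lru_cache(maxsize=None)
    def finish(chosen, last, left):
        # number of ways to make the remaining inserts with exactly `left` descents
        if chosen == size:
            return 1 if left == 0 else 0
        return sum(finish(chosen + 1, pos, left - descent(chosen, pos, last))
                   for pos in range(chosen + 1)
                   if left - descent(chosen, pos, last) >= 0)

    if finish(1, 0, target) == 0:
        raise ValueError("no arrangement of this label has that many descents")
    inserts, last, left = [0], 0, target       # the first insert is always at 0
    for chosen in range(1, size):
        options, weights = [], []
        for pos in range(chosen + 1):
            d = descent(chosen, pos, last)
            w = finish(chosen + 1, pos, left - d) if left - d >= 0 else 0
            if w > 0:
                options.append((pos, d))
                weights.append(w)              # weight = ways to finish from here
        pos, d = rng.choices(options, weights=weights)[0]
        inserts.append(pos)
        last, left = pos, left - d
    return inserts
```

For the pattern ",," with target 0, the only choice is [0, 1, 2], which gives an ascending arrangement. With target 2, the only choice is [0, 0, 0].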
Now for step 5, let's illustrate with your example from the chat. Suppose that L2 = [4, 5, 6] and the pattern of inserts we came up with was [0, 1, 0]. How do we figure out the arrangement of values?
Well first we do our inserts:
[1]
[1, 2]
[3, 1, 2]
This says that the first element (4) goes to the third place, the second (5) to the first, and the third (6) to the second. So our permutation for L2 is [5, 6, 4].
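Step 5 can be sketched as follows. It assumes the label's values are passed in ascending order, and inserts_to_arrangement is a hypothetical name. It follows the reading above, in which the i-th entry of the insert-built list is the place of the i-th value.

```python
def inserts_to_arrangement(inserts, values):
    """Turn insert positions into the label's arrangement of values.

    values must be in ascending order, e.g. [4, 5, 6] for L2 above.
    """
    order = []
    for number, pos in enumerate(inserts, start=1):
        order.insert(pos, number)                 # [1], [1, 2], [3, 1, 2], ...
    arrangement = [None] * len(values)
    for element_index, place in enumerate(order):
        arrangement[place - 1] = values[element_index]   # element i goes to place order[i]
    return arrangement
```

inserts_to_arrangement([0, 1, 0], [4, 5, 6]) reproduces the example and returns [5, 6, 4].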
This will be a lot of code to write. But it will be polynomial. Specifically if m is the count of the most common label, cache will have total size at most O(k m^2). Thanks to memoization, each entry takes O(m) to calculate. Everything else is small relative to that. So total space is O(k m^2) and time is O(k m^3).

How to find a number that was repeated (n/3) times in an array of size n, in O(n) time and O(n) space?

I have this question that I just can't figure out! Any hints would mean a lot. Thank you in advance.
I have an array, A. Its size is n, and I want to find an algorithm that finds an x that appears in this array at least n/3 times. If there is no such x in the array, then we print that we didn't find one!
I need an algorithm that does this in O(n) time and O(n) space.
For example:
A=[1 1 2 2 1 1 1 5 6 7]
For the above array, the algorithm should return 1.
If I were you, I would write an algorithm that:
Instantiates a map (i.e. key/value pairs) in whatever language you're using. The key will be the integer you find, the value will be the number of times it has been seen so far.
Iterate over the array. For the current integer, check whether the number exists as a key in your map. If it exists, increment the map's value. If it doesn't exist, insert a new element with a count of 1.
After the iteration is complete, iterate over your map. If any elements have counts of at least n/3, print them out. Handle the case where none are found, etc.
Here is my solution in Python; note that it is possible to have up to three solutions as well as one or none:
def anna(A, n):                    # array and length
    ht = {}                        # create empty hash table
    for k in range(n):             # iterate over array
        if A[k] in ht:             # previously seen
            ht[A[k]] += 1          # increment count
        else:                      # not previously seen
            ht[A[k]] = 1           # initialize count
    solved = False                 # flag if solution found
    for k in ht:                   # iterate over hash table
        if 3 * ht[k] >= n:         # count is at least n / 3
            solved = True          # update flag
            print(k)               # write it
    if not solved:                 # no solution found
        print("No solution")       # report failure
The first for loop takes O(n) time. The second for loop potentially takes O(n) time if all items in the array are distinct, though most often the second for loop will take much less time. The hash table takes O(n) space if all items in the array are distinct, though most often it takes much less space.
It is possible to optimize the solution so that it stops early and reports failure as soon as no solution is possible any more. To do that, keep a variable max in the first for loop and update it whenever a hash table count exceeds it. After each element is added to the hash table, check whether max + (n - k - 1) < n / 3, where n - k - 1 is the number of elements not yet read.
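Here is a sketch of the early-exit variant in Python. It assumes the "at least n/3" condition from the question. anna_early_exit is a hypothetical name, and it returns the list of solutions instead of printing them.

```python
def anna_early_exit(A):
    """Return every value occurring at least n/3 times, stopping early when
    no value can reach that count any more."""
    n = len(A)
    ht = {}
    best = 0                                  # the variable "max" from the text
    for k, x in enumerate(A):
        ht[x] = ht.get(x, 0) + 1
        best = max(best, ht[x])
        remaining = n - k - 1                 # elements not yet read
        if 3 * (best + remaining) < n:        # nobody can still reach n/3
            return []
    return [x for x, c in ht.items() if 3 * c >= n]
```

For the example array [1, 1, 2, 2, 1, 1, 1, 5, 6, 7] it returns [1].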

algorithm which finds the numbers in a sequence which appear 3 times or more, and prints their indexes

Suppose I input a sequence of numbers which ends with -1.
I want to print all the values of the sequence that occur in it 3 times or more, and also print their indexes in the sequence.
For example, if the input is: 2 3 4 2 2 5 2 4 3 4 2 -1
then the expected output is:
2: 0 3 4 6 10
4: 2 7 9
First I thought of using quicksort, but then I realized that I would lose the original indexes of the sequence. I have also been thinking of using counting sort, but the sequence has no given range of numbers, so counting sort may be no good in this case.
Now I wonder if I might use an array of pointers (but how?)
Do you have any suggestions or tips for an algorithm with time complexity O(n log n) for this? It would be very much appreciated.
Keep it simple!
The easiest way would be to scan the sequence and count the number of occurrence of each element, put the elements that match the condition in an auxiliary array.
Then, for each element in the auxiliary array, scan the sequence again and print out the indices.
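The two-scan approach could look like this in Python. Counting with collections.Counter (a hash map) is an assumption, since the answer does not say how to count. The sequence is passed without its -1 terminator. frequent_with_indexes is a hypothetical name. Each frequent value costs one more O(N) scan.

```python
from collections import Counter


def frequent_with_indexes(seq, threshold=3):
    """Return (value, indexes) for every value occurring at least `threshold` times."""
    counts = Counter(seq)                                           # first scan
    frequent = [v for v, c in counts.items() if c >= threshold]     # auxiliary array
    # one more scan of the sequence per frequent value to collect its indexes
    return [(v, [i for i, x in enumerate(seq) if x == v]) for v in frequent]
```

Printing each pair for the example reproduces "2: 0 3 4 6 10" and "4: 2 7 9". Use print(f"{v}: " + " ".join(map(str, idx))) for each (v, idx) in the result.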
First of all, sorry for my bad English (it's not my native language); I'll try my best.
Similar to what @vvigilante suggested, here is an algorithm implemented in Python. I chose Python because it is close to pseudocode, so you can translate it into any language you want. I also added a lot of comments; I hope you get it!
from typing import Dict, List

def three_or_more(input_arr: List[int]) -> None:
    indexes: Dict[int, List[int]] = {}
    # scan the array, skipping the final -1 that terminates the sequence
    for i in range(len(input_arr) - 1):
        # create the list for the number at position i (if it doesn't exist)
        # and append the index
        indexes.setdefault(input_arr[i], []).append(i)
    # for each key in the dictionary
    for n, positions in indexes.items():
        # if the number occurs 3 times or more
        if len(positions) >= 3:
            # print the key
            print("%d:" % n, end="")
            # print each index stored for the current key
            for el in positions:
                print(" %d" % el, end="")
            # new line
            print()

# call the function
three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
Complexity:
The first loop scans the input array: O(N).
The second loop visits each distinct number in the array. There are at most N of them (you cannot have more distinct numbers than elements), so it is O(N).
The loop inside that loop goes through all indexes of the current number, which looks like O(N) in the worst case (but it is not).
So naively the complexity would be O(N) + O(N)*O(N) = O(N^2).
But remember that the two nested loops together print each of the N indexes at most once, since every index belongs to exactly one number, so together they cost O(N).
So O(N) + O(N) = O(N).
As for memory, it is O(N) for the input array plus O(N) for the dictionary (because it contains all N indexes), i.e. O(N).
If you do it in C++, remember that std::map is much slower than an array. If the values are small non-negative integers, you should use an array of arrays (or std::vector<std::vector<int>>); otherwise you can also try std::unordered_map, which uses hashing.
P.S. Remember that getting the size of a vector takes O(1) time, because it is a difference of pointers!
Starting with a sorted list is a good idea.
You could create a second array of original indices and duplicate all of the memory moves for the sort on the indices array. Then checking for triplicates is trivial and only requires sort + 1 traversal.
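Here is a sketch of the sort-with-indices idea in Python. Instead of mirroring every memory move of the sort on a second array, it sorts (value, original index) pairs, which keeps each index attached to its value. The O(N log N) cost comes from the sort. three_or_more_sorted is a hypothetical name, and the input excludes the -1 terminator.

```python
def three_or_more_sorted(seq):
    """Return (value, indexes) for values occurring 3+ times, via one sort and one pass."""
    pairs = sorted((v, i) for i, v in enumerate(seq))   # indexes travel with values
    result = []
    start = 0
    while start < len(pairs):
        end = start
        while end < len(pairs) and pairs[end][0] == pairs[start][0]:
            end += 1                                    # extend the run of equal values
        if end - start >= 3:
            result.append((pairs[start][0], [i for _, i in pairs[start:end]]))
        start = end
    return result
```

Because equal values are ordered by index, each run's indexes come out ascending. For the example the result is [(2, [0, 3, 4, 6, 10]), (4, [2, 7, 9])].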

Number of paths in an integer array

There is an integer array, for example:
{3,1,2,7,5,6}
One can move forward through the array either one element at a time or by jumping a few elements, based on the value at that index. For example, from 3 one can go to 1 or to 7. From 1 one can only go to 2, since a jump of 1 is the same as a single step. From 2 one can go to 7 or to 5. From 7 one can only go to 5, because the index of 7 is 3 and 3 + 7 = 10, and there is no element at index 10.
I only have to count the number of possible paths from the start index to the end of the array.
I could only do it recursively and naively, which runs in exponential time.
Could somebody please help?
My recommendation: use dynamic programming.
If this keyword is sufficient and you want the challenge of finding a solution on your own, don't read any further!
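Here is a possible DP algorithm, written in Python and generalized to any input length. It assumes the array is non-empty. The array sol holds the number of ways to reach the end from each index.
def count_paths(a):
    n = len(a)
    sol = [0] * n                # sol[i] = number of paths from index i to the end
    sol[n - 1] = 1               # already at the end
    for i in range(n - 2, -1, -1):
        sol[i] = sol[i + 1]      # take a single step forward
        # jump a[i] elements forward; a jump of 1 is the same move as the step
        if a[i] > 1 and i + a[i] < n:
            sol[i] += sol[i + a[i]]
    return sol[0]
count_paths([3, 1, 2, 7, 5, 6]) returns 3.
runtime O(n)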
As for the directed graph solution hinted at in the comments:
Each cell in the array represents a node. Make a directed edge from each node to every node reachable from it. Since there is no directed cycle, you can then count the paths by processing the nodes in reverse topological order and summing the counts of each node's successors. However, it takes a lot of boilerplate to actually program it.
Adjusting the recursive solution
Another solution is pruning, i.e. memoization. This is basically equivalent to the DP algorithm. The exponential time comes from calculating the same values several times. Say the function is recFunc(index). The initial call recFunc(0) calls recFunc(1) and recFunc(3), and so on. However, recFunc(3) is bound to be called again at some point, which leads to a repeated recursive calculation. To prune this, add a map that holds all already calculated values. When you make a call recFunc(x), look up x in the map. If it was already calculated, return the stored value; if not, calculate it, store it and return it. This way you get O(n) too.
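Here is a sketch of the memoized recursion in Python, with functools.lru_cache playing the role of the map. It assumes the array values are integers and only counts forward jumps longer than one element as extra moves. rec_func mirrors recFunc above, and count_paths_memo is a hypothetical wrapper. Python's default recursion limit (about 1000 frames) makes this suitable only for short arrays. The bottom-up DP does not have that limitation.

```python
from functools import lru_cache


def count_paths_memo(a):
    """Count paths from index 0 to the last index of a non-empty array."""
    n = len(a)

    @lru_cache(maxsize=None)                  # the map of already calculated values
    def rec_func(index):
        if index == n - 1:                    # reached the last element
            return 1
        total = rec_func(index + 1)           # single step forward
        if a[index] > 1 and index + a[index] < n:
            total += rec_func(index + a[index])   # jump by the value at this index
        return total

    return rec_func(0)
```

count_paths_memo([3, 1, 2, 7, 5, 6]) returns 3, matching the three paths 3-1-2-7-5-6, 3-1-2-5-6 and 3-7-5-6.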

Find the element occurring b times in an array of size n*k+b

Description
Given an array of size n*k+b where n elements occur k times and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Use a map to store the frequency of each element, then traverse the map to find the element occurring b times. Since maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. His hint was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice in linear time and partition around it, as in quicksort. Let the median element be M.
After the recursive step, all elements smaller than M are to the left of the first occurrence of M. All occurrences of M are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and determine whether S < M or S >= M. Recurse on either the left slice or the right slice.
So you are doing a quicksort but delving into only one side of the partition at any time. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time is still O(n).
Clarification: let's say n = 20 and k = 10. Then there are 21 distinct elements in the array; 20 of them occur 10 times and the last one occurs, let's say, 7 times. I find the median element, let's say it is 1111. If S < 1111, the index of the leftmost occurrence of 1111 will not be a multiple of k (for example 9*10 + 7 = 97 if ten distinct values are smaller than 1111). If S >= 1111, the index will be a multiple of k (10*10 = 100 in the same case).
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find that the median element is 3, and the array is something like {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3, and 6 is a multiple of k = 3. So each element there must occur 3 times, which means S >= 3. Recurse on the right side, and so on.
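Here is a sketch of this partition approach in Python. As a swapped-in simplification, it picks a random pivot instead of the exact linear-time median, so the running time is linear only in expectation. It also partitions into new lists rather than in place. find_b_element is a hypothetical name, and len(smaller) plays the role of the index of the leftmost M.

```python
import random


def find_b_element(arr, k, rng=random):
    """Return the value occurring b times (0 < b < k) when every other value
    occurs exactly k times. The input is not modified."""
    items = list(arr)
    while True:
        pivot = rng.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if len(smaller) % k != 0:     # not a multiple of k: S is on the left
            items = smaller
        elif len(equal) % k != 0:     # the pivot itself occurs b times
            return pivot
        else:                         # S is on the right
            items = larger
```

Both examples from this thread work. find_b_element([2, 2, 2, 3, 3, 3, 1, 1], 3) returns 1, and find_b_element([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3) returns 4.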
An idea using cyclic groups.
To guess i-th bit of answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set; otherwise it is clear. This works because every value occurring k times contributes a multiple of k to cnt, and 0 < b < k.
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is the maximal value in the table. But because the number of bits is usually constant, this solution is linear in the array size.
No hashing; memory usage is O(log k * log max N).
Example implementation:
from functools import reduce
from random import sample, shuffle

def generate_test_data(n, k, b):
    # n + 1 distinct values: n of them repeated k times, one repeated b times
    values = sample(range(1001), n + 1)
    k_rep, b_rep = values[:n], values[n:]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10                      # per-bit counts; 10 bits cover values 0..1023
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # bit i of the answer is set iff cnts[i] % k != 0; rebuild from the top bit down
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

print("Answer: ", solve(generate_test_data(10, 15, 13), 15))
To have a B-tree containing n distinct elements with constant height h, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
Example: with n = 1000000 and h = 10, z = 3.98, i.e. z = 4.
The time to reach a node in that case is O(h*log(z)). Assume h and z to be "constant". Since N = n*k, log(z) = log(n^(1/h)) = log((N/k)^(1/h)), which is a constant if h is chosen properly based on k. You can then say that O(h*log(z)) = O(1). This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (set removes the duplicates; sum(set(arr))*k - sum(arr) equals (k - b)*S, so the division is exact):
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
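Here is a minimal sketch of the XOR trick. It is valid only under that parity condition. Every value occurring an even number of times cancels out, and S occurs an odd number of times, so S is what remains. find_by_xor is a hypothetical name.

```python
from functools import reduce
from operator import xor


def find_by_xor(arr):
    """Return S, assuming k is even (other values cancel) and b is odd."""
    return reduce(xor, arr, 0)
```

For example, with k = 4 and b = 3, find_by_xor([4, 4, 4, 4, 9, 9, 9]) returns 9.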
