How to find a number that is repeated at least n/3 times in an array of size n, in O(n) time and O(n) space? - arrays

I have this question that I just can't figure out! Any hints would mean a lot. Thank you in advance.
I have an array, A. Its size is n, and I want an algorithm that finds an x that appears in this array at least n/3 times. If there is no such x in the array, we print that we didn't find one.
The algorithm must run in O(n) time and use O(n) space.
For example:
A=[1 1 2 2 1 1 1 5 6 7]
For the above array, the algorithm should return 1.

If I were you, I would write an algorithm that:
Instantiates a map (i.e. key/value pairs) in whatever language you're using. The key is the integer you find; the value is the number of times it has been seen so far.
Iterates over the array. For the current integer, check whether it exists as a key in your map. If it exists, increment its value. If it doesn't, insert a new entry with a count of 1.
After the iteration is complete, iterates over the map. If any entry has a count of at least n/3, print its key. Handle the case where none is found.

Here is my solution in pseudocode; note that there can be up to three solutions, or none:
func anna(A, n) # array and length
    ht := {} # create empty hash table
    for k in [0,n) # iterate over array
        if A[k] in ht # previously seen
            ht{A[k]} := ht{A[k]} + 1 # increment count
        else # not previously seen
            ht{A[k]} := 1 # initialize count
    solved := False # flag if solution found
    for x in keys(ht) # iterate over hash table
        if ht{x} >= n / 3 # found solution (appears at least n/3 times)
            solved := True # update flag
            print x # write it
    if not solved # no solution found
        print "No solution" # report failure
The first for loop takes O(n) time. The second for loop potentially takes O(n) time if all items in the array are distinct, though most often the second for loop will take much less time. The hash table takes O(n) space if all items in the array are distinct, though most often it takes much less space.
It is possible to optimize the solution so it stops early and reports failure if there are no possible solutions. To do that, keep a variable max in the first for loop holding the largest count seen so far, update it whenever a new hash table count exceeds it, and after processing element k check whether max + (n - k - 1) < n / 3, i.e. whether even the most frequent element can no longer reach the threshold with the elements that remain.
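A minimal Python sketch of the hash-table count with the early-exit check, assuming the threshold is "at least n/3 times" as stated in the question. The name `find_frequent` is made up for illustration, and the comparison is done in integers (`3 * count >= n`) to avoid fractional division.

```python
def find_frequent(a):
    """Return every value that appears at least len(a)/3 times (possibly none)."""
    n = len(a)
    counts = {}   # value -> number of occurrences seen so far
    best = 0      # largest count seen so far
    for k, x in enumerate(a):
        counts[x] = counts.get(x, 0) + 1
        best = max(best, counts[x])
        remaining = n - k - 1
        # Early exit: even the most frequent value cannot reach n/3 any more.
        if 3 * (best + remaining) < n:
            return []
    return [x for x, c in counts.items() if 3 * c >= n]


if __name__ == "__main__":
    print(find_frequent([1, 1, 2, 2, 1, 1, 1, 5, 6, 7]))  # [1]
```

An empty result plays the role of "No solution" in the pseudocode.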

Related

Minimum steps needed to make all elements equal by adding adjacent elements

I have an array A of size N. All elements are positive integers. In one step, I can add two adjacent elements and replace them with their sum, so the array size shrinks by 1. I need to make all the elements equal using the minimum number of steps.
For example: A = [1,2,3,2,1,3].
Step 1: Merge index 0 and 1 ==> A = [3,3,2,1,3]
Step 2: Merge index 2 and 3 (of new array) ==> [3,3,3,3]
Hence the number of steps is 2.
I couldn't think of a straight solution, so tried a recursive approach by merging all indices one by one and returning the min level I can get when either array size is 1 or all elements are equal.
Below is the code I tried:
# Checks if all the elements are same or not
def check(A):
    if len(set(A)) == 1:
        return True
    return False

# Recursive function to find min steps
def min_steps(N, A, level):
    # If all are equal return the level
    if N == 1 or check(A):
        return level
    # Initialize min variable
    mn = float('+inf')
    # Try merging all one by one and recur
    for i in range(N-1):
        temp = A[:]
        temp[i] += temp[i+1]
        temp.pop(i+1)
        mn = min(mn, min_steps(N-1, temp, level+1))
    return mn
This solution has complexity of O(N^N). I want to reduce it to polynomial time near to O(N^2) or O(N^3). Can anyone help me modify this solution or tell me if I am missing something?
Combining any k adjacent pairs of elements (even if they include elements formed from previous combining steps) leaves exactly n-k elements in total, each of which we can map back to the contiguous subarray of the original problem that constitutes the elements that were added together to form it. So, this problem is equivalent to partitioning the array into the largest possible number of contiguous subarrays such that all subarrays have the same sum: Any adjacent pair of elements within the same subarray can be combined into a single element, and this process repeated within the subarray with adjacent pairs chosen in any order, until all elements have been combined into a single element.
So, if there are n elements and they sum to T, then a simple O(nT) algorithm is:
For i from 1 to T:
Try partitioning the elements into subarrays each having sum i. This amounts to scanning along the array, greedily adding the current element to the current subarray while the sum of elements in the current subarray is strictly < i. When we reach a total of exactly i, the current subarray ends and a new subarray (initially having sum 0) begins. If adding the current element takes us above the target of i, or if we run out of elements before reaching the target, stop this scan and try the next outer loop iteration (value of i). On the other hand, if we get to the end having formed k subarrays in the process, stop and report n-k as the optimal (minimum possible) number of combining moves. Because targets are tried in increasing order, the first target that works yields the largest k.
A small speedup would be to only try target i values that evenly divide T.
EDIT: To improve the time complexity from O(nT) to O(n^2), it suffices to only try target i values corresponding to sums of prefixes of the array (since there must be a subarray containing the first element, and this subarray can only have such a sum).
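The O(n^2) variant can be sketched in Python as follows. This is an illustrative sketch, not the answerer's code: the name `min_merge_steps` is made up, and it relies on all elements being positive, so that prefix sums are strictly increasing and the first target that works is the smallest one.

```python
def min_merge_steps(a):
    """Minimum number of adjacent merges needed to make all elements equal."""
    n = len(a)
    if n == 0:
        return 0
    total = sum(a)
    target = 0
    for x in a:
        target += x                  # candidate targets are the prefix sums
        if total % target != 0:
            continue                 # target must evenly divide the total
        current = 0
        parts = 0
        ok = True
        for y in a:                  # greedy scan for subarrays of sum `target`
            current += y
            if current == target:
                parts += 1
                current = 0
            elif current > target:
                ok = False
                break
        if ok and current == 0:
            return n - parts         # smallest working target = most parts
    return n - 1                     # not reached: target == total always works


if __name__ == "__main__":
    print(min_merge_steps([1, 2, 3, 2, 1, 3]))  # 2
```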

algorithm which finds the numbers in a sequence which appear 3 times or more, and prints their indexes

Suppose I input a sequence of numbers which ends with -1.
I want to print all the values of the sequence that occur in it 3 times or more, and also print their indexes in the sequence.
For example, if the input is: 2 3 4 2 2 5 2 4 3 4 2 -1
then the expected output is:
2: 0 3 4 6 10
4: 2 7 9
First I thought of using quicksort, but then I realized that I would lose the original indexes of the sequence. I have also been thinking of using counting, but the sequence has no given range of numbers, so counting may be no good in this case.
Now I wonder if I might use an array of pointers (but how?)
Do you have any suggestions or tips for an algorithm with time complexity O(n log n) for this? It would be much appreciated.
Keep it simple!
The easiest way would be to scan the sequence, count the number of occurrences of each element, and put the elements that match the condition in an auxiliary array.
Then, for each element in the auxiliary array, scan the sequence again and print out the indices.
First of all, sorry for my bad English (it's not my native language); I'll try my best.
Similar to what @vvigilante said, here is an algorithm implemented in Python (I chose Python because it is close to pseudocode, so you can translate it to any language you want, and I added a lot of comments... hope you get it!)
from typing import Dict, List

def three_or_more(input_arr: List[int]) -> None:
    indexes: Dict[int, List[int]] = {}
    # scan the array, skipping the trailing -1 terminator
    for i in range(0, len(input_arr) - 1):
        # create the list for the number at position i
        # (if it doesn't exist)
        # and append the index
        indexes.setdefault(input_arr[i], []).append(i)
    # for each key in the dictionary
    for n in indexes.keys():
        # if the number of indexes for that key is >= 3
        if len(indexes[n]) >= 3:
            # print the key
            print("%d: " % n, end='')
            # print each index stored for the current key
            for el in indexes[n]:
                print("%d " % el, end='')
            # new line
            print()

# call the function
three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
Complexity:
The first loop scans the input array: O(N).
The second loop visits every distinct number in the array. There are at most N of them (you cannot have more distinct numbers than elements), so it runs O(N) times.
The inner loop goes through all the indexes stored for the current number, which looks like O(N) in the worst case.
So the complexity would seem to be O(N) + O(N)*O(N) = O(N^2).
But the two nested loops together print at most N indexes in total, and no index is repeated, so together they cost O(N).
So the total is O(N) + O(N) = O(N).
As for memory, it is O(N) for the input array + O(N) for the dictionary (because it contains all N indexes) = O(N).
If you do it in C++, remember that maps are much slower than arrays, so if N is small you should use an array of arrays (or a std::vector of std::vectors); otherwise you can try an unordered map, which uses hashing.
P.S. Remember that getting the size of a vector takes O(1) time because it is a difference of pointers!
Starting with a sorted list is a good idea.
You could create a second array of original indices and duplicate all of the memory moves for the sort on the indices array. Then checking for triplicates is trivial and only requires sort + 1 traversal.
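The sort-based idea can be sketched in Python as follows. Instead of mirroring the sort's memory moves on a second array, this sketch sorts the list of indices by value, which has the same effect. The name `triplicates_by_sorting` is hypothetical, and the -1 terminator is assumed to have been stripped from the input already.

```python
from itertools import groupby


def triplicates_by_sorting(seq):
    """Map each value that occurs 3+ times to the list of its original indices."""
    # Sort the indices by the value they point at. Python's sort is stable,
    # so indices of equal values stay in ascending order.
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    result = {}
    # One traversal: equal values are now adjacent, so group them.
    for value, group in groupby(order, key=lambda i: seq[i]):
        idx = list(group)
        if len(idx) >= 3:
            result[value] = idx
    return result


if __name__ == "__main__":
    for value, idx in triplicates_by_sorting([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2]).items():
        print("%d: %s" % (value, " ".join(map(str, idx))))
```

The sort dominates the cost, so this runs in O(n log n), which is what the question asked for.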

How to locate in a huge list of numbers, two numbers where xi=xj?

I have the following question, and it screams for a solution with hashing:
Problem:
Given a huge list of numbers x1, ..., xn where xi <= T, we'd like to know
whether there exist two indices i != j such that x_i == x_j.
Find an algorithm for the problem with O(n) running time, and also O(n) in expectation.
My solution at the moment: we use hashing with chaining, with a hash function h(x).
First, we build a new array, call it A, where each cell is a linked list. This is the destination array.
Now we run over all n numbers and map each element of x1, ..., xn to its cell using the hash function. This takes O(n) time.
After that we run over A and look for collisions. If we find a cell where length(A[k]) > 1,
then we return the xi and xj that were mapped to A[k]. The total run time here is O(n) in the worst case, which occurs when the two equal numbers (if they exist) were mapped to the last cell of A.
The same approach can be roughly twice as fast on average; it is still O(n) on average, but with better constants.
There is no need to map all the elements into the hash table and then go over it. A faster solution could be:
for each element e:
    if e is in the table:
        return e
    else:
        insert e into the table
Also note that if T < n, there must be a dupe within the first T+1 elements, by the pigeonhole principle.
Also, for small T you can use a simple array of size T; no hash is needed (hash(x) = x). The array can be initialized in O(1) to contain zeros as initial values.
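The single-pass loop above might look like this in Python, using the built-in `set` as the hash table. The function name `first_duplicate` is chosen here for illustration only.

```python
def first_duplicate(xs):
    """Return the first value that repeats in xs, or None if all are distinct."""
    seen = set()              # hash table of the values met so far
    for x in xs:
        if x in seen:         # expected O(1) membership test
            return x
        seen.add(x)
    return None


if __name__ == "__main__":
    print(first_duplicate([3, 1, 4, 1, 5]))  # 1
```

It stops at the first repeated value, so on inputs that contain a duplicate it never touches the elements after it.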

Given an unordered list of integers, return a value not present in the list

I have an algorithm problem that I came across at work but have not been able to come up with a satisfactory solution for. I browsed this forum some and the closest I have come to the same problem is How to find a duplicate element in an array of shuffled consecutive integers?.
I have a list of N integers which can contain the elements 1-M (M>N); the list is unsorted. I want a function that takes this list as input and returns a value in the range 1-M that is not present in the list. The list contains no duplicates. I was hoping for an O(N) solution without using additional space.
UPDATE: the function cannot change the original list L.
For instance, N = 5, M = 10
List (L): 1, 2, 4, 8, 3
then f(L) = 5
To be honest, I don't care if it returns an element other than 5, just so long as it meets the constraints above.
The only solution I have come up with so far uses an additional array of M elements: walk through the input list and set the corresponding array elements to 1 if present in the list, then iterate over this array and return the index of the first element with value 0. As you can see, this uses additional O(M) space and has complexity 2*O(N).
Any help would be appreciated.
Thanks for the help everyone. The stack overflow community is definitely super helpful.
To give everyone a little more context of the problem I am trying to solve.
I have a set of M tokens that I give out to some clients (one per client). When a client is done with a token, it gets returned to my pile. As you can see, the tokens are initially handed out in sorted order.
so M = 3 Tokens
client1: 1 <2,3>
client2: 2 <3>
client1 return: 1 <1,3>
client 3: 3 <1>
Now the question is giving client4 token 1. I could at this stage give client 4 token 2 and sort the list. Not sure if that would help. In any case, if I come up with a nice clean solution I will be sure to post it.
Just realised I might have confused everyone. I do not have the list of free tokens with me when I am called. I could statically maintain such a list, but I would rather not.
You can do divide and conquer. Basically, given the range 1..m, do a quicksort-style partition with m/2 as the pivot. If there are fewer than m/2 elements in the first half, then a number is missing there, so recurse into that half. Otherwise, there is a missing number in the second half. Complexity: n+n/2+n/4... = O(n)
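def findmissing(x, startIndex, endIndex, minVal, maxVal):
    pivot = (minVal + maxVal) // 2  # integer midpoint of the value range
    i = startIndex
    j = endIndex
    # quicksort-style partition: values <= pivot go left, values > pivot go right
    while True:
        while (x[i] <= pivot) and (i < j):
            i += 1
        if i >= j:
            break
        while (x[j] > pivot) and (i < j):
            j -= 1  # move the right cursor leftwards
        if i >= j:
            break
        swap(x, i, j)
    # swap and findlocation are helpers that are not defined here;
    # findlocation is taken to return the index of the last value <= pivot
    k = findlocation(x, pivot)
    if (k - startIndex) < (pivot - minVal):
        # fewer values than slots in the lower half: the gap is there
        return findmissing(x, startIndex, k, minVal, pivot)
    else:
        return findmissing(x, k + 1, endIndex, pivot + 1, maxVal)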
I have not implemented the end condition, which I will leave to you.
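For reference, here is one way the divide-and-conquer idea could be completed in Python, end condition included. It is a sketch under assumptions that go beyond the snippet above: it works on a copy (the question's update says the original list must not change, so this costs O(N) extra space), it assumes the values are distinct integers >= 1, and it narrows the value range to 1..N+1, which must contain a missing value. The name `find_missing` is hypothetical.

```python
def find_missing(values):
    """Return a value >= 1 that is absent from the distinct integers in values."""
    n = len(values)
    # Work on a copy and drop values that cannot matter: with n distinct
    # values, some number in 1..n+1 is always missing.
    x = [v for v in values if v <= n + 1]
    lo, hi = 0, len(x)              # current slice is x[lo:hi]
    min_val, max_val = 1, n + 1     # value range the slice draws from
    while lo < hi:
        pivot = (min_val + max_val) // 2
        # Partition the slice: values <= pivot first, the rest after them.
        i = lo
        for j in range(lo, hi):
            if x[j] <= pivot:
                x[i], x[j] = x[j], x[i]
                i += 1
        if i - lo < pivot - min_val + 1:
            # Fewer values than slots in the lower half: the gap is there.
            hi, max_val = i, pivot
        else:
            # Lower half is full, so the gap is in the upper half.
            lo, min_val = i, pivot + 1
    # End condition: the slice is empty, so every value in range is missing.
    return min_val


if __name__ == "__main__":
    print(find_missing([1, 2, 4, 8, 3]))  # 5
```

Each pass halves the value range, and the slice never holds more values than the range has slots, so the total work stays linear in N.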
You can have O(N) time and space. You can be sure there is an absent element within 1..N+1, so make an array of N+1 elements, and ignore numbers larger than N+1.
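A short Python sketch of this O(N) time and space approach, assuming the list holds distinct integers >= 1. The name `find_missing_bitmap` is illustrative.

```python
def find_missing_bitmap(values):
    """Return the smallest value in 1..N+1 that is not in values (N = len(values))."""
    n = len(values)
    seen = [False] * (n + 2)        # slots for 1..n+1; slot 0 is unused
    for v in values:
        if 1 <= v <= n + 1:         # larger numbers cannot affect the answer
            seen[v] = True
    for candidate in range(1, n + 2):
        if not seen[candidate]:
            return candidate
    return None                     # unreachable: n values cannot fill n+1 slots


if __name__ == "__main__":
    print(find_missing_bitmap([1, 2, 4, 8, 3]))  # 5
```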
If M is large compared to N, say M>2N, generate a random number in 1..M and check if it is not on the list in O(N) time, O(1) space. The probability you will find a solution in a single pass is at least 1/2, and therefore (geometric distribution) the expected number of passes is constant, average complexity O(N).
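The random-probing idea could be sketched like this in Python. The function name and the optional `rng` parameter are assumptions made for illustration; the sketch requires m > len(values), otherwise it may never terminate.

```python
import random


def find_missing_random(values, m, rng=random):
    """Pick random candidates in 1..m until one is absent from values."""
    while True:
        candidate = rng.randint(1, m)   # randint is inclusive on both ends
        # One O(N) pass with O(1) extra space; the list is not modified.
        if all(v != candidate for v in values):
            return candidate


if __name__ == "__main__":
    print(find_missing_random([1, 2, 4, 8, 3], 10))  # some value not in the list
```

With M > 2N each probe succeeds with probability above 1/2, so the expected number of passes is below 2.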
If M is N+1 or N+2, use the approach described here.
Can you do something like a counting sort? Create an array of size M (indices 0 to M-1), then go through the list once (N steps) and, for each value i, set the array element at index i-1 to one. After that pass, search indices 0 to M-1 until you find the first element with a zero value.
Should be O(N+M).
Array of size M: [0=0, 1=0, 2=0, 3=0, 4=0, 5=0, 6=0, 7=0, 8=0, 9=0]
After looping through the N elements: [0=1, 1=1, 2=1, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0]
Searching indices 0 to M-1 finds that index 4 is zero, therefore 5 (4+1) is the first integer not in L.
After reading your update, I think you are making it overly complex. First of all, let me list what I get from your question:
You need to give a token to the client regardless of its order. Quoting from your original post:
for instance N = 5 M = 10 List (L): 1, 2, 4, 8, 3 then f(L) = 5 To be honest I don't care if it returns an element other than 5, just so long as it meets the constraints above
Secondly, you are already maintaining a list of "M" tokens.
The client fetches a token and, after using it, returns it to you.
Given these points, why don't you implement a TokenPool?
Implement your list of M tokens as a queue.
Whenever a client asks for a token, fetch one from the queue, i.e. remove it from the queue. This way the queue always holds exactly the tokens that haven't been given away, and you do it in O(1).
Whenever a client is done with a token, they return it to you. Add it back to the queue. Again O(1).
In the whole implementation, you never have to loop through any list. All you have to do is generate the tokens and insert them into the queue.
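A minimal sketch of such a TokenPool in Python, using `collections.deque` as the queue. The method names `acquire` and `release` are assumptions for illustration, and the sketch does not check that a released token was actually handed out.

```python
from collections import deque


class TokenPool:
    """Queue-backed pool of the tokens 1..m that are currently free."""

    def __init__(self, m):
        self._free = deque(range(1, m + 1))   # generate all tokens up front

    def acquire(self):
        # O(1): take a free token; raises IndexError when none are left.
        return self._free.popleft()

    def release(self, token):
        # O(1): put a returned token back at the end of the queue.
        self._free.append(token)


if __name__ == "__main__":
    pool = TokenPool(3)
    a = pool.acquire()      # client1 gets 1
    b = pool.acquire()      # client2 gets 2
    pool.release(a)         # client1 returns 1
    print(pool.acquire())   # client3 gets 3
    print(pool.acquire())   # client4 gets 1
```

The usage lines replay the three-token scenario from the question.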

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size (n*k+b) where n elements occur k times and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing.
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
An O(n*k+b) solution is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element of the current array slice in linear time and partition the slice around it, as in quicksort. Let the median element be M.
After the partition step you have an array where all elements smaller than M are to the left of the first occurrence of M, all occurrences of M are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and determine whether S<M or S>=M. Recurse on either the left slice or the right slice.
So you are doing a quicksort but descending into only one side of each partition. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time is still linear in the array size.
Clarification: Let's say n=20 and k=10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element; let's say it is 1111. If S<1111, then the (0-based) index of the leftmost occurrence of 1111 will not be a multiple of 10. If S>=1111, then the index will be a multiple of 10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find that the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3. 6 is a multiple of k=3, so each element there must occur 3 times. So S>=3. Recurse on the right side. And so on.
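A simplified Python sketch of this partition-and-recurse idea. It departs from the description above in two labeled ways: it picks a random pivot instead of the true median (so the running time is linear in expectation rather than in the worst case), and it builds new lists instead of partitioning in place. The name `find_b_times` is made up for illustration.

```python
import random


def find_b_times(arr, k):
    """Find the value that does not occur exactly k times (all others do)."""
    items = list(arr)
    while items:
        pivot = random.choice(items)            # assumption: random pivot, not the median
        equal_count = sum(1 for v in items if v == pivot)
        if equal_count != k:
            return pivot                        # the pivot itself is the odd one
        less = [v for v in items if v < pivot]
        if len(less) % k != 0:
            items = less                        # count below pivot is off: S is smaller
        else:
            items = [v for v in items if v > pivot]
    raise ValueError("input does not match the problem statement")


if __name__ == "__main__":
    print(find_b_times([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3))  # 4
```

The test on `len(less) % k` is the same "multiple of k" check the full example applies to the index of the leftmost median.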
An idea using cyclic groups.
To guess the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set. Otherwise it is clear.
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, this solution is linear in the array size.
No hashing; memory usage is O(log k * log max N).
Example implementation:
from functools import reduce
from random import randint, shuffle

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for i in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep*k + b_rep*b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0]*10  # 10 bits are enough for values up to 1000
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the most significant bit down
    return reduce(lambda a, b: 2*a + (b % k > 0), reversed(cnts), 0)

# pass the same k that was used to generate the data
print("Answer: ", solve(generate_test_data(10, 15, 13), 15))
In order to have a constant-height B-tree containing n distinct elements, with constant height h, you need z=n^(1/h) children per node: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
Example: with n=1000000 and h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, log(z)=log(n^(1/h))=log((N/k)^(1/h)), which is constant if h is chosen properly based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one use hashing, so it's not a good answer :(
In Python this would be linear time (set removes the duplicates):
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
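A tiny Python sketch of the XOR special case, assuming integer elements. It only works when k is even and b is odd: every value repeated an even number of times cancels out, and an odd number of copies of the remaining value leaves that value. The name `odd_one_out_xor` is illustrative.

```python
from functools import reduce
from operator import xor


def odd_one_out_xor(arr):
    """XOR of all elements; equals the b-times element when k is even and b is odd."""
    return reduce(xor, arr, 0)


if __name__ == "__main__":
    # k = 2, b = 1: 2 and 5 appear twice, 7 appears once.
    print(odd_one_out_xor([2, 5, 7, 5, 2]))  # 7
```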
