algorithm which finds the numbers in a sequence which appear 3 times or more, and prints their indexes - c

Suppose I input a sequence of numbers which ends with -1.
I want to print all the values of the sequence that occur in it 3 times or more, and also print their indexes in the sequence.
For example, if the input is: 2 3 4 2 2 5 2 4 3 4 2 -1
then the expected output in that case is:
2: 0 3 4 6 10
4: 2 7 9
First I thought of using quicksort, but then I realized that I would lose the original indexes of the sequence. I have also been thinking of using counting, but the sequence has no given range of numbers, so counting may be no good in that case.
Now I wonder if I might use an array of pointers (but how?).
Do you have any suggestions or tips for an algorithm with time complexity O(nlogn) for this? It would be much appreciated.

Keep it simple!
The easiest way would be to scan the sequence and count the number of occurrences of each element, putting the elements that match the condition in an auxiliary array.
Then, for each element in the auxiliary array, scan the sequence again and print out the indices.
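For reference, a minimal Python sketch of this two-pass idea (the function name and the handling of the -1 terminator are my own choices, not the asker's):
def print_three_or_more(seq):
    # First pass: count occurrences (ignore the -1 terminator).
    values = seq[:seq.index(-1)] if -1 in seq else seq
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    # Auxiliary list of values seen three times or more, in order of first appearance.
    frequent = [v for v in dict.fromkeys(values) if counts[v] >= 3]
    # Second pass per kept value: collect and print its indexes.
    for v in frequent:
        idx = [i for i, x in enumerate(values) if x == v]
        print("%d: %s" % (v, " ".join(map(str, idx))))

print_three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
# 2: 0 3 4 6 10
# 4: 2 7 9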

First of all, sorry for my bad English (it's not my first language); I'll try my best.
So, similar to what #vvigilante said, here is an algorithm implemented in Python (it is in Python because it reads more like pseudocode, so you can translate it to any language you want; moreover, I added a lot of comments... hope you get it!)
from typing import Dict, List

def three_or_more(input_arr: List[int]) -> None:
    indexes: Dict[int, List[int]] = {}
    # scan the array (the -1 in the range bound excludes the trailing terminator)
    for i in range(0, len(input_arr) - 1):
        # create the list for the number at position i (if it doesn't exist)
        # and append the index
        indexes.setdefault(input_arr[i], []).append(i)
    # for each key in the dictionary
    for n in indexes.keys():
        # if the number of indexes for that key is >= 3
        if len(indexes[n]) >= 3:
            # print the key
            print("%d: " % (n), end='')
            # print each index stored under the current key
            for el in indexes[n]:
                print("%d," % (el), end='')
            # new line
            print("\n", end='')

# call the function
three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
Complexity:
The first loop scans the input array: O(N).
The second loop goes over every distinct number in the array; since there can be at most N of them (you cannot have more distinct numbers than elements), it is O(distinct numbers), which is O(N).
The loop inside it goes through all the indexes stored for the current number, which looks like O(N) in the worst case (but it is not).
So the complexity would seem to be O(N) + O(N)*O(N) = O(N^2),
but remember that the two nested loops can print at most all N indexes, and since the indexes are not repeated, their combined cost is O(N)...
So O(N) + O(N) ~= O(N).
Speaking about memory, it is O(N) for the input array + O(N) for the dictionary (because it contains all N indexes) ~= O(N).
If you do it in C++, remember that maps are way slower than arrays, so if the numbers themselves are small you should use an array of arrays (or a std::vector<std::vector<int>>); otherwise you can also try an unordered map, which uses hashing.
P.S. Remember that getting the size of a vector takes O(1) time, because it is just a difference of pointers!

Starting with a sorted list is a good idea.
You could create a second array of original indices and duplicate all of the memory moves for the sort on the indices array. Then checking for triplicates is trivial and only requires sort + 1 traversal.
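Roughly, a Python sketch of this sort-with-original-indexes idea; instead of mirroring the sort's moves by hand it simply sorts (value, index) pairs, and the handling of the -1 terminator is my own assumption:
from itertools import groupby
from operator import itemgetter

def print_three_or_more_sorted(seq):
    # Pair each value with its original index, drop the -1 terminator, sort by value.
    pairs = sorted((v, i) for i, v in enumerate(seq) if v != -1)
    # One traversal of the sorted pairs: each run of equal values is one candidate.
    for value, group in groupby(pairs, key=itemgetter(0)):
        indexes = [i for _, i in group]
        if len(indexes) >= 3:
            print("%d: %s" % (value, " ".join(map(str, indexes))))

print_three_or_more_sorted([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
# 2: 0 3 4 6 10
# 4: 2 7 9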

Related

Array operations for maximum sum

Given an array A consisting of N elements, our task is to find the maximal subarray sum after applying the following operation exactly once:
Select any subarray and set all the elements in it to zero.
E.g.: if the array is -1 4 -1 2, then the answer is 6, because we can choose the -1 at index 2 as a subarray and set it to 0. The resulting array after applying the operation is -1 4 0 2, and its maximum subarray sum is 4 + 0 + 2 = 6.
My approach was to find the start and end indexes of the minimum-sum subarray, set all elements of that subarray to 0, and then find the maximum-sum subarray. But this approach is wrong.
Starting simple:
First, let us start with one part of the question: finding the maximal subarray sum.
This can be done via dynamic programming:
a = [1, 2, 3, -2, 1, -6, 3, 2, -4, 1, 2, 3]
a = [-1, -1, 1, 2, 3, 4, -6, 1, 2, 3, 4]

def compute_max_sums(a):
    res = []
    currentSum = 0
    for x in a:
        if currentSum > 0:
            res.append(x + currentSum)
            currentSum += x
        else:
            res.append(x)
            currentSum = x
    return res

res = compute_max_sums(a)
print(res)
print(max(res))
Quick explanation: we iterate through the array. As long as the running sum is positive, it is worth appending the whole block to the next number. If we dip to zero or below at any point, we discard the whole "tail" sequence, since it will not be profitable to keep it anymore, and we start anew. At the end we have an array where the j-th element is the maximal sum of a subarray i:j with 0 <= i <= j.
The rest is just a matter of finding the maximal value in that array.
Back to the original question
Now that we solved the simplified version, it is time to look further. We can now select a subarray to be deleted to increase the maximal sum. The naive solution would be to try every possible subarray and to repeat the steps above. This would unfortunately take too long [1]. Fortunately, there is a way around this: we can think of the zeroes as a bridge between two maxima.
There is one more thing to address, though - currently, for the j-th element we only know that its tail is somewhere behind it, so if we were to take the maximum and the 2nd biggest element from the array, their subarrays could overlap, which would be a problem since we would be counting some of the elements more than once.
Overlapping tails
How to mitigate this "overlapping tails" issue?
The solution is to compute everything once more, this time from the end to the start. This gives us two arrays - one where the j-th element has its tail i pointing towards the left end of the array (i.e. i <= j), and the other where the reverse is true. Now, if we take x from the first array and y from the second array, we know that if index(x) < index(y) then their respective subarrays are non-overlapping.
We can now proceed to try every suitable x, y pair - there are O(n^2) of them. However, since we don't need any further computation (we already precomputed the values), this is the final complexity of the algorithm: the preparation cost us only O(n), so it doesn't impose any additional penalty.
Here be dragons
So far the stuff we did was rather straightforward. The following section is not that complex, but there are going to be some moving parts. Time to brush up on max heaps:
Accessing the max takes constant time.
Deleting any element is O(log(n)) if we have a reference to that element. (We can't find the element in O(log(n)); however, if we know where it is, we can swap it with the last element of the heap, delete it, and bubble down the swapped element in O(log(n)).)
Adding any element into the heap is O(log(n)) as well.
Building a heap can be done in O(n)
That being said, since we need to go from start to the end, we can build two heaps, one for each of our pre-computed arrays.
We will also need a helper array that gives us quick index -> element-in-heap access, so that the deletion takes O(log(n)).
The first heap will start empty - we are at the start of the array, the second one will start full - we have the whole array ready.
Now we can iterate over the whole array. In each step i we:
Compare the max(heap1) + max(heap2) with our current best result to get the current maximum. O(1)
Add the i-th element from the first array into the first heap - O(log(n))
Remove the i-th indexed element from the second heap (this is why we have to keep the references in a helper array) - O(log(n))
The resulting complexity is O(n * log(n)).
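To make the bookkeeping concrete, here is a rough Python sketch of that sweep. It reuses compute_max_sums from above; instead of an explicit index -> element-in-heap helper array it uses heapq with lazy deletion, which keeps the same O(n * log(n)) bound. The function name is just illustrative, and like the quadratic code in the update below it only scores x, y pairs.
import heapq

def max_sum_after_zeroing(a):
    max_sums = compute_max_sums(a)
    rev_max_sums = list(reversed(compute_max_sums(list(reversed(a)))))
    n = len(a)

    heap1 = []                               # prefix candidates seen so far (max-heap via negation)
    heap2 = [-x for x in rev_max_sums]       # all suffix candidates, initially
    heapq.heapify(heap2)
    removed = []                             # suffix values we have swept past (lazy deletion)

    best = float("-inf")
    for i in range(n):
        # discard heap2 tops that were removed lazily
        while removed and heap2 and heap2[0] == removed[0]:
            heapq.heappop(heap2)
            heapq.heappop(removed)
        if heap1 and heap2:                  # best max_sums[x] (x < i) + rev_max_sums[y] (y >= i)
            best = max(best, -heap1[0] - heap2[0])
        heapq.heappush(heap1, -max_sums[i])          # prefix ending at i becomes available
        heapq.heappush(removed, -rev_max_sums[i])    # suffix starting at i is no longer allowed
    return best

print(max_sum_after_zeroing([-1, 4, -1, 2]))  # -> 6, matching the example in the question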
Update:
Just a quick illustration of the O(n^2) solution since OP nicely and politely asked. Man oh man, I'm not your bro.
Note 1: Getting the solution won't help you as much as figuring out the solution on your own.
Note 2: The fact that the following code gives the correct answer is not a proof of its correctness. While I'm fairly certain that my solution should work, it is definitely worth looking into why it works (if it works) rather than looking at one example of it working.
input = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
reverse_input = [x for x in reversed(input)]
max_sums = compute_max_sums(input)
rev_max_sums = [x for x in reversed(compute_max_sums(reverse_input))]
print(max_sums)
print(rev_max_sums)

current_max = 0
for i in range(len(max_sums)):
    if i < len(max_sums) - 1:
        for j in range(i + 1, len(rev_max_sums)):
            if max_sums[i] + rev_max_sums[j] > current_max:
                current_max = max_sums[i] + rev_max_sums[j]

print(current_max)
[1] There are n possible beginnings, n possible ends, and the complexity of the code we have is O(n), resulting in a complexity of O(n^3). Not the end of the world, however it's not nice either.

How to find a number that is repeated (n/3) times in an array of size n, in O(n) time and O(n) space?

I have this question that I just can't figure out! Any hints would mean a lot. Thank you in advance.
I have an array, A. Its size is n, and I want to find an algorithm that will find an x that appears in this array at least n/3 times. If there is no such x in the array, then we will print that we didn't find one!
I need to find an algorithm that does this in O(n) time and takes O(n) space.
For example:
A=[1 1 2 2 1 1 1 5 6 7]
For the above array, the algorithm should return 1.
If I were you, I would write an algorithm that:
Instantiates a map (i.e. key/value pairs) in whatever language you're using. The key will be the integer you find, the value will be the number of times it has been seen so far.
Iterates over the array. For the current integer, check whether the number exists as a key in your map. If it exists, increment the map's value. If it doesn't exist, insert a new element with a count of 1.
After the iteration is complete, iterates over your map. If any element has a count of at least n/3, print it out. Handle the case where none are found, etc.
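A minimal Python version of that counting idea (the function name is mine; it returns every value that reaches the n/3 threshold, or an empty list if there is none):
from collections import Counter

def at_least_n_over_3(a):
    n = len(a)
    counts = Counter(a)                          # one O(n) pass, O(n) extra space
    return [x for x, c in counts.items() if c >= n / 3]

result = at_least_n_over_3([1, 1, 2, 2, 1, 1, 1, 5, 6, 7])
print(result if result else "No solution")       # -> [1]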
Here is my solution in pseudocode; note that it is possible to have two solutions as well as one or none:
func anna(A, n)                        # array and length
    ht := {}                           # create empty hash table
    for k in [0, n)                    # iterate over array
        if A[k] in ht                  # previously seen
            ht{A[k]} := ht{A[k]} + 1   # increment count
        else                           # not previously seen
            ht{A[k]} := 1              # initialize count
    solved := False                    # flag if solution found
    for key in keys(ht)                # iterate over hash table
        if ht{key} >= n / 3            # found solution
            solved := True             # update flag
            print key                  # write it
    if not solved                      # no solution found
        print "No solution"            # report failure
The first for loop takes O(n) time. The second for loop potentially takes O(n) time if all items in the array are distinct, though most often the second for loop will take much less time. The hash table takes O(n) space if all items in the array are distinct, though most often it takes much less space.
It is possible to optimize the solution so it stops early and reports failure if there are no possible solutions. To do that, keep a variable max in the first for loop, update it every time a new hash table count exceeds it, and after each element is added to the hash table check whether max + n - k < n / 3; if so, no count can reach n / 3 anymore and you can stop.

finding the maximum number in an array

There is an array of numbers, and this array is unsorted. We should find the maximum number n such that at least n numbers in the array are bigger than n (this number may or may not be in the array).
For example, if we are given 2 5 7 6 9, then 4 is the maximum number such that at least 4 numbers (or more) are bigger than it (5, 6, 7, 9 are bigger than 4).
I solved this problem, but I think it will exceed the time limit on big arrays of numbers, so I want to solve it in another way.
I used merge sort for the sorting, because it takes O(nlog(n)), and then I used a counter that counts from 1 upward: for each k I check whether there are at least k numbers bigger than k. For example, we count 1, 2, 3, 4, and then at 5 we don't have 5 numbers bigger than 5, so we return k - 1 = 4, and this is our n.
Is this good, or may it still exceed the time limit? Does anybody have another idea?
Thanks
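For comparison, a small Python sketch of the sort-and-count approach described above, so roughly O(nlog(n)); the function name is just illustrative:
def max_n_with_n_bigger(a):
    a = sorted(a, reverse=True)          # sort descending, O(n log n)
    n = 0
    while n < len(a) and a[n] > n + 1:   # a[n] is the (n+1)-th largest value
        n += 1
    return n

print(max_n_with_n_bigger([2, 5, 7, 6, 9]))  # -> 4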
In C++ there is a function called std::nth_element that can find the n-th element of an array in linear time. Using this function you should find the (N - n)-th element (where N is the total number of elements in the array) and subtract 1 from it.
As you seek a solution in C you cannot make use of this function, but you can implement your solution similarly. nth_element performs something quite similar to qsort, but it only performs the partition on the part of the array where the n-th element is.
Now let's assume you have nth_element implemented. We will perform something like a combination of binary search and nth_element. First we assume that the answer to the question is the middle element of the array (i.e. the N/2-th element). We use nth_element to find the N/2-th element. If it is more than N/2, we know the answer to your problem is at least N/2; otherwise it will be less. Either way, in order to find the answer we only need to continue with one of the two partitions created by the N/2-th element. If this partition is the right one (elements bigger than the N/2-th element), we continue solving the same problem; otherwise we start searching for the maximum element M on the left of the N/2-th element that has at least x bigger elements such that x + N/2 > M. The two subproblems have the same complexity. You continue performing this operation until the interval you are interested in has length 1.
Now let's prove that the complexity of the above algorithm is linear. The first nth_element is linear, performing operations in the order of N; the second nth_element, which only considers one half of the array, performs operations in the order of N/2; the third in the order of N/4, and so on. All in all you will perform operations in the order of N + N/2 + N/4 + ... + 1. This sum is less than 2 * N, thus your complexity is still linear.
Your solution is asymptotically slower than what I propose above, as it has complexity O(n*log(n)), while my solution has complexity O(n).
I would use a modified variant of a sorting algorithm that uses pivot values.
The reason is that you want to sort as few elements as possible.
So I would use qsort as my base algorithm and let the pivot element control which partition to sort (you will only need to sort one).

Find the most frequent triplet in an array

We have an array of N numbers. All the numbers are between 1 and k.
The problem is to find the best way of finding the most frequent triplet (three consecutive elements).
My approach to the problem is:
Say the input is { 1, 2, 3, 4, 1, 2, 3, 4 }.
First, search for the count of the triplet (1, 2, 3), starting from the second element of the array and going to the end. Now we will have the count as 1.
Then start with (2, 3, 4) and search the array again.
For each triplet we scan the array and find its count. Like this, we run over the array n - 1 times.
This way my algorithm runs in O(n*n) time complexity. Is there any better way for this problem?
You can do it in O(n * log n) worst-case time and O(n) space: just insert all triples into a balanced binary search tree and find the maximum afterwards.
Alternatively, you can use a hash table to get O(n) expected time (which is typically faster than the search tree approach in reality, if you choose a good hash function).
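For instance, a hash-based Python sketch that counts the consecutive triplets with a dictionary, which is expected O(n) overall (ties between equally frequent triplets are broken arbitrarily):
from collections import Counter

def most_frequent_triplet(a):
    # every consecutive window (a[i], a[i+1], a[i+2]) becomes one dictionary key
    counts = Counter(zip(a, a[1:], a[2:]))
    return counts.most_common(1)[0]          # (triplet, count)

print(most_frequent_triplet([1, 2, 3, 4, 1, 2, 3, 4]))
# -> ((1, 2, 3), 2), one of the two triplets that occur twice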
Are there any memory boundaries, i.e. does it run on a device with memory limitations?
If not, maybe this could be a good solution: iterate over the array and for each triple build a representation object (or struct, if implemented in C#) which goes into a map as a key, with the triple's counter as the value.
If you implement the hash and equals functions appropriately, you will be able to find the "most popular" triple whether the order of the numbers matters or not, e.g. 1,2,3 != 2,1,3 or 1,2,3 == 2,1,3.
After iterating over the entire array you would have to find the largest value, and its key would be your "most popular" triple. With that approach you could find the X most popular triples too. Also, you would scan the array only once and aggregate all the triples (no extra scanning for each triple).

Find the Element Occurring b times in an array of size n*k+b

Description
Given an array of size (n*k+b) where n elements occur k times each and one element occurs b times; in other words, there are n+1 distinct elements. Given that 0 < b < k, find the element occurring b times.
My attempted solutions
The obvious solution would be to use hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequencies of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(nlogn).
Both of my solutions were accepted, but the interviewer wanted a linear solution without using hashing, and the hint he gave was to make the height of the tree in which you are storing the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing.
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution that runs in O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: Find the median element of the current array slice in linear time, and partition around it as in quicksort. Let the median element be M.
After this step you have an array where all elements smaller than M occur to the left of the first occurrence of M, all M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and determine whether S < M or S >= M. Recurse on either the left slice or the right slice accordingly.
So you are doing a quicksort, but descending into only one of the partitions at any time. You will recurse O(log N) times, but each time on 1/2, 1/4, 1/8, ... of the original array, so the total time will still be O(n).
Clarification: Let's say n=20 and k=10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element; let's say it is 1111. If S<1111, then the index of the leftmost occurrence of 1111 will not be a multiple of 10; if S>=1111, it will be a multiple of 10.
Full example: n = 4, k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find that the median element is 3 and the array looks something like {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of the 3s, and 6 is a multiple of k=3, so each element there must occur 3 times. Therefore S>=3: recurse on the right side. And so on.
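A compact Python sketch of this idea (the function name is mine, and I use a random pivot instead of a true linear-time median, so it is expected linear rather than worst-case linear). The check is the same one as above: if the block of elements smaller than the pivot has a size that is not a multiple of k, S is in it; if the pivot's own block has such a size, the pivot is S; otherwise S is to the right.
import random

def find_b_times_element(a, k):
    a = list(a)
    while True:
        pivot = random.choice(a)
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        if len(less) % k:        # leftmost index of pivot is not a multiple of k -> S < pivot
            a = less
        elif len(equal) % k:     # pivot occurs b times (0 < b < k), so the pivot is S
            return pivot
        else:                    # everything up to and including the pivot forms full k-groups
            a = greater

print(find_b_times_element([2, 2, 2, 3, 3, 3, 1, 1], 3))  # -> 1 (the sample from the question)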
An idea using cyclic groups.
To guess the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set; store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set; otherwise it is clear. (Each of the n values that occur k times contributes either 0 or k to cnt, and the answer contributes either 0 or b; since 0 < b < k, cnt % k is non-zero exactly when the answer has that bit set.)
To guess the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, this solution is linear in the array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
from random import randint, shuffle
def generate_test_data(n, k, b):
k_rep = [randint(0, 1000) for i in xrange(n)]
b_rep = [randint(0, 1000)]
numbers = k_rep*k + b_rep*b
shuffle(numbers)
print "k_rep: ", k_rep
print "b_rep: ", b_rep
return numbers
def solve(data, k):
cnts = [0]*10
for number in data:
bits = [number >> b & 1 for b in xrange(10)]
cnts = [cnts[i] + bits[i] for i in xrange(10)]
return reduce(lambda a,b:2*a+(b%k>0), reversed(cnts), 0)
print "Answer: ", solve(generate_test_data(10, 15, 13), 3)
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
For example, with n = 1000000 and h = 10, z = 3.98, that is z = 4.
The time to reach a node in that case is O(h*log(z)). Assuming h and z to be "constant" (since N = n*k, then log(z) = log(n^(1/h)) = log((N/k)^(1/h)) = constant by properly choosing h based on k), you can then say that O(h*log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates); it works because sum(set(arr))*k - sum(arr) = (k - b)*x, where x is the element occurring b times:
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
