Find the most frequent triplet in an array

We have an array of N numbers, all between 1 and k.
The problem is to find the best way of finding the most frequent triplet.
My approach to the problem is:
Say the input is {1, 2, 3, 4, 1, 2, 3, 4}.
First, search for the count of the triplet (1, 2, 3), starting from the second element of the array and scanning to the end. Now we have a count of 1.
Next, start with (2, 3, 4) and search the array again.
For each triplet we scan the array and find its count; this way we scan the array n - 1 times.
So my algorithm runs in O(n^2) time. Is there a better way to solve this problem?

You can do it in O(n * log n) worst-case time and O(n) space: just insert all triples into a balanced binary search tree, counting duplicates, and find the maximum count afterwards.
Alternatively, you can use a hash table to get O(n) expected time (which is typically faster than the search-tree approach in practice, if you choose a good hash function).
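For illustration, a minimal sketch of the hash-table approach in Python, counting every window of three consecutive elements with collections.Counter (the function name most_frequent_triplet is mine, not from the question):
from collections import Counter

def most_frequent_triplet(a):
    # zip produces every consecutive window of three elements;
    # tuples are hashable, so Counter aggregates them in O(n) expected time
    counts = Counter(zip(a, a[1:], a[2:]))
    return counts.most_common(1)[0]   # (triplet, count)

print(most_frequent_triplet([1, 2, 3, 4, 1, 2, 3, 4]))   # ((1, 2, 3), 2)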

Are there any memory constraints, i.e. does this run on a device with limited memory?
If not, this could be a good solution: iterate over the array and, for each triple, build a representation object (or struct, if implemented in C#) which goes into a map as the key, with the triple's counter as the value.
If you implement the hash and equals functions appropriately, you can decide whether the order of the numbers matters when comparing triples, e.g. 1,2,3 != 2,1,3 or 1,2,3 == 2,1,3.
After iterating over the entire array, find the largest value; its key is your "most popular" triple. With that approach you could also find the X most popular triples. And you scan the array only once, aggregating all the triples as you go (no extra scan per triple).
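If the order inside a triple should not matter, a sorted tuple works as a canonical key; a small hedged variant of the sketch above:
from collections import Counter

a = [1, 2, 3, 2, 1, 3, 4, 1, 2, 3]
# sorting each window makes (1, 2, 3) and (2, 1, 3) count as the same triple
counts = Counter(tuple(sorted(t)) for t in zip(a, a[1:], a[2:]))
print(counts.most_common(1))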

Related

Array operations for maximum sum

Given an array A consisting of N elements, our task is to find the maximal subarray sum after applying the following operation exactly once:
Select any subarray and set all of its elements to zero.
E.g.: if the array is -1 4 -1 2, the answer is 6, because we can choose the -1 at index 2 as a subarray and set it to 0. The resulting array after applying the operation is -1 4 0 2, and its maximum subarray sum is 4 + 0 + 2 = 6.
My approach was to find the start and end indexes of the minimum-sum subarray, set all of its elements to 0, and then find the maximum-sum subarray. But this approach is wrong.
Starting simple:
First, let us start with one part of the question: finding the maximal subarray sum.
This can be done via dynamic programming:
#a = [1, 2, 3, -2, 1, -6, 3, 2, -4, 1, 2, 3]   # alternative test input
a = [-1, -1, 1, 2, 3, 4, -6, 1, 2, 3, 4]

def compute_max_sums(a):
    # res[j] = maximal sum of a subarray ending at index j
    res = []
    currentSum = 0
    for x in a:
        if currentSum > 0:
            res.append(x + currentSum)
            currentSum += x
        else:
            res.append(x)
            currentSum = x
    return res

res = compute_max_sums(a)
print(res)
print(max(res))
Quick explanation: we iterate through the array. As long as the running sum is positive, it is worth appending the whole block to the next number. If we dip to zero or below at any point, we discard the whole "tail" sequence, since keeping it is no longer profitable, and we start anew. At the end, we have an array where the j-th element is the maximal sum of a subarray i:j over 0 <= i <= j.
The rest is just a matter of finding the maximal value in that array.
Back to the original question
Now that we have solved the simplified version, it is time to look further. We can select a subarray to be zeroed out to increase the maximal sum. The naive solution would be to try every possible subarray and repeat the steps above. That would unfortunately take too long [1]. Fortunately, there is a way around this: we can think of the zeroes as a bridge between two maxima.
There is one more thing to address, though. Currently, for the j-th element we only know that its tail lies somewhere behind it, so if we took the maximum and the second-biggest element from the array, their subarrays could overlap, which would be a problem, since we would be counting some elements more than once.
Overlapping tails
How to mitigate this "overlapping tails" issue?
The solution is to compute everything once more, this time from the end to the start. This gives us two arrays: one where the j-th element has its tail i pointing towards the left end of the array (i.e. i <= j), and one where the reverse is true. Now, if we take x from the first array and y from the second array, we know that if index(x) < index(y), then their respective subarrays are non-overlapping.
We can now proceed to try every suitable x, y pair; there are O(n^2) of them. Since we need no further computation per pair (the values are precomputed), this is the final complexity of the algorithm: the preparation cost only O(n), so it imposes no additional penalty.
Here be dragons
So far, everything we did was rather straightforward. The following section is not that complex either, but there are going to be some moving parts. Time to brush up on max-heaps:
Accessing the max takes constant time.
Deleting any element is O(log(n)) if we have a reference to that element. (We can't find an element in O(log(n)); however, if we know where it is, we can swap it with the last element of the heap, delete it, and bubble the swapped element down in O(log(n)).)
Adding an element to the heap is O(log(n)) as well.
Building a heap can be done in O(n).
That being said, since we need to go from the start to the end, we can build two heaps, one for each of our precomputed arrays.
We will also need a helper array that gives us quick index -> element-in-heap access, to make the delete O(log(n)).
The first heap starts empty (we are at the start of the array); the second one starts full (the whole array lies ahead of us).
Now we iterate over the whole array. In each step i we:
Compare max(heap1) + max(heap2) with our current best result to get the current maximum: O(1).
Add the i-th element of the first array into the first heap: O(log(n)).
Remove the i-th indexed element from the second heap (this is why we have to keep the references in the helper array): O(log(n)).
The resulting complexity is O(n * log(n)).
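To make the moving parts concrete, here is a rough sketch of the sweep (my code, not part of the original answer). Python's heapq is a min-heap, so values are negated, and instead of the index -> element helper array it uses lazy deletion by value, which keeps the same O(n log n) bound. Note that heap1 only ever grows, so a running maximum would suffice for it; the heap is kept to mirror the description above.
import heapq
from collections import Counter

def best_non_overlapping_pair(max_sums, rev_max_sums):
    n = len(max_sums)
    heap2 = [-v for v in rev_max_sums]   # second heap starts full
    heapq.heapify(heap2)                 # O(n)
    deleted = Counter()                  # values lazily removed from heap2
    heap1 = []                           # first heap starts empty
    best = float('-inf')
    for i in range(n - 1):
        heapq.heappush(heap1, -max_sums[i])      # add i-th element, O(log n)
        deleted[-rev_max_sums[i]] += 1           # "remove" the i-th indexed element
        while deleted[heap2[0]]:                 # drop stale tops before peeking
            deleted[heap2[0]] -= 1
            heapq.heappop(heap2)
        best = max(best, -heap1[0] - heap2[0])   # max(heap1) + max(heap2), O(1)
    return best
Fed the same max_sums and rev_max_sums as the O(n^2) loop in the update below, it should return the same value.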
Update:
Just a quick illustration of the O(n^2) solution since OP nicely and politely asked. Man oh man, I'm not your bro.
Note 1: Getting the solution won't help you as much as figuring out the solution on your own.
Note 2: The fact that the following code gives the correct answer is not a proof of its correctness. While I'm fairly certain that my solution should work, it is definitely more worthwhile to look into why it works (if it works) than to look at one example of it working.
input = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
reverse_input = [x for x in reversed(input)]
max_sums = compute_max_sums(input)
rev_max_sums = [x for x in reversed(compute_max_sums(reverse_input))]
print(max_sums)
print(rev_max_sums)
current_max = 0
for i in range(len(max_sums)):
    if i < len(max_sums) - 1:
        for j in range(i + 1, len(rev_max_sums)):
            if max_sums[i] + rev_max_sums[j] > current_max:
                current_max = max_sums[i] + rev_max_sums[j]
print(current_max)
[1] There are n possible beginnings, n possible ends, and the code we already have is O(n), resulting in O(n^3) overall. Not the end of the world, but not nice either.

Algorithm which finds the numbers in a sequence that appear 3 times or more, and prints their indexes

Suppose I input a sequence of numbers which ends with -1.
I want to print all values that occur in the sequence 3 times or more, along with their indexes in the sequence.
For example, if the input is: 2 3 4 2 2 5 2 4 3 4 2 -1
the expected output in that case is:
2: 0 3 4 6 10
4: 2 7 9
First I thought of using quicksort, but then I realized that I would lose the original indexes of the sequence. I also thought of counting, but the sequence has no given range of numbers, so counting may be no good in this case.
Now I wonder if I might use an array of pointers (but how?).
Do you have any suggestions or tips for an algorithm with time complexity O(n log n)? It would be very appreciated.
Keep it simple!
The easiest way would be to scan the sequence and count the number of occurrences of each element, putting the elements that match the condition into an auxiliary array.
Then, for each element in the auxiliary array, scan the sequence again and print out the indices.
First of all, sorry for my bad English (it's not my first language); I'll try my best.
Similar to what @vvigilante suggested, here is an algorithm implemented in Python (in Python because it reads like pseudocode, so you can translate it to any language you want; moreover I added a lot of comments... hope you get it!)
from typing import Dict, List

def three_or_more(input_arr: List[int]) -> None:
    indexes: Dict[int, List[int]] = {}
    # scan the array (skipping the trailing -1 sentinel)
    for i in range(0, len(input_arr) - 1):
        # create the index list for the number at position i
        # (if it doesn't exist yet) and append the index
        indexes.setdefault(input_arr[i], []).append(i)
    # for each key in the dictionary
    for n in indexes.keys():
        # if the number of indexes for that key is >= 3
        if len(indexes[n]) >= 3:
            # print the key
            print("%d: " % n, end='')
            # print each index stored under the current key
            for el in indexes[n]:
                print("%d," % el, end='')
            # new line
            print()

# call the function
three_or_more([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])
Complexity:
The first loop scans the input array: O(N).
The second loop iterates over the distinct numbers in the array; since there are at most N of them (you cannot have more distinct numbers than elements), it is O(N).
The loop inside that loop goes through all indexes recorded for the current number. In the worst case this looks like O(N), which would give O(N) + O(N) * O(N) = O(N^2);
but the two nested loops together print at most all N indexes, and since no index is repeated, their combined cost is O(N).
So the total is O(N) + O(N) ~= O(N).
Speaking of memory, it is O(N) for the input array plus O(N) for the dictionary (it contains all N indexes), i.e. O(N) overall.
If you do this in C++, remember that maps are much slower than arrays, so if N is small you could use an array of arrays (or std::vector<std::vector<int>>); otherwise you can also try an unordered_map, which uses hashing.
P.S. Remember that getting the size of a vector takes O(1) time, because it is just a pointer difference!
Starting with a sorted list is a good idea.
You could create a second array of the original indices and mirror every memory move the sort makes onto the index array. Then checking for triplicates is trivial and requires only the sort plus one traversal.
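A sketch of that idea in Python (the names are mine): instead of a parallel index array, pair each value with its index before sorting, which has the same effect and keeps the O(n log n) bound.
def three_or_more_sorted(seq):
    # pair each value with its original index; drop the trailing -1 sentinel
    pairs = sorted((v, i) for i, v in enumerate(seq[:-1]))
    start = 0
    for k in range(1, len(pairs) + 1):
        # a run ends when the value changes (or at the end of the list)
        if k == len(pairs) or pairs[k][0] != pairs[start][0]:
            if k - start >= 3:
                indices = " ".join(str(i) for _, i in pairs[start:k])
                print("%d: %s" % (pairs[start][0], indices))
            start = k

three_or_more_sorted([2, 3, 4, 2, 2, 5, 2, 4, 3, 4, 2, -1])   # 2: 0 3 4 6 10 / 4: 2 7 9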

Find repeated number in an array of unique numbers

There is an array in which all numbers are unique except one (say, the magic number). The magic number
repeats more than half the size of the array, e.g. 2, 10, 10, 10, 3. Find the magic number without
using any extra space and without sorting. Is there any method to do it in O(n)?
Check each element against its neighbours; if any two adjacent elements are equal, you have found the number. O(N)
If that test did not find the number, then you are in a situation like the following:
10, 2, 10, 3, 10
In this case, the first number in the array is the magic number. O(1)
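A small sketch combining the two checks (my code, for illustration):
def find_magic(a):
    # step 1: any two adjacent equal elements must be the magic number, O(N)
    for i in range(len(a) - 1):
        if a[i] == a[i + 1]:
            return a[i]
    # step 2: no two copies are adjacent; with more than n/2 copies that
    # forces the alternating pattern x, ?, x, ?, ..., so a[0] is it, O(1)
    return a[0]

print(find_magic([2, 10, 10, 10, 3]))   # 10
print(find_magic([10, 2, 10, 3, 10]))   # 10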

Sort an array so the difference of elements a[i]-a[i+1]<=a[i+1]-a[i+2]

My mind has been blown since last week, when I began trying to sort an array of N elements by this condition: the difference between two consecutive elements must always be less than or equal to the difference between the next two. For example:
A[4] = {10, 2, 7, 4}
It is possible to rearrange that array this way:
{2, 7, 10, 4} because (2 - 7 = -5) <= (7 - 10 = -3) <= (10 - 4 = 6)
{4, 10, 7, 2} because (4 - 10 = -6) <= (10 - 7 = 3) <= (7 - 2 = 5)
One solution I considered was just shuffling the array and checking each time whether it satisfied the condition: workable for a small number of elements, but time-consuming or even hopeless for larger inputs.
Another was trying to move elements around the array with loops, hoping to meet the requirements, but again this method is very time-consuming and sometimes does not terminate.
Trying to find an existing algorithm hasn't turned up any results, but there must be something.
Thank you very much in advance.
I normally don't just provide code, but this question intrigued me, so here's a brute-force solution, that might get you started.
The concept will always be slow because the individual elements in the list to be sorted are not independent of each other, so they cannot be sorted using traditional O(N log N) algorithms. However, the differences can be sorted that way, which simplifies checking for a solution, and permutations could be checked in parallel to speed up the processing.
import itertools

def is_diff_sorted(qa):
    diffs = [qa[i] - qa[i+1] for i in range(len(qa) - 1)]
    for i in range(len(diffs) - 1):
        if diffs[i] > diffs[i+1]:
            return False
    return True

a = [2, 4, 7, 10]
#a = [1, 4, 6, 7, 20]
a.sort()
for perm in itertools.permutations(a):
    if is_diff_sorted(perm):
        print("Solution:", perm)   # print the found permutation, not the sorted input
        break
This condition is related to differentiation: the (negative) difference between neighbouring elements has to be steady or increasing with increasing index. Multiply the condition by -1 and you get
a[i+1] - a[i] >= a[i+2] - a[i+1]
or
0 >= (a[i+2] - a[i+1]) - (a[i+1] - a[i])
So the second derivative has to be zero or negative, which is the same as the first derivative staying the same or decreasing, like e.g. portions of the upper half of a circle. That does not mean that the first derivative itself has to start out positive or negative, just that it never changes upward.
The algorithmic problem is that this can't be a simple sort, since you never compare just two elements of the list; you have to compare three at a time (i, i+1, i+2).
So the only thing you know apart from trying random permutations is given in Klas' answer (values first rising, if at all, then falling, if at all), but his is not a sufficient condition, since you can still have a positive second derivative within his two segments (rising/falling).
So is there a solution much faster than the random shuffle? I can only think of the following argument (similar to Klas' answer). For a given vector, a solution is more likely if you separate the data into a first segment that is rising or steady (not falling) and a second that is falling or steady (not rising), with neither empty. Likely an argument could be made that the two segments should have approximately equal size. The rising segment should contain the data that are closer together, and the falling segment the data that are further apart. So one could start with the mean, look for data close to it and move them to the first set, then look for more widely spaced data and move them to the second set. A histogram might help.
[4 7 10 2] --> diff [3 3 -8] --> 2nd diff [0 -11]
Here is a solution based on backtracking algorithm.
1. Sort the input array in non-increasing order.
2. Start dividing the array's values into two subsets: put the largest element into both subsets (this will be the "middle" element), then place the second largest one into an arbitrary subset.
3. Sequentially put the remaining elements into either subset. If this cannot be done without violating the "difference" condition, use the other subset. If neither subset is acceptable, roll back and change preceding decisions.
4. Reverse one of the arrays produced in step 3 and concatenate it with the other array.
Below is a Python implementation. (It is not perfect; the worst defect is the recursive implementation: while recursion is quite common for backtracking algorithms, this particular algorithm seems to run in linear time, and recursion is not good for very large input arrays.)
def is_concave_end(a, x):
    # can x be appended to a while keeping the differences non-decreasing?
    return a[-2] - a[-1] <= a[-1] - x

def append_element(sa, halves, labels, which, x):
    labels.append(which)
    halves[which].append(x)
    if len(labels) == len(sa) or split_to_halves(sa, halves, labels):
        return True
    if which == 1 or not is_concave_end(halves[1], halves[0][-1]):
        halves[which].pop()
        labels.pop()
        return False
    # backtrack: move the element from subset 0 to subset 1 and retry
    labels[-1] = 1
    halves[1].append(halves[0][-1])
    halves[0].pop()
    if split_to_halves(sa, halves, labels):
        return True
    halves[1].pop()
    labels.pop()

def split_to_halves(sa, halves, labels):
    x = sa[len(labels)]
    if len(halves[0]) < 2 or is_concave_end(halves[0], x):
        return append_element(sa, halves, labels, 0, x)
    if is_concave_end(halves[1], x):
        return append_element(sa, halves, labels, 1, x)

def make_concave(a):
    sa = sorted(a, reverse=True)
    halves = [[sa[0]], [sa[0], sa[1]]]
    labels = [0, 1]
    if split_to_halves(sa, halves, labels):
        return list(reversed(halves[1][1:])) + halves[0]

print(make_concave([10, 2, 7, 4]))
It is not easy to produce a good data set to test this algorithm: a plain set of random numbers is either too easy for it or has no solutions at all. Here I tried to generate a set that is "difficult enough" by mixing together two sorted lists, each satisfying the "difference" condition on its own. Still, this data set is processed in linear time. And I have no idea how to prepare any data set that would demonstrate more-than-linear running time for this algorithm...
Note that since the difference must be non-decreasing, any solution will have the elements first in rising order and then in falling order. The length of either of the two "suborders" may be 0, so a solution could consist of a strictly rising or strictly falling sequence.
The following algorithm will find any solutions:
Divide the set into two sets, A and B. Empty sets are allowed.
Sort A in rising order and B in falling order.
Concatenate the two sorted sets: AB
Check if you have a solution.
Do this for all possible divisions into A and B.
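For completeness, a brute-force sketch of this search over all divisions (my code; it reuses is_diff_sorted from the brute-force answer above, and is exponential in the size of the input):
from itertools import combinations

def klas_search(a):
    idx = range(len(a))
    for r in range(len(a) + 1):
        for chosen in combinations(idx, r):           # every division into A and B
            A = sorted(a[i] for i in chosen)          # A in rising order
            B = sorted((a[i] for i in idx if i not in chosen), reverse=True)
            if is_diff_sorted(A + B):                 # check the concatenation AB
                return A + B
    return None

print(klas_search([10, 2, 7, 4]))   # e.g. [2, 10, 7, 4]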
Expanding on @roadrunner66's analysis: the solution is to take the two smallest elements of the original array and make them the first and last elements of the target array; then take the next two smallest elements and make them the second and next-to-last; and keep going until all elements are placed into the target. Notice that which one goes to the left and which to the right doesn't matter.
Sorting the original array facilitates the process (finding the smallest elements becomes trivial), so the time complexity is O(n log n). The space complexity is O(n), because it requires a target array. I don't know offhand whether it is possible to do it in place.
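A sketch of this construction (my code). Since the answer above does not prove the claim, it is worth validating the result with is_diff_sorted from the brute-force answer; note that not every input has a solution (e.g. [1, 4, 6, 7, 20] appears to have none).
def pair_sort(a):
    s = sorted(a)
    result = [None] * len(s)
    lo, hi = 0, len(s) - 1
    # place elements smallest-first alternately at the two ends, moving inwards
    for k, x in enumerate(s):
        if k % 2 == 0:
            result[lo] = x
            lo += 1
        else:
            result[hi] = x
            hi -= 1
    return result

print(pair_sort([10, 2, 7, 4]))                    # [2, 7, 10, 4]
print(is_diff_sorted(pair_sort([10, 2, 7, 4])))    # True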

Checking if two substring overlaps in O(n) time

I have a string S of length n and a list of tuples (a, b), where a specifies the starting position of a substring of S and b is its length. To check whether any substrings overlap, we can, for example, mark each position in S whenever it is touched. However, I think this will take O(n^2) time if the list of tuples has size n (looping over the tuple list, then looping over S).
Is it possible to check whether any substring overlaps another in O(n) time?
Edit:
For example, S = "abcde" and Tuples = [(1,2),(3,3),(4,2)], representing "ab", "cde" and "de". I want to know that an overlap has been discovered when (4,2) is read.
I was thinking it is O(n^2) because for every tuple you have to loop through its substring of S to see whether any character is marked dirty.
Edit 2:
I cannot exit once a collision is detected. Imagine I need to report all subsequent tuples that collide, so I have to loop through the whole tuple list.
Edit 3:
A high-level view of the algorithm:
for each tuple (a, b):
    for (int i = a; i < a + b; i++)   // positions a .. a+b-1
        if S[i] is dirty
            then report tuple and break   // break inner loop only
Your basic approach is correct, but you can optimize the stopping condition in a way that guarantees bounded complexity in the worst case. Think about it this way: how many positions in S would you have to traverse and mark in the worst case?
If there is no collision, then at worst you visit length(S) positions (and run out of tuples by then, since any additional tuple would have to collide). If there is a collision, you can stop at the first marked position, so again you are bounded by the maximum number of unmarked elements, which is length(S).
EDIT: since you added the requirement to report all colliding tuples, let's calculate again (extending my comment):
once all elements are marked, you can detect a collision for every further tuple in a single step (O(1)), so you need O(n + n) = O(n) overall.
Each step either marks an unmarked element (at most n marks overall) or identifies a colliding tuple (at worst O(#tuples), which we assume is also n).
The actual steps may be interleaved, since the tuples may be arranged so that they do not collide at first; but once they do (after at most n tuples, which cover all n elements before colliding for the first time), every subsequent tuple must collide on its first step. Other arrangements may collide earlier, even before all elements are marked, but again, you are just rearranging the same number of steps.
Worst-case example: one tuple covering the entire array, then n-1 tuples (it doesn't matter which):
[(1,n), (n,1), (n-1,1), ...(1,1)]
The first tuple takes n steps to mark all elements; the rest take O(1) each to finish, for O(2n) = O(n) overall. Now convince yourself that the following example takes the same number of steps:
[(1,n/2-1), (1,1), (2,1), (3,1), (n/2,n/2), (4,1), (5,1) ...(n,1)]
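A sketch of this optimized scan (my code; it assumes 1-based starting positions, as in the question's example). Each inner step either marks a fresh position or reports the tuple and stops, so the whole run is O(n + #tuples):
def report_collisions(n, tuples):
    marked = [False] * (n + 1)   # 1-based positions of S
    colliding = []
    for (a, b) in tuples:
        for i in range(a, a + b):
            if marked[i]:
                colliding.append((a, b))   # report and stop scanning this tuple
                break
            marked[i] = True
    return colliding

print(report_collisions(5, [(1, 2), (3, 3), (4, 2)]))   # [(4, 2)]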
According to your description and comments, this overlap problem is not really about string algorithms; it can be regarded as a "segment overlap" problem.
Using your example, it translates to 3 segments: [1, 2], [3, 5], [4, 5]. The question is to check whether any of the 3 segments overlap.
Suppose we have m segments, each in the format [start, end] (the segment's start and end positions). One efficient algorithm to detect overlaps is to sort them by start position in ascending order, which takes O(m * lg m). Then iterate over the sorted segments and, for each segment i, check whether its start position falls at or before the maximum end position seen so far:
if (start[i] <= maxEnd[i-1]) {
    // segment i overlaps an earlier segment
}
maxEnd[i] = max(maxEnd[i-1], end[i]);  // update the max end position over segments 1..i
Each check takes O(1), so the total time complexity is O(m * lg m + m), which is O(m * lg m). Reporting each overlap additionally costs time related to the tuple's length, which is in turn related to n.
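A small Python sketch of this sort-and-sweep (my code; segments are inclusive [start, end] pairs):
def find_overlapping_segments(segments):
    segs = sorted(segments)          # sort by start position, O(m lg m)
    overlapping = []
    max_end = float('-inf')
    for start, end in segs:
        if start <= max_end:         # starts before some earlier segment ends
            overlapping.append((start, end))
        max_end = max(max_end, end)
    return overlapping

# tuples (start, length) from the question, turned into inclusive segments
tuples = [(1, 2), (3, 3), (4, 2)]
print(find_overlapping_segments([(a, a + b - 1) for a, b in tuples]))   # [(4, 5)]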
This is a segment overlap problem, and a solution is possible in O(n) itself if the list of tuples has been sorted in ascending order by the first field. Consider the following approach:
1. Transform the intervals from (start, number of characters) to (start, inclusive_end). The example above becomes: [(1,2),(3,3),(4,2)] ==> [(1,2),(3,5),(4,5)].
2. The tuples are valid if consecutive transformed tuples (a, b) and (c, d) always satisfy b < c; otherwise there is an overlap.
Each of steps 1 and 2 can be done in O(n) if the array is sorted in the form mentioned above.
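Assuming the tuples are already sorted by start, the whole check is a single pass over adjacent pairs (my sketch):
tuples = [(1, 2), (3, 3), (4, 2)]                           # already sorted by start
segs = [(a, a + b - 1) for a, b in tuples]                  # step 1: inclusive ends
ok = all(b < c for (_, b), (c, _) in zip(segs, segs[1:]))   # step 2
print(ok)   # False: (3, 5) and (4, 5) overlap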
