Search for a specific permutation of a permutation subset with constraints - arrays

I am searching for one permutation P, consisting of p1...pn, of the following subset S.
S is defined by the labels L.
L1...Lk, where each L contains pi...pj.
The inverse of P may have at most k-1 decreasing adjacent elements, with k <= n.
Example:
n := 4
k := 2
L1 := 1,2
L2 := 3,4
L := L1,L2,L1,L2
S := 1324,1423,2314,2413
one solution would be P := 1342
P := 3142 would not be a solution, because it has 2 decreasing adjacent elements but at most 1 is allowed, since k = 2.
Is there an algorithm to find P in S as defined by L?
Currently I use brute force to find one permutation P, but it very quickly becomes unusably slow.

So each of L1, ..., Lk is a consecutive set of elements. At each place we see Li, Lj in the definition of L, one of three things is true:
i < j in which case it is ascending.
i = j in which case it could be ascending or descending.
i > j in which case it must be descending.
By counting the number of places where case 3 is true, we get a minimum number of descending elements already in the definition of L.
Next, for each Li we can write down a pattern of len(Li)-1 symbols, each either ";" or ",". A ";" means that there are elements of other Ljs between two members of Li, and a "," means that two Li elements are adjacent, so the order of the elements may result in a descent. We want to know, "For each possible number of descents within Li, how many permutations of Li have that number of descents?"
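As a rough illustration of the two bookkeeping steps so far, the sketch below counts the descents forced by the label sequence and builds the ";" / "," pattern for each label. It assumes L is given as a list of label indices (so L1,L2,L1,L2 becomes [1, 2, 1, 2]); the function names are made up for this example.

```python
def forced_descents(label_sequence):
    """Count adjacent pairs (Li, Lj) with i > j, which must be descents."""
    pairs = zip(label_sequence, label_sequence[1:])
    return sum(1 for i, j in pairs if i > j)


def label_patterns(label_sequence):
    """Map each label to its pattern of ',' (adjacent) and ';' (separated)."""
    last_pos = {}
    patterns = {}
    for pos, label in enumerate(label_sequence):
        if label in last_pos:
            # ',' when the previous member of this label is directly before us
            sep = "," if last_pos[label] == pos - 1 else ";"
            patterns[label] += sep
        else:
            patterns[label] = ""
        last_pos[label] = pos
    return patterns


print(forced_descents([1, 2, 1, 2]))
print(label_patterns([1, 1, 2, 1]))
```

For the example in the question, only the middle transition L2,L1 is forced to be a descent, and both labels get the pattern ";".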
We will think of building the permutations as follows:
The first element goes at position 0.
The second element goes to position 0 or 1. (If at 0, the first element is moved.)
The third element goes to position 0, 1, or 2.
etc
A descent is when the next element is smaller than the previous one, at a transition matching a ",".
We actually will want the following data structure for later use:
cache[Li] gives:
    by how many elements are chosen:
        by the last element chosen:
            by the number of descents we will add:
                how many ways of finishing this permutation
So we can write a recursive function that takes:
The pattern for Li.
How many elements have been chosen.
What index was last chosen.
It then returns a dictionary mapping descents to count of ways to finish the permutation for Li.
Memoize that and we get our desired data structure.
Now we'll repeat the idea. We want:
cache2[i] gives:
    by number of descents to use:
        how many permutations of L[i], L[i+1], ..., L[k] meet it.
Again we can write a recursive function using cache to calculate this, and we can memoize it to get cache2.
And NOW we can reverse the process.
1. We know how many descents came from the definition of L.
2. We know the distribution of remaining descents from cache2[1], so we can randomly pick how many descents there will be meeting our condition among L1...Lk.
3. For each Li in L1...Lk we can look at cache[Li][1][0] and cache2[i+1] to figure out how many descents there will be within Li with the correct probability.
4. For each Li we can look at how many descents we want to wind up with, its pattern, and cache2[Li] to figure out a random sequence of inserts winding up with the right pattern. The first insert is always at 0. After that you always know the size, where the last insert was, and how many descents are left. So for each possible next insert you figure out whether it counts as a descent (look at both the pattern and whether it is before the last insert) and the number of ways to finish from there. Then you can choose the next insert randomly with the right probability.
5. For each Li we can turn the pattern of inserts into the list of values in order. (I will explain this step more.)
6. We can now follow the pattern of L and fill in all of the values.
Now for step 5, let's illustrate with your example from the chat. Suppose that L2 = [4, 5, 6] and the pattern of inserts we came up with was [0, 1, 0]. How do we figure out the arrangement of values?
Well first we do our inserts:
[1]
[1, 2]
[3, 1, 2]
This says that the first element (4) goes to the third place, the second (5) to the first, and the third (6) to the second. So our permutation for L2 is [5, 6, 4].
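The conversion just described can be sketched in a few lines. This is only an illustration of the reading given above (the k-th number in the final slot list is the place the k-th value goes to); apply_inserts and arrange are hypothetical helper names.

```python
def apply_inserts(inserts):
    """Replay the inserts: the k-th step inserts the number k at the given index."""
    slots = []
    for step, position in enumerate(inserts, start=1):
        slots.insert(position, step)
    return slots


def arrange(values, inserts):
    """Send the k-th value to the (1-based) place named by the k-th slot entry."""
    slots = apply_inserts(inserts)
    result = [None] * len(values)
    for value, place in zip(values, slots):
        result[place - 1] = value
    return result


print(apply_inserts([0, 1, 0]))       # [3, 1, 2]
print(arrange([4, 5, 6], [0, 1, 0]))  # [5, 6, 4]
```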
This will be a lot of code to write. But it will be polynomial. Specifically if m is the count of the most common label, cache will have total size at most O(k m^2). Thanks to memoization, each entry takes O(m) to calculate. Everything else is small relative to that. So total space is O(k m^2) and time is O(k m^3).

Related

Given a sorted array of integers find subarrays such that the largest elements of the subarrays are within some distance of the smallest

For example, given an array
a = [1, 2, 3, 7, 8, 9]
and an integer
i = 2. Find maximal subarrays where the distance between the largest and the smallest elements is at most i. The output for the example above would be:
[1,2,3] [7,8,9]
The subarrays are maximal in the following sense: given two subarrays A and B, there exists no element b in B such that A + b satisfies the given condition. Does there exist a non-polynomial time algorithm for this problem?
This problem might be solved in linear time using the two-pointer method and two deques storing indices: the first deque tracks the minimum and the other tracks the maximum in the sliding window.
Deque for minimum (similar for maximum):
    current_minimum = a[minq.front]
    Adding i-th element of array: //at the right index
        while (!minq.empty and a[minq.back] > a[i]):
            //last element has no chance to become a minimum because newer one is better
            minq.pop_back
        minq.push_back(i)
    Extracting j-th element: //at the left index
        if (!minq.empty and minq.front == j)
            minq.pop_front
So the min-deque always holds indices whose values form a non-decreasing sequence.
Now set the left and right indices to 0, insert index 0 into both deques, and start moving right. At every step add the new index to both deques and check that the left..right interval is still good. When the range becomes too wide (the max-min distance is exceeded), stop moving the right index, check the length of the last good interval, and compare it with the best length.
Now move the left index, removing elements from the deques. When max-min becomes good again, stop moving left and continue with right. Repeat until the end of the array.
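A minimal Python sketch of this two-deque sliding window follows. It returns the first longest window whose max-min distance is within the limit, and it assumes the limit is non-negative; the function name and return shape are choices made for this example, not part of the answer.

```python
from collections import deque


def longest_window_within(a, limit):
    """Return the first longest contiguous window with max - min <= limit.

    Assumes limit >= 0, so a single element is always a good window.
    """
    minq, maxq = deque(), deque()   # indices; fronts hold window min / max
    best_start, best_len = 0, 0
    left = 0
    for right, value in enumerate(a):
        # drop indices that can no longer be the minimum / maximum
        while minq and a[minq[-1]] > value:
            minq.pop()
        minq.append(right)
        while maxq and a[maxq[-1]] < value:
            maxq.pop()
        maxq.append(right)
        # shrink from the left until the window is good again
        while a[maxq[0]] - a[minq[0]] > limit:
            if minq[0] == left:
                minq.popleft()
            if maxq[0] == left:
                maxq.popleft()
            left += 1
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return a[best_start:best_start + best_len]


print(longest_window_within([1, 2, 3, 7, 8, 9], 2))  # [1, 2, 3]
```

Every index enters and leaves each deque at most once, which is where the linear running time comes from. The input does not need to be sorted for this to work.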

Array operations for maximum sum

Given an array A consisting of N elements. Our task is to find the maximal subarray sum after applying the following operation exactly once:
. Select any subarray and set all the elements in it to zero.
E.g. if the array is -1 4 -1 2, then the answer is 6, because we can choose the -1 at index 2 as the subarray and set it to 0. The resulting array after applying the operation is -1 4 0 2, and the maximal subarray sum is 4+0+2 = 6.
My approach was to find start and end indexes of minimum sum subarray and make all elements as 0 of that subarray and after that find maximum sum subarray. But this approach is wrong.
Starting simple:
First, let us start with the part of the question: Finding the maximal subarray sum.
This can be done via dynamic programming:
a = [1, 2, 3, -2, 1, -6, 3, 2, -4, 1, 2, 3]
a = [-1, -1, 1, 2, 3, 4, -6, 1, 2, 3, 4]  # second sample; overrides the first

def compute_max_sums(a):
    res = []
    currentSum = 0
    for x in a:
        if currentSum > 0:
            res.append(x + currentSum)
            currentSum += x
        else:
            res.append(x)
            currentSum = x
    return res

res = compute_max_sums(a)
print(res)
print(max(res))
Quick explanation: we iterate through the array. As long as the running sum is positive, it is worth appending the whole block to the next number. If the sum drops to zero or below at any point, we discard the whole "tail" sequence, since it is no longer profitable to keep it, and we start anew. At the end, we have an array where the j-th element is the maximal sum of a subarray i:j where 0 <= i <= j.
Rest is just the question of finding the maximal value in the array.
Back to the original question
Now that we solved the simplified version, it is time to look further. We can now select a subarray to be deleted to increase the maximal sum. The naive solution would be to try every possible subarray and to repeat the steps above. This would unfortunately take too long [1]. Fortunately, there is a way around this: we can think of the zeroes as a bridge between two maxima.
There is one more thing to address though - currently, when we have the j-th element, we only know that the tail is somewhere behind it so if we were to take maximum and 2nd biggest element from the array, it could happen that they would overlap which would be a problem since we would be counting some of the elements more than once.
Overlapping tails
How to mitigate this "overlapping tails" issue?
The solution is to compute everything once more, this time from the end to the start. This gives us two arrays: one where the j-th element has its tail i pointing towards the left end of the array (i.e. i <= j) and the other where the reverse is true. Now, if we take x from the first array and y from the second array, we know that if index(x) < index(y) then their respective subarrays are non-overlapping.
We can now proceed to try every suitable x, y pair; there are O(n^2) of them. Since we don't need any further computation, as we already precomputed the values, this is the final complexity of the algorithm: the preparation cost us only O(n) and thus doesn't impose any additional penalty.
Here be dragons
So far the stuff we did was rather straightforward. This following section is not that complex but there are going to be some moving parts. Time to brush up the max heaps:
Accessing the max is in constant time
Deleting any element is O(log(n)) if we have a reference to that element. (We can't find the element in O(log(n)). However, if we know where it is, we can swap it with the last element of the heap, delete it, and bubble down the swapped element in O(log(n)).)
Adding any element into the heap is O(log(n)) as well.
Building a heap can be done in O(n)
That being said, since we need to go from start to the end, we can build two heaps, one for each of our pre-computed arrays.
We will also need a helper array that will give us quick index -> element-in-heap access to get the delete in log(n).
The first heap will start empty - we are at the start of the array, the second one will start full - we have the whole array ready.
Now we can iterate over whole array. In each step i we:
Compare the max(heap1) + max(heap2) with our current best result to get the current maximum. O(1)
Add the i-th element from the first array into the first heap - O(log(n))
Remove the i-th indexed element from the second heap(this is why we have to keep the references in a helper array) - O(log(n))
The resulting complexity is O(n * log(n)).
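The loop above could be sketched roughly as follows. One deliberate substitution: Python's heapq has no delete-by-reference, so this sketch uses lazy deletion (stale entries are popped when they reach the top) in place of the helper array of heap positions. The function names are made up here, and a quadratic version of the same pairing is included only as a cross-check.

```python
import heapq


def compute_max_sums(a):
    """j-th entry: best sum of a subarray ending exactly at j."""
    res, current = [], 0
    for x in a:
        current = x + current if current > 0 else x
        res.append(current)
    return res


def best_pair_quadratic(a):
    left = compute_max_sums(a)
    right = compute_max_sums(a[::-1])[::-1]
    best = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            best = max(best, left[i] + right[j])
    return best


def best_pair_heaps(a):
    left = compute_max_sums(a)
    right = compute_max_sums(a[::-1])[::-1]
    heap1 = []                                      # negated left[0..i-1]
    heap2 = [(-v, j) for j, v in enumerate(right)]  # negated right values
    heapq.heapify(heap2)
    best = 0
    for i in range(len(a)):
        # lazy deletion: entries with index < i are no longer allowed
        while heap2 and heap2[0][1] < i:
            heapq.heappop(heap2)
        if heap1 and heap2:
            best = max(best, -heap1[0] - heap2[0][0])
        heapq.heappush(heap1, -left[i])
    return best


data = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
print(best_pair_heaps(data), best_pair_quadratic(data))
```

Each index is pushed and popped at most once per heap, so the total work stays within O(n * log(n)).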
Update:
Just a quick illustration of the O(n^2) solution since OP nicely and politely asked. Man oh man, I'm not your bro.
Note 1: Getting the solution won't help you as much as figuring out the solution on your own.
Note 2: The fact that the following code gives the correct answer is not a proof of its correctness. While I'm fairly certain that my solution should work it is definitely worth looking into why it works(if it works) than looking at one example of it working.
# reuses compute_max_sums from above
data = [100, -50, -500, 2, 8, 13, -160, 5, -7, 100]
reverse_data = list(reversed(data))
max_sums = compute_max_sums(data)
rev_max_sums = list(reversed(compute_max_sums(reverse_data)))
print(max_sums)
print(rev_max_sums)
current_max = 0
for i in range(len(max_sums)):
    if i < len(max_sums) - 1:
        for j in range(i + 1, len(rev_max_sums)):
            if max_sums[i] + rev_max_sums[j] > current_max:
                current_max = max_sums[i] + rev_max_sums[j]
print(current_max)
[1] There are n possible beginnings, n possible ends and the complexity of the code we have is O(n), resulting in a complexity of O(n^3). Not the end of the world, however it's not nice either.

The best order to choose elements in the random array to maximize output?

We have an array as input to production.
R = [5, 2, 8, 3, 6, 9]
If ith input is chosen the output is sum of ith element, the max element whose index is less than i and the min element whose index is greater than i.
For example if I take 8, output would be 8+5+3=16.
Selected items cannot be selected again. So, if I select 8 the next array for next selection would look like R = [5, 2, 3, 6, 9]
What is the order to choose all inputs with maximum output in total? If possible, please send dynamic programming solutions.
I'll start the bidding with an O(n * 2^n) solution . . .
There are a number of ambiguities in your description of the problem, that you have declined to address in comments. None of these ambiguities affects the runtime complexity of this solution, but they do affect implementation details of the solution, so the solution is necessarily somewhat of a sketch.
The solution is as follows:
Create an array results of 2^n integers. Each array index i will denote a certain subsequence of the input, and results[i] will be the greatest sum that we can achieve starting with that subsequence.
A convenient way to manage the index-to-subsequence mapping is to represent the first element of the input using the least significant bit (the 1's place), the second element with the 2's place, etc.; so, for example, if our input is [5, 2, 8, 3, 6, 9], then the subsequence 5 2 8 would be represented as array index 000111 (binary) = 7, meaning results[7]. (You can also start with the most significant bit, which is probably more intuitive, but then the implementation of that mapping is a little bit less convenient. Up to you.)
Then proceed in order, from subset #0 (the empty subset) up through subset #(2^n - 1) (the full input), calculating each array-element by seeing how much we get if we select each possible element and add the corresponding previously-stored values. So, for example, to calculate results[7] (for the subsequence 5 2 8), we select the largest of these values:
results[6] plus how much we get if we select the 5
results[5] plus how much we get if we select the 2
results[3] plus how much we get if we select the 8
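To make the index-to-subsequence mapping concrete, here is a small sketch using the least-significant-bit convention described above. It only shows the mapping and which earlier entries results[mask] depends on, not the full DP (whose details depend on the ambiguities mentioned); both function names are invented for the example.

```python
def subsequence_for(mask, values):
    """Elements of values whose bit is set in mask (bit 0 = first element)."""
    return [v for bit, v in enumerate(values) if mask >> bit & 1]


def predecessor_masks(mask):
    """Masks reached by removing one selected element from mask."""
    preds = []
    bit = 0
    while mask >> bit:
        if mask >> bit & 1:
            preds.append(mask & ~(1 << bit))  # clear that one bit
        bit += 1
    return preds


R = [5, 2, 8, 3, 6, 9]
print(subsequence_for(7, R))   # [5, 2, 8]
print(predecessor_masks(7))    # [6, 5, 3]
```

Because every predecessor mask is numerically smaller than the mask itself, filling results in increasing index order guarantees the needed entries are already available.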
Now, it might seem like it should require O(n^2) time to compute any given array-element, since there are n elements in the input that we could potentially select, and seeing how much we get if we do so requires examining all other elements (to find the maximum among prior elements and the minimum among later elements). However, we can actually do it in just O(n) time by first making a pass from right to left to record the minimal value that is later than each element of the input, and then proceeding from left to right to try each possible value. (Two O(n) passes add up to O(n).)
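The two passes could look something like this for a single subsequence. One assumption is made loudly here because the question leaves it open: when there is no earlier (or no later) element, that term is counted as 0. The name selection_outputs is hypothetical.

```python
def selection_outputs(values):
    """Output of selecting each element: value + max before + min after.

    Assumption: a missing max-before or min-after contributes 0.
    """
    n = len(values)
    # right-to-left pass: minimum of the elements strictly after i
    min_after = [0] * n
    running_min = None
    for i in range(n - 1, -1, -1):
        min_after[i] = 0 if running_min is None else running_min
        running_min = values[i] if running_min is None else min(running_min, values[i])
    # left-to-right pass: maximum of the elements strictly before i
    outputs = []
    running_max = None
    for i, value in enumerate(values):
        max_before = 0 if running_max is None else running_max
        outputs.append(value + max_before + min_after[i])
        running_max = value if running_max is None else max(running_max, value)
    return outputs


print(selection_outputs([5, 2, 8, 3, 6, 9]))  # [7, 10, 16, 17, 23, 17]
```

The third entry, 16, matches the 8+5+3 example from the question.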
An important caveat: I suspect that the correct solution only ever involves, at each step, selecting either the rightmost or second-to-rightmost element. If so, then the above solution calculates many, many more values than an algorithm that took that into account. For example, the result at index 111000 (binary) is clearly not relevant in that case. But I can't prove this suspicion, so I present the above O(n * 2^n) solution as the fastest solution whose correctness I'm certain of.
(I'm assuming that the elements are nonnegative absent a suggestion to the contrary.)
Here's an O(n^2)-time algorithm based on ruakh's conjecture that there exists an optimal solution where every selection is from the rightmost two, which I prove below.
The states of the DP are (1) n, the number of elements remaining (2) k, the index of the rightmost element. We have a recurrence
OPT(n, k) = max(max(R(0), ..., R(n - 2)) + R(n - 1) + R(k) + OPT(n - 1, k),
                max(R(0), ..., R(n - 1)) + R(k) + OPT(n - 1, n - 1)),
where the first line is when we take the second rightmost element, and the second line is when we take the rightmost. The empty max is zero. The base cases are
OPT(1, k) = R(k)
for all k.
Proof: the condition of choosing from the two rightmost elements is equivalent to the restriction that the element at index i (counting from zero) can be chosen only when at most i + 2 elements remain. We show by induction that there exists an optimal solution satisfying this condition for all i < j where j is the induction variable.
The base case is trivial, since every optimal solution satisfies the vacuous restriction for j = 0. In the inductive case, assume that there exists an optimal solution satisfying the restriction for all i < j. If j is chosen when there are more than j + 2 elements left, let's consider what happens if we defer that choice until there are exactly j + 2 elements left. None of the elements left of j are chosen in this interval by the inductive hypothesis, so they are irrelevant. Choosing the elements right of j can only be at least as profitable, since including j cannot decrease the max. Meanwhile, the set of elements left of j is the same at both times, and the set of the elements right of j is a subset at the later time as compared to the earlier time, so the min does not decrease. We conclude that this deferral does not affect the profitability of the solution.

Checking if two substring overlaps in O(n) time

I have a string S of length n and a list of tuples (a,b), where a specifies the starting position of a substring of S and b is the length of that substring. To check whether any substrings overlap, we can, for example, mark each position in S whenever it's touched. However, I think this will take O(n^2) time if the list of tuples has a size of n (looping over the tuple list, then looping over S).
Is it possible to check if any substring actually overlaps with the other in O(n) time?
Edit:
For example, S = "abcde". Tuples = [(1,2),(3,3),(4,2)], representing "ab", "cde" and "de". I want to know that an overlap is discovered when (4,2) is read.
I was thinking it is O(n^2) because for every tuple you get, you need to loop through the substring in S to see if any character is marked dirty.
Edit 2:
I cannot exit once a collision is detected. Imagine I need to report all the subsequent tuples that collide, so I have to loop through the whole tuple list.
Edit 3:
A high level view of the algorithm:
for each tuple (a,b)
    for (int i=a; i < a+b; i++)
        if S[i] is dirty
            then report tuple and break //break inner loop only
        else mark S[i] as dirty
Your basic approach is correct, but you could optimize your stopping condition, in a way that guarantees bounded complexity in the worst case. Think about it this way - how many positions in S would you have to traverse and mark in the worst case?
If there is no collision, then at worst you'll visit length(S) positions (and run out of tuples by then, since any additional tuple would have to collide). If there is a collision - you can stop at the first marked object, so again you're bounded by the max number of unmarked elements, which is length(S)
EDIT: since you added a requirement to report all colliding tuples, let's calculate this again (extending my comment) -
Once you marked all elements, you can detect collision for every further tuple with a single step (O(1)), and therefore you would need O(n+n) = O(n).
This time, each step would either mark an unmarked element (overall n in the worst case), or identify a colliding tuple (worst O(tuples) which we assume is also n).
The actual steps may be interleaved, since the tuples may be organized in any way without colliding first, but once they do (after at most n tuples which cover all n elements before colliding for the first time), you have to collide every time on the first step. Other arrangements may collide earlier, even before marking all elements, but again, you're just rearranging the same number of steps.
Worst case example: one tuple covering the entire array, then n-1 tuples (doesn't matter which) -
[(1,n), (n,1), (n-1,1), ...(1,1)]
First tuple would take n steps to mark all elements, the rest would take O(1) each to finish. overall O(2n)=O(n). Now convince yourself that the following example takes the same number of steps -
[(1,n/2-1), (1,1), (2,1), (3,1), (n/2,n/2), (4,1), (5,1) ...(n,1)]
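A small sketch of this mark-and-stop scan, assuming 1-based positions as in the question's example and a hypothetical function name:

```python
def report_collisions(n, tuples):
    """Report tuples that hit an already-marked position of a length-n string.

    Positions are 1-based; each inner step either marks a fresh position
    or reports a collision and stops, so the total work is O(n + len(tuples)).
    """
    dirty = [False] * (n + 1)   # index 0 unused
    colliding = []
    for a, b in tuples:
        for pos in range(a, a + b):
            if dirty[pos]:
                colliding.append((a, b))
                break           # stop at the first marked position
            dirty[pos] = True
    return colliding


print(report_collisions(5, [(1, 2), (3, 3), (4, 2)]))  # [(4, 2)]
```

One thing to keep in mind with this sketch: a colliding tuple stops early, so any of its positions past the first dirty one stay unmarked. That is what keeps the step count bounded, but it also means a later tuple touching only those positions would not be reported.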
According to your description and comment, the overlap problem may not be about string algorithms at all; it can be regarded as a "segment overlap" problem.
Using your example, it can be translated to 3 segments: [1, 2], [3, 5], [4, 5]. The question is whether any of the 3 segments overlap.
Suppose we have m segments, each in the format [start, end], giving the segment's start and end positions. One efficient way to detect overlap is to sort them by start position in ascending order, which takes O(m * lg m). Then iterate over the sorted segments; for each segment i you only need to check:
if (start[i] <= maxEnd[i-1]) {   // maxEnd[i-1] = max(end[j]) for 1 <= j <= i-1
    segment i overlaps an earlier one;
}
maxEnd[i] = max(maxEnd[i-1], end[i]); // update max end position of 1 to i
Each check takes O(1), so the total time complexity is O(m * lg m + m), which can be regarded as O(m * lg m). The cost of producing each output, however, is related to each tuple's length, which is also related to n.
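The sort-then-scan check can be sketched as below, starting from the question's (start, length) tuples. Note that it reports segments in order of start position, not in input order; overlapping_segments is a name made up for this example.

```python
def overlapping_segments(tuples):
    """Return the (start, length) tuples that overlap a segment starting earlier."""
    # convert to (start, inclusive_end) and sort by start position
    segments = sorted((a, a + b - 1, (a, b)) for a, b in tuples)
    reported = []
    max_end = None              # largest end position seen so far
    for start, end, original in segments:
        if max_end is not None and start <= max_end:
            reported.append(original)
        max_end = end if max_end is None else max(max_end, end)
    return reported


print(overlapping_segments([(1, 2), (3, 3), (4, 2)]))  # [(4, 2)]
```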
This is a segment overlap problem, and a solution in O(n) should be possible if the list of tuples has been sorted in ascending order with respect to the first field. Consider the following approach:
1. Transform the intervals from (start, number of characters) to (start, inclusive_end). Hence the above example becomes: [(1,2),(3,3),(4,2)] ==> [(1, 2), (3, 5), (4, 5)]
2. The tuples are valid if transformed consecutive tuples (a, b) and (c, d) always satisfy b < c. Otherwise there is an overlap in the tuples mentioned above.
Each of 1 and 2 can be done in O(n) if the array is sorted in the form mentioned above.

Find the Element Occurring b times in an array of size n*k+b

Description
Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which the frequencies are stored constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing.
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive Step: Find the median element of the current array slice, as in Quick Sort, in linear time. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur to the left of the first occurrence of M. All M elements are next to each other and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but descending into only one part of the division each time. You will recurse O(logN) times, but each time with 1/2, 1/4, 1/8, .. of the original array size, so the total time will still be linear in the array size.
Clarification: Let's say n=20 and k = 10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element, let's say it is 1111. If S<1111 then the index of the leftmost occurrence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3. 6 is a multiple of k=3, so each element there must occur 3 times. So S>=3. Recurse on the right side. And so on.
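Here is a compact sketch of that recursion, written as a loop. It swaps in a random pivot for the linear-time median, so the linear bound holds only in expectation, and it builds new lists instead of partitioning in place; find_b_times is a hypothetical name.

```python
import random


def find_b_times(arr, k):
    """Find the element that does not occur exactly k times (0 < b < k).

    Uses a random pivot rather than a true median, so the running time
    is linear in expectation rather than in the worst case.
    """
    items = list(arr)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        equal_count = len(items) - len(smaller) - len(larger)
        if equal_count != k:
            return pivot            # the pivot itself occurs b times
        # the side containing S has a size that is not a multiple of k
        items = smaller if len(smaller) % k != 0 else larger


print(find_b_times([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3))  # 4
```

The side that contains S always has a size that is not a multiple of k, because 0 < b < k, so the loop never recurses into an empty slice.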
An idea using cyclic groups.
To determine the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store that as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set. Otherwise it is clear.
To determine the whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, this solution is linear in the array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
from functools import reduce
from random import randint, shuffle

def generate_test_data(n, k, b):
    # note: random values may repeat, so this is only a rough test generator
    k_rep = [randint(0, 1000) for _ in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * 10  # one counter per bit; values up to 1000 fit in 10 bits
    for number in data:
        bits = [number >> i & 1 for i in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # rebuild the answer from the most significant bit down
    return reduce(lambda acc, cnt: 2 * acc + (cnt % k > 0), reversed(cnts), 0)

print("Answer: ", solve(generate_test_data(10, 15, 13), 15))
In order to have a constant-height B-tree containing n distinct elements, with height h constant, you need z = n^(1/h) children per node: h = log_z(n), thus h = log(n)/log(z), thus log(z) = log(n)/h, thus z = e^(log(n)/h), thus z = n^(1/h).
Example: with n = 1000000 and h = 10, z = 3.98, that is z = 4.
The time to reach a node in that case is O(h * log(z)). Assuming h and z to be "constant" (since N = n * k, log(z) = log(n^(1/h)) = log((N/k)^(1/h)) is constant when h is chosen properly based on k), you can then say that O(h * log(z)) = O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (set will remove the duplicates):
result = (sum(set(arr))*k - sum(arr)) // (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
