Least cost accumulative path in a sorted array

This question is an extension of a question asked earlier:
Least cost path in a sorted array
Given a sorted array A e.g. {4,9,10,11,19}. The cost for moving from i->j is
abs(A[j] - A[i]) + cost_incurred_till_i. Start from a given element e.g. 10. Find the least cost path without visiting same element twice.
For the given array:
10->9->4->11->19 cost: 1+(1+5)+(1+5+7)+(1+5+7+8) = 41
10->4->9->11->19 cost: 6+(6+5)+(6+5+2)+(6+5+2+8) = 51
10->9->11->4->19 cost: 1+(1+2)+(1+2+7)+(1+2+7+15) = 39
10->11->9->4->19 cost: 1+(1+2)+(1+2+5)+(1+2+5+15) = 35 --one of optimal paths
10->11->19->9->4 cost: 1+(1+8)+(1+8+10)+(1+8+10+5) = 53
10->11->19->4->9 cost: 1+(1+8)+(1+8+15)+(1+8+15+5) = 63
...
I tried to solve this using nearest neighbor approach.
i = start
While (array is not empty)
    ldiff = A[i] - A[i-1]
    rdiff = A[i+1] - A[i]
    (ldiff < rdiff) ? sum += ldiff : sum += rdiff
    remove A[i]
Nearest neighbor works for some cases where there are no equally weighted paths. I have realised that this is a TSP problem. What would be the best approach to solve this problem? Should I use TSP heuristics like Christofides or some other algorithm?

You're close, and you can just modify the nearest neighbor a bit. When the two neighbors are equally distant, check the element past each neighbor, and go in the opposite direction of whichever is closer (to avoid backtracking as much as possible). If those elements are the same distance away, just keep looking ahead until they're not. If you reach an out-of-bounds position before you see a difference, go toward it.
Your example is a good one to see this:
The only branch point we have is deciding whether to visit 9 or 11 in the first step from 10. Looking past them in both directions shows 4 and 19. 4 is closer to 10, so head away from it (to 11).
Obviously this will be quicker with arrays that don't have many sequential evenly-spaced elements. If none of them were evenly spaced, it would be the same as yours, running in n steps.
The worst case is that you'll have to look all the way to both ends at each step, which would visit every element. Since we're running this once for each of the n elements, it comes out to O(n^2). An example would be an array with all evenly spaced elements, starting your search from dead center.

There is an O(n^2) dynamic programming solution. I don't know if it's optimal.
The next choice is always an immediate neighbour from amongst the unvisited nodes, so the visited nodes form a contiguous range. A logical subproblem is to find a partial solution given the range of visited nodes. The optimal solutions to the subproblems only depend on the visited range and the last visited node (which must be one of the endpoints).
Subproblems can be encoded using two indices identifying the visited range, with the order indicating the last visited node. The solution to subproblem (a, b) is the partial solution given that the nodes from min(a,b) to max(a,b) have already been visited and that a was the last visited node. It can be defined recursively as the better of
insert(a, solve(a - dir, b))
insert(a, solve(b + dir, a))
where dir is 1 if b >= a and -1 otherwise.
There are two base cases. Subproblem (0, n-1) has solution {A[0]}, and subproblem (n-1, 0) has solution {A[n-1]}. These correspond to the final choice, which is either the first node or the last node.
The full problem corresponds to subproblem (s, s), where s is the index of the starting element.
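A minimal Python sketch of this range-based dynamic programme, under a simplifying assumption of mine: rather than building the path with insert(), it only computes the total cost. Each move is charged its distance times the number of nodes still unvisited, which is the same accumulated cost counted move by move. The name least_accumulated_cost and the (lo, hi, at_left) state encoding are hypothetical.

```python
from functools import lru_cache


def least_accumulated_cost(a, s):
    """Least accumulated cost of visiting every element of sorted list a,
    starting at index s (assumption: only the cost is returned, not the path)."""
    n = len(a)

    @lru_cache(maxsize=None)
    def best(lo, hi, at_left):
        # Indices lo..hi are already visited; we stand at lo or at hi.
        remaining = n - (hi - lo + 1)
        if remaining == 0:
            return 0
        pos = a[lo] if at_left else a[hi]
        options = []
        if lo > 0:
            # Step to the left neighbour of the visited range.
            options.append((pos - a[lo - 1]) * remaining + best(lo - 1, hi, True))
        if hi < n - 1:
            # Step to the right neighbour of the visited range.
            options.append((a[hi + 1] - pos) * remaining + best(lo, hi + 1, False))
        return min(options)

    return best(s, s, True)


print(least_accumulated_cost([4, 9, 10, 11, 19], 2))  # 35 for the example above
```

There are O(n^2) states with constant work each. For large n the recursion depth may need an iterative rewrite.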

Related

Sum over n-tuples with total sum equal to k

I want to sum over tuples of length n, i.e. I have a vector (m_1,...,m_n) where each m_i is an integer greater than or equal to zero, with the constraint that the sum of all vector elements is equal to k.
What is the most efficient way to implement this?
My naive approach would be to iterate through all combinations with m_i between 0 and k and check if they satisfy the criterion, but this seems inefficient.
For instance, if k=2 and n=2, then
(2,0),(1,1),(0,2) would be the possible values of (m_1,m_2) that I would like to have. Is there a way to generate these numbers efficiently? (I don't necessarily have to store them all in an array, but I want to iterate over all possible combinations.)
If you look at the FXT book/library by J. Arndt, there is on page 342 section 16.3, "Partition into m parts".
It gives an algorithm and a reference to code that generates exactly the m-part partitions of n.
You'll probably need to modify it: it doesn't allow bins with zeros and starts with ones.
And some thoughts on the matter. n is sum, and you have k bins. Start with |n|0|...|0| combination. Define operation "distribute 1" which is take one from the leftmost bin and distribute it into all other bins.
E.g. D1(|n|0|...|0|)=tuple(|n-1|1|...|0|, ..., |n-1|0|...|1|)
Then you apply D1() to the tuple, and get tuple of tuples. And so on and so forth, till first bin is exhausted.
You could think of this as a tree:
root |n|0|...|0|
D1 applied once, k-1 leaves |n-1|1|...|0| ... |n-1|0|...|1|
Next tree level, D1 applied once to previous level, each node getting k-1 children.
The only thing left is how to traverse it: DFS, BFS, or anything else from https://en.wikipedia.org/wiki/Tree_traversal
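As an alternative to the tree description above, here is a small recursive generator in Python that yields each tuple exactly once without filtering. It is a sketch, not the FXT algorithm, and it uses the question's notation (tuples of length n with sum k); the name tuples_with_sum is hypothetical.

```python
def tuples_with_sum(n, k):
    """Yield every length-n tuple of non-negative integers whose sum is k."""
    if n == 0:
        if k == 0:
            yield ()
        return
    if n == 1:
        # The last entry is forced to take whatever is left.
        yield (k,)
        return
    for first in range(k, -1, -1):
        # Fix the first entry, then distribute the rest over n-1 entries.
        for rest in tuples_with_sum(n - 1, k - first):
            yield (first,) + rest


print(list(tuples_with_sum(2, 2)))  # [(2, 0), (1, 1), (0, 2)]
```

Because it is a generator, the tuples can be summed over one at a time without storing them all.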

Interleaving array {a1,a2,....,an,b1,b2,...,bn} to {a1,b1,a2,b2,a3,b3} in O(n) time and O(1) space

I have to interleave a given array of the form
{a1,a2,....,an,b1,b2,...,bn}
as
{a1,b1,a2,b2,a3,b3}
in O(n) time and O(1) space.
Example:
Input - {1,2,3,4,5,6}
Output- {1,4,2,5,3,6}
This is the arrangement of elements by indices:
Initial Index Final Index
0 0
1 2
2 4
3 1
4 3
5 5
By observation after taking some examples, I found that ai (i<n/2) goes from index (i) to index (2i) & bi (i>=n/2) goes from index (i) to index (((i-n/2)*2)+1). You can verify this yourselves. Correct me if I am wrong.
However, I am not able to correctly apply this logic in code.
My pseudo code:
for (i = 0 ; i < n ; i++)
    if(i < n/2)
        swap(arr[i],arr[2*i]);
    else
        swap(arr[i],arr[((i-n/2)*2)+1]);
It's not working.
How can I write an algorithm to solve this problem?
Element bn is in the correct position already, so let's forget about it and only worry about the other N = 2n-1 elements. Notice that N is always odd.
Now the problem can be restated as "move the element at each position i to position 2i % N".
The item at position 0 doesn't move, so let's start at position 1.
If you start at position 1 and move it to position 2%N, you have to remember the item at position 2%N before you replace it. Then the one from position 2%N goes to position 4%N, the one from 4%N goes to 8%N, etc., until you get back to position 1, where you can put the remaining item into the slot you left.
You are guaranteed to return to slot 1, because N is odd and multiplying by 2 mod an odd number is invertible. You are not guaranteed to cover all positions before you get back, though. The whole permutation will break into some number of cycles.
If you can start this process at one element from each cycle, then you will do the whole job. The trouble is figuring out which ones are done and which ones aren't, so you don't cover any cycle twice.
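To make the cycle-following idea concrete, here is a Python sketch that moves each element from position i to position 2i % N, one cycle at a time. It deliberately sidesteps the bookkeeping problem by keeping a done list, so it uses O(n) extra space and is only an illustration of the permutation, not a solution within the O(1) space constraint. The name interleave_by_cycles is hypothetical, and the input is assumed to have even length.

```python
def interleave_by_cycles(arr):
    """Interleave [a1..an, b1..bn] in place by following permutation cycles.

    Assumption: len(arr) is even. Uses an O(n) 'done' list for clarity.
    """
    big_n = len(arr) - 1          # N = 2n - 1; the last element never moves
    done = [False] * max(big_n, 0)
    for start in range(1, big_n):
        if done[start]:
            continue
        carried = arr[start]      # item waiting to be dropped into its new slot
        pos = start
        while True:
            nxt = (2 * pos) % big_n
            # Drop the carried item and pick up the one it displaces.
            arr[nxt], carried = carried, arr[nxt]
            done[nxt] = True
            pos = nxt
            if pos == start:
                break
    return arr


print(interleave_by_cycles([1, 2, 3, 4, 5, 6]))  # [1, 4, 2, 5, 3, 6]
```

With 8 elements (N = 7) the permutation splits into two cycles, 1→2→4→1 and 3→6→5→3, which is why some record of finished cycles is needed.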
I don't think you can do this for arbitrary N in a way that meets your time and space constraints... BUT if N = 2^x - 1 for some x, then this problem is much easier, because each cycle includes exactly the cyclic shifts of some bit pattern. You can generate single representatives for each cycle (called cycle leaders) in constant time per index. (I'll describe the procedure in an appendix at the end.)
Now we have the basis for a recursive algorithm that meets your constraints.
Given [a1...an,b1...bn]:
Find the largest x such that 2^x <= 2n, and let m = 2^(x-1)
Rotate the middle elements to create [a1...am,b1...bm,am+1...an,bm+1...bn]
Interleave the first part of the array in linear time using the above-described procedure, since it will have modulus 2^x - 1
Recurse to interleave the last part of the array.
Since the last part of the array we recurse on is guaranteed to be at most half the size of the original, we have this recurrence for the time complexity:
T(N) = O(N) + T(N/2)
= O(N)
And note that the recursion is a tail call, so you can do this in constant space.
Appendix: Generating cycle leaders for shifts mod 2^x - 1
A simple algorithm for doing this is given in a paper called "An algorithm for generating necklaces of beads in 2 colors" by Fredricksen and Kessler. You can get a PDF here: https://core.ac.uk/download/pdf/82148295.pdf
The implementation is easy. Start with x 0s, and repeatedly:
Set the lowest order 0 bit to 1. Let this be bit y
Copy the lower order bits starting from the top
The result is a cycle leader if x-y divides x
Repeat until you have all x 1s
For example, if x=8 and we're at 10011111, the lowest 0 is bit 5. We switch it to 1 and then copy the remainder from the top to give 10110110. 8-5=3, though, and 3 does not divide 8, so this one is not a cycle leader and we continue to the next.
The algorithm I'm going to propose is probably not O(n).
It's not based on swapping elements but on moving elements, and each move could probably be O(1) if you have a list and not an array.
Given 2N elements, at each iteration i (counting from 0) you take the element at position N + i and move it to position 2*i + 1 (0-based positions)
a1,a2,a3,...,an,b1,b2,b3,...,bn
| |
a1,b1,a2,a3,...,an,b2,b3,...,bn
| |
a1,b1,a2,b2,a3,...,an,b3,...,bn
| |
a1,b1,a2,b2,a3,b3,...,an,...,bn
and so on.
example with N = 4
1,2,3,4,5,6,7,8
1,5,2,3,4,6,7,8
1,5,2,6,3,4,7,8
1,5,2,6,3,7,4,8
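The moving scheme above can be sketched in a few lines of Python. This assumes a Python list, where pop and insert each shift elements, so the whole thing is O(N^2) rather than O(N); the name interleave_by_moving is hypothetical.

```python
def interleave_by_moving(items):
    """Interleave [a1..aN, b1..bN] by moving each b element into place.

    Assumption: len(items) is even. Each pop/insert on a Python list is O(N).
    """
    half = len(items) // 2
    for i in range(half):
        # b(i+1) currently sits at index half + i; its final index is 2*i + 1.
        items.insert(2 * i + 1, items.pop(half + i))
    return items


print(interleave_by_moving([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 5, 2, 6, 3, 7, 4, 8]
```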
One idea which is a little complex is supposing each location has the following value:
1, 3, 5, ..., 2n-1 | 2, 4, 6, ..., 2n
a1,a2, ..., an | b1, b2, ..., bn
Then use in-place merging of two sorted arrays, as explained in this article, in O(n) time and O(1) space. However, we need to manage this indexing during the process.
There is a practical linear time* in-place algorithm described in this question. Pseudocode and C code are included.
It involves swapping the first 1/2 of the items into the correct place, then unscrambling the permutation of the 1/4 of the items that got moved, then repeating for the remaining 1/2 array.
Unscrambling the permutation uses the fact that left items move into the right side with an alternating "add to end, swap oldest" pattern. We can find the i'th index in this permutation with this rule:
For even i, the end was at i/2.
For odd i, the oldest was added to the end at step (i-1)/2
*The number of data moves is definitely O(N). The question asks for the time complexity of the unscramble index calculation. I believe it is no worse than O(lg lg N).

Time Complexity of Insertion and Selection sort When there are only two key values in an array

I have been reviewing Algorithms, 4th Edition by Sedgewick recently, and came across the following problem, which I cannot solve.
The problem goes like this:
2.1.28 Equal keys. Formulate and validate hypotheses about the running time of insertion
sort and selection sort for arrays that contain just two key values, assuming that
the values are equally likely to occur.
Explanation: You have n elements, each can be 0 or 1 (without loss of generality), and for each element x: P(x=0)=P(x=1).
Any help will be welcomed.
Selection sort:
The time complexity remains the same as it is without the two-keys assumption: it is independent of the values in the array and depends only on the number of elements.
Time complexity for selection sort in this case is O(n^2)
However, this is true only for the original algorithm, which scans the entire tail of the array on each outer loop iteration. If you optimize it to find the next "0", then at iteration i, since you have already "cleared" the first i-1 zeros, the mean location of the i'th zero is index 2i. This means each time, the inner loop will need to do 2i-(i-1)=i+1 iterations.
Summing it up gives:
1 + 2 + ... + n = n(n+1)/2
Which is, unfortunately, still in O(n^2).
Another optimization could be to "remember" where you last stopped. This will significantly improve the complexity to O(n), since you don't need to traverse the same element more than once, but that's going to be a different algorithm, not selection sort.
Insertion Sort:
Here, things are more complicated. Note that in the inner loop (taken from wikipedia), the number of operations depends on the values:
while j > 0 and A[j-1] > x
However, recall that in insertion sort, after the ith step, the first i elements are sorted. Since we are assuming P(x=0)=P(x=1), an average of i/2 elements are 0's and i/2 are 1's.
This means, the time complexity on average, for the inner loop is O(i/2).
Summing this up will get you:
1/2 + 2/2 + 3/2 + ... + n/2 = 1/2* (1+2+...+n) = 1/2*n(n+1)/2 = n(n+1)/4
The above is however, still in O(n^2).
The above is not a formal proof, because it implicitly uses E(f(E(x))) = E(f(x)), which is not true, but it can give you guidelines for how to formally build your proof.
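Since the exercise asks to validate the hypotheses, one way to do so empirically is to count comparisons on arrays of 0s and 1s. The following Python sketch uses textbook versions of both sorts; the function names and the choice of counting key comparisons (rather than timing) are assumptions of mine.

```python
def selection_sort_compares(values):
    """Return (sorted copy, number of key comparisons) for selection sort."""
    a = list(values)
    compares = 0
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):   # always scans the whole tail
            compares += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a, compares


def insertion_sort_compares(values):
    """Return (sorted copy, number of key comparisons) for insertion sort."""
    a = list(values)
    compares = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0:
            compares += 1
            if a[j - 1] <= x:            # stops early on equal keys
                break
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a, compares


print(selection_sort_compares([1, 1, 0, 0]))  # ([0, 0, 1, 1], 6)
print(insertion_sort_compares([1, 1, 0, 0]))  # ([0, 0, 1, 1], 6)
print(insertion_sort_compares([0, 0, 1, 1]))  # ([0, 0, 1, 1], 3)
```

Running both on random 0/1 arrays of size n and 2n and comparing the averaged counts gives a doubling-ratio check of the growth rate.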
Well, obviously you only need to search until you find the first 0 when searching for the next smallest. For example, in selection sort, you scan the array looking for the next smallest number to swap into the current position. Since there are only 0s and 1s you can stop the scan when encountering the first 0 (since it is the next smallest number), so there is no need to continue scanning the rest of the array in this cycle. If no 0 is found then the sorting is complete, since the "unsorted" portion is all 1s.
Insertion sort is basically the same. They are both O(N) in this case.

Find the Element Occurring b times in an an array of size n*k+b

Description
Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: Find the median element of the current array slice, as in Quick Sort, in linear time. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur to the left of the first occurrence of M. All M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but descending into only one part of the division at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, ... of the size of the original array, so the total time will still be O(n).
Clarification: Let's say n=20 and k = 10. Then there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element, let's say it is 1111. If S<1111 then the index of the leftmost occurrence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3. 6 is a multiple of k=3, so each element there must occur 3 times. So S>=3. Recurse on the right side. And so on.
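The partition-and-recurse idea can be sketched in Python as follows. This is a simplified variant under stated assumptions, not the exact procedure above: it picks a random pivot instead of the true median (so the linear time is only expected, not worst-case) and copies slices instead of partitioning in place. The name find_b_times is hypothetical.

```python
import random


def find_b_times(arr, k):
    """Return the element occurring b times (0 < b < k); all others occur k times.

    Assumption: random pivots and list copies are used for brevity.
    """
    items = list(arr)
    while True:
        pivot = random.choice(items)
        less = [v for v in items if v < pivot]
        greater = [v for v in items if v > pivot]
        equal_count = len(items) - len(less) - len(greater)
        if equal_count % k != 0:
            # The pivot itself occurs b times, and b is not a multiple of k.
            return pivot
        # The side holding S has a length that is not a multiple of k.
        items = less if len(less) % k != 0 else greater


print(find_b_times([2, 2, 2, 3, 3, 3, 1, 1], 3))  # 1
```

On the full example above, find_b_times([1,2,3,4,5,1,2,3,4,5,1,2,3,5], 3) returns 4, the only element occurring twice.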
An idea using cyclic groups.
To guess i-th bit of answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store this as cnt
If cnt % k is non-zero, then i-th bit of answer is set. Otherwise it is clear.
To guess whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
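from functools import reduce
from random import randint, shuffle

BITS = 10  # test values are at most 1000, which fits in 10 bits

def generate_test_data(n, k, b):
    k_rep = [randint(0, 1000) for _ in range(n)]
    b_rep = [randint(0, 1000)]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep:", k_rep)
    print("b_rep:", b_rep)
    return numbers

def solve(data, k):
    # cnts[i] counts how many numbers have bit i set
    cnts = [0] * BITS
    for number in data:
        for i in range(BITS):
            cnts[i] += number >> i & 1
    # rebuild the answer from the highest bit down; a bit is set iff cnt % k != 0
    return reduce(lambda a, c: 2 * a + (c % k > 0), reversed(cnts), 0)

# k passed to solve() must match the k used to generate the data
print("Answer:", solve(generate_test_data(10, 15, 13), 15))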
In order to have a constant height B-tree containing n distinct elements, with height h constant, you need z=n^(1/h) children per nodes: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
Example, with n=1000000, h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, log(z)=log(n^(1/h))=log((N/k)^(1/h))=ct by properly choosing h based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one use hashing, so it's not a good answer :(
In Python this would be linear time (set will remove the duplicates):
result = (sum(set(arr))*k - sum(arr)) / (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)

Median of Lists

I was asked this question:
You are given two lists of integers, each of which is sorted in ascending order and each of which has length n. All integers in the two lists are different. You wish to find the n-th smallest element of the union of the two lists. (That is, if you concatenated the lists and sorted the resulting list in ascending order, the element which would be at the n-th position.)
My Solution:
Assume that lists are 0-indexed.
O(n) solution:
A straightforward solution is to observe that the arrays are already sorted, so we can merge them and stop after n steps. The first n-1 elements do not need to be copied into a new array, so this solution takes O(n) time and O(1) memory.
O(log^2 n) solution:
The O(log^2 n) solution alternates binary searches on each list. In short, it takes the middle element in the current search interval in the first list (l1[p1]) and searches for it in l2. Since the elements are unique, we will find at most 2 values closest to l1[p1]. Depending on their values relative to l1[p1-1] and l1[p1 + 1] and their indices p21 and p22, we either return the n-th element or recurse: If any of the (at most) 3 indices in l1 can be combined with one of the (at most) 2 indices in l2 so that l1[p1'] and l2[p2'] would be right next to each other in the sorted union of the two lists and p1' + p2' = n or p1' + p2' = n + 1, we return one of the 5 elements. If p1 + p2 > n, we recurse to the left half of the search interval in l1, otherwise we recurse to the right half. This way, for each of the O(log n) possible midpoints in l1 we do an O(log n) binary search in l2. Therefore the running time is O(log^2 n).
O(log n) solution:
Assuming the lists l1 and l2 have constant access time to any of their elements, we
can use a modified version of binary search to get an O(log n) solution. The easiest approach is to search for an index p1 in just one of the lists and calculate the corresponding index p2 in the other list so that p1 + p2 = n at all times. (This assumes the lists are indexed from 1.)
First we check for the special case when all elements of one list are smaller than any element in the other list:
If l1[n] < l2[1] return l1[n].
If l2[n] < l1[1] return l2[n].
If we do not find the n-th smallest element after this step, call findNth(1,n) with the approximate pseudocode:
findNth(start,end)
    p1 = (start + end)/2
    p2 = n-p1
    if l1[p1] < l2[p2]:
        if l1[p1 + 1] > l2[p2]:
            return l2[p2]
        else:
            return findNth(p1+1, end)
    else:
        if l2[p2 + 1] > l1[p1]:
            return l1[p1]
        else:
            return findNth(start,p1-1)
Element l2[p2] is returned when l2[p2] is greater than exactly p1 + p2-1 = n-1 elements (and therefore is the n-th smallest). l1[p1] is returned under the same but symmetric conditions. If l1[p1] < l2[p2] and l1[p1+1] < l2[p2], the rank of l2[p2] is greater than n, so we need to take more elements from l1 and fewer from l2. Therefore we search for p1 in the upper half of the previous search interval. On the other hand, if l2[p2] < l1[p1] and l2[p2 + 1] < l1[p1], the rank of l1[p1] is greater than n. Therefore the real p1 will lie in the bottom half of our current search interval. Since we are halving the size of the problem at each call to findNth and we need to do only constant work to halve the problem size, the recurrence for this algorithm is T(n) = T(n/2) + O(1), which has an O(log n)-time solution.
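For comparison, here is a runnable Python sketch of an O(log n) search. It is not a line-by-line translation of findNth: it is a 0-indexed, iterative variant that binary-searches for the number i of elements taken from l1 (with n - i taken from l2), which avoids the out-of-range indices the pseudocode has to special-case. The name nth_smallest is hypothetical.

```python
def nth_smallest(l1, l2):
    """n-th smallest element of the union of two sorted lists of length n.

    Assumption: all values are distinct and both lists have the same length n >= 1.
    """
    n = len(l1)
    lo, hi = 0, n                 # i = how many of the n smallest come from l1
    while lo < hi:
        i = (lo + hi) // 2
        j = n - i                 # how many come from l2 (at least 1 here)
        if l1[i] < l2[j - 1]:
            lo = i + 1            # l1[i] belongs among the n smallest: take more from l1
        else:
            hi = i
    i, j = lo, n - lo
    candidates = []
    if i > 0:
        candidates.append(l1[i - 1])
    if j > 0:
        candidates.append(l2[j - 1])
    # The n smallest are l1[:i] and l2[:j]; the n-th is the largest of them.
    return max(candidates)


print(nth_smallest([1, 3, 5], [2, 4, 6]))  # 3
```

Each loop iteration halves the interval and does constant work, matching the recurrence T(n) = T(n/2) + O(1).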
The interviewer kept asking me for different approaches to the above problem, and I proposed the three approaches above. Are they correct? Is there a better solution to this question? This question is asked a lot, so please provide some good material about it.
Not sure if you took a look at this: http://www.leetcode.com/2011/01/find-k-th-smallest-element-in-union-of.html
That solves a more generalized version of the problem you are asking about. Logarithmic complexity is definitely possible...
I think this will be the best solution:
->1 2 3 4 5 6 7 8 9
->10 11 12 13 14 15 16 17 18
Take two pointers i and j, each pointing at the start of its array. Increment i if a[i] < b[j];
increment j if a[i] > b[j].
Do this n times.
This is a linear O(n) time, O(1) space solution.
