I'm currently working on a project that involves counting the number of conflicts between two arrays, meaning the differences in the order in which certain numbers are placed in the arrays. Each number occurs only once, and the two arrays are always the same size.
For example:
[1,2,3,4]
[4,3,2,1]
These two arrays have 6 conflicts:
1 comes before 2 in the first array, but 2 comes before 1 in the second, so conflict + 1.
1 comes before 3 in the first array, but 3 comes before 1 in the second, so conflict + 1.
etc.
I've tried several approaches to build an algorithm that computes the count in O(n log n). I've already made one using dynamic programming that is O(N²), but I want an algorithm that computes the value by divide and conquer.
Does anyone have any thoughts on this?
You can also use a self-balancing binary search tree to find the number of conflicts ("inversions").
Let's take an AVL tree for example.
Initialize inversion count = 0.
Iterate from 0 to n-1 and do the following for every arr[i]:
Insertion also updates the result: keep counting the number of greater nodes as the tree is traversed from root to leaf.
When we insert arr[i], the elements arr[0] to arr[i-1] are already in the AVL tree. All we need to do is count how many of them are greater.
To insert into the AVL tree, we traverse the tree from the root to a leaf by comparing every node with arr[i].
When arr[i] is smaller than the current node, we increase the inversion count by 1 plus the number of nodes in the right subtree of the current node. That is exactly the count of greater elements that appear before arr[i], i.e., inversions.
The time complexity of this solution is O(n log n), since an AVL insert takes O(log n) time.
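For illustration, here is a hedged Python sketch of the counting-during-insertion idea. It uses a plain BST whose nodes only track the size of their right subtree; the AVL rebalancing is left out to keep the counting logic visible, so this simplified version degrades to O(n^2) on sorted input (a real implementation would rebalance as described above). In the opening example the first array is already sorted, so the conflict count is just the inversion count of the second array.

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.right_size = 0              # number of nodes in the right subtree

    def insert_and_count(root, key):
        """Insert key; return (new root, count of already-inserted keys greater than key)."""
        if root is None:
            return Node(key), 0
        if key < root.key:
            root.left, greater = insert_and_count(root.left, key)
            # the current node and its whole right subtree are greater than key
            return root, greater + root.right_size + 1
        root.right, greater = insert_and_count(root.right, key)
        root.right_size += 1
        return root, greater

    def count_inversions(arr):
        root, inversions = None, 0
        for x in arr:
            root, greater = insert_and_count(root, x)
            inversions += greater
        return inversions

    print(count_inversions([4, 3, 2, 1]))    # 6, matching the example above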
There are many references for finding the minimum/maximum of all subarrays of size k, but how do you find the nth maximum/minimum in the best possible way?
If we only have to find the min/max of subarrays, we can use the deque solution with linear time complexity. But for the nth min/max, I am not able to find a solution.
Note: n<=k
Example:
arr = {7,1,4,20,11,17,15}
n=2, k=4
output: 4,4,11,15
I believe the data structure you need is a slightly modified binary search tree (BST), where each node also stores the size of its subtree.
Adding an element, removing an element, or finding the nth element in such a BST then all take O(log k). So while sliding the window over your array, you do three O(log k) operations per step; with N elements in the given array, the overall time complexity is therefore O(N log k).
You need a balanced BST (a red-black tree, for example) to maintain this time complexity. If you are coming from online judges like Codeforces or HackerRank, remember that they more often than not provide inputs that will generate degenerate BSTs.
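A hedged Python sketch of that sliding-window idea, with SortedList from the third-party sortedcontainers package standing in for the size-augmented balanced BST (its add, remove and indexed access play the roles of the three O(log k) operations):

    from sortedcontainers import SortedList

    def nth_smallest_in_windows(arr, n, k):
        window = SortedList(arr[:k])
        result = [window[n - 1]]             # n-th smallest of the first window
        for i in range(k, len(arr)):
            window.remove(arr[i - k])        # element sliding out of the window
            window.add(arr[i])               # element sliding into the window
            result.append(window[n - 1])
        return result

    print(nth_smallest_in_windows([7, 1, 4, 20, 11, 17, 15], 2, 4))   # [4, 4, 11, 15]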
I have created a binary search tree in C, and when I test it, the insertion and search operations take different amounts of time to execute. For example, I have two scenarios: inserting random values from 1 to 10000, and inserting sorted values from 1 to 10000. Inserting the random values into my BST takes less time than inserting the sorted values.
The same goes for the search operation: searching among the randomly inserted values takes less time, but searching among the sorted values takes far longer.
Now, the problem is the time complexity. Can anyone explain how this is handled? What is the time complexity for all four cases?
Note: inserting and searching the sorted values take almost the same time, though searching still takes a bit longer!
If you don't balance the tree, its structure depends on the insertion order, and a "fully unbalanced" binary search tree is equivalent to a sorted linked list.
Thus, the worst case time complexity for your operations is linear in the tree's size, not logarithmic as it would be in a balanced tree.
For instance, if you insert 1, 2, 3, ... in increasing order, you'll end up with
1
 \
  2
   \
    3
     \
     ...
where the right "spine" is a linked list.
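For illustration, a hedged Python sketch (not the asker's C code) that builds a plain, unbalanced BST and reports the deepest insertion for sorted vs. shuffled keys; the sorted run is also visibly slower, which is exactly the effect described:

    import random

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        """Iterative insert; returns (root, depth at which the key was placed)."""
        if root is None:
            return Node(key), 0
        node, depth = root, 0
        while True:
            depth += 1
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    return root, depth
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    return root, depth
                node = node.right

    def max_insert_depth(keys):
        root, deepest = None, 0
        for k in keys:
            root, d = insert(root, k)
            deepest = max(deepest, d)
        return deepest

    sorted_keys = list(range(1, 10001))
    random_keys = sorted_keys[:]
    random.shuffle(random_keys)
    print(max_insert_depth(sorted_keys))   # 9999: every insert walks the whole right spine
    print(max_insert_depth(random_keys))   # typically a few dozen, i.e. O(log n) expected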
Use an AVL tree. It keeps the tree balanced, so you always get a search time of O(log n).
I've been thinking a lot about the following problem:
We are given an array of n numbers. We start at the first index and our task is to get to the last index. Every move we can jump one or two steps forward and the number at the index we jump to represents the cost we need to pay for visiting that index. We need to find the cheapest way for getting to the end of the array.
For example, if the array looks like this: [2,1,4,2,5], the cheapest way to get to the end is 10: we visit the indexes 1->2->4->5 and pay 2+1+2+5 = 10, which is the cheapest possible way. Let f(i) be the cheapest price to get to index i. We can calculate this easily in O(n) time with dynamic programming by realizing that f(i) = arr[i] + min(f(i-1), f(i-2)).
But here's the twist:
The array gets updated several times and after every update we need to be able to tell in O(logn) time what is the cheapest way at the moment. Updating the array happens by telling the index which will be changed and the number it will be changed to. For example the update could be arr[2] = 7 changing our example array to [2,7,4,2,5]. Now the cheapest way would be 11.
Now how can we support these updates in O(logn) time? Any ideas?
Here's what I've come up with so far:
First I would create an array f for the dynamic programming as described before. I would store the content of this array in a segment tree s in the following way: s(i) = f(i) - f(i-1). This would allow me to update intervals of f (adding a constant to every value) in O(logn) time and ask for the values at a given index in O(logn) time. This would come in handy, since after some updates it often happens that all the values in f need to be increased by a constant after some given index. So by asking for the value of s(n) in the segment tree after every update we would get the answer we need.
There are however different things that can happen after an update:
Only one value in f needs to be updated. For example, if [2,1,4,2,5,4,9,3,7] gets changed to [2,1,9,2,5,4,9,3,7], only f(3) would need to be updated, since no cheapest way went through the 3rd index anyway.
All the values in f after a given index need to be updated by a constant. This is what the segment tree is good for.
Every other value in f after a given index needs to be updated by a constant.
Something more random.
Alright, I managed to solve the problem all myself so I decided to share the solution with you. :)
I was on the right track with dynamic programming and a segment tree, but I was feeding the segment tree in the wrong way in my previous attempts.
Here's how we can support the updates in O(logn) time:
The idea is to use a binary segment tree where the leaves of the tree represent the current array and every node stores 4 different values.
v1 = The lowest cost to get from the leftmost descendant to the rightmost descendant
v2 = The lowest cost to get from the leftmost descendant to the second rightmost descendant
v3 = The lowest cost to get from the second leftmost descendant to the rightmost descendant
v4 = The lowest cost to get from the second leftmost descendant to the second rightmost descendant
With descendants I mean the descendants of the node that are also leaves.
When updating the array we update the value at the leaf and then all its ancestors up to the root. Since at every node we already know all 4 values of its two children, we can easily calculate the new 4 values for the current parent node. Just to give an example: v1_current_node = min(v2_leftchild+v1_rightchild, v1_leftchild+v1_rightchild, v1_leftchild+v3_rightchild). The other three values can be calculated in a similar way.
Since there are only O(logn) ancestors for every leaf, and all 4 values are calculated in O(1) time it takes only O(logn) time to update the entire tree.
Now that we know the 4 values for every node, we can in a similar way calculate the lowest cost from the first to the n-th node by using the nodes for the highest powers of 2 that add up to n along our path in the tree. For example, if n = 11 we want to know the lowest cost from the first to the eleventh node, and this can be done by using the information of the node that covers leaves 1-8, the node that covers leaves 9-10, and the leaf node 11. For each of those three nodes we know the 4 values described, and we can combine that information in a similar way to figure out the answer. At most we need to consider O(logn) nodes for doing this, so that is not a problem.
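Here is a hedged Python sketch of that tree, just to make the merge rule concrete. best[a][b] plays the role of v1..v4 above (a chooses the leftmost vs. second-leftmost entry leaf, b the rightmost vs. second-rightmost exit leaf, and best[1][1] of a single leaf is 0, i.e. "jump over it"); the class and function names are made up for illustration.

    INF = float("inf")

    def merge(L, R):
        # A jump from the (m - c)-th leaf to the (m + 1 + d)-th leaf has length
        # 1 + c + d, so only combinations with c + d <= 1 are allowed.
        return [[min(L[a][0] + R[0][b],
                     L[a][0] + R[1][b],
                     L[a][1] + R[0][b]) for b in range(2)] for a in range(2)]

    def leaf(value):
        return [[value, INF], [INF, 0]]

    class MinJumpTree:
        def __init__(self, arr):
            self.n = len(arr)
            self.tree = [None] * (4 * self.n)
            self._build(1, 0, self.n - 1, arr)

        def _build(self, node, lo, hi, arr):
            if lo == hi:
                self.tree[node] = leaf(arr[lo])
                return
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid, arr)
            self._build(2 * node + 1, mid + 1, hi, arr)
            self.tree[node] = merge(self.tree[2 * node], self.tree[2 * node + 1])

        def update(self, i, value, node=1, lo=0, hi=None):
            if hi is None:
                hi = self.n - 1
            if lo == hi:
                self.tree[node] = leaf(value)
                return
            mid = (lo + hi) // 2
            if i <= mid:
                self.update(i, value, 2 * node, lo, mid)
            else:
                self.update(i, value, 2 * node + 1, mid + 1, hi)
            self.tree[node] = merge(self.tree[2 * node], self.tree[2 * node + 1])

        def cheapest(self):
            return self.tree[1][0][0]      # enter at the first leaf, leave at the last

    t = MinJumpTree([2, 1, 4, 2, 5])
    print(t.cheapest())                    # 10
    t.update(1, 7)                         # arr[2] = 7 in the 1-based wording above
    print(t.cheapest())                    # 11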
Suppose you have an array of integers (e.g. [1 5 3 4 6]). The elements are rearranged according to the following rule. Every element can hop forward (towards the left) and slide the elements in those indices over which it hopped. The process starts with the element at the second index (i.e. 5). It has a choice to hop over element 1, or it can stay in its own position. If it does choose to hop, element 1 slides down to index 2. Let us assume it does choose to hop; our resultant array will then be [5 1 3 4 6]. Element 3 can now hop over one or two positions and the process repeats. If 3 hops over one position the array will be [5 3 1 4 6], and if it hops over two positions it will be [3 5 1 4 6].
It is very easy to show that every permutation of the elements can be reached in this way. Also, any final configuration can be reached by a unique sequence of hops.
The question is: given a final array and a source array, find the total number of hops required to arrive at the final array from the source. An O(N²) implementation is easy to find, but I believe this can be done in O(N) or O(N log N). If it is not possible to do better than O(N²), it would also be great to know.
For example if the final array is [3,5,1,4,6] and the source array [1,5,3,4,6], the answer will be 3.
My O(N²) algorithm is like this: you loop over the positions of the source array from the end, since we know which element is the last to move. Here it will be 6, and we check its position in the final array. We calculate the number of hops necessary and then rearrange the final array to put that element back in its original position in the source array. The rearranging step goes over all the elements in the array, and the process loops over all the elements, hence O(N²). Using a hashmap or map can help with searching, but the map needs to be updated after every step, which keeps it O(N²).
P.S. I am trying to model correlation between two permutations in a Bayesian way and this is a sub-problem of that. Any ideas on modifying the generative process to make the problem simpler is also helpful.
Edit: I have found my answer. This is exactly what Kendall Tau distance does. There is an easy merge sort based algorithm to find this out in O(NlogN).
Consider the target array as an ordering. A target array [2 5 4 1 3] can be seen as [1 2 3 4 5], just by relabeling. You only have to know the mapping to be able to compare elements in constant time. In this instance, to compare 4 and 5 you check: index[4]=2 > index[5]=1 (in the target array), and so 4 > 5 (meaning: 4 must be to the right of 5 at the end).
So what you really have is just a vanilla sorting problem. The ordering is just different from the usual numerical ordering. The only thing that changes is your comparison function. The rest is basically the same. Sorting can be achieved in O(nlgn), or even O(n) (radix sort). That said, you have some additional constraints: you must sort in-place, and you can only swap two adjacent elements.
A strong and simple candidate would be selection sort, which will do just that in O(n^2) time. On each iteration, you identify the "leftiest" remaining element in the "unplaced" portion and swap it leftward until it lands at the end of the "placed" portion. It can be improved to O(nlgn) with an appropriate data structure (a priority queue for identifying the "leftiest" remaining element in O(lgn) time). Since nlgn is a lower bound for comparison-based sorting, I really don't think you can do better than that.
Edit: So you're not interested in the sequence of swaps at all, only the minimum number of swaps required. This is exactly the number of inversions in the array (adapted to your particular needs: "non natural ordering" comparison function, but it doesn't change the maths). See this answer for a proof of that assertion.
One way to find the number of inversions is to adapt the Merge Sort algorithm. Since you have to actually sort the array to compute it, it turns out to be still O(nlgn) time. For an implementation, see this answer or this (again, remember that you'll have to adapt).
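A hedged Python sketch of that adaptation: relabel each source element by its position in the target array, then count inversions while merge-sorting (the helper names are made up for illustration):

    def sort_and_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_and_count(a[:mid])
        right, inv_right = sort_and_count(a[mid:])
        merged, i, j, inv = [], 0, 0, inv_left + inv_right
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i       # right[j] is inverted with every remaining left element
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv

    def min_hops(source, target):
        index = {v: i for i, v in enumerate(target)}   # the "non natural" ordering
        _, inv = sort_and_count([index[v] for v in source])
        return inv

    print(min_hops([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]))  # 3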
From your answer I assume the number of hops is the total number of swaps of adjacent elements needed to transform the original array into the final array.
I suggest using something like insertion sort, but without the insertion part - the data in the arrays will not be altered.
You can implement the queue t of stalled hoppers as a balanced binary search tree with counters (the number of elements in each subtree).
You can add an element to t, remove an element from t, rebalance t, and find an element's position in t in O(log C) time, where C is the number of elements in t.
A few words on finding the position of an element: it is a binary search that accumulates the counts of the skipped left subtrees (plus 1 for the middle element, if you keep elements on internal nodes).
A few words on balancing/addition/removal: you have to traverse upward from the removed/added element or rotated subtree and update the counters. The overall number of operations still stays at O(log C) for insert+rebalance and remove+rebalance.
Let t be that (balanced-search-tree) queue, p the current index into the original array a, and q the current index into the final array f.
Now we run a single loop starting from the left (say, p=0, q=0):
If a[p] == f[q], then original array element hops over the whole queue. Add t.count to the answer, increment p, increment q.
If a[p] != f[q] and f[q] is not in t, then insert a[p] into t and increment p.
If a[p] != f[q] and f[q] is in t, then add f[q]'s position in queue to answer, remove f[q] from t and increment q.
I like the magic that ensures this process moves p and q to the ends of the arrays at the same time, provided the arrays really are permutations of one another. Nevertheless, you should check p and q for running past the end to detect incorrect data, as we have no faster way to prove the data is correct.
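A hedged Python sketch of this walk, with SortedList from the third-party sortedcontainers package standing in for the counter-augmented balanced tree (t.count becomes len(stalled), and a position in the queue becomes an index lookup); the names are made up for illustration, and the input is assumed to be a valid pair of permutations:

    from sortedcontainers import SortedList

    def count_hops(a, f):
        pos_in_a = {v: i for i, v in enumerate(a)}   # where each value sits in the original array
        stalled = SortedList()                       # original indices of stalled hoppers
        in_queue = set()
        hops, p, q, n = 0, 0, 0, len(a)
        while q < n:
            if p < n and a[p] == f[q]:
                hops += len(stalled)                 # hops over the whole queue
                p += 1; q += 1
            elif f[q] not in in_queue:
                stalled.add(p); in_queue.add(a[p])   # a[p] stalls and waits for its turn
                p += 1
            else:
                i = pos_in_a[f[q]]
                hops += stalled.index(i)             # hops over the earlier-stalled elements
                stalled.remove(i); in_queue.discard(f[q])
                q += 1
        return hops

    print(count_hops([1, 5, 3, 4, 6], [3, 5, 1, 4, 6]))   # 3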
I was reading some practice interview questions and I have a question about this one: assume a list of random integers, each between 1 and 100; compute the sum of the k largest integers. Discuss space and time complexity, and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought is to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure where I could start looking from the bottom right of the tree. I am not sure whether my approach would change if the numbers ranged from 1 to m instead of 1 to 100. Any thoughts on the most efficient approach?
The most efficient way might be to use something like randomized quickselect. It doesn't do the sorting step to completion and instead does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes expected linear time, though the analysis is not very straightforward. m would have little impact on this. Also, you can write the code so that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
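A hedged Python sketch of that approach (the partition loop is a standard Hoare-style quickselect; here the sum is taken at the end rather than during partitioning, and the function name is made up for illustration):

    import random

    def sum_k_largest(arr, k):
        a = list(arr)
        lo, hi = 0, len(a) - 1
        target = len(a) - k                # final position of the smallest of the k largest
        while lo < hi:
            pivot = a[random.randint(lo, hi)]
            i, j = lo, hi
            while i <= j:                  # partition around the pivot value
                while a[i] < pivot: i += 1
                while a[j] > pivot: j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1; j -= 1
            if target <= j:
                hi = j
            elif target >= i:
                lo = i
            else:
                break
        return sum(a[target:])             # everything from target onwards is among the k largest

    print(sum_k_largest([3, 1, 100, 42, 7, 99], 2))   # 199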
The alternative is sorting using something like counting sort which has a linear time guarantee. As you say the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum is quite efficient within the buckets.
Time: O(n + m) (one pass to count, plus a scan over up to m buckets)
Space: O(m)
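A hedged Python sketch of the counting approach, assuming the values lie in 1..m (the function name is made up for illustration):

    def sum_k_largest_counting(arr, k, m=100):
        counts = [0] * (m + 1)
        for x in arr:
            counts[x] += 1
        total = 0
        for v in range(m, 0, -1):          # walk the buckets from the largest value down
            take = min(counts[v], k)
            total += v * take
            k -= take
            if k == 0:
                break
        return total

    print(sum_k_largest_counting([3, 1, 100, 42, 7, 99], 2))   # 199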
I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list truncated to the k largest elements seen so far.
Each step is O(k) in the worst case, when the new element is the largest so far and everything must shift. The average case is much better, though: after a certain number of elements, most new values are simply smaller than the last (smallest) element in the list and can be rejected after an O(log(k)) binary search.
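A hedged Python sketch of that truncated-list idea using the standard bisect module (the list is kept in ascending order, so its first entry is the current cutoff):

    import bisect

    def sum_k_largest_sorted(arr, k):
        top = []                            # ascending, at most k elements
        for x in arr:
            if len(top) < k:
                bisect.insort(top, x)
            elif x > top[0]:                # beats the smallest retained element
                bisect.insort(top, x)
                top.pop(0)                  # truncate back to k elements
        return sum(top)

    print(sum_k_largest_sorted([3, 1, 100, 42, 7, 99], 2))   # 199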
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap is only O(1), since it's a min-heap and retrieval of the minimum element is a constant-time operation. Each insertion step (or non-insertion, in the case of an element that is too small to be inserted) along the list of n elements is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores available, in which case parallel computing is an option, summation should only be done at the end. Computing the sum on the fly adds computation steps without actually reducing your time complexity (you will in fact have more computations to do). You will always have to sum k elements anyway, so why not avoid the extra addition and subtraction steps?
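A hedged Python sketch of the bounded min-heap approach using the standard heapq module, summing only at the end as suggested:

    import heapq

    def sum_k_largest_heap(arr, k):
        heap = []                            # min-heap holding the k largest seen so far
        for x in arr:
            if len(heap) < k:
                heapq.heappush(heap, x)
            elif x > heap[0]:                # O(1) peek at the smallest of the retained k
                heapq.heapreplace(heap, x)   # O(log k) pop-and-push
        return sum(heap)

    print(sum_k_largest_heap([3, 1, 100, 42, 7, 99], 2))   # 199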