Calculate the maximum number of overlapping intervals - avl-tree

Calculate the maximum number of overlapping intervals, with these requirements on the operations:
Insert an interval: O(logN)
Remove an interval: O(logN)
Calculate (the maximum number of overlapping intervals): O(1)
I think this problem can be solved with an AVL tree (suitable for the Insert and Remove operations), but I don't know how to design the AVL tree to satisfy the requirement of the Calculate operation.
Edit: Example: [start, end)
Input: [1,2),[3,4),[1,6),[3,6),[6,7)
Output: 3

You need to use a red-black tree and implement a "point of maximum overlap" method.
The pseudo-code is in this link: Point of Maximum Overlap
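To make the idea concrete, here is a minimal Python sketch of the augmentation behind the point-of-maximum-overlap approach; it is my own illustration, not the linked pseudo-code, and all names are mine. Each interval [s, e) becomes two endpoint events, +1 at s and -1 at e, kept in a BST keyed by (coordinate, delta) so an end sorts before a start at the same point. Every node stores the sum of its subtree and the maximum prefix sum of its subtree's in-order sequence, so Calculate is just a read of the root. For brevity the sketch uses an unbalanced BST; an AVL or red-black tree would recompute the same two fields after each rotation to keep Insert and Remove at O(log N), and Remove would delete the two endpoint events with ordinary BST deletion, calling the same update on the way back up.

```python
class Node:
    def __init__(self, key, delta):
        self.key = key            # (coordinate, delta) so ties order correctly for [s, e)
        self.delta = delta        # +1 for an interval start, -1 for an end
        self.left = None
        self.right = None
        self.total = delta        # sum of deltas in this subtree
        self.max_prefix = delta   # max prefix sum over this subtree's in-order sequence

def _update(node):
    ls = node.left.total if node.left else 0
    rs = node.right.total if node.right else 0
    node.total = ls + node.delta + rs
    best = ls + node.delta                                   # prefix ending at this node
    if node.left:
        best = max(best, node.left.max_prefix)               # prefix inside the left subtree
    if node.right:
        best = max(best, ls + node.delta + node.right.max_prefix)
    node.max_prefix = best

def _insert(node, key, delta):
    if node is None:
        return Node(key, delta)
    if key < node.key:
        node.left = _insert(node.left, key, delta)
    else:
        node.right = _insert(node.right, key, delta)
    _update(node)                 # a balanced tree would also rebalance here
    return node

def insert_interval(root, start, end):
    root = _insert(root, (start, +1), +1)
    root = _insert(root, (end, -1), -1)   # the end event sorts before a start at the same point
    return root

def max_overlap(root):
    return root.max_prefix if root else 0   # O(1)

# The example from the question: [1,2), [3,4), [1,6), [3,6), [6,7)
root = None
for s, e in [(1, 2), (3, 4), (1, 6), (3, 6), (6, 7)]:
    root = insert_interval(root, s, e)
print(max_overlap(root))   # 3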

Related

Why is the size of an AVL tree O(n)?

An AVL tree only takes O(log n) for all of its operations since it's a balanced tree, and its height is O(log n) as well. So how come the size of the AVL tree itself is O(n)? Can someone explain that to me? I know that you have to calculate left subtree + 1 (for the root) + right subtree to get the size of the whole tree. However, the operation to get, for example, the size of the right subtree is log(n), and log n + log n + 1 doesn't equal O(n).
When we talk about time complexity or space complexity, we mean the rate at which the time or space requirements change with respect to the size of the input. E.g., when we say O(1), we mean that regardless of the size of the input, the time (in case of time complexity) or space (in case of space complexity) is constant. So O(1) does not mean 1 second or 1 minute; it just means constant with respect to the input size. If you plot the execution time against different input sizes, you'd get a horizontal line. Similar is the case for O(n) or O(log n).
Now with this understanding, let's talk about the AVL tree. An AVL tree is a balanced binary search tree, therefore the average time complexity to search for a node in the tree is O(log n). Note that to search for a node, you don't visit every single node of the tree (unlike a linked list). If you had to visit every single node, you'd say the time complexity is O(n). In the case of an AVL tree, every time you find a mismatch, you discard one half of the tree and move on to search in the remaining half.
In the worst case you'd make one comparison at each level of the tree, i.e. as many comparisons as the height of the tree, so the search time complexity is O(log n). The size of a subtree is not O(log n).
Talking about size, you do need space to store each node. If you have to store 1 node, you'd need 1 unit of space; for 2 nodes, 2 units; for 3 nodes, 3 units; and so on. This unit could be anything: 10 bytes, 1 KB, 5 KB. The point is, if you plot the space the tree takes up in computer memory against the number of nodes, all you get is a linear graph starting at zero. That's O(n).
To further clarify: while computing the time or space complexity of an algorithm, if the complexity comes out as O(1 + log n + 4n + 2^n + 100), we call it O(2^n), i.e. we take the dominant term, because we are not calculating an absolute value, we are calculating the rate of change with respect to the size of the input, and thus the fastest-growing term is what matters.
If you talk about the time complexity of the algorithm to calculate the size of the tree, you need to visit every node in the tree. Since the total number of nodes is n, it will be O(n).
To calculate the size of a tree you have to traverse each node present in the tree once. Hence if there are n nodes in the tree, traversing each node once leads to a time complexity of O(n).
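For illustration, here is a tiny sketch of the size computation both answers describe (the Node class is just a stand-in): every node is visited exactly once, so the running time is O(n) even though the tree's height is only O(log n).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def size(node):
    if node is None:
        return 0
    return size(node.left) + 1 + size(node.right)   # left subtree + root + right subtree

# A three-node tree: size is 3, found by visiting all three nodes once.
root = Node(2, Node(1), Node(3))
print(size(root))   # 3
```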

Sort a set of disks in minimum number of moves

So this is more of an algorithm/approach-seeking question, where I'm looking for any thoughts or insights on how I can approach this problem. I was browsing through a set of programming problems and came across one where I'm required to provide the minimum number of moves needed to sort a list of items. Although the problem is marked as 'Easy', I can't find a good solution for it. Your thoughts are welcome.
The problem statement is something like this.
X has N disks of equal radius. Every disk has a distinct number out of 1 to N associated with it. Disks are placed one over another in a single pile in a random order. X wants to sort this pile of disks in increasing order, top to bottom. But he has a very special method of doing this: in a single step he can only choose one disk out of the pile, and he can only put it at the top. And X wants to sort his pile of disks in the minimum possible number of steps. Can you find the minimum number of moves required to sort this pile of randomly ordered disks?
The easy way to solve it, without worrying about the minimum number of moves, would be:
Take the disk with the maximum value and put it on top. Then take the second max and put it on top, and so on until all are sorted. This greedy approach will not always give you the minimum number of steps.
Consider this example (piles are written bottom-to-top): [5,4,1,2,3]. With the above greedy approach it goes like this:
[5,4,1,2,3]
[4,1,2,3,5]
[1,2,3,5,4]
[1,2,5,4,3]
[1,5,4,3,2]
[5,4,3,2,1]
Which takes 5 moves, but the min moves should be this:
[5,4,1,2,3]
[5,4,1,3,2]
[5,4,3,2,1]
Which takes only 2 moves.
To get the minimum number of moves, first count how many of the values N, N-1, N-2, ... already appear in that descending order when reading the pile from the bottom up (they don't need to be adjacent); those disks never need to move. Every other disk has to be moved to the top exactly once, and that count is the minimum. (A short sketch of this computation follows the move listing below.) For example:
[1,5,2,3,10,4,9,6,8,7]
Here, starting from 10, there are 4 values in total that already appear in descending order, [10,9,8,7]; the rest need to be moved. So the minimum number of moves is 10 - 4 = 6:
[1,5,2,3,10,4,9,6,8,7]
[1,5,2,3,10,4,9,8,7,6]
[1,2,3,10,4,9,8,7,6,5]
[1,2,3,10,9,8,7,6,5,4]
[1,2,10,9,8,7,6,5,4,3]
[1,10,9,8,7,6,5,4,3,2]
[10,9,8,7,6,5,4,3,2,1]
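Here is the short sketch promised above, assuming piles are listed bottom-to-top as in the examples (the function name is my own choice, not from the problem statement).

```python
def min_moves_to_sort(pile):
    n = len(pile)
    want = n                  # next value expected in the descending chain N, N-1, ...
    for disk in pile:         # scan the pile bottom-to-top
        if disk == want:
            want -= 1         # this disk is already in place relative to the chain
    return want               # disks 1..want still have to be moved to the top

print(min_moves_to_sort([5, 4, 1, 2, 3]))                   # 2
print(min_moves_to_sort([1, 5, 2, 3, 10, 4, 9, 6, 8, 7]))   # 6
```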

Sorting algorithm vs. Simple iterations

I'm just getting started in algorithms and sorting, so bear with me...
Let's say I have an array of 50000 integers.
I need to select the smallest 30000 of them.
I thought of two methods:
1. Repeatedly iterate over the entire array, each time finding the next smallest integer.
2. First sort the entire array, and then simply select the first 30000.
Can anyone tell me what's the difference, which method would be faster, and why?
What if the array was smaller or bigger? Would the answer change?
Option 1 sounds like the naive solution. It would involve passing through the array to find the smallest item 30000 times. Each time it finds the smallest, presumably it would swap that item to the beginning or end of the array. In basic terms, this is O(n^2) complexity.
The actual number of operations involved would be less than n^2 because n reduces every time. So you would have roughly 50000 + 49999 + 49998 + ... + 20001, which amounts to just over 1 billion (1000 million) iterations.
Option 2 would employ an algorithm like quicksort or similar, which is commonly O(n.logn).
Here it's harder to provide actual figures, because some efficient sorting algorithms can have a worst-case of O(n^2). But let's say you use a well-behaved one that is guaranteed to be O(n.logn). This would amount to 50000 * 15.61 which is about 780 thousand.
So it's clear that Option 2 wins in this case.
What if the array was smaller or bigger? Would the answer change?
Unless the array became trivially small, the answer would still be Option 2. And the larger your array becomes, the more beneficial Option 2 becomes. This is the nature of time complexity. O(n^2) grows much faster than O(n.logn).
A better question to ask is "what if I want fewer smallest values, and when does Option 1 become preferable?". Although the answer is slightly more complex because of numerous factors (such as what constitutes "one operation" in Option 1 vs Option 2, plus other issues like memory access patterns etc), you can get the simple answer directly from time complexity: Option 1 becomes preferable when the number of smallest values to select drops below log n. In the case of a 50000-element array, that means if you want to select roughly 15 or fewer smallest elements, then Option 1 wins.
Now, consider an Option 3, where you transform the array into a min-heap. Building a heap is O(n), and removing one item from it is O(logn). You are going to remove 30000 items. So you have the cost of building plus the cost of removal: 50000 + 30000 * 15.6 = approximately 520 thousand. And this is ignoring the fact that n gets smaller every time you remove an element. It's still O(n.logn), like Option 2 but it is probably faster: you've saved time by not bothering to sort the elements you don't care about.
I should mention that in all three cases, the result would be the smallest 30000 values in sorted order. There may be other solutions that would give you these values in no particular order.
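As a rough illustration of Option 3 (my own sketch, not the answer's code), here is what it looks like with Python's heapq module: heapify() builds a min-heap in O(n), and each heappop() costs O(log n), so popping 30,000 items matches the estimate above and yields them in sorted order.

```python
import heapq
import random

a = [random.randrange(1_000_000) for _ in range(50_000)]   # stand-in for the 50000 integers

heap = list(a)
heapq.heapify(heap)                                        # O(n) heap construction
smallest = [heapq.heappop(heap) for _ in range(30_000)]    # 30000 pops, each O(log n), in sorted order
```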
30k is close to 50k. Just sort the array and get the smallest 30k, e.g., in Python: sorted(a)[:30000]. It is an O(n * log n) operation.
If you needed to find the 100 smallest items instead (100 << 50k), then a heap might be more suitable, e.g., in Python: heapq.nsmallest(100, a). It is O(n * log k).
If the range of integers is limited, you could consider O(n) sorting methods such as counting sort and radix sort.
The simple iterative method is O(n**2) (quadratic) here. Even for a moderate n around a million, that leads to ~10**12 operations, which is much worse than the ~10**6 for a linear algorithm.
For nearly all practical purposes, sorting and taking the first 30,000 is likely to be best. In most languages, this is one or two lines of code. Hard to get wrong.
If you have a truly demanding application or are just out to fiddle, you can use a selection algorithm to find the 30,000th largest number. Then one more pass through the array will find 29,999 that are no bigger.
There are several well known selection algorithms that require only O(n) comparisons and some that are sub-linear for data with specific properties.
The fastest in practice is QuickSelect, which, as its name implies, works roughly like a partial QuickSort. Unfortunately, if the data happens to be very badly ordered, QuickSelect can require O(n^2) time (just as QuickSort can). There are various tricks for selecting pivots that make it virtually impossible to hit the worst-case run time.
QuickSelect will finish with the array reordered so the smallest 30,000 elements are in the first part (unsorted) followed by the rest.
Because standard selection algorithms are comparison-based, they'll work on any kind of comparable data, not just integers.
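To illustrate the select-then-take idea (this is my own sketch, not the answer's code), here is a quickselect that partially reorders the array in place so that the 30,000 smallest values end up, unsorted, at the front. A random pivot keeps the O(n^2) worst case unlikely, as the answer notes.

```python
import random

def quickselect(a, k):
    """Reorder a in place so a[k] holds the k-th smallest value (0-based); return it."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]    # random pivot to avoid pathological inputs
        i, j = lo, hi
        while i <= j:                        # Hoare-style partition around the pivot value
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if k <= j:                           # k-th smallest lies in the left part
            hi = j
        elif k >= i:                         # k-th smallest lies in the right part
            lo = i
        else:                                # k landed between the parts: done
            return a[k]
    return a[k]

data = [random.randrange(1_000_000) for _ in range(50_000)]
threshold = quickselect(data, 29_999)        # value of the 30,000th smallest element
smallest = data[:30_000]                     # the 30,000 smallest values, in no particular order
```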
You can do this in potentially O(N) time with radix sort or counting sort, given that your input is integers.
Another method is to get the 30000th smallest integer with quickselect and then simply iterate through the original array. This has expected Θ(N) time complexity, though quickselect's worst case is O(N^2).

Minimum Height AVL-Tree

I was just reading this (http://condor.depaul.edu/ntomuro/courses/417/notes/lecture1.html) paper which proves the minimum number of nodes in an AVL-Tree.
Yet, I do not understand the meaning of the result, since O(log n) is not referring to the number of nodes at all. How can this be a proof?
I do however understand the first steps and how the iterations are simplified.
But after the 4th step I am failing to understand what exactly he is doing (even though I can vaguely imagine).
Could anybody please explain to me, what the last few lines are proving and how he is simplifying expressions at the end of part 1?
Thanks
O(log n) does refer to nodes: "n" represents the number of nodes. You can think about it intuitively by realizing that the number of nodes that fit in a tree roughly doubles with each additional level. An AVL tree doesn't require every level to be completely full, but its balance condition (the heights of the two subtrees of any node differ by at most one) still forces the minimum possible number of nodes to grow exponentially with the height. In the extreme case of a perfectly full tree the relationship is nodes = 2^height - 1; solving for the height gives height = log2(nodes + 1), i.e. O(log n). The linked notes prove the analogous bound for the sparsest possible AVL tree.
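As a quick numerical check of the kind of bound the linked notes derive (this snippet is only an illustration, not the notes' proof), the minimum number of nodes N(h) in an AVL tree of height h satisfies the standard recurrence N(h) = 1 + N(h-1) + N(h-2), which grows exponentially, so the height is O(log n) in the number of nodes n.

```python
def min_nodes(h):
    """Minimum number of nodes in an AVL tree of height h (height 0 = single node)."""
    if h == 0:
        return 1
    if h == 1:
        return 2
    # One root, plus the sparsest allowed subtrees of heights h-1 and h-2.
    return 1 + min_nodes(h - 1) + min_nodes(h - 2)

for h in range(8):
    print(h, min_nodes(h))   # 1, 2, 4, 7, 12, 20, 33, 54 -- roughly doubling per level
```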

A* search algorithm heuristic function

I am trying to find the optimal solution to a Sliding Block Puzzle of any length using the A* algorithm.
The Sliding Block Puzzle is a game with white (W) and black (B) tiles arranged on a linear game board with a single empty space (-). Given the initial state of the board, the aim of the game is to arrange the tiles into a target pattern.
For example my current state on the board is BBW-WWB and I have to achieve BBB-WWW state.
Tiles can move in these ways :
1. slide into an adjacent empty space with a cost of 1.
2. hop over another tile into the empty space with a cost of 1.
3. hop over 2 tiles into the empty space with a cost of 2.
I have everything implemented, but I am not sure about the heuristic function. It computes, for each misplaced tile in the current state, the shortest distance (minimal cost) to the closest position holding the same color in the goal state.
Considering the given problem for the current state BWB-W and goal state BB-WW the heuristic function gives me a result of 3. (according to minimal distance: B=0 + W=2 + B=1 + W=0). But the actual cost of reaching the goal is not 3 (moving the misplaced W => cost 1 then the misplaced B => cost 1) but 2.
My question is: should I compute the minimal distance this way and not care about the overestimation, or should I divide it by 2? According to the ways tiles can move, one tile can, for the same cost, cover twice the distance (see moves 1 and 2).
I tried both versions. While the divided distance gives a better final path cost to the achieved goal, it visits more nodes, so it takes more time than the undivided one. What is the proper way to compute it? Which one should I use?
It is not obvious to me what an admissible heuristic function for this problem looks like, so I won't commit to saying, "Use the divided-by-two function." But I will tell you that the naive function you came up with is not admissible, and therefore is not guaranteed to give you an optimal answer. In order for A* to work properly, the heuristic used must be admissible; in order to be admissible, the heuristic must absolutely always give an optimistic estimate. This one doesn't, for exactly the reason you highlight in your example.
(Although now that I think about it, dividing by two does seem like a reasonable way to force admissibility. I'm just not going to commit to it.)
Your heuristic is not admissible, so your A* is not guaranteed to find the optimal answer every time. An admissible heuristic must never overestimate the cost.
A better heuristic than dividing your heuristic cost by 3 would be: instead of adding the distance D of each letter to its final position, add ceil(D/2). This way a letter 1 or 2 away contributes 1, a letter 3 or 4 away contributes 2, and so on.
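Here is a hedged sketch of that heuristic, using the questioner's "distance to the closest same-color goal position" definition (the function name and setup are my own). Each tile contributes ceil(D/2), since one unit of cost can shift a tile by at most two positions.

```python
from math import ceil

def heuristic(state, goal):
    total = 0
    for i, tile in enumerate(state):
        if tile == '-':
            continue
        # Distance to the nearest position holding the same color in the goal state.
        d = min(abs(i - j) for j, g in enumerate(goal) if g == tile)
        total += ceil(d / 2)
    return total

print(heuristic("BWB-W", "BB-WW"))   # 2 for the example above; the true cost is also 2
```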
