Why is the size of an AVL tree O(n)? - avl-tree

An AVL tree takes O(log n) for all of its operations since it's a balanced tree, and its height is O(log n) as well. So how come the size of the AVL tree itself is O(n)? Can someone explain that to me? I know that you have to calculate size(left subtree) + 1 (for the root) + size(right subtree) to get the size of the whole tree. However, the operation to get, for example, the size of the right subtree is O(log n), and O(log n) + O(log n) + 1 doesn't equal O(n).

When we talk about time complexity or space complexity, we mean the rate at which the time or space requirement changes with respect to the size of the input. E.g., when we say O(1), we mean that regardless of the input size, the time (for time complexity) or space (for space complexity) is constant. So O(1) does not mean 1 second or 1 minute; it just means constant with respect to input size. If you plot the execution time against different input sizes, you get a horizontal line. The same goes for O(n) or O(log n).
Now with this understanding, let's talk about the AVL tree. An AVL tree is a balanced binary search tree, so the average time complexity to search for a node is O(log n). Note that to search for a node you don't visit every single node of the tree (unlike in a linked list). If you had to visit every single node, the time complexity would be O(n). In an AVL tree, every time you find a mismatch you discard one half of the tree and continue searching in the remaining half.
In the worst case you make one comparison at each level of the tree, i.e. as many comparisons as the height of the tree, so the search time complexity is O(log n). The size of the left subtree, however, is not O(log n).
Talking about size, you need space to store each node. If you have to store 1 node, you need 1 unit of space; for 2 nodes, 2 units; for 3 nodes, 3 units; and so on. This unit could be anything: 10 bytes, 1 KB, 5 KB. The point is that if you plot the space the tree occupies in computer memory against the number of nodes, you get a linear graph starting at zero. That's O(n).
To further clarify: while computing the time or space complexity of an algorithm, if the complexity comes out as O(1 + log n + 4n + 2^n + 100), we call it O(2^n), i.e. we keep only the fastest-growing term. We are not calculating an absolute value but the rate of change with respect to the input size, so the dominant term is all that matters.
If you're asking about the time complexity of the algorithm that calculates the size of the tree: you need to visit every node in the tree, and since the total number of nodes is n, it is O(n).

To calculate the size of a tree you have to traverse each node in the tree exactly once. Hence, if there are n nodes in the tree, visiting each node once leads to a time complexity of O(n).
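To make this concrete, here is a minimal sketch in Python (the Node class is hypothetical, just a plain binary tree node): the recursion visits every node exactly once, so the running time is O(n) even though the height of the tree is only O(log n).

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

    def size(root):
        # size(left subtree) + 1 (for the root) + size(right subtree):
        # every node is visited exactly once, hence O(n) time.
        if root is None:
            return 0
        return size(root.left) + 1 + size(root.right)

If you need size queries in O(1), the usual trick is to store a size field in each node and keep it up to date on insert and delete, at O(1) extra work per operation.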

Related

Calculate the maximum number of overlapping intervals

Calculate the maximum number of overlapping intervals, subject to the following requirements on the operations:
Insert an interval: O(log N)
Remove an interval: O(log N)
Calculate (the maximum number of overlapping intervals): O(1)
I think this problem can be solved using an AVL tree (suitable for the Insert and Remove operations), but I don't know how to design the AVL tree to satisfy the requirement of the Calculate operation.
Edit: Example: [start, end)
Input: [1,2),[3,4),[1,6),[3,6),[6,7)
Output: 3
You need to use a Red-Black tree and implement a Point of Maximum Overlap method.
The pseudo-code is in this link: Point of Maximum Overlap
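To give a flavour of how the Calculate requirement is met, here is a sketch of the augmentation described in that problem (it appears as CLRS problem 14-1). Only the per-node bookkeeping is shown; the actual rebalancing (red-black or AVL) is omitted, and EventNode and pull are illustrative names. Each interval [s, e) is stored as two events, +1 at s and -1 at e (with ends ordered before starts at equal coordinates, since the intervals are half-open); each node then maintains the sum and the maximum prefix sum of its subtree's in-order event sequence.

    import math

    class EventNode:
        def __init__(self, coord, val):
            self.coord = coord       # interval endpoint
            self.val = val           # +1 for a start, -1 for an end
            self.left = None
            self.right = None
            self.sum = val           # sum of vals over this subtree
            self.maxprefix = val     # max prefix sum of the subtree's in-order vals

    def pull(node):
        # Recompute the augmented fields from the children in O(1);
        # called bottom-up along the insert/remove path and after each
        # rotation, so Insert and Remove stay O(log N).
        lsum = node.left.sum if node.left else 0
        lmax = node.left.maxprefix if node.left else -math.inf
        rsum = node.right.sum if node.right else 0
        rmax = node.right.maxprefix if node.right else -math.inf
        node.sum = lsum + node.val + rsum
        node.maxprefix = max(lmax,                    # best prefix ends inside the left subtree
                             lsum + node.val,         # ...or exactly at this node
                             lsum + node.val + rmax)  # ...or inside the right subtree

    # Calculate() is then just a read of root.maxprefix, which is O(1).
    # For the example input, the in-order event values are
    # +1 +1 -1 +1 +1 -1 -1 -1 +1 -1 and the maximum prefix sum is 3.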

Understanding the time complexities of the R-Tree?

A quick search on Wikipedia reveals that the R-Tree's worst-case search performance is undefined and the average case is O(log_M n).
I suppose the worst case is this way because we can't know how many times a search has to be performed in this structure until we find the item, indeed, Guttman does say that "more than one subtree under a node visited may need to be searched, hence it is not possible to guarantee good worst-case performance." Can we express the worst case in terms of the number of searches that have to be performed?
Regarding the average case, I do not understand how this is calculated. And what about the best case?
I'd say the worst case is O(n + log_M n): imagine you store lots of overlapping rectangles in the R-Tree, then store a single small rectangle located in the area where all the other rectangles overlap. A query for that rectangle will have to traverse all subtrees: O(log_M n) nodes and O(n) entries.
The best case is a single root-to-leaf path: an R-Tree has the same depth in every branch, and data is stored only in leaf nodes, so you always visit O(log_M n) nodes and scan up to M entries in each of them, giving O(M * log_M n).
I'm not sure you can really prove an average of O(log_M n). But if you have some average, normally distributed data (whatever that means) with few overlaps (whatever "few" means), then your average query (whatever "average" is) should not have to traverse more than a few (1 or 2?) subtrees. I'd actually say the average is O(M * log_M n), because of the traversal of up to M entries in each node.

BST of minimal height

I'm trying to solve the following problem: "Given a sorted (increasing order) array with unique integer elements, write an algorithm to create a BST with minimal height."
The given answer takes the root node to be the middle of the array. While doing this makes sense to me intuitively, I'm trying to prove, rigorously, that it's always best to make the root node the middle of the array.
The justification given in the book is: "To create a tree of minimal height, we need to match the number of nodes in the left subtree to the number of nodes in the right subtree as much as possible. This means that we want the root node to be the middle of the array, since this would mean that half the elements would be less than the root and half would be greater."
I'd like to ask:
Why would any tree of minimal height be one where the number of nodes in the left subtree is as close as possible to the number of nodes in the right subtree? (Or, do you have any other way to prove that it's best to make the root node the middle of the array?)
Is a tree with minimal height the same as a tree that's balanced? From a previous question on SO, that's the impression I got, (Visualizing a balanced tree) but I'm confused because the book specifically states "BST with minimal height" and never "balanced BST".
Thanks.
Source: Cracking the Coding Interview
The way I like to think about it: if you balance a tree using tree rotations (zig-zig and zig-zag rotations), you will eventually reach a state in which the left and right subtrees differ in height by at most one. It is not always the case that a balanced tree must have the same number of children on the right and the left; however, if you do have that invariant (the same number of children on each side), you can reach a balanced tree using tree rotations.
Balance is defined somewhat arbitrarily. AVL trees define it so that no node has children whose subtree heights differ by more than one. Other trees define balance in different ways, so the two notions are not the same; they are related but not identical. That being said, a tree of minimal height will always be balanced under any reasonable definition, since balancing exists to maintain the O(log n) lookup time of the BST.
If I missed anything or said anything wrong, feel free to edit/correct me.
Hope this helps
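For reference, the single rotations mentioned above are only a few lines each. This is a sketch with a hypothetical Node class that has left and right pointers; real AVL or red-black code would also update heights or colors here.

    def rotate_right(y):
        # Lift y's left child x above y. The in-order sequence of keys
        # is unchanged; only the shape (and heights) change.
        x = y.left
        y.left = x.right
        x.right = y
        return x  # x is the new root of this subtree

    def rotate_left(x):
        # Mirror image of rotate_right.
        y = x.right
        x.right = y.left
        y.left = x
        return y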
Why would any tree of minimal height be one where the number of nodes in the left subtree is as close as possible to the number of nodes in the right subtree?
There are scenarios where a minimal-height tree (which is of course balanced) has different node counts on the left and right sides. A BST's worst-case search is O(n) when it degenerates into a sorted chain, whereas in a minimal-height tree the worst case is O(log n).
          *
         / \
        *   *
       /
      *
Here you can clearly see that the left and right node counts are not equal, though this is a minimal-height tree.
Is a tree with minimal height the same as a tree that's balanced? From a previous question on SO, that's the impression I got, (Visualizing a balanced tree) but I'm confused because the book specifically states "BST with minimal height" and never "balanced BST".
A minimal-height tree is balanced. For more details you can look at AVL trees, which are also known as height-balanced trees. To make a BST height-balanced you perform rotations (LL, RR, LR, RL).
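To tie both answers together, here is a sketch of the middle-element construction in Python. The Node class is hypothetical, and the list slicing is kept for clarity rather than efficiency (passing index bounds instead would make the construction O(n)).

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

    def build_min_height_bst(sorted_values):
        # Make the middle element the root so the two subtrees
        # differ in size by at most one node; recurse on each half.
        if not sorted_values:
            return None
        mid = len(sorted_values) // 2
        return Node(sorted_values[mid],
                    build_min_height_bst(sorted_values[:mid]),
                    build_min_height_bst(sorted_values[mid + 1:]))

    root = build_min_height_bst([1, 2, 3, 4, 5, 6, 7])  # height 2, perfectly full

The resulting height is ceil(log2(n + 1)) - 1, which is the minimum possible for n nodes.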

Minimum Height AVL Tree

I was just reading this (http://condor.depaul.edu/ntomuro/courses/417/notes/lecture1.html) paper, which proves a bound on the minimum number of nodes in an AVL tree.
Yet I do not understand the meaning of the result, since O(log n) is not referring to the number of nodes at all. How can this be a proof?
I do, however, understand the first steps and how the iterations are simplified.
But after the 4th step I fail to understand what exactly he is doing (even though I can vaguely imagine).
Could anybody please explain to me what the last few lines are proving, and how he simplifies the expressions at the end of part 1?
Thanks
O(log n) does refer to nodes: n represents the number of nodes. The proof works by asking how few nodes an AVL tree of height h can possibly have. Since each node's two subtrees may differ in height by at most one, the sparsest AVL tree of height h is a root with a minimal subtree of height h-1 and a minimal subtree of height h-2, giving the recurrence N(h) = N(h-1) + N(h-2) + 1. That recurrence grows exponentially in h (like the Fibonacci numbers), which is what the algebra at the end of part 1 establishes. Turning the statement around: if the minimum node count grows exponentially with the height, then the height grows at most logarithmically with the node count, i.e. h = O(log n). For comparison, a perfectly full tree has n = 2^(h+1) - 1 nodes, so h = log2(n + 1) - 1 there; the AVL bound is the same up to a constant factor.
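You can watch that minimum node count grow with a small sketch of the recurrence in Python (min_nodes is an illustrative name; the height convention here is that a single node has height 0):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def min_nodes(h):
        # Fewest nodes an AVL tree of height h can have:
        # a root plus minimal subtrees of heights h-1 and h-2.
        if h < 0:
            return 0
        if h == 0:
            return 1
        return min_nodes(h - 1) + min_nodes(h - 2) + 1

    print([min_nodes(h) for h in range(8)])
    # [1, 2, 4, 7, 12, 20, 33, 54]: exponential in h,
    # so inverting it gives height = O(log n).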

What is the difference between Greedy-Search and Uniform-Cost-Search?

When searching in a tree, my understanding of uniform-cost search is that for a given node A with child nodes B, C, D and associated costs (10, 5, 7), my algorithm will choose C, as it has the lowest cost. After expanding C, I see nodes E, F, G with costs (40, 50, 60). It will choose E (40), as it has the minimum value of the three.
Now, isn't that just the same as doing a greedy search, where you always choose what seems to be the best action?
Also, when defining the cost of going from one node to another, should we consider the whole cost from the root of the tree to the current node, or just the cost of going from node n to node n'?
Thanks
Nope. Your understanding isn't quite right.
The next node to be visited in case of uniform-cost-search would be D, as that has the lowest total cost from the root (7, as opposed to 40+5=45).
Greedy Search doesn't go back up the tree - it picks the lowest value and commits to that. Uniform-Cost will pick the lowest total cost from the entire tree.
In a uniform-cost search you always consider all unvisited nodes you have seen so far, not just those connected to the node you just expanded. So in your example, after choosing C, you would find that visiting E has a total cost of 5 + 40 = 45, which is higher than the cost of going from the root to D, which is 7. So you would visit D next.
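As a concrete sketch of that behaviour, here is a minimal uniform-cost search in Python over the tree from the question (the dictionary encoding and the choice of goal are made up for illustration):

    import heapq

    def uniform_cost_search(graph, start, goal):
        # graph: {node: [(child, step_cost), ...]}
        # The frontier is a priority queue ordered by g(n), the total
        # path cost from the start, not by the cost of the last step.
        frontier = [(0, start, [start])]
        explored = set()
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node in explored:
                continue
            explored.add(node)
            if node == goal:
                return cost, path
            for child, step in graph.get(node, []):
                if child not in explored:
                    heapq.heappush(frontier, (cost + step, child, path + [child]))
        return None

    # After expanding C (cost 5), the cheapest frontier entry is
    # D (cost 7), not E (5 + 40 = 45), so UCS "goes back up" to D.
    tree = {'A': [('B', 10), ('C', 5), ('D', 7)],
            'C': [('E', 40), ('F', 50), ('G', 60)]}
    print(uniform_cost_search(tree, 'A', 'D'))  # (7, ['A', 'D'])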
The difference between them is that the Greedy picks the node with the lowest heuristic value while the UCS picks the node with the lowest action cost. Consider the following graph:
If you run both algorithms, you'll get:
UCS
Picks: S (cost 0), B (cost 1), A (cost 2), D (cost 3), C (cost 5), G (cost 7)
Answer: S->A->D->G
Greedy:
(supposing it chooses A instead of B; A and B have the same heuristic value)
Picks: S, A (h = 3), C (h = 1), G (h = 0)
Answer: S->A->C->G
So it's important to distinguish the action cost of getting to a node from its heuristic value, which is a piece of information attached to the node based on an understanding of the problem definition.
Greedy search (for most of this answer, think of greedy best-first search when I say greedy search) is an informed search algorithm, which means the function that is evaluated to choose which node to expand has the form of f(n) = h(n), where h is the heuristic function for a given node n that returns the estimated value from this node n to a goal state. If you're trying to travel to a place, one example of a heuristic function is one that returns the estimated distance from node n to your destination.
Uniform-cost search, on the other hand, is an uninformed search algorithm, also known as a blind search strategy. This means that the value of the function f for a given node n, f(n), for uninformed search algorithms, takes into consideration g(n), the total action cost from the root node to the node n, that is, the path cost. It doesn't have any information about the problem apart from the problem description, so that's all it can know. You don't have any information that can help you decide how close one node is to a goal state, only to the root node. You can watch the nodes expanding here (Animation of the Uniform Cost Algorithm) and see how the cost from node n to the root is used to choose which nodes to expand.
Greedy search, just like any greedy algorithm, takes locally optimal solutions and uses a function that returns an estimated value from a given node n to the goal state. You can watch the nodes expanding here (Greedy Best First Search | Quick Explanation with Visualization) and see how the return of the heuristic function from node n to the goal state is used to choose which nodes to expand.
By the way, sometimes the path chosen by greedy search is not a global optimum. In the example in the video, node A is never expanded because there are always nodes with smaller values of h(n). But what if A has a high heuristic value while the nodes beyond it have very small values, so that the path through A is the global optimum? That can happen: a bad heuristic function can cause it, and getting stuck in a loop is also possible. A* fixes this by using both the path cost (which implies knowing the nodes already visited) and a heuristic function, that is, f(n) = g(n) + h(n).
It's possible that up to this point it's still not clear to you HOW uniform-cost search avoids committing to a path that looks better locally but not globally. It should become clear once you note that if all steps have the same cost, uniform-cost search is the same as breadth-first search (BFS): it would expand nodes in exactly the same order.
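For contrast with the UCS sketch above, a greedy best-first search differs only in the priority used for the frontier: h(n) instead of g(n). The heuristic table h here is hypothetical, supplied by the caller; ordering by cost + h[child] instead would turn this into A*.

    import heapq

    def greedy_best_first(graph, h, start, goal):
        # graph: {node: [(child, step_cost), ...]}
        # h: {node: estimated cost from that node to the goal}
        # The frontier is ordered by h(n) alone; the accumulated path
        # cost is carried along but never consulted when choosing
        # which node to expand.
        frontier = [(h[start], 0, start, [start])]
        explored = set()
        while frontier:
            _, cost, node, path = heapq.heappop(frontier)
            if node in explored:
                continue
            explored.add(node)
            if node == goal:
                return cost, path
            for child, step in graph.get(node, []):
                if child not in explored:
                    heapq.heappush(frontier, (h[child], cost + step, child, path + [child]))
        return None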
UCS cares about history,
Greedy does not.
In your example, after expanding C, the next node would be D according to UCS, because of its history: UCS can't forget the past, and it remembers that the total cost of D is much lower than that of E.
Don't be greedy. Be UCS, and if going back is really the better choice, don't be afraid of going back!
