Wouldn't AVL inserts be O(log n) space, because you need log n stack frames to do the insert? The AVL tree itself is O(n) space, and the time to insert is O(log n).
Due to the balancing property, the insertion, deletion, and search operations take O(log n) time in both the average and the worst case. Therefore, AVL trees give us an edge over plain Binary Search Trees, which have O(n) time complexity in the worst case.
The space complexity of an AVL tree is O(n) in both the average and the worst case.
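To see why the "log n stack frames" intuition holds: a recursive insert descends at most height-of-tree levels, and the worst-case height of an AVL tree with n nodes is about 1.44 log2 n. Here is a rough sketch that computes that bound from the minimum-node recurrence N(h) = N(h-1) + N(h-2) + 1 (the numbers below are just illustrative):

```python
import math

# Minimum number of nodes in an AVL tree of height h:
# N(h) = N(h-1) + N(h-2) + 1, with N(0) = 1 and N(1) = 2.
def max_avl_height(n: int) -> int:
    """Largest height an AVL tree can reach with at most n nodes."""
    prev, curr = 1, 2      # N(0), N(1)
    h = 1
    while curr <= n:
        prev, curr = curr, prev + curr + 1
        h += 1
    return h - 1           # curr tracks N(h), so h-1 is the last height that fits

for n in (10, 1_000, 1_000_000):
    print(f"n={n:>9}: worst-case height {max_avl_height(n)}, "
          f"1.44*log2(n) ~= {1.44 * math.log2(n):.1f}")
```

So the recursion depth (and hence the extra stack space) during an insert is O(log n), while the tree itself is O(n).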
Related
Searching a database B-Tree index can be performed in O(log n) time.
For a B-Tree index, what is the base of the log?
I know that in Big-O notation the base does not matter. Regardless, I'm curious what the base is for B-Tree indexes.
For example, a binary search on a sorted list has a base of 2 in O(log n). This is because we can discard half the list on each comparison.
Likewise, a binary search on a balanced binary tree is base 2 for the same reason.
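For instance, here's a quick sketch of the halving I mean (just illustrative):

```python
import math

def binary_search(sorted_list, target):
    """Return (index, number of comparisons); each comparison halves the range."""
    lo, hi = 0, len(sorted_list) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_list[mid] == target:
            return mid, comparisons
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

data = list(range(1_000_000))
_, comps = binary_search(data, 999_999)
print(comps, "comparisons vs log2(n) =", round(math.log2(len(data)), 1))  # ~20 vs 19.9
```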
The base of the log function really doesn't matter.
By changing the base, you change only a constant multiplicative factor, comparable to running the code on a faster or slower processor.
That said, B-Tree indexes in DBs have branching factors (the number of children per node) that vary inversely with the number of bytes needed to store the indexed value; specifically, the branching factor is roughly (I/O page size)/(entry byte size).
For Postgres, the branching factor can be in the low thousands for small entries. For MySQL, it might be around 50. In general, database B-trees have high branching factors to minimize disk page reads, which are on the order of 1000 times slower than processing a page that is already in memory.
The number of nodes traversed varies with O(log_b n), where b is the branching factor and thus the base of the log.
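As a back-of-the-envelope sketch (the 8 KB page size and 16-byte entry size are assumed for illustration, not values from any particular database):

```python
import math

def btree_height(n_keys: int, page_size: int = 8192, entry_size: int = 16) -> int:
    """Approximate number of tree levels (page reads) needed to index n_keys entries."""
    b = page_size // entry_size            # branching factor ~ entries per page
    return max(1, math.ceil(math.log(n_keys, b)))

for n in (1_000_000, 1_000_000_000):
    print(f"{n:>13} keys -> branching factor {8192 // 16}, "
          f"about {btree_height(n)} page reads")
```

With those assumptions the branching factor is 512, so a million keys need about 3 page reads and a billion keys about 4.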
I have heard statements like: considering the height of an AVL tree and the maximum number of keys an AVL tree node can contain, searching an AVL tree will be time-consuming because of the disk I/O.
However, imagine that an index file contains the whole AVL tree structure and that the index file is smaller than a single disk page; then we can read the whole AVL tree in a single disk I/O.
It seems like using an AVL tree does not incur extra disk I/O, so how do you explain that a B-tree is better?
Databases use balanced (B+) trees; an AVL tree is only a special case of these balanced trees, so there is no need for it.
"we can read the whole AVL tree in a single disk I/O"
Yes, it could work like that. Essentially, the whole data structure would be brought into memory. IO would no longer be a concern.
Some databases use this strategy. For example, SQL Server In-Memory "Hekaton" does this and delivers ~100x the normal throughput for OLTP.
Hekaton uses two index data structures: hash tables and trees. I think the trees are called Bw-trees and are similar to B-trees.
For general purpose database workloads it is very desirable to not need everything in memory. B-trees are a great design tradeoff in those cases.
It's because B-trees usually store a larger number of keys in a single node, which reduces the depth of the search. When indexing records, link traversal takes longer the deeper the tree is, so the tree is made wider rather than deeper: multiple keys are stored in an array within each node, which improves cache locality and gives quicker lookups by comparison.
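A rough sketch of the structural difference (the node classes and numbers are only illustrative, not an actual database implementation):

```python
import math

class BinaryNode:
    def __init__(self, key):
        self.key = key                 # one key per node
        self.left = self.right = None  # two children -> depth ~ log2(n) link hops

class BTreeNode:
    def __init__(self, order=512):
        self.keys = []                 # many keys stored contiguously (cache-friendly)
        self.children = []             # up to `order` children -> depth ~ log_order(n)

n = 100_000_000
print("binary tree depth :", math.ceil(math.log2(n)))      # ~27 link traversals
print("B-tree depth      :", math.ceil(math.log(n, 512)))  # ~3 node (page) visits
```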
For the purpose of fast searching, insertion, and deletion of nodes in an index we could use a Red-Black tree, but a B+ tree is employed instead.
Why?
If uniform cost search is optimal, why do we need A* search?
I read that uniform cost search is optimal and A* search is also optimal. If that's the case, then why do we even consider A* search?
"optimal" only means that both algorithms are guaranteed to eventually find a correct and optimal solution if one exists. Typically, A* will be significantly more efficient from a computational point of view (takes less processing time before finding a solution)
What is the difference between amortized and average complexity?
Also, what are the doubling and incremental growth strategies when implementing a stack using linked lists and arrays?
Typically, average-case complexity and amortized complexity refer to different concepts. Average-case complexity is usually used when describing the runtime of a randomized algorithm or data structure in the average case. For example, we can talk about the average-case runtime of randomized quicksort (where the pivot is chosen randomly), or the average-case runtime of a skiplist.
Amortized complexity, on the other hand, usually refers to the ratio of the total work done by some series of operations divided by the total number of operations performed. For example, in a dynamic array that doubles its size whenever more space is needed, each individual append operation might do O(n) work copying the array and transferring elements over. However, in doing so, it makes the next n operations complete in time O(1) because space is guaranteed to be available. Consequently, any n appends take time O(n) total, so the amortized cost of each operation is O(1); this is the ratio of the total work (O(n)) to the number of operations (n).
The two may seem similar, but they represent fundamentally different ideas. Average complexity refers to the amount of work on average given that there is an underlying probability distribution of runtimes. Amortized complexity refers to the average amount of work done by a series of operations based on the fact that certain operations are "expensive" but can be paid for by "cheaper" operations. This is why some data structures (for example, chained hash tables with rehashing) have operations that take O(1) amortized time in the average case - on average, any sequence of n operations on the data structure will take O(n) time.
As to your next question - doubling versus incremental growth - the doubling strategy ensures O(1) amortized complexity for push operations while the incremental growth strategy does not. Intuitively, if you double the size of the array when you need more space, then after doing work to copy n elements to an array of size 2n, the next n pushes will be "cheap" because space will be available. Consequently, any series of n operations takes O(n) time and all operations take amortized O(1) time. With some trickier math, you can show that growing by any factor greater than one will lead to O(1) amortized time.
Incremental growth, on the other hand, doesn't guarantee this. Think of it this way - if you grow by adding k elements to the array and have an array of size 10000k, then when growing you'll do 10000k work to make the next k operations fast. After doing those k operations "cheaply," you have to do 10001k work to grow the array again. You can show that this will take Θ(n²) work over a series of n pushes, which is why it's not recommended.
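Here's a small experiment along those lines (the growth factors below are just illustrative): it counts how many element copies each strategy performs over n pushes.

```python
def copies(n_pushes, grow):
    """Simulate n pushes and count total element copies caused by resizing."""
    capacity, size, total_copies = 1, 0, 0
    for _ in range(n_pushes):
        if size == capacity:
            total_copies += size          # copy existing elements into the new array
            capacity = grow(capacity)
        size += 1
    return total_copies

n = 100_000
print("doubling   :", copies(n, lambda c: 2 * c))    # ~1.3*n copies -> O(1) amortized
print("grow by 100:", copies(n, lambda c: c + 100))  # ~n^2/200 copies -> O(n) amortized
```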
Finally, you asked about arrays versus linked lists. When using linked lists, you don't need to worry about doubling or incremental growth because you can efficiently (in O(1)) allocate new linked list cells and chain them on. They're worst-case efficient structures rather than amortized efficient structures, though the constant factors are higher.
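For instance, a minimal linked-list stack sketch (illustrative only), where each push just links in one new cell:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedStack:
    def __init__(self):
        self.head = None

    def push(self, value):
        self.head = Node(value, self.head)   # constant work, no copying ever

    def pop(self):
        value = self.head.value
        self.head = self.head.next
        return value
```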
Hope this helps!