Heaps vs. Binary Trees - How to implement? - arrays

When implementing a heap structure, we can store the data in an array such that the children of the node at position i are at positions 2i and 2i+1.
My question is: why don't we use an array to represent binary search trees, instead of dealing with pointers etc.?
Thanks

Personally:
Because with pointers it's easier to grow the data structure dynamically.
I find it's easier to maintain a binary tree than a heap.
The algorithms to balance, remove, and insert elements in the tree alter only pointers and do not move the elements physically, as they would in a vector.
And so on...

If the position of all children is statically precomputed like that, then the array essentially represents a completely full, completely balanced binary tree.
Not all binary trees in "real life" are completely full and perfectly balanced. If you happen to have a few especially long branches, you'd have to make your whole array a lot larger to accommodate all nodes at the bottom-most level.
If an array-bound binary tree is mostly empty, most of the array space is wasted.
If only some of the tree's branches are deep enough to reach to the "bottom" of the array, there's also a lot of space being wasted.
If the tree (or just one branch) needs to grow "deeper" than the size of the array will allow, this would require "growing" the array, which is usually implemented as copying to a larger array. This is a time-expensive operation.
So: Using pointers allows us to grow the structure dynamically and flexibly. Representing a tree in an array is a nice academic exercise and works well for small and simple cases but often does not fulfill the demands of "real" computing.
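To make the wasted-space point concrete, here is a rough sketch (in Python for brevity, not from the original answer) of an unbalanced binary search tree stored in an array with the heap layout, children at 2i and 2i+1. The function names `array_bst_insert` and `slots_needed` are made up for illustration.

```python
def array_bst_insert(slots, key):
    """Insert key into a BST kept in a list with 1-based heap indexing.

    slots[0] is unused; the children of index i live at 2*i and 2*i + 1.
    Returns the index where the key was stored.
    """
    i = 1
    while True:
        if i >= len(slots):
            # Grow the array so that index i exists (empty slots are None).
            slots.extend([None] * (i - len(slots) + 1))
        if slots[i] is None:
            slots[i] = key
            return i
        # Smaller keys go left, everything else goes right.
        i = 2 * i if key < slots[i] else 2 * i + 1


def slots_needed(keys):
    """Number of array slots (excluding index 0) needed to hold the keys."""
    slots = [None, None]
    for k in keys:
        array_bst_insert(slots, k)
    return len(slots) - 1


print(slots_needed([4, 2, 6, 1, 3, 5, 7]))  # balanced insertion order
print(slots_needed(range(1, 11)))           # sorted input: one long right branch
```

Seven keys inserted in a balanced order fit in seven slots, while ten keys inserted in sorted order form a single right-leaning branch and need 1023 slots, almost all of them empty.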

Mainly because the recursive tree allows for very simple code. If you flatten the tree into an array, the code becomes really complex because you have to do a lot of bookkeeping which the recursive algorithm does for you.
Also, a tree of height N can have anything between N+1 and 2^(N+1)-1 nodes. Only the actual nodes need memory. If you use an array, you must always allocate space for all nodes (even the empty ones) unless you use a sparse array (which would make the code even more complex). So while it is easy to keep a sparse tree of height 100 in memory, it would be problematic to find a computer which can allocate 20282409603651670423947251286008 bytes of RAM (2^101 - 1 slots at 8 bytes each).
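The byte count quoted above can be checked with a couple of lines. This is only a sanity check, and the 8 bytes per slot is my assumption (it is the slot size that reproduces the quoted figure), not something the answer states.

```python
def min_nodes(height):
    """Fewest nodes in a tree of the given height: one single chain."""
    return height + 1


def max_nodes(height):
    """Most nodes in a binary tree of the given height: every level full."""
    return 2 ** (height + 1) - 1


BYTES_PER_SLOT = 8  # assumed slot size, e.g. one 64-bit value per node

print(min_nodes(100))                   # nodes a sparse tree actually needs
print(max_nodes(100) * BYTES_PER_SLOT)  # bytes for the full array layout
```

A chain of height 100 needs only 101 real nodes, while the array layout has to reserve every slot of the full tree.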

To insert an element into a heap, you place it in the next free cell at the end of the array and swap it with its parent until the heap constraint is valid again. Swap-with-parent is an operation that keeps the binary tree structure of the heap intact. This means a heap of size N will be represented as an N-cell array, and you can add a new element in logarithmic time.
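A minimal sketch of that insertion, assuming a min-heap and a list whose cell 0 is left unused so that the children of cell i sit at 2i and 2i+1 (the name `heap_insert` is mine):

```python
def heap_insert(heap, value):
    """Insert value into a min-heap stored in a list.

    heap[0] is an unused placeholder, so the parent of cell i is i // 2
    and its children are 2*i and 2*i + 1.
    """
    heap.append(value)            # new element goes into the next free cell
    i = len(heap) - 1
    # Swap with the parent until the parent is no longer larger.
    while i > 1 and heap[i // 2] > heap[i]:
        heap[i // 2], heap[i] = heap[i], heap[i // 2]
        i //= 2


heap = [None]
for v in [5, 3, 8, 1]:
    heap_insert(heap, v)
print(heap[1])  # smallest element is always in cell 1
```

Each insert touches at most one cell per level, which is where the logarithmic bound comes from.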
A binary search tree can be represented as an array of size N using the same representation structure as a heap (children 2n and 2n+1), but inserting an element this way is a lot harder, because unlike the heap constraint, the binary search tree constraint requires rotations to be performed to restore a balanced tree. So, either you manage to keep an N-node tree in an N-cell array at a cost higher than logarithmic, or you waste space by keeping the tree in a larger array (if my memory serves, a red-black tree could waste as much as 50% of your array).
So, a binary search tree in an array is only interesting if the data inside is constant. And if it is, then you don't need the heap structure (children 2n and 2n+1): you can just sort your array and use binary search.
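For the constant-data case, the "sort once, then binary search" idea might look like this. It uses `bisect_left` from Python's standard `bisect` module; the helper name `contains` is just for illustration.

```python
from bisect import bisect_left


def contains(sorted_values, target):
    """Binary search in a sorted list: O(log n) comparisons, no pointers."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


data = sorted([42, 7, 19, 3, 88])  # sort once, then only search
print(contains(data, 19))
print(contains(data, 20))
```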

As far as I know, we can use an array to represent binary search trees.
But it is more flexible to use pointers.

The array based implementation is useful if you need a heap that is used as a priority queue in graph algorithms. In that case, the elements in the heap are constant, you pop the top most element and insert new elements. Removing the top element (or min-element) requires some re-balancing to become a heap again, which can be done such that the array is reasonably balanced.
A reference for this is the algorithm by Goldberg and Tarjan about efficiently computing optimal network flow in directed graphs, iirc.

A heap is a complete binary tree, unlike a BST. Hence, using arrays is not of much use for a BST.

Related

Use of memory between an array and a linked list

In C, which is more efficient in terms of memory management, a linked list or an array?
For my program, I could use one or both of them. I would like to take this point into consideration before starting.
Both linked lists and arrays have good and bad sides.
Array
Accessing a particular position takes O(1) time, because the memory allocated for an array is contiguous. So if the address of the first element is A, then the address of the 5th element is A+4 (counting in units of the element size).
Inserting a number at some position takes O(n) time, because you have to shift every number after that position and also increase the size of the array.
Searching for an element: if the array is sorted, you can do a binary search, and since accessing each position is O(1), the search takes O(log n) time. If the array is not sorted, you have to traverse the entire array, so O(n) time.
Deletion is the exact opposite of insertion. You have to shift left all the numbers after the place where you deleted. You might also need to recreate the array for memory efficiency. So O(n).
Memory must be contiguous, which can be a problem on old x86 machines with 64k segments.
Freeing is a single operation.
Linked list
Accessing a particular position takes O(n) time, because you have to traverse the list to get to that position.
If you want to insert a number at some position and you already have a pointer to that position, it takes O(1) time to insert the new value.
Searching for an element: no matter how the numbers are arranged, you have to traverse them from front to back one by one to find your particular number. So it's always O(n).
Deletion is the exact opposite of insertion. If you already know the position through some pointer, say the list is p->q->r and you want to delete q, all you need is to set the next of p to r and nothing else. So O(1) [given you have a pointer to p].
Memory is dispersed. With a naive implementation, that can be bad for cache locality, and overall memory use can be high because the memory allocation system has overhead for each node. However, careful programming can get around this problem.
Deletion requires a separate call for each node; again, careful programming can get around this problem.
So depending on what kind of problem you are solving you have to choose one of the two.
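The contrast between the two lists above can be sketched like this. It is in Python rather than C to keep it short, and the helper names (`insert_after`, `delete_after`, `array_insert`) are made up; the explicit shifting loop stands in for what a C array insert has to do.

```python
class Node:
    """Singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """O(1): only one pointer changes, nothing is moved."""
    node.next = Node(value, node.next)


def delete_after(node):
    """O(1): for p->q->r, calling this on p makes p point at r."""
    if node.next is not None:
        node.next = node.next.next


def to_list(head):
    """Walk the list front to back (O(n)) and collect the values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def array_insert(arr, index, value):
    """O(n): grow by one cell, shift the tail right, then store the value."""
    arr.append(value)
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]
    arr[index] = value
```

Note that the O(1) figures only hold once you already hold a reference to the neighbouring node; finding that node is still an O(n) walk.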
Linked list uses more memory, from both the linked list itself and inside the memory manager due to the fact you are allocating many individual blocks of memory.
That does not mean it is less efficient at all, depending on what you are doing.
While a linked list uses more memory, adding or removing elements from it is very efficient, as it doesn't require moving data around at all, while resizing a dynamic array means you have to allocate a whole new area in memory to fit the new and modified array with items added/removed. You can also sort a linked list without moving its data.
On the other hand, arrays can be substantially faster to iterate due to caching, branch prediction etc., as the data is placed sequentially in memory.
Which one is better for you will really depend on the application.

Can I represent a red black tree as an array?

Is it worth representing a red-black tree as an array to eliminate the memory overhead? Or will the array take up more memory since the array will have empty slots?
It will have both positive and negative sides. This answer is applicable for C [since you mentioned this is what you will use]
Positive sides
Let's assume you have created an array as a pool of objects that you will use for the red-black tree. Deleting an element, or initializing a new element once its position is found, will be a little faster, because you will probably use the memory pool you have created yourself.
Negative sides
Yes the array will most probably end up taking more memory since the array will have empty slots sometimes.
You have to be sure about the MAX size of the red-black trees in this case. So there is a limitation of size.
You are not using the benefit of sequential memory space, so that might be a waste of resource.
Yes, you can represent red-black tree as an array, but it's not worth it.
The maximum height of a red-black tree is 2*log2(n+1), where n is the number of entries. The number of slots in the array representation at level k is 2**k. So to store 1_000 entries you'd have to allocate an array of 1_048_576 entries. To store 1_000_000 entries you'd have to allocate an array of 1_099_511_627_776 entries.
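Those two figures can be reproduced by following the answer's own estimate: round the worst-case height up to a whole number of levels and take 2 to that power. The function name `worst_case_slots` is mine, and this is a back-of-the-envelope bound rather than an exact slot count.

```python
import math


def worst_case_slots(n):
    """Array size suggested by the 2*log2(n+1) height bound for n entries."""
    height = math.ceil(2 * math.log2(n + 1))  # worst-case red-black height
    return 2 ** height                        # slots at the deepest level


print(worst_case_slots(1_000))
print(worst_case_slots(1_000_000))
```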
It's not worth it.
A red-black tree (and most data structures, really) doesn't care about which storage facility is used. That means you can use an array or even a HashTable/Map to store the tree nodes; the array index or map key is your new "pointer". You can even put the tree on disk as a file and use the file offset as the node index if you would like to (though, in this case, you should use a B-Tree instead).
The main problem is increased complexity as now you have to manage the storage manually (opposed to letting the OS and/or language runtime do it for you). Sometimes you want to scale the array up so you can store more items, sometimes you want to scale it down (vacuum) to free up unused space. These operations can be costly on their own.
Memory usage wise, the storage facility does not change how many nodes are in the tree. If you have 2,000 nodes in your old-school pointer tree (tree height=10), you'll still have 2,000 nodes in your fancy array-backed tree (tree height is still 10). However, redundant space may exist in between vacuum operations.
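Here is a rough sketch of the "array index is your new pointer" idea. To keep it short it is a plain unbalanced binary search tree, not a red-black tree (no colours, no rotations, no vacuuming), and the names `PoolTree` and `NIL` are invented for the example.

```python
NIL = -1  # plays the role of a null pointer


class PoolTree:
    """Binary search tree whose nodes live in parallel arrays.

    Node i is described by key[i], left[i] and right[i]; the links are
    array indices instead of pointers, so there is no gap between nodes.
    """

    def __init__(self):
        self.key, self.left, self.right = [], [], []
        self.root = NIL

    def _new_node(self, key):
        # Allocate the next free slot at the end of the pool.
        self.key.append(key)
        self.left.append(NIL)
        self.right.append(NIL)
        return len(self.key) - 1

    def insert(self, key):
        if self.root == NIL:
            self.root = self._new_node(key)
            return
        i = self.root
        while True:
            if key == self.key[i]:
                return  # ignore duplicates
            child = self.left if key < self.key[i] else self.right
            if child[i] == NIL:
                child[i] = self._new_node(key)
                return
            i = child[i]

    def in_order(self):
        """Keys in sorted order, using an explicit stack of indices."""
        out, stack, i = [], [], self.root
        while stack or i != NIL:
            while i != NIL:
                stack.append(i)
                i = self.left[i]
            i = stack.pop()
            out.append(self.key[i])
            i = self.right[i]
        return out
```

The pool holds exactly as many slots as there are nodes, which matches the point above: the storage changes, the node count does not.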

Contiguous mutable ordered list

This may or may not exist, but I'm looking for a way of storing a sorted list of integers that's contiguous in memory, reasonably compact, and allows for O(log n) amortized inserts and deletes. The various self-balancing binary search trees seem to have the insertion and deletion properties I want, but are implemented with pointers all over the place, which doesn't fit my use case very well. Any ideas?
(The implementation language will almost definitely be C, if it matters. If there are existing implementations of whatever you propose, all the better, but I'm fine with writing my own.)
A binary search tree can be implemented using an array.
Given that "reads happen more often than writes", you can try a dynamically resized array with binary search on top of it. You get O(log n) access to elements (read), but you pay O(n) for insert/delete: O(log n) to find the proper place in the array plus O(n) to shift elements right or left. It is kind of slow, but it is one way to make this work.
One thing you might consider is a log-structured merge tree. You can store the levels contiguously one after the other, if you keep some metadata about where each level is located. You'd want an array where you can push and pop from the end (like a C++ std::vector).

Inserting a number into a sorted array!

I would like to write a piece of code for inserting a number into a sorted array at the appropriate position (i.e. the array should still remain sorted after insertion)
My data structure doesn't allow duplicates.
I am planning to do something like this:
Find the right index where I should be putting this element using binary search
Create space for this element, by moving all the elements from that index down.
Put this element there.
Is there any other better way?
If you really have an array and not a better data structure, that's optimal. If you're flexible on the implementation, take a look at AA Trees - They're rather fast and easy to implement. Obviously, takes more space than array, and it's not worth it if the number of elements is not big enough to notice the slowness of the blit as compared to pointer magic.
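The three steps from the question (find the index by binary search, make room, store the element) could be sketched as below, with duplicates rejected as the question requires. `bisect_left` and `list.insert` are standard Python; the wrapper name `insert_sorted_unique` is only illustrative.

```python
from bisect import bisect_left


def insert_sorted_unique(arr, value):
    """Insert value into a sorted list, keeping it sorted and duplicate-free.

    Returns True if the value was inserted, False if it was already present.
    """
    i = bisect_left(arr, value)           # step 1: O(log n) binary search
    if i < len(arr) and arr[i] == value:  # value already there: do nothing
        return False
    arr.insert(i, value)                  # steps 2 and 3: shift the tail, store
    return True


numbers = [2, 5, 9]
insert_sorted_unique(numbers, 7)
insert_sorted_unique(numbers, 5)  # rejected, already present
print(numbers)
```

The search is logarithmic, but the shift inside `insert` is still linear, so the whole operation stays O(n).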
Does the data have to be sorted completely all the time?
If not, and it is only necessary to access the smallest or largest element quickly, a binary heap gives constant access time and O(log n) addition and deletion time.
Moreover, it can satisfy your condition that the memory should be contiguous, since you can implement a binary heap on top of an array (i.e., array[2n+1] is the left child and array[2n+2] is the right child of array[n]).
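As a small illustration of that index arithmetic, Python's standard `heapq` module keeps a min-heap in a plain list with exactly this 0-based layout. The checker `is_min_heap` is a helper I added for the example.

```python
import heapq


def is_min_heap(a):
    """Check a[n] <= a[2n+1] and a[n] <= a[2n+2] wherever those cells exist."""
    size = len(a)
    for n in range(size):
        for child in (2 * n + 1, 2 * n + 2):
            if child < size and a[child] < a[n]:
                return False
    return True


heap = []
for v in [9, 4, 7, 1, 8]:
    heapq.heappush(heap, v)    # O(log n) insertion

print(heap[0])                 # smallest element, O(1) access
print(is_min_heap(heap))
```

Only the smallest element is cheap to reach; the rest of the list is not in sorted order.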
A heap-based implementation would be more efficient if you are inserting a lot of elements: O(log n) both for removing the smallest (or largest) element and for inserting.

Efficient data structure for fast random access, search, insertion and deletion

I'm looking for a data structure (or structures) that would allow me to keep an ordered list of integers, no duplicates, with indexes and values in the same range.
I need four main operations to be efficient, in rough order of importance:
taking the value from a given index
finding the index of a given value
inserting a value at a given index
deleting a value at a given index
Using an array I have 1 at O(1), but 2 is O(N) and insertion and deletions are expensive (O(N) as well, I believe).
A Linked List has O(1) insertion and deletion (once you have the node), but 1 and 2 are O(N) thus negating the gains.
I tried keeping two arrays a[index]=value and b[value]=index, which turn 1 and 2 into O(1) but turn 3 and 4 into even more costly operations.
Is there a data structure better suited for this?
I would use a red-black tree to map keys to values. This gives you O(log(n)) for 1, 3, 4. It also maintains the keys in sorted order.
For 2, I would use a hash table to map values to keys, which gives you O(1) performance. It also adds O(1) overhead for keeping the hash table updated when adding and deleting keys in the red-black tree.
How about using a sorted array with binary search?
Insertion and deletion are slow, but given that the data are plain integers, they could be optimized with calls to memmove() (rather than memcpy(), since the source and destination ranges overlap) if you are using C or C++. If you know the maximum size of the array, you can even avoid any memory allocations during the usage of the array, as you can preallocate it to the maximum size.
The "best" approach depends on how many items you need to store and how often you will need to insert/delete compared to finding. If you rarely insert or delete, a sorted array with O(1) access to the values is certainly better, but if you insert and delete things frequently, a binary tree can be better than the array. For a small enough n the array most likely beats the tree in any case.
If storage size is of concern, the array is better than the trees, too. Trees also need to allocate memory for every item they store, and the overhead of the memory allocation can be significant as you only store small values (integers).
You may want to profile which is faster: copying the integers when you insert/delete from the sorted array, or the tree with its memory (de)allocations.
I don't know what language you're using, but if it's Java you can leverage LinkedHashMap or a similar Collection. It's got all of the benefits of a List and a Map, provides constant time for most operations, and has the memory footprint of an elephant. :)
If you're not using Java, the idea of a LinkedHashMap is probably still suitable for a usable data structure for your problem.
Use a vector for the array access.
Use a map as a search index to the subscript into the vector.
Given a subscript, fetch the value from the vector: O(1).
Given a key, use the map to find the subscript of the value: O(log N).
Insert a value: push back on the vector, O(1) amortized; insert the subscript into the map, O(log N).
Delete a value: delete from the map, O(log N).
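A rough sketch of this vector-plus-map layout follows. Two things here are my assumptions rather than part of the answer: Python's `dict` is a hash map, so its lookups are O(1) on average instead of the O(log N) of a tree-based map, and on delete the vector slot is overwritten with `None` so that stale values are not returned. The class name `IndexedValues` is invented.

```python
class IndexedValues:
    """Values in a list (the "vector") plus a value -> subscript map."""

    def __init__(self):
        self.values = []      # subscript -> value
        self.position = {}    # value -> subscript

    def get(self, index):
        """Given a subscript, fetch the value from the vector."""
        return self.values[index]

    def index_of(self, value):
        """Given a value, look up its subscript (None if absent)."""
        return self.position.get(value)

    def append(self, value):
        """Push back on the vector and record the subscript in the map."""
        if value in self.position:
            return False      # no duplicates
        self.position[value] = len(self.values)
        self.values.append(value)
        return True

    def delete(self, value):
        """Remove from the map; the vector slot is blanked, not compacted."""
        index = self.position.pop(value, None)
        if index is None:
            return False
        self.values[index] = None
        return True
```

Because deleted slots are left in place, the subscripts of the remaining values never change, at the price of holes in the vector.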
How about a TreeMap? O(log n) for the operations described.
I like balanced binary trees a lot. They are sometimes slower than hash tables or other structures, but they are much more predictable; they are generally O(log n) for all operations. I would suggest using a Red-black tree or an AVL tree.
How to achieve 2 with RB-trees? We can make each node keep a count of its descendants, updated on every insert/delete operation. This doesn't make these operations take significantly longer. Then walking down the tree to find the i-th element is possible in O(log n) time. But I see no implementation of this method in Java or the STL.
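The counting idea can be sketched as follows. To stay short this uses a plain unbalanced binary search tree, so the O(log n) bounds only hold when the tree happens to be balanced; a real version would add the size bookkeeping to a red-black or AVL tree. The function names `select` and `rank` are my own labels for "value at index" and "index of value".

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    return node.size if node else 0


def insert(node, key):
    """Insert key (duplicates ignored) and refresh subtree sizes on the way up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node


def select(node, i):
    """Return the key at 0-based position i in sorted order."""
    while node is not None:
        left = size(node.left)
        if i < left:
            node = node.left
        elif i == left:
            return node.key
        else:
            i -= left + 1  # skip the left subtree and this node
            node = node.right
    raise IndexError(i)


def rank(node, key):
    """Return the 0-based sorted position of key, or None if it is absent."""
    r = 0
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            r += size(node.left) + 1
            node = node.right
        else:
            return r + size(node.left)
    return None
```

Both lookups walk a single root-to-leaf path and use the stored sizes to decide which way to turn.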
If you're working in .NET, then according to the MS docs http://msdn.microsoft.com/en-us/library/f7fta44c.aspx
SortedDictionary and SortedList both have O(log n) for retrieval
SortedDictionary has O(log n) for insert and delete operations, whereas SortedList has O(n).
The two differ by memory usage and speed of insertion/removal. SortedList uses less memory than SortedDictionary. If the SortedList is populated all at once from sorted data, it's faster than SortedDictionary. So it depends on the situation as to which is really the best for you.
Also, your argument for the Linked List is not really valid as it might be O(1) for the insert, but you have to traverse the list to find the insertion point, so it's really not.
