Flatten a Tree into an Array - c

I am looking for the best way to place a tree into an array.
The idea is to follow this principle: Array Implementation of Trees,
but I'm stuck on how to know which nodes are the children and which nodes are at the same level, because I'm not using a binary tree.
I might have to store ASCII, but I can't simply afford arrays of 256 pointers!
Any idea would be welcome.
The purpose of this is to send an array (the tree) to my GPU, instead of using structures.

Well, here is my idea of converting tree into an array.
Take an array of size MAX_VAL, which is the total number of nodes in the tree. The element type should be the same as that of a node, but with one extra field: the index of its parent. Store each node this way. Store the root node at the first position, say 1. The child nodes of this node are then stored after it, with the extra field holding 1 (since this is where the root was stored).
Apply this procedure on all nodes and you are done. You can get back the tree, by a simple recursive call on each node.
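A minimal sketch of that parent-index layout, under a few assumptions that are not in the answer: indices are 0-based (the answer puts the root at position 1), the root's parent is marked with -1, and `FlatNode`, `flat_add` and `flat_children` are hypothetical names. Finding children here is a linear scan over the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat node: the payload plus the index of its parent. */
typedef struct {
    char value;   /* payload, e.g. one ASCII character */
    int  parent;  /* index of the parent in the array, -1 for the root */
} FlatNode;

/* Append a node under `parent`; returns its index, or -1 if the array is full. */
int flat_add(FlatNode *nodes, int *count, int capacity, char value, int parent)
{
    assert(parent < *count);  /* a parent must already be stored */
    if (*count >= capacity)
        return -1;
    nodes[*count].value = value;
    nodes[*count].parent = parent;
    return (*count)++;
}

/* Write the indices of the children of `parent` into `out`;
 * returns how many were found. This scans the whole array. */
int flat_children(const FlatNode *nodes, int count, int parent, int *out)
{
    int found = 0;
    for (int i = 0; i < count; i++)
        if (nodes[i].parent == parent)
            out[found++] = i;
    return found;
}
```

Because every entry is a fixed-size record with no pointers, the array can be copied to the GPU as one block.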
Hope this helps. :) :)

Ahnentafel lists are very big if the tree is not near-perfectly balanced. My guess is your tree isn't going to be balanced, so the cost will outweigh the benefit of implicit parent/child pointers. I've never seen a non-binary Ahnentafel list, but I assume it's possible (were you asking for the implicit equations?).
Could you keep a sorted list of child pointers for each node (ASCII character + pointer/index)? In this case it might be best, as others suggest, to construct the tree using pointers and allow the children to grow. Then pack all the nodes into a list: work out an order to place the nodes, use prefix sums for their offsets into the array, store the position indices on each node and finally copy the children lists into the array (replacing the children pointers with list indices can be done by following the pointers and querying the index from the previous step).
Traversing to a child in CUDA won't be constant time, but since the order is known you can use a binary search to speed things up.
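A rough sketch of what the packed result could look like, written in plain C rather than CUDA. The names `Edge`, `first_child` and `find_child` are assumptions for illustration: `first_child` is the prefix sum of the per-node child counts (with one extra trailing entry), and each node's edges are stored contiguously, sorted by label.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packed layout: node i owns
 * edges[first_child[i]] .. edges[first_child[i + 1] - 1], sorted by label. */
typedef struct {
    unsigned char label;  /* ASCII character on the edge */
    int           target; /* index of the child node */
} Edge;

/* Return the child of `node` reached by `label`, or -1 if there is none. */
int find_child(const int *first_child, const Edge *edges,
               int node, unsigned char label)
{
    int lo = first_child[node];
    int hi = first_child[node + 1];  /* one past the last edge of `node` */
    assert(lo <= hi);
    while (lo < hi) {                /* lower-bound binary search */
        int mid = lo + (hi - lo) / 2;
        if (edges[mid].label < label)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < first_child[node + 1] && edges[lo].label == label)
        return edges[lo].target;
    return -1;
}
```

Both arrays are flat and index-based, so they can be uploaded as-is; a node with k children costs k entries instead of 256 pointers.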

Related

Recursive Tree Traversal, returning a variable for each leaf of the tree

I have a huffman binary tree. I need to traverse down the tree till I get to each leaf, and for each leaf, I need to "save" a member of that leaf node, and keep all those variables in an array outside of the tree.
Let's say I have this tree:
            3\65
        6\-1
            3\70
    9\-1
            2\66
        3\-1
            1\67
16\-1
    7\68
Each leaf (7/68, 1/67, 2/66, 3/70, 3/65) has a member called "encoding", which is a string.
(i.e. each node has a node->left, node->right, and node->encoding)
Let's say the encodings are as follows:
7/68 got an encoding of 0
1/67 got an encoding of 100
2/66 got an encoding of 101
3/70 got an encoding of 110
3/65 got an encoding of 111
I can traverse the tree and print off these values relatively easily, but what I need to do is save these strings in an array outside of the tree.
I can't think of how to save these outside of the tree.
"save these strings in an array outside of the tree."
Comment: Are you sure that you have to store the strings? It is much cleaner if you just store the integers and create the strings after the recursion is finished.
OK, either way (and without giving the sourcecode away) you just:
create a large enough(*) array before you start the recursion
create a pointer which will be used for writing into different parts of the array, initialize that pointer to the start of the array.
Give a pointer to that pointer into your recursion as a new/additional function-argument. Every time a leaf is reached in the recursion you
write into the pointer what you found at the leaf
increase the pointer (you can do that because you have the pointer to the pointer)
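The steps above can be sketched as follows. This is only one possible reading of the answer: the `Node` layout follows the fields named in the question, while `Code` and `collect_encodings` are hypothetical names, and the caller is assumed to have sized the array for at least one slot per leaf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type with the fields named in the question. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
    const char  *encoding;  /* only meaningful on leaves */
} Node;

typedef const char *Code;   /* one slot of the output array */

/* `cursor` points at the caller's write pointer. At each leaf the encoding
 * is written to the current slot and the caller's pointer is advanced. */
void collect_encodings(const Node *node, Code **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    if (node == NULL)
        return;
    if (node->left == NULL && node->right == NULL) {
        **cursor = node->encoding;  /* write what was found at the leaf */
        (*cursor)++;                /* move on to the next free slot */
        return;
    }
    collect_encodings(node->left, cursor);
    collect_encodings(node->right, cursor);
}
```

After the call, the difference between the write pointer and the start of the array is the number of leaves that were stored.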
As far as I remember, in a Huffman code implementation you don't have to use an external array. The easiest way to implement it is adding another pointer ('next') to your struct.
Each element is linked twice. Once as a member of the tree, and once as a member of a linked list.
This way there is no new structure required.

Is It possible to apply binary search to link list to find an element?

I have read a question: is it possible to apply binary search on a linked list?
Since a linked list doesn't allow random access, this looks practically impossible.
Does anyone have a way to do it?
The main issue, besides that you have no constant-time access to the linked list elements, is that you have no information about the length of the list. In this case, you simply have no way to "cut" the list in 2 halves.
If you have at least a bound on the linked list length, the problem is solvable in O(log n), with a skip list approach, indeed. Otherwise nothing would save you from reading the whole list, thus O(n).
So, assuming that the linked list is sorted, and you know its length (or at least the maximum length), yes it's possible to implement some sort of binary search on a linked list. This is not often the case, though.
With a plain linked list, you cannot do binary search directly, since random access on linked lists is O(n).
If you need fast search, tree-like data structures (R/B tree, trie, heap, etc.) offer a lot of the advantages of a linked list (relatively cheap random insertion / deletion), while being very efficient at searching.
Not with a classic linked list, for the reasons you state.
But there is a structure that does allow a form of binary search that is derived from linked lists: Skip lists.
(This is not trivial to implement.)
I have once implemented something like that for a singly-linked list containing sorted keys. I needed to find several keys in it (knowing only one of them at the beginning, the rest were dependent on it) and I wanted to avoid traversing the list again and again. And I didn't know the list length.
So, I ended up doing this... I created 256 pointers to point to the list elements and made them point to the first 256 list elements. As soon as all 256 were used and a 257th was needed, I dropped the odd-numbered pointer values (1,3,5,etc), compacted the even-numbered (0,2,4,etc) into the first 128 pointers and continued assigning the remaining half (128) of pointers to the rest, this time skipping every other list element. This process repeated until the end of the list, at which point those pointers were pointing to elements equally spaced throughout the entire list. I then could do a simple binary search using those 256 (or fewer) pointers to shorten the linear list search to 1/256th (or 1/whatever-th) of the original list length.
This is not very fancy or powerful, but sometimes can be a sufficient perf improvement with minor code changes.
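A sketch of how that pointer table might be built and used, assuming a list sorted in ascending order. The names `ListNode`, `SampleTable`, `build_samples` and `find_sorted` are made up for illustration; only the table size of 256 and the halve-and-double scheme come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLES 256  /* table size used in the answer */

typedef struct ListNode {
    int key;
    struct ListNode *next;
} ListNode;

typedef struct {
    const ListNode *at[SAMPLES]; /* evenly spaced nodes, in list order */
    size_t count;                /* slots in use */
    size_t stride;               /* list distance between neighbouring samples */
} SampleTable;

/* One pass over the list. When the table is full, keep every other sample
 * and double the stride, so the list length need not be known up front. */
void build_samples(const ListNode *head, SampleTable *t)
{
    size_t pos = 0;
    t->count = 0;
    t->stride = 1;
    for (const ListNode *n = head; n != NULL; n = n->next, pos++) {
        if (pos % t->stride != 0)
            continue;
        if (t->count == SAMPLES) {
            for (size_t k = 0; k < SAMPLES / 2; k++)
                t->at[k] = t->at[2 * k];
            t->count = SAMPLES / 2;
            t->stride *= 2;
            assert(pos % t->stride == 0);  /* pos is SAMPLES * old stride */
        }
        t->at[t->count++] = n;
    }
}

/* Binary search the samples for the last one with key <= `key`,
 * then walk the list from there (fewer than `stride` hops between samples). */
const ListNode *find_sorted(const SampleTable *t, int key)
{
    size_t lo = 0, hi = t->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->at[mid]->key <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;  /* empty list, or key below the first element */
    for (const ListNode *n = t->at[lo - 1]; n != NULL && n->key <= key; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

The table is built once and can then serve many lookups, which matches the use case of finding several keys without re-traversing the list each time.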
You can do a binary search on a linked list. As you say, you don't have random access, but you can still find the element with a specific index, starting either from the start of the list or from some other position. So a straightforward binary search is possible, but slow compared with binary search of an array.
If you had a list where comparisons were much, much more expensive than simple list traversal, then a binary search would be cheaper than a linear search for suitably-sized lists. The linear search requires O(n) comparisons and O(n) node traversals, whereas the binary search requires O(log n) comparisons and O(n log n) node traversals. I'm not sure if that O(n log n) bound is tight, the others are.
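To make the trade-off concrete, here is a small sketch of a binary search over a sorted singly linked list of known length. `ListNode`, `list_bsearch` and the `compares` counter are hypothetical names. This variant walks to the middle from the start of the remaining range rather than from the head, so the total walk stays below n links while the number of key comparisons is about log2(n).

```c
#include <assert.h>
#include <stddef.h>

typedef struct ListNode {
    int key;
    struct ListNode *next;
} ListNode;

/* Search a list of `len` nodes sorted ascending. `*compares` is incremented
 * once per node whose key is inspected. */
const ListNode *list_bsearch(const ListNode *head, size_t len,
                             int key, size_t *compares)
{
    const ListNode *lo = head;  /* first node of the remaining range */
    assert(compares != NULL);
    while (len > 0) {
        size_t half = len / 2;
        const ListNode *mid = lo;
        for (size_t i = 0; i < half; i++)  /* walk to the middle node */
            mid = mid->next;
        (*compares)++;
        if (mid->key == key)
            return mid;
        if (mid->key < key) {              /* continue in the upper part */
            lo = mid->next;
            len -= half + 1;
        } else {                           /* continue in the lower part */
            len = half;
        }
    }
    return NULL;
}
```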
In my opinion, there is no way to search a linked list in a binary search manner. In binary search we find the 'mid' element of the array, which is impossible with lists, since a list is a structure where we strictly have to start from 'start' (the pointer to the very first node of the list) to reach any of its elements.
In an array we can go to a specific element using an index; in a linked list there is no index, because random access is unavailable.
So I think that binary search is not possible with a linked list in usual practice.
To apply binary search to a linked list, you can maintain a count variable that iterates through the linked list and returns the total number of nodes. You would also need to keep an int field, say INDEX, in your node class, incremented upon creation of each new node. After that it will be easy to divide the linked list into two halves and apply binary search over it.

Sorting a linked list and returning to original unsorted order

I have an unsorted linked list. I need to sort it by a certain field then return the linked list to its previous unsorted condition. How can I do this without making a copy of the list?
When you say "return the linked list to its previous unsorted condition", do you mean the list needs to be placed into a random order or to the exact same order that you started with?
In any case, don't forget that a list can be linked into more than one list at a time. If you have two sets of "next"/"previous" pointers, then you can effectively have the same set of items sorted two different ways at the same time.
To do this you will need to either sort and then restore the list or create and sort references to the list.
To sort the list directly Merge Sort is most likely the best thing you could use for the initial sort, but returning them to their original state is tricky unless you either record your moves so you can reverse them or store their original position and resort them using that as the key.
If you would rather sort the references to the list instead you will need to allocate enough space to hold pointers to each node and sort that. If you use a flat array to store the pointers then you could use the standard C qsort to do this.
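A minimal sketch of the sort-the-references approach with the standard `qsort`. The `Item` type, its `field` member and the helper `sorted_view` are assumptions for illustration. The list itself is never modified, so there is nothing to restore afterwards.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Item {
    int field;          /* the field to sort by */
    struct Item *next;
} Item;

/* qsort comparator: each array element is a pointer to an Item. */
static int by_field(const void *a, const void *b)
{
    const Item *x = *(const Item *const *)a;
    const Item *y = *(const Item *const *)b;
    return (x->field > y->field) - (x->field < y->field);
}

/* Returns a malloc'd array of `*count` node pointers ordered by `field`,
 * or NULL on allocation failure. The caller frees the array. */
Item **sorted_view(Item *head, size_t *count)
{
    size_t n = 0;
    for (Item *p = head; p != NULL; p = p->next)
        n++;
    Item **view = malloc((n ? n : 1) * sizeof *view);
    if (view == NULL)
        return NULL;
    size_t i = 0;
    for (Item *p = head; p != NULL; p = p->next)
        view[i++] = p;
    assert(i == n);
    qsort(view, n, sizeof *view, by_field);
    *count = n;
    return view;
}
```

Only pointers are allocated here, one per node, so the node data is not copied.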
If this is an assignment and you must implement your own sort, and you don't already know the length of the list, you have to traverse it once to count its length; you could take advantage of that pass to also choose a good initial pivot for quicksort. If you choose not to use quicksort, you can let your imagination go wild with all kinds of optimizations.
Taking your points in reverse order, to support returning to original order, you can add an extra int field to each list node. Set those values based on the original order, and when you need to return it to the original order, just sort on that field.
As far as the sorting in general goes, you probably want to use something like a merge-sort or possibly a Quick-sort.
You can make that data structure somewhat like this.
struct Elem {
    struct Elem* _next;        /* original (unsorted) order */
    struct Elem* _nextSorted;  /* sorted order */
    ...
};
Then you can use any algo for sorting the list (maybe merge sort)
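As an illustration of the two-chain idea, the sketch below threads `_nextSorted` through the nodes while leaving `_next` alone. Note that it uses a simple insertion sort for brevity, not the merge sort suggested above; the `key` member and the function name `link_sorted` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct Elem {
    int key;                   /* hypothetical sort field */
    struct Elem *_next;        /* original order, never modified here */
    struct Elem *_nextSorted;  /* second chain, ordered by key */
};

/* Build the sorted chain with insertion sort (O(n^2), chosen for brevity).
 * Returns the head of the sorted chain. */
struct Elem *link_sorted(struct Elem *head)
{
    struct Elem *sorted = NULL;
    for (struct Elem *e = head; e != NULL; e = e->_next) {
        struct Elem **slot = &sorted;
        while (*slot != NULL && (*slot)->key <= e->key)
            slot = &(*slot)->_nextSorted;  /* skip smaller or equal keys */
        e->_nextSorted = *slot;
        *slot = e;
        assert(e->_nextSorted == NULL || e->key < e->_nextSorted->key);
    }
    return sorted;
}
```

Walking `_next` from the original head still yields the unsorted order, so "returning to the original order" costs nothing.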
If you want to keep your linked list untouched, you should add information to store the ordered list of elements.
To do so, you can either create a new linked list where each element points to one element of your original linked list. Or you can add one more field in the element of your list like sorted_next.
In any case, you should use a sequential algorithm like mergesort to sort a linked list.
Here is a C source code of mergesort for linked lists that you could reuse for your project.
I guess most of the answers have already covered the usual techniques one could use. As far as figuring out the solution to the problem goes, a trick is to look at the problem and think if the human mind can do it.
Figuring out the original random sequence from a sorted sequence is theoretically impossible unless you use some other means. This can be done by
a)modifying the linked list structure (as mentioned above, you simply add a pointer for the sorted sequence separately). This would work and maybe technically you are not creating a separate linked list, but it is as good as a new linked list - one made of pointers.
b)the other way is to log each transition of the sorting algo in a stack. This allows you to not be dependent on the sorting algorithm you use. For example when say node 1 is shifted to the 3rd position, you could have something like 1:3 pushed to the stack. The notation, of course, may vary. Once you push all the transitions, you can simply pop the stack to take the list back to the original pattern or to any point in between. This is more like a logger.
If you're interested in learning more about the design for loggers, I suggest you read about the Command Pattern.

Inserting a number into a sorted array!

I would like to write a piece of code for inserting a number into a sorted array at the appropriate position (i.e. the array should still remain sorted after insertion)
My data structure doesn't allow duplicates.
I am planning to do something like this:
Find the right index where I should be putting this element using binary search
Create space for this element, by moving all the elements from that index down.
Put this element there.
Is there any other better way?
If you really have an array and not a better data structure, that's optimal. If you're flexible on the implementation, take a look at AA trees: they're rather fast and easy to implement. Obviously, a tree takes more space than an array, and it's not worth it if the number of elements is not big enough to notice the slowness of the blit as compared to pointer magic.
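The plan from the question (binary search for the position, shift the tail, store the value) can be sketched like this. `insert_sorted` is a hypothetical helper working on a fixed-capacity `int` array; `memmove` does the shifting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert `value` into the sorted array a[0 .. *n - 1].
 * Returns 1 if inserted, 0 if it is a duplicate or the array is full. */
int insert_sorted(int *a, size_t *n, size_t capacity, int value)
{
    size_t lo = 0, hi = *n;
    assert(*n <= capacity);
    while (lo < hi) {                    /* find first element >= value */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < *n && a[lo] == value)
        return 0;                        /* duplicates are not allowed */
    if (*n == capacity)
        return 0;                        /* no room left */
    /* Shift the tail one slot to the right, then drop the value in. */
    memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof a[0]);
    a[lo] = value;
    (*n)++;
    return 1;
}
```

The search is O(log n), but the shift is still O(n) in the worst case, which is the cost the answers here are weighing against tree-based structures.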
Does the data have to be sorted completely all the time?
If not, and it is only necessary to access the smallest or the largest element quickly, a binary heap gives constant access time and O(log n) insertion and deletion time.
Moreover, it can satisfy your condition that the memory should be consecutive, since you can implement a binary heap on top of an array (i.e. array[2n+1] is the left child and array[2n+2] is the right child of array[n]).
A heap based implementation of a tree would be more efficient if you are inserting a lot of elements - log n for both locating/removing and inserting operations.

Heaps vs. Binary Trees - How to implement?

When implementing a heap structure, we can store the data in an array such that the children of the node at position i are at positions 2i and 2i+1.
My question is: why don't we use an array to represent binary search trees, instead of dealing with pointers etc.?
Thanks
Personally:
Because using pointers it's easier to grow the data structure size dynamically.
I find it's easier to maintain a binary tree than a heap.
The algorithms to balance, remove and insert elements in the tree alter only pointers and do not move the elements physically as in a vector.
And so on...
If the position of all children is statically precomputed like that, then the array essentially represents a completely full, completely balanced binary tree.
Not all binary trees in "real life" are completely full and perfectly balanced. If you should happen to have a few especially long branches, you'd have to make your whole array a lot larger to accommodate all nodes at the bottom-most level.
If an array-bound binary tree is mostly empty, most of the array space is wasted.
If only some of the tree's branches are deep enough to reach to the "bottom" of the array, there's also a lot of space being wasted.
If the tree (or just one branch) needs to grow "deeper" than the size of the array will allow, this would require "growing" the array, which is usually implemented as copying to a larger array. This is a time-expensive operation.
So: Using pointers allows us to grow the structure dynamically and flexibly. Representing a tree in an array is a nice academic exercise and works well for small and simple cases but often does not fulfill the demands of "real" computing.
Mainly because the recursive tree allows for very simple code. If you flatten the tree into an array, the code becomes really complex because you have to do a lot of bookkeeping which the recursive algorithm does for you.
Also, a tree of height N can have anything between N+1 and 2^(N+1)-1 nodes. Only the actual nodes will need memory. If you use an array, you must always allocate space for all nodes (even the empty ones) unless you use a sparse array (which would make the code even more complex). So while it is easy to keep a sparse tree of height 100 in memory, it would be problematic to find a computer which can allocate 20282409603651670423947251286008 bytes of RAM (that is 2^101-1 slots at 8 bytes each).
To insert an element into a heap, you can place it anywhere and swap it with its parent until the heap constraint is valid again. Swap-with-parent is an operation that keeps the binary tree structure of the heap intact. This means a heap of size N will be represented as an N-cell array, and you can add a new element in logarithmic time.
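A short sketch of that swap-with-parent insertion for a min-heap. It assumes 0-based indexing, so the parent of slot i is (i - 1) / 2 and the children are 2i+1 and 2i+2, rather than the 1-based 2i and 2i+1 used in the question; `heap_push` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Push `value` onto the min-heap stored in heap[0 .. *n - 1].
 * Returns 1 on success, 0 if the array is full. */
int heap_push(int *heap, size_t *n, size_t capacity, int value)
{
    assert(*n <= capacity);
    if (*n == capacity)
        return 0;
    size_t i = (*n)++;                  /* start at the first free slot */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap[parent] <= value)
            break;                      /* heap constraint already holds */
        heap[i] = heap[parent];         /* move the parent down */
        i = parent;
    }
    heap[i] = value;
    return 1;
}
```

The loop climbs one level per iteration, so an insertion touches at most about log2(N) slots and the N elements always occupy exactly the first N cells.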
A binary search tree can be represented as an array of size N using the same representation structure as a heap (children 2n and 2n+1), but inserting an element this way is a lot harder, because unlike the heap constraint, the binary search tree constraint requires rotations to be performed to retrieve a balanced tree. So, either you do manage to keep an N-node tree in an N-cell array at a cost higher than logarithmic, or you waste space by keeping the tree in a larger array (if my memory serves, a red-black tree could waste as much as 50% of your array).
So, a binary search tree in an array is only interesting if the data inside is constant. And if it is, then you don't need the heap structure (children 2n and 2n+1) : you can just sort your array and use binary search.
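For constant data, the sort-once-then-search approach needs nothing beyond the C standard library. A minimal sketch, where `prepare` and `contains` are hypothetical wrapper names around `qsort` and `bsearch`:

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison of two ints, usable by both qsort and bsearch. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort the data once, up front. */
void prepare(int *data, size_t n)
{
    assert(data != NULL);
    qsort(data, n, sizeof data[0], cmp_int);
}

/* Returns 1 if `key` is present in the sorted array, 0 otherwise. */
int contains(const int *data, size_t n, int key)
{
    assert(data != NULL);
    return bsearch(&key, data, n, sizeof data[0], cmp_int) != NULL;
}
```

Lookups are O(log n) with no per-node pointers and no unused slots, which is why the array-as-tree layout buys nothing extra here.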
As far as I know, we can use Array to represent binary search trees.
But it is more flexible to use pointers.
The array based implementation is useful if you need a heap that is used as a priority queue in graph algorithms. In that case, the elements in the heap are constant, you pop the top most element and insert new elements. Removing the top element (or min-element) requires some re-balancing to become a heap again, which can be done such that the array is reasonably balanced.
A reference for this is the algorithm by Goldberg and Tarjan about efficiently computing optimal network flow in directed graphs, iirc.
Heap data structure is a complete binary tree unlike BST. Hence, using arrays is not of much use for BST.

Resources