Representation of Tree Data Structures


I've studied that Data Structures can be classified into Linear (arrays, stacks, queues and linked-lists) and Non-Linear (trees and graphs) data structures.
flowchart of data structures -- image source: medium
Now, my question is that if linked-lists are "linear" data structures, then why are they used to implement trees, which are non-linear? Since trees have nodes that consist of the keys (or values) and pointers to more child nodes, aren't trees just more complex linked-lists?
If they are, then how is the statement "linked-lists are linear data structures" justified?
This is how I actually got this question:
I am currently learning data structures in C. So in order to implement the Tree data structure, I use structs, which consist of keys and pointers to the left and right children of each node.
typedef struct Node
{
    int key;
    struct Node *left;
    struct Node *right;
} Node;
Then I wondered whether I'm essentially implementing a linked-list (since linked-lists are also built from structs in C).

You are confusing different things.
A tree is a logical data structure: a special case of a directed graph without cycles.
A tree can be implemented in various ways: as an array that stores the indices of each node's children, with actual pointers to the children's memory (as in your example), and in other ways. Your struct is one specific implementation of a tree, but there are others.
A 'linked list' typically refers to a specific implementation in which elements point to each other's memory. In a singly linked list each element points to the next one; in a doubly linked list each element points to both the previous and the next one.
If you implement trees with pointers and each node has only one child, then this is a special case where the implementation resembles a linked list.
Note that a linked list may contain a loop, while a tree never does: with a loop it is no longer a tree but a general graph (by definition).
Also, a tree node usually points only to its children and not back to its parent, whereas a doubly linked list node also points back to the previous element.
So a linked list is one implementation, using pointers, of the logical data structure called a 'list'.
A list can be implemented in other ways (arrays, histograms, hash tables that count the occurrences of each element, skip lists for O(log(n)) search, etc.).
A tree is a logical data structure that is commonly implemented using pointers, but it has other implementations as well.
When a tree is implemented using pointers, each single branch of the tree resembles a linked list, and this is the source of your confusion.
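To make the two points above concrete, here is a minimal sketch in C. It assumes nothing beyond the Node struct from the question; IndexNode, NO_CHILD and the helper functions are names made up for this illustration. It shows the same kind of tree stored once with pointers and once as an array of child indices, plus a walk along "right" links only, which is exactly a linked-list traversal.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CHILD (-1)   /* hypothetical marker for "no child" in the array form */

/* Pointer-based node, as in the question. */
typedef struct Node {
    int key;
    struct Node *left;
    struct Node *right;
} Node;

/* Index-based node: children are positions in an array, not addresses. */
typedef struct {
    int key;
    int left;    /* index of the left child, or NO_CHILD */
    int right;   /* index of the right child, or NO_CHILD */
} IndexNode;

/* Count the nodes of a pointer-based tree. */
static int count_ptr(const Node *n)
{
    return n == NULL ? 0 : 1 + count_ptr(n->left) + count_ptr(n->right);
}

/* Count the nodes of an index-based tree rooted at position i. */
static int count_idx(const IndexNode *nodes, int i)
{
    if (i == NO_CHILD)
        return 0;
    return 1 + count_idx(nodes, nodes[i].left) + count_idx(nodes, nodes[i].right);
}

/* Follow only the right links. On a tree where every node has a single
   (right) child, this loop is identical to walking a singly linked list. */
static int right_chain_length(const Node *n)
{
    int len = 0;
    for (; n != NULL; n = n->right)
        len++;
    return len;
}
```

Both representations describe the same logical tree; only the way a "link" is stored differs.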

Related

How to serialize a Graph-like AVL tree to disk?

I know it sounds weird, but this is it. I have a data structure which is basically a modified AVL tree. Each node of the structure has a left child and a right child. These core pointers (left and right) are used to link all data nodes together and to keep the data structure balanced (AVL rotations) to improve searching. But those are not the only pointers in the structure: there are others that can point to any random node in the tree (which creates the graph-like analogy).
The tree is built at runtime through user interaction (CLI). The user is also responsible for creating all the different links between the nodes.
An example of such a data structure could be (Didn't start coding yet, it's only prototyping):
struct node {
    struct node *left;
    struct node *right;
    struct node *links[NUM]; // Points to any random node in the tree.
    /* Probably many other fields here that could be either pointers
       or other data types */
};
Now, everything is in RAM. Once the user wants to exit, all the data nodes (the whole tree) should be saved to a file in binary mode (for later reloading, so this must be taken into consideration).
It's basically easy to save the AVL tree using one of the recursive tree traversal algorithms (in that case the question would be a duplicate, because solutions already exist on SO). But in my case, I have to preserve all the arbitrarily created links between the nodes.
What would be the most efficient way in time and space?

You could dump your data structure as is (including the pointer values) and, in the binary blob of each node, also store its address. When reloading the data structure, you dynamically allocate your nodes and store their new addresses in a hash table whose keys are the old addresses. In a final pass, you walk the hash table sequentially (not looking entries up by old address), retrieve the new address of each node, and update its pointer fields from old addresses to new addresses, again using the hash table as a translation table (keyed by the old addresses).

Choose a unique index number for each node, and use it to serialize the links.
This will likely take two traversal passes -- one to set the index number, and one to do the serialization. Add an integer field to your node to hold the index number; you shouldn't need any other memory overhead.
Alternately, if you manage your tree nodes by storing them in an array or std::vector, you will already have an index number handy, and you won't need an additional index field. Also, you can store all your links as indices instead of pointers, so you can just serialize your container as-is.
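A rough sketch of the two passes in C, under some assumptions: the node layout follows the question's prototype with an added index field, while key, struct record, NO_NODE and the function names are invented here for illustration. Pass 1 numbers the nodes in preorder; pass 2 fills a flat array of records in which every pointer is replaced by the target's index.

```c
#include <assert.h>
#include <stddef.h>

#define NUM 2          /* number of extra links per node (assumed value) */
#define NO_NODE (-1)   /* index written in place of a NULL pointer */

struct node {
    int key;                  /* stand-in for the real payload */
    int index;                /* filled in by assign_indices() */
    struct node *left;
    struct node *right;
    struct node *links[NUM];  /* may point to any node in the tree */
};

/* Pointer-free image of a node, suitable for writing to disk. */
struct record {
    int key;
    int left, right;
    int links[NUM];
};

static int index_of(const struct node *n)
{
    return n != NULL ? n->index : NO_NODE;
}

/* Pass 1: number the nodes in preorder. Returns the next unused index,
   which equals the node count when called as assign_indices(root, 0). */
static int assign_indices(struct node *n, int next)
{
    if (n == NULL)
        return next;
    n->index = next++;
    next = assign_indices(n->left, next);
    return assign_indices(n->right, next);
}

/* Pass 2: write each node into out[n->index], storing indices, not pointers.
   out must have room for as many records as pass 1 counted. */
static void fill_records(const struct node *n, struct record *out)
{
    if (n == NULL)
        return;
    struct record *r = &out[n->index];
    r->key = n->key;
    r->left = index_of(n->left);
    r->right = index_of(n->right);
    for (int i = 0; i < NUM; i++)
        r->links[i] = index_of(n->links[i]);
    fill_records(n->left, out);
    fill_records(n->right, out);
}
```

The resulting array contains no addresses, so it can be written with a single fwrite and read back into an array on reload; turning indices back into pointers is then just &nodes[index].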

Implementing arrays using linked lists (and vice versa)

After learning about arrays and linked lists in class, I'm curious about whether arrays can be used to create linked lists and vice versa.
To create a linked list using an array, could I store the first value of the linked list at index 0 of the array, the pointer to the next node at index 1 of the array, and so on? I guess I'm confused because the "pointer to next" seems redundant, given that we know that the index storing the value of the next node will always be: index of value of current node + 2.
I don't think it's possible to create an array using a linked list, because an array involves contiguous memory, but a linked list can have nodes stored in different parts of computer memory. Is there some way to get around this?
Thanks in advance.

The array-based linked list is generally defined in a 2-dimensional array.
Benefit: The list will only take up to a specific amount of memory that is initially defined.
Downside: The list can only contain a specific, predefined number of items.
As a singly linked list, the data structure has to hold a head pointer; in this specific implementation it is an int. The head of the list holds the index of the first node. The first node holds the index of the next node, and so on. The last node in the list holds a next value of -1, which indicates the end of the list. Because indices are taken as elements are added to the structure, a free-list head is also required. This free list is incorporated into the same 2-dimensional array. Just as the head is an int, the free-list pointer is an int.
Now the data structure is composed of 3 major elements: the head pointer, the free head pointer and the 2-dimensional array. The list has to be initialized correctly before it can be used. The list should be initialized as below.
Reference is this link
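The original example did not survive, so what follows is only a sketch of the scheme described above, not the referenced code. It assumes a CAPACITY x 2 int array (column 0 for the value, column 1 for the next index); ArrayList, CAPACITY, END and the function names are hypothetical.

```c
#include <assert.h>

#define CAPACITY 8     /* fixed number of slots (assumed value) */
#define END (-1)       /* "next" value marking the end of a chain */

enum { VALUE = 0, NEXT = 1 };   /* column meanings in the 2-D array */

typedef struct {
    int cells[CAPACITY][2];
    int head;        /* index of the first list node, or END */
    int free_head;   /* index of the first unused slot, or END */
} ArrayList;

/* Initially the list is empty and every slot is chained into the free list. */
static void list_init(ArrayList *l)
{
    for (int i = 0; i < CAPACITY; i++) {
        l->cells[i][VALUE] = 0;
        l->cells[i][NEXT] = (i + 1 < CAPACITY) ? i + 1 : END;
    }
    l->head = END;
    l->free_head = 0;
}

/* Take a slot from the free list and link it in at the front.
   Returns 0 on success, -1 if the array is full. */
static int list_push_front(ArrayList *l, int value)
{
    if (l->free_head == END)
        return -1;
    int i = l->free_head;
    l->free_head = l->cells[i][NEXT];
    l->cells[i][VALUE] = value;
    l->cells[i][NEXT] = l->head;
    l->head = i;
    return 0;
}

/* Unlink the first node and return its slot to the free list.
   Returns 0 and stores the value in *value, or -1 if the list is empty. */
static int list_pop_front(ArrayList *l, int *value)
{
    if (l->head == END)
        return -1;
    int i = l->head;
    *value = l->cells[i][VALUE];
    l->head = l->cells[i][NEXT];
    l->cells[i][NEXT] = l->free_head;
    l->free_head = i;
    return 0;
}

/* Walk the chain from head, counting nodes. */
static int list_length(const ArrayList *l)
{
    int n = 0;
    for (int i = l->head; i != END; i = l->cells[i][NEXT])
        n++;
    return n;
}
```

Note that the "next" column is not redundant here: after a few insertions and removals, logically adjacent nodes are generally no longer adjacent in the array.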

You could store a linked list in an array, but only in the sense that you have an ordered list. As you say, you do not need pointers as you know the order (it's explicit in the array ordering). The main differences for choosing between an array or linked list are:
Arrays are "static" in that items are fixed in their positions. You can't remove an element and have the array automatically shuffle the following elements down. Of course, you can skip "empty" elements during iteration, but this requires specific logic. With a linked list, if you remove an element, it's gone. With an array, you have to shuffle all subsequent elements down.
As such, linked lists are often used where insertion and deletion of elements are the most common activities. Arrays are most often used where access is required (since direct access by index is faster).
Another area where you may see benefits of linked lists over arrays is in sorting (where sorting is required or frequent). The reason is that linked list sorts require only pointer manipulation, whereas array sorting requires swapping and shuffling. That said, many sorting algorithms create new arrays anyway (merge sort is typical), which reduces this overhead (though it requires the same memory again for the sorted array).
You can mix your metaphors somewhat if, for example, you enable your linked list to be marked "read-only". That is, you could create an array of pointers to each node in your linked list. That way, you can have indexed access to your linked list. The array becomes outdated (in the way described above) once elements are added to or removed from your linked list (hence the read-only aspect).
So, to answer your specific questions:
1) There's no value in doing this - as per the details above
2) You haven't really provided enough information to answer the question of contiguous memory allocation. It depends on a lot: OS, architecture, compiler implementation. You haven't even mentioned the programming language. In short, though, choosing between a linked list and an array has little to do with contiguous memory allocation and more to do with usage. For instance, the Java LinkedList and ArrayList classes both represent a List implementation but are specialised for different usage patterns. LinkedList is expected to perform better for "lists" that are modified heavily (although in tests done a few years ago the difference proved to be negligible; I'm not sure of the state in the latest versions of Java).
Also, you wouldn't typically "create an array with a linked list" or vice versa. They're both abstract data structures used in building larger components. They represent a list of something in a wider context (e.g. a department has a list of employees). Each datatype just has usage benefits. Similarly, you might use a set, queue, stack, etc. It depends on your usage needs.
I really hope I haven't confused you further!

Binary search tree index implementation using symbol tables

I am reading about index implementation using symbol tables in Algorithms in C++ by Robert Sedgewick.
Below is a snippet from the book:
We can adapt binary search trees to build indices in precisely the
same manner as we provided indirection for sorting and for heaps.
Arrange for keys to be extracted from items via the key member
function, as usual. Moreover, we can use parallel arrays for the
links, as we did for linked lists. We use three arrays, one each for
the items, left links, and right links. The links are array indices
(integers), and we replace link references such as
x = x->l
in all our code with array references such as
x = l[x].
This approach avoids the cost of dynamic memory allocation for each
node—the items occupy an array without regard to the search function,
and we preallocate two integers per item to hold the tree links,
recognizing that we will need at least this amount of space when all
the items are in the search structure. The space for the links is not
always in use, but it is there for use by the search routine without
any time overhead for allocation. Another important feature of this
approach is that it allows extra arrays (extra information associated
with each node) to be added without the tree-manipulation code being
changed at all. When the search routine returns the index for an item,
it gives a way to access immediately all the information associated
with that item, by using the index to access an appropriate array.
This way of implementing BSTs to aid in searching large arrays of
items is sometimes useful, because it avoids the extra expense of
copying items into the internal representation of the ADT, and the
overhead of allocation and construction by new. The use of arrays is
not appropriate when space is at a premium and the symbol table grows
and shrinks markedly, particularly if it is difficult to estimate the
maximum size of the symbol table in advance. If no accurate size
prediction is possible, unused links might waste space in the item
array.
My questions on the above text are:
What does the author mean by "we can use parallel arrays for the links as we did for linked lists"? What does this statement mean, and what are parallel arrays?
What does the author mean by saying that links are array indices and that we replace link references such as x = x->l with x = l[x]?
What does the author mean by "Another important feature of this approach is that it allows extra arrays (extra information associated with each node) to be added without the tree-manipulation code being changed at all."?

You appear to have edited the text to take out the useful references. Either that or you have an earlier version of the text.
My third edition states that the index builds are covered in section 9.6, where it covers the process, and the parallel arrays are explained in chapter 3. The parallel arrays are simply storing the payload (the keys and possibly data that are held in the tree) and left/right pointers in three or more separate arrays, using the index to tie them together (x = left[x]). In that case, you may end up with something like:
int leftptr[100];
int rightptr[100];
char *payload[100];
and so on. In that example, node # 74 would have its data stored in payload[74], and the left and right "pointers" (actually indexes) stored in leftptr[74] and rightptr[74] respectively.
This is in contrast to having a single array of structures with the structure holding payload and pointers together (x = x->left;):
struct sNode {
    struct sNode *left, *right;
    char payload[];
};
So, for your specific questions:
Parallel arrays are simply separating the tree structure information from the payload information and using the index to tie together information from those arrays.
Since you're using arrays for the links (and these arrays now hold array indexes rather than pointers), you no longer use x = x->left to move to the left child. Instead you use x = left[x].
The tree manipulation is only interested in the links. By having the links separated from the payload (and other possibly useful information), the code for manipulating tree structure can be simpler.

If you haven't already, you should flip back in the book to the section on linked-lists where he says the technique was used previously (it's probably explained there).
Parallel arrays means we don't have a struct to hold the node information.
struct node {
    int data;
    struct node *left;
    struct node *right;
};
Instead, we have arrays.
int data[SIZE];
int left[SIZE];
int right[SIZE];
These are parallel arrays because we will use the same index to access the data and links. The node is represented in our code by an index, not a pointer. So for node 4, the data is at
data[4];
The left link is at
left[4];
Adding more information at the node can be done by creating yet another array of the same size.
int extra[SIZE];
The extra data for node 4 will be at
extra[4];
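Putting it together, here is a small sketch of a BST whose links live in parallel arrays. It reuses the data/left/right arrays from above; NIL, used, root and the two function names are my own additions for illustration, not the book's code. Note how x = left[x] takes the place of x = x->left.

```c
#include <assert.h>

#define SIZE 100
#define NIL (-1)        /* plays the role of a NULL link */

static int data[SIZE];   /* payload of node i */
static int left[SIZE];   /* index of node i's left child, or NIL */
static int right[SIZE];  /* index of node i's right child, or NIL */
static int used = 0;     /* number of slots handed out so far */
static int root = NIL;

/* Insert key and return the index of the new node, or NIL if full.
   Duplicate keys go to the right subtree. */
static int bst_insert(int key)
{
    if (used == SIZE)
        return NIL;
    int n = used++;
    data[n] = key;
    left[n] = right[n] = NIL;
    if (root == NIL) {
        root = n;
        return n;
    }
    int x = root;
    for (;;) {
        int *link = key < data[x] ? &left[x] : &right[x];
        if (*link == NIL) {
            *link = n;
            return n;
        }
        x = *link;          /* the array version of x = x->left / x->right */
    }
}

/* Return the index of a node holding key, or NIL if there is none. */
static int bst_search(int key)
{
    int x = root;
    while (x != NIL && data[x] != key)
        x = key < data[x] ? left[x] : right[x];
    return x;
}
```

Because bst_search returns an index, any additional parallel array (such as extra[] above) can be read with that same index without touching the tree code.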

Flatten a Tree into an Array

I am looking for the best way to place a tree into an array.
The idea is to follow this principle: Array Implementation of Trees
but I'm stuck on how to know which nodes are the children and which nodes are at the same level, because I'm not using a binary tree.
I might have to store ASCII, but I can't simply allow arrays of 256 pointers!
Any idea would be welcome.
The purpose of this is to send an array (tree) to my GPU, instead of using structures.

Well, here is my idea of converting tree into an array.
Take an array of size MAX_VAL, which is the total number of nodes in the tree. The element type of the array should be the same as that of a node, but with one extra field: the index of its parent. You store each node in this way. Store the root node at the first position, say 1. The child nodes of this node are then stored subsequently, with the extra field holding 1 (since this is where the root was stored).
Apply this procedure on all nodes and you are done. You can get back the tree, by a simple recursive call on each node.
Hope this helps. :) :)
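For illustration, a minimal sketch of that layout in C. It deviates from the description in one labeled way: the root sits at index 0 rather than 1, with NO_PARENT as its parent. FlatNode and the helper names are made up for this example.

```c
#include <assert.h>

#define NO_PARENT (-1)   /* parent value stored for the root */

/* One array element: the node's own data plus the index of its parent. */
typedef struct {
    char value;
    int parent;
} FlatNode;

/* Collect the indices of all children of `node` into out (at most cap of
   them are stored). Returns the total number of children found.
   This is a linear scan over the whole array. */
static int children_of(const FlatNode *nodes, int count, int node,
                       int *out, int cap)
{
    int found = 0;
    for (int i = 0; i < count; i++) {
        if (nodes[i].parent == node) {
            if (found < cap)
                out[found] = i;
            found++;
        }
    }
    return found;
}

/* Depth of a node: how many parent links lead up to the root. */
static int depth_of(const FlatNode *nodes, int node)
{
    int d = 0;
    while (nodes[node].parent != NO_PARENT) {
        node = nodes[node].parent;
        d++;
    }
    return d;
}
```

Two nodes are at the same level when depth_of gives the same result for both. Finding children costs a full scan here, which is the price of storing only the parent index.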

Ahnentafel lists are very big if not near-perfectly balanced. My guess is your tree isn't going to be balanced, so the cost will outweigh the benefit of implicit parent/child pointers. I've never seen a non-binary Ahnentafel list, but I assume it's possible (were you asking for the implicit equations?).
Could you keep a sorted list of child pointers for each node (ASCII character + pointer/index)? In this case it might be best, as others suggest, to construct the tree using pointers and allow the children to grow. Then pack all the nodes into a list: work out an order to place the nodes, use prefix sums for their offsets into the array, store the position indices on each node and finally copy the children lists into the array (replacing the children pointers with list indices can be done by following the pointers and querying the index from the previous step).
Traversing to a child in CUDA won't be constant time, but since the order is known you can use a binary search to speed things up.
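A sketch of what the packed layout and the lookup could look like, written as plain C so the same logic could be ported to a device function. The layout is an assumption on my part: PackedNode, Edge and find_child are hypothetical names, and each node's children are assumed to be stored contiguously in edges[], sorted by character.

```c
#include <assert.h>

/* Per-node header: where this node's children start in edges[] and how
   many there are. first_child would come from the prefix sum step. */
typedef struct {
    int first_child;
    int child_count;
} PackedNode;

/* One child entry: the ASCII character on the edge and the child's index. */
typedef struct {
    unsigned char label;
    int target;
} Edge;

/* Binary search among the sorted children of `node` for `label`.
   Returns the child's node index, or -1 if there is no such child. */
static int find_child(const PackedNode *nodes, const Edge *edges,
                      int node, unsigned char label)
{
    int lo = nodes[node].first_child;
    int hi = lo + nodes[node].child_count;   /* exclusive upper bound */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (edges[mid].label == label)
            return edges[mid].target;
        if (edges[mid].label < label)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

With at most 256 possible children per node, the search takes at most 8 steps, and each node only pays for the children it actually has.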

Is there strong reason not to include multiple node pointers in a node to use in more than one data structure?

Take for example the assignment I'm working on. We're to use a binary search tree for one piece of a set of data and then a linked list for another piece in the set. The suggested method by the professor was:
struct treeNode
{
    data * item;
    treeNode *left, *right;
};
struct listNode
{
    data * item;
    listNode *next, *prev;
};
class collection
{
public:
    ........
};
Where data is a class containing the particulars of each record. Obviously as it's set up, a treeNode can't exist in the linked list.
Wouldn't it be much simpler to:
struct node
{
    data * item;
    node *listNext, *listPrev, *treeLeft, *treeRight;
};
then we can declare:
node * listHead;
node * treeRoot;
and include both insertion algorithms into the class.
Is there something I'm missing?

Actually, the data items are to be inserted into both lists. The (mundane) purpose of the assignment is to sort the data sets in two different elements in the set.
So with that said, wouldn't I be saving memory? Combining the 2 nodes I end up with 5 pointers; if I left them separate I'd be using 6. Also, I really only have one group of data this way. If I had 250 data items to keep track of, I'd have one group of 1250 pointers instead of 2 lists of 750. Maybe I'm misunderstanding what actually gets allocated with pointer calls.

You can do that, but you are wasting memory with the extra pointers. Also, it tends to be more confusing to mix types like that. Am I correct in assuming that the data is either put into the list or put into the tree, but not inserted into both? There's really not much reason to have them both use the same structure if they are different data types anyway. If you are inserting the same data into both types, you could potentially switch from traversing the tree to traversing the list if you had any use for such an action.
Since you're inserting the data into both lists, it would save memory to use your composite node structure. I would insert into the binary tree first, then insert the allocated node into the linked list. You wouldn't really end up with a pure linked list or a pure binary search tree, but the structure could be traversed like either one.
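Roughly, that could look like the sketch below. It is written in plain C rather than the C++ of the question, and it makes assumptions: an int key stands in for data *item, the list insert is a simple push to the front, and collection_insert and the other function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Composite node: one allocation that is a member of both structures. */
typedef struct node {
    int key;   /* stand-in for the real data *item */
    struct node *listNext, *listPrev, *treeLeft, *treeRight;
} node;

typedef struct {
    node *listHead;
    node *treeRoot;
} collection;

/* Standard unbalanced BST insert using the tree links only. */
static void tree_insert(collection *c, node *n)
{
    node **slot = &c->treeRoot;
    while (*slot != NULL)
        slot = n->key < (*slot)->key ? &(*slot)->treeLeft
                                     : &(*slot)->treeRight;
    n->treeLeft = n->treeRight = NULL;
    *slot = n;
}

/* Push onto the front of the doubly linked list using the list links only. */
static void list_push_front(collection *c, node *n)
{
    n->listPrev = NULL;
    n->listNext = c->listHead;
    if (c->listHead != NULL)
        c->listHead->listPrev = n;
    c->listHead = n;
}

/* Insert into the tree first, then thread the same node into the list. */
static void collection_insert(collection *c, node *n)
{
    tree_insert(c, n);
    list_push_front(c, n);
}
```

The two sets of links never interfere with each other, so the same nodes can be walked in tree order via treeLeft/treeRight or in list order via listNext.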

What was the answer?
If your data is less than (hmmm) megabytes, don't worry about memory consumption. 1 or 2 Gigabytes is typical in normal computers today.
How big are the items? 32 char? 64k of compressed multimedia? Something big?
How reasonable is it to organize one item using both techniques? If the data are really the same, then a 5 pointer structure is interesting- someone could find a node in one ordering and then browse related nodes in the other ordering.
Are the items unrelated, some chalk, some cheese? Are they multidimensional? personnel records? Audio file descriptions? Recipes?
In school, a good teacher is trying to give you experience with common techniques and disciplines. Just like art class, or composition. Pencil, pastels, 5 paragraph essay. So the teacher might want you to write two different classes & constructors. Use one struct for one part of the data, different one for other data. Or the same. Just because.
Outside of school, the data comes in a format and there are operations desired on it/with it. "Use cases" are stories about how data is used, what has to be kept, what algorithms are used.
The point of this might be bimodal searching, 2 pairs of orthogonal pointers. It might be unions, where each item is associated with a list or a tree, but not both at the same time. It might be a flurry of lightweight subsets, trees and lists, that are compared and contrasted...
When in doubt, "data structures + algorithms = programs". But it pays to know what point the teacher is trying to make, and whether you want to follow their lead. (Usually, in school, you do.)
