I have a very simple binary tree structure, something like:
struct nmbintree_s {
    unsigned int size;
    int (*cmp)(const void *e1, const void *e2);
    void (*destructor)(void *data);
    nmbintree_node *root;
};
struct nmbintree_node_s {
    void *data;
    struct nmbintree_node_s *right;
    struct nmbintree_node_s *left;
};
Sometimes I need to extract a 'tree' from another, and I need to get the size of the 'extracted tree' in order to update the size of the initial 'tree'.
I was thinking of two approaches:
1) Using a recursive function, something like:
unsigned int nmbintree_size(struct nmbintree_node* node) {
    if (node == NULL) {
        return(0);
    }
    return( nmbintree_size(node->left) + nmbintree_size(node->right) + 1 );
}
2) A preorder / inorder / postorder traversal done in an iterative way (using stack / queue) + counting the nodes.
What approach do you think is more 'memory failure proof' / performant ?
Any other suggestions / tips ?
NOTE: I am probably going to use this implementation in the future for small projects of mine. So I don't want to unexpectedly fail :).
Just use a recursive function. It's simple to implement this way and there is no need to make it more complicated.
If you were doing it "manually" you'd basically end up implementing the same thing, except that you wouldn't use the call stack for temporary variables but your own stack. Usually this won't have any advantages that outweigh the more complicated code.
If you later find out that a substantial amount of time in your program is spent calculating the sizes of trees (which probably won't happen), you can still profile things and see how a manual implementation performs. But then it might also be better to make algorithmic improvements, like keeping track of the change in size during the extraction process itself.
If your "very simple" binary tree isn't balanced, then the recursive option is scary, because of the unconstrained recursion depth. The iterative traversals have the same time problem, but at least the stack/queue is under your control, so you needn't crash. In fact, with flags and an extra pointer in each node and exclusive access, you can iterate over your tree without any stack/queue at all.
Another option is for each node to store the size of the sub-tree below it. This means that whenever you add or remove something, you have to track all the way up to the root updating all the sizes. So again if the tree isn't balanced that's a hefty operation.
If the tree is balanced, though, then it isn't very deep. All options are failure-proof, and performance is estimated by measurement :-) But based on your tree node struct, either it's not balanced or else you're playing silly games with flags in the least significant bits of pointers...
There might not be much point being very clever with this. For many practical uses of a binary tree (in particular if it's a binary search tree), you realise sooner rather than later that you want it to be balanced. So save your energy for when you reach that point :-)
How big is this tree, and how often do you need to know its size? As sth said, the recursive function is the simplest and probably the fastest.
If the tree is like 10^3 nodes, and you change it 10^3 times per second, then you could just keep an external count, which you decrement when you remove a node, and increment when you add one. Other than that, simple is best.
Personally, I don't like any solution that requires decorating the nodes with extra information like counts and "up" pointers (although sometimes I do it). Any extra data like that makes the structure denormalized, so changing it involves extra code and extra chances for errors.
Can anyone explain how to clone a binary tree that has random pointers in addition to the left and right ones? Every node has the following structure.
struct node {
    int key;
    struct node *left, *right, *random;
};
This is a very popular interview question and I am able to figure out the solution based on hashing (which is similar to cloning linked lists). I tried to understand the solution given in the link (approach 2), but I am not able to figure out what it is trying to convey, even after reading the code.
I am not looking for the hashing-based solution, as that one is intuitive and pretty straightforward. Please explain the solution based on modifying the binary tree and then cloning it.
The solution presented is based on the idea of interleaving both trees, the original one and its clone.
For every node A in the original tree, its clone cA is created and inserted as A's left child. The original left child of A is shifted one level down in the tree structure and becomes a left child of cA.
For each node B, which is a right child of its parent P (i.e., B == P->right), a pointer to its clone node cB is copied to a clone of its parent.
      P                 P
     / \               / \
    /   \             /   \
   A     B          cP     B
  /       \        /  \   / \
 /         \      /    \ /   \
X           Z    A     cB     Z
                /        \   /
              cA          cZ
              /
             X
            /
           cX
Finally we can extract the cloned tree by traversing the interleaved tree and unlinking every other node on each 'left' path (starting from root->left) together with its 'rightmost' descendants path and, recursively, every other 'left' descendant of those and so on.
What's important, each cloned node is a direct left child of its original node. So in the middle part of the algorithm, after inserting the cloned nodes but before extracting them, we can traverse the whole tree walking on original nodes, and whenever we find a random pointer, say A->random == Z, we can copy the binding into clones by setting cA->random = cZ, which resolves to something like
A->left->random = A->random->left;
This allows cloning random pointers directly and does not require additional hash maps (at the cost of interleaving new nodes into the original tree and extracting them later).
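For instance, that middle pass could look roughly like this in C (a sketch only, using the node struct from the question and assuming phase 1 has already inserted each clone as the left child of its original node):

void set_clone_randoms(struct node *A) {
    if (A == NULL) return;
    struct node *cA = A->left;                    /* the clone of A */
    cA->random = A->random ? A->random->left : NULL;
    set_clone_randoms(cA->left);                  /* A's original left child */
    set_clone_randoms(A->right);                  /* A's original right child */
}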
The interleaving method can be simplified a little, I think.
1) For every node A in the original tree, create a clone cA with the same left and right pointers as A. Then, set A's left pointer to cA.
      P                 P
     / \               /
    /   \             /
   A     B          cP
  / \              /  \
 /   \            /    \
X     Z          A      B
                /      /
              cA     cB
             /  \
            X    Z
           /    /
         cX   cZ
2) Now, given a node and its clone (which is just node.left), the random pointer for the clone is node.random.left (if node.random exists).
3) Finally, the binary tree can be un-interleaved.
I find this interleaving makes reasoning about the code much simpler.
Here is the code:
def clone_and_interleave(root):
    if not root:
        return
    clone_and_interleave(root.left)
    clone_and_interleave(root.right)
    cloned_root = Node(root.data)
    cloned_root.left, cloned_root.right = root.left, root.right
    root.left = cloned_root
    root.right = None  # This isn't necessary, but doesn't hurt either.

def set_randoms(root):
    if not root:
        return
    cloned_root = root.left
    set_randoms(cloned_root.left)
    set_randoms(cloned_root.right)
    cloned_root.random = root.random.left if root.random else None

def unterleave(root):
    if not root:
        return (None, None)
    cloned_root = root.left
    cloned_root.left, root.left = unterleave(cloned_root.left)
    cloned_root.right, root.right = unterleave(cloned_root.right)
    return (cloned_root, root)

def cloneTree(root):
    clone_and_interleave(root)
    set_randoms(root)
    cloned_root, root = unterleave(root)
    return cloned_root
The terminology used in those interview questions is absurdly bad. It's the case of one unwitting knuckle-dragger somewhere calling that pointer the "random" pointer and everyone just nodding and accepting this as if it were some CS mantra from an ivory tower. Alas, it's sheer lunacy.
Either what you have is a tree or it isn’t. A tree is an acyclic directed graph with at most a single edge directed toward any node, and adding extra pointers can’t change it - the things the pointers point to must retain this property.
But when the node has a pointer that can point to any other node, it's not a tree. You've got a proper directed graph, possibly with cycles in it, and looking at it as if it were a tree is silly at this point. It's not a tree. It's just a generic directed graph that you're cloning. So any relevant directed graph cloning technique will work, but the insistence on using the terms "tree" and "random pointer" obscures this simple fact and confuses matters terribly.
This snafu indicates that whoever came up with the question was not qualified to be doing any such interviewing. This stuff is covered in any decent introductory data structures textbook, so you'd think it shouldn't present some astronomical uphill effort to just articulate what you need in a straightforward manner. Let the interviewees deal with users who can't articulate themselves once they get that job - the data structure interview is neither the place nor the time for that. It reeks of stupidity and carelessness, and leaves a permanently bad aftertaste. It's probably yet another thing that ended up in some "interview question bank" because one poor soul got asked it by a careless idiot once and now everyone treats it as gospel. It's yet again the blind leading the blind, and cluelessness abounds.
Copying arbitrary graphs is a well-solved problem, and in all cases you need to retain the state of your traversal somehow. Whether it's done by inserting nodes into the original graph to mark the progress - one could call it intrusive marking - or by adding data to the copy in progress and removing it when done, or by using an auxiliary structure such as a hash, or by doing a repeat traversal to check whether you made a copy of that node elsewhere - is of secondary importance, since the purpose is always the same: to retain the same state information, just encoding it in various ways, trading off speed and memory use (as always).
When thinking of this problem, you need to tell yourself what sort of state you need to finish the copy, and abstract it away, and implement the copy using this abstract interface. Then you can implement it in a few ways, but at that point the copy itself doesn’t obscure things since you look at this simple abstract state-preserving interface and not at the copy process.
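To make that concrete, here is a minimal sketch of the "abstract the copy state" idea; all names are illustrative and not from the question, and the lookup/remember pair is backed by a crude fixed-size table here, but a hash map or intrusive marks would plug in behind the same two calls without touching the copy logic:

#include <stdlib.h>

typedef struct gnode gnode;
struct gnode {
    int    value;
    gnode *out[2];                 /* any fixed set of outgoing edges, "random" included */
};

typedef struct { const gnode *orig; gnode *copy; } pair;
typedef struct { pair map[1024]; int n; } copystate;   /* no bounds check in this sketch */

static gnode *copy_lookup(copystate *s, const gnode *orig) {
    for (int i = 0; i < s->n; i++)
        if (s->map[i].orig == orig) return s->map[i].copy;
    return NULL;
}

static void copy_remember(copystate *s, const gnode *orig, gnode *copy) {
    s->map[s->n].orig = orig;
    s->map[s->n].copy = copy;
    s->n++;
}

gnode *clone_graph(copystate *s, const gnode *orig) {
    if (orig == NULL) return NULL;
    gnode *c = copy_lookup(s, orig);
    if (c != NULL) return c;           /* shared node or cycle: reuse the existing copy */
    c = malloc(sizeof *c);
    copy_remember(s, orig, c);         /* register BEFORE recursing, or cycles loop forever */
    c->value = orig->value;
    for (int i = 0; i < 2; i++)
        c->out[i] = clone_graph(s, orig->out[i]);
    return c;
}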
In real life the choice of any particular implementation highly depends on the amount and structure of data being copied, and the extent you have control over it all. If you’re the one controlling the structure of the nodes, then you’ll usually find that they have some padding that you could use to store a bit of state information. Or you’ll find that the memory block allocated for the nodes is actually larger than requested: malloc will often end up providing a block larger than asked for, and all reasonable platforms have APIs that let you retrieve the actual size of the block and thus check if there’s maybe some leftover space just begging to be used. These APIs are not always fast so be careful there of course. But you see where this is going: such optimization requires benchmarks and a clear need driven by demands of the application. Otherwise, use whatever is least likely to be buggy - ideally a C library that provides data structures that you could use right away. If you need a cyclic graph there are libraries that do just that - use them first.
But boy, do I hate that idiotic “random” name of the pointer. Who comes up with this nonsense and why do they pollute so many minds? There’s nothing random about it. And a tree that’s not a tree is not a tree. I’d fail that interviewer in a split second…
I have multiple instances of 4096 items. I need to search for and find an item on a recurring basis and I'd like to optimize this. Since not all 4096 items may be used, I thought an approach to speed things up would be to use a linked list instead of an array. And whenever I have to search for an item, once I've found it, I'd place it at the head of the list so that next time it comes around, I'd have to do only minimal search (loop) effort. Does this sound right?
EDIT1
I don't think the binary search tree idea is really something I can use, as I have ordered data, like an array, i.e. every node following the previous one is larger, which defeats the purpose, doesn't it?
I have attempted to solve my problem with caching and came up with something like this:
pending edit
But the output I get suggests that it doesn't work the way I'd like it to:
Any suggestions on how I can improve this?
When it comes to performance there is only one important rule: measure it!
In your case you could, for example, have two different considerations: a theoretical runtime analysis and what is really going on on the machine. Both depend heavily on the characteristics of your 4096 items. If your data is sorted you can have an O(log n) search; if it is unsorted it is worst case O(n), etc.
Regarding your idea of a linked list: you might have more hardware cache misses because the data is not stored together anymore (loss of spatial locality), ending up with a slower implementation even if your theoretical consideration is right.
If you are interested in such problems in general, I recommend this cool talk from GoingNative 2013:
http://channel9.msdn.com/Events/GoingNative/2013/Writing-Quick-Code-in-Cpp-Quickly
Worst case, your search is still O(N) unless you sort the array or list, like Brett suggested. Therefore, with a sorted list you increase the complexity of insertion (to insert in order), but your searching will be much faster. What you are suggesting is almost like a "cache." It's hard for us to say how useful that will be without any idea of how often a found item is searched for again in the near term. Clearly there are benefits to caching; it's why we have the whole L1, L2, L3 architecture in memory. But whether it will work out for you is unsure.
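For reference, a minimal sketch of the move-to-front idea from the question (the item struct and its int id key are assumed here purely for illustration): on a hit, the found node is unlinked and relinked at the head, so recently used items stay near the front of the list.

#include <stddef.h>

struct item {
    int id;
    struct item *next;
    /* ... payload ... */
};

struct item *find_mtf(struct item **head, int id) {
    struct item *prev = NULL, *cur = *head;
    while (cur != NULL && cur->id != id) {
        prev = cur;
        cur = cur->next;
    }
    if (cur != NULL && prev != NULL) {   /* found, and not already at the head */
        prev->next = cur->next;          /* unlink */
        cur->next = *head;               /* relink at the front */
        *head = cur;
    }
    return cur;                          /* NULL if not found */
}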
If your data can be put in a binary search tree: http://en.wikipedia.org/wiki/Binary_search_tree
Then you can use a data structure called Splay tree: "A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again" http://en.wikipedia.org/wiki/Splay_tree
Respond to Edit1:
I think if your data element is not large, say, only a few bytes or even tens of bytes, 4096 of them can be fitted into memory. In this case what you need is a hash table. In C++, you use unordered_map. For example, you can define unordered_map<int, ptr_to_your_node_type> and get the element in O(1) if your key type is int.
The fastest search can be O(1) if you design your hash well, and the worst case can be O(n). If these items are large and cannot be fitted into memory, you can use the so-called least recently used (LRU) cache algorithm to save memory.
An example code for LRU cache
#include <list>
#include <unordered_map>
using namespace std;

template <typename K>
class Key_Age {
    list<K> key_list;
    unordered_map<K, typename list<K>::iterator> key_pos;
public:
    void access(K key) {
        key_list.erase(key_pos[key]);
        insert_new(key);
    }
    void insert_new(K key) {
        key_list.push_back(key);
        key_pos[key] = --key_list.end();
    }
    K pop_oldest() {
        K t = key_list.front();
        key_list.pop_front();
        return t;
    }
};

class LRU_Cache {
    int capacity;
    Key_Age<int> key_age;
    unordered_map<int, int> lru_cache;
public:
    LRU_Cache(int capacity): capacity(capacity) {
    }
    int get(int key) {
        if (lru_cache.find(key) != lru_cache.end()) {
            key_age.access(key);
            return lru_cache[key];
        }
        return -1;
    }
    void set(int key, int value) {
        if (lru_cache.count(key) < 1) {
            if ((int)lru_cache.size() == capacity) {
                int oldest_key = key_age.pop_oldest();
                lru_cache.erase(oldest_key);
            }
            key_age.insert_new(key);
            lru_cache[key] = value;
            return;
        }
        key_age.access(key);
        lru_cache[key] = value;
    }
};
I started with a programming assignment. I had to design a DFA based on graphs. Here is the data structure I used for it:
typedef struct n {
    struct n *next[255]; // pointer to the next state; NULL if there is no transition for the input character (denoted by its ASCII value)
    bool start_state;
    bool end_state;
} node;
Now I have a DFA graph-based structure ready with me. I need to use this DFA in several places, and the DFA will get modified in each of them, but I want an unmodified DFA to be passed to each of these functions. One way is to create a copy of this DFA. What is the most elegant way of doing this? (All of the next pointers are initialized either to NULL or to a pointer to another state.)
NOTE:
I want the copy to be created in the called function i.e. I pass the DFA, the called function creates its copy and does operation on it. This way, my original DFA remains undeterred.
MORE NOTES:
From each node of the DFA, I can have a directed edge connecting it to another node. If the transition takes place when the input character is c, then state->next[c] will hold a pointer to the next node. It is possible that several elements of the next array are NULL. Modifying the DFA means both adding new nodes and altering existing nodes.
If you need a private copy on each call, and since this is a linked data structure, I see no way to avoid copying the whole graph (except perhaps to do a copy-on-write to some sub branches if the performance is that critical, but the complexity is significant and so is the chance of bugs).
Had this been C++, you could have done this in a copy constructor, but in C you just need to clone in every function. One way is to clone the entire structure (like Mark suggested) - it's pretty complicated, since you need to track cycles/back edges in the graph (which manifest as pointers to previously visited nodes that you don't want to reallocate, but rather reuse what you've already allocated).
Another way, if you're willing to change your data structure, is to work with arrays - keep all the nodes in a single array of type node. The array should be big enough to accommodate all nodes if you know the limit, or just reallocate it to increase upon demand, and each "pointer" is replaced by a simple index.
Building this array is different - instead of mallocing a new node, use the next available index (keep it on the side), or if you're going to add/remove nodes on the fly, you could keep a queue/stack of "free" indices (populate it at the beginning with 1..N, and pop/push there whenever you need a new location or are about to free an old one).
The upside is that copying would be much faster: since all the links are relative to the instance of the array, you just copy a chunk of contiguous memory (memcpy would now work fine).
Another upside is that the performance of using this data structure should be superior to the linked one, since the memory accesses are spatially close and easily prefetchable.
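A rough sketch of that index-based layout (the names, the state bound, and the NO_STATE sentinel are mine, not from the question): every transition stores an index into one nodes[] array instead of a pointer, so copying a whole DFA is a single memcpy.

#include <stdbool.h>
#include <string.h>

#define MAX_STATES 128          /* assumed upper bound on the number of states */
#define NO_STATE   (-1)         /* plays the role of a NULL pointer */

typedef struct {
    int  next[255];             /* index of the target state, or NO_STATE */
    bool start_state;
    bool end_state;
} state_t;

typedef struct {
    state_t nodes[MAX_STATES];
    int     used;               /* number of states currently in use */
} dfa_t;

/* Copying the whole DFA is one memcpy, and every "link" survives because
   it is an index relative to the copy, not an absolute pointer. */
void dfa_copy(dfa_t *dst, const dfa_t *src) {
    memcpy(dst, src, sizeof *dst);
}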
You'll need to write a recursive function that visits all the nodes, with a global dictionary that keeps track of the mapping from the source graph nodes to the copied graph nodes. This dictionary will be basically a table that maps old pointers to new pointers.
Here's the idea. I haven't compiled it or debugged it...
#include <stdlib.h>

struct {
    node* existing;
    node* copy;
} dictionary[MAX_NODES] = {0};   /* MAX_NODES: an assumed upper bound on the number of DFA states */

node* do_copy(node* existing)
{
    node* copy;
    int i;

    if (existing == NULL) return NULL;

    /* If this node was copied before, reuse that copy; this is what keeps
       shared sub-graphs (and any cycles) from being duplicated. */
    for (i = 0; dictionary[i].existing; i++) {
        if (dictionary[i].existing == existing) return dictionary[i].copy;
    }
    copy = (node*)malloc(sizeof(node));
    dictionary[i].existing = existing;
    dictionary[i].copy = copy;
    for (int j = 0; j < 255; j++) {
        /* copy every transition; missing transitions stay NULL */
        copy->next[j] = existing->next[j] ? do_copy(existing->next[j]) : NULL;
    }
    copy->end_state = existing->end_state;
    copy->start_state = existing->start_state;
    return copy;
}
I'm trying to solve this for fun, but I'm having a little bit of trouble with the implementation. The problem goes like this:
Having n stacks of blocks containing m blocks each, design a program in C that controls a robotic arm that moves the blocks from an initial configuration to a final one using the minimum number of movements possible. Your arm can only move one block at a time and can only take the block at the top of a stack. Your solution should use either pointers or recursive methods.
In other words, the blocks should go from this (supposing there are 3 stacks and 3 blocks):
| || || |
|3|| || |
|2||1|| |
to this:
| ||1|| |
| ||2|| |
| ||3|| |
using the smallest number of movements, printing each move.
I was thinking that maybe I could use a tree of some sort to solve it (an n-ary tree maybe?), since that is the perfect use of pointers and recursive methods, but so far it has proved unsuccessful. I'm having lots of trouble defining the structure that will store all the movements, since every time I want to add a new move to the tree I would have to check that the move has not been done before; I want each leaf to be unique, so that when I find the solution it will give me the shortest path.
This is the data structure I was thinking of:
typedef struct tree {
    char value[MAX_BLOCK][MAX_COL];
    struct tree *kids;
    struct tree *brothers;
} Tree;
(I'm really new at C so sorry beforehand if this is all wrong, I'm more used to Java)
How would you guys do it? Do you have any good ideas?
You have the basic idea - though I am not sure why you have elected to choose brothers over the parent.
You can do this problem with a simple BFS search, but it is a slightly less interesting solution, and not the one for which you seem to have set yourself up.
I think it will help if we concisely and clearly state our approach to the problem as a formulation of either Dijkstra's, A*, or some other search algorithm.
If you are unfamiliar with Dijkstra's, it is imperative that you read up on the algorithm before attempting any further. It is one of the foundational works in shortest path exploration.
With a familiarity with Dijkstra's, A* can readily be described as:
Dijkstra's minimizes the distance from the start. A* adds a heuristic which minimizes the (expected) distance to the end.
With this algorithm in mind, lets state the specific inputs to an A* search algorithm.
Given a start configuration S-start and an ending configuration S-end, can we find the shortest path from S-start to S-end, given a set of rules R governed by a reward function T?
Now, we can envision our data structure not as a tree, but as a graph. Nodes will be board states, and we can transition from state to state using our rules, R. We will pick which edge to follow using the reward function T, the heuristic to A*.
What is missing from your data-structure is the cost. At each node, you will want to store the current shortest path, and whether it is finalized.
Let's make a modification to your data-structure which will allow us to readily traverse a graph and store the shortest path information.
typedef struct node {
    char **boardState;
    struct node *children;
    struct node *parent;
    int distance;
    char status; //pseudo boolean
} node;
You may want to stop here if you were interested in discovering the algorithm for yourself.
We now consider the rules of our system: one block at a time, from the top of a stack. Each move will constitute an edge in our graph, whose weight is governed by the shortest number of moves from S-begin plus our added heuristic.
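As one concrete (and entirely optional) example of such a heuristic, not part of the original answer: count the blocks that are not yet in their goal cell. Every misplaced block must be moved at least once, so the estimate never overshoots the true remaining cost (it is admissible). The board representation and signature below are just an illustration.

int heuristic(char **board, char **goal, int rows, int cols) {
    int misplaced = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            if (board[r][c] != ' ' && board[r][c] != goal[r][c])   /* ' ' marks an empty cell */
                misplaced++;
    return misplaced;
}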
We can then sketch a draft of the algorithm as follows:
node *curr = S-begin;
while (curr != S-end) {
    curr->status = 'T'; //T for True: curr is now finalized
    for(Node child : children) {
        // Only do this update if it is cheaper than the child's current distance
        int updated = setMin(child->distance, curr->distance + 1 + heuristic(child->board));
        if(updated == 1) child->parent = curr;
    }
    //set curr to the node with the global minimum distance that has not been explored
}
You can then find the shortest path by tracing the parents backwards from S-end to S-begin.
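A small sketch of that backtrace step (print_board is an assumed helper that prints one board state, not something defined in the answer): recursing up the parent pointers first means the states come out in start-to-goal order.

void print_board(char **boardState);   /* assumed to exist elsewhere */

void print_path(const node *goal) {
    if (goal == NULL) return;
    print_path(goal->parent);          /* recurse first so the start prints first */
    print_board(goal->boardState);
}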
If you are interested in these types of problems, you should consider taking an upper-level AI course, where they approach these types of problems :-)
I have created a graph datastructure using linked lists. With this code:
typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;
void freeGraph(vertexPointer); /* forward declaration */
struct edgeNode {
    vertexPointer connectsTo;
    edgePointer next;
};
struct vertexNode {
    int vertex;
    edgePointer next;
};
Then I create a graph in which I have 4 nodes, let's say A, B, C and D, where:
A connects to D via B and A connects to D via C. With linked lists I imagine it looks like this:
Finally, I try to free the graph with freeGraph(graph).
void freeEdge(edgePointer e){
    if (e != NULL) {
        freeEdge(e->next);
        freeGraph(e->connectsTo);
        free(e);
        e = NULL;
    }
}
void freeGraph(vertexPointer v){
    if (v != NULL) {
        freeEdge(v->next);
        free(v);
        v = NULL;
    }
}
That's where valgrind starts complaining with "Invalid read of size 4", "Address 0x41fb0d4 is 4 bytes inside a block of size 8 free'd" and "Invalid free()". Also it says it did 8 mallocs and 9 frees.
I think the problem is that the memory for node D is already freed and then I'm trying to free it again. But I don't see any way to do this right without having to alter the datastructure.
What is the best way to prevent these errors and free the graph correctly, if possible without having to alter the datastructure? Also if there are any other problems with this code, please tell. Thanks!
greets,
semicolon
Not knowing all the references makes this a bit difficult. It's a bit of a hack, but faced with the same issue I would likely use a pointer set (a list of unique values, in this case pointers).
1) Walk the entire graph, pushing node pointers into the set only if they are not already present (this is the definition of a 'set').
2) Walk the set, freeing each pointer (since they're unique, there is no issue of a double free).
3) Set graph to NULL.
I'm sure there is an elegant recursive solution to this, but faced with the task as stated, this seems doable and not overtly complicated.
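A rough sketch of that approach, reusing the vertexPointer/edgePointer types from the question (the fixed-size set, its bound, and the function names are mine):

#include <stdlib.h>

#define MAX_PTRS 1024                    /* assumed upper bound on vertices + edges */

typedef struct { void *items[MAX_PTRS]; int count; } ptrset;

/* Returns 1 if p was newly added, 0 if it was already in the set. */
static int ptrset_add(ptrset *s, void *p) {
    for (int i = 0; i < s->count; i++)   /* linear scan; fine for a sketch */
        if (s->items[i] == p) return 0;
    s->items[s->count++] = p;
    return 1;
}

static void collect(ptrset *s, vertexPointer v) {
    if (v == NULL || !ptrset_add(s, v)) return;
    for (edgePointer e = v->next; e != NULL; e = e->next) {
        ptrset_add(s, e);                /* edges belong to one vertex, but be safe */
        collect(s, e->connectsTo);
    }
}

void freeGraphSafely(vertexPointer graph) {
    ptrset s = { .count = 0 };
    collect(&s, graph);
    for (int i = 0; i < s.count; i++)
        free(s.items[i]);                /* unique pointers: no double free */
}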
Instead of allocating the nodes and edges on a global heap, maybe you can allocate them in a memory pool. To free the graph, free the whole pool.
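A bare-bones sketch of the pool/arena idea (all names are mine): every vertex and edge comes out of one big block, so tearing down the graph is a single free(), with no traversal and no risk of double-freeing shared nodes.

#include <stdlib.h>

typedef struct {
    char  *mem;
    size_t used, cap;
} pool;

void *pool_alloc(pool *p, size_t n) {
    n = (n + 15) & ~(size_t)15;              /* keep allocations aligned */
    if (p->used + n > p->cap) return NULL;   /* real code would grow or chain blocks */
    void *out = p->mem + p->used;
    p->used += n;
    return out;
}

/* Usage sketch:
 *   pool p = { malloc(1 << 20), 0, 1 << 20 };
 *   vertexPointer v = pool_alloc(&p, sizeof *v);
 *   edgePointer   e = pool_alloc(&p, sizeof *e);
 *   ...build the graph...
 *   free(p.mem);    // frees every vertex and edge at once
 */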
I would approach this problem by designing a way to first cleanly remove each node from the graph before freeing it. To do this cleanly you will have to figure out what other nodes are referencing the node you are about to delete and remove those edges. Once the edges are removed, if you happen to come around to another node that had previously referenced the deleted node, the edge will already be gone and you won't be able to try to delete it again.
The easiest way would be to modify your data structure to hold a reference to "incoming" edges. That way you could do something like:
v->incoming[i]->next = NULL; // do this for each edge in incoming
freeEdge(v->next);
free(v);
v = NULL;
If you didn't want to update the data structure you are left with a hard problem of searching your graph for nodes that have edges to the node you want to delete.
It's because you've got two recursions going on here, and they're stepping on each other. freeGraph is called once to free D (say, from B) and then when the initial call to freeGraph comes back from freeEdge, you try freeing v -- which was already taken care of deeper down. That's a poor explanation without an illustration, but there ya go.
You can get rid of one of the recursions so they're not "crossing over", or you can check before each free to see if that node has already been taken care of by the other branch of the recursion.
Yes, the problem is that D can be reached over two paths and freed twice.
You can do it in 2 phases:
Phase 1: Insert the nodes you reached into a "set" datastructure.
Phase 2: free the nodes in the "set" datastructure.
A possible implementation of that set datastructure, which requires extending your datastructure:
Mark all nodes in the datastructure with a boolean flag, so you don't insert them twice.
Use another "next"-pointer for a second linked list of all the nodes. A simple one.
Another implementation, without extending your datastructure: something like C++ std::set<>
Another problem: Are you sure that all nodes can be reached, when you start from A?
To avoid this problem, insert all nodes into the "set" datastructure at creation time (you won't need the marking, then).
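For illustration, a sketch of the marking variant (the two extra struct fields and the helper names are mine; the visited flag must start at 0 when a vertex is created): phase 1 threads every reachable vertex onto one extra list, and phase 2 frees that list plus each vertex's edges.

#include <stdlib.h>

struct vertexNode {
    int vertex;
    edgePointer next;
    int visited;                  /* phase 1 marker */
    vertexPointer allNext;        /* second list linking every vertex */
};

static vertexPointer all = NULL;

void collectVertices(vertexPointer v) {           /* phase 1 */
    if (v == NULL || v->visited) return;
    v->visited = 1;
    v->allNext = all;
    all = v;
    for (edgePointer e = v->next; e != NULL; e = e->next)
        collectVertices(e->connectsTo);
}

void freeAll(void) {                              /* phase 2 */
    while (all != NULL) {
        vertexPointer v = all;
        all = v->allNext;
        for (edgePointer e = v->next; e != NULL; ) {  /* edges are not shared, free directly */
            edgePointer nextE = e->next;
            free(e);
            e = nextE;
        }
        free(v);
    }
}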