Binary Tree Implementation in C for calculating - c

I'm reading a book about Data structures and I'm getting into trouble with the implementation example of binary tree in that book. The problem is I need to calculate and implement this parse tree below:
This is the source code of the example I mentioned above:
I've known about trees but I cannot understand what the source code really means because the book I'm reading does not explain each step. I really need the deeper explanation for the source code.
EDIT : You can focus on the loop step, it is the most difficult one for me to understand

This code appears to parse reverse Polish notation (RPN), i.e. a notation where operators follow their operands. It reads an expression written in RPN and builds the corresponding binary tree. For example, for the tree above the RPN form is:
ABC+DE**F+*
The logic is pretty straightforward and is based on a stack that contains nodes of the tree:
Every time you encounter an operand (i.e. a letter), you create a new leaf node with an operand and push it to the stack.
Every time you encounter an operator, you create a new operator node that replaces the two top-most nodes on the stack. The replaced nodes become the new node's children.
In the end, you get the expression tree on the top of the stack.
Update: As for the specific lines you mentioned: z is a special kind of tree node, a sentinel, that is depicted as a tiny rectangle on the picture. That's a no-value node, which allows you to know when you reach the tree bottom. Another way is just to use a null pointer (the link above compares the approaches).
z->l = z;
z->r = z;
is what makes the node its own child. A sentinel node can also represent an empty tree.
Now in the loop:
x->info = c;
x->l = z;
x->r = z;
creates a new leaf node (operand nodes don't have children, so both links point to the sentinel). If we then find that the node is actually an operator, the children are immediately replaced with the operands from the stack.
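To make the loop concrete, here is a minimal sketch of the stack-based construction with a sentinel node, assuming single-character operands and only the + and * operators. The names build_tree, to_infix and MAX_STACK are made up for illustration and are not the book's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

struct node {
    char info;
    struct node *l, *r;
};

#define MAX_STACK 64              /* assumed limit, for illustration only */

static struct node *stack[MAX_STACK];
static int top = 0;
static struct node *z;            /* shared sentinel: "no child here" */

static void push(struct node *n) { assert(top < MAX_STACK); stack[top++] = n; }
static struct node *pop(void)    { assert(top > 0); return stack[--top]; }

/* Build the tree for a postfix string such as "ABC+DE**F+*". */
struct node *build_tree(const char *rpn)
{
    top = 0;
    if (z == NULL) {              /* create the sentinel once */
        z = malloc(sizeof *z);
        assert(z != NULL);
        z->info = '\0';
        z->l = z;                 /* the sentinel is its own child */
        z->r = z;
    }
    for (; *rpn != '\0'; rpn++) {
        if (isspace((unsigned char)*rpn))
            continue;
        struct node *x = malloc(sizeof *x);
        assert(x != NULL);
        x->info = *rpn;
        x->l = z;                 /* start out as a leaf */
        x->r = z;
        if (*rpn == '+' || *rpn == '*') {
            x->r = pop();         /* the right operand was pushed last */
            x->l = pop();
        }
        push(x);
    }
    return pop();                 /* the finished tree is on top */
}

/* Write the fully parenthesised infix form into out; returns the new end. */
char *to_infix(const struct node *t, char *out)
{
    if (t->l == z && t->r == z) { /* leaf: an operand */
        *out++ = t->info;
    } else {
        *out++ = '(';
        out = to_infix(t->l, out);
        *out++ = t->info;
        out = to_infix(t->r, out);
        *out++ = ')';
    }
    *out = '\0';
    return out;
}
```

Calling build_tree("ABC+DE**F+*") and then to_infix on the result writes (A*(((B+C)*(D*E))+F)), which is a quick way to check that the operands were popped in the right order.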

Related

Clone a Binary Tree with Random Pointers

Can anyone explain how to clone a binary tree whose nodes have a random pointer in addition to left and right? Every node has the following structure:
struct node {
    int key;
    struct node *left, *right, *random;
};
This is a very popular interview question, and I was able to figure out the solution based on hashing (which is similar to cloning linked lists). I tried to understand the solution given in Link (approach 2), but even after reading the code I cannot figure out what it is trying to convey.
I am not looking for a hashing-based solution, as it is intuitive and pretty straightforward. Please explain the solution based on modifying the binary tree and cloning it.
The solution presented is based on the idea of interleaving both trees, the original one and its clone.
For every node A in the original tree, its clone cA is created and inserted as A's left child. The original left child of A is shifted one level down in the tree structure and becomes a left child of cA.
For each node B, which is a right child of its parent P (i.e., B == P->right), a pointer to its clone node cB is copied to a clone of its parent.
        P                          P
       / \                        / \
      /   \                      /   \
     A     B                   cP     B
    /       \                 /  \   / \
   /         \               /    \ /   \
  X           Z             A     cB     Z
                           /        \   /
                         cA          cZ
                         /
                        X
                       /
                     cX
Finally we can extract the cloned tree by traversing the interleaved tree and unlinking every other node on each 'left' path (starting from root->left) together with its 'rightmost' descendants path and, recursively, every other 'left' descendant of those and so on.
What's important, each cloned node is a direct left child of its original node. So in the middle part of the algorithm, after inserting the cloned nodes but before extracting them, we can traverse the whole tree walking on original nodes, and whenever we find a random pointer, say A->random == Z, we can copy the binding into clones by setting cA->random = cZ, which resolves to something like
A->left->random = A->random->left;
This allows cloning random pointers directly and does not require additional hash maps (at the cost of interleaving new nodes into the original tree and extracting them later).
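The three phases described above (interleave, copy the random pointers, separate) could be sketched in C roughly as follows. This is an illustration under assumptions, not the linked article's code: the function names and the new_node helper are made up, and allocation failure is only handled with an assert.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right, *random;
};

/* Hypothetical helper: allocate a node with no links. */
static struct node *new_node(int key)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->left = n->right = n->random = NULL;
    return n;
}

/* Phase 1: insert each clone as the left child of its original.
   The clone's right pointer refers directly to the clone of the right child. */
static struct node *interleave(struct node *a)
{
    if (a == NULL)
        return NULL;
    struct node *ca = new_node(a->key);
    ca->left = a->left;           /* the original left child moves below the clone */
    a->left = ca;
    interleave(ca->left);
    ca->right = interleave(a->right);
    return ca;
}

/* Phase 2: the clone of any original node n is n->left. */
static void copy_random(struct node *a)
{
    if (a == NULL)
        return;
    struct node *ca = a->left;
    ca->random = a->random ? a->random->left : NULL;
    copy_random(ca->left);        /* original left subtree */
    copy_random(a->right);        /* original right subtree */
}

/* Phase 3: unlink the clones and restore the original left pointers. */
static void separate(struct node *a)
{
    if (a == NULL)
        return;
    struct node *ca = a->left;
    a->left = ca->left;                         /* original left child, or NULL */
    ca->left = a->left ? a->left->left : NULL;  /* its clone, or NULL */
    separate(a->left);
    separate(a->right);
}

struct node *clone_tree(struct node *root)
{
    if (root == NULL)
        return NULL;
    struct node *croot = interleave(root);
    copy_random(root);
    separate(root);
    return croot;
}
```

The original tree is modified temporarily and restored by separate, so the caller sees it unchanged once clone_tree returns. No hash map is needed because a->random->left always names the clone of a->random while the trees are interleaved.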
The interleaving method can be simplified a little, I think.
1) For every node A in the original tree, create clone cA with the same left and right pointers as A. Then, set A's left pointer to cA.
        P                      P
       / \                    /
      /   \                  /
     A     B               cP
    /       \             /  \
   /         \           /    \
  X           Z         A      B
                       /      /
                     cA     cB
                     /        \
                    X          Z
                   /          /
                 cX         cZ
2) Now, given a node and its clone (which is just node.left), the random pointer for the clone is node.random.left (if node.random exists).
3) Finally, the binary tree can be un-interleaved.
I find this interleaving makes reasoning about the code much simpler.
Here is the code:
def clone_and_interleave(root):
    if not root:
        return
    clone_and_interleave(root.left)
    clone_and_interleave(root.right)
    cloned_root = Node(root.data)
    cloned_root.left, cloned_root.right = root.left, root.right
    root.left = cloned_root
    root.right = None  # This isn't necessary, but doesn't hurt either.

def set_randoms(root):
    if not root:
        return
    cloned_root = root.left
    set_randoms(cloned_root.left)
    set_randoms(cloned_root.right)
    cloned_root.random = root.random.left if root.random else None

def unterleave(root):
    if not root:
        return (None, None)
    cloned_root = root.left
    cloned_root.left, root.left = unterleave(cloned_root.left)
    cloned_root.right, root.right = unterleave(cloned_root.right)
    return (cloned_root, root)

def cloneTree(root):
    clone_and_interleave(root)
    set_randoms(root)
    cloned_root, root = unterleave(root)
    return cloned_root
The terminology used in those interview questions is absurdly bad. It’s the case of one unwitting knuckledragger somewhere calling that pointer the “random” pointer and everyone just nodding and accepting this as if it were some CS mantra from an ivory tower. Alas, it’s sheer lunacy.
Either what you have is a tree or it isn’t. A tree is an acyclic directed graph with at most a single edge directed toward any node, and adding extra pointers can’t change it - the things the pointers point to must retain this property.
But when the node has a pointer that can point to any other node, it’s not a tree. You’ve got a proper directed graph with cycles in it, and looking at it as if it were a tree is silly at this point. It’s not a tree. It’s just a generic directed graph that you’re cloning. So any relevant directed graph cloning technique will work, but the insistence on using the terms “tree” and “random pointer” obscures this simple fact and confuses matters terribly.
This snafu indicates that whoever came up with the question was not qualified to be doing any such interviewing. This stuff is covered in any decent introductory data structure textbook so you’d think it shouldn’t present some astronomical uphill effort to just articulate what you need in a straightforward manner. Let the interviewees deal with users who can’t articulate themselves once they get that job - the data structure interview is neither the place nor time for that. It reeks of stupidity and carelessness, and leaves permanently bad aftertaste. It’s probably yet another stupid thing that ended up in some “interview question bank” because one poor soul got it asked by a careless idiot once and now everyone treats it as gospel. It’s yet again the blind leading the blind and cluelessness abounds.
Copying arbitrary graphs is a well-solved problem, and in all cases you need to retain the state of your traversal somehow. Whether it’s done by inserting nodes into the original graph to mark the progress (one could call it intrusive marking), or by adding data to the copy in progress and removing it when done, or by using an auxiliary structure such as a hash, or by doing a repeat traversal to check if you made a copy of that node elsewhere, is of secondary importance, since the purpose is always the same: to retain the same state information, just encoding it in various ways, trading off speed and memory use (as always).
When thinking of this problem, you need to tell yourself what sort of state you need to finish the copy, and abstract it away, and implement the copy using this abstract interface. Then you can implement it in a few ways, but at that point the copy itself doesn’t obscure things since you look at this simple abstract state-preserving interface and not at the copy process.
In real life the choice of any particular implementation highly depends on the amount and structure of data being copied, and the extent you have control over it all. If you’re the one controlling the structure of the nodes, then you’ll usually find that they have some padding that you could use to store a bit of state information. Or you’ll find that the memory block allocated for the nodes is actually larger than requested: malloc will often end up providing a block larger than asked for, and all reasonable platforms have APIs that let you retrieve the actual size of the block and thus check if there’s maybe some leftover space just begging to be used. These APIs are not always fast so be careful there of course. But you see where this is going: such optimization requires benchmarks and a clear need driven by demands of the application. Otherwise, use whatever is least likely to be buggy - ideally a C library that provides data structures that you could use right away. If you need a cyclic graph there are libraries that do just that - use them first.
But boy, do I hate that idiotic “random” name of the pointer. Who comes up with this nonsense and why do they pollute so many minds? There’s nothing random about it. And a tree that’s not a tree is not a tree. I’d fail that interviewer in a split second…

Can somebody please explain this pseudo code to me in terms of java code?

Here is the code for a BFS, I don't understand what this means in java code. terms like .pathlen, the arrows, etc I don't understand any of it. Can anyone clarify? thanks.
Image of the code
1) For each vertex of graph g, set its visited flag (a boolean) to false.
2) Initialize an empty queue of vertices.
3) Mark the starting vertex as visited.
4) Set the path length of the starting vertex to 0. This would be an integer field in the vertex class.
5) Add the starting vertex to the queue.
6) While the queue is not empty, repeat the following.
7) Remove the head vertex from the queue.
8) For every edge leaving the vertex you removed (found by going through the graph's adjacency list or matrix), do the following.
9) If the edge's destination has not been visited, then:
10) Add the edge's destination to the queue.
11) Mark the destination as visited.
12) Set the destination's path length to the removed vertex's path length plus one.
Note: If you have a weighted graph then you can do something other than +1 on step #12. But you shouldn't terminate the BFS until it runs on all nodes if you make it weighted.
Pathlen is just a member variable. Think of that like the syntax for accessing a public variable of a java class.
The arrow syntax is a syntax for assignment, the equivalent of java's =. It means "Take the thing on the right and assign the thing on the left that value."
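For reference, here is how those steps might look as real code. The sketch is in C rather than Java, and it assumes a small graph stored as an adjacency matrix with vertices numbered from 0; struct graph, MAX_V and the pathlen array are illustrative choices, not taken from the pseudocode image. Each pseudocode assignment arrow becomes a plain = here, exactly as it would in Java.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 16                  /* assumed small fixed limit for the sketch */

struct graph {
    int n;                        /* number of vertices, numbered 0..n-1 */
    bool adj[MAX_V][MAX_V];       /* adj[u][v] is true if there is an edge u -> v */
};

/* Fills pathlen[v] with the number of edges on a shortest path from
   start to v, or -1 if v cannot be reached. */
void bfs(const struct graph *g, int start, int pathlen[])
{
    bool visited[MAX_V];
    int queue[MAX_V];             /* each vertex is enqueued at most once */
    int head = 0, tail = 0;

    assert(g->n <= MAX_V && start >= 0 && start < g->n);

    for (int v = 0; v < g->n; v++) {
        visited[v] = false;
        pathlen[v] = -1;
    }
    visited[start] = true;
    pathlen[start] = 0;
    queue[tail++] = start;

    while (head < tail) {
        int u = queue[head++];    /* remove the head of the queue */
        for (int v = 0; v < g->n; v++) {
            if (g->adj[u][v] && !visited[v]) {
                visited[v] = true;
                pathlen[v] = pathlen[u] + 1;
                queue[tail++] = v;
            }
        }
    }
}
```

In Java the same structure would typically use a Queue of vertex objects with visited and pathlen as fields, but the control flow is identical.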

Inserting an item to a linked list - is iteration necessary (C)?

I'm currently studying linked lists for interview prep, would appreciate if anybody could shed some light on this. The following function in C supposedly inserts a new element into the list after a certain Element elem (passed as an argument to the function):
bool insertAfter(Element * elem, int data){
    Element * newElem, * curPos = head;
    newElem->data = data;
    while(curPos){
        if(curPos == elem){
            newElem->next = curPos->next;
            curPos->next = newElem;
            return true;
        }
        curPos = curPos->next;
    }
    return false;
}
Although the above is specified in the textbook I am studying from, I tried coming up with a solution that does not use any iteration whatsoever:
bool insertAfter(Element * elem, int data){
    Element * newElem;
    newElem->next = elem->next;
    newElem->data = data;
    elem->next = newElem;
    return true;
}
However, as it appears too simplistic, I sense that it may not work, but am not sure why. I need some insights on the technicalities on why this may or may not work, thanks.
Both versions suffer from the error of using newElem as though it were a valid pointer. It is not: it is never initialized to point to a valid object.
You can correct that by allocating memory for an object before using:
Element * newElem = malloc(sizeof(*newElem));
The difference between the two versions is that if elem is not accessible from head for some reason or it is NULL, the first version will do nothing to the existing list. The second version does not deal with either of those scenarios. It assumes that elem is in the list and that it is not NULL.
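Putting the allocation fix and the NULL check together, a non-iterating version might look like the sketch below. The Element definition is assumed from the question (an int payload and a next pointer), since the original post does not show it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed layout; the question does not show the definition. */
typedef struct Element {
    int data;
    struct Element *next;
} Element;

/* Inserts a new element holding data directly after elem.
   Returns false if elem is NULL or the allocation fails. */
bool insertAfter(Element *elem, int data)
{
    if (elem == NULL)
        return false;
    Element *newElem = malloc(sizeof *newElem);
    if (newElem == NULL)
        return false;
    newElem->data = data;
    newElem->next = elem->next;   /* link forward first */
    elem->next = newElem;         /* then hook the new element in */
    return true;
}
```

This version still trusts the caller that elem belongs to the intended list; only the loop in the textbook version verifies that.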
Your solution is pretty much correct. The iteration in the example is almost completely useless.
What the example function is doing is: given an element to insert the new data after, it looks through the list starting with head to find that same element, then inserts the data - doing nothing with head or elem after finding it. Since all the loop does is "find" an element to which you already had a pointer, it essentially did nothing at all and is useless.
The only possible use of this is to constrain the insertion function to only work on this one list beginning with head, globally, throughout your program. This is such a strange design decision that one would likely assume it's a mistake unless given a reason to believe otherwise (dynamic data structures constrained to a single instance are an unusual pattern; more importantly, the whole point of linked lists is O(1) insertion, which the example function breaks by adding this useless loop). head is not needed for any other reason than to enforce this constraint, and if this is desired, it would make more sense to pass it in as a parameter as well so that the function is able to be used on more than one list per-program. (Or, not to perform the check at all: another use of linked lists is that you can pass around and insert after nodes without worrying about the head element.)
As other people have pointed out, you fail to actually allocate newElem, but so does the textbook. Overall, it's a rubbish example; not only did the author make a mistake with allocation, but they don't appear to understand the basic advantages of using linked lists. You should definitely treat this textbook with suspicion.
Your logic will work, because you just need a pointer to a node where you want to insert a node. If you already have such a pointer, no need to iterate and search for the node.
However, the search (iteration) would be relevant if you do not have in hand the pointer to the node after which you want to insert. Example: suppose the nodes have unique keys, you do not have a pointer to the node holding a given key, and you want to insert after that node. Then you need to find the correct node pointer first and do the insertion afterwards (the function should then take the key as an argument).
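A sketch of that key-based variant follows, assuming the data field doubles as the key and that head is passed in as a parameter. The name insertAfterKey and the Element layout are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed layout; data is treated as the unique key here. */
typedef struct Element {
    int data;
    struct Element *next;
} Element;

/* Searches the list starting at head for the first element whose data
   equals key and inserts a new element holding data right after it.
   Returns false if the key is not found or the allocation fails. */
bool insertAfterKey(Element *head, int key, int data)
{
    for (Element *cur = head; cur != NULL; cur = cur->next) {
        if (cur->data == key) {
            Element *newElem = malloc(sizeof *newElem);
            if (newElem == NULL)
                return false;
            newElem->data = data;
            newElem->next = cur->next;
            cur->next = newElem;
            return true;
        }
    }
    return false;
}
```

Here the loop does real work: it turns a key into a node pointer, which is O(n), and the insertion itself remains O(1).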
However in your code (both cases), you have not allocated the memory for the new node. You need to do malloc for the new node and then go on with the insertion.
This will not work because you do not know your newElem. In a linked list you only know the head, and each element tells you where to find the next one:
head -> e1 -> e2 -> ...
So you need to iterate until you find the element you care about. You can also do this iteration with recursion.

Create a new copy of a data structure based on pointers

I started with a programming assignment. I had to design a DFA based on graphs. Here is the data structure I used for it:
typedef struct n{
    struct n *next[255]; //pointer to the next state. Value is NULL if no transition for the input character (denoted by their ascii value)
    bool start_state;
    bool end_state;
}node;
Now I have a DFA graph-based structure ready with me. I need to utilize this DFA in several places; The DFA will get modified in each of these several places. But I want unmodified DFAs to be passed to these various functions. One way is to create a copy of this DFA. What is the most elegant way of doing this? So all of them are initialized either with a NULL value or some pointer to another state.
NOTE:
I want the copy to be created in the called function, i.e. I pass the DFA, and the called function creates its copy and operates on it. This way, my original DFA remains unchanged.
MORE NOTES:
From each node of a DFA, I can have a directed edge connecting it with another node. If the transition takes place when the input character is c, then state->next[c] will hold a pointer to the next node. It is possible that several elements of the next array are NULL. Modifying the NFA means both adding new nodes as well as altering the present nodes.
If you need a private copy on each call, and since this is a linked data structure, I see no way to avoid copying the whole graph (except perhaps to do a copy-on-write to some sub branches if the performance is that critical, but the complexity is significant and so is the chance of bugs).
Had this been c++, you could have done this in a copy constructor, but in c you just need to clone on every function. One way is to clone the entire structure (Like Mark suggested) - it's pretty complicated since you need to track cycles/ back edges in the graph (which manifest as pointers to previously visited nodes that you don't want to reallocate but reuse what you've already allocated).
Another way, if you're willing to change your data structure, is to work with arrays - keep all the nodes in a single array of type node. The array should be big enough to accommodate all nodes if you know the limit, or just reallocate it to increase upon demand, and each "pointer" is replaced by a simple index.
Building this array is different: instead of mallocing a new node, use the next available index (keep it on the side), or if you're going to add/remove nodes on the fly, you could keep a queue/stack of "free" indices (populate it at the beginning with 1..N, and pop/push there whenever you need a new location or are about to free an old one).
The upside is that copying would be much faster, since all the links are relative to the instance of the array, you just copy a chunk of contiguous memory (memcpy would now work fine)
Another upside is that the performance of using this data structure should be superior to the linked one, since the memory accesses are spatially close and easily prefetchable.
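As a rough illustration of the array-based layout, the sketch below stores transitions as indices into one growable array, so a deep copy is a single allocation plus memcpy. All names (struct dfa, dfa_add_state, dfa_copy, NO_STATE) are hypothetical, and the alphabet size of 256 is an assumption chosen to cover every unsigned char value.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ALPHABET 256              /* assumed: one slot per unsigned char value */
#define NO_STATE (-1)             /* plays the role of a NULL transition */

struct state {
    int next[ALPHABET];           /* index of the next state, or NO_STATE */
    bool start_state;
    bool end_state;
};

struct dfa {
    struct state *states;         /* all states live in one array */
    int count;                    /* states in use */
    int capacity;                 /* states allocated */
};

/* Appends a state with no transitions; returns its index, or -1 on failure. */
int dfa_add_state(struct dfa *d, bool is_start, bool is_end)
{
    if (d->count == d->capacity) {
        int cap = d->capacity ? d->capacity * 2 : 8;
        struct state *p = realloc(d->states, (size_t)cap * sizeof *p);
        if (p == NULL)
            return -1;
        d->states = p;
        d->capacity = cap;
    }
    struct state *s = &d->states[d->count];
    for (int c = 0; c < ALPHABET; c++)
        s->next[c] = NO_STATE;
    s->start_state = is_start;
    s->end_state = is_end;
    return d->count++;
}

/* Deep copy: indices stay valid in the new array, so cycles need no tracking. */
bool dfa_copy(struct dfa *dst, const struct dfa *src)
{
    size_t n = src->count > 0 ? (size_t)src->count : 1;
    struct state *p = malloc(n * sizeof *p);
    if (p == NULL)
        return false;
    if (src->count > 0)
        memcpy(p, src->states, (size_t)src->count * sizeof *p);
    dst->states = p;
    dst->count = src->count;
    dst->capacity = (int)n;
    return true;
}
```

A function that needs a private DFA would call dfa_copy on entry, modify the copy freely (including adding states), and free(copy.states) before returning.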
You'll need to write a recursive function that visits all the nodes, with a global dictionary that keeps track of the mapping from the source graph nodes to the copied graph nodes. This dictionary will be basically a table that maps old pointers to new pointers.
Here's the idea. I haven't compiled it or debugged it...
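#include <stdlib.h>

/* MAX_NODES must be defined as an upper bound on the number of states;
   this sketch assumes the graph has fewer than MAX_NODES nodes. */
/* Maps each node of the source graph to its copy; zero-initialized. */
struct {
    node* existing;
    node* copy;
} dictionary[MAX_NODES] = {0};

node* do_copy(node* existing)
{
    node* copy;
    int i;
    /* Already copied? Return that copy, so cycles terminate. */
    for(i=0;i<MAX_NODES && dictionary[i].existing;i++) {
        if (dictionary[i].existing == existing) return dictionary[i].copy;
    }
    copy = (node*)malloc(sizeof(node));
    /* Register the copy before recursing, so back edges can find it. */
    dictionary[i].existing = existing;
    dictionary[i].copy = copy;
    /* Visit every slot: a NULL transition does not mean the table has ended. */
    for(int j=0;j<255;j++) {
        copy->next[j] = existing->next[j] ? do_copy(existing->next[j]) : NULL;
    }
    copy->end_state = existing->end_state;
    copy->start_state = existing->start_state;
    return copy;
}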

Losing parts in Linked List in C

I'm trying to make a linked list and I'm having trouble with the concept of linking the middle part. I'm just writing a little pseudo-code right now; I haven't actually coded anything.
(struct pointers) *current, *ahead, *behind, *begin;
(behind)-->(current)-->(ahead) //This is what I want to do
behind->next = current;
current->next = ahead;
Is this the proper way to break and connect the list? Without losing anything..
What you have looks correct but rather incomplete. One of the unwritten rules of programming is that you cannot write a linked list implementation correctly the first time. There are four cases you need to deal with:
Insert into an empty list
Insert into a non-empty list
Removing the first element from the list
Removing any other element from the list
There are also doubly-linked lists, where each element has a pointer to both the previous element and the next element. That makes it easier to handle things like removal of a random element without traversing the list, but can be trickier to get right.
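For a singly linked list, the four cases listed above can be covered by two small functions. This is a sketch under assumptions: struct item, list_insert and list_remove are made-up names, and the pointer names follow the question's behind/current/begin.

```c
#include <stdbool.h>
#include <stdlib.h>

struct item {
    int value;
    struct item *next;
};

/* Inserts a new item after behind; if behind is NULL, inserts at the front.
   This covers both the empty and the non-empty list. */
bool list_insert(struct item **begin, struct item *behind, int value)
{
    struct item *current = malloc(sizeof *current);
    if (current == NULL)
        return false;
    current->value = value;
    if (behind == NULL) {
        current->next = *begin;
        *begin = current;
    } else {
        current->next = behind->next;   /* link forward first, so nothing is lost */
        behind->next = current;
    }
    return true;
}

/* Unlinks and frees target; handles the first element and any later one. */
bool list_remove(struct item **begin, struct item *target)
{
    for (struct item **link = begin; *link != NULL; link = &(*link)->next) {
        if (*link == target) {
            *link = target->next;       /* bypass target */
            free(target);
            return true;
        }
    }
    return false;
}
```

The order of the two assignments in the else branch is what prevents losing the tail of the list: current must point at the old successor before behind is redirected.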

Resources