Searching a string in a binary tree in c - c

I am a student of computer science, and I had an exam last week in C.
One of the questions was to search a specific word (string) in a binary tree, and count how many times it appears.
Every node in the tree contains a letter.
For example, if the word is "mom", and the tree looks like the attached image, the function should return 2.
Pay attention that if there is a word like this — "momom" — the function will count the "mom" only one time.
I have not been able to solve this question. Can you help?
        a
       / \
      b   m
     /   / \
    v   o   o
       / \   \
      m   t   m

Because the tree in your image does not appear to be ordered or balanced, you have to search every branch until you either hit a match or run past a leaf. Once you hit a match, you can ignore all the branches underneath, because they're irrelevant. Beyond that, you don't know the depth of the tree, so you can't stop searching early based on depth.
So, your algorithm would be something to the effect of:
// returns the number of matches
// matchMask is a bitmap of the string sublengths that match so far:
// bit i is set when substr[0..i] matched on the path ending at the parent
int search(const char *substr, int substrlen, uint32_t matchMask, node_t *node) {
    uint32_t newMatchMask = 0;
    int bit;
    ASSERT(substrlen > 0 && substrlen < (int)(sizeof(matchMask) * 8));
    if (node == NULL) {
        // ran past a leaf, stop
        return 0;
    }
    while ((bit = LSB(matchMask)) != -1) {
        matchMask &= ~(1u << bit);  // clear the bit so the loop terminates
        if (node->ch == substr[bit + 1])
            newMatchMask |= (1u << (bit + 1));
    }
    if (node->ch == substr[0])
        newMatchMask |= 1u;  // a new match can start at this node
    if (newMatchMask & (1u << (substrlen - 1))) {
        // found a match, don't bother recursing
        return 1;
    } else {
        return
            search(substr, substrlen, newMatchMask, node->left) +
            search(substr, substrlen, newMatchMask, node->right);
    }
}
Note, that I had to do some funky bitmap stuff there to keep track of the depths matched so far, as you can match a partial substring along the way. LSB is assumed to be a least-significant-bit macro that returns -1 if no bits are set. Also, this is not tested, so there might be an off-by-one error in the bit masking, but the idea is still there.
-- EDIT --
oops, forgot to stop recursing if your node is blank... Fixing

You want to enumerate all words in the tree and check at each end of word if you have a match using strstr().
The keywords to search for would be "tree walking" and "tree depth-first".
The semantics of your tree structure are confused. To clarify your question, you should enumerate all words present in the tree by hand, then write a function that walks the tree and prints the same list. The final step is easy: instead of printing the words, check whether each one contains the string using strstr() and count the matching words.
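As a rough sketch of that approach, assuming each "word" is one root-to-leaf path: the node type, MAX_DEPTH, count_paths and count_word below are made-up names for illustration, since the question does not show its own declarations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node layout; the question does not show one. */
typedef struct node {
    char ch;
    struct node *left, *right;
} node;

#define MAX_DEPTH 64  /* assumed upper bound on the height of the tree */

/* Depth-first walk that keeps the current root-to-node path in buf. */
static int count_paths(const node *n, const char *word, char *buf, int depth) {
    if (n == NULL || depth >= MAX_DEPTH)
        return 0;
    buf[depth] = n->ch;
    buf[depth + 1] = '\0';
    if (n->left == NULL && n->right == NULL)
        return strstr(buf, word) != NULL;  /* end of a word: 1 if it matches */
    return count_paths(n->left, word, buf, depth + 1) +
           count_paths(n->right, word, buf, depth + 1);
}

/* Number of root-to-leaf words that contain word at least once. */
int count_word(const node *root, const char *word) {
    char buf[MAX_DEPTH + 1];
    assert(word != NULL);
    return count_paths(root, word, buf, 0);
}
```

Because strstr() only reports whether the word occurs, a path such as "momom" contributes one match, which agrees with the rule in the question. Note that an occurrence shared by several leaves is counted once per leaf, which may or may not be what the exam intended.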

Related

Binary search tree with two conditions

I have a classic binary search tree: if cond < 0, add left, else add right. But some of my entries are "flagged" and I want them to be prioritized over "non-flagged" ones, so that when I access the tree it first gives the "flagged" entries in ascending order and then the "non-flagged" ones. Currently I have what I call a "tree root" where I check if the entry is "flagged" and send it to the appropriate tree. It works, but I would rather get rid of it and merge the "flagged" and "non-flagged" trees if possible. How would I go about it?
You could regard your tree as a BST with respect to a different comparison operator. Something like the following:
int compare(/* type */ a, /* type */ b) {
    if (a->is_flagged && !b->is_flagged) return -1;
    else if (!a->is_flagged && b->is_flagged) return 1;
    // same flag status: three-way compare on the value (-1, 0 or 1)
    else return (a->value > b->value) - (a->value < b->value);
}
This essentially does a lexicographical ordering. It regards all flagged elements as less than every non-flagged element, and only considers the value if the flagged status is the same for both items.
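To check that ordering on a small array, something like the following could be used. The entry type and the function names are assumptions made for the example (only is_flagged and value come from the snippet above), and qsort stands in for the tree insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entry type; only the field names come from the answer. */
typedef struct entry {
    bool is_flagged;
    int value;
} entry;

/* Three-way comparison: flagged entries first, then ascending value. */
int compare_entries(const entry *a, const entry *b) {
    assert(a != NULL && b != NULL);
    if (a->is_flagged != b->is_flagged)
        return a->is_flagged ? -1 : 1;
    return (a->value > b->value) - (a->value < b->value);
}

/* Adapter with the signature qsort expects. */
int compare_entries_qsort(const void *pa, const void *pb) {
    return compare_entries((const entry *)pa, (const entry *)pb);
}
```

Sorting with compare_entries_qsort should put every flagged entry before every non-flagged one, each group in ascending order, which is the same order an in-order walk of a tree built with this comparison would give.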

C: Finding Huffman-coded path of specific leaf in tree

I'm trying to write a recursive function that locates a specific leaf within a Huffman tree, then prints its shortest path with zeros and ones (zero being a traversal to the left, and one being a traversal to the right). I understand the logic of what I need to do, but I'm not having success at actually implementing it. I believe that I have a good skeleton here, but the part I'm missing is some more complicated logic to tell when I should actually run printf and when I should not (since this currently just prints every zero and one). Also, I know that the rest of the logic outside of this is working properly because if you do a normal traversal where you do not have to plot the shortest paths, each of the elements I am searching for is found.
I've tried looking at quite a few resources online and I cannot find a solution, or at least, I cannot recognize the solution properly. I've probably rewritten this 50 or more times. Let me know what you think!
void traverse(struct tree *curr, struct tree *cmp)
{
    if (curr == NULL)
    {
        return;
    }
    if (getLeft(curr) == NULL && getRight(curr) == NULL)
    {
        if (curr == cmp)
        {
            return;
        }
    }
    if (getLeft(curr) != NULL)
    {
        printf("0");
        traverse(getLeft(curr), cmp);
    }
    if (getRight(curr) != NULL)
    {
        printf("1");
        traverse(getRight(curr), cmp);
    }
}
For context: cmp is the node we want to find, getLeft() and getRight() return the left and right children of a node respectively, and curr starts as the root of the Huffman tree itself. Also, the reason this printf thing works is because I loop through all of the known leaves, print other information about the leaf, and then call this traversal method, followed by a newline.
There are several solutions.
First, you could traverse the entire tree as you are doing and build a table of codes. Then use the table, not the tree. Then you're not wasting your time searching the whole tree for every code. As you traverse the tree you build up a string of 0's and 1's, and when you get to a leaf, you save the built up string and the symbol in the leaf in the table. Then throw away the tree. This is the recommended approach.
Second, your links could be bidirectional. Since you have a pointer to the leaf, you could simply start at the leaf and work your way back to the root, constructing the string of 0's and 1's in reverse.
Third, you could persist in doing your painful tree search for every leaf by having your traverse function return true or false. It would return true if either it got to the desired leaf, or if one of the traverse calls returned true. Then depending on which traverse call returned true, you would print or save a zero or a one. This would print the path in reverse. If you save them in a string in reverse order instead of printing, then you can print the string when the first traverse call returns.
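A minimal sketch of the first (table-building) approach might look like this. The struct layout, the symbol field, MAX_CODE_LEN and build_codes are all assumptions for illustration; the question only exposes getLeft() and getRight().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node layout; the asker's real struct is not shown. */
struct tree {
    struct tree *left, *right;
    unsigned char symbol;  /* meaningful in leaves only */
};

#define MAX_CODE_LEN 256  /* with 256 symbols a code has at most 255 bits */

/* Walks the whole tree once. path is scratch space of MAX_CODE_LEN chars;
   codes[s] receives the '0'/'1' string for symbol s. */
void build_codes(const struct tree *t, char *path, int depth,
                 char codes[256][MAX_CODE_LEN]) {
    if (t == NULL)
        return;
    assert(depth < MAX_CODE_LEN);
    if (t->left == NULL && t->right == NULL) {
        path[depth] = '\0';              /* leaf: save the path built so far */
        strcpy(codes[t->symbol], path);
        return;
    }
    path[depth] = '0';                   /* left edge */
    build_codes(t->left, path, depth + 1, codes);
    path[depth] = '1';                   /* right edge */
    build_codes(t->right, path, depth + 1, codes);
}
```

After one call on the root with depth 0, looking up a code is just an index into the table, so the tree does not need to be searched again for each leaf.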
A viable solution is to give each node a parent pointer. This way, once you find the leaf, you can traverse up the tree recursively starting at that leaf, and print the appropriate bits as you return from the recursive calls.
In this function, first check if the node has a parent or not (in other words, if we're at the root or not), and if so, call the function recursively with the node's parent; if not, return.
In the case that we called the function recursively, after the recursive call, check to see if the current node is the right child of its parent. If so, print a 1; if not, print a 0.
No need to worry about reversing a string in this implementation.
Another possible solution would be to build up the string and pass it along to the recursive calls. For this solution, you'd need to know the height of the tree, or at least the number of symbols your tree can encode so that you pass in a char array of at least that size, plus one for null termination.
In pseudocode, this would look like:
func traverse (cur, cmp, str)
    if cur == null, return
    if cur == cmp
        print str
    if cur.left != null
        traverse(cur.left, cmp, str + "0")
    if cur.right != null
        traverse(cur.right, cmp, str + "1")
This way, you're building up the string, and only print it once you find the leaf in question. Note that I moved the cur == cmp check outside of that if statement, because it should never be true for an internal node in a Huffman code tree. This method is wildly inefficient for finding the code for one character, though, since it performs a DFS on the entire tree.

Need some explanation about trees in C

Leaf *findLeaf(Leaf *R, int data)
{
    if (R->data >= data)
    {
        if (R->left == NULL) return R;
        else return findLeaf(R->left, data);
    }
    else
    {
        if (R->right == NULL) return R;
        else return findLeaf(R->right, data);
    }
}
void traverse(Leaf *R)
{
    if (R == root) { printf("ROOT is %d\n", R->data); }
    if (R->left != NULL)
    {
        printf("Left data %d\n", R->left->data);
        traverse(R->left);
    }
    if (R->right != NULL)
    {
        printf("Right data %d\n", R->right->data);
        traverse(R->right);
    }
}
These code snippets work fine, but I wonder how they work.
I need a brief explanation of recursion. Thanks for your help.
A Leaf struct will look something like this:
typedef struct Leaf {
    int data;
    struct Leaf *left;   // These allow structs to be chained together where each node
    struct Leaf *right;  // has two pointers to two more nodes, causing exponential
} Leaf;                  // growth.
The function takes a pointer to a Leaf we call R and some data to search against, it returns a pointer to a Leaf
Leaf *findLeaf(Leaf *R,int data){
This piece of code decides whether we should go left or right, the tree is known to be ordered because the insert function follows this same rule for going left and right.
if(R->data >= data ){
This is the base case of the recursion: if there is no left child to descend into, we have reached the last node on this path, so we return it.
The base case of a recursive function has the task of ending the recursion and returning a result. Without it, the function would never finish.
if(R->left == NULL) return R;
This is how we walk through the tree. Here we are descending the left side because the data we are looking for is less than or equal to the current node's data. (Smaller or equal data is always inserted on the left to stay ordered.)
What is happening is that now we call findLeaf() with R->left, but imagine if we get to this point again in this next call.
It will become R->left->left in reference to the first call. If the data were larger than the current node we are operating on, we would go right instead.
else return findLeaf(R->left,data);
Now we are at the case where the data is larger than the current node, so we are going right.
} else {
This is exactly the same as with the left.
if(R->right == NULL) return R;
else return findLeaf(R->right,data);
}
}
In the end, the return of the function can be conceptualized as something like R->right->right->left->NULL.
Let's take this tree and operate on it with findLeaf():
findLeaf(Leaf * root, 4) //In this example, root is already pointing to (8)
We start at the root, at the top of the tree, which contains 8.
First we check R->data >= data where we know R->data is (8) and data is (4). Since we know data is smaller than R->data(Current node), we enter the if statement.
Here we operate on the left Leaf, checking if it is NULL. It isn't and so we skip to the else.
Now we return findLeaf(R->left, data);, but to return it, we must evaluate it first. This causes us to enter a second call where we compare (3) to (4) and try again.
Going through the entire process again, we will compare (6) to (4) and then finally find our node when we compare (4) to (4). Now we will backtrack through the function and return a chain like this:
R(8)->(3)->(6)->(4)
Edit: Also, coincidentally, I wrote a blog post about traversing a linked list to explain the nature of a Binary Search Tree here.
Each Leaf contains three values:
data - an integer
left and right, both pointers to another leaf.
left, right or both, might be NULL, meaning there isn't another leaf in that direction.
So that's a tree. There's one Leaf at the root, and you can follow the trail of left or right pointers until you reach a NULL.
The key to recursion is that if you follow the path by one Leaf, the remaining problem is exactly the same (but "one smaller") as the problem you had when you were at the root. So you can call the same function to solve the problem. Eventually the routine will be at a Leaf with NULL as its pointer, and you've solved the problem.
It's probably easiest to understand a list before you understand a tree. So instead of a Leaf with two pointers, left and right, you have a Node with just one pointer, next. To follow the list to its end, recursively:
Node *findEnd(Node *node) {
    if (node->next == NULL) {
        return node; // Solved!!
    } else {
        return findEnd(node->next);
    }
}
What's different about your findLeaf? Well, it uses the data parameter to decide whether to follow the left or right pointer, but otherwise it's exactly the same.
You should be able to make sense of traverse() with this knowledge. It uses the same principle of recursion to visit every Leaf in the structure.
Recursion is a function that breaks a problem down into 2 variants:
one step of solving the problem, and calling itself with the remainder of the problem
the last step of solving the problem
Recursion is simply a different way of looping through code.
Recursive algorithms generally work hand in hand with some form of data structure - in your case the tree. You need to imagine the recursion - very high level - as "reapply the same logic on a subset of the problem".
In your case the subset of the problem is either the branch of the tree on the right or the branch of the tree on the left.
So, let's look at the traverse algorithm:
It takes the leaf you pass to the method and - if it's the ROOT leaf states it
Then, if there is a "left" sub-leaf it displays the data attached to it and restarts the algorithm (the recursion) which means... on the left node
If the left node is the ROOT, state it (no chance after the first recursion since the ROOT is at the top)
Then, if there is a "left" sub-leaf of our left node, display it and restart the algorithm on this left-left node
When reaching the bottom left, i.e. when there is no left leaf left (following? :) ) then it does the same on the first right leaf. If there is neither a left leaf nor a right leaf, which means we are at the real leaf that does not have sub-leafs, the recursive call ends, which means that the algorithm starts again from the place it was before recursing and with all the variables at the state they were in then.
After first recursion termination, you will move from the bottom left leaf up one leaf, and go down on the right leaf if there is one and start again printing and moving on the left.
All in all - the ending result is that you walk through your whole tree in a left first way.
Tell me if it's not crystal clear and try to apply the same pattern on the findLeaf recursive algorithm.
A little comment about recursion and then a little comment about searching on a tree:
let's suppose you want to calculate n!. You can do (pseudocode)
fac 0 = 1
fac (n+1) = (n+1) * fac n
So recursion is solving a problem by manipulating the result of solving the same problem with a smaller data. See http://en.wikipedia.org/wiki/Recursion.
So now, let's suppose we have a data structure tree
T = (L, e, R)
with L the left subtree, e is the root and R is the right subtree... So let's say you want to find the value v in that tree, you would do
find v LEAF = false // you can't find any value in an empty tree, base case
find v (L, e, R) =
    if v == e
        then something(e)
    else
        if v < e
            find v L (here we have recursion, we say 'go and search for v in the left subtree')
        else
            find v R (here we have recursion, we say 'go and search for v in the right subtree')
        end
    end

C get mode from list of integers

I need to write a program to find the mode, that is, the integer or integers that occur most often.
So,
1,2,3,4,1,10,4,23,12,4,1 would have mode of 1 and 4.
I'm not really sure what kind of algorithm i should use. I'm having a hard time trying to think of something that would work.
I was thinking of a frequency table of some sort, where I could go through the array and build a linked list. If the linked list doesn't contain that value, add it to the list; if it does, add 1 to its count.
So if i had the same thing from above. loop through
1,2,3,4,1,10,4,23,12,4,1
Then list is empty so add node with number = 1 and value = 1.
2 doesnt exist so add node with number = 2 and value = 1 and so on.
Get to the 1 and 1 already exists so value = 2 now.
I would have to loop through the array and then loop through linked list everytime to find that value.
Once I am done, I go through the linked list and create a new linked list that will hold the modes. So I set the head to the first element, which is 1. Then I go through the linked list that contains the occurrences and compare the values. If the occurrences of the current node are > the current highest, then I set the head to this node. If they are = to the highest, then I add the node to the mode linked list.
Once i am done i loop through the mode list and print the values.
Not sure if this would work. Does anyone see anything wrong with this? Is there an easier way to do this? I was thinking a hash table too, but not really sure how to do that in C.
Thanks.
If you can keep the entire list of integers in memory, you could sort the list first, which will make repeated values adjacent to each other. Then you can do a single pass over the sorted list to look for the mode. That way, you only need to keep track of the best candidate(s) for the mode seen up until now, along with how many times the current value has been seen so far.
The algorithm you have is fine for a homework assignment. There are all sorts of things you could do to optimise the code, such as:
use a binary tree for efficiency,
use an array of counts where the index is the number (assuming the number range is limited).
But I think you'll find they're not necessary in this case. For homework, the intent is just to show that you understand how to program, not that you know all sorts of tricks for wringing out the last ounce of performance. Your educator will be looking far more for readable, structured, code than tricky optimisations.
I'll describe below what I'd do. You're obviously free to use my advice as much or as little as you wish, depending on how much satisfaction you want to gain at doing it yourself. I'll provide pseudo-code only, which is my standard practice for homework questions.
I would start with a structure holding a number, a count and next pointer (for your linked list) and the global pointer to the first one:
typedef struct sElement {
int number;
int count;
struct sElement *next;
} tElement;
tElement *first = NULL;
Then create some functions for creating and using the list:
tElement *incrementElement (int number);
tElement *getMaxCountElement (void);
tElement *getNextMatching (tElement *ptr, int count);
Those functions will, respectively:
Increment the count for an element (or create it and set count to 1).
Scan all the elements, returning the one with the maximum count.
Get the next element pointer matching the count, starting at a given point, or NULL if no more.
The pseudo-code for each:
def incrementElement (number):
    # Find matching number in list or NULL.
    set ptr to first
    while ptr is not NULL:
        if ptr->number is equal to number:
            exit while loop
        set ptr to ptr->next
    # If not found, add one at start with zero count.
    if ptr is NULL:
        set ptr to newly allocated element
        set ptr->number to number
        set ptr->count to 0
        set ptr->next to first
        set first to ptr
    # Increment count.
    set ptr->count to ptr->count + 1
    return ptr
def getMaxCountElement ():
    # List empty, no mode.
    if first is NULL:
        return NULL
    # Assume first element is mode to start with.
    set retptr to first
    # Process all other elements.
    set ptr to first->next
    while ptr is not NULL:
        # Save new mode if you find one.
        if ptr->count is greater than retptr->count:
            set retptr to ptr
        set ptr to ptr->next
    # Return actual mode element pointer.
    return retptr
def getNextMatching (ptr, count):
    # Process all elements.
    while ptr is not NULL:
        # If match on count, return it.
        if ptr->count is equal to count:
            return ptr
        set ptr to ptr->next
    # Went through whole list with no match, return NULL.
    return NULL
Then your main program becomes:
# Process all the numbers, adding to (or incrementing in) list.
for each n in numbers to process:
    incrementElement (n)
# Get the mode quantity, only look for modes if list was non-empty.
maxElem = getMaxCountElement ()
if maxElem is not NULL:
    # Find the first one, while one exists, print and find the next one.
    ptr = getNextMatching (first, maxElem->count)
    while ptr is not NULL:
        print ptr->number
        ptr = getNextMatching (ptr->next, maxElem->count)
If the range of numbers is known in advance, and is a reasonable number, you can allocate a sufficiently large array for the counters and just do count[i] += 1.
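For instance, a small sketch of the counting-array idea, assuming every value lies in [0, range); modes_by_count and its parameters are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumes 0 <= v[i] < range for every element. Writes the modes to out
   (room for range ints) in ascending order and returns how many there
   are, or -1 if the counter array cannot be allocated. */
int modes_by_count(const int *v, int n, int range, int *out) {
    int *count;
    int best = 0, k = 0;
    assert(range > 0);
    count = calloc((size_t)range, sizeof *count);
    if (count == NULL)
        return -1;
    for (int i = 0; i < n; i++) {
        assert(v[i] >= 0 && v[i] < range);
        count[v[i]] += 1;                 /* the value is its own index */
        if (count[v[i]] > best)
            best = count[v[i]];
    }
    for (int x = 0; x < range && best > 0; x++)
        if (count[x] == best)
            out[k++] = x;                 /* every value tied for the top */
    free(count);
    return k;
}
```

For the list in the question this reports 1 and 4, each seen three times. The memory cost grows with the range rather than with the number of inputs, which is why it only suits a limited range.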
If the range of numbers is not known in advance, or is too large for the naive use of an array, you could instead maintain a binary tree of values to maintain your counters. This will give you far less searching than a linked list would. Either way you'd have to traverse the array or tree and build an ordering of highest to lowest counts. Again I'd recommend a tree for that, but your list solution could work as well.
Another interesting option could be the use of a priority queue for your extraction phase. Once you have your list of counters completed, walk your tree and insert each value at a priority equal to its count. Then you just pull values from the priority queue until the count goes down.
I would go for a simple hash table based solution.
A structure for hash table containing a number and corresponding frequency. Plus a pointer to the next element for chaining in the hash bucket.
struct ItemFreq {
struct ItemFreq * next_;
int number_;
int frequency_;
};
The processing starts with
max_freq_so_far = 0;
It goes through the list of numbers. For each number, the hash table is looked up for a ItemFreq element x such that x.number_ == number.
If no such x is found, then a ItemFreq element is created as { number_ = number, frequency_ = 1} and inserted into the hash table.
If some x was found then its frequency_ is incremented.
If frequency_ > max_freq_so_far then max_freq_so_far = frequency_
Once traversing through the list of numbers is complete, we traverse through the hash table and print the ItemFreq items whose frequency_ == max_freq_so_far
The complexity of the algorithm is O(N) where N is the number of items in the input list.
For a simple and elegant construction of hash table, see section 6.6 of K&R (The C Programming Language).
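The steps above could be sketched roughly like this. It is only an illustration: NBUCKETS, bucket_of and modes_by_hash are made-up names, the hash is a plain modulo, and the struct is repeated from the answer so the block stands alone.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 101  /* arbitrary small prime, chosen for illustration */

struct ItemFreq {
    struct ItemFreq *next_;
    int number_;
    int frequency_;
};

static unsigned bucket_of(int number) {
    return (unsigned)number % NBUCKETS;
}

/* Writes the modes of v[0..n-1] to out (room for n ints) and returns
   how many there are, or -1 if an allocation fails. */
int modes_by_hash(const int *v, int n, int *out) {
    struct ItemFreq *table[NBUCKETS] = {0};
    int max_freq_so_far = 0, k = 0, ok = 1;
    assert(v != NULL && out != NULL);
    for (int i = 0; i < n; i++) {
        unsigned b = bucket_of(v[i]);
        struct ItemFreq *x = table[b];
        while (x != NULL && x->number_ != v[i])
            x = x->next_;                    /* look the number up */
        if (x == NULL) {                     /* not found: insert it */
            x = malloc(sizeof *x);
            if (x == NULL) { ok = 0; break; }
            x->number_ = v[i];
            x->frequency_ = 0;
            x->next_ = table[b];
            table[b] = x;
        }
        if (++x->frequency_ > max_freq_so_far)
            max_freq_so_far = x->frequency_;
    }
    /* Collect the items with the top frequency and free the table. */
    for (unsigned b = 0; b < NBUCKETS; b++) {
        struct ItemFreq *x = table[b];
        while (x != NULL) {
            struct ItemFreq *next = x->next_;
            if (ok && x->frequency_ == max_freq_so_far)
                out[k++] = x->number_;
            free(x);
            x = next;
        }
    }
    return ok ? k : -1;
}
```

The modes come out in bucket order rather than sorted order. With a fixed bucket count the O(N) bound only holds while the chains stay short, so a real implementation would probably size or grow the table to fit the input.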
This response is a sample implementation of Paul Kuliniewicz's idea:
int CompInt(const void* ptr1, const void* ptr2) {
    const int a = *(const int*)ptr1;
    const int b = *(const int*)ptr2;
    if (a < b) return -1;
    if (a > b) return +1;
    return 0;
}
// This function leaves the modes in output and returns the number
// of modes in output. The output pointer should have room for
// at least n integers. Note that v is sorted in place, so it cannot
// be const; qsort needs <stdlib.h>.
int GetModes(int* v, int n, int* output) {
    // Sort the data and initialize the best result.
    qsort(v, n, sizeof *v, CompInt);
    int best = 0;
    int outputSize = 0;
    // Loop through the elements until they are exhausted
    // (note there is no ++i after each iteration).
    for (int i = 0; i < n;) {
        // This is the beginning of the new group.
        const int begin = i;
        // Move the index until there are no more equal elements.
        for (; i < n && v[i] == v[begin]; ++i);
        // This is one past the last element in the current group.
        const int end = i;
        // Update the best mode found until now.
        if (end - begin > best) {
            best = end - begin;
            outputSize = 0;
        }
        if (end - begin == best)
            output[outputSize++] = v[begin];
    }
    return outputSize;
}

compare nodes of a binary tree

If I have two binary trees, how would I check if the elements in all the nodes are equal.
Any ideas on how to solve this problem?
You would do a parallel tree traversal - choose your order (pre-order, post-order, in-order). If at any time the values stored in the current nodes differ, so do the two trees. If one left node is null and the other isn't, the trees are different; ditto for right nodes.
Does node order matter? I'm assuming for this answer that the two following trees:
1 1
/ \ / \
3 2 2 3
are not equal, because node position and order is taken into account for the comparison.
A few hints
Do you agree that two empty trees are equal?
Do you agree that two trees that only have a root node, with identical node values, are equal?
Can't you generalize this approach?
Being a bit more precise
Consider this generic tree:
rootnode(value=V)
/ \
/ \
-------- -------
| left | | right |
| subtree| |subtree|
-------- -------
rootnode is a single node. The two children are more generic, and represent binary trees. The children can either be empty, or a single node, or a fully-grown binary tree.
Do you agree that this representation is generic enough to represent any kind of non-empty binary tree? Are you able to decompose, say, this simple tree into my representation?
If you understand this concept, then this decomposition can help you to solve the problem. If you do understand the concept, but can't go any further with the algorithm, please comment here and I'll be a bit more specific :)
you could use something like Tree Traversal to check each value.
If the trees are binary search trees, so that a pre-order walk will produce a reliable, repeatable ordering of items, the existing answers will work. If they're arbitrary binary trees, you have a much more interesting problem, and should look into hash tables.
My solution would be to flatten the two trees into 2 arrays (using level order), and then iterate through each item and compare. You know both arrays are the same order. You can do simple pre-checks such as if the array sizes differ then the two trees aren't the same.
Level Order is fairly easy to implement, the Wikipedia article on tree traversal basically gives you everything you need, including code. If efficiency is being asked for in the question, then a non-recursive solution is best, and done using a FIFO list (a Queue in C# parlance - I'm not a C programmer).
Pass the two trees through the same tree traversal logic and compare the outputs. If even a single node's data does not match, the trees don't match.
Or you could just create a simple tree traversal logic and compare the node values at each recursion.
You can use recursion to check whether the current nodes are equal, then check the subtrees. The code can be written as follows in Java.
public boolean sameTree(Node root1, Node root2) {
    // base case: both are empty
    if (root1 == null && root2 == null)
        return true;
    // exactly one is empty: the structures differ
    if (root1 == null || root2 == null)
        return false;
    // assumes Node.equals() compares only the values stored in the two nodes
    if (root1.equals(root2)) {
        // subtrees
        boolean left = sameTree(root1.left, root2.left);
        boolean right = sameTree(root1.right, root2.right);
        return (left && right);
    } else {
        return false;
    }
} // end sameTree()
Here is a C version, since the question is tagged C.
int is_same(node* T1, node* T2)
{
    if (!T1 && !T2)
        return 1;
    if (!T1 || !T2)
        return 0;
    if (T1->data == T2->data)
    {
        int left = is_same(T1->left, T2->left);
        int right = is_same(T1->right, T2->right);
        return (left && right);
    }
    else
        return 0;
}
Takes care of structure as well as values.
One line code is enough to check if two binary tree node are equal (same value and same structure) or not.
bool isEqual(BinaryTreeNode *a, BinaryTreeNode *b)
{
return (a && b) ? (a->m_nValue==b->m_nValue && isEqual(a->m_pLeft,b->m_pLeft) && isEqual(a->m_pRight,b->m_pRight)) : (a == b);
}
If your values are ints in a known range (let's say the max value is n), you can use an array. Traverse the first tree using whatever method you want, marking each node's data in the array at the appropriate index (using the node data as the index). Then traverse the second tree and check, for every node in it, whether array[node.data] is marked. If every node passes and both trees have the same number of nodes, the trees hold the same values; note that this does not compare their structure.
**assuming for each tree all nodes are unique
