Troubles with Trie - c

So, I was trying to read up on tries, a relatively new data structure for me. Everywhere I read, every node in the trie consists of an integer variable that marks the end of a word, plus 26 pointers, each pointing to a node in the next level down (assuming the words contain only lowercase letters).
Now the problem I am facing is that wherever I see or read an implementation, the nodes are marked with a character, as in this picture:
http://community.topcoder.com/i/education/alg_tries.png
But the way I understand a trie, I believe every edge should be marked with a character. I know we don't have a data structure for the edges, only for the nodes, but wouldn't marking the edges be more correct?
Also, this is my algorithm for implementing insert. Please tell me if you find something wrong with it.
struct trie
{
    int val;
    struct trie *alpha[26];
};
struct trie *insert(struct trie *root, char *input)
{
    int temp;
    if (*input == '\0')
        return root;
    if (root == NULL)
    {
        root = (struct trie *) malloc(sizeof(struct trie));
        int i = 0;
        for (i = 0; i < 26; i++)
            root->alpha[i] = NULL;
    }
    temp = *input - 'a';
    root->alpha[temp] = insert(root->alpha[temp], input + 1);
    if (*(input + 1) == '\0')
        root->val = 1;
    return root;
}
I am stumped as to how I could implement the delete. If you can, please help me with a delete algorithm.

Here is a small program that shows a way you can do it. There is no serious effort put into error handling though:
http://pastebin.com/84TiPrtL
I've slightly edited your trie_insert function and show a trie_delete function here. The struct Vec inside the pastebin code can be changed to a std::vector if you are using C++.
struct trie *trie_insert(struct trie *root, char *input)
{
int idx;
if (!input) {
return root;
}
if (root == NULL) {
root = (struct trie *)calloc(1, sizeof(struct trie));
}
if (*input == '\0') {
// a node with val set to 1 marks the end of a stored word
root->val = 1;
} else {
// carry on insertion
idx = *input - 'a';
root->alpha[idx] = trie_insert(root->alpha[idx], input+1);
}
return root;
}
struct trie *trie_delete(struct trie *root, char *s)
{
int i, idx, reap = 0;
if (!root || !s) {
return root;
}
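if (!*s) {
if (root->val) {
// delete this string, and mark node as deletable
root->val = 0;
reap = 1;
}
// if root->val is 0 the string was never stored: nothing to do,
// and we must not index alpha[] with the terminating '\0'
} else {
// more characters to process, carry on
idx = *s - 'a';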
if (root->alpha[idx]) {
root->alpha[idx] = trie_delete(root->alpha[idx], s+1);
if (!root->alpha[idx]) {
// child node deleted, set reap = 1
reap = 1;
}
}
}
// We can delete this if both:
// 1. reap is set to 1, which is only possible if either:
// a. we are now at the end of the string and root->val used
// to be 1, but is now set to 0
// b. the child node has been deleted
// 2. The string ending at the current node is not inside the trie,
// so root->val = 0
if (reap && !root->val) {
for (i = 0; i < NRALPHA; i++) {
if (root->alpha[i]) {
reap = 0;
break;
}
}
// no more children, delete this node
if (reap) {
trie_free(root);
root = NULL;
}
}
return root;
}
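On the edge-versus-node question: with this layout the character is never stored in a node at all. The index into alpha is the edge label, so the picture with letters inside the nodes is just a drawing convention. Below is a minimal, self-contained sketch of the same insert/delete idea that can be compiled without the pastebin code. The names edge_index and trie_contains, and the choice to abort() on allocation failure, are my own assumptions for illustration and are not taken from the pastebin.

```c
#include <assert.h>
#include <stdlib.h>

#define NRALPHA 26 /* assumed: words use only 'a'..'z' */

struct trie {
    int val;                     /* 1 if a stored word ends at this node */
    struct trie *alpha[NRALPHA]; /* alpha[i] is the edge labelled 'a' + i */
};

/* Hypothetical helper: map a lowercase letter to its edge index. */
static int edge_index(char c)
{
    assert(c >= 'a' && c <= 'z');
    return c - 'a';
}

/* Insert s below root, allocating nodes as needed; returns the node. */
static struct trie *trie_insert(struct trie *root, const char *s)
{
    if (root == NULL) {
        root = calloc(1, sizeof *root);
        if (root == NULL)
            abort(); /* sketch only: no real error handling */
    }
    if (*s == '\0') {
        root->val = 1;
    } else {
        int idx = edge_index(*s);
        root->alpha[idx] = trie_insert(root->alpha[idx], s + 1);
    }
    return root;
}

/* Returns 1 if s is stored in the trie, 0 otherwise. */
static int trie_contains(const struct trie *root, const char *s)
{
    while (root != NULL && *s != '\0')
        root = root->alpha[edge_index(*s++)];
    return root != NULL && root->val;
}

/* Remove s; frees every node left with no word mark and no children.
   Returns the node, or NULL if it was freed. */
static struct trie *trie_delete(struct trie *root, const char *s)
{
    int i;
    if (root == NULL)
        return NULL;
    if (*s == '\0') {
        root->val = 0;
    } else {
        int idx = edge_index(*s);
        root->alpha[idx] = trie_delete(root->alpha[idx], s + 1);
    }
    if (root->val)
        return root; /* another word still ends here */
    for (i = 0; i < NRALPHA; i++)
        if (root->alpha[i] != NULL)
            return root; /* still on the path of another word */
    free(root);
    return NULL;
}
```

Always assign the result back, e.g. root = trie_delete(root, "cat");, because deleting the last word frees the root itself and returns NULL.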

Related

How can I implement a check function to check the validity of the properties of a b-tree?

I have recently implemented a normal B-tree (without any variant) in C, but I would like to check if my implementation is valid i.e. if it does not violate the following properties:
Every node has at most m children.
Every non-leaf node (except root) has at least ⌈m/2⌉ child nodes.
The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k − 1 keys.
All leaves appear in the same level and carry no information.
Could you help me implement this procedure, either with an example in C or with some suggestions?
#include <stdio.h>
#include <stdlib.h>
#define TRUE 1
#define FALSE 0
#define EMPTY 0
#define NODE_ORDER 3 /*The degree of the tree.*/
#define NODE_POINTERS (NODE_ORDER*2)
#define NODE_KEYS (NODE_POINTERS-1)
typedef unsigned char bool;
typedef struct tree_node {
int key_array[NODE_KEYS];
struct tree_node *child_array[NODE_POINTERS];
unsigned int key_index;
bool leaf;
} node_t;
typedef struct {
node_t *node_pointer;
int key;
bool found;
unsigned int depth;
} result_t;
typedef struct {
node_t *root;
unsigned short order;
bool lock;
} btree_t;
static int BTreeGetLeftMax(node_t *T);
static int BTreeGetRightMin(node_t *T);
/* The AllocateNode operation allocates a b-tree node and then sets the node's
** properties to their default values:
** BTreeNode => K[i] = 0
** BTreeNode => child_array[i] = NULL
** BTreeNode => key_index = 0
** BTreeNode => isLeaf = 1;
*/
static node_t *create_node()
{
int i;
node_t *new_node = (node_t *)malloc(sizeof(node_t));
if(!new_node){
printf("Out of memory");
exit(0);
}
// Set Keys
for(i = 0;i < NODE_KEYS; i++){
new_node->key_array[i] = 0;
}
// Set ptr
for(i = 0;i < NODE_POINTERS; i++){
new_node->child_array[i] = NULL;
}
new_node->key_index = EMPTY;
new_node->leaf = TRUE;
return new_node;
}
/* The CreateBTree operation creates an empty b-tree by allocating a new root
** that has no keys and is a leaf node. Only the root node is permitted to
** have these properties.
*/
btree_t *create_btree()
{
btree_t *new_root = (btree_t *)malloc(sizeof(btree_t));
if(!new_root){
return NULL;
}
node_t *head = create_node();
if(!head){
return NULL;
}
new_root->order = NODE_ORDER;
new_root->root = head;
new_root->lock = FALSE;
return new_root;
}
static result_t *get_resultset()
{
result_t *ret = (result_t *)malloc(sizeof(result_t));
if(!ret){
printf("ERROR! Out of memory.");
exit(0);
}
ret->node_pointer = NULL;
ret->key = 0;
ret->found = FALSE;
ret->depth = 0;
return ret;
}
/* The BTreeSearch operation is to search X in T.Recursively traverse the tree
** from top to bottom.At each level, BTreeSearch choose the maximum key whose
** value is greater than or equal to the desired value X.If equal to the
** desired ,found.Otherwise continue to traverse.
*/
result_t *search(int key, node_t *node)
{
print_node(node);
int i = 0;
while((i < node->key_index) && (key > node->key_array[i])){
//printf("it %d is <= %d and key %d > than %d\n", i, node->key_index, key, node->key_array[i]);
i++;
}
//printf("end iterator: %d\n", i);
//printf("better: \n");
/*
int c = 0;
while((c < node->key_index) && (key > node->key_array[c])){
printf("it %d is <= %d and key %d > than %d\n", c, node->key_index, key, node->key_array[c]);
c++;
}
*/
// Check if we found it. The bound must be strict: key_array[key_index]
// is past the last valid key (and out of bounds when the node is full).
if((i < node->key_index) && (key == node->key_array[i])){
result_t *result = get_resultset();
result->node_pointer = node;
result->key = i;
result->found = TRUE;
return result;
}
// Not found check leaf or child
if(node->leaf){
result_t *result = get_resultset();
result->node_pointer = node;
result->found = FALSE;
return result;
}else{
// the recursive call allocates its own result, so none is needed here
return search(key, node->child_array[i]);
}
}
/* The split_child operation moves the median key of node child_array into
** its parent ptrParent where child_array is the ith child of ptrParent.
*/
static void split_child(node_t *parent_node, int i, node_t *child_array)
{
int j;
//Allocate a new node to store child_array's node.
node_t *new_node = create_node();
new_node->leaf = child_array->leaf;
new_node->key_index = NODE_ORDER-1;
//Move child_array's right half nodes to the new node.
for(j = 0;j < NODE_ORDER-1;j++){
new_node->key_array[j] = child_array->key_array[NODE_ORDER+j];
}
//If child_array is not leaf node,then move child_array's [child_array]s to the new
//node's [child_array]s.
if(child_array->leaf == 0){
for(j = 0;j < NODE_ORDER;j++){
new_node->child_array[j] = child_array->child_array[NODE_ORDER+j];
}
}
child_array->key_index = NODE_ORDER-1;
//Right shift ptrParent's [child_array] from index i
for(j = parent_node->key_index;j>=i;j--){
parent_node->child_array[j+1] = parent_node->child_array[j];
}
//Set ptrParent's ith child_array to the newNode.
parent_node->child_array[i] = new_node;
//Right shift ptrParent's Keys from index i-1
for(j = parent_node->key_index;j>=i;j--){
parent_node->key_array[j] = parent_node->key_array[j-1];
}
//Set ptrParent's [i-1]th Key to child_array's median [child_array]
parent_node->key_array[i-1] = child_array->key_array[NODE_ORDER-1];
//Increase ptrParent's Key number.
parent_node->key_index++;
}
/* The BTreeInsertNonFull operation insert X into a non-full node T.before
** execute this operation,guarantee T is not a full node.
*/
static void insert_nonfull(node_t *n, int key){
int i = n->key_index;
if(n->leaf){
// Shift until we fit
while(i>=1 && key<n->key_array[i-1]){
n->key_array[i] = n->key_array[i-1];
i--;
}
n->key_array[i] = key;
n->key_index++;
}else{
// Find the position i to insert.
while(i>=1 && key<n->key_array[i-1]){
i--;
}
//If T's ith child_array is full,split first.
if(n->child_array[i]->key_index == NODE_KEYS){
split_child(n, i+1, n->child_array[i]);
if(key > n->key_array[i]){
i++;
}
}
//Recursive insert.
insert_nonfull(n->child_array[i], key);
}
}
/* The BTreeInsert operation insert key into T.Before insert ,this operation
** check whether T's root node is full(root->key_index == 2*d -1) or not.If full,
** execute split_child to guarantee the parent never become full.And then
** execute BTreeInsertNonFull to insert X into a non-full node.
*/
node_t *insert(int key, btree_t *b)
{
if(!b->lock){
node_t *root = b->root;
if(root->key_index == NODE_KEYS){ //If node root is full,split it.
node_t *newNode = create_node();
b->root = newNode; //Set the new node to T's Root.
newNode->leaf = 0;
newNode->key_index = 0;
newNode->child_array[0] = root;
split_child(newNode, 1, root);//Root is 1th child of newNode.
insert_nonfull(newNode, key); //Insert X into non-full node.
}else{ //If not full,just insert X in T.
insert_nonfull(b->root, key);
}
}else{
printf("Tree is locked\n");
}
return b->root;
}
/* The merge_children operation merge the root->K[index] and its two child
** and then set chlid1 to the new root.
*/
static void merge_children(node_t *root, int index, node_t *child1, node_t *child2){
child1->key_index = NODE_KEYS;
int i;
//Move child2's key to child1's right half.
for(i=NODE_ORDER;i<NODE_KEYS;i++)
child1->key_array[i] = child2->key_array[i-NODE_ORDER];
child1->key_array[NODE_ORDER-1] = root->key_array[index]; //Shift root->K[index] down.
//If child2 is not a leaf node,must copy child2's [ptrchlid] to the new
//root(child1)'s [child_array].
if(0 == child2->leaf){
for(i=NODE_ORDER;i<NODE_POINTERS;i++)
child1->child_array[i] = child2->child_array[i-NODE_ORDER];
}
//Now update the root.
for(i=index+1;i<root->key_index;i++){
root->key_array[i-1] = root->key_array[i];
root->child_array[i] = root->child_array[i+1];
}
root->key_index--;
free(child2);
}
/* The BTreeBorrowFromLeft operation borrows a key from leftPtr.curPtr borrow
** a node from leftPtr.root->K[index] shift down to curPtr,shift leftPtr's
** right-max key up to root->K[index].
*/
static void BTreeBorrowFromLeft(node_t *root, int index, node_t *leftPtr, node_t *curPtr){
curPtr->key_index++;
int i;
for(i=curPtr->key_index-1;i>0;i--)
curPtr->key_array[i] = curPtr->key_array[i-1];
curPtr->key_array[0] = root->key_array[index];
root->key_array[index] = leftPtr->key_array[leftPtr->key_index-1];
if(0 == leftPtr->leaf)
for(i=curPtr->key_index;i>0;i--)
curPtr->child_array[i] = curPtr->child_array[i-1];
curPtr->child_array[0] = leftPtr->child_array[leftPtr->key_index];
leftPtr->key_index--;
}
/* The BTreeBorrowFromRight operation borrows a key from rightPtr.curPtr borrow
** a node from rightPtr.root->K[index] shift down to curPtr,shift RightPtr's
** left-min key up to root->K[index].
*/
static void BTreeBorrowFromRight(node_t *root, int index, node_t *rightPtr, node_t *curPtr){
curPtr->key_index++;
curPtr->key_array[curPtr->key_index-1] = root->key_array[index];
root->key_array[index] = rightPtr->key_array[0];
int i;
for(i=0;i<rightPtr->key_index-1;i++)
rightPtr->key_array[i] = rightPtr->key_array[i+1];
if(0 == rightPtr->leaf){
curPtr->child_array[curPtr->key_index] = rightPtr->child_array[0];
for(i=0;i<rightPtr->key_index;i++)
rightPtr->child_array[i] = rightPtr->child_array[i+1];
}
rightPtr->key_index--;
}
/* The BTreeDeleteNoNone operation recursively delete X in root,handle both leaf
** and internal node:
** 1. If X in a leaf node,just delete it.
** 2. If X in a internal node P:
** a): If P's left neighbor -> prePtr has at least d keys,replace X with
** prePtr's right-max key and then recursively delete it.
** b): If P's right neighbor -> nexPtr has at least d keys,replace X with
** nexPtr's left-min key and then recursively delete it.
** c): If both of prePtr and nexPtr have d-1 keys,merge X and nexPtr into
** prePtr.Now prePtr have 2*d-1 keys,and then recursively delete X in
** prePtr.
** 3. If X not in a internal node P,X must in P->child_array[i] zone.If child_array[i]
** only has d-1 keys:
** a): If child_array[i]'s neighbor have at least d keys,borrow a key from
** child_array[i]'s neighbor.
** b): If both of child_array[i]'s left and right neighbor have d-1 keys,merge
** child_array[i] with one of its neighbor.
** finally,recursively delete X.
*/
static void BTreeDeleteNoNone(int X, node_t *root){
int i;
//Is root is a leaf node ,just delete it.
if(1 == root->leaf){
i=0;
while( (i<root->key_index) && (X>root->key_array[i])) //Find the index of X.
i++;
//If exists or not.
if((i<root->key_index) && (X == root->key_array[i])){
for(;i<root->key_index-1;i++)
root->key_array[i] = root->key_array[i+1];
root->key_index--;
}
else{
printf("Node not found.\n");
return ;
}
}
else{ //X is in a internal node.
i = 0;
node_t *prePtr = NULL, *nexPtr = NULL;
//Find the index;
while( (i<root->key_index) && (X>root->key_array[i]) )
i++;
if( (i<root->key_index) && (X == root->key_array[i]) ){ //Find it in this level.
prePtr = root->child_array[i];
nexPtr = root->child_array[i+1];
/*If prePtr at least have d keys,replace X by X's precursor in
*prePtr*/
if(prePtr->key_index > NODE_ORDER-1){
int aPrecursor = BTreeGetLeftMax(prePtr);
root->key_array[i] = aPrecursor;
//Recursively delete aPrecursor in prePtr.
BTreeDeleteNoNone(aPrecursor,prePtr);
}
else
if(nexPtr->key_index > NODE_ORDER-1){
/*If nexPtr at least have d keys,replace X by X's successor in
* nexPtr*/
int aSuccessor = BTreeGetRightMin(nexPtr);
root->key_array[i] = aSuccessor;
BTreeDeleteNoNone(aSuccessor,nexPtr);
}
else{
/*If both of root's two child have d-1 keys,then merge root->K[i]
* and prePtr nexPtr. Recursively delete X in the prePtr.*/
merge_children(root,i,prePtr,nexPtr);
BTreeDeleteNoNone(X,prePtr);
}
}
else{ //Not find in this level,delete it in the next level.
prePtr = root->child_array[i];
node_t *leftBro = NULL;
if(i<root->key_index)
nexPtr = root->child_array[i+1];
if(i>0)
leftBro = root->child_array[i-1];
/*root->child_array[i] need to borrow from or merge with his neighbor
* and then recursively delete. */
if(NODE_ORDER-1 == prePtr->key_index){
//If left-neighbor has at least d keys,borrow.
if( (leftBro != NULL) && (leftBro->key_index > NODE_ORDER-1))
BTreeBorrowFromLeft(root,i-1,leftBro,prePtr);
else //Borrow from right-neighbor
if( (nexPtr != NULL) && (nexPtr->key_index > NODE_ORDER-1))
BTreeBorrowFromRight(root,i,nexPtr,prePtr);
//OR,merge with its neighbor.
else if(leftBro != NULL){
//Merge with left-neighbor
merge_children(root,i-1,leftBro,prePtr);
prePtr = leftBro;
}
else //Merge with right-neighbor
merge_children(root,i,prePtr,nexPtr);
}
/*Now prePtr at least have d keys,just recursively delete X in
* prePtr*/
BTreeDeleteNoNone(X,prePtr);
}
}
}
/*Get T's left-max key*/
static int BTreeGetLeftMax(node_t *T){
if(0 == T->leaf){
return BTreeGetLeftMax(T->child_array[T->key_index]);
}else{
return T->key_array[T->key_index-1];
}
}
/*Get T's right-min key*/
static int BTreeGetRightMin(node_t *T){
if(0 == T->leaf){
return BTreeGetRightMin(T->child_array[0]);
}else{
return T->key_array[0];
}
}
/* The BTreeDelete operation delete X from T up-to-down and no-backtrack.
** Before delete,check if it's necessary to merge the root and its children
** to reduce the tree's height.Execute BTreeDeleteNoNone to recursively delete
*/
node_t *delete(int key, btree_t *b)
{
if(!b->lock){
//if the root of T only have 1 key and both of T's two child have d-1
//key,then merge the children and the root. Guarantee not need to backtrack.
if(b->root->key_index == 1){
node_t *child1 = b->root->child_array[0];
node_t *child2 = b->root->child_array[1];
if(child1 && child2){
if((NODE_ORDER-1 == child1->key_index) && (NODE_ORDER-1 == child2->key_index)){
//Merge the children and set child1 to the new root.
merge_children(b->root, 0, child1, child2);
free(b->root);
b->root = child1; //keep the tree handle pointing at the live root
BTreeDeleteNoNone(key, child1);
return child1;
}
}
}
BTreeDeleteNoNone(key, b->root);
}else{
printf("Tree is locked\n");
}
return b->root;
}
void tree_unlock(btree_t *r)
{
r->lock = FALSE;
}
bool tree_lock(btree_t *r)
{
if(r->lock){
return FALSE;
}
r->lock = TRUE;
return TRUE;
}
You have not shown any code, which makes it difficult to come up with code examples that could fit to your implementation. However, in principle you could take the following approaches:
Write unit-tests for your code. With the B-Tree that would mean to start with small trees (even with an empty tree), and use checks in your tests to verify the properties. You would then add more and more tests, specifically checking for bugs also in the "tricky" scenarios. There is a lot of general information about unit-testing available, you should be able to adapt it to your specific problem.
Add assertions to your code (read about the assert macro in C). Many of the properties you have mentioned could be checked directly within the code at appropriate places.
Certainly, there is more you could do, such as having the code reviewed by a colleague or using formal verification tools, but the two approaches above are good starting points.
UPDATE (after code was added):
Some more hints about how you could approach unit-testing. In principle, you should write your tests with the help of a so-called test framework, which is a helper library that makes writing tests easier. To explain the concept, however, I will just use plain C or even pseudo-code.
Moreover, you would also put some declarations and/or definitions into a header file, like "btree.h". For the sake of example, however, I will just #include "btree.c" in the code examples below.
Create a file "btree-test.c" (the name is a proposal, you can name it as you like).
A first test would look a bit like:
#include "btree.c"
#include <assert.h>
void test_create_empty_btree() {
btree_t *actual_btree = create_btree();
// now, check that the created btree has all desired properties
// for example:
assert(actual_btree != NULL);
assert(actual_btree->order == NODE_ORDER);
assert(actual_btree->lock == FALSE);
assert(actual_btree->root->key_index == EMPTY);
assert(actual_btree->root->leaf == TRUE);
printf("PASSED: test_create_empty_btree\n");
}
The code above is just an example, I have not even tried compiling it. Note also that the test is not quite clean yet: there will be memory leaks, because the btree is not properly deleted at the end of the test, which would be better practice. It should, however, give you an idea how to start writing unit-tests.
A second test could then again create a btree, but in addition insert some data. In your tests you would then check that the btree has the expected form. And so on, adding more and more tests. It is good practice to have one function per test case...
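To make the "check the properties" idea concrete, here is a rough sketch of a recursive validity checker that a test could call after every insert or delete. It is only an illustration under a few assumptions: the node layout is copied from the question so the block compiles on its own, the names check_node and btree_is_valid are made up, keys are assumed to be distinct, and the bounds follow the minimum-degree convention the posted code actually uses (every node except the root holds between NODE_ORDER-1 and 2*NODE_ORDER-1 keys) and not the order-m wording of the list in the question.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_ORDER 3
#define NODE_POINTERS (NODE_ORDER * 2)
#define NODE_KEYS (NODE_POINTERS - 1)

/* Same layout as the node_t in the question. */
typedef struct tree_node {
    int key_array[NODE_KEYS];
    struct tree_node *child_array[NODE_POINTERS];
    unsigned int key_index; /* number of keys currently stored */
    unsigned char leaf;
} node_t;

/* Returns 1 if the subtree at n is valid. lo and hi are exclusive bounds
   inherited from the ancestors (NULL means unbounded). *leaf_depth is -1
   until the first leaf is seen, then holds the depth all leaves must share. */
static int check_node(const node_t *n, int is_root,
                      const int *lo, const int *hi,
                      int depth, int *leaf_depth)
{
    unsigned int i;
    assert(leaf_depth != NULL);
    if (n == NULL || n->key_index > NODE_KEYS)
        return 0;
    if (!is_root && n->key_index < NODE_ORDER - 1)
        return 0; /* underfull non-root node */
    if (is_root && !n->leaf && n->key_index < 1)
        return 0; /* an internal root needs at least one key */
    for (i = 0; i < n->key_index; i++) {
        if (lo != NULL && n->key_array[i] <= *lo) return 0;
        if (hi != NULL && n->key_array[i] >= *hi) return 0;
        if (i > 0 && n->key_array[i - 1] >= n->key_array[i]) return 0;
    }
    if (n->leaf) {
        if (*leaf_depth < 0)
            *leaf_depth = depth;
        return *leaf_depth == depth; /* all leaves on the same level */
    }
    /* An internal node with k keys must have k + 1 valid children. */
    for (i = 0; i <= n->key_index; i++) {
        const int *clo = (i == 0) ? lo : &n->key_array[i - 1];
        const int *chi = (i == n->key_index) ? hi : &n->key_array[i];
        if (!check_node(n->child_array[i], 0, clo, chi, depth + 1, leaf_depth))
            return 0;
    }
    return 1;
}

static int btree_is_valid(const node_t *root)
{
    int leaf_depth = -1;
    return check_node(root, 1, NULL, NULL, 0, &leaf_depth);
}
```

In a test you would then write something like assert(btree_is_valid(tree->root)); after each operation, so the first insert or delete that breaks an invariant is the one that fails.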

Insertion into AVL tree only replaces root node

I'm currently working on an assignment where the N most frequent words in a book (.txt) must be printed. The issue that I'm currently facing is that when I add a node to one of my trees, it simply replaces the root node and thus, the tree remains as a single node.
Code snippet which adds words from the file "stopwords.txt" to a tree named stopwords:
Dict stopwords = newDict();
if (!readFile("stopwords.txt"))
{
fprintf(stderr, "Can't open stopwords\n");
exit(EXIT_FAILURE);
}
FILE *fp = fopen("stopwords.txt", "r");
while (fgets(buf, MAXLINE, fp) != NULL)
{
token = strtok(buf, "\n");
DictInsert(stopwords, buf); //the root is replaced here
}
fclose(fp);
The data structures are defined as follows:
typedef struct _DictNode *Link;
typedef struct _DictNode
{
WFreq data;
Link left;
Link right;
int height;
} DictNode;
typedef struct _DictRep *Dict;
struct _DictRep
{
Link root;
};
typedef struct _WFreq {
char *word; // word buffer (dynamically allocated)
int freq; // count of number of occurences
} WFreq;
Code to insert and rebalance tree:
// create new empty Dictionary
Dict newDict(void)
{
Dict d = malloc(sizeof(*d));
if (d == NULL)
{
fprintf(stderr, "Insufficient memory!\n");
exit(EXIT_FAILURE);
}
d->root = NULL;
return d;
}
// insert new word into Dictionary
// return pointer to the (word,freq) pair for that word
WFreq *DictInsert(Dict d, char *w)
{
d->root = doInsert(d->root, w); //the root is replaced here before doInsert runs
return DictFind(d, w);
}
static int depth(Link n)
{
if (n == NULL)
return 0;
int ldepth = depth(n->left);
int rdepth = depth(n->right);
return 1 + ((ldepth > rdepth) ? ldepth : rdepth);
}
static Link doInsert(Link n, char *w)
{
if (n == NULL)
{
return newNode(w);
}
// insert recursively
int cmp = strcmp(w, n->data.word);
if (cmp < 0)
{
n->left = doInsert(n->left, w);
}
else if (cmp > 0)
{
n->right = doInsert(n->right, w);
}
else
{ // (cmp == 0)
// if the word is already in the tree,
// we can return straight away
return n;
}
// insertion done
// correct the height of the current subtree
n->height = 1 + max(height(n->left), height(n->right));
// rebalance the tree
int dL = depth(n->left);
int dR = depth(n->right);
if ((dL - dR) > 1)
{
dL = depth(n->left->left);
dR = depth(n->left->right);
if ((dL - dR) > 0)
{
n = rotateRight(n);
}
else
{
n->left = rotateLeft(n->left);
n = rotateRight(n);
}
}
else if ((dR - dL) > 1)
{
dL = depth(n->right->left);
dR = depth(n->right->right);
if ((dR - dL) > 0)
{
n = rotateLeft(n);
}
else
{
n->right = rotateRight(n->right);
n = rotateLeft(n);
}
}
return n;
}
static Link newNode(char *w)
{
Link n = malloc(sizeof(*n));
if (n == NULL)
{
fprintf(stderr, "Insufficient memory!\n");
exit(EXIT_FAILURE);
}
n->data.word = w;
n->data.freq = 1;
n->height = 1;
n->left = NULL;
n->right = NULL;
return n;
}
// Rotates the given subtree left and returns the root of the updated
// subtree.
static Link rotateLeft(Link n)
{
if (n == NULL)
return n;
if (n->right == NULL)
return n;
Link rightNode = n->right;
n->right = rightNode->left;
rightNode->left = n;
n->height = max(height(n->left), height(n->right)) + 1;
rightNode->height = max(height(rightNode->right), n->height) + 1;
return rightNode;
}
// Rotates the given subtree right and returns the root of the updated
// subtree.
static Link rotateRight(Link n)
{
if (n == NULL)
return n;
if (n->left == NULL)
return n;
Link leftNode = n->left;
n->left = leftNode->right;
leftNode->right = n;
n->height = max(height(n->left), height(n->right)) + 1;
leftNode->height = max(height(leftNode->left), n->height) + 1;
return leftNode;
}
I believe that most of the code is functional and it is simply the insertion which fails. When I attempted to debug this with gdb, I had discovered that the root node (d->root) was replaced before the recursive insert function (doInsert) was run, causing the program to always return the node n which, as a result, already exists in the tree. For example, if the text file contained the following:
a
b
c
then the program would first insert "a" as stopwords->root, then "b" would replace "a" and become the new stopwords->root, finally "c" would replace "b" as the stopwords->root, resulting in a tree with one node, "c".
There are many inconsistencies in your code.
One mistake is here:
d->root = doInsert(d->root, w);
You unconditionally reassign the root each time you insert a new node.
You are supposed to return the new node from the function doInsert and reassign the root only if the new node has become the new root.
Another mistake is that doInsert returns the local variable n, which was not newly allocated but was initialized to point to the previous root.
Inside doInsert you need to allocate a new node NEW and use a variable x to walk down from the root until you find the place to insert NEW. If x stops at the root, then you set d->root = NEW.
Your function newNode just stores the passed string pointer, so what it points at will change when you modify the original string.
To prevent that, you should copy the input string on node insertions.
To achieve that,
n->data.word = w;
should be
n->data.word = malloc(strlen(w) + 1);
if (n->data.word == NULL)
{
fprintf(stderr, "Insufficient memory!\n");
exit(EXIT_FAILURE);
}
strcpy(n->data.word, w);
Add #include <string.h> to use strlen() and strcpy() if it isn't already included.
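This also explains the symptom in the question: every node stores the same pointer (buf), so after the next fgets the root's word compares equal to the word being inserted, strcmp returns 0, and doInsert returns early. It only looks as if the root were replaced. Here is a small sketch of the difference between keeping the pointer and keeping a copy. The helper names copy_word and aliasing_demo are invented for this example, and a second strcpy stands in for the next fgets call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of w, or NULL if malloc fails.
   The caller owns the copy and must free() it. */
static char *copy_word(const char *w)
{
    size_t len;
    char *copy;
    assert(w != NULL);
    len = strlen(w) + 1; /* include the terminating '\0' */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, w, len);
    return copy;
}

/* Returns 1 if the aliased pointer changed with the buffer while the
   copy kept its original contents. */
static int aliasing_demo(void)
{
    char buf[16];
    const char *aliased;
    char *owned;
    int ok;

    strcpy(buf, "a");
    aliased = buf;          /* what n->data.word = w does */
    owned = copy_word(buf); /* what the fixed newNode does */
    if (owned == NULL)
        return 0;

    strcpy(buf, "b");       /* stands in for the next fgets() */

    ok = strcmp(aliased, "b") == 0 && strcmp(owned, "a") == 0;
    free(owned);
    return ok;
}
```

Since each node now owns its word, whatever frees the tree has to free n->data.word before freeing the node.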

Segmentation Fault in Trie implementation in C

I'm trying to implement a trie data structure to spell-check a given text file. Currently, it seems to work for a couple of words in the file, then it hits a seg fault. I tried debugging to find the culprit, but all I found was that the value of "letter" takes seemingly random negative values (it should be between 0 and 26, inclusive). Normally a seg fault appears almost instantly after I start the program, so I'm not sure why the issue is popping up in the middle of the program.
/**
* Implements a dictionary's functionality.
*/
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "dictionary.h"
//create global root node
Trienode *root;
//create word counter for size() function
unsigned int wordcount = 0;
//creates an empty node
Trienode * newnode()
{
Trienode *nnode = NULL;
nnode = (Trienode *)malloc(sizeof(Trienode));
//initialize new node with null pointers and values
nnode -> parent = NULL;
for(int i = 0; i < 27; i++)
{
nnode -> children[i] = NULL;
}
return nnode;
}
void cleartrie(Trienode *head)
{
//if child node exists, free it, else continue with next iteration in for loop
if(head)
{
for(int i = 0; i < 27; i++)
{
cleartrie(head -> children[i]);
}
free(head);
head = NULL;
}
}
/**
* Returns true if word is in dictionary else false.
*/
bool check(const char *word)
{
int i = 0;
int letter;
Trienode *head = root;
while(word[i] != '\0')
{
if(isalpha(word[i]))
{
letter = word[i] - 'a';
}
else //it must be an apostrophe
{
letter = word[i] - 13;
}
if(!(head -> children[letter]))
{
return false;
}
else //a pointer must exist
{
head = head -> children[letter];
}
i++;
}
return true;
}
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char *dictionary)
{
//open file
FILE *infile = fopen(dictionary, "r");
Trienode *parnode; //parent node
root = newnode();
Trienode *curnode = root; //current node
int letter = 0;
//while not end of file, read words
while(fgetc(infile) != EOF)
{
//while not end of word, read letters
for(;;)
{
int c;
//read current letter in file
c = fgetc(infile);
//convert input char to corresponding array location (a - z = 0-25, apostrophe = 26)
if(isalpha(c))
{
letter = c - 'a';
}
else if (c == '\'')
{
letter = c - 13;
}
//if end of string, exit loop
else if (c == '\0')
{
//end of word, so endofstring = true
wordcount++;
break;
}
//move to next letter if not either apostrophe or alphabetical
else
{
break;
}
//if pointer to letter of word doesn't exist, create new node
if(curnode -> children[letter] == NULL)
{
curnode -> children[letter] = newnode();
}
//child node is the new current node
parnode = curnode;
curnode = curnode -> children[letter];
curnode -> parent = parnode;
}
//return to root node
curnode = root;
}
fclose(infile);
return true;
}
/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return wordcount;
}
/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
cleartrie(root);
if (root == NULL)
{
return true;
}
return false;
}
Sorry about the wall of text, but most of it is just there for context (I hope). The seg fault error is occurring on the if(!(head -> children[letter])) line of the check helper function.
Thanks in advance!
I suspect that your test file may contain some uppercase letters. If this is the case, then subtracting 'a' in an attempt to remap your letters will result in a negative number, since 'A' < 'a'. Have a look at the ASCII Table. Converting the letters to lowercase first should solve your problem.
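As a rough sketch of that fix, the mapping can live in one small function that both load and check call, so an unexpected character never becomes an array index. The names char_to_index, word_is_indexable and ALPHABET_SIZE are assumptions made for this example, and the code assumes an ASCII-compatible character set where 'a' to 'z' are contiguous.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ALPHABET_SIZE 27 /* 'a'..'z' plus apostrophe, as in children[27] */

/* Hypothetical helper: returns the child index (0..26) for ch,
   or -1 if ch is neither a letter nor an apostrophe. */
static int char_to_index(char ch)
{
    /* Cast first: passing a negative char to tolower() is undefined. */
    int c = tolower((unsigned char)ch);
    if (c >= 'a' && c <= 'z')
        return c - 'a';
    if (c == '\'')
        return ALPHABET_SIZE - 1;
    return -1;
}

/* Returns 1 if every character of word maps to a valid child index. */
static int word_is_indexable(const char *word)
{
    size_t i;
    assert(word != NULL);
    for (i = 0; word[i] != '\0'; i++)
        if (char_to_index(word[i]) < 0)
            return 0;
    return 1;
}
```

In check you would then return false as soon as char_to_index gives -1, before touching head -> children.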

How to free memory occupied by a Tree, C?

I'm currently dealing with a generic Tree with this structure:
typedef struct NODE {
//node's keys
unsigned short *transboard;
int depth;
unsigned int i;
unsigned int j;
int player;
int value;
struct NODE *leftchild; //points to the first child from the left
struct NODE *rightbrothers; //linked list of brothers from the current node
}NODE;
static NODE *GameTree = NULL;
The function that allocates the different nodes is shown below (don't worry too much about the keys' values; it basically allocates the child nodes. If there aren't any, the new child goes to leftchild; otherwise it goes at the end of the list "node->leftchild->rightbrothers"):
static int AllocateChildren(NODE **T, int depth, unsigned int i, unsigned int j, int player, unsigned short *transboard) {
NODE *tmp = NULL;
if ((*T)->leftchild == NULL) {
if( (tmp = (NODE*)malloc(sizeof(NODE)) )== NULL) return 0;
else {
tmp->i = i;
tmp->j = j;
tmp->depth = depth;
(player == MAX ) ? (tmp->value = 2 ): (tmp->value = -2);
tmp->player = player;
tmp->transboard = transboard;
tmp->leftchild = NULL;
tmp->rightbrothers = NULL;
(*T)->leftchild = tmp;
}
}
else {
NODE *scorri = (*T)->leftchild;
while (scorri->rightbrothers != NULL)
scorri = scorri->rightbrothers;
if( ( tmp = (NODE*)malloc(sizeof(NODE)) )== NULL) return 0;
else {
tmp->i = i;
tmp->j = j;
tmp->depth = depth;
(player == MAX) ? (tmp->value = 2) : (tmp->value = -2);
tmp->player = player;
tmp->transboard = transboard;
tmp->leftchild = NULL;
tmp->rightbrothers = NULL;
}
scorri->rightbrothers = tmp;
}
return 1;
}
I need to come up with a function, possibly recursive, that deallocates the whole tree, so far I've come up with this:
void DeleteTree(NODE **T) {
if((*T) != NULL) {
NODE *tmp;
for(tmp = (*T)->children; tmp->brother != NULL; tmp = tmp->brother) {
DeleteTree(&tmp);
}
free(*T);
}
}
But it doesn't seem to work; it doesn't deallocate even a single node.
Any ideas about where I am going wrong or how it can be implemented?
P.S. I got the idea for the recursive function from this pseudocode from my teacher. However, I'm not sure I've translated it correctly into C for my kind of tree.
Pseudocode:
1: function DeleteTree(T)
2: if T != NULL then
3: for c ∈ Children(T) do
4: DeleteTree(c)
5: end for
6: Delete(T)
7: end if
8: end function
One thing I like doing when I'm allocating lots of tree nodes that are all going to go away at the same time is to allocate them in 'batches'. I malloc them as an array of nodes and dole them out from a special nodealloc function after saving a pointer to the array (in a function like the one below). To drop the tree I just make sure I'm not keeping any references and then call the free routine (also shown below).
This can also reduce the amount of RAM you allocate if you're lucky (or very smart) with your initial malloc or can trust realloc not to move the block when you shrink it.
struct freecell { struct freecell * next; void * memp; } * saved_pointers = 0;
static void
save_ptr_for_free(void * memp)
{
struct freecell * n = malloc(sizeof*n);
if (!n) {perror("malloc"); return; }
n->next = saved_pointers;
n->memp = memp;
saved_pointers = n;
}
static void
free_saved_memory(void)
{
while(saved_pointers) {
struct freecell * n = saved_pointers;
saved_pointers = saved_pointers->next;
free(n->memp);
free(n);
}
}
I've just realized my BIG mistake in the code and I'll just answer myself since no one had found the answer.
The error lies in this piece of code:
for(tmp = (*T)->children; tmp->brother != NULL; tmp = tmp->brother) {
DeleteTree(&tmp);
}
First of all, Ami Tavory was right about the for condition: I need to continue as long as tmp != NULL.
Beyond that, the loop cannot work because after DeleteTree(&tmp) I can no longer access the memory tmp pointed to, since it has just been freed. So when the first iteration ends I can't do tmp = tmp->rightbrothers to move on to the next node, because that field lived in the node I just deleted.
To fix it, I just needed to save the pointer to the next brother before the recursive call:
void DeleteTree(NODE **T) {
    if((*T) != NULL) {
        NODE *tmp, *nextbrother;
        for(tmp = (*T)->children; tmp != NULL; tmp = nextbrother) {
            nextbrother = tmp->brother; /* save the sibling before tmp is freed */
            DeleteTree(&tmp);
        }
        free(*T);
        (*T) = NULL; /* leave no dangling pointer in the caller */
    }
}
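To check that every node really is released, the same idea can be put into a small self-contained sketch. The NODE layout (a first-child pointer plus a next-sibling pointer), the AddChild helper and the freed_nodes counter are assumptions added for illustration; they are not part of the original code.

```c
#include <stdlib.h>

/* Assumed layout: pointer to the first child plus pointer to the next sibling. */
typedef struct node {
    struct node *children;
    struct node *brother;
} NODE;

static int freed_nodes = 0; /* demonstration counter only */

/* Hypothetical helper: allocate a node and push it onto the front
   of parent's child list (parent may be NULL for a root). */
NODE *AddChild(NODE *parent)
{
    NODE *n = calloc(1, sizeof *n);
    if (n != NULL && parent != NULL) {
        n->brother = parent->children;
        parent->children = n;
    }
    return n;
}

void DeleteTree(NODE **T)
{
    if (T == NULL || *T == NULL)
        return;
    NODE *tmp = (*T)->children;
    while (tmp != NULL) {
        NODE *next = tmp->brother; /* read the sibling before tmp is freed */
        DeleteTree(&tmp);          /* frees the subtree rooted at tmp */
        tmp = next;
    }
    free(*T);
    *T = NULL;
    freed_nodes++;
}
```

Building a root with two children and three grandchildren and then calling DeleteTree(&root) should leave root set to NULL and freed_nodes equal to the number of nodes allocated.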
Just for the sake of completeness, I want to add my version of DeleteTree (here rightbrothers and leftchild play the roles of the sibling and first-child pointers):
void DeleteTree(NODE *T) {
if(T != NULL) {
DeleteTree(T->rightbrothers);
DeleteTree(T->leftchild);
free(T);
}
}
I think it is much less obscure and much easier to read. It avoids the use-after-free issue in DeleteTree by eliminating the loop: both pointers are read from T before T itself is freed.
Since we free the nodes recursively, we might as well do the whole process recursively.

C - Segfault when accessing struct member in a HashTable (insert function)

I am new to C and am having issues implementing an insert function for my HashTable.
Here are my structs:
typedef struct HashTableNode {
char *url; // url previously seen
struct HashTableNode *next; // pointer to next node
} HashTableNode;
typedef struct HashTable {
HashTableNode *table[MAX_HASH_SLOT]; // actual hashtable
} HashTable;
Here is how I init the table:
HashTable *initTable(){
HashTable* d = (HashTable*)malloc(sizeof(HashTable));
int i;
for (i = 0; i < MAX_HASH_SLOT; i++) {
d->table[i] = NULL;
}
return d;
}
Here is my insert function:
int HashTableInsert(HashTable *table, char *url){
long int hashindex = JenkinsHash(url, MAX_HASH_SLOT);
int uniqueBool = 2; // 0 for true, 1 for false, 2 for init
HashTableNode* theNode = (HashTableNode*)malloc(sizeof(HashTableNode));
theNode->url = url;
if (table->table[hashindex] != NULL) { // if we have a collision
HashTableNode* currentNode = (HashTableNode*)malloc(sizeof(HashTableNode));
currentNode = table->table[hashindex]->next; // the next node in the list
if (currentNode == NULL) { // only one node currently in list
if (strcmp(table->table[hashindex]->url, theNode->url) != 0) { // unique node
table->table[hashindex]->next = theNode;
return 0;
}
else{
printf("Repeated Node\n");
return 1;
}
}
else { // multiple nodes in this slot
printf("There was more than one element in this slot to start with. \n");
while (currentNode != NULL)
{
// SEGFAULT when accessing currentNode->url HERE
if (strcmp(currentNode->url, table->table[hashindex]->url) == 0 ){ // same URL
uniqueBool = 1;
}
else{
uniqueBool = 0;
}
currentNode = currentNode->next;
}
}
if (uniqueBool == 0) {
printf("Unique URL\n");
theNode->next = table->table[hashindex]->next; // splice current node in
table->table[hashindex]->next = theNode; // needs to be a node for each slot
return 0;
}
}
else{
printf("simple placement into an empty slot\n");
table->table[hashindex] = theNode;
}
return 0;
}
I get a segfault every time I try to access currentNode->url (the next node in the linked list of a given slot), which SHOULD have a string in it if the node itself is not NULL.
I know this code is a little dicey, so thank you in advance to anyone up for the challenge.
Chip
UPDATE:
This is the function that calls all the hash table functions. From my testing on regular strings in main() of hash table.c, I have concluded that the segfault is due to something here:
void crawlPage(WebPage * page){
char * new_url = NULL;
int pos= 0;
pos = GetNextURL(page->html, pos, URL_PREFIX, &new_url);
while (pos != -1){
if (HashTableLookup(URLsVisited, new_url) == 1){ // url not in table
printf("url is not in table......\n");
hti(URLsVisited, new_url);
WebPage * newPage = (WebPage*) calloc(1, sizeof(WebPage));
newPage->url = new_url;
printf("Adding to LIST...\n");
add(&URLList, newPage); // added & to it.. no seg fault
}
else{
printf("skipping url cuz it is already in table\n");
}
new_url = NULL;
pos = GetNextURL(page->html, pos, URL_PREFIX, &new_url);
}
printf("freeing\n");
free(new_url); // cleanup
free(page); // free current page
}
Your hash table insertion logic violates some rather fundamental rules:
- Allocating a new node before determining you actually need one.
- A blatant memory leak in your currentNode allocation.
- Suspicious ownership semantics of the url pointer.
Beyond that, this algorithm is being made way too complicated for what it really should be:
1. Compute the hash index via hash-value modulo the table size.
2. Start at the table slot of the hash index, walking node pointers until one of two things happens:
   a. You discover the node is already present.
   b. You reach the end of the collision chain.
Only in case (b) above do you actually allocate a collision node and chain it to your existing collision list. Most of this is trivial when employing a pointer-to-pointer approach, which I demonstrate below:
int HashTableInsert(HashTable *table, const char *url)
{
// find collision list starting point
long int hashindex = JenkinsHash(url, MAX_HASH_SLOT);
HashTableNode **pp = table->table+hashindex;
// walk the collision list looking for a match
while (*pp && strcmp(url, (*pp)->url))
pp = &(*pp)->next;
if (!*pp)
{
// no matching node found. insert a new one.
HashTableNode *pNew = malloc(sizeof *pNew);
pNew->url = strdup(url);
pNew->next = NULL;
*pp = pNew;
}
else
{ // url already in the table
printf("url \"%s\" already present\n", url);
return 1;
}
return 0;
}
That really is all there is to it.
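A lookup can walk the collision chain in the same way. The sketch below is only an illustration: StandInHash is a hypothetical placeholder for JenkinsHash (whose definition is not shown in the question), MAX_HASH_SLOT is given an assumed value, and the return convention (1 when the url is not in the table) is taken from how crawlPage uses HashTableLookup.

```c
#include <string.h>

#define MAX_HASH_SLOT 16 /* assumed value for this sketch */

typedef struct HashTableNode {
    char *url;
    struct HashTableNode *next;
} HashTableNode;

typedef struct HashTable {
    HashTableNode *table[MAX_HASH_SLOT];
} HashTable;

/* Hypothetical stand-in for JenkinsHash: a simple multiplicative
   string hash reduced modulo the number of slots. */
static unsigned long StandInHash(const char *s, unsigned long mod)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % mod;
}

/* Returns 0 if url is in the table, 1 if it is not. */
int HashTableLookup(const HashTable *table, const char *url)
{
    const HashTableNode *node = table->table[StandInHash(url, MAX_HASH_SLOT)];
    /* walk the collision list until a match or the end of the chain */
    while (node != NULL && strcmp(url, node->url) != 0)
        node = node->next;
    return node == NULL ? 1 : 0;
}
```

Because lookup never modifies the chain, a plain node pointer is enough here; the pointer-to-pointer is only needed when a link has to be rewritten, as in insertion.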
The url ownership issue I mentioned earlier is addressed above via string duplication using strdup(). Although not part of standard C before C23, it is specified by POSIX, and every non-neanderthal half-baked implementation I've seen in the last two decades provides it. If yours doesn't, (a) I'd like to know what you're using, and (b) it's trivial to implement with strlen and malloc. Regardless, when the nodes are being released during value-removal or table wiping, be sure to free a node's url before freeing the node itself.
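For illustration, here is a minimal sketch of both points: a fallback duplicate function built from strlen and malloc, and a table-wiping routine that frees each url before its node. The names my_strdup and HashTableClear, and the MAX_HASH_SLOT value, are assumptions for this sketch and do not appear in the original code.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HASH_SLOT 16 /* assumed value for this sketch */

typedef struct HashTableNode {
    char *url;
    struct HashTableNode *next;
} HashTableNode;

typedef struct HashTable {
    HashTableNode *table[MAX_HASH_SLOT];
} HashTable;

/* Hypothetical fallback for platforms that lack strdup(). */
char *my_strdup(const char *s)
{
    size_t len = strlen(s) + 1; /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Release every node in the table; returns the number of nodes freed.
   The table structure itself is left allocated and empty. */
size_t HashTableClear(HashTable *table)
{
    size_t freed = 0;
    for (size_t i = 0; i < MAX_HASH_SLOT; i++) {
        HashTableNode *node = table->table[i];
        while (node != NULL) {
            HashTableNode *next = node->next; /* save before freeing node */
            free(node->url);                  /* the node owns its url copy */
            free(node);
            node = next;
            freed++;
        }
        table->table[i] = NULL;
    }
    return freed;
}
```

This only works if every url stored in the table was heap-allocated by the table itself, which is the case when insertion duplicates the string as shown above.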
Best of luck.

Resources