C Having Trouble Resizing a Hash Table

I'll post snippets of the code here which (I think) are relevant to the problem, but I can pastebin if necessary. Probably posting more than enough code already :P
My program includes a hash table which needs to double when a certain hash bucket reaches 20 entries. Although I believe the logic to be good, and it compiles like a charm, it throws up a Segmentation Fault. The code runs like a charm when not resizing, but resizing messes things up.
Thanks for any help :)
Error
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401012 in ml_add (ml=0x7fffffffe528, me=0x75a5a0) at mlist.c:74
74 while((cursorNode->next) != NULL){
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64
(gdb) backtrace
#0 0x0000000000401012 in ml_add (ml=0x7fffffffe528, me=0x75a5a0) at mlist.c:74
#1 0x0000000000401554 in main (argc=1, argv=0x7fffffffe638) at finddupl.c:39
Structure of Hash Table
typedef struct bN { //linked list node containing data and next
MEntry *nestedEntry;
struct bN *next;
} bucketNode;
typedef struct bL { // bucket as linked list
struct bN *first;
int bucketSize;
} bucket;
struct mlist {
struct bL *currentTable; //bucket array
};
Add Function
int ml_add(MList **ml, MEntry *me){
MList *tempList;
tempList = *ml;
bucketNode *tempNode = (bucketNode *)malloc(sizeof(bucketNode));
tempNode->nestedEntry = me;
tempNode->next = NULL;
unsigned long currentHash = me_hash(me, tableSize);
if((tempList->currentTable[currentHash].bucketSize) == 0) {
tempList->currentTable[currentHash].first = tempNode;
tempList->currentTable[currentHash].bucketSize = (tempList->currentTable[currentHash].bucketSize) + 1;
}
else if((tempList->currentTable[currentHash].bucketSize) == 20){
printf("About to resize");
printf("About to resize");
tempList = ml_resize(&tempList, (tableSize * 2));
tableSize = tableSize * 2;
ml_add(&tempList,me);
}
else{
bucketNode *cursorNode;
cursorNode = tempList->currentTable[currentHash].first;
while((cursorNode->next) != NULL){
cursorNode = cursorNode->next;
}
cursorNode->next = tempNode;
tempList->currentTable[currentHash].bucketSize = (tempList->currentTable[currentHash].bucketSize) + 1;
return 1;
}
return 1;
}
Resize Function
MList *ml_resize(MList **ml, int newSize){
MList *oldList;
oldList = *ml;
MList *newList;
if ((newList = (MList *)malloc(sizeof(MList))) != NULL){
newList->currentTable = (bucket *)malloc(newSize * sizeof(bucket));
int i;
for(i = 0; i < newSize; i++){
newList->currentTable[i].first = NULL;
newList->currentTable[i].bucketSize = 0;
}
}
int j;
for(j = 0; j < tableSize; j++){
bucketNode *cursorNode = oldList->currentTable[j].first;
bucketNode *nextNode;
while(cursorNode != NULL){
nextNode = cursorNode->next;
ml_transfer(&newList, cursorNode, newSize);
cursorNode = nextNode;
}
}
free(oldList);
return newList;
}
Transfer to new list function
void ml_transfer(MList **ml, bucketNode *insertNode, int newSize){
MList *newList;
newList = *ml;
bucketNode *tempNode = insertNode;
tempNode->next = NULL;
unsigned long currentHash = me_hash((tempNode->nestedEntry), newSize);
if((newList->currentTable[currentHash].bucketSize) == 0) {
newList->currentTable[currentHash].first = tempNode;
newList->currentTable[currentHash].bucketSize = (newList->currentTable[currentHash].bucketSize) + 1;
}
else{
bucketNode *cursorNode;
cursorNode = newList->currentTable[currentHash].first;
while((cursorNode->next) != NULL){
cursorNode = cursorNode->next;
}
cursorNode->next = tempNode;
newList->currentTable[currentHash].bucketSize = (newList->currentTable[currentHash].bucketSize) + 1;
}
}

The problem most likely lies in the fact that ml_add() fails to update the caller's pointer through its MList **ml parameter whenever the hash table is resized.
When the hash table is resized, the old table is destroyed inside ml_resize(), but the pointer to the new, resized table is stored only in tempList, which is just a local copy of *ml. You must also update *ml so that the variable referencing the hash table outside the function is modified; otherwise it is left pointing to the freed, invalid table. Try the following modification:
...
else if((tempList->currentTable[currentHash].bucketSize) == 20){
printf("About to resize");
printf("About to resize");
tempList = ml_resize(&tempList, (tableSize * 2));
tableSize = tableSize * 2;
ml_add(&tempList,me);
*ml = tempList; // this is necessary to fix the pointer outside the
// function, which otherwise still points to the hash table
// memory freed by the resize function
}
...
Also, please note the comments I made about two memory leaks in your code. I would also take into account what @hexist pointed out: it is not necessary to insert at the end of the linked list. Inserting at the head simplifies the code and makes it faster.
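A minimal sketch of both suggestions (updating the caller's pointer after a resize, and inserting at the head of the bucket). It is not the asker's code: Table, Bucket, Node, toy_hash, table_new, table_resize and table_add are hypothetical stand-ins for MList, bucket, bucketNode, me_hash, ml_resize and ml_add, the key is a plain string instead of an MEntry, and the table size is assumed to live in the struct rather than in a global.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BUCKET 20   /* resize threshold taken from the question */

typedef struct Node {   /* hypothetical stand-in for bucketNode */
    const char *key;    /* stand-in for MEntry *nestedEntry */
    struct Node *next;
} Node;

typedef struct { Node *first; int count; } Bucket;

typedef struct {        /* hypothetical stand-in for MList */
    Bucket *buckets;
    size_t size;        /* kept in the struct instead of a global tableSize */
} Table;

/* Toy string hash (djb2-style), standing in for me_hash(). */
static size_t toy_hash(const char *s, size_t size) {
    size_t h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % size;
}

Table *table_new(size_t size) {
    assert(size > 0);
    Table *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->buckets = malloc(size * sizeof *t->buckets);
    if (t->buckets == NULL) { free(t); return NULL; }
    for (size_t i = 0; i < size; i++) {
        t->buckets[i].first = NULL;
        t->buckets[i].count = 0;
    }
    t->size = size;
    return t;
}

/* Push a node at the head of its bucket: O(1), no list walk needed. */
static void push_head(Table *t, Node *n) {
    Bucket *b = &t->buckets[toy_hash(n->key, t->size)];
    n->next = b->first;
    b->first = n;
    b->count++;
}

/* Move every node into a table twice as large. On success the old table
   (array and struct) is freed and *tp is updated; on failure *tp is untouched. */
static int table_resize(Table **tp) {
    Table *old = *tp;
    if (old->size > SIZE_MAX / sizeof(Bucket) / 2) return 0;  /* would overflow */
    Table *bigger = table_new(old->size * 2);
    if (bigger == NULL) return 0;
    for (size_t i = 0; i < old->size; i++) {
        Node *n = old->buckets[i].first;
        while (n != NULL) {
            Node *next = n->next;   /* save before push_head rewrites n->next */
            push_head(bigger, n);
            n = next;
        }
    }
    free(old->buckets);
    free(old);
    *tp = bigger;   /* the caller's pointer now refers to the live table */
    return 1;
}

/* Returns 1 on success, 0 if the node could not be allocated. */
int table_add(Table **tp, const char *key) {
    assert(tp != NULL && *tp != NULL);
    Node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = key;
    n->next = NULL;
    if ((*tp)->buckets[toy_hash(key, (*tp)->size)].count >= MAX_BUCKET)
        (void)table_resize(tp);   /* if resizing fails, keep using the old table */
    push_head(*tp, n);
    return 1;
}
```

A caller would hold a Table *t obtained from table_new(n) and always pass &t to table_add, so that t follows the table across resizes. Because the node is allocated once and linked only after any resize, nothing is leaked by a recursive second call.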

Related

How to call free() properly?

Kindly read the whole post since it includes small details which are highly important.
As is well known in C, we should handle the case where malloc fails. For that case I created a function called destroyList(), whose job is to take a Node and free the list one node at a time.
But my function isn't being called correctly...
I tried calling it with ptr, merged_out and *merged_out (the last one was a suggestion from a member of the community), but nothing seems to work.
Why is that? The function sometimes receives NULL, empty lists, or some random values.
Can someone please help me fix this issue and understand what is going on?
typedef struct node_t {
int x;
struct node_t *next;
} *Node;
void destroyList(Node ptr) {
while (ptr) {
Node toDelete = ptr;
ptr = ptr->next;
free(toDelete);
}
}
Main Function:
ErrorCode mergeSortedLists(Node list1, Node list2, Node *merged_out) {
if (!list1 || !list2) {
return EMPTY_LIST;
}
if (!isListSorted(list1) || !isListSorted(list2)) {
return UNSORTED_LIST;
}
if (!merged_out) {
return NULL_ARGUMENT;
}
Node ptr = NULL;
int total_len = getListLength(list1) + getListLength(list2);
for (int i = 0; i < total_len; i++) {
int min = getMin(&list1, &list2);
ptr = malloc(sizeof(*ptr));
*merged_out = ptr;
if (!ptr) {
destroyList(*merged_out);
*merged_out = NULL;
return MEMORY_ERROR;
}
ptr->x = min;
ptr->next = NULL;
merged_out = &ptr->next;
}
return SUCCESS;
}
This is how the function should be called:
Node merged_actual = NULL;
ErrorCode merge_status = mergeSortedLists(list1, list2, &merged_actual);
Note: getMin() gets the minimum value and advances the pointer of the list which has that min value to the next node.

Start after those if checks.
Node ptr=NULL,last;
/* find out current tail of the list */
if (*merged_out!=NULL){
last=*merged_out;
while (last->next!=NULL){
last=last->next;
}
}
int total_len = getListLength(list1) + getListLength(list2);
for (int i = 0; i < total_len; i++)
{
int min = getMin(&list1, &list2);
ptr = malloc(sizeof(*ptr));
if (!ptr)
{
destroyList(*merged_out);
*merged_out=NULL;
return MEMORY_ERROR;
}
ptr->x = min;
ptr->next = NULL;
/* link ptr onto the list */
if (*merged_out==NULL){
/* if the list is empty, make ptr the head of the list */
*merged_out=ptr;
last=*merged_out;
}
else{
last->next = ptr;
last = ptr;
}
}
Please try not to copy and paste this block of code. It may or may not be correct, but try to understand what it does: each time the function is called, it first walks the existing list so that last points at its final element, and then appends each new node after last. That way *merged_out can always point to the head.
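If the initial value of *merged_out can be ignored, the same idea can be sketched with a local head pointer that is never advanced, so the cleanup path can still reach every node built so far. This is a sketch under assumptions: the ErrorCode enum is a guess at the asker's definition, the sortedness check is omitted because isListSorted() was not shown, and the minimum is picked inline instead of through getMin().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node_t {
    int x;
    struct node_t *next;
} *Node;                       /* pointer typedef kept from the question */

/* Assumed definition; the question does not show the real enum. */
typedef enum { SUCCESS, EMPTY_LIST, NULL_ARGUMENT, MEMORY_ERROR } ErrorCode;

void destroyList(Node ptr) {
    while (ptr) {
        Node toDelete = ptr;
        ptr = ptr->next;
        free(toDelete);
    }
}

ErrorCode mergeSortedLists(Node list1, Node list2, Node *merged_out) {
    if (!list1 || !list2) {
        return EMPTY_LIST;
    }
    if (!merged_out) {
        return NULL_ARGUMENT;
    }
    Node head = NULL;          /* never advanced: always the first node */
    Node *tail = &head;        /* address of the link to fill in next */
    while (list1 || list2) {
        /* Pick the list whose front element is smaller (or the only one left). */
        Node *src = (!list2 || (list1 && list1->x <= list2->x)) ? &list1 : &list2;
        Node n = malloc(sizeof(*n));
        if (!n) {
            destroyList(head); /* head still reaches everything allocated here */
            *merged_out = NULL;
            return MEMORY_ERROR;
        }
        n->x = (*src)->x;
        n->next = NULL;
        *tail = n;             /* link the new node in */
        tail = &n->next;       /* advance the tail, not the head */
        *src = (*src)->next;   /* advances only the local copy of the input pointer */
    }
    assert(*tail == NULL);     /* the merged list is always terminated */
    *merged_out = head;
    return SUCCESS;
}
```

The difference from the code in the question is that the pointer handed to destroyList() is the untouched head, not a pointer that has already been moved to the end of the list.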

@user12986714 I lost my old account. I was told not to care about the initial value of *merged_out. Could you update the solution (delete the first while loop, so there is no need for two pointers)?

C memory leak when inserting into a doubly linked list

Hi, I'm new to C and pointers and am having issues trying to implement the doubly linked list structure below. I believe the memory leaks happen in listInsertEnd? I am very confused as to why one version works (at least no memory leak in the output) and the other doesn't. I have pasted only parts of the program; any help or explanation is much appreciated.
#include <stdio.h>
#include <stdlib.h>
typedef struct node *Node;
struct node {
int value;
Node next;
Node prev;
};
typedef struct list *List;
struct list {
Node first;
Node last;
int count;
};
Node newNode(int value) {
Node n = malloc(sizeof(*n));
if (n == NULL) fprintf(stderr, "couldn't create new node\n");
n->value = value;
n->next = NULL;
n->prev = NULL;
return n;
}
void listInsertEnd(List newList, int value) {
Node n = newNode(value);
if (newList== NULL) { //no item in list
//why is this giving me memory leaks
newList->first = newList->last = n;
//whereas this doesn't?
newList->first = newList->last = newNode(value);
} else { //add to end
n->prev = newList->last;
newList->last->next = n;
newList->last = n;
}
nList->count++;
}

First of all, talking about memory leaks: there is no direct memory leak in your code. If the leak happens somewhere, it's outside of these functions. It's most probably because you create one or more nodes and then forget to free() them, but this has nothing to do with the two functions you show.
I see that you are using typedef to declare simple pointer types. Take a look at the question and answer "Is it a good idea to typedef pointers?" to understand why that is considered bad practice and should be avoided. The Linux kernel documentation also explains the issue in more detail.
Secondly, the real problem in the code you show is that you are using pointers after you tested that they are invalid (NULL).
Here:
Node newNode(int value) {
Node n = malloc(sizeof(*n));
if (n == NULL) fprintf(stderr, "couldn't create new node\n");
n->value = value;
// ^^^^^^^^ BAD!
And also here:
if (newList== NULL) {
newList->first = newList->last = n;
// ^^^^^^^^^^^^^^ BAD!
If something is NULL, you cannot dereference it. Change your functions to safely abort after they detect an invalid pointer.
This can be done in multiple ways. Here's an example of correct code:
Node newNode(int value) {
Node n = malloc(sizeof(*n));
if (n == NULL) {
fprintf(stderr, "couldn't create new node\n");
return NULL;
}
n->value = value;
n->next = NULL;
n->prev = NULL;
return n;
}
void listInsertEnd(List newList, int value) {
Node n;
if (newList == NULL) {
return;
// You probably want to return some error value here.
// In that case change the function signature accordingly.
}
n = newNode(value);
if (newList->count == 0) {
newList->first = newList->last = n;
} else { //add to end
n->prev = newList->last;
newList->last->next = n;
newList->last = n;
}
newList->count++;
}
NOTE: the check newList->count == 0 assumes that you correctly increment/decrement the count when adding/removing elements.
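Since the leak most probably comes from nodes that are never freed, a cleanup function is the missing piece. The following is only a sketch: listNew and listFree are hypothetical names that do not appear in the question, and listInsertEnd is changed to return a status, as the comment in the code above suggests.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node *Node;     /* typedefs kept from the question */
struct node {
    int value;
    Node next;
    Node prev;
};
typedef struct list *List;
struct list {
    Node first;
    Node last;
    int count;
};

/* Hypothetical constructor: returns an empty list, or NULL if malloc fails. */
List listNew(void) {
    List l = malloc(sizeof(*l));
    if (l != NULL) {
        l->first = l->last = NULL;
        l->count = 0;
    }
    return l;
}

/* Appends value; returns 1 on success, 0 on a NULL list or failed allocation. */
int listInsertEnd(List l, int value) {
    if (l == NULL) return 0;
    Node n = malloc(sizeof(*n));
    if (n == NULL) return 0;
    n->value = value;
    n->next = NULL;
    n->prev = l->last;
    if (l->last == NULL) {
        l->first = n;          /* empty list: n is also the first node */
    } else {
        l->last->next = n;
    }
    l->last = n;
    l->count++;
    return 1;
}

/* Hypothetical cleanup: frees every node, then the list itself.
   Returns the number of nodes freed. */
int listFree(List l) {
    if (l == NULL) return 0;
    int freed = 0;
    Node n = l->first;
    while (n != NULL) {
        Node next = n->next;   /* read next before n is freed */
        free(n);
        n = next;
        freed++;
    }
    assert(freed == l->count); /* count must match the nodes actually linked */
    free(l);
    return freed;
}
```

Every successful listNew() should be paired with one listFree() call; a tool such as valgrind should then report no blocks still reachable from the list.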

This typedef declaration
typedef struct node *Node;
is confusing and is poor style. Consider, for example, this statement
Node n = malloc(sizeof(*n));
A reader might think that this is a typo and that it should have been written
Node *n = malloc(sizeof(*n));
The function
void listInsertEnd(List newList, int value) {
Node n = newNode(value);
if (newList== NULL) { //no item in list
//why is this giving me memory leaks
newList->first = newList->last = n;
//whereas this doesn't?
newList->first = newList->last = newNode(value);
} else { //add to end
n->prev = newList->last;
newList->last->next = n;
newList->last = n;
}
nList->count++;
}
has undefined behavior. If newList is equal to NULL then you are trying to use memory pointed to by a null pointer.
if (newList== NULL) { //no item in list
//why is this giving me memory leaks
newList->first = newList->last = n;
//whereas this doesn't?
newList->first = newList->last = newNode(value);
Also, the data members newList->first and newList->last can initially be equal to NULL. That can be another source of undefined behavior, because the function does not take it into account.
Before changing the function listInsertEnd you should define the function newNode the following way
Node newNode(int value)
{
Node n = malloc(sizeof(*n));
if ( n != NULL )
{
n->value = value;
n->next = NULL;
n->prev = NULL;
}
return n;
}
The function shall not issue any message. It is the caller of the function that decides whether to issue a message if it is required.
In this case the function listInsertEnd can be written the following way
int listInsertEnd(List newList, int value)
{
Node n = newNode(value);
int success = n != NULL;
if ( success )
{
n->prev = newList->last;
if ( newList->first == NULL )
{
newList->first = newList->last = n;
}
else
{
newList->last = newList->last->next = n;
}
++newList->count;
}
return success;
}
Within the main you should create the list the following way
int main( void )
{
struct list list1 = { .first = NULL, .last = NULL, .count = 0 };
// or
// struct list list1 = { NULL, NULL, 0 };
and call the function like
listInsertEnd( &list1, some_integer_value );

Hashtable Add - C

Getting some segfault on the following algorithm to add an element to the correct bucket in a hashtable.
My structures are basic:
struct kv {
char* key;
unsigned val;
struct kv* next;
};
struct hashtable {
struct kv** table;
unsigned size;
};
And my buggy function:
struct kv* ht_find_or_put(char* word, unsigned value,
struct hashtable* hashtable,
unsigned (*hash)(char*))
{
unsigned index = hash(word) % hashtable->size;
struct kv* ke = malloc(sizeof (struct kv));
for (ke = hashtable->table[index]; ke != NULL; ke = ke->next)
{
if (strcmp(ke->key, word) == 0)
return ke;
}
if (ke == NULL)
{
ke->key = word;
ke->val = value;
ke->next = hashtable->table[index];
hashtable->table[index] = ke;
}
return ke;
}
I know I haven't added yet all the tests (if malloc failed and such) just trying to debug this particular problem...
I'm allocating my table as such:
struct hashtable* hashtable_malloc(unsigned size)
{
struct hashtable *new_ht = malloc(sizeof(struct hashtable));
new_ht->size = size;
new_ht->table = malloc(sizeof(struct kv) * size);
for(unsigned i = 0; i < size; i++)
new_ht->table[i] = NULL;
return new_ht;
}
Any sort of help will greatly be appreciated. I'm only starting to learn.

The first issue is a memory leak: you allocate memory using malloc, but then lose the reference to it when you overwrite the pointer:
// allocate memory
struct kv* ke = malloc(sizeof (struct kv));
// lose the reference
// VVVVVVVVVVV
for (ke = hashtable->table[index]; ke != NULL; ke = ke->next)
The second issue, which probably causes the segfault, is that you try to de-reference a null pointer:
if (ke == NULL)
{
// ke is NULL, you can't de-reference it
ke->key = word;
ke->val = value;
ke->next = hashtable->table[index];
hashtable->table[index] = ke;
}
The solution will be, IMHO, to allocate and put the new element, only upon failure to find it:
struct kv* ht_find_or_put(char* word, unsigned value, struct hashtable* hashtable, unsigned (*hash)(char*))
{
unsigned index = hash(word) % hashtable->size;
struct kv* ke;
// first we try to find the node
for (ke = hashtable->table[index]; ke != NULL; ke = ke->next)
{
if (strcmp(ke->key, word) == 0)
return ke;
}
// didn't find it - let's create and put a new one.
if (ke == NULL)
{
ke = malloc(sizeof (struct kv));
// later add a check if the allocation succeeded...
ke->key = word;
ke->val = value;
ke->next = hashtable->table[index];
hashtable->table[index] = ke;
}
return ke;
}
Since I didn't want to introduce entirely new code, that would just confuse you, I made the minimal changes to the original code.
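For completeness, here is a sketch of how the allocation checks mentioned above might look once added. Several things in it are assumptions rather than part of the original answer: the key is copied so the table does not depend on the caller's buffer staying alive, sum_hash is a hypothetical hash used only to make the example self-contained, and the bucket array is sized with sizeof of a pointer, since table is an array of struct kv * and not of struct kv.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct kv {
    char *key;
    unsigned val;
    struct kv *next;
};

struct hashtable {
    struct kv **table;
    unsigned size;
};

/* Hypothetical hash, only here so the sketch can be exercised. */
unsigned sum_hash(char *s) {
    unsigned h = 0;
    while (*s != '\0')
        h += (unsigned char)*s++;
    return h;
}

struct hashtable *hashtable_malloc(unsigned size) {
    struct hashtable *ht = malloc(sizeof *ht);
    if (ht == NULL) return NULL;
    /* The array holds pointers, so each element is sizeof(struct kv *). */
    ht->table = malloc(size * sizeof *ht->table);
    if (ht->table == NULL) {
        free(ht);
        return NULL;
    }
    for (unsigned i = 0; i < size; i++)
        ht->table[i] = NULL;
    ht->size = size;
    return ht;
}

/* Returns the existing or newly inserted entry, or NULL if allocation fails. */
struct kv *ht_find_or_put(char *word, unsigned value,
                          struct hashtable *ht, unsigned (*hash)(char *)) {
    assert(ht != NULL && ht->size > 0);
    unsigned index = hash(word) % ht->size;

    /* First try to find the key in its bucket. */
    for (struct kv *ke = ht->table[index]; ke != NULL; ke = ke->next) {
        if (strcmp(ke->key, word) == 0)
            return ke;
    }

    /* Not found: allocate only now, and check every allocation. */
    struct kv *ke = malloc(sizeof *ke);
    if (ke == NULL) return NULL;
    ke->key = malloc(strlen(word) + 1);   /* assumption: the table owns its keys */
    if (ke->key == NULL) {
        free(ke);
        return NULL;
    }
    strcpy(ke->key, word);
    ke->val = value;
    ke->next = ht->table[index];          /* insert at the head of the bucket */
    ht->table[index] = ke;
    return ke;
}
```

Copying the key is a design choice, not a requirement: if the caller guarantees that word outlives the table, storing the pointer as in the original is enough.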

Hashmap with Linked List to find word count

I have been working on this little project for quite some time and I can't figure out why I'm not getting the expected results. I am a beginner at C programming, so my understanding of pointers and memory allocation/deallocation is limited. Anyway, I constructed this code by first building a hash function, then adding a count to it. However, when I test it, sometimes the count works and sometimes it doesn't. I'm not sure whether the fault lies with the hash function or with the way I set up the count. The text file is read one line at a time, and each line is a string consisting of a hexadecimal number.
struct node {
char *data;
struct node *next;
int count; /* Implement count here for word frequencies */
};
#define H_SIZE 1024
struct node *hashtable[H_SIZE]; /* Declaration of hash table */
void h_lookup(void)
{
int i = 0;
struct node *tmp;
for(i = 0; i < H_SIZE; i++) {
for(tmp = hashtable[i]; tmp != NULL; tmp = tmp->next) {
if(tmp->data != 0) {
printf("Index: %d\nData: %s\nCount: %d\n\n", i,
tmp->data, tmp->count);
}
}
}
}
/* self explanatory */
void h_add(char *data)
{
unsigned int i = h_assign(data);
struct node *tmp;
char *strdup(const char *s);
/* Checks to see if data exists, consider inserting COUNT here */
for(tmp = hashtable[i]; tmp != NULL; tmp = tmp->next) {
if(tmp->data != 0) { /* root node */
int count = tmp->count;
if(!strcmp(data, tmp->data))
count= count+1;
tmp->count = count;
return;
}
}
for(tmp = hashtable[i]; tmp->next != NULL; tmp = tmp->next);
if(tmp->next == NULL) {
tmp->next = h_alloc();
tmp = tmp->next;
tmp->data = strdup(data);
tmp->next = NULL;
tmp->count = 1;
} else
exit(EXIT_FAILURE);
}
/* Hash function, takes value (string) and converts into an index into the array of linked lists) */
unsigned int h_assign(char *string)
{
unsigned int num = 0;
while(*string++ != '\0')
num += *string;
return num % H_SIZE;
}
/* h_initialize(void) initializes the array of linked lists. Allocates one node for each list by calling h_alloc which creates a new node and sets node.next to null */
void h_initialize(void)
{ int i;
for(i = 0; i <H_SIZE; i++) {
hashtable[i] = h_alloc();
}
}
/* h_alloc(void) is a method which creates a new node and sets it's pointer to null */
struct node *h_alloc(void)
{
struct node *tmp = calloc(1, sizeof(struct node));
if (tmp != NULL){
tmp->next = NULL;
return tmp;
}
else{
exit(EXIT_FAILURE);
}
}
/* Clean up hashtable and free up memory */
void h_free(void)
{
struct node *tmp;
struct node *fwd;
int x;
for(x = 0; x < H_SIZE; x++) {
tmp = hashtable[x];
while(tmp != NULL) {
fwd = tmp->next;
free(tmp->data);
free(tmp);
tmp = fwd;
}
}
}

I assume the problem is that the count is sometimes not incremented. It is possible that strdup fails to allocate memory for the new string and returns NULL. You should check its return value and exit gracefully if it fails.
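Two details in the posted code are also worth checking. In h_add, the return statement sits outside the body of the strcmp test, so the function returns at the first node that has data whether or not the strings match. In h_assign, the pointer is advanced before it is read, so the first character is skipped. A sketch of how both might be addressed follows. It makes assumptions beyond the question: buckets start out empty instead of holding a pre-allocated dummy node (so h_initialize is not needed), h_add returns the new count, and a malloc/strcpy pair stands in for strdup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define H_SIZE 1024

struct node {
    char *data;
    struct node *next;
    int count;
};

/* Static storage: every bucket starts out as a null pointer. */
static struct node *hashtable[H_SIZE];

/* Sums every character, including the first one. */
unsigned int h_assign(const char *string)
{
    unsigned int num = 0;
    while (*string != '\0')
        num += (unsigned char)*string++;
    return num % H_SIZE;
}

/* Returns the updated count for data, or 0 if an allocation failed. */
int h_add(const char *data)
{
    assert(data != NULL);
    unsigned int i = h_assign(data);

    /* Only a node whose string matches is counted and returned from. */
    for (struct node *tmp = hashtable[i]; tmp != NULL; tmp = tmp->next) {
        if (strcmp(data, tmp->data) == 0)
            return ++tmp->count;
    }

    /* No match in this bucket: create a node and push it at the head. */
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->data = malloc(strlen(data) + 1);   /* stand-in for strdup */
    if (n->data == NULL) {
        free(n);
        return 0;
    }
    strcpy(n->data, data);
    n->count = 1;
    n->next = hashtable[i];
    hashtable[i] = n;
    return 1;
}
```

With this shape, two different strings that land in the same bucket (for example "0a" and "a0", which have the same character sum) each get their own node and their own count.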

Linked list implementations difference

For the following linked list declaration,
#include <stdlib.h>
#include <stdio.h>
typedef struct list
{
int val;
struct list *next;
} list;
void destroy (list *l)
{
if (l)
{
destroy (l->next);
free (l);
}
}
why does the following main work
int main()
{
list *test;
list *ptr1, *ptr2;
int i;
test = malloc (sizeof (list));
test->val = 0;
ptr2 = test;
for (i = 1; i <= 10; i++)
{
ptr1 = (list *) malloc (sizeof (list));
ptr1->val = i;
ptr2->next = ptr1;
ptr2 = ptr1;
}
ptr1 = test;
while (ptr1)
{
printf ("%d\n", ptr1->val);
ptr1 = ptr1->next ;
}
destroy (test);
return 0;
}
while this one doesn't even create a list (it only makes one node)?
int main()
{
list *test;
list *ptr;
int i;
test = malloc (sizeof (list));
test->val = 0;
ptr = test->next;
for (i = 1; i <= 10; i++)
{
ptr = (list *) malloc (sizeof (list));
ptr->val = i;
ptr = ptr->next;
}
ptr = test;
while (ptr)
{
printf ("%d\n", ptr->val);
ptr = ptr->next ;
}
destroy (test);
return 0;
}
Don't they use the same logic?

The code
ptr = test->next;
for (i = 1; i <= 10; i++)
{
ptr = (list *) malloc (sizeof (list));
ptr->val = i;
ptr = ptr->next;
}
starts by taking a copy of test->next but never assigns anything to test->next itself. A list starting from test therefore only has a single item. Worse, that item has an uninitialised next pointer so code that tries to iterate over the list will almost certainly crash.
As hinted at in the other answers, this pattern is repeated for each newly allocated node.
In answer to your comment, the best way to make the second function work is to make it more like the first (working) version. I've renamed the variables in it to try to make it clearer
list *head;
list *next, *curr;
int i;
head = malloc (sizeof(*head));
head->val = 0;
curr= head;
for (i = 1; i <= 10; i++)
{
next = malloc (sizeof(*next));
next->val = i;
curr->next = next;
curr= next;
}
curr->next = NULL; /* terminate the list */
curr = head;
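The second main can also keep its "walk forward" shape if the loop variable holds the address of the link to fill in, rather than a copy of the link's value. The sketch below illustrates that; build_list and link are hypothetical names, and destroy is rewritten iteratively only to keep the example short and safe for long lists.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct list
{
    int val;
    struct list *next;
} list;

/* Iterative version of destroy from the question. */
void destroy (list *l)
{
    while (l)
    {
        list *next = l->next;
        free (l);
        l = next;
    }
}

/* Builds 0 -> 1 -> ... -> n. Returns the head, or NULL if malloc fails. */
list *build_list (int n)
{
    assert (n >= 0);
    list *head = NULL;
    list **link = &head;   /* address of the pointer that must receive the next node */
    int i;
    for (i = 0; i <= n; i++)
    {
        list *node = malloc (sizeof (*node));
        if (node == NULL)
        {
            destroy (head);
            return NULL;
        }
        node->val = i;
        node->next = NULL;  /* the list is terminated after every step */
        *link = node;       /* writes into head, or into the previous node's next */
        link = &node->next; /* move on to the link inside the new node */
    }
    return head;
}
```

Assigning through *link changes the list itself, whereas ptr = ptr->next in the question only changes the local variable ptr.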

It looks like in the first example, which works, ptr2 is holding the previously created node in the list, so that this can be rewritten
last_created_node = test;
for (i = 1; i <= 10; i++)
{
// create new node
new_node = (list *) malloc (sizeof (list));
new_node ->val = i;
// chain newly created node onto list so far
// make last created node point to new node
last_created_node->next = new_node ;
// last created node is now new node
last_created_node = new_node ;
}
// terminate the list
last_created_node->next = 0;
There is no equivalent of linking a new node onto the chain in the second code sample you give. There are also problems with uninitialised memory, as others have commented. It would be good to add the termination shown in the last line of my sample above.

In the second main during
ptr = test->next;
you are trying to access test->next without allocating memory for it. You can try changing your code as follows to get the second main working:
test = malloc (sizeof (list));
test->val = 0;
ptr = test;
for (i = 1; i <= 10; i++)
{
/* allocate the next node through the link itself, then step onto it */
ptr->next = (list *) malloc (sizeof (list));
ptr = ptr->next;
ptr->val = i;
}
ptr->next = NULL; /* terminate the list */
