Hashtable insertion/search in C

Hello, I have a problem with my hash table. It is implemented like this:
#define HT_SIZE 10
typedef struct _list_t_ {
char key[20];
char string[20];
char prevValue[20];
struct _list_t_ *next;
} list_t;
typedef struct _hash_table_t_ {
int size; /* the size of the table */
list_t ***table; /* first */
sem_t lock;
} hash_table_t;
The table is a triple pointer because I want a hash table with several partitions (shards). Here is the initialization of my hash table:
hash_table_t *create_hash_table(int NUM_SERVER_THREADS, int num_shards){
hash_table_t *new_table;
int j,i;
if (HT_SIZE<1) return NULL; /* invalid size for table */
/* Attempt to allocate memory for the hashtable structure */
new_table = (hash_table_t*)malloc(sizeof(hash_table_t)*HT_SIZE);
/* Attempt to allocate memory for the table itself */
new_table->table = (list_t ***)calloc(1,sizeof(list_t **));
/* Initialize the elements of the table */
for(j=0; j<num_shards; j++){
new_table->table[j] = (list_t **)calloc(1,sizeof(list_t *));
for(i=0; i<HT_SIZE; i++){
new_table->table[j][i] = (list_t *)calloc(1,sizeof(list_t ));
}
}
/* Set the table's size */
new_table->size = HT_SIZE;
sem_init(&new_table->lock, 0, 1);
return new_table;
}
Here is my search function to search in the hash table
list_t *lookup_string(hash_table_t *hashtable, char *key, int shardId){
list_t *list ;
int hashval = hash(key);
/* Go to the correct list based on the hash value and see if key is
* in the list. If it is, return a pointer to the list element.
* If it isn't, the item isn't in the table, so return NULL.
*/
sem_wait(&hashtable->lock);
for(list = hashtable->table[shardId][hashval]; list != NULL; list =list->next) {
if (strcmp(key, list->key) == 0){
sem_post(&hashtable->lock);
return list;
}
}
sem_post(&hashtable->lock);
return NULL;
}
And my insert function:
char *add_string(hash_table_t *hashtable, char *str,char *key, int shardId){
list_t *new_list;
list_t *current_list;
unsigned int hashval = hash(key);
/*printf("|%d|%d|%s|\n",hashval,shardId,key);*/
/* Lock for concurrency */
sem_wait(&hashtable->lock);
/* Attempt to allocate memory for list */
new_list = (list_t*)malloc(sizeof(list_t));
/* Does item already exist? */
sem_post(&hashtable->lock);
current_list = lookup_string(hashtable, key,shardId);
sem_wait(&hashtable->lock);
/* item already exists, don't insert it again. */
if (current_list != NULL){
strcpy(new_list->prevValue,current_list->string);
strcpy(new_list->string,str);
strcpy(new_list->key,key);
new_list->next = hashtable->table[shardId][hashval];
hashtable->table[shardId][hashval] = new_list;
sem_post(&hashtable->lock);
return new_list->prevValue;
}
/* Insert into list */
strcpy(new_list->string,str);
strcpy(new_list->key,key);
new_list->next = hashtable->table[shardId][hashval];
hashtable->table[shardId][hashval] = new_list;
/* Unlock */
sem_post(&hashtable->lock);
return new_list->prevValue;
}
My main program runs some tests that insert, read, and delete elements of the hash table. The problem is that when I have more than 4 partitions (shards), the tests stop at the first read, reporting that the search function returned the wrong value, NULL. With fewer than 4 shards it runs perfectly well and passes all the tests.
You can see my main.c here if you want to take a look:
http://hostcode.sourceforge.net/view/1105
My complete Hash table code:
http://hostcode.sourceforge.net/view/1103
And other functions where hash table code is executed:
.c file http://hostcode.sourceforge.net/view/1104
.h file http://hostcode.sourceforge.net/view/1106
Thank you for your time. I appreciate any help you can give me; this is an important college project, and I have been stuck on it for 2 days.

Hi, I already solved this problem. I was doing a bad allocation in my initialization:
new_table->table = (list_t ***)calloc(1,sizeof(list_t **));
It should be like this:
new_table->table = (list_t ***)calloc(num_shards,sizeof(list_t **));
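For reference, here is a minimal sketch of a fully sized allocation. Note that the per-shard allocation in the question, calloc(1,sizeof(list_t *)), is also indexed up to HT_SIZE, so it looks undersized in the same way. The names node_t, shard_table_t, shard_table_create and shard_table_destroy are made up for this example, the semaphore and prevValue field are left out, and buckets start empty (NULL) instead of holding a pre-allocated node.

```c
#include <stdlib.h>

#define HT_SIZE 10

/* Simplified node: the semaphore and prevValue field are left out. */
typedef struct node {
    char key[20];
    char string[20];
    struct node *next;
} node_t;

typedef struct {
    int num_shards;   /* number of partitions */
    int size;         /* buckets per shard */
    node_t ***table;  /* table[shard][bucket] is the head of a chain */
} shard_table_t;

/* Hypothetical helper: releases the shard arrays (not the chain nodes). */
void shard_table_destroy(shard_table_t *t)
{
    if (t == NULL)
        return;
    if (t->table != NULL) {
        for (int j = 0; j < t->num_shards; j++)
            free(t->table[j]);   /* free(NULL) is a no-op */
        free(t->table);
    }
    free(t);
}

shard_table_t *shard_table_create(int num_shards)
{
    if (num_shards < 1)
        return NULL;

    shard_table_t *t = malloc(sizeof *t);   /* one struct, not HT_SIZE of them */
    if (t == NULL)
        return NULL;
    t->num_shards = num_shards;
    t->size = HT_SIZE;

    /* One slot per shard. calloc zero-fills, which mainstream platforms
     * read back as NULL pointers, so a partial build can be cleaned up. */
    t->table = calloc((size_t)num_shards, sizeof *t->table);
    if (t->table == NULL) {
        free(t);
        return NULL;
    }

    for (int j = 0; j < num_shards; j++) {
        /* HT_SIZE bucket heads per shard, all starting out empty. */
        t->table[j] = calloc(HT_SIZE, sizeof *t->table[j]);
        if (t->table[j] == NULL) {
            shard_table_destroy(t);
            return NULL;
        }
    }
    return t;
}
```

With both dimensions sized this way, table[shardId][hashval] stays inside allocated memory for every shardId below num_shards and every hashval below HT_SIZE.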

Related

How to deal with old references to a resized hash table?

I'm currently working on a hash table implementation in C. I'm trying to implement dynamic resizing, but came across a problem.
If resizing a hash table means creating a new one with double (or half) the size, rehashing, and deleting the old one, how can I deal with old references the user may have made to the old table? Example code (I've omitted error checking just for this example):
int main(int argc, char *argv[])
{
hashtable_t *ht = ht_create(5, STRING); /* make hashtable with size 5 */
ht_insert(ht, "john", "employee"); /* key-val pair "john -> employee" */
ht_insert(ht, "alice", "employee");
char *position = ht_get(ht, "alice"); /* get alice's position from hashtable ht */
ht_insert(ht, "bob", "boss"); /* this insert exceeds the load factor, resizes the hash table */
printf("%s", position); /* returns NULL because the previous hashtable that was resized was freed */
return 0;
}
In this case position pointed to alice's value which was found in the hashtable. When it was resized, we freed the hash table and lost it. How can I fix this problem, so the user won't have to worry that a previously defined pointer was freed?
EDIT: my current hash table implementation
hash.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hash.h"
#define LOADFACTOR 0.75
typedef struct tableentry /* hashtab entry */
{
struct tableentry *next;
char *key;
void *val;
} tableentry_t;
typedef struct hashtable
{
datatype_t type;
size_t size;
size_t load; /* number of keys filled */
struct tableentry **tab;
} hashtable_t;
/* creates hashtable */
/* NOTE: dynamically allocated, remember to ht_free() */
hashtable_t *ht_create(size_t size, datatype_t type)
{
hashtable_t *ht = NULL;
if ((ht = malloc(sizeof(hashtable_t))) == NULL)
return NULL;
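/* allocate ht's table: an array of `size` bucket pointers */
if ((ht->tab = malloc(sizeof(tableentry_t *) * size)) == NULL)
{
free(ht); /* don't leak the struct when the table allocation fails */
return NULL;
}
/* null-initialize table */
size_t i;
for (i = 0; i < size; i++)
ht->tab[i] = NULL;
ht->size = size;
ht->load = 0; /* no keys stored yet */
ht->type = type;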
return ht;
}
/* creates hash for a hashtab */
static unsigned hash(char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval;
}
static int *intdup(int *i)
{
int *new;
if ((new = malloc(sizeof(int))) == NULL)
return NULL;
*new = *i;
return new;
}
static void free_te(tableentry_t *te)
{
free(te->key);
free(te->val);
free(te);
}
/* loops through linked list freeing */
static void free_te_list(tableentry_t *te)
{
tableentry_t *next;
while (te != NULL)
{
next = te->next;
free_te(te);
te = next;
}
}
/* creates a key-val pair */
static tableentry_t *alloc_te(char *k, void *v, datatype_t type)
{
tableentry_t *te = NULL;
int status = 0;
/* alloc struct */
if ((te = calloc(1, sizeof(*te))) == NULL)
return NULL; /* nothing to clean up, and te must not be dereferenced */
/* alloc key */
if ((te->key = strdup(k)) == NULL)
status = -1;
/* alloc value */
int *d;
char *s;
switch (type)
{
case STRING:
s = (char *) v;
if ((te->val = strdup(s)) == NULL)
status = -1;
break;
case INTEGER:
d = (int *) v;
if ((te->val = intdup(d)) == NULL)
status = -1;
break;
default:
status = -1;
}
if (status < 0)
{
free_te_list(te);
return NULL;
}
te->next = NULL;
return te;
}
static tableentry_t *lookup(hashtable_t *ht, char *k)
{
tableentry_t *te;
/* step through linked list */
for (te = ht->tab[hash(k) % ht->size]; te != NULL; te = te->next)
if (strcmp(te->key, k) == 0)
return te; /* found */
return NULL; /* not found */
}
/* inserts the key-val pair */
hashtable_t *ht_insert(hashtable_t *ht, char *k, void *v)
{
tableentry_t *te;
/* unique entry */
if ((te = lookup(ht, k)) == NULL)
{
if ((te = alloc_te(k, v, ht->type)) == NULL)
return NULL; /* allocation failed */
unsigned hashval = hash(k) % ht->size;
/* insert at beginning of linked list */
te->next = ht->tab[hashval];
ht->tab[hashval] = te;
ht->load++;
}
/* replace val of previous entry */
else
{
free(te->val);
switch (ht->type)
{
case STRING:
if ((te->val = strdup(v)) == NULL)
return NULL;
break;
case INTEGER:
if ((te->val = intdup(v)) == NULL)
return NULL;
break;
default:
return NULL;
}
}
return ht;
}
static void delete_te(hashtable_t *ht, char *k)
{
tableentry_t *te, *prev;
unsigned hashval = hash(k) % ht->size;
te = ht->tab[hashval];
/* point head to next element if deleting head */
if (strcmp(te->key, k) == 0)
{
ht->tab[hashval] = te->next;
free_te(te);
ht->load--;
return;
}
/* otherwise look through, keeping track of prev to reassign its ->next */
for (; te != NULL; te = te->next)
{
if (strcmp(te->key, k) == 0)
{
prev->next = te->next;
free_te(te);
ht->load--;
return;
}
prev = te;
}
}
hashtable_t *ht_delete(hashtable_t *ht, char *k)
{
/* nothing to delete if the key is absent */
if (lookup(ht, k) == NULL)
return NULL;
delete_te(ht, k);
return ht;
}
/* retrieve value from key */
void *ht_get(hashtable_t *ht, char *k)
{
tableentry_t *te;
if ((te = lookup(ht, k)) == NULL)
return NULL;
return te->val;
}
/* frees hashtable created from ht_create() */
void ht_free(hashtable_t *ht)
{
size_t i;
if (ht)
{
for (i = 0; i < ht->size; i++)
if (ht->tab[i] != NULL)
free_te_list(ht->tab[i]);
free(ht->tab); /* the bucket array itself */
free(ht);
}
}
/* resizes hashtable, returns new hashtable and frees old */
static hashtable_t *resize(hashtable_t *oht, size_t size)
{
hashtable_t *nht; /* new hashtable */
nht = ht_create(size, oht->type);
/* rehash */
size_t i;
tableentry_t *te;
/* loop through hashtable */
for (i = 0; i < oht->size; i++)
/* loop through linked list */
for (te = oht->tab[i]; te != NULL; te = te->next)
/* insert & rehash old vals into new ht */
if (ht_insert(nht, te->key, te->val) == NULL)
return NULL;
ht_free(oht);
return nht;
}
hash.h
/* a hash-table implementation in c */
/*
hashing algorithm: hashval = *s + 31 * hashval
resolves collisions using linked lists
*/
#ifndef HASH
#define HASH
typedef struct hashtable hashtable_t;
typedef enum datatype {STRING, INTEGER} datatype_t;
/* inserts the key-val pair */
hashtable_t *ht_insert(hashtable_t *ht, char *k, void *v);
/* creates hashtable */
/* NOTE: dynamically allocated, remember to ht_free() */
hashtable_t *ht_create(size_t size, datatype_t type);
/* frees hashtable created from ht_create() */
void ht_free(hashtable_t *ht);
/* retrieve value from key */
void *ht_get(hashtable_t *ht, char *k);
hashtable_t *ht_delete(hashtable_t *ht, char *k);
#endif
Do not use the hash table as the container for the data; only use it to refer to the data, and you won't have that problem.
For example, let's say you have key-value pairs, using a structure with the actual data in the C99 flexible array member:
struct pair {
struct pair *next; /* For hash chaining */
size_t hash; /* For the raw key hash */
/* Payload: */
size_t offset; /* value starts at (data + offset) */
char data[]; /* key starts at (data) */
};
static inline const char *pair_key(struct pair *ref)
{
return (const char *)(ref->data);
}
static inline const char *pair_value(struct pair *ref)
{
return (const char *)(ref->data + ref->offset);
}
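Allocating such a pair could look like the following minimal sketch. The function name pair_new is hypothetical; it assumes both key and value are NUL-terminated strings and that the caller has already computed the raw hash.

```c
#include <stdlib.h>
#include <string.h>

struct pair {
    struct pair *next;   /* For hash chaining */
    size_t hash;         /* For the raw key hash */
    size_t offset;       /* value starts at (data + offset) */
    char data[];         /* key starts at (data) */
};

/* Hypothetical constructor: copies key and value into one allocation. */
struct pair *pair_new(const char *key, const char *value, size_t hash)
{
    size_t klen = strlen(key) + 1;     /* include the terminating '\0' */
    size_t vlen = strlen(value) + 1;
    struct pair *p = malloc(sizeof *p + klen + vlen);
    if (p == NULL)
        return NULL;
    p->next = NULL;
    p->hash = hash;
    p->offset = klen;                  /* value sits right after the key */
    memcpy(p->data, key, klen);
    memcpy(p->data + klen, value, vlen);
    return p;
}
```

Because key, value and bookkeeping live in a single block, one free(p) releases the whole pair, and the block never has to move when the table is resized.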
Your hash table can then be simply
struct pair_hash_table {
size_t size;
struct pair **entry;
};
If you have struct pair_hash_table *ht, and struct pair *foo with foo->hash containing the hash of the key, then foo should be in the singly-linked list hanging off ht->entry[foo->hash % ht->size];.
Let's say you wish to resize the hash table ht. You choose a new size, and allocate enough memory for that many struct pair *. Then, you go through each singly-linked list in each old hash entry, detaching them from the old list, and prepending them to the lists in correct hash table entries in the new hash table. Then you just free the old hash table entry array, replacing it with the new one:
int resize_pair_hash_table(struct pair_hash_table *ht, const size_t new_size)
{
struct pair **entry, *curr, *next;
size_t i, k;
if (!ht || new_size < 1)
return -1; /* Invalid parameters */
entry = malloc(new_size * sizeof entry[0]);
if (!entry)
return -1; /* Out of memory */
/* Initialize new entry array to empty. */
for (i = 0; i < new_size; i++)
entry[i] = NULL;
for (i = 0; i < ht->size; i++) {
/* Detach the singly-linked list. */
next = ht->entry[i];
ht->entry[i] = NULL;
while (next) {
/* Detach the next element, as 'curr' */
curr = next;
next = next->next;
/* k is the index to this hash in the new array */
k = curr->hash % new_size;
/* Prepend to the list in the new array */
curr->next = entry[k];
entry[k] = curr;
}
}
/* Old array is no longer needed, */
free(ht->entry);
/* so replace it with the new one. */
ht->entry = entry;
ht->size = new_size;
return 0; /* Success */
}
Note that the hash field in struct pair is not modified, nor recalculated.
Having the raw hash (as opposed to modulo table-size), means you can speed up the key search even when different keys use the same slot:
struct pair *find_key(struct pair_hash_table *ht,
const char *key, const size_t key_hash)
{
struct pair *curr = ht->entry[key_hash % ht->size];
while (curr)
if (curr->hash == key_hash && !strcmp(key, pair_key(curr)))
return curr;
else
curr = curr->next;
return NULL; /* Not found. */
}
In C, the logical and operator, &&, is short-circuiting. If the left side is not true, the right side is not evaluated at all, because the entire expression can never be true in that case.
Above, this means that the raw hash value of the key is compared, and only when they do match, the actual strings are compared. If your hash algorithm is even halfway good, this means that if the key already exists, typically only one string comparison is done; and if the key does not exist in the table, typically no string comparisons are done.
You can deal with them the same way the standard library (C++) deals with this exact problem:
Some operations on containers (e.g. insertion, erasing, resizing) invalidate iterators.
For instance std::unordered_map which is basically a hash table implemented with buckets has these rules:
insertion
unordered_[multi]{set,map}: all iterators invalidated when rehashing
occurs, but references unaffected [23.2.5/8]. Rehashing does not occur
if the insertion does not cause the container's size to exceed z * B
where z is the maximum load factor and B the current number of
buckets. [23.2.5/14]
erasure
unordered_[multi]{set,map}: only iterators and references to the
erased elements are invalidated [23.2.5/13]
Iterator invalidation rules
The C++ concept of iterators is a generalization of pointers. So this concept can be applied to C.
Your only other alternative is to add another level of indirection: instead of holding the objects directly in the container, hold some sort of proxy. The elements then always stay at the same position in memory, and it's the proxies that move around on resizing, inserting, etc. But you need to analyze this scenario: are the added double indirection (which will surely affect performance negatively) and the increased implementation complexity worth it? Is it that important to have persistent pointers?
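To illustrate the indirection idea in C, here is a minimal sketch in which the bucket array only holds pointers to heap-allocated entries, so growing the table relinks entries without moving or freeing any value. All names (struct entry, struct table, table_init, table_put, table_get, table_grow) are invented for this example, and table_put does not check for duplicate keys.

```c
#include <stdlib.h>
#include <string.h>

struct entry {
    struct entry *next;
    char *key;
    char *val;          /* separately allocated; never moved by a resize */
};

struct table {
    size_t size;
    struct entry **slot;
};

static size_t hash_str(const char *s)
{
    size_t h = 0;
    while (*s)
        h = (unsigned char)*s++ + 31 * h;
    return h;
}

/* Copy a string into fresh heap memory. */
static char *copy_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

int table_init(struct table *t, size_t size)
{
    t->size = 0;
    t->slot = (size > 0) ? calloc(size, sizeof *t->slot) : NULL;
    if (t->slot == NULL)
        return -1;
    t->size = size;
    return 0;
}

int table_put(struct table *t, const char *key, const char *val)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = copy_str(key);
    e->val = copy_str(val);
    if (e->key == NULL || e->val == NULL) {
        free(e->key);
        free(e->val);
        free(e);
        return -1;
    }
    size_t i = hash_str(key) % t->size;
    e->next = t->slot[i];   /* prepend to the chain */
    t->slot[i] = e;
    return 0;
}

char *table_get(const struct table *t, const char *key)
{
    for (struct entry *e = t->slot[hash_str(key) % t->size]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->val;
    return NULL;
}

/* Grow by relinking the existing entries; no key or value is copied or freed. */
int table_grow(struct table *t, size_t new_size)
{
    if (new_size == 0)
        return -1;
    struct entry **slot = calloc(new_size, sizeof *slot);
    if (slot == NULL)
        return -1;
    for (size_t i = 0; i < t->size; i++) {
        struct entry *e = t->slot[i];
        while (e) {
            struct entry *next = e->next;   /* save before relinking */
            size_t k = hash_str(e->key) % new_size;
            e->next = slot[k];
            slot[k] = e;
            e = next;
        }
    }
    free(t->slot);   /* only the old pointer array goes away */
    t->slot = slot;
    t->size = new_size;
    return 0;
}
```

A pointer returned by table_get before a call to table_grow still points at the same string afterwards. It would still be invalidated by deleting or overwriting that entry, which is the same kind of rule the C++ containers document.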

need help rehashing a hashtable in c

I want to rehash a hashtable by allocating space for a new table, traversing the old table, and, for each element, computing a new hash value and then linking it into the new table. The entries of the hashtable are linked lists, hence the second for loop while traversing the old hashtable. I also want to free the old table, but first I need to get the elements into the new table correctly.
I need help, where am I going wrong in traversing the old table? Also can I just point the original ht to the newht at the end? I need to free the old table(prevtable) afterwards also, which I will figure out later.
typedef struct hashtable {
htentry_ptr *table; /*<< a pointer to the underlying table */
unsigned int size; /*<< the current size of the underlying table */
unsigned int num_entries; /*<< the current number of entries */
float max_loadfactor; /*<< the maximum load factor before the
* underlying table is resized */
unsigned short idx; /*<< the index into the delta array */
unsigned int (*hash)(void *, unsigned int); /*<< a pointer to the hash function */
int (*cmp)(void *, void *); /*<< a pointer to the comparison
* function */
} hashtable_t;
The rehash function looks like this
static void rehash(hashtab_ptr ht)
{
hashtab_ptr prevtable;
/* store reference to the old table */
prevtable->table = ht->table;
htentry_ptr p;
unsigned int i;
unsigned int newidx;
printf("\nrehashing\n");
ht->size = getsize(prevtable);
printf("\nnew table size %d\n", ht->size);
ht->table = calloc(ht->size , sizeof(htentry_t));
for (i = 0; i < prevtable->size; i++) {
for (p = prevtable->table[i]; p; p = p->next_ptr) {
newidx = ht->hash(p->key, ht->size);
if(ht->table[newidx]){
htentry_ptr next;
htentry_ptr prev = NULL;
next = ht->table[newidx];
printf("\ncollision adding to linked list\n");
while (next) {
prev = next;
next = next->next_ptr;
}
prev->next_ptr = p;
p->next_ptr = NULL;
} else {
ht->table[newidx] = p;
ht->table[newidx]->next_ptr = NULL;
ht->num_entries++;
}
}
}
}
Inserting into the hashtable: when the table gets too dense, the rehash function is called at the end of the insert.
int ht_insert(hashtab_ptr ht, void *key, void *value)
{
/* key is the id of the variable like num1 and value is number
index = value
*/
unsigned int N = ht->size;
unsigned int ne;
float current_loadfactor;
int k;
htentry_ptr p;
p = calloc(1,sizeof(htentry_t));
p->key = key;
p->value = value;
k = ht->hash(key, ht->size);
if(ht->table[k]){
htentry_ptr next;
htentry_ptr prev = NULL;
/* theres already something in the index*/
next = ht->table[k];
printf("\ncollision adding to linked list");
while (next) {
prev = next;
next = next->next_ptr;
}
ht->num_entries++;
prev->next_ptr = p;
p->next_ptr = NULL;
} else {
ht->table[k] = p;
ht->table[k]->next_ptr = NULL;
ht->num_entries++;
}
ne = ht->num_entries;
current_loadfactor = (float)ne / N; /* cast first: ne / N alone is integer division */
if (current_loadfactor > ht->max_loadfactor) {
rehash(ht);
}
Also can I just point the original ht to the newht at the end?
No.
The pointer ht is a copy on the local function stack. Changing the value with ht = newht; just changes the copy.
The easiest solution would be to let your rehash() function return the pointer to the new hashtable.
static hashtab_ptr rehash(hashtab_ptr ht)
{
[...]
return newht;
}
Then you can call it like:
current_ht = rehash(current_ht);
The second solution would be to change the prototype to pass a double pointer:
static void rehash(hashtab_ptr *ht)
{
[...]
*ht = newht;
}
This means that you need to change the use of ht everywhere in your rehash() function to reflect that it's a double pointer now.
The third solution would be to not create a new hashtable_t, but just create a new htentry_ptr *table area and update the values in ht; This would be my favorite solution in a code review.
I need help, where am I going wrong in traversing the old table?
while (next)
{
prev = next;
next = next->next_ptr;
newht->num_entries++;
}
The newht->num_entries++; is at the wrong place. When you look for the end of the linked list, the elements that are already there don't increase the size of your hashtable. You can move the expression newht->num_entries++; out of both if/else - your table increases by one no matter if there is a collision or not.
Second, at the end of the linked list loop it will look like this:
prev = [last_element of linked list];
next = null;
prev->next_ptr = old_element;
But.. where does old_element->next_ptr point to? There is no guarantee that it is null.
So you need to add p->next_ptr = NULL; so that an element that wasn't formerly at the end of the collision and is now at the end of the collision properly ends the linked list.
The problem is you can't just do p->next_ptr = NULL; because then your loop thinks it's at the end. Your concept is screwed when a linked list element in the middle of the linked list gets reassigned to a new index in the new hashtable. The element can't have the correct value for the old and the new table in next_ptr at the same time.
So, there are two solutions:
a) Go backwards through your collision list, but as this is a single linked list as it seems, this is a very painful process of putting elements on a stack.
b) Rehash the table by creating new elements instead of trying to reuse the old elements.
EDIT:
Okay, with the insert function, the rehash function can look like this (quick & dirty):
static hashtab_ptr rehash(hashtab_ptr ht)
{
hashtab_ptr prevtable = ht;
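hashtab_ptr newht = calloc(1, sizeof(*newht)); /* the new table struct must actually be allocated; check for NULL in real code */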
htentry_ptr p;
unsigned int i;
unsigned int newidx;
printf("\nrehashing");
newht->idx = prevtable->idx + 1;
newht->size = getsize(prevtable);
newht->num_entries = 0;
newht->hash = prevtable->hash;
newht->cmp = prevtable->cmp;
newht->max_loadfactor = prevtable->max_loadfactor;
newht->table = calloc(newht->size , sizeof(htentry_t));
for (i = 0; i < ht->size; i++) {
for (p = ht->table[i]; p; p = p->next_ptr) {
ht_insert(newht, p->key, p->value);
}
}
return newht;
}
Then you should have a function to free a hashtable, so you end up using it:
if (current_loadfactor > ht->max_loadfactor) {
hashtab_ptr tempht = ht;
ht = rehash(ht);
ht_delete(tempht);
}
This is intended to show that:
you only need to reallocate the table[] member, not the envelope
pointers to pointers can simplify things
when moving an element from the old table to the new one, you should take care not to damage its next pointer
[Note: I removed the typedefs, because I hate them ...]
#include <stdio.h>
#include <stdlib.h>
struct hashentry {
struct hashentry *next;
char *key;
void *payload;
};
struct hashtable {
struct hashentry **table; /*<< a pointer to array of pointers */
unsigned int size; /*<< current size */
unsigned int num_entries; /*<< current number of entries */
float max_loadfactor;
/* unsigned short idx; the index into the delta array(Quoi?) */
unsigned int (*hash)(void *, unsigned int); /*<< a pointer to the hash function */
int (*cmp)(void *, void *); /*<< a pointer to the comparison function */
};
static void rehash(struct hashtable *ht);
// The rehash function could look like this
static void rehash(struct hashtable *ht)
{
struct hashentry **newtab;
struct hashentry **pp, **qq, *this;
unsigned int newsize, oldidx, newidx;
newsize = ht->size * 2; /* or something like (max_loadfactor*num_entries), rounded up */
fprintf(stderr, "new table size %u\n", newsize);
newtab = malloc(newsize * sizeof *newtab );
for (newidx=0; newidx < newsize; newidx++) {
newtab[newidx] = NULL;
}
for (oldidx = 0; oldidx < ht->size; oldidx++) {
for (pp = &ht->table[oldidx]; *pp; ) {
this = *pp;
*pp = this->next; /* this is important ! */
this->next = NULL; /* ... because ... */
newidx = ht->hash(this->key, newsize);
for(qq = &newtab[newidx]; *qq; qq = &(*qq)->next) {
/* You could count the number of "collisions" here */
}
*qq = this;
}
}
free(ht->table);
ht->table = newtab;
ht->size = newsize;
/* The rest of the fields do not need to change */
}
I think this might be the solution, but I'm not 100% sure.
static void rehash(hashtab_ptr ht)
{
unsigned int old_size, new_size;
unsigned int newindex;
unsigned int i;
htentry_ptr q, p;
htentry_ptr *new_table;
old_size = ht->size;
/*gets new size in prime table */
new_size = getsize(ht);
new_table = malloc(sizeof(htentry_t) * new_size);
/* nullify the new table */
for (i = 0; i < new_size; i++) {
new_table[i] = NULL;
}
printf("\n*****rehashing******\n");
ht->size = new_size;
printf("%s %d\n", "new size:", new_size);
for (i = 0; i < old_size; i++) {
p = ht->table[i];
while (p) {
q = p->next_ptr;
newindex = ht->hash(p->key, new_size);
/*
temp = malloc(sizeof(htentry_t));
temp->key = p->key;
temp->value = p->value;
temp->next_ptr = new_table[ht->hash(temp->key, next_size)];
new_table[ht->hash(temp->key, next_size)] = temp;
*/
if (new_table[newindex]) {
p->next_ptr = new_table[newindex];
new_table[newindex] = p;
} else {
new_table[newindex] = p;
new_table[newindex]->next_ptr = NULL;
}
p = q;
}
}
free(ht->table);
ht->table = new_table;
}

Doubling an array that points to a linked list in C

I am implementing a hashset in C, where my array points to linked lists.
This is the linked list:
typedef struct hashnode hashnode;
struct hashnode {
char *word;
// will hold our word as a string
hashnode *link;
//will be used only if chaining
};
and this is the Hashset:
struct hashset {
size_t size;
//size of entire array
size_t load;
//number of words total
hashnode **chains;
//linked list (if words have same index);
};
Now I am having a problem with my code for doubling the array.
I believe there is a dangling pointer somewhere.
Here is the code:
void dbl_array(hashset *this) {
size_t newlen = this->size +1;
newlen *= 2;
//double siz
hashnode **new_array = malloc(newlen * sizeof(hashnode*));
//new array
int array_end = (int)this->size;//load;
//end of old array
for(int i = 0; i < array_end; i++) {
//loop through old
int index = i;
if(this->chains[index] == NULL) {
continue;
}
else {
hashnode *nod;
int i=0;
for(nod = this->chains[index]; nod != NULL; nod = nod->link) {
if(nod == NULL)
return;
size_t tmp = strhash(nod->word) % newlen;
//compute hash
hashnode *newnod;
newnod = malloc(sizeof(hashnode*));
newnod->word = strdup(nod->word);
newnod->link = NULL;
if(new_array[tmp] == NULL) {
//if new array does not already have a word at index
new_array[tmp] = newnod;
}
else {
//if word is here then link to old one
newnod->link = new_array[tmp];
new_array[tmp] = newnod;
}
printf("newarray has: %s # {%d} \n", new_array[tmp]->word, tmp);
//testing insertion
i++;
}
free(nod);
}
}
this->chains = new_array;
this->size = newlen;
free(new_array);
printf("new size %d\n", this->size);
}
So after running GDB, I am finding that there is something wrong when I add the new node
There is no reason at all to allocate new collision nodes for a hash table expansion. The algorithm for expanding your hash table is relatively straightforward:
compute new table size
allocate new table
enumerate all chains in old table
for each chain, enumerate all nodes
for each node, compute new hash based on new table size
move node to appropriate slot in new table
When the above is done, so are you. Just wire up the new table to the hashset and make sure to update the size member to the new size. The old table is discarded.
The following code assumes you have properly managed your hash table prior to doubling. With that:
All unused table slots are properly NULL
All collision lists are properly NULL-terminated.
If you can't guarantee both of those conditions, doubling the size of your hash table is the least of your worries.
void hashset_expand(hashset* hs)
{
size_t new_size = 2 * (1 + hs->size), i, idx;
hashnode *next, *nod, **tbl = calloc(new_size, sizeof(*tbl));
if (tbl == NULL)
return; /* allocation failed: keep the old table untouched */
// walk old table, and each chain within it.
for (i=0; i<hs->size; ++i)
{
next = hs->chains[i];
while (next)
{
nod = next;
next = next->link; // must be done **before** relink
idx = strhash(nod->word) % new_size;
nod->link = tbl[idx];
tbl[idx] = nod;
}
}
// finish up, deleting the old bed.
free(hs->chains);
hs->chains = tbl;
hs->size = new_size;
}
That is all there is to it. Don't make it more complicated than that.

how can i create a dynamic array of a hash table in c

I have the following bucket entry structure and hash table set up:
typedef struct Hash_Entry
{
struct Hash_Entry *next;
void *key_Data;
unsigned key_hash;
char key[5];
} Hash_Entry;
typedef struct Hash_Table
{
struct Hash_Entry **bucketPtr; /* Buckets in the table */
int size; /* Actual size of array. */
int numEntries; /* Number of entries in the table. */
int mask; /* Used to select bits for hashing. */
} Hash_Table;
I want to create an array (or a dynamic array) of this Hash_Table, so that when I feel the table is full I can create another table instead of resizing it.
Something like:
void hash_table_init(Hash_Table *table, size_t entries)
{
size_t i;
table->size = (int)entries; /* actual size of the bucket array */
table->numEntries = 0; /* the table starts out empty */
table->bucketPtr = malloc(entries * sizeof *table->bucketPtr);
/* mark every bucket as empty (skipped if the allocation failed) */
for(i = 0; table->bucketPtr != NULL && i < entries; i++)
table->bucketPtr[i] = NULL;
table->mask = 0; /* Not sure how to initialize this. */
}
I don't quite see the point of leaving the initial buckets as pointers, I'd probably just do
typedef struct {
...
Hash_Entry *buckets;
...
} Hash_Table;
Assuming that most buckets will actually be used, so why not have them. :)
You can create an array using malloc from stdlib.h:
Hash_Table* array = (Hash_Table*)malloc(sizeof(Hash_Table) * 100);
When the array is full, you can grow it with realloc.
You can have a look at:
Create dynamic sized array of user defined structure
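A minimal sketch of that realloc-based growth, assuming a small wrapper struct. Table_Array and table_array_add are names invented for this example, and Hash_Entry is only forward-declared because its layout does not matter here.

```c
#include <stdlib.h>

typedef struct Hash_Entry Hash_Entry;   /* details not needed here */

typedef struct Hash_Table {
    Hash_Entry **bucketPtr;   /* Buckets in the table */
    int size;                 /* Actual size of array. */
    int numEntries;           /* Number of entries in the table. */
    int mask;                 /* Used to select bits for hashing. */
} Hash_Table;

/* Hypothetical wrapper: a growable array of tables. */
typedef struct {
    Hash_Table *tables;
    size_t count;      /* tables in use */
    size_t capacity;   /* tables allocated */
} Table_Array;

/* Appends one zero-initialized table and returns it, or NULL on failure. */
Hash_Table *table_array_add(Table_Array *a)
{
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        /* Use a temporary so the old block is not lost if realloc fails. */
        Hash_Table *tmp = realloc(a->tables, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return NULL;
        a->tables = tmp;
        a->capacity = new_cap;
    }
    Hash_Table *t = &a->tables[a->count++];
    t->bucketPtr = NULL;
    t->size = 0;
    t->numEntries = 0;
    t->mask = 0;
    return t;
}
```

Start from Table_Array a = {0}; and call table_array_add(&a) whenever the current table is considered full. Since realloc may move the whole array, it is safer to remember a table by its index than to keep a Hash_Table pointer across calls.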

C programming question on the implementation of a hash table

I have a C programming question on the implementation of a hash table. I have implemented the hash table for storing some strings.
I am having a problem while dealing with hash collisions. I am following a chained linked-list approach to handle them but, somehow, my code is misbehaving, and I have not been able to debug it. Can somebody help?
This is what I am facing:
Say that the first time, I insert a string called gaur. My hash map calculates the index as 0 and inserts the string successfully. However, when I insert another string whose hash also turns out to be 0, my previous value gets overwritten, i.e. gaur is replaced by the new string.
This is my code:
struct list
{
char *string;
struct list *next;
};
struct hash_table
{
int size; /* the size of the table */
struct list **table; /* the table elements */
};
struct hash_table *create_hash_table(int size)
{
struct hash_table *new_table;
int i;
if (size<1) return NULL; /* invalid size for table */
/* Attempt to allocate memory for the table structure */
if ((new_table = malloc(sizeof(struct hash_table))) == NULL) {
return NULL;
}
/* Attempt to allocate memory for the table itself */
if ((new_table->table = malloc(sizeof(struct list *) * size)) == NULL) {
return NULL;
}
/* Initialize the elements of the table */
for(i=0; i<size; i++)
new_table->table[i] = '\0';
/* Set the table's size */
new_table->size = size;
return new_table;
}
unsigned int hash(struct hash_table *hashtable, char *str)
{
unsigned int hashval = 0;
for(; *str != '\0'; str++)
{
/* str itself advances each iteration, so no second index is needed */
hashval += (unsigned char)*str;
}
return (hashval % hashtable->size);
}
struct list *lookup_string(struct hash_table *hashtable, char *str)
{
printf("\n enters in lookup_string \n");
struct list * new_list;
unsigned int hashval = hash(hashtable, str);
/* Go to the correct list based on the hash value and see if str is
* in the list. If it is, return a pointer to the list element.
* If it isn't, the item isn't in the table, so return NULL.
*/
for(new_list = hashtable->table[hashval]; new_list != NULL;new_list = new_list->next)
{
if (strcmp(str, new_list->string) == 0)
return new_list;
}
printf("\n returns NULL in lookup_string \n");
return NULL;
}
int add_string(struct hash_table *hashtable, char *str)
{
printf("\n enters in add_string \n");
struct list *new_list;
struct list *current_list;
unsigned int hashval = hash(hashtable, str);
printf("\n hashval = %d", hashval);
/* Attempt to allocate memory for list */
if ((new_list = malloc(sizeof(struct list))) == NULL)
{
printf("\n enters here \n");
return 1;
}
/* Does item already exist? */
current_list = lookup_string(hashtable, str);
if (current_list == NULL)
{
printf("\n DEBUG Purpose \n");
printf("\n NULL \n");
}
/* item already exists, don't insert it again. */
if (current_list != NULL)
{
printf("\n Item already present...\n");
return 2;
}
/* Insert into list */
printf("\n Inserting...\n");
new_list->string = strdup(str);
new_list->next = NULL;
//new_list->next = hashtable->table[hashval];
if(hashtable->table[hashval] == NULL)
{
hashtable->table[hashval] = new_list;
}
else
{
struct list * temp_list = hashtable->table[hashval];
while(temp_list->next!=NULL)
temp_list = temp_list->next;
temp_list->next = new_list;
hashtable->table[hashval] = new_list;
}
return 0;
}
I haven't checked to confirm, but this line looks wrong:
hashtable->table[hashval] = new_list;
This is right at the end of the last case of add_string. You have:
correctly created the new struct list to hold the value being added
correctly found the head of the linked list for that hashvalue, and worked your way to the end of it
correctly put the new struct list at the end of the linked list
BUT then, with the line I quote above, you are telling the hash table to put the new struct list at the head of the linked list for this hashvalue! Thus throwing away the whole linked list that was there before.
I think you should omit the line I quote above, and see how you get on. The preceding lines are correctly appending it to the end of the existing list.
The statement hashtable->table[hashval] = new_list; is the culprit. You inserted new_list (I think a better name would have been new_node) at the end of the linked list, but then you overwrite the head of that list with new_list, which is just a single node. Just remove this statement.
As others have already pointed out, you are walking to the end of the list with temp_list, appending new_list to it, then throwing away the existing list.
Since the same value NULL is used to indicate an empty bucket and the end of the list, it's quite a bit easier to put the new item at the head of the list.
You also should do any test which would result in the new item not being added before creating it, otherwise you will leak the memory.
I would also have an internal lookup function that takes the hash value; otherwise you have to calculate it twice:
int add_string(struct hash_table *hashtable, char *str)
{
unsigned int hashval = hash(hashtable, str);
/* item already exists, don't insert it again. */
if (lookup_hashed_string(hashtable, hashval, str))
return 2;
/* Attempt to allocate memory for list */
struct list *new_list = malloc(sizeof(struct list));
if (new_list == NULL)
return 1;
/* Insert into list */
new_list->string = strdup(str);
new_list->next = hashtable->table[hashval];
hashtable->table[hashval] = new_list;
return 0;
}
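The lookup_hashed_string helper used above is not shown in the answer. One possible shape for it, as a sketch only: the signature is an assumption, and the hash function here is a simple stand-in that sums the characters.

```c
#include <string.h>

struct list {
    char *string;
    struct list *next;
};

struct hash_table {
    int size;             /* the size of the table */
    struct list **table;  /* the table elements */
};

/* Stand-in hash for this sketch: sum of the characters, modulo table size. */
static unsigned int hash(const struct hash_table *hashtable, const char *str)
{
    unsigned int hashval = 0;
    for (; *str != '\0'; str++)
        hashval += (unsigned char)*str;
    return hashval % (unsigned int)hashtable->size;
}

/* Assumed helper: search one bucket whose index has already been computed. */
struct list *lookup_hashed_string(struct hash_table *hashtable,
                                  unsigned int hashval, const char *str)
{
    struct list *node;
    for (node = hashtable->table[hashval]; node != NULL; node = node->next)
        if (strcmp(str, node->string) == 0)
            return node;
    return NULL;   /* not in this bucket, so not in the table */
}

/* The public lookup then becomes a thin wrapper. */
struct list *lookup_string(struct hash_table *hashtable, const char *str)
{
    return lookup_hashed_string(hashtable, hash(hashtable, str), str);
}
```

This way add_string computes the hash once and reuses it for both the duplicate check and the insertion.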
The hash function must take your data as input and return a bounded index (e.g. an integer that is at least 0 and less than HASH_MAX).
Then you must store each element in the list at index Hash(data) of the hash_table array. If another piece of data has the same hash, it is stored in the same list as the previous data.
struct your_type_list {
yourtype data;
struct your_type_list *next; /* next element with the same hash */
};
struct your_type_list hash_table[HASH_MAX];

Resources