Need help rehashing a hashtable in C

I want to rehash a hashtable by allocating space for a new table, traversing the old table, and, for each element, computing a new hash value and linking the element into the new table. Each entry of the hashtable is a linked list, hence the second for loop while traversing the old table. I also want to free the old table, but first I need to get the elements into the new table correctly.
Where am I going wrong in traversing the old table? Also, can I just point the original ht to the newht at the end? I still need to free the old table (prevtable) afterwards, which I will figure out later.
typedef struct hashtable {
htentry_ptr *table; /*<< a pointer to the underlying table */
unsigned int size; /*<< the current size of the underlying table */
unsigned int num_entries; /*<< the current number of entries */
float max_loadfactor; /*<< the maximum load factor before the
* underlying table is resized */
unsigned short idx; /*<< the index into the delta array */
unsigned int (*hash)(void *, unsigned int); /*<< a pointer to the hash function */
int (*cmp)(void *, void *); /*<< a pointer to the comparison
* function */
} hashtable_t;
The rehash function looks like this
static void rehash(hashtab_ptr ht)
{
hashtab_ptr prevtable;
/* store reference to the old table */
prevtable->table = ht->table;
htentry_ptr p;
unsigned int i;
unsigned int newidx;
printf("\nrehashing\n");
ht->size = getsize(prevtable);
printf("\nnew table size %d\n", ht->size);
ht->table = calloc(ht->size , sizeof(htentry_t));
for (i = 0; i < prevtable->size; i++) {
for (p = prevtable->table[i]; p; p = p->next_ptr) {
newidx = ht->hash(p->key, ht->size);
if(ht->table[newidx]){
htentry_ptr next;
htentry_ptr prev = NULL;
next = ht->table[newidx];
printf("\ncollision adding to linked list\n");
while (next) {
prev = next;
next = next->next_ptr;
}
prev->next_ptr = p;
p->next_ptr = NULL;
} else {
ht->table[newidx] = p;
ht->table[newidx]->next_ptr = NULL;
ht->num_entries++;
}
}
}
}
Inserting into the hashtable: when the table gets too dense, the rehash function is called at the end of the insert.
int ht_insert(hashtab_ptr ht, void *key, void *value)
{
/* key is the id of the variable like num1 and value is number
index = value
*/
unsigned int N = ht->size;
unsigned int ne;
float current_loadfactor;
int k;
htentry_ptr p;
p = calloc(1,sizeof(htentry_t));
p->key = key;
p->value = value;
k = ht->hash(key, ht->size);
if(ht->table[k]){
htentry_ptr next;
htentry_ptr prev = NULL;
/* theres already something in the index*/
next = ht->table[k];
printf("\ncollision adding to linked list");
while (next) {
prev = next;
next = next->next_ptr;
}
ht->num_entries++;
prev->next_ptr = p;
p->next_ptr = NULL;
} else {
ht->table[k] = p;
ht->table[k]->next_ptr = NULL;
ht->num_entries++;
}
ne = ht->num_entries;
current_loadfactor = ne / N;
if (current_loadfactor > ht->max_loadfactor) {
rehash(ht);
}

Also can I just point the original ht to the newht at the end?
No.
The pointer ht is a copy on the local function stack. Changing the value with ht = newht; just changes the copy.
The easiest solution would be to let your rehash() function return the pointer to the new hashtable.
static hashtab_ptr rehash(hashtab_ptr ht)
{
[...]
return newht;
}
Then you can call it like:
current_ht = rehash(current_ht);
The second solution would be to change the prototype to pass a double pointer:
static void rehash(hashtab_ptr *ht)
{
[...]
*ht = newht;
}
This means that you need to change the use of ht everywhere in your rehash() function to reflect that it's a double pointer now.
The third solution would be to not create a new hashtable_t at all, but just allocate a new htentry_ptr *table area and update the values in ht. This would be my favorite solution in a code review.
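To make the first two options concrete, here is a minimal sketch. The struct table type and the grow_* function names are hypothetical stand-ins, not part of the code in the question; only the pointer handling matters.

```c
#include <stdlib.h>

/* Hypothetical stand-in for hashtable_t; only the size matters here. */
struct table { unsigned int size; };

/* Broken: assigns to the local copy of the pointer only. */
static void grow_broken(struct table *t)
{
    struct table *bigger = malloc(sizeof *bigger);
    if (!bigger)
        return;
    bigger->size = t->size * 2;
    t = bigger; /* only the local copy changes; the caller still sees the old table */
    free(t);    /* release it here so the demonstration does not leak */
}

/* Solution 1: return the new pointer and let the caller store it. */
static struct table *grow_return(struct table *t)
{
    struct table *bigger = malloc(sizeof *bigger);
    if (!bigger)
        return t; /* keep the old table on failure */
    bigger->size = t->size * 2;
    free(t);
    return bigger;
}

/* Solution 2: receive a pointer to the caller's pointer and update it. */
static void grow_inplace(struct table **t)
{
    struct table *bigger = malloc(sizeof *bigger);
    if (!bigger)
        return;
    bigger->size = (*t)->size * 2;
    free(*t);
    *t = bigger;
}
```

A caller would write t = grow_return(t); for the first variant and grow_inplace(&t); for the second. After grow_broken(t) the caller's t is unchanged.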
I need help, where am I going wrong in traversing the old table?
while (next)
{
prev = next;
next = next->next_ptr;
newht->num_entries++;
}
The newht->num_entries++; is at the wrong place. When you look for the end of the linked list, the elements that are already there don't increase the size of your hashtable. You can move the expression newht->num_entries++; out of both if/else - your table increases by one no matter if there is a collision or not.
Second, at the end of the linked list loop it will look like this:
prev = [last_element of linked list];
next = null;
prev->next_ptr = old_element;
But where does old_element->next_ptr point to? There is no guarantee that it is NULL.
So you need to add p->next_ptr = NULL; so that an element that was not at the end of its old collision list, but is now at the end of the new one, properly terminates the linked list.
The problem is that you can't just do p->next_ptr = NULL; because then your traversal loop thinks it has reached the end of the old chain. The approach breaks down when an element in the middle of a linked list gets reassigned to a new index in the new hashtable: its next_ptr can't hold the correct value for both the old and the new table at the same time.
So, there are two solutions:
a) Go backwards through your collision list, but as this is a single linked list as it seems, this is a very painful process of putting elements on a stack.
b) Rehash the table by creating new elements instead of trying to reuse the old elements.
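A third option, which avoids both the stack and the copying, is to read the old successor into a temporary before overwriting the link. The sketch below assumes a hypothetical minimal struct entry and uses key % newsize as a stand-in for ht->hash(key, size); the real htentry_t and hash function will differ.

```c
#include <stddef.h>

/* Hypothetical minimal entry; the real htentry_t has key, value and next_ptr. */
struct entry {
    struct entry *next;
    unsigned int key;
};

/* Moves every node of one old chain into newtab (newsize slots, each either
 * NULL or a valid chain), prepending each node to its target chain. */
static void move_chain(struct entry *head, struct entry **newtab,
                       unsigned int newsize)
{
    struct entry *p = head;
    while (p) {
        struct entry *next = p->next;        /* remember the old successor first */
        unsigned int idx = p->key % newsize; /* stand-in for ht->hash(key, size) */
        p->next = newtab[idx];               /* now the link can be overwritten */
        newtab[idx] = p;
        p = next;                            /* continue along the old chain */
    }
}
```

Prepending also removes the need to walk to the end of the target chain on every collision.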
EDIT:
Okay, with the insert function, the rehash function can look like this (quick & dirty):
static hashtab_ptr rehash(hashtab_ptr ht)
{
    hashtab_ptr prevtable = ht;
    hashtab_ptr newht;
    htentry_ptr p;
    unsigned int i;
    printf("\nrehashing");
    /* allocate the new envelope before filling it in */
    newht = malloc(sizeof *newht);
    if (!newht)
        return NULL;
    newht->idx = prevtable->idx + 1;
    newht->size = getsize(prevtable);
    newht->num_entries = 0;
    newht->hash = prevtable->hash;
    newht->cmp = prevtable->cmp;
    newht->max_loadfactor = prevtable->max_loadfactor;
    /* the table is an array of entry pointers, not of entries */
    newht->table = calloc(newht->size, sizeof *newht->table);
    for (i = 0; i < ht->size; i++) {
        for (p = ht->table[i]; p; p = p->next_ptr) {
            ht_insert(newht, p->key, p->value);
        }
    }
    return newht;
}
Then you should have a function to free a hashtable, so you end up using it:
if (current_loadfactor > ht->max_loadfactor) {
hashtab_ptr tempht = ht;
ht = rehash(ht);
ht_delete(tempht);
}
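Such a free function might look roughly like the sketch below. The struct entry and struct table types and the name table_free are hypothetical, and the sketch assumes that keys and values are owned elsewhere, so only the nodes, the slot array and the envelope are released.

```c
#include <stdlib.h>

/* Hypothetical minimal types standing in for htentry_t and hashtable_t. */
struct entry { struct entry *next; };
struct table { struct entry **slots; unsigned int size; };

/* Frees every chain, then the slot array, then the table itself.
 * Returns the number of entries that were freed. */
unsigned int table_free(struct table *t)
{
    unsigned int i, freed = 0;
    if (!t)
        return 0;
    for (i = 0; i < t->size; i++) {
        struct entry *p = t->slots[i];
        while (p) {
            struct entry *next = p->next; /* read the link before freeing p */
            free(p);
            p = next;
            freed++;
        }
    }
    free(t->slots);
    free(t);
    return freed;
}
```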

This is intended to show that:
you only need to reallocate the table[] member, not the envelope
pointers to pointers can simplify things
when moving an element from the old table to the new one, you should take care not to damage its next pointer
[Note: I removed the typedefines, because I hate them ...]
#include <stdio.h>
#include <stdlib.h>
struct hashentry {
struct hashentry *next;
char *key;
void *payload;
};
struct hashtable {
struct hashentry **table; /*<< a pointer to array of pointers */
unsigned int size; /*<< current size */
unsigned int num_entries; /*<< current number of entries */
float max_loadfactor;
/* unsigned short idx; the index into the delta array(Quoi?) */
unsigned int (*hash)(void *, unsigned int); /*<< a pointer to the hash function */
int (*cmp)(void *, void *); /*<< a pointer to the comparison function */
};
static void rehash(struct hashtable *ht);
// The rehash function could look like this
static void rehash(struct hashtable *ht)
{
struct hashentry **newtab;
struct hashentry **pp, **qq, *this;
unsigned int newsize, oldidx, newidx;
newsize = ht->size * 2; /* or something like (max_loadfactor*num_entries), rounded up */
fprintf(stderr, "new table size %u\n", newsize);
newtab = malloc(newsize * sizeof *newtab );
for (newidx=0; newidx < newsize; newidx++) {
newtab[newidx] = NULL;
}
for (oldidx = 0; oldidx < ht->size; oldidx++) {
for (pp = &ht->table[oldidx]; *pp; ) {
this = *pp;
*pp = this->next; /* this is important ! */
this->next = NULL; /* ... because ... */
newidx = ht->hash(this->key, newsize);
for(qq = &newtab[newidx]; *qq; qq = &(*qq)->next) {
/* You could count the number of "collisions" here */
}
*qq = this;
}
}
free(ht->table);
ht->table = newtab;
ht->size = newsize;
/* The rest of the fields does not need to change */
}

I think this might be the solution, but I'm not 100% sure.
static void rehash(hashtab_ptr ht)
{
unsigned int old_size, new_size;
unsigned int newindex;
unsigned int i;
htentry_ptr q, p;
htentry_ptr *new_table;
old_size = ht->size;
/*gets new size in prime table */
new_size = getsize(ht);
new_table = malloc(sizeof(htentry_ptr) * new_size); /* array of entry pointers */
/* nullify the new table */
for (i = 0; i < new_size; i++) {
new_table[i] = NULL;
}
printf("\n*****rehashing******\n");
ht->size = new_size;
printf("%s %d\n", "new size:", new_size);
for (i = 0; i < old_size; i++) {
p = ht->table[i];
while (p) {
q = p->next_ptr;
newindex = ht->hash(p->key, new_size);
/*
temp = malloc(sizeof(htentry_t));
temp->key = p->key;
temp->value = p->value;
temp->next_ptr = new_table[ht->hash(temp->key, next_size)];
new_table[ht->hash(temp->key, next_size)] = temp;
*/
if (new_table[newindex]) {
p->next_ptr = new_table[newindex];
new_table[newindex] = p;
} else {
new_table[newindex] = p;
new_table[newindex]->next_ptr = NULL;
}
p = q;
}
}
free(ht->table);
ht->table = new_table;
}

Related

Problem with implementing a function to reverse a linked list in C

So I wanted to write a function to reverse a linked list using an array of pointers but I'm getting warnings: assignment from incompatible pointer type [-Wincompatible-pointer-types]. I wanted to store the pointers to nodes of the list in an array of pointers int **s = (int **)calloc(10, sizeof(int)); and thought that s[*top] = *l will assign the pointer to which **l is pointing to *topth element of array *s[]. So am I wrong thinking that elements of array *s[] are pointers? If someone could explain it to me I'd be very glad. Here's the whole code (except the part where I create the list which is fine):
typedef struct list {
int v;
struct list *next;
} list;
void reverseListS(list **l, int **s, int *top) {
while ((*l)->next != NULL) {
s[*top] = *l;
*top++;
*l = (*l)->next;
}
list *temp = *l;
while (!(*top == 0)) {
temp->next = s[*top];
*top--;
temp = temp->next;
}
temp->next = NULL;
}
int main() {
int **s = (int **)calloc(10, sizeof(int));
int *top = 0;
reverseListS(&l, s, top);
}
Many issues. Just in main: the allocation should use sizeof(int *) (or sizeof *s). Although I think you want s to be an array of ints, in which case it should be an int *. top does not point anywhere; why is it even a pointer? And l is not initialized.
In reverseListS at s[*top] = *l; you are trying to assign a struct list * to an int *.
I have re-written your code to work. I'm not saying this is the best way to reverse a list, but it makes the fewest modifications to your code - as I understand it.
typedef struct list {
int v;
struct list *next;
} list;
void reverseListS(list **l)
{
// Count number of items
// *this step could be skipped by dynamically resizing the array with realloc
int count = 0;
list *temp = *l;
while (temp) {
count += 1;
temp = temp->next;
}
// Allocate memory - an array of list *
list **s = malloc(count * (sizeof *s));
if (!s) return;
// Copy list item addresses to array
temp = *l;
int index = 0;
while (temp) {
s[index++] = temp;
temp = temp->next;
}
// Rebuild the list in reverse order
// *if you already have an "append_to_list" function, that should be used here
temp = NULL;
for (int i = index - 1; i >= 0; i--) {
if (!temp) {
// This is the new first item in list.
// Make the original list point to it
*l = temp = s[i];
}
else {
// Append to end of new list
temp->next = s[i];
temp = s[i];
}
s[i]->next = NULL;
}
free(s);
}
int main() {
list *l = NULL; // start with an empty list so the pointer is never read uninitialized
// TODO: Fill the list with values.
reverseListS(&l);
}
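For comparison, a list can also be reversed in place without any auxiliary array. This is a different technique from the array-based rewrite above, not a correction of it; the function name reverseListInPlace is made up for this sketch.

```c
#include <stddef.h>

typedef struct list {
    int v;
    struct list *next;
} list;

/* Reverses the list in place by flipping each next pointer. */
void reverseListInPlace(list **l)
{
    list *prev = NULL;
    list *curr = *l;
    while (curr) {
        list *next = curr->next; /* save the rest of the list */
        curr->next = prev;       /* point this node backwards */
        prev = curr;
        curr = next;
    }
    *l = prev; /* the old tail is the new head (NULL for an empty list) */
}
```

This needs no allocation, so it cannot fail, and it handles the empty list without a special case.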

How to add to a linked list within a separate chaining hash table in C

Here are the relevant structs for my question.
//A SymEntry is the building block for linked lists of (name, attribute) pairs
typedef struct SymEntry {
char * name;
void * attribute;
struct SymEntry * next;
} SymEntry;
/*
Each symbol table is represented by a SymTab
size is the current number of lists in the separate chaining hash table
contents is an array of lists (i.e. points to the zeroth element in the array)
if current is not NULL it points to the current (name, attribute) pair in the symbol table
*/
typedef struct {
int size;
SymEntry ** contents;
SymEntry *current;
} SymTab;
I have a project to create a Symbol Table in c. We are to implement a separate chaining hash table to accomplish this. I believe that I created the initial, empty hash table correctly. Below is my implementation of that.
SymTab * createSymTab(int size) {
int i;
SymTab *symbolTable = malloc(sizeof(SymTab));
symbolTable->contents = (SymEntry**)malloc(size * sizeof(SymEntry));
symbolTable->current = (SymEntry*)malloc(sizeof(SymEntry));
symbolTable->size = size;
for (i=0; i<size; i++) {
SymEntry *newEntry = malloc(sizeof(SymEntry));
newEntry -> name = NULL;
newEntry -> attribute = NULL;
newEntry -> next = NULL;
symbolTable->contents[i] = newEntry;
}
symbolTable->current = NULL;
return symbolTable;
}
I seem to also have it working to where it can add the first node (SymEntry) in the linked list. Below is my code to add an entry, along with my hash method.
int enterName(SymTab * table, char *name) {
if (findName(table, name) == 0) {
int size = table->size;
int hashNum = hash(name, &size);
SymEntry *head = table->contents[hashNum];
printf("Hash Number is %d\n", hashNum);
if (head->name == NULL) {
printf("Head is null\n");
head->name = name;
head->attribute = NULL;
}
else {
printf("Head is not null\n");
SymEntry *newNode = malloc(sizeof(SymEntry));
newNode->name = name;
newNode->attribute = NULL;
newNode->next = head;
head = newNode;
}
return 0;
}
return 1;
}
int hash(char *key, int * size) {
int hash = 0;
int i = 0;
int sizeOfNum = *size;
printf("Key Value: %s Size of Number: %d\n", key, sizeOfNum);
while (key && key[i]) {
hash = (hash + key[i] % sizeOfNum);
i++;
}
return hash % sizeOfNum;
}
Lastly, the below code is what I am using to test things out. If my understanding of everything is correct, the name that should be printing is Jess, my second entry, but instead I am only seeing Wes. Both of these names hash out to the same number, which in this case would be 5. What exactly am I doing wrong when I go to add a node (SymEntry) to the list? My output recognizes that the head is not empty when I go to add Jess, so I know the first entry works.
int main(void) {
SymTab * symbolTable = createSymTab(6);
enterName(symbolTable, "wes");
enterName(symbolTable, "jess");
SymEntry * example = symbolTable->contents[5];
printf("%s\n", example->name);
return 0;
}
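Going only by the code shown, a likely cause is that head in enterName is a local copy of table->contents[hashNum]. The assignment head = newNode; changes that local variable and never touches the bucket, so the new node is lost and the bucket still starts with "wes". A minimal sketch of the prepend step, using a hypothetical helper name prependEntry and the struct definitions from the question:

```c
#include <stdlib.h>

typedef struct SymEntry {
    char *name;
    void *attribute;
    struct SymEntry *next;
} SymEntry;

typedef struct {
    int size;
    SymEntry **contents;
    SymEntry *current;
} SymTab;

/* Hypothetical helper: prepends a new entry to bucket hashNum.
 * Returns 0 on success, -1 on allocation failure. */
int prependEntry(SymTab *table, int hashNum, char *name)
{
    SymEntry *newNode = malloc(sizeof *newNode);
    if (!newNode)
        return -1;
    newNode->name = name;
    newNode->attribute = NULL;
    newNode->next = table->contents[hashNum]; /* old head becomes second */
    table->contents[hashNum] = newNode;       /* write to the bucket, not a local copy */
    return 0;
}
```

The same assignment works whether the buckets start out as NULL or as the placeholder entries that createSymTab allocates, although starting with NULL buckets would remove the need for the head->name == NULL special case.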

How to deal with old references to a resized hash table?

I'm currently working on a hash table implementation in C. I'm trying to implement dynamic resizing, but came across a problem.
If resizing a hash table means creating a new one with double (or half) the size, rehashing, and deleting the old one, how can I deal with old references the user may have made to the old table? Example code (I've omitted error checking just for this example):
int main(int argc, char *argv[])
{
ht = ht_create(5) /* make hashtable with size 5 */
ht_insert("john", "employee"); /* key-val pair "john -> employee" */
ht_insert("alice", "employee");
char *position = ht_get(ht, "alice"); /* get alice's position from hashtable ht */
ht_insert("bob", "boss"); /* this insert exceeds the load factor, resizes the hash table */
printf("%s", position); /* returns NULL because the previous hashtable that was resized was freed */
return 0;
}
In this case position pointed to alice's value which was found in the hashtable. When it was resized, we freed the hash table and lost it. How can I fix this problem, so the user won't have to worry that a previously defined pointer was freed?
EDIT: my current hash table implementation
hash.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hash.h"
#define LOADFACTOR 0.75
typedef struct tableentry /* hashtab entry */
{
struct tableentry *next;
char *key;
void *val;
} tableentry_t;
typedef struct hashtable
{
datatype_t type;
size_t size;
size_t load; /* number of keys filled */
struct tableentry **tab;
} hashtable_t;
/* creates hashtable */
/* NOTE: dynamically allocated, remember to ht_free() */
hashtable_t *ht_create(size_t size, datatype_t type)
{
hashtable_t *ht = NULL;
if ((ht = malloc(sizeof(hashtable_t))) == NULL)
return NULL;
/* allocate ht's table */
if ((ht->tab = malloc(sizeof(tableentry_t) * size)) == NULL)
return NULL;
/* null-initialize table */
size_t i;
for (i = 0; i < size; i++)
ht->tab[i] = NULL;
ht->size = size;
ht->type = type;
return ht;
}
/* creates hash for a hashtab */
static unsigned hash(char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval;
}
static int *intdup(int *i)
{
int *new;
if ((new = malloc(sizeof(int))) == NULL)
return NULL;
*new = *i;
return new;
}
static void free_te(tableentry_t *te)
{
free(te->key);
free(te->val);
free(te);
}
/* loops through linked list freeing */
static void free_te_list(tableentry_t *te)
{
tableentry_t *next;
while (te != NULL)
{
next = te->next;
free_te(te);
te = next;
}
}
/* creates a key-val pair */
static tableentry_t *alloc_te(char *k, void *v, datatype_t type)
{
tableentry_t *te = NULL;
int status = 0;
/* alloc struct */
if ((te = calloc(1, sizeof(*te))) == NULL)
status = -1;
/* alloc key */
if ((te->key = strdup(k)) == NULL)
status = -1;
/* alloc value */
int *d;
char *s;
switch (type)
{
case STRING:
s = (char *) v;
if ((te->val = strdup(s)) == NULL)
status = -1;
break;
case INTEGER:
d = (int *) v;
if ((te->val = intdup(d)) == NULL)
status = -1;
break;
default:
status = -1;
}
if (status < 0)
{
free_te_list(te);
return NULL;
}
te->next = NULL;
return te;
}
static tableentry_t *lookup(hashtable_t *ht, char *k)
{
tableentry_t *te;
/* step through linked list */
for (te = ht->tab[hash(k) % ht->size]; te != NULL; te = te->next)
if (strcmp(te->key, k) == 0)
return te; /* found */
return NULL; /* not found */
}
/* inserts the key-val pair */
hashtable_t *ht_insert(hashtable_t *ht, char *k, void *v)
{
tableentry_t *te;
/* unique entry */
if ((te = lookup(ht, k)) == NULL)
{
te = alloc_te(k, v, ht->type);
unsigned hashval = hash(k) % ht->size;
/* insert at beginning of linked list */
te->next = ht->tab[hashval];
ht->tab[hashval] = te;
ht->load++;
}
/* replace val of previous entry */
else
{
free(te->val);
switch (ht->type)
{
case STRING:
if ((te->val = strdup(v)) == NULL)
return NULL;
break;
case INTEGER:
if ((te->val = intdup(v)) == NULL)
return NULL;
break;
default:
return NULL;
}
}
return ht;
}
static void delete_te(hashtable_t *ht, char *k)
{
tableentry_t *te, *prev;
unsigned hashval = hash(k) % ht->size;
te = ht->tab[hashval];
/* point head to next element if deleting head */
if (strcmp(te->key, k) == 0)
{
ht->tab[hashval] = te->next;
free_te(te);
ht->load--;
return;
}
/* otherwise look through, keeping track of prev to reassign its ->next */
for (; te != NULL; te = te->next)
{
if (strcmp(te->key, k) == 0)
{
prev->next = te->next;
free_te(te);
ht->load--;
return;
}
prev = te;
}
}
hashtable_t *ht_delete(hashtable_t *ht, char *k)
{
size_t i;
if (lookup(ht, k) == NULL)
return NULL;
else
delete_te(ht, k);
}
/* retrieve value from key */
void *ht_get(hashtable_t *ht, char *k)
{
tableentry_t *te;
if ((te = lookup(ht, k)) == NULL)
return NULL;
return te->val;
}
/* frees hashtable created from ht_create() */
void ht_free(hashtable_t *ht)
{
size_t i;
if (ht)
{
for (i = 0; i < ht->size; i++)
if (ht->tab[i] != NULL)
free_te_list(ht->tab[i]);
free(ht);
}
}
/* resizes hashtable, returns new hashtable and frees old */
static hashtable_t *resize(hashtable_t *oht, size_t size)
{
hashtable_t *nht; /* new hashtable */
nht = ht_create(size, oht->type);
/* rehash */
size_t i;
tableentry_t *te;
/* loop through hashtable */
for (i = 0; i < oht->size; i++)
/* loop through linked list */
for (te = oht->tab[i]; te != NULL; te = te->next)
/* insert & rehash old vals into new ht */
if (ht_insert(nht, te->key, te->val) == NULL)
return NULL;
ht_free(oht);
return nht;
}
hash.h
/* a hash-table implementation in c */
/*
hashing algorithm: hashval = *s + 31 * hashval
resolves collisions using linked lists
*/
#ifndef HASH
#define HASH
typedef struct hashtable hashtable_t;
typedef enum datatype {STRING, INTEGER} datatype_t;
/* inserts the key-val pair */
hashtable_t *ht_insert(hashtable_t *ht, char *k, void *v);
/* creates hashtable */
/* NOTE: dynamically allocated, remember to ht_free() */
hashtable_t *ht_create(size_t size, datatype_t type);
/* frees hashtable created from ht_create() */
void ht_free(hashtable_t *ht);
/* retrive value from key */
void *ht_get(hashtable_t *ht, char *k);
hashtable_t *ht_delete(hashtable_t *ht, char *k);
#endif
Do not use the hash table as the container for the data; only use it to refer to the data, and you won't have that problem.
For example, let's say you have key-value pairs, using a structure with the actual data in the C99 flexible array member:
struct pair {
struct pair *next; /* For hash chaining */
size_t hash; /* For the raw key hash */
/* Payload: */
size_t offset; /* value starts at (data + offset) */
char data[]; /* key starts at (data) */
};
static inline const char *pair_key(struct pair *ref)
{
return (const char *)(ref->data);
}
static inline const char *pair_value(struct pair *ref)
{
return (const char *)(ref->data + ref->offset);
}
Your hash table can then be simply
struct pair_hash_table {
size_t size;
struct pair **entry;
};
If you have struct pair_hash_table *ht, and struct pair *foo with foo->hash containing the hash of the key, then foo should be in the singly-linked list hanging off ht->entry[foo->hash % ht->size];.
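Creating such a pair could look like the sketch below. The constructor name pair_create is hypothetical, and it assumes the caller has already computed the raw hash of the key; key and value are copied into a single allocation after the struct header.

```c
#include <stdlib.h>
#include <string.h>

struct pair {
    struct pair *next;  /* For hash chaining */
    size_t hash;        /* For the raw key hash */
    size_t offset;      /* value starts at (data + offset) */
    char data[];        /* key starts at (data) */
};

/* Hypothetical constructor: copies key and value into one allocation.
 * Returns NULL if the allocation fails. */
struct pair *pair_create(const char *key, const char *value, size_t hash)
{
    const size_t keylen = strlen(key) + 1;   /* include the terminating '\0' */
    const size_t vallen = strlen(value) + 1;
    struct pair *p = malloc(sizeof *p + keylen + vallen);
    if (!p)
        return NULL;
    p->next = NULL;
    p->hash = hash;
    p->offset = keylen;                      /* value starts right after the key */
    memcpy(p->data, key, keylen);
    memcpy(p->data + keylen, value, vallen);
    return p;
}
```

Because each pair lives in its own allocation, a pointer obtained from pair_value() stays valid for as long as the pair itself is not freed, regardless of what happens to the entry array.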
Let's say you wish to resize the hash table ht. You choose a new size, and allocate enough memory for that many struct pair *. Then, you go through each singly-linked list in each old hash entry, detaching them from the old list, and prepending them to the lists in correct hash table entries in the new hash table. Then you just free the old hash table entry array, replacing it with the new one:
int resize_pair_hash_table(struct pair_hash_table *ht, const size_t new_size)
{
struct pair **entry, *curr, *next;
size_t i, k;
if (!ht || new_size < 1)
return -1; /* Invalid parameters */
entry = malloc(new_size * sizeof entry[0]);
if (!entry)
return -1; /* Out of memory */
/* Initialize new entry array to empty. */
for (i = 0; i < new_size; i++)
entry[i] = NULL;
for (i = 0; i < ht->size; i++) {
/* Detach the singly-linked list. */
next = ht->entry[i];
ht->entry[i] = NULL;
while (next) {
/* Detach the next element, as 'curr' */
curr = next;
next = next->next;
/* k is the index to this hash in the new array */
k = curr->hash % new_size;
/* Prepend to the list in the new array */
curr->next = entry[k];
entry[k] = curr;
}
}
/* Old array is no longer needed, */
free(ht->entry);
/* so replace it with the new one. */
ht->entry = entry;
ht->size = new_size;
return 0; /* Success */
}
Note that the hash field in struct pair is not modified, nor recalculated.
Having the raw hash (as opposed to modulo table-size), means you can speed up the key search even when different keys use the same slot:
struct pair *find_key(struct pair_hash_table *ht,
const char *key, const size_t key_hash)
{
struct pair *curr = ht->entry[key_hash % ht->size];
while (curr)
if (curr->hash == key_hash && !strcmp(key, pair_key(curr)))
return curr;
else
curr = curr->next;
return NULL; /* Not found. */
}
In C, the logical and operator, &&, is short-circuiting. If the left side is not true, the right side is not evaluated at all, because the entire expression can never be true in that case.
Above, this means that the raw hash value of the key is compared, and only when they do match, the actual strings are compared. If your hash algorithm is even halfway good, this means that if the key already exists, typically only one string comparison is done; and if the key does not exist in the table, typically no string comparisons are done.
You can deal with them the same way the standard library (C++) deals with this exact problem:
Some operations on containers (e.g. insertion, erasing, resizing) invalidate iterators.
For instance std::unordered_map which is basically a hash table implemented with buckets has these rules:
insertion
unordered_[multi]{set,map}: all iterators invalidated when rehashing
occurs, but references unaffected [23.2.5/8]. Rehashing does not occur
if the insertion does not cause the container's size to exceed z * B
where z is the maximum load factor and B the current number of
buckets. [23.2.5/14]
erasure
unordered_[multi]{set,map}: only iterators and references to the
erased elements are invalidated [23.2.5/13]
Iterator invalidation rules
The C++ concept of iterators is a generalization of pointers. So this concept can be applied to C.
Your only other alternative is to add another level of indirection: instead of holding the objects directly in the container, hold some sort of proxy. The elements then always stay at the same position in memory; it is the proxies that move around on resizing, inserting, etc. But you need to analyze this scenario: are the added double indirection (which will surely hurt performance) and the increased implementation complexity worth it? Is it that important to have persistent pointers?

Assigning value to an index in a struct pointer array inside of a struct pointer

I am currently having a problem assigning a NODE type object to a NODE* array that is inside of an INV_PAGE_TABLE structure.
The structures look as follows:
typedef struct node {
int pid;
int p;
int offset;
unsigned TAG;
} NODE;
typedef struct invTablePage {
NODE *pageTable;
int frameSize;
int currentSize;
int totalSize;
int oldest;
int maxIndex;
} INV_PAGE_TABLE;
The invTablePage is allocated as follows:
void initInverted(INV_PAGE_TABLE *invTable, int memSize, int frameSize) {
//Malloc inverted page table
invTable = malloc(sizeof(struct invTablePage));
//Save frameSize
invTable->frameSize = frameSize;
//Save totalSize
invTable->totalSize = memSize / frameSize - 1;
//Save currentSize
invTable->currentSize = 0;
//Set oldest
invTable->oldest = 0;
//Malloc array inside of page table
invTable->pageTable = malloc(sizeof(NODE) * invTable->totalSize);
}
And finally the method which invokes a Segmentation Fault
void addToPageTable(struct invTablePage *invTable, NODE *node) {
NODE tempNode;
//If pageTable is not full
int currentSize = invTable->currentSize;
if (invTable->currentSize != invTable->totalSize) {
//Add Entry at index of currentSize
/*FOLLOWING LINE CRASHES PROGRAM*/
invTable->pageTable[currentSize] = node;
//Update currentSize
invTable->currentSize++;
//If pageTable is full
} else {
//Set temp to oldest
tempNode = invTable->pageTable[invTable->oldest];
//Set oldest to node
invTable->pageTable[invTable->oldest] = *node;
}
}
Notice that in an array of size 10, for example, the index runs from 0 to 9.
So your total size should be invTable->totalSize = memSize/frameSize;
and currentSize shouldn't exceed invTable->totalSize - 1.
In other words, I think you have to allocate memSize/frameSize elements, not memSize/frameSize - 1.
I'm not sure my answer is right; try adding some printf calls to check.
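Apart from the size question, two things in the posted code look likely to cause the crash. initInverted assigns the result of malloc to its own copy of invTable, so the caller's pointer is never set. And addToPageTable stores node (a NODE *) into pageTable[currentSize], which is a NODE, where *node was probably intended. A minimal sketch of one way around both, assuming frameSize is positive; the names createInverted and appendNode are hypothetical:

```c
#include <stdlib.h>

typedef struct node {
    int pid;
    int p;
    int offset;
    unsigned TAG;
} NODE;

typedef struct invTablePage {
    NODE *pageTable;
    int frameSize;
    int currentSize;
    int totalSize;
    int oldest;
    int maxIndex;
} INV_PAGE_TABLE;

/* Hypothetical variant of initInverted that returns the new table
 * instead of assigning to a by-value pointer parameter. */
INV_PAGE_TABLE *createInverted(int memSize, int frameSize)
{
    INV_PAGE_TABLE *t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->frameSize = frameSize;
    t->totalSize = memSize / frameSize;
    t->currentSize = 0;
    t->oldest = 0;
    t->maxIndex = 0;
    t->pageTable = malloc(sizeof *t->pageTable * (size_t)t->totalSize);
    if (!t->pageTable) {
        free(t);
        return NULL;
    }
    return t;
}

/* Copies *node into the next free slot; returns 0 on success, -1 if full. */
int appendNode(INV_PAGE_TABLE *t, const NODE *node)
{
    if (t->currentSize >= t->totalSize)
        return -1;
    t->pageTable[t->currentSize++] = *node; /* struct copy, not a pointer store */
    return 0;
}
```

The replacement branch of addToPageTable (evicting the oldest entry when the table is full) is left out of this sketch.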

Doubling an array that points to a linked list in C

I am implementing a hashset in C, where my array points to a linked list
this is the linked list:
typedef struct hashnode hashnode;
struct hashnode {
char *word;
// will hold our word as a string
hashnode *link;
//will be used only if chaining
};
and this is the Hashset:
struct hashset {
size_t size;
//size of entire array
size_t load;
//number of words total
hashnode **chains;
//linked list (if words have same index);
};
Now I am having a problem with my code for doubling the array.
I believe there is a dangling pointer somewhere.
Here is the code:
void dbl_array(hashset *this) {
size_t newlen = this->size +1;
newlen *= 2;
//double siz
hashnode **new_array = malloc(newlen * sizeof(hashnode*));
//new array
int array_end = (int)this->size;//load;
//end of old array
for(int i = 0; i < array_end; i++) {
//loop through old
int index = i;
if(this->chains[index] == NULL) {
continue;
}
else {
hashnode *nod;
int i=0;
for(nod = this->chains[index]; nod != NULL; nod = nod->link) {
if(nod == NULL)
return;
size_t tmp = strhash(nod->word) % newlen;
//compute hash
hashnode *newnod;
newnod = malloc(sizeof(hashnode*));
newnod->word = strdup(nod->word);
newnod->link = NULL;
if(new_array[tmp] == NULL) {
//if new array does not already have a word at index
new_array[tmp] = newnod;
}
else {
//if word is here then link to old one
newnod->link = new_array[tmp];
new_array[tmp] = newnod;
}
printf("newarray has: %s # {%d} \n", new_array[tmp]->word, tmp);
//testing insertion
i++;
}
free(nod);
}
}
this->chains = new_array;
this->size = newlen;
free(new_array);
printf("new size %d\n", this->size);
}
So after running GDB, I am finding that there is something wrong when I add the new node
There is no reason at all to allocate new collision nodes for a hash table expansion. The algorithm for expanding your hash table is relatively straightforward:
compute new table size
allocate new table
enumerate all chains in old table
for each chain, enumerate all nodes
for each node, compute new hash based on new table size
move node to appropriate slot in new table
When the above is done, so are you. Just wire up the new table to the hashset and make sure to update the size member to the new size. The old table is discarded.
The following code assumes you have properly managed your hash table prior to doubling. With that:
All unused table slots are properly NULL
All collision lists are properly NULL-terminated.
If you can't guarantee both of those conditions, doubling the size of your hash table is the least of your worries.
void hashset_expand(hashset* hs)
{
size_t new_size = 2 * (1 + hs->size), i, idx;
hashnode *next, *nod, **tbl = calloc(new_size, sizeof(*tbl));
// walk old table, and each chain within it.
for (i=0; i<hs->size; ++i)
{
next = hs->chains[i];
while (next)
{
nod = next;
next = next->link; // must be done **before** relink
idx = strhash(nod->word) % new_size;
nod->link = tbl[idx];
tbl[idx] = nod;
}
}
// finish up, deleting the old bed.
free(hs->chains);
hs->chains = tbl;
hs->size = new_size;
}
That is all there is to it. Don't make it more complicated than that.

Resources