I don't understand why my program seg faults at this line: if ((**table->table).link == NULL){ I seem to have malloc-ed memory for it, and I tried looking at it with gdb. *table->table was accessible and not NULL, but **table->table was not accessible.
Definition of hash_t:
struct table_s {
struct node_s **table;
size_t bins;
size_t size;
};
typedef struct table_s *hash_t;
void set(hash_t table, char *key, int value){
unsigned int hashnum = hash(key)%table->bins;
printf("%d \n", hashnum);
unsigned int i;
for (i = 0; i<hashnum; i++){
(table->table)++;
}
if (*(table->table) == NULL){
struct node_s n = {key, value, NULL};
struct node_s *np = &n;
*(table->table) = malloc(sizeof(struct node_s));
*(table->table) = np;
}else{
while ( *(table->table) != NULL){
if ((**table->table).link == NULL){
struct node_s n = {key, value, NULL};
struct node_s *np = &n;
(**table->table).link = malloc(sizeof(struct node_s));
(**table->table).link = np;
break;
}else if (strcmp((**table->table).key, key) == 0){
break;
}
*table->table = (**(table->table)).link;
}
if (table->size/table->bins > 1){
rehash(table);
}
}
}
I'm calling set from here:
for (int i = 0; i < trials; i++) {
int sample = rand() % max_num;
sprintf(key, "%d", sample);
set(table, key, sample);
}
Your hash table works like this: it has table->bins bins, and each bin is a linked list of key/value pairs. All items in a bin share the same hash code modulo the number of bins.
You have probably created the table of bins when you created or initialised the hash table, something like this:
table->table = malloc(table->bins * sizeof(*table->table));
for (size_t i = 0; i < table->bins; i++) table->table[i] = NULL;
Now why does the member table have two stars?
The "inner" star means that the table stores pointers to nodes, not the nodes themselves.
The "outer" star is a handle to allocated memory. If your hash table were of a fixed size, for example always with 256 bins, you could define it as:
struct node_s *table[256];
If you passed this array around, it would become (or "decay into") a pointer to its first element, a struct node_s **, just as the array you got from malloc.
You access the contents of the bins via the linked lists, and the head of linked list i is table->table[i].
Your code has other problems:
What did you want to achieve with (table->table)++? This makes the handle to the allocated memory point not to the first element but to the next one. After doing that hashnum times, *table->table will be the head of the right bin, but you will have lost the original handle, which you must retain, because you must pass it to free later when you clean up your hash table. It also changes the table for every later call, so the next set starts from the wrong bin and can walk past the end of the allocation. Don't lose the handle to allocated memory! Use another local pointer instead.
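To illustrate the decay described above, here is a minimal sketch. The node layout (key, value, link) is inferred from the initialiser in the question, and count_used_bins is a hypothetical helper, not part of the original program. The same function accepts a fixed array and a malloc-ed table, because both arrive as a struct node_s **.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Node layout assumed from the question's initialiser {key, value, NULL}. */
struct node_s {
    char *key;
    int value;
    struct node_s *link;
};

/* Hypothetical helper: counts the bins whose list head is not NULL.
   'table' may come from a fixed array (after decay) or from malloc. */
size_t count_used_bins(struct node_s **table, size_t bins)
{
    size_t used = 0;

    assert(table != NULL);
    for (size_t i = 0; i < bins; i++) {
        if (table[i] != NULL)   /* table[i] is the head of list i */
            used++;
    }
    return used;
}
```

Note that the function indexes with table[i] and never modifies table itself, so the caller's handle stays intact.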
You create a local node n and then make a link in your linked list with a pointer to that node. But the node n will be gone after you leave the function and the link will be "stale": It will point to invalid memory. You must also create memory for the node with malloc.
A simple implementation of your hash table might be:
void set(hash_t table, char *key, int value)
{
unsigned int hashnum = hash(key) % table->bins;
// create (uninitialised) new node
struct node_s *nnew = malloc(sizeof(*nnew));
// initialise new node, point it to old head
nnew->key = strdup(key);
nnew->value = value;
nnew->link = table->table[hashnum];
// make the new node the new head
table->table[hashnum] = nnew;
}
This makes the new node the head of the linked list. This is not ideal, because if you overwrite items, the new ones will be found (which is good), but the old ones will still be in the table (which isn't good). But that, as they say, is left as an exercise to the reader.
(The strdup function is not part of ISO C before C23, but it is specified by POSIX and widely available. It also allocates new memory, which you must free later, but it ensures that the string "lives" (is still valid) after you have created the hash table.)
Please note how few stars there are in the code. If there is one star too few, it is in hash_t, where you have typedef'ed away the pointer nature.
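One way to approach the exercise mentioned above is to search the bin before inserting. The following is only a sketch under assumptions: the struct layouts are inferred from the question, the hash function is a placeholder because the original hash() is not shown, and set_or_update is a hypothetical name. It copies the key with malloc and strcpy instead of strdup so that it stays within standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node_s {
    char *key;
    int value;
    struct node_s *link;
};

struct table_s {
    struct node_s **table;
    size_t bins;
    size_t size;
};

/* Placeholder hash (assumption): the question does not show its hash(). */
static unsigned int hash(const char *key)
{
    unsigned int h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Returns 0 on success, -1 if memory could not be allocated. */
int set_or_update(struct table_s *t, const char *key, int value)
{
    assert(t != NULL && t->table != NULL && t->bins > 0);
    size_t bin = hash(key) % t->bins;

    /* Walk the chain with a local pointer; t->table itself is never changed. */
    for (struct node_s *p = t->table[bin]; p != NULL; p = p->link) {
        if (strcmp(p->key, key) == 0) {
            p->value = value;       /* key already present: overwrite */
            return 0;
        }
    }

    struct node_s *nnew = malloc(sizeof(*nnew));
    if (nnew == NULL)
        return -1;
    nnew->key = malloc(strlen(key) + 1);  /* portable stand-in for strdup */
    if (nnew->key == NULL) {
        free(nnew);
        return -1;
    }
    strcpy(nnew->key, key);
    nnew->value = value;
    nnew->link = t->table[bin];     /* new node becomes the head */
    t->table[bin] = nnew;
    t->size++;
    return 0;
}
```

With this variant, setting the same key twice leaves a single node in the bin and size counts distinct keys.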
I have a block of pointers to some structs which I want to handle (i.e. free) separately. As an example below there is an integer double-pointer which should keep other pointers to integer. I then would like to free the second of those integer pointers (in my program based on some filterings and calculations). If I do so however, I should keep track of int-pointers already set free so that when I iterate over the pointers in the double-pointer I do not take the risk of working with them further. Is there a better approach for solving this problem (in ANSI-C) without using other libs (e.g. glib or alike)?
Here is a small simulation of the problem:
#include <stdio.h>
#include <stdlib.h>
int main() {
int **ipp=NULL;
for (int i = 0; i < 3; i++) {
int *ip = malloc(sizeof (int));
printf("%p -> ip %d\n", ip, i);
*ip = i * 10;
if ((ipp = realloc(ipp, sizeof (int *) * (i + 1)))) {
ipp[i] = ip;
}
}
printf("%p -> ipp\n", ipp);
for (int i = 0; i < 3; i++) {
printf("%d. %p %p %d\n", i, ipp + i, *(ipp+i), **(ipp + i));
}
// free the middle integer pointer
free(*(ipp+1));
printf("====\n");
for (int i = 0; i < 3; i++) {
printf("%d. %p %p %d\n", i, ipp + i, *(ipp+i), **(ipp + i));
}
return 0;
}
which prints something like
0x555bcc07f2a0 -> ip 0
0x555bcc07f6f0 -> ip 1
0x555bcc07f710 -> ip 2
0x555bcc07f6d0 -> ipp
0. 0x555bcc07f6d0 0x555bcc07f2a0 0
1. 0x555bcc07f6d8 0x555bcc07f6f0 10
2. 0x555bcc07f6e0 0x555bcc07f710 20
====
0. 0x555bcc07f6d0 0x555bcc07f2a0 0
1. 0x555bcc07f6d8 0x555bcc07f6f0 0
2. 0x555bcc07f6e0 0x555bcc07f710 20
Here I have freed the middle int-pointer. In my actual program I create a new block for an integer double-pointer, iterate over the current one, create new integer pointers and copy the old values into them, realloc the double-pointer block and append the new pointer to it, and at the end free the old block and all the pointers it contains. This is a bit ugly, and resource-consuming if there is a huge amount of data, since I have to iterate over and create and copy all the data twice. Any help is appreciated.
Re: "This is a bit ugly, and resource-consuming if there is a huge amount of data, since I have to iterate over and create and copy all the data twice. Any help is appreciated."
First observation: It is not necessary to use realloc() when allocating new memory for a pointer that has already been freed. realloc() is useful when you need to preserve the contents of a particular area of memory while changing its size. If that is not needed (and it is not in this case), malloc() or calloc() is sufficient. @Marco's suggestion is correct.
Second observation: the following code snippet:
if ((ipp = realloc(ipp, sizeof (int *) * (i + 1)))) {
ipp[i] = ip;
}
is a potential memory leak. If the call to realloc() fails, it returns NULL and ipp is overwritten with it, so the previously allocated block becomes orphaned, with no way to free it.
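A common way to avoid that leak is to assign the result of realloc() to a temporary pointer first. The sketch below shows the idea; append_ptr is a hypothetical helper name and not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: appends 'ip' to the array *ipp, which holds *n pointers.
   Returns 0 on success; on failure returns -1 and leaves *ipp and *n unchanged. */
int append_ptr(int ***ipp, size_t *n, int *ip)
{
    assert(ipp != NULL && n != NULL);

    int **tmp = realloc(*ipp, (*n + 1) * sizeof(*tmp));
    if (tmp == NULL)
        return -1;      /* the old block is still valid and still owned by *ipp */

    tmp[*n] = ip;
    *ipp = tmp;
    (*n)++;
    return 0;
}
```

Because *ipp is only overwritten after realloc() has succeeded, the caller can still free the old block when the call fails.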
Third observation: Your approach is described as needing:
Array of struct
dynamic memory allocation of a 2D array
need to delete elements of 2D array, and ensure they are not referenced once deleted
need to repurpose deleted elements of 2D array
Your initial reaction in comments to considering using an alternative approach notwithstanding, Linked lists are a perfect fit to address the needs stated in your post.
The fundamental element of a Linked List uses a struct
Nodes (elements) of a List are dynamically allocated when created.
Nodes of a List are not accessible to be used once deleted. (No need to track)
Once the need exists, a new node is easily created.
Example struct follows. I like to use a data struct to contain the payload, then use an additional struct as the conveyance, to carry the data when building a Linked List:
typedef struct {//to simulate your struct
int dNum;
char unique_name[30];
double fNum;
} data_s;
typedef struct Node {//conveyance of payload, forward and backward searchable
data_s data;
struct Node *next; // Pointer to next node in DLL
struct Node *prev; // Pointer to previous node in DLL
} list_s;
Creating a list is done by creating a series of nodes as needed at run time. Typically, as records of a database or lines of a file are read, the elements of the table record (or the elements of the line in the file) are read into an instance of the data part of the list_s struct. A function is typically defined to do this, for example:
void insert_node(list_s **head, data_s *new)
{
list_s *temp = malloc(sizeof(*temp));
if (temp == NULL)
return; // allocation failed; the list is left unchanged
//populate the new node from the payload
temp->data.dNum = new->dNum;
strcpy(temp->data.unique_name, new->unique_name);
temp->data.fNum = new->fNum;
//arrange list to accommodate new node in new list
temp->next = temp->prev = NULL;
if (!(*head))
(*head) = temp;
else//...or existing list
{
temp->next = *head;
(*head)->prev = temp;
(*head) = temp;
}
}
Deleting a node can be done in multiple ways. In the following example, a unique value of a node member is used to find the node, in this case unique_name:
void delete_node_by_name(list_s** head_ref, const char *name)
{
int not_found = 1; // plain int flag; BOOL/TRUE/FALSE are not defined in standard C
// if list is empty
if ((*head_ref) == NULL)
return;
list_s *current = *head_ref;
list_s *next = NULL;
// traverse the list up to the end
while (current != NULL && not_found)
{
// if 'name' in node...
if (strcmp(current->data.unique_name, name) == 0)
{
//set loop to exit
not_found = 0;
//save current's next node in the pointer 'next'
next = current->next;
// delete the node pointed to by 'current'
delete_node(head_ref, current);
// reset the pointers
current = next;
}
// increment to next node
else
{
current = current->next;
}
}
}
Where delete_node() is defined as:
void delete_node(list_s **head_ref, list_s *del)
{
// base case
if (*head_ref == NULL || del == NULL)
return;
// If node to be deleted is head node
if (*head_ref == del)
*head_ref = del->next;
// Change next only if node to be deleted is NOT the last node
if (del->next != NULL)
del->next->prev = del->prev;
// Change prev only if node to be deleted is NOT the first node
if (del->prev != NULL)
del->prev->next = del->next;
// Finally, free the memory occupied by del
free(del);
}
This link is an introduction to Linked Lists, and has additional links to other related topics covering the types of lists that are available.
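Since every node is allocated with malloc, the whole list also has to be released at some point. The following sketch shows one way to do that in a single pass. The structs repeat the ones above (the list type is called list_s here), and free_list is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int dNum;
    char unique_name[30];
    double fNum;
} data_s;

typedef struct Node {
    data_s data;
    struct Node *next;
    struct Node *prev;
} list_s;

/* Hypothetical helper: frees every node and resets the caller's head to NULL.
   Returns the number of nodes freed. */
size_t free_list(list_s **head)
{
    size_t freed = 0;

    assert(head != NULL);
    while (*head != NULL) {
        list_s *next = (*head)->next;  /* save the successor before freeing */
        free(*head);
        *head = next;
        freed++;
    }
    return freed;
}
```

After the call the head pointer is NULL, so there is no stale pointer left to track.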
You could use standard function memmove and then call realloc. For example
Let's assume that currently there are n pointers. Then you can write
free( *(ipp + i ) );
memmove( ipp + i, ipp + i + 1, ( n - i - 1 ) * sizeof( *ipp ) );
*( ipp + n - 1 ) = NULL; // if the call of realloc is not successful
// then the last pointer will be equal to NULL
int **tmp = realloc( ipp, ( n - 1 ) * sizeof( *tmp ) );
if ( tmp != NULL )
{
ipp = tmp;
--n;
}
else
{
// some other actions
}
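The steps above can be wrapped in a function. This is a sketch only: remove_at is a hypothetical name, and unlike the snippet it treats the element count as reduced even when the shrinking realloc fails, because the old, larger block is still valid in that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees element i, closes the gap, and shrinks the block.
   Returns the new element count. */
size_t remove_at(int ***ipp, size_t n, size_t i)
{
    assert(ipp != NULL && *ipp != NULL && i < n);
    int **a = *ipp;

    free(a[i]);
    memmove(a + i, a + i + 1, (n - i - 1) * sizeof(*a));
    a[n - 1] = NULL;            /* clear the now unused last slot */

    if (n - 1 == 0) {           /* avoid realloc(ptr, 0), whose behaviour varies */
        free(a);
        *ipp = NULL;
        return 0;
    }
    int **tmp = realloc(a, (n - 1) * sizeof(*tmp));
    if (tmp != NULL)
        *ipp = tmp;             /* on failure the old, larger block stays valid */
    return n - 1;
}
```

A call such as n = remove_at(&ipp, n, 1); removes the middle pointer without copying the remaining integers.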
For part of my C data structures assignment, I am tasked with taking an array of pointers to nodes of 2 doubly linked lists (one representing the main service queue, and the other representing a "bucket" of buzzers ready to be reused or used for the first time in the queue), doubling the size, while keeping the original contents intact. The idea is that each node has an ID associated which corresponds to the number index of the pointer array map. So for example, the pointer in index 3 will always point to the node whose ID is 3. The boolean inQ is for something unrelated to this issue.
I've written most of the code, but it seems to be functioning incorrectly (it changes all the original pointers to the last node in the list before the array resizing). So, since the starting size of the array is 10 elements, when I print out the contents after the function, it displays 9 9 9 9 9 9 9 9 9 9.
Here are the structs I'm using:
typedef struct node {
int id;
int inQ;
struct node *next;
struct node *prev;
}NODE;
typedef struct list
{
NODE *front;
NODE *back;
int size;
} LIST;
//referred to as SQ in the separate header file
struct service_queue
{
LIST *queue;
LIST *bucket;
NODE **arr;
int arrSize;
int maxID;
};
Here is the function in question:
SQ sq_double_array(SQ *q)
{
NODE **arr2 = malloc(q->arrSize * 2 * sizeof(NODE*));
int i;
//fill the first half of the new array with the node pointers of the first array
for (i = 0; i < q->arrSize; i++)
{
arr2[i] = malloc(sizeof(NODE));
if (i > 0)
{
arr2[i - 1]->next = arr2[i];
arr2[i]->prev = arr2[i - 1];
}
arr2[i]->id = q->arr[i]->id;
arr2[i]->inQ = q->arr[i]->inQ;
arr2[i]->next = q->arr[i]->next;
arr2[i]->prev = q->arr[i]->prev;
}
//fill the second half with node pointers to the new nodes and place them into the bucket
for (i = q->arrSize; i < q->arrSize * 2; i++)
{
//Point the array elements equal to empty nodes, corresponding to the inidicies
arr2[i] = malloc(sizeof(NODE));
arr2[i]->id = i;
arr2[i]->inQ = 0;
//If the bucket is empty (first pass)
if (q->bucket->front == NULL)
{
q->bucket->front = arr2[i];
arr2[i]->prev = NULL;
arr2[i]->next = NULL;
q->bucket->back = arr2[i];
}
//If the bucket has at least 1 buzzer in it
else
{
q->bucket->back = malloc(sizeof(NODE));
q->bucket->back->next = arr2[i];
q->bucket->back = arr2[i];
q->bucket->back->next = NULL;
}
}
q->arrSize *= 2;
q->arr = arr2;
return *q;
}
Keep in mind this must only be done in C, which is why I'm not using 'new'.
You could use the realloc function:
void *realloc(void *ptr, size_t size);
Quoted from the man pages:
The realloc() function changes the size of the memory block pointed to by ptr to size bytes. The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes. If the new size is larger than the old size, the added memory will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the area pointed to was moved, a free(ptr) is done.
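Applied to the question, this could look roughly like the sketch below. It assumes the NODE struct from the question; grow_node_array is a hypothetical name. Only the array of pointers is resized. The nodes themselves are not copied, so arr[i] keeps pointing at the node whose id is i.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int id;
    int inQ;
    struct node *next;
    struct node *prev;
} NODE;

/* Hypothetical helper: doubles *arr instead of copying nodes by hand.
   Existing pointers keep their values; new slots are set to NULL.
   Returns 0 on success, -1 on failure. */
int grow_node_array(NODE ***arr, int *arrSize)
{
    assert(arr != NULL && arrSize != NULL && *arrSize > 0);

    int newSize = *arrSize * 2;
    NODE **tmp = realloc(*arr, (size_t)newSize * sizeof(*tmp));
    if (tmp == NULL)
        return -1;              /* *arr is untouched and still valid */

    for (int i = *arrSize; i < newSize; i++)
        tmp[i] = NULL;          /* realloc does not initialise the new part */

    *arr = tmp;
    *arrSize = newSize;
    return 0;
}
```

The new slots can then be filled with freshly allocated bucket nodes, one malloc per node.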
struct hashLink
{
KeyType key; /*the key is what you use to look up a hashLink*/
ValueType value; /*the value stored with the hashLink, an int in our case*/
struct hashLink *next; /*notice how these are like linked list nodes*/
};
struct hashMap
{
hashLink ** table; /*array of pointers to hashLinks*/
int tableSize; /*number of buckets in the table*/
int count; /*number of hashLinks in the table*/
};
I'm trying to iterate through a hashMap with hashLinks. Is this the correct approach? The hashLinks are in an array and may have more hashLinks linked to them in a linked list. I just do not understand how to work pointers to pointers. tableSize is the amount of elements in the array. At each array position there may be more hashLinks linked to the first there.
for(int i = 0; i < ht->tableSize; i ++)
{
hashLink *current;
if (ht->table[i] != 0)
{
current = ht->table[i];
while(current->next !=0)
{
hashLink *next;
next = current->next;
free(current->key);
free(current);
current = next;
}
free(current->key);
free(current);
}
else
{
continue;
}
counter++;
}
}
Yes, this does work, but you end up with a hashtable that contains dangling pointers. Also, as Joachim noted, it works as long as you assume that the values contained in the structs are sane, i.e., tableSize contains the number of entries in table and the hashLinks have been correctly allocated.
Your iteration through the links is fine and correctly frees all the hashLinks in the table. However, consider the state of ht after the iteration. You do not change the values of ht->table[i] at all, so after you leave the loop, the pointers will still be stored in the table. If you want to reuse the table, you should set the pointers to 0 when you do not need them anymore, i.e., add ht->table[i] = 0 somewhere after current = ht->table[i];.
If this method is part of the "destructor" of the table (i.e., some method like hashmap_delete(...)), then you can simply free the hashmap after you finished your iteration, i.e., add free(ht); after the for-loop.
Simplified:
for(int i=0; i < ht->tableSize; i++)
{
hashLink *current;
while (ht->table[i] != NULL) {
current = ht->table[i];
ht->table[i] = current->next;
free(current->key);
free(current);
}
}
It can be further simplified to only one loop, but that is left as an exercise to the reader ...
Note: as a side effect, this will set all the pointers in ht->table[] to NULL; which is good, since after freeing the linked lists they have become stale anyway.
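Putting the two answers together, a destructor could look like the sketch below. The typedefs for KeyType and ValueType are assumptions (the question only shows that keys are passed to free), and hashmap_clear and hashmap_delete are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef char *KeyType;   /* assumption: keys are malloc-ed strings */
typedef int ValueType;

typedef struct hashLink {
    KeyType key;
    ValueType value;
    struct hashLink *next;
} hashLink;

typedef struct hashMap {
    hashLink **table;
    int tableSize;
    int count;
} hashMap;

/* Frees every link but keeps the bucket array, so the map can be reused.
   Returns the number of links freed. */
int hashmap_clear(hashMap *ht)
{
    int freed = 0;

    assert(ht != NULL && ht->table != NULL);
    for (int i = 0; i < ht->tableSize; i++) {
        while (ht->table[i] != NULL) {
            hashLink *current = ht->table[i];
            ht->table[i] = current->next;   /* unlink before freeing */
            free(current->key);
            free(current);
            freed++;
        }
    }
    ht->count = 0;
    return freed;
}

/* Frees the links, the bucket array, and the map itself. */
void hashmap_delete(hashMap *ht)
{
    if (ht == NULL)
        return;
    hashmap_clear(ht);
    free(ht->table);
    free(ht);
}
```

After hashmap_delete returns, the caller's pointer is stale and should not be used again.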
I am trying to insert an integer into a hash table. To do this, I'm creating an array of node*'s and I'm trying to make assignments like listarray[i]->data=5 possible. However, I'm still very confused with pointers and I'm crashing at the line with the comment '//crashes here' and I don't understand why. Was my initialization in main() invalid?
#include <stdio.h>
#include <stdlib.h>
typedef struct node
{
int data;
struct node * next;
} node;
//------------------------------------------------------------------------------
void insert (node **listarray, int size)
{
node *temp;
int value = 11; //just some random value for now, eventually will be scanned in
int index = value % size; // 11 modulo 8 yields 3
printf ("index is %d\n", index); //prints 3 fine
if (listarray[index] == NULL)
{
printf("listarray[%d] is NULL",index); //prints because of loop in main
listarray[index]->data = value; //crashes here
printf("listarray[%d] is now %d",index,listarray[index]->data); //never prints
listarray[index]->next = NULL;
}
else
{
temp->next = listarray[index];
listarray[index] = temp;
listarray[index]->data = value;
}
}//end insert()
//------------------------------------------------------------------------------
int main()
{
int size = 8,i; //set default to 8
node * head=NULL; //head of the list
node **listarray = malloc (sizeof (node*) * size); //declare an array of Node *
//do i need double pointers here?
for (i = 0; i < size; i++) //malloc each array position
{
listarray[i] = malloc (sizeof (node) * size);
listarray[i] = NULL; //satisfies the first condition in insert();
}
insert(*&listarray,size);
}
output:
index is 3
listarray[3] is NULL
(crash)
desired output:
index is 3
listarray[3] is NULL
listarray[3] is now 11
There are various issues here:
If you have a hash table of a certain size, then the hash code must map to a value between 0 and size - 1. Your default size is 8, but your hash code is x % 13, which means that your index might be out of bounds.
Your insert function should also take the item to insert as a parameter (unless that's the parameter called size, in which case it is severely misnamed).
if (listarray[index] == NULL) {
listarray[index]->data = value; //crashes here
listarray[index]->next = NULL;
}
It's no wonder that it crashes: When the node is NULL, you cannot dereference it with either * or ->. You should allocate new memory here.
And you shouldn't allocate memory here:
for (i = 0; i < size; i++) //malloc each array position
{
listarray[i] = malloc (sizeof (node) * size);
listarray[i] = NULL; //satisfies the first condition in insert();
}
Allocating memory and then resetting it to NULL is nonsense. NULL is a special value that means that no memory is at the pointed-to location. Just set all nodes to NULL, which means that the hash table starts out without any nodes. Allocate when you need a node at a certain position.
In the else clause, you write:
else
{
temp->next = listarray[index];
listarray[index] = temp;
listarray[index]->data = value;
}
but temp hasn't been allocated, yet you dereference it. That's just as bad as dereferencing NULL.
Your hash table also needs a means to handle collisions. It looks as if at every index in the hash table, there is a linked list. That's a good way to deal with it, but you haven't implemented it properly.
You seem to have trouble understanding pointers. Perhaps you should start with a simpler data structure like a linked list, just to practice? When you have gotten a firm grasp of that, you can use what you've learned to implement your hash table.
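To make the points above concrete, here is a minimal sketch of an insert that allocates a node only when one is needed and handles collisions by pushing onto the bucket's list. The extra value parameter and the int return code are assumptions, not part of the original signature.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

/* Sketch: inserts 'value' at the head of its bucket's list.
   Returns 0 on success, -1 if malloc fails. */
int insert(node **listarray, int size, int value)
{
    /* Assumes non-negative values; a negative value would give a negative index. */
    assert(listarray != NULL && size > 0 && value >= 0);

    int index = value % size;       /* always in the range 0 .. size - 1 */

    node *n = malloc(sizeof(*n));   /* allocate only when a node is needed */
    if (n == NULL)
        return -1;
    n->data = value;
    n->next = listarray[index];     /* NULL for an empty bucket, old head otherwise */
    listarray[index] = n;
    return 0;
}
```

In main, the table then only needs every entry set to NULL before the first call. The empty and non-empty cases need no separate branches, because the old head (possibly NULL) simply becomes the new node's next.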
I wrote a hashtable and it basically consists of these two structures:
typedef struct dictEntry {
void *key;
void *value;
struct dictEntry *next;
} dictEntry;
typedef struct dict {
dictEntry **table;
unsigned long size;
unsigned long items;
} dict;
dict.table is a multidimensional array, which contains all the stored key/value pair, which again are a linked list.
If half of the hashtable is full, I expand it by doubling the size and rehashing it:
dict *_dictRehash(dict *d) {
int i;
dict *_d;
dictEntry *dit;
_d = dictCreate(d->size * 2);
for (i = 0; i < d->size; i++) {
for (dit = d->table[i]; dit != NULL; dit = dit->next) {
_dictAddRaw(_d, dit);
}
}
/* FIXME memory leak because the old dict can never be freed */
free(d); // seg fault
return _d;
}
The function above uses the pointers from the old hash table and stores it in the newly created one. When freeing the old dict d a Segmentation Fault occurs.
How am I able to free the old hashtable struct without having to allocate the memory for the key/value pairs again?
Edit, for completeness:
dict *dictCreate(unsigned long size) {
dict *d;
d = malloc(sizeof(dict));
d->size = size;
d->items = 0;
d->table = calloc(size, sizeof(dictEntry*));
return d;
}
void dictAdd(dict *d, void *key, void *value) {
dictEntry *entry;
entry = malloc(sizeof *entry);
entry->key = key;
entry->value = value;
entry->next = '\0';
if ((((float)d->items) / d->size) > 0.5) d = _dictRehash(d);
_dictAddRaw(d, entry);
}
void _dictAddRaw(dict *d, dictEntry *entry) {
int index = (hash(entry->key) & (d->size - 1));
if (d->table[index]) {
dictEntry *next, *prev;
for (next = d->table[index]; next != NULL; next = next->next) {
prev = next;
}
prev->next = entry;
} else {
d->table[index] = entry;
}
d->items++;
}
The best way to debug this is to run your code under valgrind.
But to give you some perspective:
When you free(d), you are expecting something like a destructor call on your struct dict, which would internally free the memory allocated for the table of pointers to dictEntry. In C, free(d) releases only the struct itself.
Why do you have to delete the entire hash table to expand it? You have a next pointer anyway, so why not just append new hash entries to it?
The solution is not to free d, but rather to expand d by allocating more struct dictEntry objects and assigning them to the appropriate next pointers.
When contracting d, you will have to iterate over next to reach the end and then start freeing the memory for the struct dictEntry objects inside d.
To clarify Graham's point, you need to pay attention to how memory is being accessed in this library. The user has one pointer to their dictionary. When you rehash, you free the memory referenced by that pointer. Although you allocated a new dictionary for them, the new pointer is never returned to them, so they don't know not to use the old one. When they try to access their dictionary again, it's pointing to freed memory.
One possibility is not to throw away the old dictionary entirely, but only the dictEntry table you allocated within the dictionary. That way your users will never have to update their pointer, but you can rescale the table to accommodate more efficient access. Try something like this:
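void _dictRehash(dict *d) {
printf("rehashing!\n");
unsigned long i;
dictEntry *dit, *next;
unsigned long old_size = d->size;
dictEntry **old_table = d->table;
unsigned long size = old_size * 2;
dictEntry **new_table = calloc(size, sizeof(dictEntry*));
if (new_table == NULL) return; /* keep the old table if allocation fails */
d->table = new_table;
d->size = size;
d->items = 0;
for (i = 0; i < old_size; i++) {
for (dit = old_table[i]; dit != NULL; dit = next) {
next = dit->next; /* save the successor before relinking */
dit->next = NULL; /* detach the entry from its old chain */
_dictAddRaw(d, dit);
}
}
free(old_table);
}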
As a side note, I'm not sure what your hash function does, but it seems to me that the line
int index = (hash(entry->key) & (d->size - 1));
is a little unorthodox. You take the hash value and do a bitwise AND with the table size minus one. That keeps the index within [0, size), but it gives the same result as the modulus only when size is a power of two. If the size can be anything else, I think you mean % for modulus.
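The difference between masking and the modulus can be checked with a small sketch. Both function names are hypothetical and only serve to illustrate the point.

```c
#include <assert.h>

/* Returns 1 if n is a power of two, 0 otherwise (0 is not a power of two). */
int is_power_of_two(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: maps a hash value to a bucket index in [0, size). */
unsigned long bucket_index(unsigned long h, unsigned long size)
{
    assert(size > 0);
    if (is_power_of_two(size))
        return h & (size - 1);   /* same result as h % size, for powers of two only */
    return h % size;
}
```

For example, with h = 22 and size = 12 the mask gives 22 & 11 = 2 while 22 % 12 = 10, so with a size that is not a power of two the mask still yields a valid index but not the modulus, and some buckets can never be selected. A table that starts at a power of two and always doubles keeps the mask valid.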
You are freeing a pointer which is passed in to your function. This is only safe if you know that whoever's calling your function isn't still trying to use the old value of d. Check all the code which calls _dictRehash() and make sure nothing's hanging on to the old pointer.
What does dictCreate actually do?
I think you're getting confused between the (fixed size) dict object, and the (presumably variable sized) array of pointers to dictEntries in dict.table.
Maybe you could just realloc() the memory pointed to by dict.table, rather than creating a new 'dict' object and freeing the old one (which incidentally, isn't freeing the table of dictentries anyway!)