Suggestion for an appropriate hash function - c

I have two .txt files with phone numbers (one number per line). The files are pretty huge (282 MB), and I am writing a program that checks these two files (the raw data and the DO NOT CALL list) and keeps only those numbers that do not appear in the DO NOT CALL list. Something similar to grep -v -x -F -f donotcall.txt raw.txt > filtered.txt
I have implemented a very simple form of hash table (separate chaining, using linked lists). My code currently reads the phone numbers from DoNotCall.txt and stores them in the hash table. This is the function that I use to generate the hash. The table size is 100.
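int hashgen(char s[])
{
    int hash;
    /* Note: for four digit characters ('0' = 48 ... '9' = 57) this yields
       1476..1753, far outside 0..TABLESIZE-1. It also looks only at the
       first four characters, so all numbers sharing them collide. */
    hash = (s[0] + s[1] + s[2] + s[3]) * 100 / 13;
    return hash;
}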
Here is how I implemented the hash table:
#include <stdlib.h>
#include <string.h>
#define TABLESIZE 100
struct node {
    char str[30];
    struct node *next;
};
struct node *ht[TABLESIZE];
struct node *hashtable_alloc(void) // allocates space for a node in memory
{
    struct node *tmp = calloc(1, sizeof(struct node));
    strcpy(tmp->str, "~"); // just a string to mark the head of the linked list
    tmp->next = NULL;
    return tmp;
}
void hashinit(void)
{
    int i;
    for (i = 0; i < TABLESIZE; i++)
        ht[i] = hashtable_alloc();
}
void hashtable_add(char s[])
{
    struct node *t = NULL;
    int arrnum = hashgen(s);
    t = calloc(1, sizeof(struct node));
    strcpy(t->str, s);
    t->next = ht[arrnum];
    ht[arrnum] = t;
}
Admittedly, I am a naive programmer when it comes to hash tables. Please suggest a better hash function. Although I have read articles on hash tables, it would be great if anyone could tell me about a better approach: something better than a hash table, or a better way to implement the hash table method. Thanks in advance.

Related

C: From char array to linked list

I'm still learning how to program in C and I've stumbled across a problem.
Using a char array, I need to create a linked list, but I don't know how to do it. I've searched online, but the explanations seem very confusing. The char array is something like this: char arr[3][2]={"1A","2B","3C"};
Have a look at the code below. It uses a Node struct, and you can see how we iterate through the array, creating nodes, allocating memory, and adding them to the linked list. It is based on this GeeksForGeeks article, with a few modifications. I recommend you compare the two to help understand what is going on.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct Node {
    char value[3]; // 2 characters plus the terminating '\0' that printf("%s") needs
    struct Node *next;
};
int main() {
    // Each string needs 3 bytes: with [2] there is no room for the '\0'
    char arr[3][3] = {"1A", "2B", "3C"};
    struct Node *linked_list = NULL;
    // Iterate over array
    // We calculate the size of the array by using sizeof the whole array and dividing it by the sizeof the first element of the array
    for (size_t i = 0; i < sizeof(arr) / sizeof(arr[0]); i++) {
        // We create a new node
        struct Node *new_node = malloc(sizeof(struct Node));
        // Assign the value: you can't assign arrays, so copy the characters with strcpy
        strcpy(new_node->value, arr[i]);
        // Set next node to NULL
        new_node->next = NULL;
        if (linked_list == NULL) {
            // If the linked_list is empty, this is the first node, add it to the front
            linked_list = new_node;
            continue;
        }
        // Find the last node (where next is NULL) and set the next value to the newly created node
        struct Node *last = linked_list;
        while (last->next != NULL) {
            last = last->next;
        }
        last->next = new_node;
    }
    // Iterate through our linked list printing each value
    struct Node *pointer = linked_list;
    while (pointer != NULL) {
        printf("%s\n", pointer->value);
        pointer = pointer->next;
    }
    return 0;
}
There are a few things the above code is missing, like checking whether each malloc succeeded and freeing the allocated memory afterwards. This is only meant to give you something to build on!

Data Loss when trying to copy char* in C

I have been working on a project in C, and I am having trouble copying a char * with strcpy/memcpy/strncpy; none of them seem to work. The problem is that words of around 8 or more characters are not copied completely.
typedef struct wordFrequency {
char * word;
int frequency;
struct wordFrequency *left, *right;
} *node;
node setnode(char * word) {
node newNode = (node)malloc(sizeof(node));
newNode->word = (char*)malloc(sizeof(word));
strcpy(newNode->word, word); //This is where I'm having trouble
newNode->frequency = 1;
newNode->right = NULL;
return newNode;
}
I believe the code above is the main cause of the error, but I don't know how to fix it. I have tried changing the sizes, but that didn't work.
If possible, can someone explain how to copy all the characters, or whether I did not allocate enough space?
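You did not allocate enough space. In your code, node is a pointer type, so malloc(sizeof(node)) allocates only enough for a pointer, not for a struct wordFrequency. Likewise, word is a char *, so sizeof(word) is the size of a pointer (8 bytes on a typical 64-bit system) no matter how long the string is, which is why words of about 8 or more characters get corrupted: strcpy writes past the end of the buffer. A string needs strlen(word) + 1 bytes (the + 1 is for the terminating '\0').
This program is an MCVE that shows how to properly allocate and initialize each node in your linked list: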
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRAY_SIZE(array) \
    (sizeof(array) / sizeof(array[0]))
typedef struct wordFrequency {
    char *word;
    int frequency;
    struct wordFrequency *left, *right;
} node;
node *setnode(char *word) {
    node *newNode = malloc(sizeof(node));       // size of the struct, not of a pointer
    newNode->word = malloc(strlen(word) + 1);   // + 1 for the terminating '\0'
    strcpy(newNode->word, word);
    newNode->frequency = 1;
    newNode->left = NULL;
    newNode->right = NULL;
    return newNode;
}
int main() {
    char *wordList[] = {"one", "two", "three"};
    node nodeHead = {0};    // dummy head; only its right member is used
    node *nodePrev = &nodeHead;
    node *nodeNext;
    for (size_t index = 0; index < ARRAY_SIZE(wordList); index++) {
        nodeNext = setnode(wordList[index]);
        nodePrev->right = nodeNext;
        nodeNext->left = nodePrev;
        nodePrev = nodeNext;
    }
    for (node *nodePtr = nodeHead.right; nodePtr != NULL; nodePtr = nodePtr->right) {
        printf("word = %s, frequency = %d\n", nodePtr->word, nodePtr->frequency);
    }
    return 0;
}
Output
word = one, frequency = 1
word = two, frequency = 1
word = three, frequency = 1
Note
This program has no error checking and does not free the allocated memory. This code should not be used in a production environment.
Replies to Questions in Comments
I replaced *node with node in the typedef because that allows me to declare instances of node. With *node, the name node is a pointer type, so it can only declare pointers.
I use an instance of node instead of node * for nodeHead because any attempt to change its address will be an error.
I use nodePrev to traverse the list and also to provide a target for left in the returned nodes. I initialize nodePrev to &nodeHead because it is the start of the list. I set nodePrev to nodeNext because that's how I chose to traverse the list during initialization. I could have used
nodePrev = nodePrev->right;
and achieved the same effect.
I only implemented list handling so that I could create a self-contained example that would run without changes. You can safely ignore it.
If you want to see good linked list code, I recommend the Linux kernel implementation.

Pointer seg faulting although I malloc-ed right

I don't understand why my program segfaults at this line: if ((**table->table).link == NULL){ I believe I have malloc-ed memory for it, and I tried looking at it with gdb: *table->table was accessible and not NULL, but **table->table was not accessible.
Definition of hash_t:
struct table_s {
struct node_s **table;
size_t bins;
size_t size;
};
typedef struct table_s *hash_t;
void set(hash_t table, char *key, int value){
unsigned int hashnum = hash(key)%table->bins;
printf("%d \n", hashnum);
unsigned int i;
for (i = 0; i<hashnum; i++){
(table->table)++;
}
if (*(table->table) == NULL){
struct node_s n = {key, value, NULL};
struct node_s *np = &n;
*(table->table) = malloc(sizeof(struct node_s));
*(table->table) = np;
}else{
while ( *(table->table) != NULL){
if ((**table->table).link == NULL){
struct node_s n = {key, value, NULL};
struct node_s *np = &n;
(**table->table).link = malloc(sizeof(struct node_s));
(**table->table).link = np;
break;
}else if (strcmp((**table->table).key, key) == 0){
break;
}
*table->table = (**(table->table)).link;
}
if (table->size/table->bins > 1){
rehash(table);
}
}
}
I'm calling set from here:
for (int i = 0; i < trials; i++) {
int sample = rand() % max_num;
sprintf(key, "%d", sample);
set(table, key, sample);
}
Your hash table works like this: you have table->bins bins, and each bin is a linked list of key/value pairs. All items in a bin share the same hash code modulo the number of bins.
You have probably created the table of bins when you created or initialised the hash table, something like this:
table->table = malloc(table->bins * sizeof(*table->table));
for (size_t i = 0; i < table->bins; i++) table->table[i] = NULL;
Now why does the member table have two stars?
The "inner" star means that the table stores pointers to nodes, not the nodes themselves.
The "outer" star is a handle to allocated memory. If your hash table were of a fixed size, for example always with 256 bins, you could define it as:
struct node_s *table[256];
If you passed this array around, it would become (or "decay into") a pointer to its first element, a struct node_s **, just as the array you got from malloc.
You access the contents of the bins via the linked lists, and the head of linked list i is table->table[i].
Your code has other problems:
What did you want to achieve with (table->table)++? This makes the handle to the allocated memory point not to the first element but to the next one. After doing that hashnum times, *table->table will be at the right bin, but you will have lost the original handle, which you must retain because you must pass it to free later when you clean up your hash table. Don't lose the handle to allocated memory! Use another local pointer instead. The same goes for *table->table = (**(table->table)).link; inside your loop, which overwrites the head of the bin while you walk it.
You create a local node n and then make a link in your linked list with a pointer to that node. But the node n will be gone after you leave the function, and the link will be "stale": it will point to invalid memory. You must allocate the node with malloc instead. (Your code does call malloc, but then immediately overwrites the returned pointer with np, so that memory is leaked.)
A simple implementation of your hash table's set function might be:
void set(hash_t table, char *key, int value)
{
    unsigned int hashnum = hash(key) % table->bins;
    // create (uninitialised) new node
    struct node_s *nnew = malloc(sizeof(*nnew));
    // initialise new node, point it to old head
    nnew->key = strdup(key);
    nnew->value = value;
    nnew->link = table->table[hashnum];
    // make the new node the new head
    table->table[hashnum] = nnew;
}
This makes the new node the head of the linked list. This is not ideal, because if you overwrite items, the new ones will be found (which is good), but the old ones will still be in the table (which isn't good). But that, as they say, is left as an exercise to the reader.
(The strdup function was not part of standard C before C23, but it is POSIX and widely available. It allocates new memory, which you must free later, but it ensures that the key string stays valid after the caller's buffer changes; your calling loop overwrites key with sprintf on every iteration.)
Please note how few stars there are in the code. If there is one star too few, it is in hash_t, where you have hidden the pointer nature behind a typedef.

Storing elements of a string on a Linked List [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
I'm having a small issue with my linked list.
I built a linked list of strings, and it worked perfectly.
Now that I'm using strtok() to split the string, I need help storing the parts in separate struct fields while keeping the nodes connected.
I hope I explained it well.
For now, here's what I've got:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct dict_word *word;
typedef struct node *Node;
typedef struct double_linked_list *DLL;
struct dict_word
{
char words[100];
int year[10];
char eng_synonyms[100];
char heb_synonyms[100];
};
struct node
{
word data;
Node *next;
Node *previous;
};
struct double_linked_list
{
Node *head;
Node *last;
};
char *split(char words[100])
{
int i;
char *word=strtok(words, "_#_");
char *year=strtok(NULL, "_#_");; // assigning NULL for previousely where it left off
char *definition=strtok(NULL,"_#_");
char *synonyms=strtok(NULL,"_#_");
i=atoi(year);
printf("%s\n", word);
printf("%i\n",i);
printf("%s\n", definition);
printf("%s\n", synonyms);
return 0;
}
And this is my function to insert a node, from when each node held only one string:
void insert_beginning(char words[99])
{
struct node *var, *temp;
var=(struct node *)malloc(sizeof(struct node)); //explination about the (node *)
strncpy(var->data, words,99);
if (head==NULL)
{
head=var;
head->previous=NULL;
head->next=NULL;
last=head;
}
else
{
temp=var;
temp->previous=NULL;
temp->next=head;
head->previous=temp;
head=temp;
}
}
I am a bit surprised to see plain C code used to handle such abstract data in 2014.
Nevertheless, I think you should separate the actual book data from the list.
strtok modifies your initial string (it writes a '\0' at the end of each token). If you want to access the various pieces strtok has split the string into, you must keep all the pointers to the tokens (word, definition, etc.).
So you should create a structure to hold all this together:
typedef struct {
    char * word;
    int year;
    char * definition;
    char * synonyms;
} dict_word;
Now to create a new record, you will have to make a copy of the various tokens, just like you did previously in your linked list insertion.
But this time the copy will occur sooner, using the strdup function.
#include <assert.h> // for assert
dict_word * create_record (char * raw) // raw record string
{
    // allocate a new object (sizeof *record is the size of the struct, not of a pointer)
    dict_word * record = malloc (sizeof *record);
    assert (record != NULL);
    /*
     * sanity checks left out for concision,
     * but you should make sure your input is properly formatted
     * (strtok returns NULL when a field is missing)
     */
    // populate the fields
    record->word       = strdup (strtok(raw , "_#_"));
    record->year       = atoi (strtok(NULL, "_#_"));
    record->definition = strdup (strtok(NULL, "_#_"));
    record->synonyms   = strdup (strtok(NULL, "_#_"));
    // done
    return record;
}
You will need a cleanup function to free all the memory allocated during record creation:
void delete_record (dict_word * r)
{
    // first free all strings
    free (r->word);
    free (r->definition);
    free (r->synonyms);
    // then free the object
    free (r);
}
Now for the list.
Instead of mixing up the code that handles the list with the one that cares about books, you can define the list as a more independent object:
typedef struct sNode {
struct sNode * next;
struct sNode * prev;
void * data; // this will point to the linked objects
} listNode;
typedef struct
{
listNode *head;
listNode *tail; // either first/last or head/tail, but keep it consistent :)
} List;
First you will need to initialize the list:
void List_init (List * l)
{
l->head = l->tail = NULL;
}
Then you will want to add elements to it:
void List_put (List * list, void * data)
{
    // allocate a node (sizeof *node is the size of a listNode, not of a pointer)
    listNode * node = malloc (sizeof *node);
    assert (node != NULL);
    // store data reference
    node->data = data;
    // insert the node at the end of list
    node->prev = list->tail;
    node->next = NULL;
    if (list->tail != NULL) list->tail->next = node; // link the old tail to the new node
    else list->head = node;                          // the list was empty
    list->tail = node;
}
Finally, to use all this:
// create the list
List book_list;
List_init (&book_list);
/* ... */
// create the records
char * raw_record;
while ((raw_record = read_from_database ()) != DONE_READING)
{
    List_put (&book_list, create_record (raw_record));
}
/* ... */
// browse the records
listNode * node;
for (node = book_list.head; node != NULL; node = node->next)
{
    dict_word * record = (dict_word *) node->data;
    // do whatever you want with your record
}
All this being said and done, C is inadequate at best to handle this kind of high-level data.
You could write a very much more compact, reliable and efficient equivalent in a variety of more modern languages, starting with C++.
Now if you're just a student asked by an old geezer of a professor to do some dusty C homework and hoping to get it done for you by an old geezer of a StackOverflow contributor, well... it's your lucky day.

How do I sort a linked list of structures by one of the fields?

Wow, now I know I don't. Lol.
I've got my structure like this:
struct Medico{
int Id_Doctor;
int Estado;
char Nombre[60]; ////focus on this part of the structure, this is name.
char Clave_Acceso[20];
char Especialidad[40];
struct Medico *next;
};
I want to sort the list by name (alphabetical order). Any ideas on how to tackle this problem?
For example:
Albert Haynesworth
Bob Marley
Carl Johnson
Thank you very much in advance. :) (C, Unix)
Implementing a mergesort over a linked list in C is quite easy:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct node {
    struct node *next;
    char *data;
};
/* cut the list after its middle node and return the second half */
struct node *
divlist (struct node *n) {
    int i = 0;
    if (n) {
        struct node *tail, *n2 = n;
        while (1) {
            n2 = n2->next;
            if (!n2) break;
            if (i++ & 1) n = n->next;
        }
        tail = n->next;
        n->next = NULL;
        return tail;
    }
    return NULL;
}
struct node *
mergelists(struct node *a, struct node *b) {
    struct node *n;
    struct node **last = &n;
    if (!a) return b;
    if (!b) return a;
    while (1) {
        /* strcmp only guarantees a positive value, not one greater than 1 */
        if (strcmp(a->data, b->data) > 0) {
            *last = b;
            last = &b->next;
            b = b->next;
            if (!b) {
                *last = a;
                break;
            }
        }
        else {
            *last = a;
            last = &a->next;
            a = a->next;
            if (!a) {
                *last = b;
                break;
            }
        }
    }
    return n;
}
struct node *
sortlist (struct node *n) {
    struct node *tail = divlist(n);
    if (!tail) return n;
    return mergelists(sortlist(n), sortlist(tail));
}
int main(int argc, char *argv[]) {
    int i;
    struct node *n1, *n = NULL;
    for (i = argc; --i >= 1;) {
        n1 = malloc(sizeof(*n1));
        n1->data = argv[i];
        n1->next = n;
        n = n1;
    }
    n1 = n = sortlist(n);
    while (n1) {
        printf("%s\n", n1->data);
        n1 = n1->next;
    }
    return 0;
}
Note that you will have to modify this code to use your data structure and the right comparison!
C has no built-in way to sort a linked list or to keep one sorted. As others have suggested, you need to do it yourself. I would do this when you create a new Medico, since inserting into a linked list is easy, and you can just find where it belongs as you iterate.
If the list needs to be kept in some other order, then you will need to sort it whenever you display it. You'll probably want to iterate to pull out every name, then sort the resulting array using any of a number of techniques (depending on the size).
Assuming the list order is otherwise of no concern, keep it in order.
Sounds like you want to look at implementations of either quicksort or mergesort. The C standard library's qsort works on an array, not a linked list, so you may need to implement your own (although I'm pretty sure you could find a readily available implementation on the interwebz with a quick search).
If you want to sort an array of structures, you can use the qsort function; see man qsort. It takes the base address of the array, the number of elements, the element size and a comparison function:
int compare(const void *a, const void *b) {
Medico *medA = (Medico*) a;
Medico *medB = (Medico*) b;
return /* compare medA and medB */;
}
Medico *medicos = /* initialize */;
qsort(medicos, numberOfMedicos, sizeof(Medico), compare);
D’oh, just now I noticed the next-record pointer that probably makes this answer useless. (I’ve changed the question title to make the linked list apparent.) To make at least something from this answer, you can always copy the list into an array:
Medico *medicos = calloc(numberOfMedicos, sizeof(Medico));
Medico *current = /* first record in your linked list */;
int i = 0;
assert(current);
do {
    medicos[i++] = *current;
    current = current->next;
} while (current);
// Here you can sort the array.
free(medicos);
Of course, it depends on the number of records and other variables.
(My C is a bit rusty, feel free to fix.)
