I read the article "Quick Way to Implement Dictionary in C" and implemented it. I changed install and lookup to take hashtab as an argument instead of using a global variable, and it worked like a charm:
struct nlist *install(struct nlist **hashtab,char *name, symbol *dfn);
struct nlist *lookup(struct nlist **hashtab, char *s);
The problem is that I need to copy the hash table into a new one. I first tried:
struct nlist **copy_context(struct nlist **hashtab){
nlist **copy = (nlist**)malloc(sizeof(nlist*)*HASHSIZE);
struct nlist *np;
int i = 0;
for (np = hashtab[i]; i<HASHSIZE ;i++ ){
debug_log("i vale %i",i);
if( np != NULL ){
install(copy,np->name,np->dfn);
}
}
return copy;
}
but it is not clear to me where the values are stored, or where to start the copy. I was thinking of something like:
struct nlist **copy_context(struct nlist **hashtab){
nlist **copy = (nlist**)malloc(sizeof(nlist*)*HASHSIZE);
i = 0;//this is not correct
for (np = hashtab[i]; np != NULL; np = np->next){
install(copy,np->name,np->dfn);
}
return copy;
}
but nothing is working.
As "Some Programmer dude" said, I need two loops: one over the table itself and one over the list at each table index.
struct nlist **copy_context(struct nlist **hashtab){
nlist **copy = (nlist**)calloc(HASHSIZE, sizeof(nlist*)); /* zeroed: install() calls lookup(), which reads every bucket it touches */
struct nlist *np;
for (int i = 0; i<HASHSIZE; i++ ){
np = hashtab[i];
while( np )
{
install(copy,np->name,np->dfn);
np = np->next;
}
}
return copy;
}
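For reference, here is a self-contained sketch of the two-loop copy. It rests on a few assumptions that are not in the question: values are plain strings (char *defn) instead of the symbol * type, dupstr is a hypothetical stand-in for strdup, and new_table is a hypothetical helper that returns a zeroed table. The zeroing matters because install() calls lookup(), which walks whatever pointer it finds in the bucket.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct nlist {
    struct nlist *next;   /* next entry in chain */
    char *name;           /* key */
    char *defn;           /* value (a string here; the question stores a symbol *) */
};

/* dupstr: hypothetical helper, a portable stand-in for strdup */
static char *dupstr(const char *s)
{
    char *p = malloc(strlen(s) + 1);   /* +1 for '\0' */
    if (p != NULL)
        strcpy(p, s);
    return p;
}

static unsigned hash(const char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

struct nlist *lookup(struct nlist **hashtab, const char *s)
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return np;      /* found */
    return NULL;            /* not found */
}

struct nlist *install(struct nlist **hashtab, const char *name, const char *defn)
{
    struct nlist *np = lookup(hashtab, name);
    if (np == NULL) {                       /* not found: prepend a new node */
        unsigned hashval = hash(name);
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        if ((np->name = dupstr(name)) == NULL) {
            free(np);
            return NULL;
        }
        np->next = hashtab[hashval];
        hashtab[hashval] = np;
    } else {                                /* already there */
        free(np->defn);
    }
    if ((np->defn = dupstr(defn)) == NULL)
        return NULL;
    return np;
}

/* new_table: calloc zero-fills, so every bucket starts as an empty chain
   (all-bits-zero is a null pointer on mainstream platforms) */
struct nlist **new_table(void)
{
    return calloc(HASHSIZE, sizeof(struct nlist *));
}

/* copy_context: outer loop over buckets, inner loop over each chain */
struct nlist **copy_context(struct nlist **hashtab)
{
    struct nlist **copy = new_table();
    if (copy == NULL)
        return NULL;
    for (int i = 0; i < HASHSIZE; i++)
        for (struct nlist *np = hashtab[i]; np != NULL; np = np->next)
            if (install(copy, np->name, np->defn) == NULL)
                return NULL;   /* out of memory; a fuller version would free the partial copy */
    return copy;
}
```

Because install() duplicates both strings, the copy shares no memory with the original table, so either one can be modified or freed independently.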
Related
Trying to understand the code for an implementation of hash search as discussed in K&R C programming book (page 143-145).
Consider a #define statement
#define STATE 1
The aim is that we should store the name and its replacement text in a table. We create an array hashtab having pointers to linked lists. Each pointer refers to a linked list which has a name and its replacement text in each of its nodes (also with a link node, of course). If there are no names with a certain hash value, the array element at that index is NULL. For a linked list pointed to by a pointer from hashtab array, all nodes have names with common hash value.
Following are the function and struct definitions.
Here is the struct nlist. It is used to create a node which would record the name and replacement text. The node will be added in front of the already existing linked list for that hash value.
struct nlist { /* table entry: */
struct nlist *next; /* next entry in chain */
char *name; /* defined name */
char *defn; /* replacement text */
};
Here is the code for the lookup() function. It searches for the string in the table and returns a pointer to the place where it was found, or NULL if the string is not present.
/* lookup: look for s in hashtab */
struct nlist *lookup(char *s)
{
struct nlist *np;
for (np = hashtab[hash(s)]; np != NULL; np = np->next)
if (strcmp(s, np->name) == 0)
return np; /* found */
return NULL; /* not found */
}
The strdup() function which makes a duplicate of a string src. Error handling, for example when malloc() returns NULL, is left to the caller.
char *strdup(const char *src) {
char *p = (char *) malloc(strlen (src) + 1); // Space for length plus '\0'
if (p != NULL) strcpy(p, src);
return p;
}
My questions mainly concern the following install() function which records a name and replacement text in the front of the already existing linked list for that hash value.
/* install: put (name, defn) in hashtab */
struct nlist *install(char *name, char *defn)
{
struct nlist *np;
unsigned hashval;
if ((np = lookup(name)) == NULL) { /* not found */
np = (struct nlist *) malloc(sizeof(*np));
if (np == NULL || (np->name = strdup(name)) == NULL)
return NULL;
hashval = hash(name);
np->next = hashtab[hashval];
hashtab[hashval] = np;
} else /* already there */
free((void *) np->defn); /*free previous defn */
if ((np->defn = strdup(defn)) == NULL)
return NULL;
return np;
}
I have not written the hash() function here.
What is the significance of the return value of install() to main()?
One more related question:
The last if condition may assign NULL to np->defn (and also returns it). How will this assignment of NULL be useful to the user of install() since the first item in the list of names (having such hash values) would contain NULL in the defn? Are we allowing a name to have NULL as its definition and ignoring the reason why NULL was assigned to its defn field?
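One way to read the return value is that NULL tells the caller that memory ran out, so the result should be checked before the entry is used. The sketch below is a variation and not K&R's code: install_safe and dupstr are hypothetical names. It copies the new definition before freeing the old one, so a failed allocation leaves an existing entry unchanged instead of leaving it with a NULL defn.

```c
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct nlist {
    struct nlist *next;   /* next entry in chain */
    char *name;           /* defined name */
    char *defn;           /* replacement text */
};

static struct nlist *hashtab[HASHSIZE];   /* static storage: all buckets start NULL */

/* dupstr: hypothetical helper, a portable stand-in for strdup */
static char *dupstr(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

static unsigned hash(const char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

struct nlist *lookup(const char *s)
{
    struct nlist *np;
    for (np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->name) == 0)
            return np;
    return NULL;
}

/* install_safe: like install(), but never leaves an entry with defn == NULL */
struct nlist *install_safe(const char *name, const char *defn)
{
    struct nlist *np;
    unsigned hashval;
    char *newdefn = dupstr(defn);        /* copy the new text first */

    if (newdefn == NULL)
        return NULL;                     /* table untouched */
    if ((np = lookup(name)) != NULL) {   /* already there: swap in the new text */
        free(np->defn);
        np->defn = newdefn;
        return np;
    }
    np = malloc(sizeof *np);
    if (np == NULL || (np->name = dupstr(name)) == NULL) {
        free(np);                        /* free(NULL) is a no-op */
        free(newdefn);
        return NULL;
    }
    np->defn = newdefn;
    hashval = hash(name);
    np->next = hashtab[hashval];         /* prepend to the chain */
    hashtab[hashval] = np;
    return np;
}
```

A caller would then write something like if (install_safe("STATE", "1") == NULL) followed by its own out-of-memory handling, and can rely on every entry found by lookup() having a non-NULL defn.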
While searching for a dictionary example in C, I came across an example here on Stack Overflow that references K&R's The C Programming Language. Section 6.6 of that book covers table lookup and illustrates it with a hash table.
In the example, the hash table is an array of 101 pointers to nlist nodes (the self-referential struct in the snippet below).
My question is: why is a self-referential struct used for a lookup table? Lookup tables work as key-value pairs, so we shouldn't need to hold a next node.
struct nlist {
/* table entry: */
struct nlist *next; /* next entry in chain */
char *name; /* defined name */
char *defn; /* replacement text */
};
The second part of my question concerns the loop in the lookup(char *s) function. I think the loop runs only once, so the np = np->next expression may be irrelevant, unless there is something I have missed.
struct nlist *lookup(char *s)
{
struct nlist *np;
for (np = hashtab[hash(s)]; np != NULL; np = np->next)
if (strcmp(s, np->name) == 0)
return np; /* found */
return NULL; /* not found */
}
The last part of my question is about the assignment np->next = hashtab[hashval]; in the function install(char *name, char *defn) below. It seems to assign the current node to itself as its next node.
struct nlist *install(char *name, char *defn)
{
struct nlist *np;
unsigned hashval;
if ((np = lookup(name)) == NULL) { /* not found */
np = (struct nlist *) malloc(sizeof(*np));
if (np == NULL || (np->name = strdup(name)) == NULL)
return NULL;
hashval = hash(name);
np->next = hashtab[hashval];
hashtab[hashval] = np;
} else /* already there */
free((void *) np->defn); /*free previous defn */
if ((np->defn = strdup(defn)) == NULL)
return NULL;
return np;
}
Thanks in advance.
The table is not indexed with keys directly, but with hashes of keys. Different keys may have the same hash. This is called a hash collision.
This implementation stores all values that correspond to keys with the same hash as a linked list, thus a self referential structure. The table stores linked lists of key-value pairs. All keys in the same list have the same hash. (This is not the only method of handling collisions).
In case of a collision, the loop does run more than once. If you don't see this loop executing more than once, keep adding entries until you hit a collision.
No, it does not assign a node to itself. It inserts a newly allocated node at the head of the linked list. The former head of the list becomes the second node in the list.
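Rather than adding entries until a collision happens, one can pick two names that land in the same bucket. Assuming ASCII and the K&R hash with a table size of 101, "A" hashes to 65 and "aX" hashes to (97 * 31 + 88) % 101 = 3095 % 101 = 65. The sketch below keeps only the head-insertion lines from install(); prepend and chain_length are hypothetical helpers, and the node stores just the name pointer to keep the example short.

```c
#include <stdlib.h>

#define HASHSIZE 101

struct nlist {
    struct nlist *next;   /* next entry in chain */
    const char *name;     /* the sketch stores the pointer; install() copies the string */
};

static struct nlist *hashtab[HASHSIZE];

unsigned hash(const char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* prepend: the head insertion from install(), without the defn handling */
struct nlist *prepend(const char *name)
{
    unsigned hashval = hash(name);
    struct nlist *np = malloc(sizeof *np);
    if (np == NULL)
        return NULL;
    np->name = name;
    np->next = hashtab[hashval];   /* old head (or NULL) becomes the second node */
    hashtab[hashval] = np;         /* new node becomes the head */
    return np;
}

/* chain_length: number of nodes in one bucket */
int chain_length(unsigned bucket)
{
    int n = 0;
    for (struct nlist *np = hashtab[bucket]; np != NULL; np = np->next)
        n++;
    return n;
}
```

After prepend("A") followed by prepend("aX"), bucket 65 holds the chain aX -> A -> NULL, so a lookup of "A" has to step through np = np->next once before it finds a match.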
I'm trying to write a table-lookup package that would work for a symbol table management.
The names and replacement texts are held in a struct nlist type, and I have an array of pointers to the name and replacement text.
#define HASHSIZE 101
struct nlist { // table entry
struct nlist *next; // next entry in chain
char *name; // defined name
char *defn; // replacement text
};
static struct nlist *hashtab[HASHSIZE] = { NULL }; // pointer table
The lookup routine searches for s in the table.
// hash: form hash value for string s
unsigned hash(char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % HASHSIZE;
}
struct nlist *lookup(char *s)
{
struct nlist *np;
for (np = hashtab[hash(s)]; np != NULL; np = np->next)
if (strcmp(s, np->name) == 0)
return np; // found
return NULL; // not found
}
The install routine uses lookup to determine whether the name being installed is already present; if so, the new definition will supersede the old. Otherwise, a new entry is created.
// install: put (name, defn) in hashtab
struct nlist *install(char *name, char *defn)
{
struct nlist *np;
unsigned hashval;
if ((np = lookup(name)) == NULL) { // not found
np = (struct nlist *) malloc(sizeof(*np));
if (np == NULL || (np->name = strdup(name)) == NULL)
return NULL;
hashval = hash(name);
np->next = hashtab[hashval];
hashtab[hashval] = np;
} else // already there
free(np->defn); // free previous defn
if ((np->defn = strdup(defn)) == NULL)
return NULL;
return np;
}
In my main routine, "Access violation reading location" exception is thrown on the call of printf on pointer p. What is the cause of this error and how can I fix it?
int main()
{
struct nlist *install(char *name, char *defn);
struct nlist *p;
void undef(char *name);
p = install("DEFAULT", "ON");
printf("%s\n",p->name);
return 0;
}
I'm not certain of the correctness of the assignment of np within the install routine i.e. taking the size of *np instead of just struct nlist *.
Additionally, I have read advice that the additional cast isn't necessary.
struct nlist *np;
...
np = (struct nlist *) malloc(sizeof(*np));
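PICNIC (problem in chair, not in computer): I hadn't included <string.h>. Without it, strdup has no declaration, so an older C compiler assumes it returns int, and the returned pointer can be truncated on a 64-bit build.
Here is the install function from the hash table example in K&R's book: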
struct nlist *install(char *name, char *defn)
{
struct nlist *np;
unsigned hashval;
if ((np = lookup(name)) == NULL) { /* not found */
np = (struct nlist *) malloc(sizeof(*np));
if (np == NULL || (np->name = strdup(name)) == NULL)
return NULL;
hashval = hash(name);
np->next = hashtab[hashval];
hashtab[hashval] = np;
} else /* already there */
free((void *) np->defn); /*free previous defn */
if ((np->defn = strdup(defn)) == NULL)
return NULL;
return np;
}
I don't understand the line np->next = hashtab[hashval].
I thought the reason for having np->next was to allow two strings with the same hash value to be put in the table, but the outcome of this line seems to be only one name for every hash value.
Furthermore, I cannot seem to understand the function lookup, in particular the "afterthought" part of the for loop (np = np->next), because I think there is only one value for every struct in the table:
/* lookup: look for s in hashtab */
struct nlist *lookup(char *s)
{
struct nlist *np;
for (np = hashtab[hash(s)]; np != NULL; np = np->next)
if (strcmp(s, np->name) == 0)
return np; /* found */
return NULL; /* not found */
}
What am I missing?
You can have only one value for every key (name), but two or more keys can have the same hash. np->next = hashtab[hashval] links the new node to the front of the list for that hash value. Lookup then iterates through the list until the key (name) is matched.
np->next = hashtab[hashval];
hashtab[hashval] = np;
These two lines do not replace the old entry, they add to it.
hashtab[hashval]-> existing_node becomes
hashtab[hashval]-> np -(next)-> existing_node
As @Bo Persson mentions in the comments, this is called "chaining".
Given this structure, the lookup function correctly checks the names of each node in the chain.
Here's what I'm trying to do:
myValueType function1(myParam){}
myValueType function2(myParam){}
myArray[CONSTANT_STATE1] = &function1;
myArray[CONSTANT_STATE2] = &function2;
myValue = (*myArray[CONSTANT_STATE1])(myParam);
When I compile, it throws an error that I've redeclared function1.
What's the best way to do this?
As per this SO answer from user Vijay Mathew:
Section 6.6 of The C Programming Language presents a simple dictionary (hashtable) data structure. I don't think a useful dictionary implementation could get any simpler than this. For your convenience, I reproduce the code here.
struct nlist { /* table entry: */
struct nlist *next; /* next entry in chain */
char *name; /* defined name */
char *defn; /* replacement text */
};
#define HASHSIZE 101
static struct nlist *hashtab[HASHSIZE]; /* pointer table */
/* hash: form hash value for string s */
unsigned hash(char *s)
{
unsigned hashval;
for (hashval = 0; *s != '\0'; s++)
hashval = *s + 31 * hashval;
return hashval % HASHSIZE;
}
/* lookup: look for s in hashtab */
struct nlist *lookup(char *s)
{
struct nlist *np;
for (np = hashtab[hash(s)]; np != NULL; np = np->next)
if (strcmp(s, np->name) == 0)
return np; /* found */
return NULL; /* not found */
}
char *strdup(char *);
/* install: put (name, defn) in hashtab */
struct nlist *install(char *name, char *defn)
{
struct nlist *np;
unsigned hashval;
if ((np = lookup(name)) == NULL) { /* not found */
np = (struct nlist *) malloc(sizeof(*np));
if (np == NULL || (np->name = strdup(name)) == NULL)
return NULL;
hashval = hash(name);
np->next = hashtab[hashval];
hashtab[hashval] = np;
} else /* already there */
free((void *) np->defn); /*free previous defn */
if ((np->defn = strdup(defn)) == NULL)
return NULL;
return np;
}
char *strdup(char *s) /* make a duplicate of s */
{
char *p;
p = (char *) malloc(strlen(s)+1); /* +1 for '\0' */
if (p != NULL)
strcpy(p, s);
return p;
}
Note that if the hashes of many strings collide, lookup time can degrade to O(n), because every colliding entry sits in the same chain. You can reduce the likelihood of collisions by increasing the value of HASHSIZE. For a complete discussion of the data structure, please consult the book.
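The effect of the table size on chain length can be checked with a small counting sketch. It is not part of the quoted answer: hash_mod is a hypothetical variant of the hash that takes the table size as a parameter, longest_chain is a hypothetical helper, and the keys "key0", "key1", ... are made up for illustration. Only bucket counts are kept, not the entries themselves.

```c
#include <stdio.h>

#define MAX_BUCKETS 1024   /* the sketch only supports tables up to this size */

/* hash_mod: the K&R hash with the table size passed in */
unsigned hash_mod(const char *s, unsigned size)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % size;
}

/* longest_chain: hash the keys "key0" .. "key<n-1>" into a table of the
   given size and return how many keys ended up in the fullest bucket */
unsigned longest_chain(unsigned n, unsigned size)
{
    unsigned counts[MAX_BUCKETS] = {0};
    unsigned longest = 0;
    char key[32];

    if (size == 0 || size > MAX_BUCKETS)
        return 0;                          /* unsupported size */
    for (unsigned i = 0; i < n; i++) {
        snprintf(key, sizeof key, "key%u", i);
        unsigned bucket = hash_mod(key, size);
        if (++counts[bucket] > longest)
            longest = counts[bucket];
    }
    return longest;
}
```

With a table of size 1 every key shares one chain, so longest_chain(50, 1) is 50 and a lookup may have to walk all 50 nodes. With a size of 101 the same keys spread over many buckets and the longest chain is shorter.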
The code you've shown is almost right. The problem is in your function declarations:
myValueType function1(myParam){}
myValueType function2(myParam){}
These are old-style K&R non-prototyped declarations - the name of the parameter is myParam, and the type has not been specified. Perhaps you meant this?
myValueType function1(myParamType myParam){}
myValueType function2(myParamType myParam){}
Expanding your code out to a minimal compilable example:
typedef int myValueType, myParamType;
enum { CONSTANT_STATE1, CONSTANT_STATE2 };
myValueType function1(myParamType myParam){}
myValueType function2(myParamType myParam){}
void f(myParamType myParam)
{
myValueType myValue;
myValueType (*myArray[2])(myParamType);
myArray[CONSTANT_STATE1] = &function1;
myArray[CONSTANT_STATE2] = &function2;
myValue = (*myArray[CONSTANT_STATE1])(myParam);
}
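As a variation on the example above, here is a sketch in which the functions return real values and the table is filled in with an initializer. The function bodies, the handler typedef, STATE_COUNT, and dispatch are assumptions added for illustration; the int types follow the typedef in the answer.

```c
typedef int myValueType;
typedef int myParamType;

enum { CONSTANT_STATE1, CONSTANT_STATE2, STATE_COUNT };

/* placeholder bodies, chosen only so the results are easy to tell apart */
myValueType function1(myParamType myParam) { return myParam + 1; }
myValueType function2(myParamType myParam) { return myParam * 2; }

/* handler: hypothetical name for "pointer to function taking myParamType" */
typedef myValueType (*handler)(myParamType);

/* designated initializers (C99) keep each slot next to its state constant */
static const handler myArray[STATE_COUNT] = {
    [CONSTANT_STATE1] = function1,   /* same as &function1 */
    [CONSTANT_STATE2] = function2,
};

/* dispatch: call the function registered for a state */
myValueType dispatch(int state, myParamType myParam)
{
    return myArray[state](myParam);  /* same as (*myArray[state])(myParam) */
}
```

The initializer form also works at file scope, where assignment statements such as myArray[CONSTANT_STATE1] = &function1; are not allowed outside a function body.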