I'm working on CS50's Week 5 assignment, Speller. I'm building my functions one at a time, and I'm running into problems with my unload function (Line 151). Right now, I'm just testing the iteration in a way that prints results before I use that iteration to free each of the nodes. I'm doing this by changing each node's word to "FREE" in the order these nodes are to be freed.
The function call (Line 60) returns true, and the printf command prints successfully. However, everything in the unload function itself is being ignored. None of the printf lines that I added to see its progress (DEBUG DEBUG DEBUG) are printing. The print() function call on line 63 should be printing the table with all of the words set to "FREE", and all dictionary word locations showing "NOT FOUND". Instead, it's printing the list and locations completely unaltered, and with none of the DEBUG print commands within the for loop (Line 155) triggering.
I don't understand why this is happening. The unload() function call alone, whether or not it returns true, should still at least trigger the first printf command in the for loop (Line 157). But even that is skipped.
Can someone please help me understand why the function is returning true, yet making none of the changes it's supposed to? Thanks in advance.
EDIT: Okay, I was told that I wasn't calling the unload function correctly on line 60. I've since corrected that. Now it will print out "LOCATION 00:", but it ends as soon as it hits that first while loop on line 158. I was having this problem before, and I'm not sure why it's doing this. strcmp() should see that the head node's word does not match "FREE" until it makes the change from the end of the list to the beginning. Why is the while loop not even triggering?
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
unsigned int HASH_MAX = 50; // Max elements in hash table
unsigned int LENGTH = 20; // Max length of word to be stored
unsigned int hash(const char *word); // assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
bool load(FILE *dictionary); // load dictionary into memory
bool check(char *word); // check if word exists in dictionary
bool unload(void); // unload dictionary from memory, free memory (CURRENTLY DEBUGGING, CHECKING ITERATION)
void print(void); // print table contents and node locations
typedef struct _node // node structure: stored word, pointer to next node
{
char *word[20];
struct _node *next;
} node;
node *HASH_TABLE[50];
int main(int argc, char *argv[])
{
FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r"); // open dictionary file, read
if (!dictionary) // if dictionary is NULL, return error message, end program
{
printf("FILE NOT FOUND\n");
return 1;
}
if (load(dictionary)) // if dictionary loaded successfully (function call), close dictionary and print table contents
{
fclose(dictionary);
print(); // print "LIST (number): {(name, address), ...}\n
}
char *checkword = "Albatross"; // test check function for word that does not exist in the library
char *checkword2 = "Riku"; // test check function for word that does exist in the library
if (check(checkword)) // return check results for checkword, found or not found
{
printf("\n%s found\n", checkword);
}
else
{
printf("\n%s not found\n", checkword);
}
if (check(checkword2)) // return check results for checkword2, found or not found
{
printf("\n%s found\n", checkword2);
}
else
{
printf("\n%s not found\n", checkword2);
}
if (unload()) // if unloaded successfully (function call), print contents
{
printf("\nUNLOADED...\n\n"); // DEBUG DEBUG DEBUG (confirm unload function returned true)
print();
}
}
unsigned int hash(const char *word) // assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
{
char word_conv[LENGTH + 1]; // store converted word for uniform key
unsigned int code = 0; // hash code
strcpy(word_conv, word);
for (int i = 0; i < strlen(word); i++) // set all letters in the word to lower case
{
word_conv[i] = tolower(word_conv[i]);
}
for (int j = 0; j < strlen(word_conv); j++) // for all letters in converted word, add ascii value to code and multiply by 3
{
code += word_conv[j];
code = code * 3;
}
code = code % HASH_MAX; // set code to remainder of current code divided by maximum hash table size
return code;
}
bool load(FILE *dictionary) // load dictionary into memory
{
char word[LENGTH+1]; // store next word in the dictionary
while (!feof(dictionary)) // until end of dictionary file
{
fscanf(dictionary, "%s", word); // scan for next word
node *new_n = malloc(sizeof(node)); // new node
strcpy(new_n->word, word); // store scanned word in new node
new_n->next = NULL; // new node's next pointer set to NULL
unsigned int code = hash(word); // retrieve and store hash code
if (HASH_TABLE[code] == NULL) // if hash location has no head
{
HASH_TABLE[code] = new_n; // set new node to location head
}
else if (HASH_TABLE[code] != NULL) // if head already exists at hash location
{
node *trav = HASH_TABLE[code]; // set traversal node
while (trav->next != NULL) // while traversal node's next pointer is not NULL
{
trav = trav->next; // move to next node
}
if (trav->next == NULL) // if traversal node's next pointer is null
{
trav->next = new_n; // set new node to traversal node's next pointer
}
}
}
return true; // confirm successful load
}
bool check(char *word) // check if word exists in dictionary
{
unsigned int code = hash(word); // retrieve and store hash code
node *check = HASH_TABLE[code]; // set traversal node to hash location head
while (check != NULL) // while traversal node is not NULL
{
int check_true = strcasecmp(check->word, word); // compare traversal node's word to provided word argument
if (check_true == 0) // if a match is found, return true
{
return true;
}
else if (check_true != 0) // if no match, move to next node
{
check = check->next;
}
}
if (check == NULL) // if end of list is reached without a match, return false
return false;
}
bool unload(void) // unload dictionary from memory, free memory (CURRENTLY DEBUGGING, CHECKING ITERATION)
{
char *word = "FREE"; // DEBUG DEBUG DEBUG (changin all nodes' words to "FREE" to test iteration)
for (int i = 0; i < HASH_MAX; i++) // for every element in the hash table, HASH_MAX (50)
{
printf("LOCATION %02d:\n", i); // DEBUG DEBUG DEBUG (print current hash table location)
while (strcmp(HASH_TABLE[i]->word, word) != 0) // while the head node's word is not "FREE"
{
node *trav = HASH_TABLE[i]; // set traversal node to head
printf("HEAD WORD: %s\n", HASH_TABLE[i]->word); // DEBUG DEBUG DEBUG (print head word to confirm while condition)
while (strcmp(trav->next->word, word) != 0) // while the traversal node's word is not "FREE"
{
trav = trav->next; // move to next node
printf("."); // DEBUG DEBUG DEBUG (print a dot for every location skipped)
}
printf("\n"); // DEBUG DEBUG DEBUG
strcpy(trav->word, word); // set traversal node's word to "FREE"
printf("{"); // DEBUG DEBUG DEBUG
while (trav != NULL) // DEBUG DEBUG DEBUG (print hash location's current list of words)
{
printf("%s, ", trav->word); // DEBUG DEBUG DEBUG
}
printf("}\n\n"); // DEBUG DEBUG DEBUG
}
}
return true; // freed successfully
}
void print(void) // print hash table contents and node locations
{
for (int i = 0; i < HASH_MAX; i++) // for every element in the hash table
{
node *check = HASH_TABLE[i]; // set traversal node to current hash table element head
printf("LIST %02d: {", i); // print hash table element location
while (check != NULL) // for all nodes in the current linked list
{
printf("%s, ", check->word); // print traversal node's word
check = check->next; // move to next node
}
printf("}\n");
}
printf("\n");
FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r"); // open dictionary file
while (!feof(dictionary)) // for all words in the dictionary
{
char word[LENGTH + 1]; // store next word
fscanf(dictionary, "%s", word); // scan for next word
unsigned int code = hash(word); // retrieve and store word's hash code
node *search = HASH_TABLE[code]; // set traversal node to hash location head
while (search != NULL) // for all nodes at that location, or until word is found
{
if (strcasecmp(search->word, word) == 0) // compare traversal node's word to scanned word (case insensitive)
{
printf("%s: %p\n", search->word, search); // print traversal node's word and location
break; // break while loop
}
else
{
search = search->next; // if traversal node's word does not match scanned word, move to next node
}
}
if (search == NULL) // if the scanned word matches none of the words in the hash location's linked list
printf("\"%s\" NOT FOUND\n", word); // word not found
}
fclose(dictionary); // close dictionary file
}
Caveat: chqrlie has pointed out many of the basic issues, but here's some refactored code.
Your main issue was that unload didn't actually remove the nodes.
One of the things to note is that it's easier/faster/better to use tolower once per string.
If the lowercased version is what we store in the node, and we lowercase the search word in check, we can use strcmp instead of strcasecmp [which has to redo the lowercasing for both arguments on each loop iteration].
So, I've changed the hash function to lowercase its argument "in-place".
As I mentioned in my above comment, print was extraneously rereading the dictionary file. So, I've removed that code. If it were necessary to do this, it should go into [yet] another function, or load and/or check should be reused.
(i.e.) print should do one thing well [a programming maxim].
Personally, I dislike "sidebar" comments:
if (unload()) // if unloaded successfully (function call), print contents
I prefer the comment to go above the line:
// if unloaded successfully (function call), print contents
if (unload())
To me, this is much clearer and it helps prevent the line from going beyond 80 characters in width.
Certain fixed constants (e.g. HASH_MAX and LENGTH) are global variables. This prevents them from being used to size arrays at file scope.
(e.g.) you couldn't say:
node *HASH_TABLE[HASH_MAX];
and had to "hardwire" it as:
node *HASH_TABLE[50];
If we define these with either #define or as an enum, then we can use the preferred definitions.
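As a minimal sketch of what the enum buys us (same names as above; the table_slots helper is hypothetical and only there to show that the array size is now a compile-time constant):

```c
#include <stddef.h>

// Enum constants are integer constant expressions, so they can size arrays
// at file scope. A global `unsigned int` variable cannot do that.
enum {
    HASH_MAX = 50,  // max elements in hash table
    LENGTH = 20     // max length of a stored word
};

typedef struct node {
    char word[LENGTH + 1];  // +1 for the terminating '\0'
    struct node *next;
} node;

node *HASH_TABLE[HASH_MAX];

// Hypothetical helper: number of slots, computed from the array itself.
size_t table_slots(void)
{
    return sizeof(HASH_TABLE) / sizeof(HASH_TABLE[0]);
}
```

If HASH_MAX changes later, the table and every loop bound that uses it change with it.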
Doing something like:
for (int i = 0; i < strlen(word); i++)
increases the loop time from O(length) to O(length^2) because strlen is called "length" times inside the loop and it rescans the string each time.
Much better to do:
int len = strlen(word);
for (int i = 0; i < len; i++)
But even this has an extra scan of the buffer. It can be better to do something like:
for (int chr = *word++; chr != 0; chr = *word++)
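Here is a rough sketch of that loop shape applied to the question's hash formula. The name hash_once and the buckets parameter are made up for illustration, and the cast to unsigned char is there so tolower never sees a negative value:

```c
#include <ctype.h>

// Hypothetical helper: same formula as the question's hash, computed in a
// single pass over the string, with no strlen call at all.
unsigned int hash_once(const char *word, unsigned int buckets)
{
    unsigned int code = 0;

    // Fetch the current char, advance the pointer, stop at the '\0'.
    for (int chr = (unsigned char) *word++; chr != 0;
         chr = (unsigned char) *word++) {
        code += (unsigned int) tolower(chr);
        code *= 3;
    }
    return code % buckets;
}
```

Unlike the in-place version, this one leaves its argument untouched, so it can take a const char *.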
I've refactored the code with annotations for the bugs. Original code is bracketed inside a #if 0 block:
#if 0
// old/original code
#else
// new/refactored code
#endif
Anyway, here's the code:
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#if 1
#include <ctype.h>
#endif
// Max elements in hash table
#if 0
unsigned int HASH_MAX = 50;
#else
enum {
HASH_MAX = 50
};
#endif
// Max length of word to be stored
#if 0
unsigned int LENGTH = 20;
#else
enum {
LENGTH = 20
};
#endif
// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
#if 0
unsigned int hash(const char *word);
#else
unsigned int hash(char *word);
#endif
// load dictionary into memory
bool load(FILE *dictionary);
// check if word exists in dictionary
#if 0
bool check(char *word);
#else
bool check(const char *word);
#endif
// unload dictionary from memory, free memory (CURRENTLY DEBUGGING,
// CHECKING ITERATION)
bool unload(void);
// print table contents and node locations
void print(void);
// node structure: stored word, pointer to next node
typedef struct _node {
#if 0
char *word[20];
#else
char word[LENGTH + 1];
#endif
struct _node *next;
} node;
#if 0
node *HASH_TABLE[50];
#else
node *HASH_TABLE[HASH_MAX];
#endif
int
main(int argc, char *argv[])
{
// open dictionary file, read
#if 0
FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r");
#else
FILE *dictionary = fopen("Dictionary.txt", "r");
#endif
// if dictionary is NULL, return error message, end program
if (!dictionary) {
printf("FILE NOT FOUND\n");
return 1;
}
// if dictionary loaded successfully (function call), close dictionary and
// print table contents
if (load(dictionary)) {
fclose(dictionary);
// print "LIST (number): {(name, address), ...}\n
print();
}
// test check function for word that does not exist in the library
char *checkword = "Albatross";
// test check function for word that does exist in the library
char *checkword2 = "Riku";
// return check results for checkword, found or not found
if (check(checkword)) {
printf("\n%s found\n", checkword);
}
else {
printf("\n%s not found\n", checkword);
}
// return check results for checkword2, found or not found
if (check(checkword2)) {
printf("\n%s found\n", checkword2);
}
else {
printf("\n%s not found\n", checkword2);
}
// if unloaded successfully (function call), print contents
if (unload()) {
// DEBUG DEBUG DEBUG (confirm unload function returned true)
printf("\nUNLOADED...\n\n");
print();
}
}
// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int
hash(char *word)
{
// store converted word for uniform key
#if 0
char word_conv[LENGTH + 1];
#endif
// hash code
unsigned int code = 0;
#if 0
strcpy(word_conv, word);
// set all letters in the word to lower case
for (int i = 0; i < strlen(word); i++) {
word_conv[i] = tolower(word_conv[i]);
}
// for all letters in converted word, add ascii value to code and multiply by 3
for (int j = 0; j < strlen(word_conv); j++) {
code += word_conv[j];
code = code * 3;
}
#else
int chr;
while (1) {
chr = *word;
if (chr == 0)
break;
chr = tolower(chr);
*word++ = chr;
code += chr;
code *= 3;
}
#endif
// set code to remainder of current code divided by maximum hash table size
code = code % HASH_MAX;
return code;
}
// load dictionary into memory
bool
load(FILE * dictionary)
{
// store next word in the dictionary
char word[LENGTH + 1];
// until end of dictionary file
// NOTE/BUG: don't use feof
#if 0
while (!feof(dictionary)) {
// scan for next word
fscanf(dictionary, "%s", word);
#else
// scan for next word
while (fscanf(dictionary, "%s", word) == 1) {
#endif
// new node
node *new_n = malloc(sizeof(node));
// store scanned word in new node
strcpy(new_n->word, word);
// new node's next pointer set to NULL
new_n->next = NULL;
// retrieve and store hash code
unsigned int code = hash(new_n->word);
// NOTE/BUG: there's no need to append to the end of the list -- pushing
// on the front is adequate and is faster
#if 0
// if hash location has no head
if (HASH_TABLE[code] == NULL) {
// set new node to location head
HASH_TABLE[code] = new_n;
}
// if head already exists at hash location
else if (HASH_TABLE[code] != NULL) {
// set traversal node
node *trav = HASH_TABLE[code];
// while traversal node's next pointer is not NULL
while (trav->next != NULL) {
// move to next node
trav = trav->next;
}
// if traversal node's next pointer is null
if (trav->next == NULL) {
// set new node to traversal node's next pointer
trav->next = new_n;
}
}
#else
new_n->next = HASH_TABLE[code];
HASH_TABLE[code] = new_n;
#endif
}
// confirm successful load
return true;
}
// check if word exists in dictionary
#if 0
bool
check(char *word)
#else
bool
check(const char *arg)
#endif
{
char word[LENGTH + 1];
// retrieve and store hash code
#if 1
strcpy(word,arg);
#endif
unsigned int code = hash(word);
// set traversal node to hash location head
node *check = HASH_TABLE[code];
// while traversal node is not NULL
while (check != NULL) {
// compare traversal node's word to provided word argument
// NOTE/BUG: strcmp is faster than strcasecmp if we convert to lowercase _once_
#if 0
int check_true = strcasecmp(check->word, word);
#else
int check_true = strcmp(check->word, word);
#endif
#if 0
// if a match is found, return true
if (check_true == 0) {
return true;
}
// if no match, move to next node
else if (check_true != 0) {
check = check->next;
}
#else
if (check_true == 0)
return true;
check = check->next;
#endif
}
// if end of list is reached without a match, return false
#if 0
if (check == NULL)
return false;
#else
return false;
#endif
}
// unload dictionary from memory, free memory
// (CURRENTLY DEBUGGING, CHECKING ITERATION)
bool
unload(void)
{
// DEBUG DEBUG DEBUG (changin all nodes' words to "FREE" to test iteration)
#if 0
char *word = "FREE";
#endif
// for every element in the hash table, HASH_MAX (50)
for (int i = 0; i < HASH_MAX; i++) {
#if 0
// DEBUG DEBUG DEBUG (print current hash table location)
printf("LOCATION %02d:\n", i);
// while the head node's word is not "FREE"
while (strcmp(HASH_TABLE[i]->word, word) != 0) {
// set traversal node to head
node *trav = HASH_TABLE[i];
// DEBUG DEBUG DEBUG (print head word to confirm while condition)
printf("HEAD WORD: %s\n", HASH_TABLE[i]->word);
// while the traversal node's word is not "FREE"
while (strcmp(trav->next->word, word) != 0) {
// move to next node
trav = trav->next;
// DEBUG DEBUG DEBUG (print a dot for every location skipped)
printf(".");
}
// DEBUG DEBUG DEBUG
printf("\n");
// set traversal node's word to "FREE"
strcpy(trav->word, word);
// DEBUG DEBUG DEBUG
printf("{");
// DEBUG DEBUG DEBUG (print hash location's current list of words)
while (trav != NULL) {
// DEBUG DEBUG DEBUG
printf("%s, ", trav->word);
}
// DEBUG DEBUG DEBUG
printf("}\n\n");
}
#else
node *nxt;
for (node *cur = HASH_TABLE[i]; cur != NULL; cur = nxt) {
nxt = cur->next;
free(cur);
}
HASH_TABLE[i] = NULL;
#endif
}
// freed successfully
return true;
}
// print hash table contents and node locations
void
print(void)
{
// for every element in the hash table
for (int i = 0; i < HASH_MAX; i++) {
// set traversal node to current hash table element head
node *check = HASH_TABLE[i];
// print hash table element location
printf("LIST %02d: {", i);
// for all nodes in the current linked list
while (check != NULL) {
// print traversal node's word
printf("%s, ", check->word);
// move to next node
check = check->next;
}
printf("}\n");
}
printf("\n");
// NOTE/BUG: why reread dictionary after printing it?
#if 0
// open dictionary file
FILE *dictionary = fopen("C:/Users/aaron/Desktop/Dictionary.txt", "r");
// for all words in the dictionary
while (!feof(dictionary)) {
// store next word
char word[LENGTH + 1];
// scan for next word
fscanf(dictionary, "%s", word);
// retrieve and store word's hash code
unsigned int code = hash(word);
// set traversal node to hash location head
node *search = HASH_TABLE[code];
// for all nodes at that location, or until word is found
while (search != NULL) {
// compare traversal node's word to scanned word (case insensitive)
if (strcasecmp(search->word, word) == 0) {
// print traversal node's word and location
printf("%s: %p\n", search->word, search);
// break while loop
break;
}
else {
// if traversal node's word does not match scanned word,
// move to next node
search = search->next;
}
}
// if the scanned word matches none of the words in the hash location's
// linked list
if (search == NULL)
// word not found
printf("\"%s\" NOT FOUND\n", word);
}
// close dictionary file
fclose(dictionary);
#endif
}
Here's a version that has the #if 0 blocks removed.
Also, I've added a slight reordering in the load function, so that it inputs the data directly into the final place inside the node element (i.e. eliminates the intermediate buffer and a strcpy)
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <ctype.h>
// Max elements in hash table
enum {
HASH_MAX = 50
};
// Max length of word to be stored
enum {
LENGTH = 20
};
// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int hash(char *word);
// load dictionary into memory
bool load(FILE *dictionary);
// check if word exists in dictionary
bool check(const char *word);
// unload dictionary from memory, free memory (CURRENTLY DEBUGGING,
// CHECKING ITERATION)
bool unload(void);
// print table contents and node locations
void print(void);
// node structure: stored word, pointer to next node
typedef struct _node {
char word[LENGTH + 1];
struct _node *next;
} node;
node *HASH_TABLE[HASH_MAX];
int
main(int argc, char *argv[])
{
// open dictionary file, read
FILE *dictionary = fopen("Dictionary.txt", "r");
// if dictionary is NULL, return error message, end program
if (!dictionary) {
printf("FILE NOT FOUND\n");
return 1;
}
// if dictionary loaded successfully (function call), close dictionary and
// print table contents
if (load(dictionary)) {
fclose(dictionary);
// print "LIST (number): {(name, address), ...}\n
print();
}
// test check function for word that does not exist in the library
char *checkword = "Albatross";
// test check function for word that does exist in the library
char *checkword2 = "Riku";
// return check results for checkword, found or not found
if (check(checkword)) {
printf("\n%s found\n", checkword);
}
else {
printf("\n%s not found\n", checkword);
}
// return check results for checkword2, found or not found
if (check(checkword2)) {
printf("\n%s found\n", checkword2);
}
else {
printf("\n%s not found\n", checkword2);
}
// if unloaded successfully (function call), print contents
if (unload()) {
// DEBUG DEBUG DEBUG (confirm unload function returned true)
printf("\nUNLOADED...\n\n");
print();
}
}
// assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
unsigned int
hash(char *word)
{
// hash code (NOTE: word is also lowercased in place as a side effect)
unsigned int code = 0;
unsigned char chr;
while (1) {
chr = *word;
if (chr == 0)
break;
chr = tolower(chr);
*word++ = chr;
code += chr;
code *= 3;
}
// set code to remainder of current code divided by maximum hash table size
code = code % HASH_MAX;
return code;
}
// load dictionary into memory
bool
load(FILE *dictionary)
{
// scan for next word
while (1) {
// new node
node *new_n = malloc(sizeof(node));
if (new_n == NULL)
return false;
// read at most LENGTH (20) characters directly into the node
if (fscanf(dictionary, "%20s", new_n->word) != 1) {
free(new_n);
break;
}
// store scanned word in new node
new_n->next = NULL;
// retrieve and store hash code
unsigned int code = hash(new_n->word);
// pushing on the front of the list is adequate and is faster
new_n->next = HASH_TABLE[code];
HASH_TABLE[code] = new_n;
}
// confirm successful load
return true;
}
// check if word exists in dictionary
bool
check(const char *arg)
{
char word[LENGTH + 1];
// retrieve and store hash code
strcpy(word,arg);
unsigned int code = hash(word);
// set traversal node to hash location head
node *check = HASH_TABLE[code];
// while traversal node is not NULL
while (check != NULL) {
// compare traversal node's word to provided word argument
int check_true = strcmp(check->word, word);
if (check_true == 0)
return true;
check = check->next;
}
// if end of list is reached without a match, return false
return false;
}
// unload dictionary from memory, free memory
// (CURRENTLY DEBUGGING, CHECKING ITERATION)
bool
unload(void)
{
// for every element in the hash table, HASH_MAX (50)
for (int i = 0; i < HASH_MAX; i++) {
node *nxt;
for (node *cur = HASH_TABLE[i]; cur != NULL; cur = nxt) {
nxt = cur->next;
free(cur);
}
HASH_TABLE[i] = NULL;
}
// freed successfully
return true;
}
// print hash table contents and node locations
void
print(void)
{
// for every element in the hash table
for (int i = 0; i < HASH_MAX; i++) {
// set traversal node to current hash table element head
node *check = HASH_TABLE[i];
// print hash table element location
printf("LIST %02d: {", i);
// for all nodes in the current linked list
while (check != NULL) {
// print traversal node's word
printf("%s, ", check->word);
// move to next node
check = check->next;
}
printf("}\n");
}
printf("\n");
}
UPDATE:
Could you please explain for (int chr = *word++; chr != 0; chr = *word++)? I don't know what *word++ means in this context.
Sure. With chr = *word++; it means dereference word [a char pointer]. This fetches the char value pointed to by word (i.e. fetch the value from memory). Then, set this value into chr. Then, increment word [so it points to the next character in the array].
The statement is composed of three operators: = is the assignment operator, * is the dereference operator, and ++ is the post-increment operator.
Based on the precedence [and/or binding] of the operators, postfix ++ binds tighter than *, so the expression parses as *(word++). But, because it is a post-increment, the value of word++ is the original pointer, so * fetches the char that word pointed to before the increment. That value is placed in chr, and word ends up pointing to the next char. It is as if the following were performed as a single statement:
chr = *word;
word += 1;
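A small sketch to make the equivalence concrete. Both helper names are hypothetical; the first one spells out the fetch and the increment, the second uses the idiom in a loop:

```c
#include <stddef.h>

// Hypothetical helper: returns the char under *cursor and moves *cursor
// forward by one, i.e. what `chr = *word++;` does to word.
char fetch_and_advance(const char **cursor)
{
    const char *p = *cursor;
    char chr = *p++;   // same as: chr = *p; p += 1;
    *cursor = p;
    return chr;
}

// Hypothetical helper: counts characters by walking the pointer.
size_t walk_length(const char *word)
{
    size_t n = 0;

    // *word++ parses as *(word++): the old pointer value is dereferenced
    // and word is advanced as a side effect.
    while (*word++ != '\0') {
        n++;
    }
    return n;
}
```

Calling fetch_and_advance twice on a cursor into "Riku" yields 'R' and then 'i', with the cursor one char further along after each call.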
chr = tolower(chr); should be chr = tolower((unsigned char)chr); for reasons explained in my answer. Alternatively, you could define chr as unsigned char chr;
I was under the impression that tolower et al. were "self protective" of this (e.g. they did the unsigned char cast). But, the [linux] manpage says it's UB if the value is out of range. I've edited the second example to use unsigned char chr;.
Strangely, glibc's tolower has a range check built in that works on the int value and returns the original value (i.e. does not index into the translation table) if the value is out of range. This appears to be part of some BSD compatibility [the BSD manpage states it does a range check, but the feature is deprecated]. I'm guessing the glibc range check was added after the manpage was written.
To me, the macro should just do the cast itself [and the global function as well]. But, I think this might break the BSD compatibility.
But, now we're all hamstrung to the old way [or add a wrapper macro] because of backward compatibility.
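For what it's worth, such a wrapper could be as small as this sketch (lower_char is a made-up name, not a library function):

```c
#include <ctype.h>

// Hypothetical wrapper: converts through unsigned char so that a negative
// plain-char value is never passed to tolower().
static inline char lower_char(char c)
{
    return (char) tolower((unsigned char) c);
}
```

Callers can then pass plain char values straight from a string without remembering the cast at every call site.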
it is confusing for hash to have a side effect on its argument and further confusing that this side effect be necessary for the strcmp in check to work.
The side effect is [probably] no more [or, perhaps, even less] egregious than what strtok does. That is, it's not modifying a hidden/unrelated global, etc.
IMO, it wouldn't be confusing if the effect were commented [I documented it in the answer text]. Perhaps renaming hash to something a bit more descriptive would help. We could do: take_hash_of_argument_that_we_modify_to_lowercase_first.
That would make the function name "self documenting" as some (e.g. "Uncle" Bob Martin(?)) might suggest member functions should be.
But, maybe hash_and_lowercase might be better. This might be a sufficient clue to the reader that they need to consult the API documentation for the function rather than assuming they know all about it from just the name.
The linked list traversal is much faster with strcmp, so, at a minimum [architecturally] we want to store lower case strings in the nodes. We don't want to repeat the lowercasing for each node on each scan. And, we don't want strcasecmp to repeat the lowercasing on word [and the string in the node] for each loop iteration.
As you say, we could have two functions. And we could still achieve this refactoring: a string based version of tolower that lowercases its argument and leave the hash as it was done originally.
Originally, I considered this approach. I soon realized that everywhere you did a hash, you wanted it to be on the lowercased string. We could achieve this with (e.g.):
strlower(word);
value = hash(word);
But, there wasn't a use case here for doing one of these calls separately--only in pairs.
So, given that, why scan the argument string twice and slow down the operation by 2x?
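For comparison, the two-call alternative might look something like this sketch. lower_in_place stands in for the strlower call above (neither is a standard C function), and hash here has no side effect:

```c
#include <ctype.h>

enum { HASH_MAX = 50 };

// Hypothetical helper playing the role of strlower: lowercase a string
// in place. This is the first scan of the string.
void lower_in_place(char *s)
{
    for (; *s != '\0'; s++) {
        *s = (char) tolower((unsigned char) *s);
    }
}

// Side-effect-free hash that expects an already lowercased word.
// This is the second scan of the string.
unsigned int hash(const char *word)
{
    unsigned int code = 0;

    for (; *word != '\0'; word++) {
        code = (code + (unsigned char) *word) * 3;
    }
    return code % HASH_MAX;
}
```

It is cleaner to read, but every caller has to remember to make both calls, in that order, and the string is walked twice.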
From JFK [after the failed Bay of Pigs invasion]: Mistakes are not errors if we admit them.
So, I'd paraphrase that as: Side effects are not errors if we document them.
There are multiple problems in your code:
the word member of the _node structure has the wrong type: it should just be an array of 20 characters, not an array of 20 char pointers. And don't use _node: identifiers starting with _ are reserved at file scope. Change the definition to:
typedef struct node { // node structure: stored word, pointer to next node
char word[LENGTH+1];
struct node *next;
} node;
your reading loops are incorrect: while (!feof(dictionary)) is not the proper test to detect the end of file. You should instead test whether fscanf() successfully reads the next word:
while (fscanf(dictionary, "%s", word) == 1) // until end of dictionary file
Furthermore you should specify a maximum length for fscanf() to avoid undefined behavior on long words:
while (fscanf(dictionary, "%19s", word) == 1) // read at most 19 characters
You do not check for allocation failure.
There are many redundant tests such as else if (HASH_TABLE[code] != NULL) and if (trav->next == NULL) in load(), else if (check_true != 0) and if (check == NULL) in check().
You do not modify trav in the loop while (trav != NULL) in the DEBUG code, causing an infinite loop.
It is not difficult to free the dictionary in unload(). Your iteration-checking code is far too complicated, and you already have correct iteration code in print(). Here is a simple example:
bool unload(void) { // unload dictionary from memory, free memory
for (int i = 0; i < HASH_MAX; i++) {
while (HASH_TABLE[i]) {
node *n = HASH_TABLE[i];
HASH_TABLE[i] = n->next;
free(n);
}
}
return true;
}
Note also that there is no need to store the converted word to compute the hash value, and char values must be cast as (unsigned char) to pass to tolower() because this function is only defined for the values of unsigned char and the special negative value EOF. char may be a signed type, so tolower(word[i]) has undefined behavior for extended characters.
unsigned int hash(const char *word) // assign hash code -- [(code + current letter) * 3] * string length, % HASH_MAX
{
unsigned int code = 0; // hash code
for (int i = 0; word[i] != '\0'; i++) {
// compute hashcode from lowercase letters
code = (code + tolower((unsigned char)word[i])) * 3;
}
code = code % HASH_MAX; // set code to remainder of current code divided by maximum hash table size
return code;
}
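Putting the points about the bounded fscanf() and the allocation check together, load() could be sketched roughly as follows. This is only one way to do it: the "%20s" width assumes LENGTH is 20, pushing new nodes on the front of the list is my choice rather than something the question requires, and count_words is a hypothetical helper added only to check the result.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { HASH_MAX = 50, LENGTH = 20 };

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *HASH_TABLE[HASH_MAX];

static unsigned int hash(const char *word)
{
    unsigned int code = 0;

    for (int i = 0; word[i] != '\0'; i++) {
        code = (code + (unsigned int) tolower((unsigned char) word[i])) * 3;
    }
    return code % HASH_MAX;
}

// Sketch of load(): bounded read, allocation check, push on the front.
bool load(FILE *dictionary)
{
    char word[LENGTH + 1];

    // The width must match LENGTH: at most 20 chars plus the '\0'.
    // A longer word is split into several reads instead of overflowing.
    while (fscanf(dictionary, "%20s", word) == 1) {
        node *n = malloc(sizeof *n);
        if (n == NULL) {
            return false;   // caller should free whatever was loaded
        }
        strcpy(n->word, word);
        unsigned int code = hash(word);
        n->next = HASH_TABLE[code];
        HASH_TABLE[code] = n;
    }
    return true;
}

// Hypothetical helper: total number of nodes in the table.
size_t count_words(void)
{
    size_t total = 0;

    for (int i = 0; i < HASH_MAX; i++) {
        for (const node *n = HASH_TABLE[i]; n != NULL; n = n->next) {
            total++;
        }
    }
    return total;
}
```

Because the loop condition is the fscanf() return value, the last word is not stored twice the way it can be with while (!feof(dictionary)).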
I have created a linked list which enters a word character by character from a string. Each character represents a node inside of the linked list. For example, Ian is nice would appear as:
I->A->N-> ->i->s-> ->n->i->c->e->terminates with new line.
I am attempting to find "IAN" when I search and locate how many times Ian would appear in the nodes of the linked list.
When I use my current code, I search for the desired word. It enters the function and begins searching the linked list, but it doesn't find any occurrences.
I've attempted to search the linked list as strings, but that makes little sense, since in this context the linked list consists of characters.
I have also tried searching for words character by character, but that returns nothing.
I am now attempting to search for the word inside of the function, and it now gets stuck in an infinite search loop, even though I've attempted to end it at the new line. It's also not counting the number of occurrences.
node* Find(char findCharacter){
node *nodePtr = headNode;
int occurrences = 0;
int index;
const int arraySize = 51;
char findWord[arraySize];
printf("Enter the word that you would like to find: ");
gets_s(findWord);
printf("\n");
while((nodePtr != NULL) && (nodePtr->character != findCharacter)){
for(index = 0; nodePtr->character != '\n'; index++){
if(findWord == &findWord[index]){
nodePtr = nodePtr->nextNode;
}
else{
}
}
occurrences = occurrences + 1;
}
printf("number of occurrences: %d\n", occurrences);
return nodePtr;
}
int main() {
const int arraySize = 201;
char entryString[arraySize];
int index;
/*
* Let the user enter a string to start the program
* */
printf("Enter a user string: ");
gets_s(entryString);
printf("\n");
int length = strlen(entryString);
for(index = 0; index < length; index++){
Find(entryString[index]);
}
return 0;
}
If the user enters: "Ian Ian is a great student."
and the user attempts to find: Ian
The program is supposed to return:
The user has entered: " Ian Ian is a great student."
"The number of occurrences for Ian is 2"
You probably have working code to set up your linked list. A simple function to count how often a single letter occurs would look like this:
int count_chars(node *nodePtr, char findme)
{
    int occurrences = 0;
    while (nodePtr) {
        if (nodePtr->character == findme) {
            occurrences++;
        }
        nodePtr = nodePtr->nextNode; // advance, or the loop never ends
    }
    return occurrences;
}
(Note: I've chosen to pass the head node to the function instead of using a global head pointer. That will allow you to have more than one linked list in your program. That means that you could even implement your search word "Ian" as linked list.)
Instead of finding a string anywhere in the list, let's first write some code that tests whether a linked list begins with a certain word. Walk the list and the string simultaneously and check for mismatches. Because we require all of the string, but not all of the list, to match, we control the loop with the string:
int startswith(node *nodePtr, const char *findme)
{
while (*findme) {
if (nodePtr == NULL || *findme != nodePtr->character) {
return 0; // mismatch!
}
nodePtr = nodePtr->nextNode; // next node in list
findme++; // next char in string
}
return 1; // all chars match
}
But you want to find the string anywhere, not just at the beginning. Each node in the list can be considered the head node of a sublist that starts at that node. So if your list holds "cat", the nodes in the list hold the sublists "cat", "at", and "t" respectively. With the function above, you can check for your word at each node. The code looks very similar to the first code snippet above where we have just counted chars.
int count_strings(node *nodePtr, const char *findme)
{
    int occurrences = 0;
    while (nodePtr) {
        if (startswith(nodePtr, findme)) {
            occurrences++;
        }
        nodePtr = nodePtr->nextNode; // try the sublist starting at the next node
    }
    return occurrences;
}
One remark about code design: Don't try to do everything in one function. If the string you search for comes from user input, don't make reading it part of the function. Instead, do that separately: read the input, then pass that input to count_strings. That way, your function can be used in other contexts than just your specialised case. (The same holds more or less for passing in the head node as a parameter.)
First post, extremely limited in coding knowledge and new to C. Be gentle! I am at the point where "trying" different things is just confusing me more and more. I need someone's correct guidance!
This particular problem is from an online edX course I am attempting which ultimately when implemented correctly, checks a given word read in from a text file (the 'check' function) and compares it to each word read into (from the 'load' function) a linked list of structs.
I believe I have the load function implemented correctly, since I see what I anticipate when I step through it in gdb, but my question and my problem relate specifically to the check function. I still have a lot to implement to finish my code, but while testing with gdb, the values of the char* member of the struct don't correspond with what I anticipate I should see.
When using gdb and stepping through the 'check' function and trying to access the dword member of the struct nodes in the linked list I created in the load function, I anticipate I should see a string for the char* member. For instance, I anticipate the word "cat" assigned to current->dword , but am instead seeing in gdb when I test:
~(gdb) print current->dword
$13 = 0xbfffede2 "\004\b\214\365\372D\300\355\377\277"
My thoughts are that I'm still only accessing an address somehow and not the actual value, but I'm oblivious as to why this is. When the node is created in the load function, a value is assigned to the dword member correctly (at least as far as I can tell while stepping through the code in gdb) but doesn't seem to be accessed correctly in the check function. Any help for a newbie would be appreciated!
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "dictionary.h"
typedef struct node
{
char* dword;
struct node* next;
}
node;
// keep track of #of words in dictionary loaded
int wordCounter = 0;
// create root for hash table
node* root[26];
// create cursor to keep place in creating, pointing, and traversing through nodes
node* current = NULL;
/**
* Returns true if word is in dictionary else false.
*/
bool check(const char* word)
{
// size of word read into buffer
int wordSize = sizeof(word);
// prepare to make a new lowercase only word for comparison to lowercase only dictionary
char bufWord[wordSize];
// make it
for(int i = 0; i < wordSize; i++)
{
if (i == wordSize - 1)
{
bufWord[i] = '\0';
}
else
{
bufWord[i] = tolower(word[i]);
}
}
// hash word to achieve proper root node location
int hash = bufWord[0] - 97;
// point to the correct root node to begin traversing
current = root[hash];
// make sure there is even a word in hash table location
if(root[hash] == NULL)
{
return false;
}
else if(root[hash] != NULL)
{
// progress through the nodes until the last node's next pointer member is NULL
while(current != NULL)
{
// compare 1st letter only of current->dword[i] to bufWord[i] to save time
// if they don't match, return false
// if they do match then continue
\
char dictWord[wordSize];
// hold copy of struct member value to compare to dictWord
char* wordTemp = current->dword;
//
for(int i = 0; i < wordSize; i++)
{
dictWord[i] = wordTemp[i];
}
// do a spell check
if(strcmp(bufWord, dictWord) == 0)
{
return true;
}
else
{
// set current to the next node if any or NULL if it's already the last node in the list
current = current->next;
}
}
}
return false;
}
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char* dictionary)
{
// buffer for reading in dictionary words
char wordIn[LENGTH + 1];
// open the dictionary file
FILE* newDict = fopen(dictionary, "r");
for (int i = 0; i < 27; i++)
{
root[i] = NULL;
}
// while there are words to read
while(fscanf(newDict, "%s ", wordIn) > 0)
{
// keep track of #of words for constant time read in size function
wordCounter++;
// hash the first letter for the location in root
int hash = wordIn[0] - 97;
// malloc space for a new node
node* newNode = malloc(sizeof(node));
// error check
if (newNode == NULL)
{
return false;
}
// set value member of node to current word
newNode->dword = wordIn;
// first insertion into linked list if that root node has not been used yet
if(root[hash] == NULL)
{
// sets to NULL
newNode->next = root[hash];
// link it
root[hash] = newNode;
}
else if(root[hash] != NULL)
{
// starts at the root
node* current = root[hash];
// insert into new beginning of list
newNode->next = current;
root[hash] = newNode;
}
}
fclose(newDict);
return true;
}
/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return wordCounter;
}
/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
// TODO
return false;
}
The source of your problem is the line:
newNode->dword = wordIn;
wordIn is a local array in load. You are storing the address of wordIn in the dword of every node, so all nodes point at the same buffer. When you return from load, that address is no longer valid.
What you need to do is allocate memory for the string in wordIn, assign the allocated memory to newNode->dword and copy the contents of wordIn to newNode->dword.
If your platform provides the non-standard function strdup, you can change the above line to:
newNode->dword = strdup(wordIn);
If not, it is easily implemented:
char* strdup(char const* in)
{
    char* r = malloc(strlen(in)+1);
    if (r != NULL) // malloc can fail; return NULL in that case
    {
        strcpy(r, in);
    }
    return r;
}
I'm trying to search a linked list in C. I can get it to match my search string against the first node, but not the later ones. Any ideas why? Here's my code:
void fnSearchList(struct listnode *ptrH, char strName[50])
{
struct listnode *ptrTemp = ptrH;
int nCount = 0;
// nRet = strcmp(strName, ptrH->arcFirstName);
// printf("%i", nRet);
if(!ptrH)
{
/* Empty List */
printf("\n\nEmpty List \n\n");
}
else
{
while(ptrTemp->ptrNext)
{
nRet = strcmp(strName, ptrTemp->arcFirstName);
if(nRet == 0)
{
printf("The value %s has been located\n", ptrTemp->arcFirstName);
nCount++;
}
ptrTemp = ptrTemp->ptrNext;
}
if(!nCount)
printf("\t\tValue not found within the list\n");
else
printf("\t\tA total of %d were found\n", nCount);
}
printf("The list totals %d\n", fnTotalList(ptrH));
}
I have marked a few things out as I was testing to see if the strcmp was working which it is.
I think your while loop should be:
while (ptrTemp)
Otherwise it will not look at the last element in the list.
Change:
while(ptrTemp->ptrNext)
To:
while(ptrTemp)
The condition to check for the while loop should be while(ptrTemp) and not while(ptrTemp->ptrNext). That is because you already change ptrTemp to point to the next node in the list by doing
ptrTemp = ptrTemp->ptrNext;
So your code skips the last node in the list because lastNode->ptrNext == NULL is true.
Also, note that strName parameter of your function fnSearchList is a pointer to char type and not an array of 50 char. You can write it as:
void fnSearchList(struct listnode *ptrH, char *strName) {
// stuff
}
They are exactly the same.
For this assignment I had to create my own string class. I initially wrote the compareto method to compare two strings and return whichever is larger overall. What I want to do instead is compare them and return which one is alphabetically larger, for example when comparing smith and htims. The way I designed the compareto method, the result is that they are equal. What I want is for it to tell me which one comes first alphabetically, so in my example htims would come first. I understand how to do this in Java, or even in C using the <string.h> library; I am just confused as to how to do this myself.
EDIT: I just wanted to note that I am not looking for a code answer, but rather a nudge in the direction of how I should write the code.
int compareto(void * S1, void * S2){
String s1 = (String S1);
String s2 = (String S2);
int i, cs1 = 0, cs2 = 0; //cs1 is count of s1, cs2 is count of s2
while(s1->c[i] != '\0'){ //basically, while there is a word
if(s1->c[i] < s2->c[i]) // if string 1 char is less than string 2 char
cs2++; //add to string 2 count
else (s1->c[i] > s2->c[i]) //vice versa
cs1++;
i++;
}
//for my return I basically have
if(cs1>cs2){
return 1;
}
else if(cs2 > cs1){
return 2;
}
return 0;
here is mystring.h
typedef struct mystring {
char * c;
int length;
int (*sLength)(void * s);
char (*charAt)(void * s, int i);
int (*compareTo)(void * s1, void * s2);
struct mystring * (*concat)(void * s1, void * s2);
struct mystring * (*subString)(void * s, int begin, int end);
void (*printS)(void * s);
} string_t;
typedef string_t * String;
Any suggestions, all of my google searches involve using the <string.h> library, so I've had no luck.
I'm using this to traverse a linked list and remove the person whose last name matches the person the user is trying to delete.
Here is my test code to help clarify my problem (Note that compareto is in the remove function):
int main() {
Node startnode, currentnode, newnode;
int ans, success;
String who;
who = newString2();
startnode = (Node) malloc(sizeof(pq_t));
startnode->next = NULL;
currentnode = startnode;
ans = menu();
while (ans != 0) {
switch (ans) {
case add:
newnode = getStudent();
startnode = insert(newnode, startnode);
break;
case remove:
printf("Enter the last name of the person you want to delete : \n");
scanf("%s", &who->c);
startnode = removeStudent(startnode, who, &success);
if (success == 0)
printf("UNFOUND\n");
else
printf("permanently DELETED\n");
break;
case view:
printf("Now displaying the list : \n");
displaylist(startnode);
break;
}
ans = menu();
}
}
Node removeStudent(Node head, String who, int * success) {
Node p, l; //p = pointer node, l = previous node
Student current; //Im using generics, so I have to case the current node->obj as a student.
String ln, cln; //the last name of the person I want to remove, and the last name of the current node
p = head;
l = p;
//there can be three cases, p->first node, p->last node, p->some node in between
if (head->obj == NULL) {
printf("The list is empty\n"); //when the list is empty
*success = 0;
return NULL;
}
while (p != NULL) {
current = (Student) p->obj;
cln = current->ln;
if (ln->compareTo(who, cln) == 0) {
if (head == p) { //when there is only one node
head = head->next;
free(p);
*success = 1;
return head;
} else if (p->next == NULL) { //last node
l->next = NULL;
free(p);
*success = 1;
return head;
} else {
l->next = p->next; //middle
free(p);
*success = 1;
return head;
}
}
l = p;
p = p->next;
}
*success = 0;
return head;//couldnt find the node
}
Try comparing the following pairs of strings:
"ABC" vs "DEF"
"ADF" vs "BBB"
"ABC" vs "CBA"
What results do you get? More importantly, why? How do these results compare to what you want to get?
(You should first work it out in your head. Work out the values of cs1 and cs2 for each step of the comparison loop.)
First, ln isn't properly initialized in the sample removeStudent(), so calling ln->compareTo will probably cause a segfault. Hopefully, ln is properly initialized in your actual code.
To define an ordering on the strings, you can first define what's known in database circles as a "collation": an ordering on characters. You can implement the collation as a function (called, say, chrcmp), or inline within your string comparison function. The important thing is to define it.
Generally speaking, an ordering on a type induces a lexicographic order on sequences of that type: to compare two sequences, find the first place they differ; the lesser sequence is the one with the lesser element at that position.
More formally, suppose sequences are indexed starting at 0. Let a and b be sequences of the base type of lengths m and n, respectively. The lexicographic order a ≤ b is:
a < b if there is an index i with a_i < b_i and a_j = b_j for all 0 ≤ j < i
a < b if a is a prefix of b
a = b if m = n and a_i = b_i for all 0 ≤ i < m
Where "a is a prefix of b" means m < n and a_i = b_i for all 0 ≤ i < m.
The advantage of this approach is you can write a comparison function that will work with any homogeneous sequence type: strings, lists of strings, arrays of integers, what-have-you. If you specialize the comparison function for null-terminated strings, you don't need to worry about the cases for prefixes; simply have '\0' be the least character in the collation.
From a general comparison function (called, say, lexicCompare), you can define
lexicCompareString (a, b):
return lexicCompare(a, b, chrcmp)
and set a String's compareTo member to lexicCompareString.