I am currently working on pset5 from cs50.
My entire program compiles successfully, but it stops in the middle of the load function when the program is executed.
Below is my load function, and you can see the comment where it gave me a segmentation fault error.
If you can help me with figuring out how I should approach my error, please do let me know.
I understand that a segmentation fault is caused when the program attempts to access memory that does not belong to it.
However, I have allocated memory and checked whether there was enough memory to continue on the program.
I will provide comments to highlight what my code does.
// In another header file, I have defined 'LENGTH'
// Maximum length for a word
// (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
#define LENGTH 45
// Represents a node in a hash table
typedef struct node
{
char word[LENGTH + 1];
struct node *next;
}
node;
// Hash table
// I have initialized the array of `node` pointer to point `NULL`
node *table[N] = {NULL};
unsigned int word_counter = 0;
bool load(const char *dictionary)
{
// Open file, and if cannot open, return false
FILE *file = fopen(dictionary, "r");
if (file == NULL)
{
return false;
}
// read string in the file into array of character, `word` until reaching end of the file
char word[LENGTH + 1];
while (fscanf(file, "%s", word) != EOF)
{
// keep track of how many word exists in the file, for later use (not in this function)
word_counter += 1;
// allocated memory for struct type `node`, if not enough memory found, return false
node *n = (node*)malloc(sizeof(node));
if (n == NULL)
{
return false;
}
// assign index by hashing (hash function will not be posted in this question though.)
unsigned int index = hash(&word[0]);
// copy the word from file, into word field of struct type `node`
strncpy(n->word, word, sizeof(word));
// Access the node pointer in this index from array(table), and check is its `next` field points to NULL or not.
// If it is pointing to NULL, that means there is no word stored in this index of the bucket
if (table[index]->next == NULL) // THIS IS WHERE PROGRAM GIVES 'segmentation fault' !!!! :(
{
table[index]->next = n;
}
else
{
n->next = table[index];
table[index]->next = n;
}
}
return true;
}
You define and initialize the hash table as:
node *table[N] = {NULL};
That means you have an array of null-pointers.
When you insert the first value in the table, table[index] (for any valid index) will be a null pointer. That means table[index]->next attempts to dereference this null pointer, and you will have undefined behavior.
You need to check for a null pointer first:
if (table[index] == NULL)
{
n->next = NULL;
}
else
{
n->next = table[index];
}
table[index] = n;
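A minimal sketch of the same insertion as a standalone function, assuming a hypothetical bucket count N of 26 and a placeholder hash (the real hash function was not posted). Because table[index] is NULL for an empty bucket, assigning it to n->next covers both branches above, so the if/else can collapse into one assignment:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45
#define N 26 /* assumption: the bucket count is not shown in the question */

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

node *table[N] = {NULL};

/* Placeholder hash (hypothetical): bucket by first letter, 0 otherwise. */
unsigned int hash(const char *word)
{
    int lower = tolower((unsigned char) word[0]);
    return (lower >= 'a' && lower <= 'z') ? (unsigned int) (lower - 'a') : 0;
}

/* Insert a copy of word at the head of its bucket. */
bool insert(const char *word)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }

    /* Copy at most LENGTH chars and terminate explicitly. */
    strncpy(n->word, word, LENGTH);
    n->word[LENGTH] = '\0';

    unsigned int index = hash(word);

    /* Works for an empty bucket too: table[index] is NULL in that case. */
    n->next = table[index];
    table[index] = n;
    return true;
}
```

The key point is that table[index] itself is read and written; table[index]->next is never touched while the bucket may still be empty.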
I just finished pset5 of cs50, and one of the functions is meant to load the contents of a dictionary into a hash table. Inside the loop in that function I have to malloc memory for a node that I will later assign to a node in the hash table.
When I tried freeing node n after each loop iteration, my function wouldn't work.
When I don't free it, it does work and, more confusingly, it also passes the valgrind check and cs50's check50 for memory leaks.
My questions are:
How would I free 'node n' and still have my function work?
Why doesn't valgrind detect any memory leaks when I don't free 'n'? Is it an example of undefined behavior?
How does malloc in a loop work? Does it allocate a new chunk of memory each time, or does it overwrite the previous chunk of memory?
Any answers would be greatly appreciated.
Here is the code :
bool load(const char *dictionary)
{
//Setting counter to determine whether node comes second in linked list or not.
int counter = 0;
//declaring string array to store words from dictionary
char word1[LENGTH +1];
FILE *dic = fopen(dictionary, "r");
if(dic == NULL)
{
return false;
}
//Loop loading words from dictionary to hash table
while(fscanf(dic, "%s", word1) != EOF )
{
node *n = malloc(sizeof(node));
if (n == NULL)
{
return false;
free(n);
}
int i = hash(word1);
//Storing word in temporary node
strcpy(n->word, word1);
n->next = NULL;
//Three different conditions(first node of[i], second node of[i], and after second node of[i])
if(table[i] == NULL)
{
table[i] = n;
counter++;
counter2++;
}
else if (counter == 1)
{
table[i]->next = n;
counter = 0;
counter2++;
}
else
{
n->next = table[i];
table[i] = n;
counter2++;
}
}
fclose(dic);
return true;
}
You don't free memory in load. You free it in unload. That was the whole point.
If valgrind doesn't detect memory leaks, then presumably you have a working unload function. Why would it be undefined behaviour?
It will allocate new memory every time. This wouldn't work if it didn't.
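To illustrate these points, here is a minimal sketch, assuming the node type from the first question and a hypothetical bucket count N of 26. The helper add and the return type of unload are choices made for this sketch, not cs50's actual interface. Each malloc call returns a distinct block, which is why the list keeps growing, and a separate unload releases every node once the table is no longer needed:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45
#define N 26 /* assumption: bucket count chosen for this sketch */

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

node *table[N];

/* Add word to bucket i (the caller picks the bucket in this sketch). */
bool add(unsigned int i, const char *word)
{
    node *n = malloc(sizeof(node)); /* a fresh block on every call */
    if (n == NULL)
    {
        return false;
    }
    strncpy(n->word, word, LENGTH);
    n->word[LENGTH] = '\0';
    n->next = table[i];
    table[i] = n; /* the table now owns n, so it must not be freed here */
    return true;
}

/* Free every node in every bucket; returns how many nodes were freed. */
unsigned int unload(void)
{
    unsigned int freed = 0;
    for (int i = 0; i < N; i++)
    {
        while (table[i] != NULL)
        {
            node *tmp = table[i]->next; /* save the link before freeing */
            free(table[i]);
            table[i] = tmp;
            freed++;
        }
    }
    return freed;
}
```

Freeing n inside the load loop would leave table[i] pointing at released memory, which is why the function stopped working.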
I'm trying to create a program that reads a dictionary and stores its words in a hash table, then reads another file and checks whether each word of that file is in the hash table; if it is not, the word is output as misspelled. I'm first trying to check that I can load the dictionary file into my hash table and then print the words in the hash table, but my code crashes whenever I run it. The hash function I use was taken from the Internet. I'm also still very new to data structures and having a hard time understanding them.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node
{
char* word;
struct node *next;
}
node;
node *table[10];
// hash function
unsigned int hash(char *word)
{
// TODO
unsigned int hash = 5381;
int c = 0;
while (c == *word++)
hash = ((hash << 5) + hash) + c;
return hash % 10;
}
int main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++)
{
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL)
{
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL)
{
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata))
{
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL)
{
printf("here\n");
return 1;
}
// make the new node the new head of the list
strcpy(newNode->word, words);
newNode->next = table[index];
table[index] = newNode;
// free memory
free(newNode);
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++)
{
node *tmp = table[i];
while (tmp->next != NULL)
{
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
// loop to free all memory of the hash table
for (int i = 0; i < N; i++)
{
if (table[i] != NULL)
{
node *tmp = table[i]->next;
free(table[i]);
table[i] = tmp;
}
}
// close the file
fclose(indata);
}
At least three bugs that independently caused a segfault:
First, newNode->word is used uninitialized, so it points to random memory, so the strcpy would segfault. Better to use strdup.
Also, after you put newNode in the table, you do free(newNode) making what it points to invalid. This causes the second loop to segfault
Third, in the second loop, if table[i] is null, the while (tmp->next != NULL) will segfault
I've annotated and corrected your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node {
char *word;
struct node *next;
} node;
node *table[10];
// hash function
unsigned int
hash(char *word)
{
// TODO
unsigned int hash = 5381;
int c = 0;
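// NOTE/BUG: "==" compares instead of assigning, so c never receives the
// character and the hash does not depend on the word
#if 0
while (c == *word++)
#else
while ((c = *word++) != 0)
#endif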
hash = ((hash << 5) + hash) + c;
// NOTE: not a bug but probably better
#if 0
return hash % 10;
#else
return hash % N;
#endif
}
int
main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++) {
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL) {
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL) {
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata)) {
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL) {
printf("here\n");
return 1;
}
// make the new node the new head of the list
// NOTE/BUG: word is never set to anything valid -- possible segfault here
#if 0
strcpy(newNode->word, words);
#else
newNode->word = strdup(words);
#endif
newNode->next = table[index];
table[index] = newNode;
// free memory
// NOTE/BUG: this will cause the _next_ loop to segfault -- don't deallocate
// the node you just added to the table
#if 0
free(newNode);
#endif
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++) {
node *tmp = table[i];
// NOTE/BUG: this test fails if the tmp is originally NULL (i.e. no entries
// in the given hash index)
#if 0
while (tmp->next != NULL) {
#else
while (tmp != NULL) {
#endif
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
// loop to free all memory of the hash table
for (int i = 0; i < N; i++) {
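// NOTE/BUG: "if" frees only the first node of each bucket, and the word
// allocated by strdup must be freed as well
#if 0
if (table[i] != NULL) {
#else
while (table[i] != NULL) {
#endif
node *tmp = table[i]->next;
free(table[i]->word);
free(table[i]);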
table[i] = tmp;
}
}
// close the file
fclose(indata);
}
UPDATE:
I made a linked list program before that stores an integer in the list, int number; struct node *next; and I used newNode->number = 5; and it worked, why is it in this case it doesn't?? Is it because I am working with strings here??
The difference is that word is a pointer. It must be assigned a value before it can be used. strcpy does not assign a value to word. It tries to use the contents of word as the destination address of the copy.
But, the other two bugs happen regardless of word being a char * vs number being int.
If you had defined word not as a pointer, but as a fixed array [not as good in this usage], the strcpy would have worked. That is, instead of char *word;, if you had done (e.g.) char word[5];
But, what you did is better [with the strdup change] unless you can guarantee that the length of word can hold the input. strdup will guarantee that.
But, notice that I [deliberately] made word have only five chars to illustrate the problem. It means that the word to be added can only be 4 characters in length [we need one extra byte for the nul terminator character]. You'd need to use strncpy instead of strcpy but strncpy has issues [it does not guarantee to add the nul char at the end if the source length is too large].
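A minimal sketch of working around that strncpy issue, using a hypothetical helper named copy_word that is not part of the original code. It copies at most one byte less than the destination size and writes the terminator itself, so the result is always a valid string even when the source is too long:

```c
#include <stdbool.h>
#include <string.h>

/* Copy src into dst (capacity dst_size bytes), always nul-terminating.
   Returns false if src did not fit and was truncated.
   copy_word is a hypothetical helper introduced for illustration. */
bool copy_word(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
    {
        return false;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0'; /* strncpy does not add this on truncation */
    return strlen(src) < dst_size;
}
```

With char word[5], a call such as copy_word(word, sizeof word, "forest") stores "fore" and reports the truncation instead of writing past the array.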
Coincidentally, there is another question today that has an answer that may help shed some more light on the differences of your word struct member: Difference between memory allocations of struct member (pointer vs. array) in C
From a cursory glance I can see two problems:
You don't allocate space for your word in the node; you simply strcpy the word into an undefined pointer. You might want to use strdup instead.
You free the memory of the node after you added it to the list. The table is an array of pointers, so you store the pointer in the table and then throw away the memory that it points to.
Oh, three: and in the final loop you free the unallocated memory again...
code from cs50 harvard course dealing with linked list:
---The problem I do not understand is that when node *ptr points to numbers, which is a null pointer, how can the for loop: (node *ptr = numbers; ptr != NULL) run at all since *numbers = NULL?---
full version of the codes can be found at: https://cdn.cs50.net/2017/fall/lectures/5/src5/list2.c
#include <cs50.h>
#include <stdio.h>
typedef struct node
{
int number;
struct node *next;
}
node;
int main(void)
{
// Memory for numbers
node *numbers = NULL;
// Prompt for numbers (until EOF)
while (true)
{
// Prompt for number
int number = get_int("number: ");
// Check for EOF
if (number == INT_MAX)
{
break;
}
// Check whether number is already in list
bool found = false;
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
if (ptr->number == number)
{
found = true;
break;
}
}
The loop is to check for prior existence in the list actively being built. If not there (found was never set true), the remaining inconveniently omitted code adds it to the list.
On initial run, the numbers linked list head pointer is null, signifying an empty list. That doesn't change the algorithm of search + if-not-found-insert whatsoever. It just means the loop is never entered because the bail-case is immediately true. In other words, with numbers being NULL
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
the condition to continue, ptr != NULL is already false, so the body of the for-loop is simply skipped. That leads to the remainder of the code you didn't post, which does the actual insertion. After that insertion, the list now has something, and the next iteration of the outer-while loop will eventually scan the list again after the next prospect value is read. This continues until the outer-while condition is no longer satisfied.
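A small sketch that makes this observable, assuming the node type from the question; count_iterations is a hypothetical helper that only counts how many times the loop body runs for a given head pointer:

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

/* Count how many times the loop body executes for this head pointer. */
int count_iterations(const node *numbers)
{
    int iterations = 0;
    for (const node *ptr = numbers; ptr != NULL; ptr = ptr->next)
    {
        iterations++;
    }
    return iterations;
}
```

Calling it with NULL returns 0: the condition is tested before the first pass, so ptr->next is never evaluated on an empty list.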
A Different Approach
I have never been fond of the cs50 development strategy, and Harvard's technique for teaching C to entry-level CS students. The cs50 header and lib has caused more transitional confusion to real-world software engineering than one can fathom. Below is an alternative for reading a linked list of values, keeping only unique entries. It may look like a lot, but half of this is inline comments describing what is going on. Some of it will seem trivial, but the search-and-insert methodology is what you should be focusing on. It uses a strategy of pointer-to-pointer that you're likely not familiar with, and this is a good exposure.
Enjoy.
#include <stdio.h>
#include <stdlib.h>
struct node
{
int value;
struct node *next;
};
int main()
{
struct node *numbers = NULL;
int value = 0;
// retrieve list input. stop when we hit
// - anything that doesn't parse as an integer
// - a value less than zero
// - EOF
while (scanf("%d", &value) == 1 && value >= 0)
{
// finds the address-of (not the address-in) the first
// pointer whose node has a value matching ours, or the
// last pointer in the list (which points to NULL).
//
// note the "last" pointer will be the head pointer if
// the list is empty.
struct node **pp = &numbers;
while (*pp && (*pp)->value != value)
pp = &(*pp)->next;
// if we didn't find our value, `pp` holds the address of
// the last pointer in the list. Again, not a pointer to the
// last "node" in the list; rather the last actual "pointer"
// in the list. Think of it as the "next" member of last node,
// and in the case of an empty list, it will be the address of
// the head pointer. *That* is where we will be hanging our
// new node, and since we already know where it goes, there is
// no need to rescan the list again.
if (!*pp)
{
*pp = malloc(sizeof **pp);
if (!*pp)
{
perror("Failed to allocate new node");
exit(EXIT_FAILURE);
}
(*pp)->value = value;
(*pp)->next = NULL;
}
}
// display entire list, single line
for (struct node const *p = numbers; p; p = p->next)
printf("%d ", p->value);
fputc('\n', stdout);
// free the list
while (numbers)
{
struct node *tmp = numbers;
numbers = numbers->next;
free(tmp);
}
return EXIT_SUCCESS;
}
This approach is especially handy when building sorted lists, as it can be altered with just a few changes to do so.
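One way those changes might look, as a sketch rather than the only possible variant: the search stops at the first node whose value is not smaller than the new one, and the new node is linked in front of it. The function name insert_sorted is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

struct node
{
    int value;
    struct node *next;
};

/* Insert value into an ascending list, skipping duplicates.
   Returns false only if allocation fails. */
bool insert_sorted(struct node **head, int value)
{
    struct node **pp = head;

    /* stop at the first pointer whose node is >= value, or at the end */
    while (*pp && (*pp)->value < value)
        pp = &(*pp)->next;

    if (*pp && (*pp)->value == value)
        return true; /* already present, nothing to do */

    struct node *n = malloc(sizeof *n);
    if (!n)
        return false;

    n->value = value;
    n->next = *pp; /* the rest of the list follows the new node */
    *pp = n;       /* the pointer we stopped at now leads to the new node */
    return true;
}
```

Compared with the unique-append version, only the loop condition and the way the new node is linked differ; the head pointer still needs no special case.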
If you examine the rest of the code at the shared link, which is also inside the while loop, you can see where numbers is modified:
if (!found)
{
// Allocate space for number
node *n = malloc(sizeof(node));
if (!n)
{
return 1;
}
// Add number to list
n->number = number;
n->next = NULL;
if (numbers)
{
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
if (!ptr->next)
{
ptr->next = n;
break;
}
}
}
else
{
numbers = n;
}
}
Also, the body of the for loop is not entered on the first pass, so your thinking is correct.
I keep getting segfault for my load function.
bool load(const char *dictionary)
{
//create a trie data type
typedef struct node
{
bool is_word;
struct node *children[27]; //this is a pointer too!
}node;
//create a pointer to the root of the trie and never move this (use traversal *)
node *root = malloc(sizeof(node));
for(int i=0; i<27; i++)
{
//NULL point all indexes of root -> children
root -> children[i] = NULL;
}
FILE *dptr = fopen(dictionary, "r");
if(dptr == NULL)
{
printf("Could not open dictionary\n");
return false;
}
char *c = NULL;
//scan the file char by char until end and store it in c
while(fscanf(dptr,"%s",c) != EOF)
{
//in the beginning of every word, make a traversal pointer copy of root so we can always refer back to root
node *trav = root;
//repeat for every word
while ((*c) != '\0')
{
//convert char into array index
int alpha = (tolower(*c) - 97);
//if array element is pointing to NULL, i.e. it hasn't been open yet,
if(trav -> children[alpha] == NULL)
{
//then create a new node and point it with the previous pointer.
node *next_node = malloc(sizeof(node));
trav -> children[alpha] = next_node;
//quit if malloc returns null
if(next_node == NULL)
{
printf("Could not open dictionary");
return false;
}
}
else if (trav -> children[alpha] != NULL)
{
//if an already existing path, just go to it
trav = trav -> children[alpha];
}
}
//a word is loaded.
trav -> is_word = true;
}
//success
free(root);
return true;
}
I checked whether I properly pointed new pointers to NULL during initialization. I have three types of nodes: root, traversal (for moving), and next_node.
(i.) Am I allowed to null point the nodes before mallocing them?
(ii.) Also, how do I free 'next_node' if that node is initialized and malloced inside an if statement? node *next_node = malloc(sizeof(node));
(iii.) If I want to set the nodes as global variables, which ones should be global?
(iv.) Lastly, where do I set global variables: inside the main of speller.c, outside its main, or somewhere else?
That's a lot of questions, so you don't have to answer all of them, but it would be nice if you could answer the ones you can! Please point out any other peculiarities in my code. There should be plenty. I will accept most answers.
The cause of the segmentation fault is the pointer "c", for which you have not allocated any memory.
Also, in your program -
//scan the file char by char until end and store it in c
while(fscanf(dptr,"%s",c) != EOF)
Once you allocate memory to pointer c, c will hold the word read from file dictionary.
Below in your code, you are checking for '\0' character-
while ((*c) != '\0')
{
But you are not moving the c pointer to point to the next character in the string that was read, so this code will end up executing an infinite while loop.
Maybe you can try something like this-
char *tmp;
tmp = c;
while ((*tmp) != '\0')
{
......
......
//Below in the loop at appropriate place
tmp++;
}
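Putting both fixes together, here is a hedged sketch of how the loading could be structured. It assumes a maximum word length of 45 and that the 27th child slot is meant for the apostrophe; the helper names new_node, slot, trie_insert and load_words are made up for this example. The word is read into a real buffer, and a separate pointer walks through it one character at a time:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define LENGTH 45 /* assumption: longest word; keep in sync with "%45s" below */

typedef struct node
{
    bool is_word;
    struct node *children[27];
}
node;

/* Allocate a node with is_word false and every child set to NULL. */
node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        n->is_word = false;
        for (int i = 0; i < 27; i++)
        {
            n->children[i] = NULL;
        }
    }
    return n;
}

/* Map a character to a child index; slot 26 is assumed to be the apostrophe.
   Returns -1 for any other character. */
int slot(char ch)
{
    int lower = tolower((unsigned char) ch);
    if (lower >= 'a' && lower <= 'z')
    {
        return lower - 'a';
    }
    return ch == '\'' ? 26 : -1;
}

/* Add one word to the trie; false on bad character or failed allocation. */
bool trie_insert(node *root, const char *word)
{
    node *trav = root;
    for (const char *p = word; *p != '\0'; p++) /* p advances, so the loop ends */
    {
        int i = slot(*p);
        if (i < 0)
        {
            return false;
        }
        if (trav->children[i] == NULL)
        {
            trav->children[i] = new_node();
            if (trav->children[i] == NULL)
            {
                return false;
            }
        }
        trav = trav->children[i]; /* move down whether the node is new or not */
    }
    trav->is_word = true;
    return true;
}

/* Read whitespace-separated words from fp; returns the count, or -1 on failure. */
int load_words(FILE *fp, node *root)
{
    char word[LENGTH + 1]; /* real storage instead of a NULL pointer */
    int count = 0;
    while (fscanf(fp, "%45s", word) == 1) /* width keeps fscanf inside word */
    {
        if (!trie_insert(root, word))
        {
            return -1;
        }
        count++;
    }
    return count;
}
```

Note that the root must stay allocated after loading so that later lookups can use it; it would be freed together with the rest of the trie in a separate cleanup step.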
I am working on a problem in the K&R book (#6.3) where the user inputs a sequence of words, and you have to create a list of these words along with the lines that each one appears on. It's supposed to involve structures so these are the ones I have right now:
struct entry {
int line;
int count;
struct entry *next;
};
struct word {
char *str;
struct entry *lines;
struct word *next;
};
static struct word *wordlist = NULL; // GLOBAL WORDLIST
However when I input something and the program tries to add a new entry to the structure (which is somewhat like a linked list), there is a problem and the program terminates with no error message. Code for that:
void add_entry(char *word, int line)
{
if (word == NULL || line <= 0 || is_blocked_word(word))
return;
struct word *w;
for (w = wordlist; w != NULL && w->next != NULL && !strcmp(w->str, word); w = w->next);
// If word is found in the wordlist, then update the entry
if (w != NULL) {
struct entry *v;
for (v = w->lines; v != NULL && v->next != NULL && v->line != line; v = v->next);
if (v == NULL) {
struct entry *new = (struct entry*) malloc(sizeof(struct entry));
new->line = line;
new->count = 1;
new->next = NULL;
if (w->lines == NULL)
w->lines = new;
else
v->next = new;
}
else v->count++;
}
// If word is not found in the word list, then create a new entry for it
else {
struct word *new = (struct word*) malloc(sizeof(struct word));
new->lines = (struct entry*) malloc(sizeof(struct entry));
new->next = NULL;
new->str = (char*) malloc(sizeof(char) * strlen(word));
new->lines->line = line;
new->lines->count = 1;
new->lines->next = NULL;
strcpy(new->str, word);
// If the word list is empty, then populate head first before populating the "next" entry
if (wordlist == NULL)
wordlist = new;
else
w->next = new;
}
}
The program terminates even after adding just the first word to wordlist. This is on the line that says if (wordlist == NULL) wordlist = new; where new contains the pointer to a valid structure that I malloc'ed. How can this be possible?
As far as I know it's a problem with my pointer usage but I'm not sure where exactly it lies. Can someone help?
Some fairly evident, and some not-so-evident things.
The for-loop limit for w stops one short
for (w = wordlist; w != NULL && w->next != NULL && !strcmp(w->str, word); w = w->next);
This will start with the first and continue until
We have run out of nodes
We have almost (one short) run out of nodes.
The word in the current node does NOT match
Almost the same problem, different for-loop
for (v = w->lines; v != NULL && v->next != NULL && v->line != line; v = v->next);
As above, this has similar attributes (but not the third option, as this correctly continues so long as the line numbers do not match). The prior loop broke as soon as any word did not match.
And that is in the first ten lines of this function.
String allocation size fails to account for the nulchar terminator
This falls short by one char of the allocation size needed for a zero-terminated string:
malloc(sizeof(char) * strlen(word))
You always need space for the terminator. The easiest way to remember that is to consider how many chars are needed for a zero-length C string? Answer: one, because the terminator needs to go somewhere. After that is simply length+1
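A minimal sketch of that length+1 rule, using a hypothetical helper named copy_string (where it is available, POSIX strdup does the same job):

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of s, or NULL if allocation fails.
   copy_string is a hypothetical helper introduced for illustration. */
char *copy_string(const char *s)
{
    size_t len = strlen(s);       /* does not count the terminator */
    char *copy = malloc(len + 1); /* +1 so the terminator has a home */
    if (copy != NULL)
    {
        memcpy(copy, s, len + 1); /* copies the terminator as well */
    }
    return copy;
}
```

Even for the empty string this allocates one byte, which is exactly the zero-length case described above.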
One possible way to do this is via a pointer-to-pointer approach, shown below:
void add_entry(const char *word, int line)
{
if (word == NULL || line <= 0 || is_blocked_word(word))
return;
struct word **pp = &wordlist;
for (; *pp && strcmp((*pp)->str, word); pp = &(*pp)->next);
if (*pp)
{
// search for matching line number
struct entry **vv = &(*pp)->lines;
for (; *vv && (*vv)->line != line; vv = &(*vv)->next);
if (!*vv)
{
*vv = malloc(sizeof(**vv));
if (!*vv)
{
perror("Failed to allocate line entry.");
exit(EXIT_FAILURE);
}
(*vv)->count = 1;
(*vv)->line = line;
(*vv)->next = NULL;
}
else
{ // found an entry. increment count.
(*vv)->count++;
}
}
else
{ // no matching word. create a new word with a new line entry
size_t len = strlen(word);
*pp = malloc(sizeof(**pp));
if (!*pp)
{
perror("Failed to allocate word entry.");
exit(EXIT_FAILURE);
}
(*pp)->lines = malloc(sizeof(*(*pp)->lines));
if (!(*pp)->lines)
{
perror("Failed to allocate line count entry.");
exit(EXIT_FAILURE);
}
(*pp)->str = malloc(len + 1);
if (!(*pp)->str)
{
perror("Failed to allocate word string entry.");
exit(EXIT_FAILURE);
}
(*pp)->lines->count = 1;
(*pp)->lines->line = line;
(*pp)->lines->next = NULL;
(*pp)->next = NULL;
memcpy((*pp)->str, word, len+1);
}
}
How It Works
In both cases, we use a pointer-to-pointer. They are a most-handy construct when the desire is to perform tail-end insertion on a linked list without having to keep a "one-back" or "previous" pointer. Just like any pointer, they hold an address. Unlike a regular pointer-to-something, a pointer-to-pointer-to-something holds the address of another pointer. With it we can "loop" by initially setting it to the address of the head pointer, then entering the search.
struct word **pp = &wordlist;
for (; *pp && strcmp((*pp)->str, word); pp = &(*pp)->next);
Here we start with the address of our head pointer. The loop will terminate if the pointer at the address held in pp is NULL, or if the word actually matches. Otherwise it sets pp to the address of (not the address in) the next pointer of the current node. If we run out of words and never find a match the loop will break, but with a most-handy consequence: pp contains the address of the pointer that we need to set to the new allocation. If the list were initially empty, it contains the address of the head pointer.
With that, we can then do this:
if (*pp)
{
// search for matching line number
struct entry **vv = &(*pp)->lines;
for (; *vv && (*vv)->line != line; vv = &(*vv)->next);
Notice we use the same idea on the line-entry list. Either we're going to find an entry, or the loop will exit with *vv being NULL, and vv contains the address of the next pointer we want to set to our new allocation.
I strongly urge you to step through this code in a debugger line-by-line, and understand how it works. Utilizing this technique has many redeeming qualities, among them the incredibly brief method of populating a forward-linked list in O(n) complexity without having to check for a head pointer or walking the list for each insertion, while retaining the original order (as opposed to reversing the order as a stack-like solution would):
struct node *head = NULL;
struct node **pp = &head;
while (get-data-for-our-list)
{
*pp = malloc(sizeof(**pp));
// TODO: populate (*pp)->members here
pp = &(*pp)->next;
}
*pp = NULL;
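A runnable variant of that skeleton, as a sketch in which the data source is simply an array of ints and the node holds a single value member (both are assumptions made for illustration; build_list is a hypothetical name):

```c
#include <stdlib.h>

struct node
{
    int value;
    struct node *next;
};

/* Build a list holding values[0..count-1] in their original order.
   Returns the head, or NULL if count is 0 or an allocation fails. */
struct node *build_list(const int *values, size_t count)
{
    struct node *head = NULL;
    struct node **pp = &head;

    for (size_t i = 0; i < count; i++)
    {
        *pp = malloc(sizeof(**pp));
        if (!*pp)
        {
            /* *pp is NULL here, so the partial list is already terminated
               and can be released safely */
            while (head)
            {
                struct node *tmp = head;
                head = head->next;
                free(tmp);
            }
            return NULL;
        }
        (*pp)->value = values[i];
        pp = &(*pp)->next; /* pp now addresses the tail's next pointer */
    }
    *pp = NULL; /* terminate the list (or leave head NULL if it is empty) */
    return head;
}
```

Each element is appended in constant time because pp always holds the address of the pointer that the next node must be stored in.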