EDIT: So, it turns out that 'index' was not being returned to 0. Well then. That fixed one segfault. But still getting a different segfault. Working on it.
node* new_node(void){
node* ptr = malloc(sizeof(node));
for (int i = 0; i<27; i++) {
ptr->next[i] = NULL;
}
return ptr;
}
bool load(const char* dictionary)
{
FILE* dict = fopen(dictionary, "r");
node* ptr = new_node;
char word[LENGTH+1];
int index = 0;
for (int c = fgetc(dict); c!=EOF; c = fgetc(dict)){
if(c!='\n'){
word[index]=c;
index++;
}
else {
for(int x=0; x<=index; x++){
int ch = (word[x] == '\'') ? 26 : tolower(word[x])-'a';
if (ptr->next[ch] == NULL){
ptr->next[ch] = new_node;
}
ptr = ptr->next[ch];
}
ptr->end=true;
}
}
return true;
}
I'm trying to implement a trie data structure for a dictionary but my program seems to segfault somewhere in this function. I can't seem to pin it down even with the help of GDB, so can someone give me a hand?
Node is defined as such:
typedef struct node{
bool end;
struct node* next[27];
} node;
Dictionary file:
a
aaa
aaas
aachen
aalborg
aalesund
aardvark
aardvark's
aardvarks
aardwolf
(...)
You have many issues in your code:
When you allocate memory with malloc, it is uninitialised. Initialise it directly after allocating it, so that NULL pointers really are null. (calloc, a cousin of malloc, initialises all memory to zero.)
When you loop over the word, you should not include index:
for (int x = 0; x < index; x++) ...
When you have found the end of a word, you must reset the index to 0. Otherwise, you will append to the old word and overflow the buffer. (You should probably also enforce the upper bound of index.)
Likewise, when you insert a word into the trie, you must reset your pointer for trie traversal to the trie's root. You need two pointers here: A root node pointer and an auxiliary pointer for traversing the trie.
As is, your trie is local to your function. Return the root node, so that other functions can use the trie, or NULL on failure.
Fix these, and you will have a non-crashing function. (It still leaks memory and may not construct the trie properly.)
node *load(const char *dictionary)
{
FILE *dict = fopen(dictionary, "r");
node *head = calloc(1, sizeof(node));
char word[LENGTH + 1];
int index = 0;
for (int c = fgetc(dict); c != EOF; c = fgetc(dict)) {
if (c != '\n') {
word[index] = c;
index++;
} else {
node *ptr = head;
for (int x = 0; x < index; x++) {
int ch = (word[x] == '\'') ? 26 : tolower(word[x]) - 'a';
if (ptr->next[ch] == NULL) {
ptr->next[ch] = calloc(1, sizeof(node));
}
ptr = ptr->next[ch];
}
ptr->end = true;
index = 0;
}
}
return head;
}
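The remark about leaked memory can be addressed with a recursive clean-up once the trie is no longer needed. This is only a minimal sketch, assuming the node type from the question; trie_free is a hypothetical name that does not appear in the original code, and it returns the number of nodes it released purely so the result can be checked.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    bool end;
    struct node *next[27];
} node;

/* Hypothetical helper: frees a trie node and everything below it.
   Returns the number of nodes released. */
size_t trie_free(node *n)
{
    if (n == NULL) {
        return 0;
    }
    size_t freed = 1;
    for (int i = 0; i < 27; i++) {
        freed += trie_free(n->next[i]);   /* children first */
    }
    free(n);                              /* then the node itself */
    return freed;
}
```

Calling trie_free(head) on the pointer returned by load would release the whole structure; the recursion depth is bounded by the longest word.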
The line:
node* ptr = new_node;
and
ptr->next[ch] = new_node;
are not calling the function, but assigning the address of the function to ptr. Call the function instead.
This problem could have been caught early if the compiler warnings -Wall and -Wextra had been enabled.
There is no bounds checking done on the array word. Use the value LENGTH to check if the index is in bounds before using it.
It isn't clear what the if statement inside the for loop is doing. It appears that every time a newline is found the whole array word is added to the tree, but the index isn't reset so the same array is added multiple times. At some point index will point out of bounds causing undefined behavior. You should reset index after you use the array word.
You forgot to reset index to 0 at the beginning of the loop.
You should also use calloc(1, sizeof(node)) instead of malloc(sizeof(node)) to avoid leaving memory uninitialized. I suggest you use valgrind to help you track problems of this kind in your code.
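Putting the calloc suggestion together with the "call the function" point, new_node could look roughly like this. It is a sketch that assumes the node type from the question; the caller still has to check for NULL.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    bool end;
    struct node *next[27];
} node;

/* Returns a zeroed node (end == false, all children NULL),
   or NULL if the allocation fails. */
node *new_node(void)
{
    return calloc(1, sizeof(node));
}
```

Note the parentheses at the call site: node *ptr = new_node(); calls the function, whereas node *ptr = new_node; only takes its address.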
You should filter punctuation and unsupported characters more carefully. Any character other than a letter, an apostrophe, or the newline produces an out-of-range index and can crash your program because of
int ch = (word[x] == '\'') ? 26 : tolower(word[x])-'a';
if (ptr->next[ch] == NULL){
Given that you open a file, there might be a space somewhere or some unexpected character. You need something like
if(c!='\n'){
int num = (c == '\'') ? 26 : tolower(c) - 'a';
if(num >=0 && num < 27)
{
word[index]=c;
index++;
}
}
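The same check can be pulled out into a small function so that both the reader and the trie insertion use it. This is a sketch only: char_to_index is a made-up name, and it assumes an ASCII-compatible character set in which 'a' to 'z' are contiguous.

```c
#include <ctype.h>

/* Hypothetical helper: maps a character to a trie slot.
   Letters (either case) -> 0..25, apostrophe -> 26, anything else -> -1. */
int char_to_index(int c)
{
    if (c == '\'') {
        return 26;
    }
    /* The cast keeps the argument to tolower() in unsigned char range. */
    int lower = tolower((unsigned char)c);
    if (lower >= 'a' && lower <= 'z') {
        return lower - 'a';
    }
    return -1;   /* space, digit, punctuation, ... */
}
```

A caller would skip (or reject) any character for which the function returns -1 instead of indexing next[] with it.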
I'm taking CS50 and am currently on problem set 'speller.c'. Sorry if this is a simple fix, but I'm just barely understanding C.
I've written all of the functions, but when I try to run the program I get a segmentation fault. debug50 tells me that the segmentation fault has something to do with the hash table, that is, the array of a custom data type called node. It happens when I try to run an if statement, "if (table[hashnum]->next == NULL)", in the 'load' function.
I've looked online as to why the segfault could be happening, but from what I understand, it happens when I access freed pointers, when I don't have enough space, when I try to access memory I'm not allowed to access, or when I try to write to a 'read only' part of memory. Also, from what I understand about initializing global arrays, the one made at the beginning of the program should allow me to read and write in it, so I'm not sure what I'm doing wrong.
Any help and explanations are appreciated, thanks :)
Below is my code.
Also, the guidelines and goal of the problem set are found at https://cs50.harvard.edu/x/2023/psets/5/speller/.
// Implements a dictionary's functionality
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include "dictionary.h"
// Represents a node in a hash table
typedef struct node
{
char word[LENGTH + 1];
struct node *next;
}
node;
// TODO: Choose number of buckets in hash table
const unsigned int N = 26;
// Hash table
node *table[N];
// Global Variables
char tmp[LENGTH + 1];
int wordcount = 0;
bool hashnull = false;
// New function prototype
void freehash(node* node);
// Returns true if word is in dictionary, else false
bool check(const char *word)
{
// Runs word through hash function
int hashnum = hash(word);
node * tmpnode = table[hashnum]->next;
// Compares all words in linked list given by hashnum to 'word' to see if any match/if 'word' is spelled correctly
while (strcasecmp(word, tmpnode->word) != 0)
{
if (tmpnode->next == NULL)
{
return false;
}
tmpnode = tmpnode->next;
}
return true;
}
// Hashes word to a number
unsigned int hash(const char *word)
{
// TODO: Improve this hash function
return toupper(word[0]) - 'A';
}
// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
// Opens dictionary
FILE *dict = fopen(dictionary, "r");
// Checks if dictionary opened successfully
if (dict == NULL)
{
fclose(dict);
printf("\n\n Could not load dictionary.\n");
return false;
}
// Load dictionary into hash table
node *tmpnode;
int hashnum;
char check;
while(fread(&check, sizeof(char), 1, dict))
{
fseek(dict, -1, SEEK_CUR);
//// Take a single word from dictionary and put into dictword (new node)
// Make new node/element to add to hash table
node *dictword = malloc(sizeof(node));
if (dictword == NULL)
{
fclose(dict);
free(dictword);
return false;
}
tmpnode = dictword;
// Iterates through all of the letters in a word from dictionary and ends when next line (/n) is read
int tmpctr = 0;
while(fread(&dictword->word[tmpctr], sizeof(char), 1, dict))
{
if(dictword->word[tmpctr] == '\n')
{
break;
}
else
{
tmpctr++;
}
}
// Run hash function to see where in hash table new word will go
hashnum = hash(dictword->word);
//// Put new node for current word into hash table
if (table[hashnum]->next == NULL)
{
table[hashnum]->next = tmpnode;
dictword->next = NULL;
}
else
{
tmpnode = table[hashnum]->next;
dictword->next = tmpnode;
table[hashnum]->next = dictword;
}
// Increase wordcount
wordcount++;
}
// Close dictionary stream (after success)
fclose(dict);
// ***** NO NEED TO MALLOC HERE; THATS WHAT THE UNLOAD FUNCTION IS FOR
return true;
}
// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
if (wordcount > 0)
{
return wordcount;
}
else
{
return 0;
}
}
// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
node * tmpnode;
for(int i = 0; i < N; i++)
{
tmpnode = table[i]->next;
freehash(tmpnode);
}
return true;
}
void freehash(node* node)
{
// If node currently being 'freed' isn't the last node, call freehash() with next node in line
if (node->next != NULL)
{
freehash(node->next);
}
free(node);
return;
}
load(): If fopen() fails, you call fclose(NULL), which is undefined behavior and typically segfaults.
load(): As node *table[N] is a global variable it is zero initialized. In load() you do table[hashnum]->next which segfaults as table[hashnum] is NULL. Maybe you want:
if (!table[hashnum]) {
table[hashnum] = tmpnode;
} else if (table[hashnum]->next) {
As an aside, minimize the scope of tmpnode to just where it's needed (which is the case where ->next is set). This makes your code easier to read.
load(): You currently include the '\n' from the file in the word, but you probably shouldn't; you want to NUL-terminate your string instead:
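One way to avoid the NULL dereference altogether is to treat table[hashnum] itself as the head pointer and always insert at the front; an empty bucket then needs no special case. The following is a sketch under assumptions: bucket_push is a hypothetical helper, and LENGTH is set to 45 here only for illustration (the real value comes from dictionary.h).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45   /* assumed for this sketch */
#define N 26

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *table[N];   /* zero-initialised: every bucket starts as NULL */

/* Hypothetical helper: copies word into a new node and pushes it onto
   the front of bucket b. Returns false if b is out of range, the word
   is too long, or the allocation fails. */
bool bucket_push(unsigned int b, const char *word)
{
    if (b >= N || strlen(word) > LENGTH) {
        return false;
    }
    node *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    strcpy(n->word, word);
    n->next = table[b];   /* NULL when the bucket was empty */
    table[b] = n;
    return true;
}
```

With this layout, check and unload would start walking at table[i] rather than table[i]->next.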
size_t i = 0;
for(; i < LENGTH && fread(&dictword->word[i], 1, 1, dict) && dictword->word[i] != '\n'; i++);
dictword->word[i] = '\0';
load(): The file reading logic is kinda odd: you read one byte, back up the file pointer, then read letter by letter without checking whether the word is too long. As @Barmar suggests, just get a line with fgets(), then use strcspn() to replace the \n with a \0.
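The fgets()/strcspn() approach can be sketched as follows. read_word is a hypothetical name; the buffer passed in should have room for the longest word plus the newline plus the terminator (LENGTH + 2 bytes in the question's terms).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity size bytes)
   and strips the trailing newline, if any.
   Returns false at end of file or on a read error. */
bool read_word(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL) {
        return false;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* no-op when there is no newline */
    return true;
}
```

The load loop then becomes while (read_word(dict, buf, sizeof buf)) { ... }, with no fseek() and no byte-by-byte reads.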
Code also has subtle bugs beyond CS50 expectations.
OP hashed with return toupper(word[0]) - 'A';
This should be done as return toupper(((unsigned char *)word)[0]) - 'A'; as char may be signed with word[0] < 0, and toupper(int) is defined only for unsigned char values and EOF. Characters should be examined as if they are unsigned char even when char is signed.
strcasecmp(), although not a standard function, more often converts to lower case (rather than using toupper()) and then compares. The case mapping is not always 1-to-1: in some locales toupper() maps both ÿ and y to Y, while tolower() maps Y only to y. A hash based on toupper() then treats ÿ and y as the same letter even though strcasecmp("ÿ","y") does not return 0. Best to use the same case conversion in the hash and in the comparison.
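A hash along these lines might look like the sketch below. hash_word is a hypothetical name, and sending every non-letter first character to bucket 0 is an assumption of this example, not something the problem set prescribes.

```c
#include <ctype.h>

#define N 26

/* Hypothetical first-letter hash that always stays within 0..N-1.
   Uses tolower() to match the usual behaviour of strcasecmp(). */
unsigned int hash_word(const char *word)
{
    /* Go through unsigned char so negative char values are never
       passed to tolower(). */
    int c = tolower((unsigned char)word[0]);
    if (c < 'a' || c > 'z') {
        return 0;   /* assumption: non-letters share bucket 0 */
    }
    return (unsigned int)(c - 'a');
}
```

Because the result is clamped, a word starting with an apostrophe, a digit, or a byte above 127 can no longer index outside the table.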
I am trying to add a new node to my linked list, but it gives a memory error.
My struct and global vars:
typedef struct word word;
struct word
{
char str[256];
word *next;
};
word *head = NULL;
word *cur = NULL;
the function :
int addWord(char * str)
{
word *w = calloc(1, sizeof(w));
if(w == NULL)
{
return 0;
}
strcpy(w->str, str);
if(cur == NULL)
{
cur = w;
head = w;
}
else
{
puts("4");
cur->next = w;
puts("5");
cur = w;
puts("6");
}
return 1;
}
and the result is :
...
4
5
6
4
==73913== Invalid write of size 8
==73913== at 0x109425: addWord (in /home/mz37/programming/godaphy/bin/godaphy.out)
==73913== by 0x109696: parseLine (in /home/mz37/programming/godaphy/bin/godaphy.out)
==73913== by 0x109351: main (in /home/mz37/programming/godaphy/bin/godaphy.out)
==73913== Address 0x4a6a880 is 96 bytes inside an unallocated block of size 4,188,096 in arena "client"
==73913==
5
6
I am still searching for the error and haven't found it yet.
word *w = calloc(1, sizeof(w));
The w variable is of type word pointer hence is likely to be four or eight bytes at most. It may be larger if we end up with 128-bit machines at some point, but it'll be quite some time before it gets to 2000+ bits :-)
You probably wanted to do:
word *w = calloc(1, sizeof(*w));
// note this ___^
The type of *w is the actual type word, and that will be the correct size for what you're trying to do.
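To make the difference concrete, here is a small sketch that reports how many bytes each form asks for. The two function names are made up for illustration; only the struct comes from the question.

```c
#include <stddef.h>

typedef struct word word;
struct word {
    char str[256];
    word *next;
};

/* Bytes requested by the buggy call: the size of a pointer. */
size_t buggy_request(void)
{
    word *w = NULL;
    return sizeof(w);
}

/* Bytes requested by the fixed call: the size of the whole struct.
   sizeof does not evaluate *w, so the NULL pointer is harmless here. */
size_t fixed_request(void)
{
    word *w = NULL;
    return sizeof(*w);
}
```

On a typical 64-bit system the first is 8 bytes while the second is at least 256 bytes plus a pointer, which is why strcpy() and the write to ->next ran past the allocation.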
And, as an aside, you may want to think about the wisdom of blindly copying whatever string you're given, into a block of memory that can only hold 256 characters. A safer alternative would be:
strncpy(w->str, str, sizeof(w->str) - 1);
// Would normally also do something like:
// w->str[sizeof(w->str) - 1] = '\0';
// but calloc() makes that superfluous.
The resultant function (including compactifying) would be along the following lines:
int addWord(char *str) {
word *w;
// Option to fail if string too big, rather than truncate.
//if (strlen(str) >= sizeof(w->str))
// return 0;
// Allocate and check.
if ((w = calloc(1, sizeof(*w))) == NULL)
return 0;
// Copy in string, truncate if too big.
strncpy(w->str, str, sizeof(w->str) - 1);
// Make new list if currently empty, otherwise add it, then flag success.
if(cur == NULL) {
cur = head = w;
} else {
cur->next = w;
cur = w;
}
return 1;
}
I am having trouble implementing my load and unload functions in pset5 of the cs50 class at Harvard. When I run it, I get a segmentation fault and when I run valgrind, it tells me that none of the nodes that I malloc'd at load were freed.
I've been trying to fix this for days, I've tried several different implementations for my unload function, but nothing's worked. I think the mistake might be in my load function. Would someone please please please help me with this one?
/****************************************************************************
* dictionary.c
*
* Computer Science 50
* Problem Set 5
*
* Implements a dictionary's functionality.
***************************************************************************/
#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include "dictionary.h"
#define HASHTABLE_SIZE 5000
// create word counter for size
int wordCount = 0;
// linked link struct
typedef struct node
{
// word's length + NULL character
char word[LENGTH + 1];
struct node* next;
}
node;
// Hashtable array
node* hashtable[HASHTABLE_SIZE];
// hash function from study.cs50.net
int hash_function(char* key)
{
// initialize index to 0
int index = 0;
// sum ascii values
for (int i = 0; key[i] != 0; i++)
{
index += toupper(key[i]) - 'A';
}
return index % HASHTABLE_SIZE;
}
/**
* Returns true if word is in dictionary else false.
*/
bool check(const char* word)
{
// create variable to hold word
char temp[LENGTH + 1];
// convert every character in word to lowercase
for (int i = 0, n = strlen(word); i < n; i++)
{
if (isalpha(word[i]))
{
temp[i] = tolower(word[i]);
}
}
// get hashed word's index
int hash_index = hash_function(temp);
// find head of that index
node* head = hashtable[hash_index];
// traverse through linked list
for (node* cur = head; cur != NULL; cur = cur->next)
{
// find if linnked list contains word
if (strcmp(cur->word, word) == 0)
{
return true;
}
}
return false;
}
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char* dictionary)
{
// // open file
FILE* file = fopen(dictionary, "r");
// check if file exists
if (file == NULL)
{
return false;
}
// word length plus NULL character
char word[LENGTH + 1];
// iterate through every word of the dictionary
while (fscanf(file, "%s\n", word) != EOF) // Source: http://stackoverflow.com/questions/6275558/question-about-whileeof
{
node* new_node = malloc(sizeof(node));
if (new_node == NULL)
{
return false;
}
wordCount++;
strcpy(new_node->word, word); // Source: cs50 reddit
int hash_index = hash_function(new_node->word);
// check whether node should be head
if (hashtable[hash_index] == NULL)
{
hashtable[hash_index] = new_node;
new_node->next = NULL;
}
else
{
new_node->next = hashtable[hash_index];
hashtable[hash_index] = new_node;
}
}
// close file
fclose(file);
return false;
}
/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return wordCount;
}
/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
// go through all of the indexes in the hashtable
for (int i = 0; i < HASHTABLE_SIZE; i++)
{
node* head = hashtable[i];
while (head != NULL)
{
node* ptr = head->next;
free(head);
head = ptr;
}
}
return true;
}
Your unload function is good. The problem with your code is the check function, notably the part where you try to convert the input to lower case:
char temp[LENGTH + 1];
for (int i = 0, n = strlen(word); i < n; i++)
{
if (isalpha(word[i]))
{
temp[i] = tolower(word[i]);
}
}
There are two issues here. First, temp is not null-terminated. Second, the check for isalpha means you could leave characters uninitialised: If your input is, say, "I'm", temp will hold 'i', garbage, 'm', garbage when it should hold 'i', '\'', 'm', '\0'.
Alternatively, you can filter out unwanted characters. In that case, you need two indices: one for the source word, another for the filtered word.
But you don't even need this additional step, because your hash function converts the input to upper case anyway.
Speaking of your hash function: You might want to pick a better one. The current one doesn't distribute the values well over the 5000 slots. (How are you even going to reach 5000 when you add, what?, up to 20 numbers between 0 and 25?)
The hash also has another problem: If the input contains a digit, its contribution is negative, because in ASCII the digits have values from 48 to 57 and you subtract the value of 'A', 65, from them. In general, your hash function should return an unsigned value.
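If you do want a cleaned-up copy of the word, the two-index filtering mentioned above could be sketched like this. normalize_word is a hypothetical helper; keeping letters and apostrophes is an assumption based on the "I'm" example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst in lower case, keeping only
   letters and apostrophes, and always NUL-terminates.
   dst_size is the capacity of dst. Returns the length written. */
size_t normalize_word(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return 0;
    }
    size_t j = 0;                       /* index into dst */
    for (size_t i = 0; src[i] != '\0' && j + 1 < dst_size; i++) {
        unsigned char c = (unsigned char)src[i];
        if (isalpha(c) || c == '\'') {
            dst[j++] = (char)tolower(c);
        }
    }
    dst[j] = '\0';                      /* the step check() was missing */
    return j;
}
```

In check this would replace the loop that fills temp, for example normalize_word(temp, sizeof temp, word), after which temp is safe to hash and to compare.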
First post, extremely limited in coding knowledge and new to C. Be gentle! I am at the point where "trying" different things is just confusing me more and more. I need someone's correct guidance!
This particular problem is from an online edX course I am attempting which ultimately when implemented correctly, checks a given word read in from a text file (the 'check' function) and compares it to each word read into (from the 'load' function) a linked list of structs.
I believe I have the load function implemented correctly as when I use gdb, as I am seeing what I anticipate as I step through it, but my question and my problem relates specifically to the check function. I still have a lot to implement to finish my code but while testing with gdb, I am not seeing values of the char* member of the struct correspond with what I anticipate I should see.
When using gdb and stepping through the 'check' function and trying to access the dword member of the struct nodes in the linked list I created in the load function, I anticipate I should see a string for the char* member. For instance, I anticipate the word "cat" assigned to current->dword , but am instead seeing in gdb when I test:
(gdb) print current->dword
$13 = 0xbfffede2 "\004\b\214\365\372D\300\355\377\277"
My thoughts are that I'm still only accessing an address somehow and not the actual value, but I'm oblivious as to why this is. When the node is created in the load function, a value is assigned to the dword member correctly (at least as far as I can tell while stepping through the code in gdb) but doesn't seem to be accessed correctly in the check function. Any help for a newbie would be appreciated!
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "dictionary.h"
typedef struct node
{
char* dword;
struct node* next;
}
node;
// keep track of #of words in dictionary loaded
int wordCounter = 0;
// create root for hash table
node* root[26];
// create cursor to keep place in creating, pointing, and traversing through nodes
node* current = NULL;
/**
* Returns true if word is in dictionary else false.
*/
bool check(const char* word)
{
// size of word read into buffer
int wordSize = sizeof(word);
// prepare to make a new lowercase only word for comparison to lowercase only dictionary
char bufWord[wordSize];
// make it
for(int i = 0; i < wordSize; i++)
{
if (i == wordSize - 1)
{
bufWord[i] = '\0';
}
else
{
bufWord[i] = tolower(word[i]);
}
}
// hash word to achieve proper root node location
int hash = bufWord[0] - 97;
// point to the correct root node to begin traversing
current = root[hash];
// make sure there is even a word in hash table location
if(root[hash] == NULL)
{
return false;
}
else if(root[hash] != NULL)
{
// progress through the nodes until the last node's next pointer member is NULL
while(current != NULL)
{
// compare 1st letter only of current->dword[i] to bufWord[i] to save time
// if they don't match, return false
// if they do match then continue
char dictWord[wordSize];
// hold copy of struct member value to compare to dictWord
char* wordTemp = current->dword;
//
for(int i = 0; i < wordSize; i++)
{
dictWord[i] = wordTemp[i];
}
// do a spell check
if(strcmp(bufWord, dictWord) == 0)
{
return true;
}
else
{
// set current to the next node if any or NULL if it's already the last node in the list
current = current->next;
}
}
}
return false;
}
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char* dictionary)
{
// buffer for reading in dictionary words
char wordIn[LENGTH + 1];
// open the dictionary file
FILE* newDict = fopen(dictionary, "r");
for (int i = 0; i < 27; i++)
{
root[i] = NULL;
}
// while there are words to read
while(fscanf(newDict, "%s ", wordIn) > 0)
{
// keep track of #of words for constant time read in size function
wordCounter++;
// hash the first letter for the location in root
int hash = wordIn[0] - 97;
// malloc space for a new node
node* newNode = malloc(sizeof(node));
// error check
if (newNode == NULL)
{
return false;
}
// set value member of node to current word
newNode->dword = wordIn;
// first insertion into linked list if that root node has not been used yet
if(root[hash] == NULL)
{
// sets to NULL
newNode->next = root[hash];
// link it
root[hash] = newNode;
}
else if(root[hash] != NULL)
{
// starts at the root
node* current = root[hash];
// insert into new beginning of list
newNode->next = current;
root[hash] = newNode;
}
}
fclose(newDict);
return true;
}
/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return wordCounter;
}
/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
// TODO
return false;
}
The source of your problem is the line:
newNode->dword = wordIn;
wordIn is a local array in load. You are storing the address of wordIn in the dword of every node, so all nodes point to the same buffer, which each fscanf call overwrites. When you return from load, that address is no longer valid.
What you need to do is allocate memory for the string in wordIn, assign the allocated memory to newNode->dword and copy the contents of wordIn to newNode->dword.
If your platform provides strdup (a POSIX function that was not part of standard C before C23), you can change the above line to:
newNode->dword = strdup(wordIn);
If not, it is easily implemented:
char* strdup(char const* in)
{
char* r = malloc(strlen(in)+1);
// malloc can fail; only copy when it succeeded
if (r != NULL)
{
strcpy(r, in);
}
return r;
}
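In load, the copy could be wrapped up together with the node allocation. A minimal sketch, assuming the node type from the question; make_node is a made-up name.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char *dword;
    struct node *next;
} node;

/* Hypothetical helper: builds a node that owns a private copy of word.
   Returns NULL if either allocation fails. */
node *make_node(const char *word)
{
    node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->dword = malloc(strlen(word) + 1);
    if (n->dword == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->dword, word);   /* the copy survives changes to the buffer */
    n->next = NULL;
    return n;
}
```

Since each node now owns its string, unload has to free n->dword before freeing the node itself.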
I'm passing a file pointer to a function (A) which reads a line in a while loop (for each line in the file) and calls another function (B) using these values. The issue is that after running through function B once, the file pointer becomes NULL and I'm not sure why.
void readMatrixData(matrix *matrix, FILE *fileInput)
{
char buffer[30];
while(fgets(buffer, 30, fileInput) != NULL) {
char *splitString = strtok(buffer, ",");
int row = atoi(splitString);
splitString = strtok(NULL, ",");
int column = atoi(splitString);
splitString = strtok(NULL, ",");
int value = atoi(splitString);
insertNewNode(&matrix->rowArray[row], &matrix->columnArray[column], value, row, column);
}
}
I check if fopen returns NULL before calling function A, and it's not. I've also set a breakpoint on the while loop and the first time it hits, fileInput has some memory allocated. However, on the second loop fileInput becomes NULL and I'm not sure why.
EDIT:
Here's the insertNewNode function:
void insertNewNode(node **rowHead, node **columnHead, int value, int row, int column) {
//Get to the correct position in the column linked list
if (*columnHead == NULL) {
*columnHead = malloc(sizeof(node));
} else {
while((*columnHead)->nextColumn != NULL && (*columnHead)->nextColumn->row < row)
*columnHead = (*columnHead)->nextColumn;
}
//Get to the correct position in the row linked list.
if (*rowHead == NULL) {
*rowHead = malloc(sizeof(node));
} else {
while((*rowHead)->nextRow != NULL && ((*rowHead)->nextRow->column < column))
*rowHead = (*rowHead)->nextRow;
}
node *newNode = malloc(sizeof(node));
newNode->column = column;
newNode->row = row;
newNode->value = value;
(*columnHead)->nextColumn = newNode;
(*rowHead)->nextRow = newNode;
}
The structs involved are:
typedef struct matrix {
node **rowArray;
node **columnArray;
Size matrixDimensions;
} matrix;
typedef struct node {
int value;
int row;
int column;
struct node *nextColumn;
struct node *nextRow;
} node;
and I initialise the matrix arrays with:
node *columns[m->matrixDimensions.columns];
node *rows[m->matrixDimensions.rows];
for (int i=0; i< m->matrixDimensions.columns; i++)
{
columns[i] = NULL;
}
for (int i=0; i < m->matrixDimensions.rows; i++)
{
rows[i] = NULL;
}
m->columnArray = columns;
m->rowArray = rows;
Probably the function insertNewNode overwrites memory
Prefer strtol over atoi.
As @DavideBerra suggested, comment out the call to insertNewNode and step through the code to confirm you can make multiple iterations of your while loop.
I don't understand how you are initialising your matrix arrays using m->matrixDimensions.columns and m->matrixDimensions.rows. Are you using C99 VLAs?
Crank up the warning levels of your compiler and ensure zero-warning compilation.
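On the VLA question: if the columns and rows arrays are declared inside a function, they stop existing when that function returns, and the pointers stored in the matrix dangle. Whether that is the case here is an assumption, since the question does not show where the initialisation lives. A heap-based sketch, with a simplified matrix struct (the rows and columns members stand in for matrixDimensions) and a hypothetical matrix_init:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node node;   /* list element, defined as in the question */

typedef struct matrix {
    node **rowArray;
    node **columnArray;
    size_t rows;      /* assumption: stands in for matrixDimensions.rows */
    size_t columns;   /* assumption: stands in for matrixDimensions.columns */
} matrix;

/* Hypothetical helper: allocates both head arrays on the heap so they
   outlive the function that sets them up. calloc leaves every head NULL.
   Expects rows > 0 and columns > 0. */
bool matrix_init(matrix *m, size_t rows, size_t columns)
{
    m->rowArray = calloc(rows, sizeof *m->rowArray);
    m->columnArray = calloc(columns, sizeof *m->columnArray);
    if (m->rowArray == NULL || m->columnArray == NULL) {
        free(m->rowArray);
        free(m->columnArray);
        m->rowArray = m->columnArray = NULL;
        return false;
    }
    m->rows = rows;
    m->columns = columns;
    return true;
}
```

The arrays then need a matching free() when the matrix is destroyed.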
You do not initialize the nextRow and nextColumn fields of your newly allocated node. Doing so should prevent you from at least some trouble. It is strange that you do not get a Segfault.
You are also mixing arrays and linked lists. What happens if you get out-of-range values from your file? I suspect a segfault is not far away. Be careful: your code mixes concepts in a confusing way.
As others have suggested, comment out your insertNewNode call and see whether your loop runs correctly. If it does, run your program step by step in a debugger. Hope this helps, good luck!
Check the values of row and column before accessing matrix->rowArray and matrix->columnArray, and make sure they are less than the array sizes.
My guess is that the values row,column may be outside your matrix and thus overwriting memory. Add a check of the values you receive and make sure your matrix is large enough. Remember arrays are zero indexed in C.
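A range check combined with strtol (as suggested earlier in place of atoi) might look like the sketch below. parse_entry is a hypothetical helper; it assumes the "row,column,value" line format from the question and does not detect numeric overflow.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses "row,column,value" from line.
   Returns true only if all three numbers are present and
   0 <= row < rows and 0 <= column < columns. */
bool parse_entry(const char *line, long rows, long columns,
                 long *row, long *column, long *value)
{
    char *end;

    *row = strtol(line, &end, 10);
    if (end == line || *end != ',') {
        return false;               /* no digits, or missing comma */
    }
    line = end + 1;

    *column = strtol(line, &end, 10);
    if (end == line || *end != ',') {
        return false;
    }
    line = end + 1;

    *value = strtol(line, &end, 10);
    if (end == line) {
        return false;
    }

    return *row >= 0 && *row < rows && *column >= 0 && *column < columns;
}
```

readMatrixData would call insertNewNode only when this returns true, which also avoids passing a NULL strtok() result to atoi on a malformed line.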