SEGMENTATION FAULT in strncpy - load from dictionary - c

I have this function "load" where I read words from a dictionary and put them in a hashtable of linked lists. When I try to read a line and save it in my new_node->text, the program crashes with a SEGMENTATION FAULT and I don't know why. The error appears when I use strncpy.
#define HASHTABLE_SIZE 76801

typedef struct node
{
    char text[LENGTH+1];
    //char* text;
    //link to the next word
    struct node* next_word;
}
node;

node* hashtable[HASHTABLE_SIZE];
bool load(const char* dictionary)
{
    FILE* file = fopen(dictionary,"r");
    unsigned long index = 0;
    char str[LENGTH+1];
    if(file == NULL)
    {
        printf("Error opening file!");
        return false;
    }
    while(! feof(file))
    {
        node * new_node = malloc(sizeof(node)+1000);
        while( fscanf(file,"%s",str) > 0)
        {
            printf("The word is %s",str);
            strncpy(new_node->text,str,LENGTH+1);
            //strcpy(new_node->text,str);
            new_node->next_word = NULL;
            index = hash( (unsigned char*)new_node->text);
            if(hashtable[index] == NULL)
            {
                hashtable[index] = new_node;
            }
            else
            {
                new_node->next_word = hashtable[index];
                hashtable[index] = new_node;
            }
            n_words++;
        }
        //free(new_node);
    }
    fclose(file);
    loaded = true;
    return true;
}

Let's look at your code line by line, shall we?
while(! feof(file))
{
This is not the right way to use feof - check out the post Why is “while ( !feof (file) )” always wrong? right here on StackOverflow.
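The usual fix is to let the read call itself control the loop. A minimal sketch (the %45s width is my assumption that LENGTH is 45, as in CS50):
while (fscanf(file, "%45s", str) == 1)   // stop as soon as fscanf can no longer read a word
{
    // ... handle one word in str ...
}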
node * new_node = malloc(sizeof(node)+1000);
Hmm, ok. We allocate space for one node and 1000 bytes. That's a bit weird, but hey... RAM is cheap.
while( fscanf(file,"%s",str) > 0)
{
Uhm... another loop? OK...
printf("The word is %s",str);
strncpy(new_node->text,str,LENGTH+1);
//strcpy(new_node->text,str);
new_node->next_word = NULL;
index = hash( (unsigned char*)new_node->text);
Hey! Wait a second... in this second loop we keep overwriting new_node repeatedly...
if(hashtable[index] == NULL)
{
    hashtable[index] = new_node;
}
else
{
    new_node->next_word = hashtable[index];
    hashtable[index] = new_node;
}
Assume for a second that both words hash to the same bucket:
OK, so the first time through the loop, hashtable[index] will be NULL and will be set to point to new_node.
The second time through the loop, hashtable[index] isn't NULL, so new_node will be made to point to whatever hashtable[index] points to (hint: new_node) and hashtable[index] will be made to point to new_node.
Do you know what an ouroboros is?
Now assume they don't hash to the same bucket:
One of the buckets now contains the wrong information, because both buckets end up pointing at the same reused node. If you add "hello" to bucket 1 and then "goodbye" to bucket 2, when you try to traverse bucket 1 you may (only because the linking code is broken) find "goodbye", which doesn't belong in bucket 1 at all.
You should allocate a new node for every word you are adding. Don't reuse the same node.
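Putting those two fixes together, the loading loop could look roughly like the sketch below (error handling kept minimal; the %45s width again assumes LENGTH is 45, as in CS50):
while (fscanf(file, "%45s", str) == 1)        // one iteration per word, no feof()
{
    node *new_node = malloc(sizeof(node));    // a fresh node for every word
    if (new_node == NULL)
    {
        fclose(file);
        return false;
    }
    strncpy(new_node->text, str, LENGTH);
    new_node->text[LENGTH] = '\0';            // make sure the copy is terminated
    index = hash((unsigned char*)new_node->text);
    new_node->next_word = hashtable[index];   // push onto the front of the bucket's list
    hashtable[index] = new_node;
    n_words++;
}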

Related

How do I reset the pointer to the head node when adding to nodes?

I need to start with the head node every cycle to add the new node in the right place. I think my current code makes the pointer for head and sptr equal so when I move one, the other one moves too. How do I move the pointer sptr to the beginning?
In the debugger, head->letter[1] turns true when I save an "a" as a word, as it should, but later turns back to false as soon as sptr = head; runs. I think it has to do with the pointers.
typedef struct node
{
    bool exist;
    struct node* letter[28];
} trie;

trie *head = NULL;
int words = 0;
// Loads dictionary into memory, returning true if successful else false
bool load(const char *dictionary)
{
    int i = 0;
    FILE *infile = fopen(dictionary, "r");
    if (infile == NULL)
    {
        printf("Could not open %s.\n", dictionary);
        return 1;
    }
    // allocate memory
    head = calloc(sizeof(trie), 1);
    head->exist = false;
    trie *sptr = head;
    int cr;
    // loop through file one character at a time
    while ((cr = fgetc(infile)) != EOF)
    {
        // build a trie
        // check if it's end of line
        if (cr != 10)
        {
            i = tolower(cr) - 96;
            // check for apostrophy
            if (i < 0)
            {
                i = 0;
            }
            // check if the position exists
            if (sptr->letter[i] == NULL)
            {
                sptr->letter[i] = malloc(sizeof(trie));
                sptr->exist = false; // not the end of the word
            }
            sptr = sptr->letter[i];
        }
        else // indicate the end of a word that exists
        {
            sptr->exist = true;
            sptr = head; // I think the problem might be here, I'm trying to move the pointer to the beginning.
            words++;
        }
    }
    return true;
}
Found the problem. It was in the line sptr->exist = false; it should have read sptr->letter[i]->exist = false. The pointer was moving fine, but I was changing the value at the node the current pointer was on, not at the newly created node.
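For what it's worth, allocating the new node with calloc would also take care of the uninitialized letter[] pointers; a sketch of the corrected branch (the calloc is my suggestion, not part of the original fix):
if (sptr->letter[i] == NULL)
{
    sptr->letter[i] = calloc(1, sizeof(trie));    // zeroes exist and every letter[] pointer
    sptr->letter[i]->exist = false;               // mark the *new* node, not the current one
}
sptr = sptr->letter[i];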

Load function trie segmentation fault

I keep getting a segfault in my load function.
bool load(const char *dictionary)
{
    //create a trie data type
    typedef struct node
    {
        bool is_word;
        struct node *children[27]; //this is a pointer too!
    } node;

    //create a pointer to the root of the trie and never move this (use traversal *)
    node *root = malloc(sizeof(node));
    for(int i=0; i<27; i++)
    {
        //NULL point all indexes of root -> children
        root->children[i] = NULL;
    }

    FILE *dptr = fopen(dictionary, "r");
    if(dptr == NULL)
    {
        printf("Could not open dictionary\n");
        return false;
    }

    char *c = NULL;
    //scan the file char by char until end and store it in c
    while(fscanf(dptr,"%s",c) != EOF)
    {
        //in the beginning of every word, make a traversal pointer copy of root so we can always refer back to root
        node *trav = root;
        //repeat for every word
        while ((*c) != '\0')
        {
            //convert char into array index
            int alpha = (tolower(*c) - 97);
            //if array element is pointing to NULL, i.e. it hasn't been open yet,
            if(trav->children[alpha] == NULL)
            {
                //then create a new node and point it with the previous pointer.
                node *next_node = malloc(sizeof(node));
                trav->children[alpha] = next_node;
                //quit if malloc returns null
                if(next_node == NULL)
                {
                    printf("Could not open dictionary");
                    return false;
                }
            }
            else if (trav->children[alpha] != NULL)
            {
                //if an already existing path, just go to it
                trav = trav->children[alpha];
            }
        }
        //a word is loaded.
        trav->is_word = true;
    }
    //success
    free(root);
    return true;
}
I checked whether I properly pointed new pointers to NULL during initialization. I have three kinds of nodes: root, traversal (for moving), and next_node.
(i.) Am I allowed to NULL out the nodes before mallocing them?
(ii.) Also, how do I free next_node if that node is declared and malloced inside an if statement? node *next_node = malloc(sizeof(node));
(iii.) If I want to make the nodes global variables, which ones should be global?
(iv.) Lastly, where do I set global variables: inside the main of speller.c, outside its main, or somewhere else?
That's a lot of questions, so you don't have to answer all of them, but it would be nice if you could answer the ones you can! Please point out any other peculiarities in my code. There should be plenty. I will accept most answers.
The cause of the segmentation fault is the pointer "c", for which you have not allocated any memory.
Also, in your program -
//scan the file char by char until end and store it in c
while(fscanf(dptr,"%s",c) != EOF)
Once you allocate memory for the pointer c, c will hold the word read from the dictionary file.
Further down in your code, you check for the '\0' character:
while ((*c) != '\0')
{
But you are not advancing the c pointer to the next character of the string you read, so this code will end up in an infinite while loop.
You could try something like this:
char *tmp;
tmp = c;
while ((*tmp) != '\0')
{
    ......
    ......
    //Below in the loop at appropriate place
    tmp++;
}
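For example, reading into a fixed-size buffer and walking it with a second pointer might look like this sketch (the 46-byte buffer assumes words are at most 45 characters, as with CS50's LENGTH; the names are mine, not the original poster's):
char word[46];                               // room for a 45-character word plus '\0'
while (fscanf(dptr, "%45s", word) == 1)      // the width limit keeps fscanf inside the buffer
{
    node *trav = root;
    for (char *p = word; *p != '\0'; p++)    // advance through the characters of the word
    {
        // ... map *p to an index and walk/extend the trie as before ...
    }
    trav->is_word = true;
}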

CS50 pset5 Load Function

I'm having some trouble with the load section of pset5 on CS50, it would be great if someone could help. I'm trying to load a trie that reads from a dictionary (file fp below) and then iterates through the letters to create the trie.
I understand the concept of building a trie, but I think I'm missing something with how the struct pointers are set up (hopefully I'm not way off track with the code below). I've tried to set up 'trav' to navigate through each stage of the trie.
I'm currently getting a segmentation fault so not entirely sure where to go next. Any help would be massively appreciated.
/**
 * Loads dictionary into memory. Returns true if successful else false.
 */
bool load(const char* dictionary)
{
    //create word node and set root
    typedef struct node {
        bool is_word;
        struct node* children[27];
    } node;

    node* root = calloc(1, sizeof(root));
    root->is_word = false;
    node* trav = root;

    //open small dictionary
    FILE* fp = fopen(dictionary, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", dictionary);
        return false;
    }

    //read characters one by one and write them to the trie
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        //set index using to lower. Use a-1 to set ' to 0 and other letters 1-27
        int index = tolower(c)-('a'-1);
        //if new line (so end of word) set is_word to true and return trav to root)
        if (index == '\n')
        {
            trav->is_word = true;
            trav = root;
        }
        //if trav-> children is NULL then create a new node assign to next
        //and move trav to that position
        if (trav->children[index] == NULL)
        {
            node* next = calloc(1, sizeof(node));
            trav->children[index] = next;
            trav = next;
        }
        //else pointer must exist so move trav straight on
        else {
            trav = trav->children[index];
        }
    }
    fclose(fp);
    return false;
}
I'm assuming you set the size of the array children[] to store the 26 letters of the alphabet plus apostrophes. If so, when fgetc(fp) returns an apostrophe, with an ASCII code of 39 (I think), index will be set to -57, which is definitely not part of trav->children. That's probably where you're getting the segfault (or at least one of the places).
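One way to handle that, sketched here as it would sit inside the character-reading loop (my suggestion, not the poster's code), is to test the character itself for newline and apostrophe before computing the index:
if (c == '\n')                        // end of a word: mark it and go back to the root
{
    trav->is_word = true;
    trav = root;
    continue;
}
int index = (c == '\'') ? 0 : tolower(c) - 'a' + 1;   // apostrophe -> 0, 'a'..'z' -> 1..26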
Hope this helps.

In what cases would fscanf overflow memory?

I have a linked list I implemented that works fine for various file inputs (it reads them line by line and inserts them).
void insert_words(FILE* file, node_type* list)
{
    char buffer[12];
    int length = 0;
    while (!feof(file)){
        fscanf(file, "%s", buffer); //buffer contains the string/name
        length = strlen(buffer);
        if (length != 0) {
            insert_sorted(list, buffer);
        }
    } //while
}
Given this code, I'm seeing issues: when fscanf runs on the sample input below, things seem to go sour after reading in 'fff'.
one
two
three ee
fff
ee
As ee is being parsed:
list = 0x009dfac0 {name=0x009dfac0 "one" next=0x00c98c08 {name=0x00c98c08 "three" next=0x00c98ba0 {name=0x00c98ba0 "two" ...} } }
After the next token:
list = 0x009dfac0 {name=0x009dfac0 "ee" next=0x009df068 {name=0x009df068 "È”ü\xf\x1" next=0x0ff7caa0 {msvcr110d.dll!_except_handler4(_EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION_RECORD *, _CONTEXT *, void *)} {...} }
When examining my list, the 'next' pointer is corrupted immediately after fscanf is fired off. What are the potential causes?
insert_sorted, added upon request:
void insert_sorted(node_type* list, char* value)
{
    // Copy our pointer so we can move around
    node_type *n = (node_type*) malloc(sizeof *n);
    node_type *loc = NULL;
    node_type *tmp;
    node_type dat;
    node_type* prev = list;
    node_type* head = list;

    n->next = NULL;
    strcpy(n->name, value);

    // First element, assign immediately
    if( strcmp(list->name, "") == 0 )
    {
        *list = *n;
        return;
    }

    while(head != NULL)
    {
        // We should do a comparison to see if one is greater than another
        int cmp_result = strcmp(value, head->name);
        // If the value is bigger, this means the value needs to be inserted after
        if(cmp_result > 0)
        {
            loc = head;
        }
        else if (cmp_result < 0) // this needs to be ahead
        {
            if(prev == head)
            {
                dat = *head;
                *head = *n;
                head->next = &dat;
                return;
            }
            prev->next = n;
            n->next = head;
            return;
        }
        else if(cmp_result == 0)
        {
            free(n);
            return; // duplicate, die
        }
        // Advance to the next pointer
        prev = head;
        head = head->next;
    }

    // You've reached the end, that must mean you've successfully reached the point to insert
    tmp = loc->next; // get the value we're going to end up detaching
    n->next = tmp;   // link the two together
    loc->next = n;
}
Modify the loop into
while (fscanf(file, "%s", buffer) == 1) {
    length = strlen(buffer);
    // ...
}
Because feof(file) still returns 0 after the last successful fscanf(). It returns non-0 after the first failed fscanf().
Regarding insert_sorted(), look at the following lines:
head->next = &dat;
return;
Since dat is a local object, saving its address leaves the list pointing at an invalid address once the function returns.
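If you want to keep the "copy into the existing head node" approach, one way to fix that branch is to move the old head's contents into the freshly malloc'd node instead of into a local; a sketch using the question's own variables (and assuming name is a char array, as the rest of the code implies):
if (prev == head)                  // inserting in front of the very first node
{
    *n = *head;                    // the new node takes over the old head's contents
    strcpy(head->name, value);     // the existing head node now holds the new value
    head->next = n;                // and points at the node carrying the old contents
    return;
}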
You're not testing for end of file correctly. In general, it's not correct to use feof, but instead to test the return value of the function with which you read from the file.
fscanf returns the number of entities that it was able to read. So in your case you would test that it returned 1, which would indicate a successful read. And to avoid buffer overflow, you can put a limit on the number of characters to read with a number between the % and the s.
And there's no reason to be so stingy with your buffer size.
So:
void insert_words(FILE* file, node_type* list)
{
    char buffer[128];
    while (fscanf(file, "%127s", buffer) == 1) {
        insert_sorted(list, buffer);
    }
}
BTW, you're not reading "line by line" but "space-delimited string by space-delimited string". To read the file line-by-line you could use fgets.
And before you say "that's not the problem", try it first. These are the only problems that could come from this function.
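If you do want genuine line-by-line reading, an fgets-based variant could look like the sketch below; the function name insert_lines and the newline stripping are my additions, not part of the original answer:
void insert_lines(FILE* file, node_type* list)
{
    char buffer[128];
    while (fgets(buffer, sizeof buffer, file) != NULL)
    {
        buffer[strcspn(buffer, "\n")] = '\0';   // drop the trailing newline, if any (strcspn is from <string.h>)
        if (buffer[0] != '\0')                  // skip empty lines
            insert_sorted(list, buffer);
    }
}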

How to read file content into a struct line by line in C

Here, I am trying to read the contents of a file line by line and create a struct for each line. The problem is that when I print the list of words, every single one of them is the last word of the file (which is } in this sample). I believe that since line changes frequently and I pass a pointer to a char, the value of every struct changes as well. I've been trying to fix this problem for nearly a day without any luck. What's a good way to read every word into a struct and link each struct into the linked list?
Note that there are some helper methods used below. I've tested them multiple times and they are working.
Token struct
typedef struct token
{
    char* value;
    struct token* next;
} TOKEN;
File content
target1:
dependency1
{
command1,
command2
}
Main
TOKEN *head = NULL;
// represents each formatted line from the script file
char* line = malloc(161*sizeof(char));
FILE* fileRead = openFile("RawRules.txt", "r");
while((line = readLine(line, fileRead)) != NULL)
{
    head = add(head, line);
}
displaylist(head);
freeNodes(head);
fclose(fileRead);
Add function, modified from http://cprogramminglanguage.net/singly-linked-list-c-source-code.aspx
TOKEN* add(TOKEN *head, char* value){
    TOKEN *tmp;
    if(head == NULL){
        head = (TOKEN *)malloc(sizeof(TOKEN));
        if(head == NULL){
            printf("Error! memory is not available\n");
            exit(0);
        }
        head->value = value;
        head->next = head;
    }else{
        tmp = head;
        while (tmp->next != head)
            tmp = tmp->next;
        tmp->next = (TOKEN *)malloc(sizeof(TOKEN));
        if(tmp->next == NULL)
        {
            printf("Error! memory is not available\n");
            exit(0);
        }
        tmp = tmp->next;
        tmp->value = value;
        tmp->next = head;
    }
    return head;
}
readLine function
// reads a line of a file into buffer
char* readLine(char* buffer, FILE* file) {
buffer = fgets(buffer, 161, file);
return buffer;
}
This did not fix the problem either
while(true)
{
    char* ll = malloc(161*sizeof(char));
    ll = readLine(ll, fileRead);
    if(ll != NULL)
        head = add(head, ll);
    else
        break;
}
Sorry, I programmed in C like a billion years ago, so call me a noob!
In the add() function, you're simply assigning a char *, rather than allocating any new memory (and then copying) for each string. So each TOKEN ends up pointing at the original buffer. As you're using a single buffer at the top-level, you're overwriting it over and over again.
In short: You need a separate buffer for each line. One way (not necessarily the best way) is to do the following inside add():
int len = strlen(value);
...
tmp->value = malloc(len+1); /* +1 for null terminator */
strncpy(tmp->value, value, len+1);
Remember that at some point, you'll need to free() all of these extra buffers.
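Since the list in this code is circular (the last node's next points back at head), a freeNodes() that releases both the nodes and these per-line buffers might look like the sketch below; the question doesn't show that helper, so this is only a guess at it:
void freeNodes(TOKEN *head)
{
    if (head == NULL)
        return;
    TOKEN *cur = head->next;
    while (cur != head)            // walk the circle until we come back around to head
    {
        TOKEN *next = cur->next;
        free(cur->value);          // free the per-line buffer allocated in add()
        free(cur);
        cur = next;
    }
    free(head->value);
    free(head);
}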
