C pointers, inserting elements to HEAD of linked list - c

I am working on a problem in the K&R book (#6.3) where the user inputs a sequence of words, and you have to create a list of these words along with the lines that each one appears on. It's supposed to involve structures so these are the ones I have right now:
struct entry {
int line;
int count;
struct entry *next;
};
struct word {
char *str;
struct entry *lines;
struct word *next;
};
static struct word *wordlist = NULL; // GLOBAL WORDLIST
However when I input something and the program tries to add a new entry to the structure (which is somewhat like a linked list), there is a problem and the program terminates with no error message. Code for that:
void add_entry(char *word, int line)
{
if (word == NULL || line <= 0 || is_blocked_word(word))
return;
struct word *w;
for (w = wordlist; w != NULL && w->next != NULL && !strcmp(w->str, word); w = w->next);
// If word is found in the wordlist, then update the entry
if (w != NULL) {
struct entry *v;
for (v = w->lines; v != NULL && v->next != NULL && v->line != line; v = v->next);
if (v == NULL) {
struct entry *new = (struct entry*) malloc(sizeof(struct entry));
new->line = line;
new->count = 1;
new->next = NULL;
if (w->lines == NULL)
w->lines = new;
else
v->next = new;
}
else v->count++;
}
// If word is not found in the word list, then create a new entry for it
else {
struct word *new = (struct word*) malloc(sizeof(struct word));
new->lines = (struct entry*) malloc(sizeof(struct entry));
new->next = NULL;
new->str = (char*) malloc(sizeof(char) * strlen(word));
new->lines->line = line;
new->lines->count = 1;
new->lines->next = NULL;
strcpy(new->str, word);
// If the word list is empty, then populate head first before populating the "next" entry
if (wordlist == NULL)
wordlist = new;
else
w->next = new;
}
}
The program terminates even after adding just the first word to wordlist. This is on the line that says if (wordlist == NULL) wordlist = new; where new contains the pointer to a valid structure that I malloc'ed. How can this be possible?
As far as I know it's a problem with my pointer usage but I'm not sure where exactly it lies. Can someone help?

Some fairly evident, and some not-so-evident things.
The for-loop limit for w stops one short
for (w = wordlist; w != NULL && w->next != NULL && !strcmp(w->str, word); w = w->next);
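This starts at the first node and continues until one of the following happens:
We have run out of nodes (w is NULL).
We have almost (one short) run out of nodes (w->next is NULL).
The word in the current node does NOT match. That is the opposite of the intended test, because strcmp returns 0 on a match.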
Almost the same problem, different for-loop
for (v = w->lines; v != NULL && v->next != NULL && v->line != line; v = v->next);
As above, this has the same one-short problem, but not the third one: this loop correctly continues so long as the line numbers do not match, whereas the prior loop broke as soon as any word did not match.
And that is in the first ten lines of this function.
String allocation size fails to account for the nulchar terminator
This falls short by one char of the allocation size needed for a zero-terminated string:
malloc(sizeof(char) * strlen(word))
You always need space for the terminator. The easiest way to remember that is to ask how many chars a zero-length C string needs. Answer: one, because the terminator needs to go somewhere. From there, the rule is simply length + 1.
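As a minimal sketch of the length + 1 rule, here is a small string-copy helper. The name copy_string is hypothetical and not part of the original code; it only illustrates where the extra byte goes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* copy_string is a hypothetical helper name used for illustration.
   Returns a heap copy of s, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);        /* length WITHOUT the terminator */
    char *copy = malloc(len + 1);  /* +1 leaves room for the '\0' */
    if (copy != NULL)
        memcpy(copy, s, len + 1);  /* copies the terminator as well */
    return copy;
}
```

Even copy_string("") allocates one byte, which is exactly the zero-length case described above.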
One possible way to do this is via a pointer-to-pointer approach, shown below:
void add_entry(const char *word, int line)
{
    if (word == NULL || line <= 0 || is_blocked_word(word))
        return;
    struct word **pp = &wordlist;
    for (; *pp && strcmp((*pp)->str, word); pp = &(*pp)->next);
    if (*pp)
    {
        // search for matching line number
        struct entry **vv = &(*pp)->lines;
        for (; *vv && (*vv)->line != line; vv = &(*vv)->next);
        if (!*vv)
        {
            *vv = malloc(sizeof(**vv));
            if (!*vv)
            {
                perror("Failed to allocate line entry.");
                exit(EXIT_FAILURE);
            }
            (*vv)->count = 1;
            (*vv)->line = line;
            (*vv)->next = NULL;
        }
        else
        {   // found an entry. increment count.
            (*vv)->count++;
        }
    }
    else
    {   // no matching word. create a new word with a new line entry
        size_t len = strlen(word);
        *pp = malloc(sizeof(**pp));
        if (!*pp)
        {
            perror("Failed to allocate word entry.");
            exit(EXIT_FAILURE);
        }
        (*pp)->lines = malloc(sizeof(*(*pp)->lines));
        if (!(*pp)->lines)
        {
            perror("Failed to allocate line count entry.");
            exit(EXIT_FAILURE);
        }
        (*pp)->str = malloc(len + 1);
        if (!(*pp)->str)
        {
            perror("Failed to allocate word string entry.");
            exit(EXIT_FAILURE);
        }
        (*pp)->lines->count = 1;
        (*pp)->lines->line = line;
        (*pp)->lines->next = NULL;
        (*pp)->next = NULL;
        memcpy((*pp)->str, word, len+1);
    }
}
How It Works
In both cases, we use a pointer-to-pointer. It is a most-handy construct when the desire is to perform tail-end insertion on a linked list without having to keep a "one-back" or "previous" pointer. Just like any pointer, it holds an address. Unlike a regular pointer-to-something, a pointer-to-pointer-to-something holds the address of another pointer. With it we can "loop" by initially setting it to the address of the head pointer, then entering the search.
struct word **pp = &wordlist;
for (; *pp && strcmp((*pp)->str, word); pp = &(*pp)->next);
Here we start with the address of our head pointer. The loop will terminate if the pointer at the address held in pp is NULL, or if the word actually matches. Otherwise it sets pp to the address of (not the address in) the next pointer of the current node. If we run out of words and never find a match the loop will break, but with a most-handy consequence: pp contains the address of the pointer that we need to set to the new allocation. If the list were initially empty, it contains the address of the head pointer.
With that, we can then do this:
if (*pp)
{
// search for matching line number
struct entry **vv = &(*pp)->lines;
for (; *vv && (*vv)->line != line; vv = &(*vv)->next);
Notice we use the same idea on the line-entry list. Either we're going to find an entry, or the loop will exit with *vv being NULL, and vv contains the address of the next pointer we want to set to our new allocation.
I strongly urge you to step through this code in a debugger line by line and understand how it works. This technique has many redeeming qualities. Among them is an incredibly brief way to populate a forward-linked list in O(n) total time, without checking for a head pointer or walking the list for each insertion, while retaining the original order (as opposed to reversing it, as a stack-like solution would):
struct node *head = NULL;
struct node **pp = &head;
while (get-data-for-our-list)
{
*pp = malloc(sizeof(**pp));
// TODO: populate (*pp)->members here
pp = &(*pp)->next;
}
*pp = NULL;
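The skeleton above can be turned into something compilable along these lines. This is only a sketch under assumptions: the data comes from an int array rather than from input, and the names build_list and free_list are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Releases every node in the list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a list holding values[0..n-1] in their original order.
   Returns NULL when n == 0 or when an allocation fails (in which case
   the nodes built so far are released). */
struct node *build_list(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    struct node *head = NULL;
    struct node **pp = &head;      /* address of the pointer to fill next */

    for (size_t i = 0; i < n; i++) {
        *pp = malloc(sizeof **pp);
        if (*pp == NULL) {         /* list is still properly terminated here */
            free_list(head);
            return NULL;
        }
        (*pp)->value = values[i];
        (*pp)->next = NULL;
        pp = &(*pp)->next;         /* move to the new node's next pointer */
    }
    return head;
}
```

Each insertion is constant time because pp always holds the address of the last next pointer, so there is no rescan of the list and no special case for the first node.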

Related

CS50 - LOAD - Get random character from no where when trying to execute load

I am new to C programming. I am trying to do the pset5 in CS50 while trying to understand the concepts of memory, linked list and hashtable. I wrote the code and it compiled but there seems to be something wrong because every time I tried to execute the code it returns some garbage value. Could anyone please help me with that? Many thanks.
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
#include<string.h>
#include "dictionary.h"
#define DICTIONARY "dictionaries/small"
typedef struct node
{
char WORD[LENGTH + 1];
struct node *next;
}
node;
int hash(char *word);
int main(void)
{
node **HASHTABLE = malloc(sizeof(node) * 26);
//open the dictionary
FILE *dic = fopen(DICTIONARY, "r");
if (dic == NULL)
{
fprintf(stderr, "Could not open the library\n");
return 1;
}
int index = 0;
char word[LENGTH + 1];
for (int c = fgetc(dic); c != EOF; c = fgetc(dic))
{
word[index] = c;
index++;
if (c == '\n')
{
int table = hash(word);
printf("%d\n", table);
//create a newnode
node *newnode = malloc(sizeof(node));
strcpy(newnode->WORD, word);
newnode->next = NULL;
printf("Node: %s\n", newnode->WORD);
index = 0;
//add new node to hash table
if (HASHTABLE[table] == NULL)
{
HASHTABLE[table] = newnode;
}
else
{
HASHTABLE[table]->next = newnode;
}
}
}
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
printf("%s", p->WORD);
p = p->next;
}
}
//free memory
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
node *temp = p->next;
free(p);
p = temp;
}
}
free(HASHTABLE);
}
int hash(char *word)
{
int i = 0;
if (islower(word[0]))
return i = word[0] - 'a';
if (isupper(word[0]))
return i = word[0] - 'A';
return 0;
}
Your code has serious problems that result in undefined behavior.
Two of them are the result of this line:
node **HASHTABLE = malloc(sizeof(node) * 26);
That allocates space for 26 node structures, but the HASHTABLE variable is meant to point to an array of node * pointers (that's the ** in the node **HASHTABLE declaration).
So, you should replace it with something like:
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
Note that I used the dereferenced value of the variable being assigned to, HASHTABLE. In this case that is a node * (one fewer * than in the declaration), so each element has the size of a pointer. If the type of HASHTABLE changes, you don't need to make any other changes to the malloc() statement.
That problem likely wouldn't cause any visible trouble by itself, because sizeof(node) is larger than sizeof(node *) and the original call merely over-allocates.
However, there's still a problem with
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
that will cause problems - and serious ones.
That array of 26 pointers isn't initialized - you don't know what's in them. They can point anywhere. So this won't work well, if at all:
if (HASHTABLE[table] == NULL)
Meaning this points off to somewhere unknown:
HASHTABLE[table]->next = newnode;
And that will cause all kinds of problems.
The simplest fix? Initialize the values all to zero by using calloc() instead of malloc():
node **HASHTABLE = calloc( 26, sizeof( *HASHTABLE ) );
Until that's fixed, any results from your entire program are questionable, at best.
The reason for the garbage is that you didn't null-terminate the string:
strcpy(newnode->WORD, word);
strcpy expects src to point to a null-terminated string, and nothing in the loop ever writes a terminator into word. Simply terminate it with
word[index] = 0;
before the strcpy.
Other than that, the ones in Andrew Henle's answer should be addressed too, but I am not going to repeat them here.
BTW, next you will notice that
HASHTABLE[table]->next = newnode;
wouldn't work properly - that code always inserts the node as the 2nd one. But you want to always insert the new node unconditionally as the head, with
newnode->next = HASHTABLE[table];
HASHTABLE[table] = newnode;
There need not be any special condition for inserting the first node to a bucket.
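The head-insertion idea can be sketched as a small function. This is an illustration under assumptions, not the CS50 reference code: LENGTH is assumed to be 45 (the real value lives in dictionary.h), and bucket_of and table_insert are hypothetical names. The table is expected to come from calloc so every bucket starts out empty.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45    /* assumed value; the real one is in dictionary.h */
#define BUCKETS 26

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

/* Maps a word to a bucket by its first letter; anything that is not
   a-z or A-Z goes to bucket 0. Assumes contiguous letter codes (ASCII). */
static unsigned bucket_of(const char *word)
{
    int c = tolower((unsigned char)word[0]);
    return (c >= 'a' && c <= 'z') ? (unsigned)(c - 'a') : 0;
}

/* Pushes a copy of word onto the FRONT of its bucket.
   Returns 0 on success, -1 if the word is too long or allocation fails. */
int table_insert(node **table, const char *word)
{
    assert(table != NULL && word != NULL);
    if (strlen(word) > LENGTH)
        return -1;
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    strcpy(n->word, word);      /* safe: length was checked above */
    unsigned b = bucket_of(word);
    n->next = table[b];         /* works for empty and non-empty buckets */
    table[b] = n;
    return 0;
}
```

Because the new node always becomes the bucket head, no earlier node is ever overwritten or lost, which is what happened with the HASHTABLE[table]->next = newnode assignment.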

A pointer points to a NULL pointer

Code from the CS50 Harvard course dealing with linked lists:
What I do not understand: when node *ptr is initialized from numbers, which is a null pointer, how can the for loop (node *ptr = numbers; ptr != NULL; ...) run at all, since numbers == NULL?
full version of the codes can be found at: https://cdn.cs50.net/2017/fall/lectures/5/src5/list2.c
#include <cs50.h>
#include <stdio.h>
typedef struct node
{
int number;
struct node *next;
}
node;
int main(void)
{
// Memory for numbers
node *numbers = NULL;
// Prompt for numbers (until EOF)
while (true)
{
// Prompt for number
int number = get_int("number: ");
// Check for EOF
if (number == INT_MAX)
{
break;
}
// Check whether number is already in list
bool found = false;
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
if (ptr->number == number)
{
found = true;
break;
}
}
The loop is to check for prior existence in the list actively being built. If not there (found was never set true), the remaining inconveniently omitted code adds it to the list.
On the initial run, the numbers linked list head pointer is null, signifying an empty list. That doesn't change the algorithm of search + if-not-found-insert whatsoever. It just means the loop is never entered because the bail-case is immediately true. In other words, with numbers being NULL in
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
the condition to continue, ptr != NULL is already false, so the body of the for-loop is simply skipped. That leads to the remainder of the code you didn't post, which does the actual insertion. After that insertion, the list now has something, and the next iteration of the outer-while loop will eventually scan the list again after the next prospect value is read. This continues until the outer-while condition is no longer satisfied.
A Different Approach
I have never been fond of the cs50 development strategy, and Harvard's technique for teaching C to entry-level CS students. The cs50 header and lib has caused more transitional confusion to real-world software engineering than one can fathom. Below is an alternative for reading a linked list of values, keeping only unique entries. It may look like a lot, but half of this is inline comments describing what is going on. Some of it will seem trivial, but the search-and-insert methodology is what you should be focusing on. It uses a strategy of pointer-to-pointer that you're likely not familiar with, and this is a good exposure.
Enjoy.
#include <stdio.h>
#include <stdlib.h>
struct node
{
    int value;
    struct node *next;
};
int main()
{
    struct node *numbers = NULL;
    int value = 0;
    // retrieve list input. stop when we hit
    // - anything that doesn't parse as an integer
    // - a value less than zero
    // - EOF
    while (scanf("%d", &value) == 1 && value >= 0)
    {
        // finds the address-of (not the address-in) the first
        // pointer whose node has a value matching ours, or the
        // last pointer in the list (which points to NULL).
        //
        // note the "last" pointer will be the head pointer if
        // the list is empty.
        struct node **pp = &numbers;
        while (*pp && (*pp)->value != value)
            pp = &(*pp)->next;
        // if we didn't find our value, `pp` holds the address of
        // the last pointer in the list. Again, not a pointer to the
        // last "node" in the list; rather the last actual "pointer"
        // in the list. Think of it as the "next" member of last node,
        // and in the case of an empty list, it will be the address of
        // the head pointer. *That* is where we will be hanging our
        // new node, and since we already know where it goes, there is
        // no need to rescan the list again.
        if (!*pp)
        {
            *pp = malloc(sizeof **pp);
            if (!*pp)
            {
                perror("Failed to allocate new node");
                exit(EXIT_FAILURE);
            }
            (*pp)->value = value;
            (*pp)->next = NULL;
        }
    }
    // display entire list, single line
    for (struct node const *p = numbers; p; p = p->next)
        printf("%d ", p->value);
    fputc('\n', stdout);
    // free the list
    while (numbers)
    {
        struct node *tmp = numbers;
        numbers = numbers->next;
        free(tmp);
    }
    return EXIT_SUCCESS;
}
This approach is especially handy when building sorted lists, as it can be altered with just a few changes to do so.
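One way the sorted variant might look is sketched below. The function name insert_sorted_unique and its return codes are assumptions made for this example. The only real changes from the search above are that the scan stops at the first node whose value is not smaller, and the new node is spliced in front of whatever *pp currently points to.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Inserts value in ascending order, skipping duplicates.
   Returns 1 if inserted, 0 if already present, -1 if allocation fails. */
int insert_sorted_unique(struct node **head, int value)
{
    assert(head != NULL);
    struct node **pp = head;

    /* stop at the first pointer whose node is >= value, or at the end */
    while (*pp != NULL && (*pp)->value < value)
        pp = &(*pp)->next;

    if (*pp != NULL && (*pp)->value == value)
        return 0;                  /* duplicate, nothing to do */

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *pp;                 /* the rest of the list follows the new node */
    *pp = n;                       /* works for head, middle, and tail alike */
    return 1;
}
```

Calling it with the address of the head pointer, for example insert_sorted_unique(&numbers, value), covers the empty list, insertion at the front, and insertion at the end without any special cases.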
If you examine the rest of the code, which is also inside the while loop, you can see where numbers gets modified when a new node is linked in.
if (!found)
{
// Allocate space for number
node *n = malloc(sizeof(node));
if (!n)
{
return 1;
}
// Add number to list
n->number = number;
n->next = NULL;
if (numbers)
{
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
if (!ptr->next)
{
ptr->next = n;
break;
}
}
}
else
{
numbers = n;
}
}
Besides, the body of the for loop is not entered on the first pass, so your thinking is correct.

Load function trie segmentation fault

I keep getting segfault for my load function.
bool load(const char *dictionary)
{
//create a trie data type
typedef struct node
{
bool is_word;
struct node *children[27]; //this is a pointer too!
}node;
//create a pointer to the root of the trie and never move this (use traversal *)
node *root = malloc(sizeof(node));
for(int i=0; i<27; i++)
{
//NULL point all indexes of root -> children
root -> children[i] = NULL;
}
FILE *dptr = fopen(dictionary, "r");
if(dptr == NULL)
{
printf("Could not open dictionary\n");
return false;
}
char *c = NULL;
//scan the file char by char until end and store it in c
while(fscanf(dptr,"%s",c) != EOF)
{
//in the beginning of every word, make a traversal pointer copy of root so we can always refer back to root
node *trav = root;
//repeat for every word
while ((*c) != '\0')
{
//convert char into array index
int alpha = (tolower(*c) - 97);
//if array element is pointing to NULL, i.e. it hasn't been open yet,
if(trav -> children[alpha] == NULL)
{
//then create a new node and point it with the previous pointer.
node *next_node = malloc(sizeof(node));
trav -> children[alpha] = next_node;
//quit if malloc returns null
if(next_node == NULL)
{
printf("Could not open dictionary");
return false;
}
}
else if (trav -> children[alpha] != NULL)
{
//if an already existing path, just go to it
trav = trav -> children[alpha];
}
}
//a word is loaded.
trav -> is_word = true;
}
//success
free(root);
return true;
}
I checked whether I properly pointed new pointers to NULL during initialization. I have three types of nodes: root, traversal (for moving), and next_node. (i.) Am I allowed to null point the nodes before mallocing them? (ii.) Also, how do I free 'next_node' if that node is initialized and malloced inside an if statement? node *next_node = malloc(sizeof(node)); (iii.) If I want to set the nodes as global variables, which ones should be global? (iv.) Lastly, where do I set global variables: inside the main of speller.c, outside its main, or somewhere else? That's alot of questions, so you don't have to answer all of them, but it would be nice if you could answer the answered ones! Please point out any other peculiarities in my code. There should be plenty. I will accept most answers.
The cause of the segmentation fault is the pointer c, which has no memory allocated to it: it is still NULL when fscanf tries to write through it.
Also, in your program:
//scan the file char by char until end and store it in c
while(fscanf(dptr,"%s",c) != EOF)
Once you allocate memory for c, it will hold the word read from the dictionary file.
Further down in your code, you check for the '\0' character:
while ((*c) != '\0')
{
But you never advance c to the next character of the string, so this while loop will run forever.
Maybe you can try something like this:
char *tmp;
tmp = c;
while ((*tmp) != '\0')
{
......
......
//Below in the loop at appropriate place
tmp++;
}
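Putting those points together, a trie insert might be sketched as follows. This is not the asker's code: the names trie_node, child_index, trie_new_node, trie_insert, trie_contains, and trie_free are hypothetical, and treating index 26 as the apostrophe slot is an assumption about why the question uses 27 children. Note that the sketch also descends into a freshly created child, which the loop in the question does not do.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define CHILDREN 27   /* assumed: 26 letters plus an apostrophe slot */

typedef struct trie_node {
    bool is_word;
    struct trie_node *children[CHILDREN];
} trie_node;

/* Index for a character, or -1 if it cannot be stored (ASCII assumed). */
static int child_index(char ch)
{
    if (ch == '\'')
        return 26;
    int c = tolower((unsigned char)ch);
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

/* Allocates a node with is_word false and every child set to NULL. */
trie_node *trie_new_node(void)
{
    trie_node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->is_word = false;
        for (int i = 0; i < CHILDREN; i++)
            n->children[i] = NULL;
    }
    return n;
}

/* Returns false on an unsupported character or a failed allocation. */
bool trie_insert(trie_node *root, const char *word)
{
    assert(root != NULL && word != NULL);
    trie_node *trav = root;
    for (const char *p = word; *p != '\0'; p++) {   /* p advances each pass */
        int i = child_index(*p);
        if (i < 0)
            return false;
        if (trav->children[i] == NULL) {
            trav->children[i] = trie_new_node();
            if (trav->children[i] == NULL)
                return false;
        }
        trav = trav->children[i];   /* descend whether the node is new or not */
    }
    trav->is_word = true;
    return true;
}

bool trie_contains(const trie_node *root, const char *word)
{
    const trie_node *trav = root;
    for (const char *p = word; trav != NULL && *p != '\0'; p++) {
        int i = child_index(*p);
        if (i < 0)
            return false;
        trav = trav->children[i];
    }
    return trav != NULL && trav->is_word;
}

/* Frees a whole trie; nodes created inside trie_insert are reached from the root. */
void trie_free(trie_node *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < CHILDREN; i++)
        trie_free(n->children[i]);
    free(n);
}
```

For reading, a fixed buffer such as char word[46] filled by fscanf(dptr, "%45s", word) in a loop that tests for a return value of 1 would give fscanf real memory to write into; the size 46 is again an assumption. The root should be kept until the dictionary is unloaded and then released with trie_free, not freed at the end of load.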

Array of strings linked list - Segmentation fault

I have a function that takes an array of strings. It separates all those strings by the presence of a particular character, in this case '|'. See my previous question, "Split an array of strings based on character", for a better idea.
So, I have an array of strings that looks like this:
char ** args = {"ls", "-l", "|", "cd", "."}
My parseCmnds function is supposed to go through each string in the array and create a new array of strings with all the strings before the '|' character. Then it creates a linked list where each node points to each of the array of strings I created, essentially separating the original array of strings into separate arrays of strings linked to each other.
So, my parse loop should create something like this for example:
On the first iteration:
char ** command = {"ls", "-l", NULL}
On the second iteration
char ** command = {"cd", ".", NULL}
After each iteration my function creates a new linked list node and populates it. I built code based on some of the answers I got on my previous question (thanks a million). But for some reason I'm getting a segmentation fault that I can't figure out. Can someone check out my code and let me know what I'm doing wrong?
typedef struct node {
char ** cmnd;
struct node * next;
} node_cmnds;
node_cmnds * parseCmnds(char **args) {
int i;
int j=0;
int numArgs = 0;
node_cmnds * head = NULL; //head of the linked list
head = malloc(sizeof(node_cmnds));
if (head == NULL) { //allocation failed
return NULL;
}
else {
head->next = NULL;
}
node_cmnds * currNode = head; //point current node to head
for(i = 0; args[i] != NULL; i++) { //loop that traverses through arguments
char ** command = (char**)malloc(maxArgs * sizeof(char*)); //allocate an array of strings for the command
if(command == NULL) { //allocation failed
return NULL;
}
while(strcmp(args[i],"|") != 0) { //loop through arguments until a | is found
command[i] = (char*)malloc(sizeof(args[i])); //allocate a string to copy argument
if(command[i] == NULL) { //allocation failed
return NULL;
}
else {
strcpy(command[i],args[i]); //add argument to our array of strings
i++;
numArgs++;
}
}
command[i] = NULL; //once we find | we set the array element to NULL to specify the end
while(command[j] != NULL) {
strcpy(currNode->cmnd[j], command[j]);
j++;
}
currNode->next = malloc(sizeof(node_cmnds));
if(currNode->next == NULL) {
return NULL;
}
currNode = currNode->next; //
numArgs = 0;
}
return head;
}
You're never allocating any memory for the cmnd member of node_cmnds. So the line strcpy(currNode->cmnd[j], command[j]); is writing to... somewhere, likely to memory you don't own. And when you do add those mallocs, your indexing (using j) is going to be very incorrect on the second pass through the outer for loop, because j is never reset to 0.
Also, you're leaking memory like a sieve. Try throwing some frees in there.
while(command[j] != NULL) {
strcpy(currNode->cmnd[j], command[j]);
j++;
}
At this statement you haven't allocated memory for the cmnd pointer (the array of strings). I believe this may be causing part of your problem. You have allocated memory for the struct, but you also need to allocate memory for each pointer inside the struct.
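A sketch of how the splitting could be done with every allocation in place is shown below. It is one possible arrangement rather than a fix of the original line by line: parse_cmnds and free_cmnds are hypothetical names, each string is sized with strlen + 1 (sizeof(args[i]) would only give the size of a pointer), and empty commands such as a leading "|" are not rejected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char **cmnd;            /* NULL-terminated array of strings */
    struct node *next;
} node_cmnds;

/* Frees the list, each command array, and each string in it. */
void free_cmnds(node_cmnds *head)
{
    while (head != NULL) {
        node_cmnds *next = head->next;
        if (head->cmnd != NULL) {
            for (size_t k = 0; head->cmnd[k] != NULL; k++)
                free(head->cmnd[k]);
            free(head->cmnd);
        }
        free(head);
        head = next;
    }
}

/* Splits a NULL-terminated args array on "|" into a list of commands.
   Returns NULL if args is empty or an allocation fails. */
node_cmnds *parse_cmnds(char **args)
{
    assert(args != NULL);
    node_cmnds *head = NULL;
    node_cmnds **tail = &head;      /* address of the pointer to fill next */
    size_t i = 0;

    while (args[i] != NULL) {
        /* count the words up to the next "|" or the end of args */
        size_t n = 0;
        while (args[i + n] != NULL && strcmp(args[i + n], "|") != 0)
            n++;

        node_cmnds *node = malloc(sizeof *node);
        if (node == NULL) { free_cmnds(head); return NULL; }
        node->next = NULL;
        node->cmnd = malloc((n + 1) * sizeof *node->cmnd);
        *tail = node;               /* link now so free_cmnds sees this node */
        tail = &node->next;
        if (node->cmnd == NULL) { free_cmnds(head); return NULL; }
        for (size_t k = 0; k <= n; k++)
            node->cmnd[k] = NULL;   /* keeps the array terminated at all times */

        for (size_t k = 0; k < n; k++) {
            size_t len = strlen(args[i + k]);
            node->cmnd[k] = malloc(len + 1);
            if (node->cmnd[k] == NULL) { free_cmnds(head); return NULL; }
            memcpy(node->cmnd[k], args[i + k], len + 1);
        }

        i += n;
        if (args[i] != NULL)        /* skip the "|" itself */
            i++;
    }
    return head;
}
```

The command array is indexed with k, which restarts at 0 for every command, while i only tracks the position in args. Mixing those two roles is what breaks the indexing in the original on the second command.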

In what cases would fscanf overflow memory?

I have a linked list I implemented that works fine for various file inputs (it reads them line by line and inserts them.)
void insert_words(FILE* file, node_type* list)
{
char buffer[12];
int length = 0;
while (!feof(file)){
fscanf(file, "%s", buffer);//buffer contains the string/name
length = strlen(buffer);
if (length != 0) {
insert_sorted(list, buffer);
}
} //while
}
Given this code, things seem to go sour when fscanf is executed with the sample input below, right after reading in 'fff'.
one
two
three ee
fff
ee
As ee is being parsed:
list = 0x009dfac0 {name=0x009dfac0 "one" next=0x00c98c08 {name=0x00c98c08 "three" next=0x00c98ba0 {name=0x00c98ba0 "two" ...} } }
After the next token:
list = 0x009dfac0 {name=0x009dfac0 "ee" next=0x009df068 {name=0x009df068 "È”ü\xf\x1" next=0x0ff7caa0 {msvcr110d.dll!_except_handler4(_EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION_RECORD *, _CONTEXT *, void *)} {...} }
When examining my list, the 'next' pointer is corrupted immediately after fscanf is fired off. What are the potential causes?
Inserted sort upon request:
void insert_sorted(node_type* list, char* value)
{
// Copy our pointer so we can move around
node_type *n = (node_type*) malloc(sizeof *n);
node_type *loc = NULL;
node_type *tmp;
node_type dat;
node_type* prev = list;
node_type* head = list;
n->next = NULL;
strcpy(n->name, value);
// First element, assign immediately
if( strcmp(list->name, "") == 0 )
{
*list = *n;
return;
}
while(head != NULL)
{
// We should do a comparison to see if one is greater than another
int cmp_result = strcmp(value, head->name);
// If the value is bigger, this means the value needs to be inserted after
if(cmp_result > 0)
{
loc = head;
}
else if (cmp_result < 0) // this needs to be ahead
{
if(prev == head)
{
dat = *head;
*head = *n;
head->next = &dat;
return;
}
prev->next = n;
n->next = head;
return;
}
else if(cmp_result == 0)
{
free(n);
return; // duplicate, die
}
// Advance to the next pointer
prev = head;
head = head->next;
}
// You've reached the end, that must mean you've succesfully reached the point to insert
tmp = loc->next; // get the value we're going to end up detaching
n->next = tmp; // link the two together
loc->next = n;
}
Modify the loop into
while (fscanf(file, "%s", buffer) == 1) {
length = strlen(buffer);
// ...
}
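This matters because feof(file) can still return 0 after the last successful fscanf(), for example when the file ends with a newline. It only returns nonzero after a read has hit end of file, so the original loop goes around one extra time with a failed fscanf() and whatever was left in buffer.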
Regarding insert_sorted(), look at the following lines:
head->next = &dat;
return;
Since dat is a local object, its address becomes invalid once the function returns, so the list is left pointing into dead stack memory that later calls are free to overwrite.
You're not testing for end of file correctly. In general, it's not correct to use feof, but instead to test the return value of the function with which you read from the file.
fscanf returns the number of entities that it was able to read. So in your case you would test that it returned 1, which would indicate a successful read. And to avoid buffer overflow, you can put a limit on the number of characters to read with a number between the % and the s.
And there's no reason to be so stingy with your buffer size.
So:
void insert_words(FILE* file, node_type* list)
{
    char buffer[128];
    while (fscanf(file, "%127s", buffer) == 1) {
        insert_sorted(list, buffer);
    }
}
BTW, you're not reading "line by line" but "space-delimited string by space-delimited string". To read the file line-by-line you could use fgets.
And before you say "that's not the problem", try it first. These are the only problems that could come from this function.

Resources