My question is about a task from CS50, pset5. For those who don't know it, I'll try to explain; it is nothing very special. I need to write a function that takes a dictionary file (provided in advance; all of the words in it are uppercase) containing more than 20K words, and sorts them somehow. I've written a simple, naive algorithm that builds a hash table and sorts the words by their first letters. It passes all of the CS50 checks, so my program works correctly. But compared to the course's reference solution, it is too slow: the staff's version runs in 0.1 s, mine in 5.0 to 7.0 s. What can I improve in this code to make it faster? Or should I change everything completely? I have no experience with optimization, because I just started learning. It would be great to learn from any of you =) Thanks in advance!
// Some constant values, which are declared before the function
#define LENGTH 46
#define ALPHALENGTH 26
/* Definition of node struct. Nothing special, in fact =) */
typedef struct node {
char word[LENGTH +1];
struct node *next;
} node;
node *hashTable[ALPHALENGTH];
bool load(const char *dictionary) {
FILE *f = fopen(dictionary, "r");
if (f == NULL) {
return false;
}
char word[LENGTH + 1];
int hash = 0;
for (int i = 0; i < ALPHALENGTH; i++) {
hashTable[i] = NULL;
}
// 46 is LENGTH, but a macro name is not expanded inside
// quotation marks, so the width is written out by hand
while (fscanf(f, "%46s", word) == 1) {
// make every letter lowercase to get result from 0 - 25
hash = tolower(word[0]) - 'a';
node *new_node = malloc(sizeof(node));
strcpy(new_node->word, word);
// check if element is first in the list
if (hashTable[hash] == NULL) {
new_node->next = NULL;
hashTable[hash] = new_node;
} else {
node *ptr = hashTable[hash];
do {
if (ptr->next == NULL) {
break;
}
ptr = ptr->next;
} while (true);
ptr->next = new_node;
new_node->next = NULL;
}
}
fclose(f);
return true;
}
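On the comment about "%46s": it is true that LENGTH cannot simply be written inside the quotes, because macros are not expanded inside string literals. A common workaround, not used in the original code, is the two-step stringification idiom combined with string-literal concatenation. This is a minimal sketch; STR, STR2 and read_word are hypothetical names.

```c
#include <stdio.h>

#define LENGTH 46

/* Two-level macro: STR forces LENGTH to expand to 46 first,
   then STR2 applies the # operator, so STR(LENGTH) becomes "46". */
#define STR2(x) #x
#define STR(x) STR2(x)

/* Hypothetical helper: reads one whitespace-delimited word of at most
   LENGTH characters. Returns 1 on success, 0 at end of input or on error. */
int read_word(FILE *f, char word[LENGTH + 1])
{
    /* Adjacent string literals are concatenated: "%" "46" "s" -> "%46s" */
    return fscanf(f, "%" STR(LENGTH) "s", word) == 1;
}
```

If LENGTH is changed later, the format width follows automatically instead of silently drifting out of sync with the buffer size.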
Your problem isn't your hash function; it's that your hash table is way too small.
From the sound of things, you have about 26 hash buckets for over 20,000 words. This places between 750 and 1000 words in each bucket. (Probably much more in some, as the hash function you're using is not uniform. There are very few words that start with x or q, for instance.)
Try expanding the hash table to 1000 entries (for instance), so that there are around 20 entries in each bucket. You will need a new hash function to do this; anything will work, but to work well it will need to generate values up to the size of the table. (Adding together the values of all the letters won't work, for instance, as it'll almost never reach 1000.)
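As the answer says, almost any hash that spreads values over the whole table will do. One common choice, not taken from the answer, is the djb2-style multiply-and-add hash reduced modulo the table size. This is a sketch assuming a table of 1000 buckets; N_BUCKETS is a hypothetical name.

```c
#include <ctype.h>

#define N_BUCKETS 1000  /* illustrative table size suggested in the answer */

/* djb2-style string hash (h * 33 + c), folded into [0, N_BUCKETS).
   Letters are lowercased so "APPLE" and "apple" share a bucket. */
unsigned int hash(const char *word)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *) word; *p != '\0'; p++) {
        h = h * 33 + (unsigned long) tolower(*p);  /* unsigned overflow just wraps */
    }
    return (unsigned int) (h % N_BUCKETS);
}
```

Because every character feeds the running product, the result covers the full 0 to 999 range, unlike a plain sum of letter values.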
The problem is not in your hash function, nor in the size of your hash table; it is in your list management. Your method for appending words to the corresponding lists walks the whole list on every insertion, so it has a complexity of O(N²).
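To see where the quadratic cost comes from: for a bucket that ends up holding k words, appending by walking from the head costs about 1 + 2 + ... + k ≈ k²/2 pointer steps. The fix below in this answer prepends and then reverses each list. Another common fix, not from either answer, is to remember a tail pointer per bucket so each append is O(1). A minimal sketch; heads, tails and append are hypothetical names.

```c
#include <stddef.h>

#define LENGTH 46
#define ALPHALENGTH 26

typedef struct node {
    struct node *next;
    char word[LENGTH + 1];
} node;

static node *heads[ALPHALENGTH];  /* first node of each list, NULL if empty */
static node *tails[ALPHALENGTH];  /* last node of each list, NULL if empty */

/* O(1) append: link the new node after the remembered tail instead of
   walking the list from its head every time. Keeps insertion order. */
void append(int bucket, node *n)
{
    n->next = NULL;
    if (tails[bucket] == NULL) {
        heads[bucket] = n;          /* first node in this bucket */
    } else {
        tails[bucket]->next = n;    /* goes after the current last node */
    }
    tails[bucket] = n;
}
```

This keeps the words in file order without a reversal pass, at the cost of one extra pointer per bucket.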
By the way, your hash function is not used for hashing, but for dispatching. You are sorting the table only on the first letter of each word, keeping the words with the same initial in the same order. If you meant to sort the dictionary completely, you would still need to sort each list.
You can drastically improve the performance while keeping the same semantics by prepending to the lists and reversing the lists at the end of the parsing phase.
For a dictionary with 20 thousand words, the code below runs 50 times faster (as expected by the CS50 site):
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LENGTH 46
#define ALPHALENGTH 26
typedef struct node {
struct node *next;
char word[LENGTH +1];
} node;
node *hashTable[ALPHALENGTH];
bool load(const char *dictionary) {
FILE *f = fopen(dictionary, "r");
if (f == NULL) {
return false;
}
char word[LENGTH + 1];
int hash = 0;
for (int i = 0; i < ALPHALENGTH; i++) {
hashTable[i] = NULL;
}
while (fscanf(f, "%46s", word) == 1) {
node *new_node = malloc(sizeof(node));
if (new_node == NULL) {
fclose(f);  // don't leak the open file on allocation failure
return false;
}
// make every letter lowercase to get result from 0 - 25
hash = tolower(word[0]) - 'a';
strcpy(new_node->word, word);
/* prepending to the list */
new_node->next = hashTable[hash];
hashTable[hash] = new_node;
}
for (int i = 0; i < ALPHALENGTH; i++) {
node *n, *prev, *next;
/* reverse list */
for (prev = NULL, n = hashTable[i]; n != NULL; ) {
next = n->next;
n->next = prev;
prev = n;
n = next;
}
hashTable[i] = prev;
}
fclose(f);
return true;
}
void save(void) {
for (int i = 0; i < ALPHALENGTH; i++) {
for (node *n = hashTable[i]; n != NULL; n = n->next) {
puts(n->word);
}
}
}
int main(int argc, char *argv[]) {
if (argc > 1) {
if (load(argv[1]))
save();
}
}
Changing the fscanf() to a simpler fgets() might provide a marginal performance improvement, at the cost of more restrictive semantics for the dictionary format.
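A minimal sketch of what the fgets() variant could look like, assuming the dictionary has exactly one word per line (the "more restrictive semantics" mentioned above). read_line_word is a hypothetical name; the buffer needs room for LENGTH characters plus the newline and the terminating nul.

```c
#include <stdio.h>
#include <string.h>

#define LENGTH 46

/* Reads one word per line and strips the trailing newline.
   Assumes one word per line; a line longer than LENGTH characters
   would be split across two calls. Returns 1 on success, 0 at EOF. */
int read_line_word(FILE *f, char word[LENGTH + 2])
{
    if (fgets(word, LENGTH + 2, f) == NULL) {
        return 0;
    }
    word[strcspn(word, "\r\n")] = '\0';  /* cut at the first '\r' or '\n' */
    return 1;
}
```

In load(), the while condition would become while (read_line_word(f, word)), with word declared as char word[LENGTH + 2].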
Related
The task is to create a fully functioning spell checker. I think I have more than one error in my code, which makes it hard to work out what exactly is causing all the frowns in check50. check50 only says that my code compiles; everything else is a frown. I think my program could be exiting during the check function, as size and unload have no runtime. I also get exit code 1 when it runs, but I may be way off, as this is my first programming course. Any tips, help, or pointers in the right direction would be massively appreciated!
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include "dictionary.h"
#include <stdlib.h>
#include <stdio.h>
#include <strings.h>
// Represents a node in a hash table
typedef struct node
{
char word[LENGTH + 1];
struct node *next;
} node;
// TODO: Choose number of buckets in hash table
const unsigned int N = 1000;
// Hash table
node *table[N];
bool check(const char *word)
{
// hash word
int x = hash(word);
// create cursor, set to first item in linked list
node *cursor = table[x];
// loop over hash tables
while (cursor != NULL)
{
if (strcasecmp(word, cursor->word) == 0)
{
return true;
}
cursor = cursor->next;
}
return false;
}
int dictionary_size = 0; // global variable for size of dictionary, used in multiple functions
unsigned int hash(const char *word)
{
int value = 0;
// hash function using math of all leters
// loop over every word
for (int i = 0; i < strlen(word); i++)
{
// convert to lower case for ascii values, removes case sensitive problem
value += tolower(word[i]); // sum of ascii values of word
}
return value % N; // return index for word
}
// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
// TODO
// array to store words from dictionary
char word[LENGTH + 1];
// open dictionary file
FILE *d = fopen(dictionary, "r");
if (d == NULL)
{
return false;
}
// read strings from file repeat for each word in dictionary , similar loop to recover.c
while (fscanf(d, "%s", word) != EOF)
{
// keep track of dictionary size
dictionary_size++;
// create new node
node *n = malloc(sizeof(node));
if (n == NULL)
{
return false;
}
// store word in array
strcpy(n->word, word);
n->next = NULL;
// hash word
int x = hash(word);
// set pointers to correct order
n->next = table[x];
table[x] = n;
}
fclose(d);
return true;
}
// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
return dictionary_size;
}
// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
// loop over hash tables
for (int j = 0; j < N; j++)
{
// initialise cursor for local scope
node *cursor = table[j];
// traverse linked list
while (cursor != NULL)
{
node *tmp = cursor;
cursor = cursor->next;
free(tmp);
return true;
}
}
return false;
}
From comments:
In regard to the frowns, the only one I have now is ":( program is free of memory errors". check50 was saying everything was wrong earlier; that could have been an error on its side.
The unload function frees one node and then returns (to the caller in speller). It should not return until it has freed all the nodes, i.e. until both loops have finished, and only then return true.
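A minimal sketch of unload with the early return moved out of the loops. To make it compile on its own, LENGTH and N are given as macros here (45 is assumed to match CS50's dictionary.h; the question uses a const unsigned int N).

```c
#include <stdbool.h>
#include <stdlib.h>

#define LENGTH 45   /* assumed, as in CS50's dictionary.h */
#define N 1000

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

node *table[N];

/* Frees every node in every bucket, and only then reports success. */
bool unload(void)
{
    for (int j = 0; j < N; j++) {
        node *cursor = table[j];
        while (cursor != NULL) {
            node *tmp = cursor;      /* remember the node to free */
            cursor = cursor->next;   /* advance before freeing it */
            free(tmp);
        }
        table[j] = NULL;             /* bucket is now empty */
    }
    return true;
}
```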
I'm trying to write a program that reads a dictionary and stores its words in a hash table, then reads another file and checks whether every word of that file is in the hash table; any word that is not is output as misspelled. As a first step, I'm trying to load the dictionary file into my hash table and print the words back out, but my code crashes whenever I run it. I took the hash function from the Internet. I'm also still very new to data structures and having a hard time understanding them.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node
{
char* word;
struct node *next;
}
node;
node *table[10];
// hash function
unsigned int hash(char *word)
{
// TODO
unsigned int hash = 5381;
int c = 0;
while (c == *word++)
hash = ((hash << 5) + hash) + c;
return hash % 10;
}
int main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++)
{
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL)
{
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL)
{
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata))
{
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL)
{
printf("here\n");
return 1;
}
// make the new node the new head of the list
strcpy(newNode->word, words);
newNode->next = table[index];
table[index] = newNode;
// free memory
free(newNode);
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++)
{
node *tmp = table[i];
while (tmp->next != NULL)
{
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
// loop to free all memory of the hash table
for (int i = 0; i < N; i++)
{
if (table[i] != NULL)
{
node *tmp = table[i]->next;
free(table[i]);
table[i] = tmp;
}
}
// close the file
fclose(indata);
}
There are at least three bugs, each of which could independently cause a segfault:
First, newNode->word is used uninitialized, so it points to random memory and the strcpy would segfault. It is better to use strdup.
Second, after you put newNode in the table, you call free(newNode), which leaves the table entry pointing at invalid memory. This causes the second loop to segfault.
Third, in the second loop, if table[i] is NULL, the while (tmp->next != NULL) test dereferences a null pointer and segfaults.
I've annotated and corrected your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// file to read
#define dictionary "dictionary.txt"
// No. of buckets
const unsigned int N = 10;
typedef struct node {
char *word;
struct node *next;
} node;
node *table[10];
// hash function
unsigned int
hash(char *word)
{
// TODO
unsigned int hash = 5381;
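int c = 0;
// NOTE/BUG: `==` compares instead of assigning, so the loop body never
// runs and every word hashes to the same bucket (5381 % 10 == 1)
#if 0
while (c == *word++)
#else
while ((c = *word++) != 0)
#endif
hash = ((hash << 5) + hash) + c;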
// NOTE: not a bug but probably better
#if 0
return hash % 10;
#else
return hash % N;
#endif
}
int
main(void)
{
// initialize array heads to NULL
for (int i = 0; i < N; i++) {
table[i] = NULL;
}
// Open file to read
FILE *indata = fopen(dictionary, "r");
if (indata == NULL) {
printf("cant open\n");
return 1;
}
// variable to store words read from the file
char *words = malloc(sizeof(char) * 20);
if (words == NULL) {
printf("no memory\n");
return 1;
}
// While loop to read through the file
while (fgets(words, 20, indata)) {
// get the index of the word using hash function
int index = hash(words);
// create new node
node *newNode = malloc(sizeof(node));
if (newNode == NULL) {
printf("here\n");
return 1;
}
// make the new node the new head of the list
// NOTE/BUG: word is never set to anything valid -- possible segfault here
#if 0
strcpy(newNode->word, words);
#else
newNode->word = strdup(words);
#endif
newNode->next = table[index];
table[index] = newNode;
// free memory
// NOTE/BUG: this will cause the _next_ loop to segfault -- don't deallocate
// the node you just added to the table
#if 0
free(newNode);
#endif
}
// free memory
free(words);
// loop to print out the values of the hash table
for (int i = 0; i < N; i++) {
node *tmp = table[i];
// NOTE/BUG: this test fails if the tmp is originally NULL (i.e. no entries
// in the given hash index)
#if 0
while (tmp->next != NULL) {
#else
while (tmp != NULL) {
#endif
printf("%s\n", tmp->word);
tmp = tmp->next;
}
}
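// loop to free all memory of the hash table
// NOTE/BUG: the original freed only the first node of each bucket and
// leaked the strdup'ed strings; walk each whole list instead
for (int i = 0; i < N; i++) {
while (table[i] != NULL) {
node *tmp = table[i]->next;
free(table[i]->word);
free(table[i]);
table[i] = tmp;
}
}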
// close the file
fclose(indata);
}
UPDATE:
I made a linked list program before that stores an integer in each node (int number; struct node *next;), and there I used newNode->number = 5; and it worked. Why doesn't it work in this case? Is it because I am working with strings here?
The difference is that word is a pointer. It must be assigned a value before it can be used. strcpy does not assign a value to word. It tries to use the contents of word as the destination address of the copy.
But, the other two bugs happen regardless of word being a char * vs number being int.
If you had defined word not as a pointer, but as a fixed array [not as good in this usage], the strcpy would have worked. That is, instead of char *word;, if you had done (e.g.) char word[5];
But, what you did is better [with the strdup change] unless you can guarantee that the length of word can hold the input. strdup will guarantee that.
But, notice that I [deliberately] made word have only five chars to illustrate the problem. It means that the word to be added can only be 4 characters in length [we need one extra byte for the nul terminator character]. You'd need to use strncpy instead of strcpy but strncpy has issues [it does not guarantee to add the nul char at the end if the source length is too large].
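A small sketch contrasting the two member layouts discussed above. For the fixed array it uses snprintf, which, unlike strncpy, always writes the terminating nul when it truncates. For the pointer member it spells out what strdup does (allocate strlen + 1 bytes and copy), since strdup is POSIX rather than ISO C before C23. The struct and function names here are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Variant 1: fixed-size member with room for 4 characters plus the nul. */
typedef struct fixed_node {
    char word[5];
    struct fixed_node *next;
} fixed_node;

/* Variant 2: pointer member that must be given its own storage. */
typedef struct ptr_node {
    char *word;
    struct ptr_node *next;
} ptr_node;

/* Copies src into the fixed array, truncating if needed; the result
   is always nul-terminated. */
void set_fixed(fixed_node *n, const char *src)
{
    snprintf(n->word, sizeof n->word, "%s", src);
}

/* Equivalent of n->word = strdup(src): allocate and copy the whole
   string. Returns 0 if the allocation fails. */
int set_ptr(ptr_node *n, const char *src)
{
    size_t len = strlen(src) + 1;   /* +1 for the nul terminator */
    n->word = malloc(len);
    if (n->word == NULL) {
        return 0;
    }
    memcpy(n->word, src, len);
    return 1;
}
```

With "banana" as input, the fixed variant ends up holding "bana", while the pointer variant keeps the whole word, which is why the pointer version (with strdup) is the better fit when input lengths are not bounded.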
Coincidentally, there is another question today with an answer that may shed some more light on the differences for your word struct member: Difference between memory allocations of struct member (pointer vs. array) in C
From a cursory glance I can see two problems:
You don't allocate space for the word in the node; you simply strcpy the word into an uninitialized pointer. You might want to use strdup instead.
You free the memory of the node right after adding it to the list. The table is an array of pointers, so you store the pointer in the table and then throw away the memory it points to.
Oh, and three: in the final loop you free that already-freed memory a second time...
I am new to C programming. I am trying to do pset5 in CS50 while trying to understand the concepts of memory, linked lists and hash tables. I wrote the code and it compiles, but something seems to be wrong: every time I run it, it prints some garbage values. Could anyone please help me with that? Many thanks.
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
#include<string.h>
#include "dictionary.h"
#define DICTIONARY "dictionaries/small"
typedef struct node
{
char WORD[LENGTH + 1];
struct node *next;
}
node;
int hash(char *word);
int main(void)
{
node **HASHTABLE = malloc(sizeof(node) * 26);
//open the dictionary
FILE *dic = fopen(DICTIONARY, "r");
if (dic == NULL)
{
fprintf(stderr, "Could not open the library\n");
return 1;
}
int index = 0;
char word[LENGTH + 1];
for (int c = fgetc(dic); c != EOF; c = fgetc(dic))
{
word[index] = c;
index++;
if (c == '\n')
{
int table = hash(word);
printf("%d\n", table);
//create a newnode
node *newnode = malloc(sizeof(node));
strcpy(newnode->WORD, word);
newnode->next = NULL;
printf("Node: %s\n", newnode->WORD);
index = 0;
//add new node to hash table
if (HASHTABLE[table] == NULL)
{
HASHTABLE[table] = newnode;
}
else
{
HASHTABLE[table]->next = newnode;
}
}
}
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
printf("%s", p->WORD);
p = p->next;
}
}
//free memory
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
node *temp = p->next;
free(p);
p = temp;
}
}
free(HASHTABLE);
}
int hash(char *word)
{
int i = 0;
if (islower(word[0]))
return i = word[0] - 'a';
if (isupper(word[0]))
return i = word[0] - 'A';
return 0;
}
Your code has serious problems that result in undefined behavior.
Two of them are the result of this line:
node **HASHTABLE = malloc(sizeof(node) * 26);
That allocates room for 26 node structures, but HASHTABLE is declared as node **, i.e. a pointer to the first element of an array of node * pointers (that's the ** in the node **HASHTABLE declaration).
So, you should replace it with something like:
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
Note that I used the dereferenced value of the variable being assigned to, HASHTABLE. In this case *HASHTABLE is a node * (one less * than in the declaration), so if the type of HASHTABLE changes, you don't need to make any other changes to the malloc() statement.
That problem only over-allocates (a node is larger than a node *), so by itself it likely wouldn't cause any problems.
However, there's still a problem with
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
that will cause problems - and serious ones.
That array of 26 pointers isn't initialized - you don't know what's in them. They can point anywhere. So this won't work well, if at all:
if (HASHTABLE[table] == NULL)
Meaning this points off to somewhere unknown:
HASHTABLE[table]->next = newnode;
And that will cause all kinds of problems.
The simplest fix? Initialize the values all to zero by using calloc() instead of malloc():
node **HASHTABLE = calloc( 26, sizeof( *HASHTABLE ) );
Until that's fixed, any results from your entire program are questionable, at best.
The reason for the garbage is that you didn't null-terminate the string:
strcpy(newnode->WORD, word);
strcpy expects src to point to a null-terminated string, but word is never terminated. Simply terminate it with
word[index] = 0;
before the strcpy.
Other than that, the ones in Andrew Henle's answer should be addressed too, but I am not going to repeat them here.
BTW, next you will notice that
HASHTABLE[table]->next = newnode;
wouldn't work properly: that code always makes the new node the second one, overwriting (and leaking) whatever node was second before. You want to always insert the new node unconditionally as the head, with
newnode->next = HASHTABLE[table];
HASHTABLE[table] = newnode;
There need not be any special condition for inserting the first node to a bucket.
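Putting these fixes together with the calloc() fix from the previous answer, a minimal sketch of the loading loop might look like this. It assumes one word per line with a newline after every word, LENGTH of 45 as in CS50's dictionary.h, and reads from an already opened FILE * so it can be tested in isolation; load_stream and BUCKETS are hypothetical names. Unlike the question's loop, it also caps the word length so word cannot overflow, and skips blank lines.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45   /* assumed, as in CS50's dictionary.h */
#define BUCKETS 26

typedef struct node {
    char WORD[LENGTH + 1];
    struct node *next;
} node;

/* Bucket by first letter, case-insensitively; non-letters go to 0. */
int hash(const char *word)
{
    unsigned char c = (unsigned char) word[0];
    return isalpha(c) ? tolower(c) - 'a' : 0;
}

/* table must hold BUCKETS pointers that start out NULL (e.g. calloc).
   Returns the number of words loaded, or -1 if malloc fails. */
int load_stream(FILE *dic, node **table)
{
    char word[LENGTH + 1];
    int index = 0, count = 0;
    for (int c = fgetc(dic); c != EOF; c = fgetc(dic)) {
        if (c != '\n') {
            if (index < LENGTH) {
                word[index++] = (char) c;   /* extra characters are dropped */
            }
            continue;
        }
        word[index] = '\0';                 /* null-terminate before strcpy */
        index = 0;
        if (word[0] == '\0') {
            continue;                       /* skip blank lines */
        }
        node *newnode = malloc(sizeof *newnode);
        if (newnode == NULL) {
            return -1;
        }
        strcpy(newnode->WORD, word);
        int b = hash(word);
        newnode->next = table[b];           /* always insert at the head */
        table[b] = newnode;
        count++;
    }
    return count;
}
```

In main, this would be called as load_stream(dic, HASHTABLE) after HASHTABLE = calloc(BUCKETS, sizeof *HASHTABLE).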
Any guidance would be appreciated. I believe the problem lies in the load method. The basic functionality of each method is described in the comments. What could be the cause of my segmentation fault, and is everything else working as intended? Thank you for your time.
Any resources that may point me in the right direction would be appreciated too.
/**
* Implements a dictionary's functionality.
*/
#include <stdbool.h>
#include "dictionary.h"
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <cs50.h>
//Defining node:
typedef struct node
{ //Inner workings of each "element" in the linked lists
char word[LENGTH + 1]; //the word within the node is +1'd due to the memory after the word containing /0
struct node *next; //linked list
}node;
node *alphabetList[27]; //26 buckets that can contain variables of type node(of dynamic size)
//one bucket for each letter of the alphabet
node *cursor = NULL;
node *head = NULL;
/**
* Returns true if word is in dictionary else false.
*/
bool check(const char *word)
{
int bucketIndex ;
//no need to malloc information b/c we are simply pointing to previously established nodes.
if(word[0] >= 65 && word[0] < 97){
bucketIndex = word[0] - 65;
}
else{
bucketIndex = word[0] - 97;
}
node *head = alphabetList[bucketIndex];
node *cursor = head;
while(cursor != NULL)
{
cursor = cursor -> next;
if(strcmp(cursor -> word, word) != 0)
{
return true;
}
}
return false;
}
/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char *dictionary)
{
char *word = NULL;
int i = 0; //index
FILE *dictionaryTextFile;
dictionaryTextFile = fopen(dictionary, "r");
//scan for word
while(fscanf(dictionaryTextFile, "%s", word) != EOF)
{
//for every word we scan we want to malloc a node to ascertain we have sufficent memory
node *new_node = malloc(sizeof(node));
if(new_node == NULL) //error check(if you run out of memory malloc will return null)
{
unload();
return false;
}
//error check complete.
else{
strcpy(new_node -> word, word);
}
//not sure from here on
char first_letter = new_node[i].word[0]; //first letter of node word (confused on how to execute this properly)
first_letter = tolower(first_letter);
int index = first_letter - 97;
if(word){
for(node *ptr = alphabetList[index]; ptr!= NULL; ptr = ptr->next)
{
if(!ptr-> next){
ptr->next = new_node;
}
}
}
else
{
alphabetList[index] = new_node;
}
i++;
}
return true;
}
/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return 0;
}
/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
for(int i = 0; i <= 26; i++)
{
node *head = alphabetList[i];
node *cursor = head;
while(cursor != NULL)
{
node *temp = cursor;
cursor = cursor -> next;
free(temp);
}
}
return true;
}
The problem is obvious now that you've said which line the code crashes on. Consider these lines...
char *word = NULL;
int i = 0; //index
FILE *dictionaryTextFile;
dictionaryTextFile = fopen(dictionary, "r");
//scan for word
while(fscanf(dictionaryTextFile, "%s", word) != EOF)
You've got 2 problems there. Firstly, you don't check that the call to fopen worked. You should always check that the value returned is not NULL.
Secondly, and the cause of the crash, is that word is still NULL - you don't allocate any space to hold a string in it. You might as well declare it the same as you declare it inside node so replace
char *word = NULL;
with
char word[LENGTH+1];
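Speaking of node, and to save you coming back with another crash later: always make sure you initialise all members of a struct. In this case new_node->next should be set to NULL; otherwise, when you check it later in your for loop, it might appear to point to a node while actually pointing at some random place in memory, and the code will crash. Be aware that the for loop has problems of its own, though: once word is an array, if(word) is always true, so alphabetList[index] is never set and every list stays empty; and if a list did have nodes, the loop would carry on into the node it just appended and link that node to itself, looping forever. Inserting each new node at the head of its list avoids both problems.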
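A minimal sketch of load that applies both fixes from this answer (a real buffer for word, and checking fopen) and, as an assumption beyond the answer, inserts each node at the head of its list, which also initialises next for every node. LENGTH is assumed to be 45 as in CS50's dictionary.h, and load_from is a hypothetical helper split out so the reading loop can be tested with any FILE *.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45   /* assumed, as in CS50's dictionary.h */

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

node *alphabetList[27];   /* index 26 is unused by this sketch */

/* Reads whitespace-separated words and prepends each one to the list
   for its first letter. Words not starting with a letter are skipped. */
bool load_from(FILE *f)
{
    char word[LENGTH + 1];                 /* real storage, not a NULL pointer */
    while (fscanf(f, "%45s", word) == 1) { /* 45 must match LENGTH */
        unsigned char first = (unsigned char) word[0];
        if (!isalpha(first)) {
            continue;
        }
        node *new_node = malloc(sizeof *new_node);
        if (new_node == NULL) {
            return false;
        }
        strcpy(new_node->word, word);
        int index = tolower(first) - 'a';
        new_node->next = alphabetList[index];  /* next is always set */
        alphabetList[index] = new_node;
    }
    return true;
}

bool load(const char *dictionary)
{
    FILE *f = fopen(dictionary, "r");
    if (f == NULL) {
        return false;                      /* always check fopen */
    }
    bool ok = load_from(f);
    fclose(f);
    return ok;
}
```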
I'm having some issues finalizing my code for my programming course (I'm an absolute beginner in C). The aim is to read words from standard input (runfile < input.c), count their frequencies, and sort the list alphabetically (capitalized words first), example output:
(Image: sample output)
I found pieces of code here on Stack Overflow, which I adapted, and so far the program produces output with the words and their frequencies. However, I can't figure out how to get the list sorted as in the sample above. Our teacher suggests that when a new word is found, it should be inserted into the linked list in sorted position straight away. He gave us the following code sample (an excerpt from this program):
void addSorted(link *n, int x) {
if (*n == NULL || x < (*n)->data) {
*n = cons(x, *n);
} else {
addSorted(&((*n)->next), x);
}
}
As far as I understand it, 'link *n' should be the pointer to the next node, 'data' holds integers in that case, and 'cons' should be a function within this code that constructs a new node or link. I'm not sure about 'int x'; my guess is that it's the current integer for comparison.
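The excerpt relies on definitions that aren't shown. Assuming link is a pointer to a node and cons allocates a node in front of an existing list (both assumptions, since the full program isn't included here), the teacher's function can be made runnable like this. Note that n is the address of a link variable (the list's head pointer, or some node's next field) rather than the next node itself, which is what lets *n = ... splice a node in place; x is simply the value being inserted.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

typedef node *link;   /* assumed: a link is a pointer to a node */

/* Assumed shape of cons: build a node holding x in front of list next. */
link cons(int x, link next)
{
    link n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->data = x;
    n->next = next;
    return n;
}

/* The teacher's function: insert x before the first larger element. */
void addSorted(link *n, int x)
{
    if (*n == NULL || x < (*n)->data) {
        *n = cons(x, *n);               /* splice in at this position */
    } else {
        addSorted(&((*n)->next), x);    /* try the next link */
    }
}
```

Starting from link list = NULL;, calling addSorted(&list, 5), then 1, then 3 leaves the list as 1, 3, 5. For words, data becomes the string and x < (*n)->data becomes a strcmp() comparison.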
As I said, I'm having trouble adapting this last bit into my code. I tried to adapt my addWord() function, but it doesn't work out for me.
Below you find the working code I have so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
//=============== STRUCTURE ==================
typedef struct word {
char *mywords; // list node with word pointer
int freq; // Frequency count
struct word *pNext; // Pointer to next node in linked list
} Word;
//======= INITIATION OF FUNCTIONS ===========
int readWord(char *temp_word, int temp_size); // Given function to get words
void addWord(char *pWord); // Adds a word to the list or updates exisiting word
void printmywords(Word *pListNodes); // Output list of words and frequencies
Word* construct(char *word); // Constructs list nodes
//============GLOBAL VARIABLES================
Word *pFirst = NULL; // Pointer to first node in linked list
//================ MAIN ======================
int main () {
char temp_word[32]; // temporary buffer to hold words
int size = 10000;
Word *pNode = NULL; // pointer to word counter
while (readWord(temp_word, size)) { // Read all words from standard input
addWord(temp_word); // Add word to list
}
// List the words and their counts
pNode = pFirst;
while(pNode != NULL)
{
printmywords(pNode);
pNode = pNode->pNext;
}
printf("\n");
// Free the allocated memory
pNode = pFirst;
while(pNode != NULL)
{
free(pNode->mywords);
pFirst = pNode;
pNode = pNode->pNext;
free(pFirst);
}
return 0;
}
//================ FUNCTIONS =================
void printmywords(Word *pListNodes)
{
printf("\n%-20s %5d", pListNodes->mywords,pListNodes->freq); // output word and frequency
}
void addWord(char *word)
{
Word *pNode = NULL;
Word *pLast = NULL;
if(pFirst == NULL)
{
pFirst = construct(word);
return;
}
// Update frequency, if word in list
pNode = pFirst;
while(pNode != NULL)
{
if(strcmp(word, pNode->mywords) == 0)
{
++pNode->freq;
return;
}
pLast = pNode;
pNode = pNode->pNext;
}
// Add new word, if not in list
pLast->pNext = construct(word);
}
Word* construct(char *word)
{
Word *pNode = NULL;
pNode = (Word*)malloc(sizeof(Word));
pNode->mywords = (char*)malloc(strlen(word)+1);
strcpy(pNode->mywords, word);
pNode->freq = 1;
pNode->pNext = NULL;
return pNode;
}
int readWord(char *temp_word, int temp_size) {
char *p = temp_word;
int c;  // int, not char, so that EOF can be told apart from real characters
// skip all non-word characters
do {
c = getchar();
if (c == EOF)
return 0;
} while (!isalpha(c));
// read word chars
do {
if (p - temp_word < temp_size - 1)
*p++ = c;
c = getchar();
} while (isalpha(c));
// finalize word
*p = '\0';
return 1;
}
Any help is appreciated.
Okay, try these two functions:
Word *cons(char *word, Word *next) {
Word *result = construct(word);
if (result) {
result->pNext = next;
}
else {
printf("Out of memory in cons\n");
exit(1);
}
return result;
}
void addSorted(Word **nodeRef, char *word) {
Word *node = *nodeRef;
/* strcmp will do a binary comparison, which suits your purpose
because you want capitalized words before lower-case; the order
of the arguments is important - <0 means the first argument should
come before the second argument. */
if ((node == NULL) || (strcmp(word, node->mywords) < 0)) {
*nodeRef = cons(word, node);
}
else if (strcmp(word, node->mywords) == 0) {
++node->freq;
}
else {
/* there's not really any point to using recursion on a linked
list, except for the fact that it's really easy to use recursion
on a linked list. On a very large list, iteration would most likely
be faster; however, professors really like to show how clever they
are, so you're better off using it anyway. */
addSorted(&node->pNext, word);
}
}
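The comment above notes that iteration would likely do better on a very large list. A minimal sketch of an iterative version, using a pointer-to-pointer so the head needs no special case, might look like this. addSortedIter is a hypothetical name; Word and construct follow the question's code, with the mallocs checked.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct word {
    char *mywords;
    int freq;
    struct word *pNext;
} Word;

/* As in the question, but returns NULL if either allocation fails. */
Word *construct(char *word)
{
    Word *pNode = malloc(sizeof *pNode);
    if (pNode == NULL) {
        return NULL;
    }
    pNode->mywords = malloc(strlen(word) + 1);
    if (pNode->mywords == NULL) {
        free(pNode);
        return NULL;
    }
    strcpy(pNode->mywords, word);
    pNode->freq = 1;
    pNode->pNext = NULL;
    return pNode;
}

/* Walks nodeRef forward until it refers to the link where word belongs,
   then either bumps the count or splices in a new node there. */
void addSortedIter(Word **nodeRef, char *word)
{
    int cmp = 1;
    while (*nodeRef != NULL && (cmp = strcmp(word, (*nodeRef)->mywords)) > 0) {
        nodeRef = &(*nodeRef)->pNext;
    }
    if (*nodeRef != NULL && cmp == 0) {
        ++(*nodeRef)->freq;            /* word already in the list */
        return;
    }
    Word *n = construct(word);
    if (n == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    n->pNext = *nodeRef;               /* goes before the first larger word */
    *nodeRef = n;
}
```

In main, the loop body would become addSortedIter(&pFirst, temp_word); in place of addWord(temp_word);. Because strcmp compares byte values, "Apple" sorts before "apple", which gives the capitalized-first order the assignment asks for.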
A couple of other points:
char temp_word[32]; // temporary buffer to hold words
int size = 10000;
You've got a 31 character buffer, but you're telling your readWord function that it is 10K characters?
Also, don't cast the return value from malloc().