Count the number of occurrences of each word

I'm trying to count the number of occurrences of each word in the function countWords. I believe I started the for loop in the function properly, but how do I compare the words in the array with each other, count them, and then delete the duplicates? Isn't it like a Fibonacci series, or am I mistaken? Also, int n has the value 756 because that's how many words are in the array, and wordArray holds the elements of the array.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
int *countWords( char **words, int n);
int main(int argc, char *argv[])
{
char buffer[100]; //Maximum word size is 100 letters
FILE *textFile;
int numWords=0;
int nextWord;
int i, j, len, lastChar;
char *wordPtr;
char **wordArray;
int *countArray;
int *alphaCountArray;
char **alphaWordArray;
int *freqCountArray;
char **freqWordArray;
int choice=0;
//Check to see if command line argument (file name)
//was properly supplied. If not, terminate program
if(argc == 1)
{
printf ("Must supply a file name as command line argument\n");
return (0);
}
//Open the input file. Terminate program if open fails
textFile=fopen(argv[1], "r");
if(textFile == NULL)
{
printf("Error opening file. Program terminated.\n");
return (0);
}
//Read file to count the number of words
fscanf(textFile, "%s", buffer);
while(!feof(textFile))
{
numWords++;
fscanf(textFile, "%s", buffer);
}
printf("The total number of words is: %d\n", numWords);
//Create array to hold pointers to words
wordArray = (char **) malloc(numWords*sizeof(char *));
if (wordArray == NULL)
{
printf("malloc of word Array failed. Terminating program.\n");
return (0);
}
//Rewind file pointer and read file again to create
//wordArray
rewind(textFile);
for(nextWord=0; nextWord < numWords; nextWord++)
{
//read next word from file into buffer.
fscanf(textFile, "%s", buffer);
//Remove any punctuation at beginning of word
i=0;
while(!isalpha(buffer[i]))
{
i++;
}
if(i>0)
{
len = strlen(buffer);
for(j=i; j<=len; j++)
{
buffer[j-i] = buffer[j];
}
}
//Remove any punctuation at end of word
len = strlen(buffer);
lastChar = len -1;
while(!isalpha(buffer[lastChar]))
{
lastChar--;
}
buffer[lastChar+1] = '\0';
//make sure all characters are lower case
for(i=0; i < strlen(buffer); i++)
{
buffer[i] = tolower(buffer[i]);
}
//Now add the word to the wordArray.
//Need to malloc an array of chars to hold the word.
//Then copy the word from buffer into this array.
//Place pointer to array holding the word into next
//position of wordArray
wordPtr = (char *) malloc((strlen(buffer)+1)*sizeof(char));
if(wordPtr == NULL)
{
printf("malloc failure. Terminating program\n");
return (0);
}
strcpy(wordPtr, buffer);
wordArray[nextWord] = wordPtr;
}
//Call countWords() to create countArray and replace
//duplicate words in wordArray with NULL
countArray = countWords(wordArray, numWords);
if(countArray == NULL)
{
printf("countWords() function returned NULL; Terminating program\n");
return (0);
}
//Now call compress to remove NULL entries from wordArray
compress(&wordArray, &countArray, &numWords);
if(wordArray == NULL)
{
printf("compress() function failed; Terminating program.\n");
return(0);
}
printf("Number of words in wordArray after eliminating duplicates and compressing is: %d\n", numWords);
//Create copy of compressed countArray and wordArray and then sort them alphabetically
alphaCountArray = copyCountArray(countArray, numWords);
freqCountArray = copyCountArray(alphaCountArray, numWords);
int *countWords( char **wordArray, int n)
{
return NULL;
int i=0;
int n=0;
for(i=0;i<n;i++)
{
for(n=0;n<wordArray[i];n++)
{
}
}
}

Assuming you want the return value of countWords to be an array of integers with word counts for each unique word, you need to have a double loop. One loop goes over the whole array, the second loop goes through the rest of the array (after the current word), looking for duplicates.
You could do something like this pseudocode:
Allocate the return array countArray (n integers)
Loop over all words (as you currently do in your `for i` loop)
    If the word at `i` is not NULL // Check we haven't already deleted this word
        // Found a new word
        Set countArray[i] to 1
        Loop through the rest of the words, e.g. for (j = i + 1; j < n; j++)
            If the word at j is not NULL and matches the word at i (using strcmp)
                // Found a duplicate word
                Increment countArray[i] (the original word's count)
                // We don't want wordArray[j] anymore, so
                Free wordArray[j]
                Set wordArray[j] to NULL
    Else
        // A NULL indicates this was a duplicate, set the count to 0 for consistency.
        Set countArray[i] to 0
Return countArray
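For reference, the double loop described above might look roughly like this in C. This is only a sketch: it assumes every non-NULL entry of the array was allocated with malloc (as in the question's main), and it treats n <= 0 as a failure by returning NULL, which is my own choice rather than anything the question specifies.

```c
#include <stdlib.h>
#include <string.h>

/* Counts occurrences of each word. Duplicates in words[] are freed and
 * set to NULL; their count is 0. Returns a malloc'd array of n counts,
 * or NULL on failure (assumption: n <= 0 is treated as a failure). */
int *countWords(char **words, int n)
{
    if (n <= 0)
        return NULL;

    int *countArray = malloc((size_t)n * sizeof *countArray);
    if (countArray == NULL)
        return NULL;

    for (int i = 0; i < n; i++) {
        if (words[i] == NULL) {
            /* Already removed as a duplicate of an earlier word. */
            countArray[i] = 0;
            continue;
        }
        countArray[i] = 1;
        for (int j = i + 1; j < n; j++) {
            if (words[j] != NULL && strcmp(words[i], words[j]) == 0) {
                countArray[i]++;
                free(words[j]);   /* assumes words[j] came from malloc */
                words[j] = NULL;
            }
        }
    }
    return countArray;
}
```

The caller owns the returned array and should free() it once the counts are no longer needed.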

I'm going to throw you a bit of a curve ball here.
Rather than fix your code, which can be easily fixed as it's pretty good on its own, but incomplete, I decided to write an example from scratch.
No need to read the file twice [first time just to get the maximum count]. This could be handled by a dynamic array and realloc.
The main point, I guess, is that it is much easier to ensure that word list has no duplicates while creating it, rather than removing duplicates at the end.
I opted for a few things.
I created a "word control" struct. You've got several separate arrays that are indexed the same way. That, sort of, "cries out" for a struct. That is, rather than [say] 5 separate arrays, have a single array of a struct that has 5 elements in it.
The word list is a linked list of these structs. It could be a dynamic array on the heap that gets realloced instead, but the linked list is actually easier to maintain for this particular usage.
Each struct has the [cleaned up] word text and a count of the occurrences (vs. your separate wordArray and countArray).
When adding a word, the list is scanned for an existing match. If one is found, the count is incremented, rather than creating a new word list element. That's the key to eliminating duplicates [i.e. don't create them in the first place].
Anyway, here it is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#define sysfault(_fmt...) \
do { \
printf(_fmt); \
exit(1); \
} while (0)
// word control
typedef struct word {
struct word *next; // linked list pointer
char *str; // pointer to word string
int count; // word frequency count
} word_t;
word_t wordlist; // list of words
// cleanword -- strip chaff and clean up word
void
cleanword(char *dst,const char *src)
{
int chr;
// NOTE: using _two_ buffers is much easier than trying to clean one
// buffer in-place
for (chr = *src++; chr != 0; chr = *src++) {
if (! isalpha(chr))
continue;
chr = tolower(chr);
*dst++ = chr;
}
*dst = 0;
}
// addword -- add unique word to list and keep count of number of words
void
addword(const char *str)
{
word_t *cur;
word_t *prev;
char word[1000];
// get the cleaned up word
cleanword(word,str);
// find a match to a previous word [if it exists]
prev = NULL;
for (cur = wordlist.next; cur != NULL; cur = cur->next) {
if (strcmp(cur->str,word) == 0)
break;
prev = cur;
}
// found a match -- just increment the count (i.e. do _not_ create a
// duplicate that has to be removed later)
if (cur != NULL) {
cur->count += 1;
return;
}
// new unique word
cur = malloc(sizeof(word_t));
if (cur == NULL)
sysfault("addword: malloc failure -- %s\n",strerror(errno));
cur->count = 1;
cur->next = NULL;
// save off the word string
cur->str = strdup(word);
if (cur->str == NULL)
sysfault("addword: strdup failure -- %s\n",strerror(errno));
// add the new word to the end of the list
if (prev != NULL)
prev->next = cur;
// add the first word
else
wordlist.next = cur;
}
int
main(int argc,char **argv)
{
FILE *xf;
char buf[1000];
char *cp;
char *bp;
word_t *cur;
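--argc;
++argv;
// make sure a file name was supplied before dereferencing argv
if (argc < 1)
sysfault("main: must supply a file name as command line argument\n");
xf = fopen(*argv,"r");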
if (xf == NULL)
sysfault("main: unable to open '%s' -- %s\n",*argv,strerror(errno));
while (1) {
// get next line
cp = fgets(buf,sizeof(buf),xf);
if (cp == NULL)
break;
// loop through all words on a line
bp = buf;
while (1) {
cp = strtok(bp," \t\n");
bp = NULL;
if (cp == NULL)
break;
// add this word to the list [avoiding duplicates]
addword(cp);
}
}
fclose(xf);
// print the words and their counts
for (cur = wordlist.next; cur != NULL; cur = cur->next)
printf("%s %d\n",cur->str,cur->count);
return 0;
}
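The answer mentions that a dynamic array grown with realloc would work in place of the linked list. A minimal sketch of that alternative is below. The names entry_t, table_t, table_add and table_free are hypothetical, made up for this illustration, and the doubling growth policy with an initial capacity of 16 is just one common choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not part of the answer's code. */
typedef struct {
    char *str;   /* heap copy of the word */
    int count;   /* number of occurrences */
} entry_t;

typedef struct {
    entry_t *items;  /* growable array of entries */
    size_t used;     /* entries currently in use */
    size_t cap;      /* entries currently allocated */
} table_t;

/* Records one occurrence of word.
 * Returns 0 on success, -1 on allocation failure. */
int table_add(table_t *t, const char *word)
{
    /* Existing word: just bump the count, no duplicate is created. */
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->items[i].str, word) == 0) {
            t->items[i].count++;
            return 0;
        }
    }

    /* Grow the array when it is full (realloc(NULL, n) acts like malloc). */
    if (t->used == t->cap) {
        size_t newcap = t->cap ? t->cap * 2 : 16;
        entry_t *p = realloc(t->items, newcap * sizeof *p);
        if (p == NULL)
            return -1;   /* the old array is still valid */
        t->items = p;
        t->cap = newcap;
    }

    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);

    t->items[t->used].str = copy;
    t->items[t->used].count = 1;
    t->used++;
    return 0;
}

/* Releases every word and the array itself. */
void table_free(table_t *t)
{
    for (size_t i = 0; i < t->used; i++)
        free(t->items[i].str);
    free(t->items);
    t->items = NULL;
    t->used = t->cap = 0;
}
```

Start from a zeroed table (table_t t = {0};), call table_add once per cleaned word, and call table_free at the end. Lookup is still a linear scan, same as with the linked list.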

Related

Outputting a string from a structure

I am trying to create a program that takes input from a file, puts each word into a "words" structure, and then outputs the results with the frequency of each word, but whenever I try to output the string it just prints something like ?k#?? where I would expect the string to be.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct s_words {
char *str; //stores the word; no pre-determined size
int count;
struct s_words* next;
} words;
words* create_words(char* word) {
//allocate space for the structure
words* newWord = malloc(strlen(word));
if (NULL != newWord){
//allocate space for storing the new word in "str"
//if str was array of fixed size, storage wud be wasted
newWord->str = (char *)malloc(strlen(word));
strcpy(newWord->str,word); //copy “word” into newWord->str
newWord->str[strlen(word)]='\0';
newWord->count = 1; //initialize count to 1;
newWord->next = NULL; //initialize next;
}
return newWord;
}
//If the word is in the list, add 1 to count.
words* add_word(words* wordList, char* word) {
int found=0;
words *temp=wordList;
// search if word exists in the list; if so, make found=1
while (temp!=NULL) {
// printf("looptest\n");
if (strcmp(word,temp->str) == 0) { //use strcmp command
//printf("looptest0\n");
found=1;
temp->count = temp->count + 1; //increment count;
return wordList;
//printf("looptest1\n");
}
else {
temp = temp -> next; //update temp
// printf("looptest2\n");
}
}
// printf("looptest3\n");
//new word
words* newWord = create_words(word);
// printf("looptest4\n");
if (NULL != newWord) {
// printf("looptest5\n");
newWord->next = wordList;
wordList = newWord;
//Insert new word at the head of the list
}
else{
// printf("looptest6\n");
temp = wordList;
while(temp->next != NULL){
// printf("looptest7\n");
temp = temp->next;
}
temp->next = newWord;
}
return newWord;
}
int main(int argc, char* argv[]) {
words *mywords; //head of linked list containing words
mywords=NULL;
FILE *myFile;
myFile = fopen(argv[1],"r"); //first parameter is input file
if (myFile==0) {
printf("file not opened\n");
return 1;
}
else {
printf("file opened\n");
}
//start reading file character by character;
//when word has been detected; call the add_word function
int ch, word = 0, k=0;
char thisword[100];
while ( (ch = fgetc(myFile)) != EOF )
{
// printf("%c",ch);
if (ch==' ' || ch==',' || ch==';' || ch==':' || ch == '.') //detect new word? Check if ch is a delimiter
{
// printf("\ncheck2\n");
if ( word ) //make sure previous character was not delimiter
{
// printf("check\n");
word = 0;
thisword[k] = '\0'; //make the kth character of thisword as \0
// printf("test2\n");
//now call add_word to add thisword into the list
mywords = add_word(mywords,thisword);
// printf("check3\n");
k=0;
}
// printf("test\n");
}
else
{
word = 1;
thisword[k] = ch; //make the kth character of thisword equal to ch
k++;
}
if(ch == EOF){
thisword[k] = '\0';
mywords = add_word(mywords,thisword);
}
}
printf("%s\n",mywords->str);
printf("printing list\n");
//Traverse list and print each word and its count to outputfile
//output file is second parameter being passed
//haven't started to deal with the output file
words* temp = mywords;
while(temp != NULL){
printf("%s\tcount: %i\n",temp->str,temp->count);
temp = temp->next;
}
printf("list complete\n");
return 0;
}
This is all my code. I can't work out how to test for the problem, since I can't even output the strings. I only started programming in C this year, so I assume I'm missing something basic.

newWord->str = (char *)malloc(strlen(word));
strcpy(newWord->str,word); //copy “word” into newWord->str
newWord->str[strlen(word)]='\0';
.. writes the terminating null out of bounds: strlen() does not count the terminator, so the buffer is one byte too small.
Assuming that strlen() returns the desired value, you should malloc an extra char:
newWord->str = (char *)malloc(1+strlen(word));
Note Olaf's comment about casting the result of malloc in C. Also note that it's unlikely that this is your ONLY bug.
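One more candidate for the "not your only bug" list appears to be the struct allocation itself: malloc(strlen(word)) sizes the node by the length of the word rather than by the size of the struct. A sketch of create_words with both allocations corrected might look like this. The const qualifier on the parameter and the cleanup on failure are my own additions, not part of the question.

```c
#include <stdlib.h>
#include <string.h>

typedef struct s_words {
    char *str;              /* stores the word; no pre-determined size */
    int count;
    struct s_words *next;
} words;

/* Allocates a node holding a private copy of word.
 * Returns NULL if either allocation fails. */
words *create_words(const char *word)
{
    /* Size of the struct, not of the string. */
    words *newWord = malloc(sizeof *newWord);
    if (newWord == NULL)
        return NULL;

    /* +1 leaves room for the terminating '\0'. */
    newWord->str = malloc(strlen(word) + 1);
    if (newWord->str == NULL) {
        free(newWord);
        return NULL;
    }

    strcpy(newWord->str, word);   /* strcpy copies the terminator too */
    newWord->count = 1;
    newWord->next = NULL;
    return newWord;
}
```

Because strcpy already writes the terminator, the separate newWord->str[strlen(word)] = '\0' line from the question is no longer needed.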

recording of each word in a text file in c

I am trying to build a function that checks whether a word is in a list of words. If it is, the function increments the corresponding frequency counter for that word. Otherwise, it creates a copy of the word, appends it to the list, and sets the corresponding frequency counter to 1.
I get no compiler errors but when I attempt to print the frequency of any word I get a number in the 2 millions and I have no idea why.
I am given a main file I cannot modify:
#include <stdlib.h>
#include <string.h>
#define MAX_WORDS 300
#define LINE_LEN 80
void increment_word_freq(char *freq_words[MAX_WORDS], int *frequency, int *n, char *word);
int main(){
char delim[] = " ,.!-;\"\n";
char filename[] = "cookbook.txt";
char line[LINE_LEN];
char *word;
char *freq_words[MAX_WORDS]; // a list of frequent words
int frequency[MAX_WORDS]; // frequency of the words
int n = 0; // number of words in the list
int min_occr;
FILE *fp;
fp = fopen(filename, "r");
if(!fp){
printf("Could not open file %s\n", filename);
exit(1);
}
// read one line at a time
while(fgets(line, LINE_LEN, fp)){
// get the words from the line
word = strtok(line, delim);
while(word != NULL) {
// convert the word to lowercase
int i;
for(i = 0; i < strlen(word); i++)
word[i] = tolower(word[i]);
increment_word_freq(freq_words, frequency, &n, word);
word = strtok(NULL,delim);
}
}
}
this is the function I am attempting to use:
void increment_word_freq(char *freq_words[MAX_WORDS], int *frequency, int *n, char *word){
for(int i=0; i<MAX_WORDS; i++){
if(freq_words[i] == word){
frequency[i]++;
break;
}
else if(i=MAX_WORDS-1){
frequency[i]= *word;
*n++;
}
}
}
like I said before, no compiler errors but attempting to print the frequency of any word will give a number in the 2 millions and I have no idea why.
Any and all help and advice is greatly appreciated!

freq_words[i] == word only compares the pointer freq_words[i] with the pointer word. You have to compare the strings the pointers refer to, so change your code to strcmp(freq_words[i], word) == 0. Apart from this, you have to allocate dynamic memory to store your strings and use strcpy to copy each string into it. This is necessary because word points to a char somewhere in line, and line is overwritten when the next line of the file is read. Adapt your code like this:
#include <string.h> // strcmp, strcpy
void increment_word_freq( char *freq_words[MAX_WORDS], int *frequency, int *n, char *word)
{
for ( int i=0; i < *n; i++) // for all current members of freq_words
{
if ( strcmp( freq_words[i], word ) == 0 ) // test if word is member of freq_words
{
frequency[i]++; // increment count
return; // finished, because word was found
}
}
// word was not found in freq_words => add new word to freq_words
if ( *n < MAX_WORDS ) // test if there is one more place in freq_words
{
freq_words[*n] = malloc( strlen(word) + 1 ); // allocate dynamic memory for new member of freq_words
if ( freq_words[*n] == NULL ) // allocation failed => leave the list unchanged
return;
strcpy( freq_words[*n], word ); // copy word to freq_words[*n]
frequency[*n] = 1; // init frequency[*n] with 1
(*n)++; // increment count of members of freq_words
}
}
Note that you have to free the allocated memory at the end of main, otherwise you leak memory. In main, n is a plain int rather than a pointer, so the loop is:
for ( int i=0; i < n; i++)
{
free( freq_words[i] );
}

While Loop stops before condition

I have a problem with my code. I want to load a dictionary, which works fine with a small one. But when I try to load the larger version, my while loop stops at the 701st word, which is "acclimatization", and then the program continues. I searched a lot of forums and tried a lot of things, but I just can't find the cause. Does anyone have an idea why this happens?
Dictionary.c
bool load(const char* dictionary)
{
// reserve space for word
char* word = malloc(sizeof(char*));
// open file
FILE* dict = fopen(dictionary, "r");
if (dict == NULL)
{
fclose(dict);
fprintf(dict, "Could not load %s.\n", dictionary);
return 1;
}
root = (struct node *) malloc(sizeof(struct node));
root->is_word = false;
//Loops over word aslong the EOF is not reached
while (fgets(word,LENGTH,dict) != NULL)
{
printf("word = %s\n", word);
int word_length = strlen(word) -1;
node* current = root;
word_count++;
//Loops over letters
for (int i = 0; i < word_length; i++)
{
int index;
node *next_node;
// checks if letter isnt a apostrophe
if(word[i] == 39)
{
index = MAX_CHARS - 1;
}
// gets nummeric value of letter
else
{
index = tolower(word[i]) - 'a';
}
next_node = current->children[index];
// creates new node if letter didnt exists before
if(next_node == NULL)
{
next_node = malloc(sizeof(node));
current->children[index] = next_node;
current->is_word = false;
printf("new letter: %c\n", word[i]);
}
else
{
printf("letter: %c\n", word[i]);
}
// checks for end of the word
if(i == word_length - 1)
{
next_node->is_word = true;
}
current = next_node;
}
}
return true;
}
The node is defined by:
// node
typedef struct node
{
bool is_word;
struct node* children[27];
}
node;

char* word = malloc(sizeof(char*));
sizeof(char*) is the size of a pointer, which is 4 or 8 bytes depending on the platform, so that is all the space word gets. You need to allocate more memory.
char* word;
word = malloc(LENGTH); // LENGTH as you use it here while (fgets(word,LENGTH,dict) != NULL)
if(word!=NULL){ // and checking if malloc is successful
// your code
free(word); // freeing allocated memory
return true;
}
else { // executed only if malloc fails
//handle error
}
You can choose any size that is large enough for your longest line.
Note: every successful allocation needs a matching call to free().

You allocate very little space for word, probably 4 or 8 bytes depending on your platform.
You are allocating space for one char pointer, so when you read up to LENGTH characters from the file you can store bytes beyond the end of the allocated buffer. The behavior is then undefined: the program might work, it might stop, or anything else can happen.
You don't need to allocate it dynamically; a plain array like this is fine:
char word[100];
while (fgets(word, sizeof(word), file) != NULL) ...
/* ^ this only works with arrays, */
/* the benefit is that you can */
/* change the definition of word */
/* and resize it without changing */
/* this part. */
/* */
/* It will NOT work if you use `malloc()' */
Also, you would have a memory leak if fopen() fails, because word is never freed on that path; every malloc() requires a corresponding free().
Suggestion:
for (int i = 0; i < word_length; i++)
can also be written like this
for (int i = 0; ((word[i] != '\n') && (word[i] != '\0')); i++)
which avoids calling strlen(), which itself iterates through the characters.
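Putting the fixed-size buffer and the newline handling together, a small helper might look like the sketch below. The name next_word is hypothetical, and stripping "\r\n" with strcspn is my own choice for trimming the line ending, not something either answer prescribes.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into word (capacity size) and strips the trailing
 * newline. Returns 1 if a line was read, 0 at end of file or on error.
 * next_word is a hypothetical helper name, not from the question. */
int next_word(FILE *fp, char *word, size_t size)
{
    if (fgets(word, (int)size, fp) == NULL)
        return 0;

    /* strcspn returns the index of the first '\r' or '\n', or the
     * string length if there is none, so this is always in bounds. */
    word[strcspn(word, "\r\n")] = '\0';
    return 1;
}
```

With this, the loading loop becomes char word[100]; while (next_word(dict, word, sizeof word)) { ... }, and strlen(word) is the real word length without the - 1 adjustment.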

How to get word before and after the current word in a file?

I have a file, "data.txt" that consists of the following: http://pastebin.com/FY9ZTQX6
I'm trying to get the word before and after the "<". The old word being the one on the left and the new word being the one on the right. This is what I have so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/*
Name: Marcus Lorenzana
Assignment: Final
*/
//binary tree struct to hold left and right node
//as well as the word and number of occurrences
typedef struct node
{
char *word;
int count;
struct node *left;
struct node *right;
}
node;
//,.?!:;-
int punctuation[7];
void insert(node ** dictionary, node * entry);
char* readFile(char* filename);
void printDictionary(node * tree);
void toLower(char** word);
void getReplacementWords(char *filecontents, char **newWord, char **oldWord) ;
int main()
{
char *word;
char* filecontents = readFile("data.txt");
char* oldWord;
char* newWord;
//create dictionary node
node *dictionary;
node *entry;
//read words and punctuation in from the text file
word = strtok (filecontents, " \n");
dictionary = NULL;
while (word != NULL)
{
//word = strlwr(word);
entry = (node *) malloc(sizeof(node));
entry->left = entry->right = NULL;
entry->word = malloc(sizeof(char)*(strlen(word)+1));
entry->word = word;
insert(&dictionary,entry);
word = strtok (NULL, " \n");
}
//printDictionary(dictionary);
filecontents = readFile("data.txt");
getReplacementWords(filecontents,&newWord,&oldWord);
return 0;
}
void insert(node ** dictionary, node * entry)
{
if(!(*dictionary))
{
*dictionary = entry;
entry->count=1;
return;
}
int result = strcmp(entry->word,(*dictionary)->word);
if(result<0){
insert(&(*dictionary)->left, entry);
entry->count++;
}
else if(result>0){
insert(&(*dictionary)->right, entry);
entry->count++;
} else {
entry->count++;
}
}
//put file contents in string for strtok
char* readFile(char* filename)
{
FILE* file = fopen(filename,"r");
if(file == NULL)
{
return NULL;
}
fseek(file, 0, SEEK_END);
long int size = ftell(file);
rewind(file);
char* content = calloc(size + 1, 1);
fread(content,1,size,file);
return content;
}
void printDictionary(node * dictionary)
{
if(dictionary->left) {
printDictionary(dictionary->left);
}
printf("%s\n",dictionary->word);
if(dictionary->right) {
printDictionary(dictionary->right);
}
}
void getReplacementWords(char *filecontents, char **newWord, char **oldWord) {
char *word;
word = strtok (filecontents, " \n");
while (word != NULL)
{
printf("\n%s",word);
int result = strcmp(word,"<");
if (result == 0) {
printf("\nFound replacement identifier");
}
word = strtok (NULL, " \n");
}
}

You can use
sscanf(buffer , "%s < %s" , firstStringContainer , secondStringContainer)
after reading the line containing the < character into buffer. This stores the string before the < in firstStringContainer and the one after it in secondStringContainer.
Here is the code I recommend (file is the FILE * returned by fopen):
int found = 0;
char buffer[chooseYourSize];
char firstStringContainer[chooseYourSize] , secondStringContainer[chooseYourSize];
while(fgets(buffer , sizeof(buffer) , file) != NULL)
{
if(strchr(buffer , '<'))
{
found++;
break;
}
}
if(found)
{
// fgets has already consumed the line, so parse it from buffer
sscanf(buffer , "%s < %s" , firstStringContainer , secondStringContainer);
}
Of course, this only works if the targeted lines contain only the three elements string < string, which is the case here.

If your data is in the format of STRING1 < STRING2 you can do:
fscanf(file,"%s < %s", string1, string2);
If it's somewhere on a line, it's going to be a little more difficult. What you can do is read lines from the file into a buffer, locate the <, go back to the beginning of the first string, and read what you want.
while(fgets(buff,sizeof(buff),file) != NULL)
{
if( (pointer = strstr(buff," < ")) != NULL)
{
//now you have located the < so just go back
//in the buff till you reach the start of
//string1 and then parse from there
start = pointer;
while(start > buff && start[-1] != ' ')
start--;
sscanf(start,"%s < %s",string1, string2);
}
}
Here pointer and start are char * variables. It's been a while since I did this, so there might be syntax errors.

You can use fseek() in a loop to step one character forward or back and check whether it is a space, a <, or another character of interest (using another function from string.h).
When you find this symbol, you can move the file position forward or back to the next space or other character of interest, remember the number of skipped characters N, and then copy N characters to a string variable.
substitute < replacement
^ find this symbol
substitute < replacement
^ make a loop that makes `counter++` when it finds `space`
(int counter = 0;)
substitute < replacement
^ the loop will continue and will find the 2nd `space`, and make `counter++`
when `counter == 2` (1 space after and 1 before the word) the loop stops.
Now `file` pointer points to the `space` symbol before the 1st word.
Then skip 1 element forward (using `fseek()`) and now you have
`file` pointer that points to the 1st word.
And now you can do whatever you want!
Do the same to find the 2nd word (the file pointer will point to the 2nd word, so you will be able to call this function again: it will then look for the 2nd < in your text) and make a function findWordsNearArrow() or something like that.
You can call this function in a loop so when it finds EOF it will return specific value that you can use to exit the loop.
Think again. (c)

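Use fgets() and strchr() to get to the line with the <. Check the result of fgets() first, because it returns NULL at end of file.
while (fgets (buffer, sizeof (buffer), file) != NULL && strchr (buffer, '<') == NULL)
; // do nothing
Then use strtok() to parse the current line in the buffer. Including the space in the delimiters keeps it out of both words.
strcpy (oldword, strtok (buffer, " <"));
strcpy (newword, strtok (NULL, " <\n"));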
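For comparison, here is a sketch that parses a single line of the form "old < new" with sscanf and bounded field widths, so neither buffer can overflow. The helper name parse_replacement and the WORD_MAX limit of 64 are assumptions made for this example. It only matches when the line consists of exactly word, <, word with whitespace around the <.

```c
#include <stdio.h>
#include <string.h>

#define WORD_MAX 64  /* assumed maximum word size, including the terminator */

/* Parses a line of the form "old < new". On success fills oldWord and
 * newWord (each at least WORD_MAX bytes) and returns 1; returns 0 if the
 * line does not have that form.
 * parse_replacement is a hypothetical helper name. */
int parse_replacement(const char *line, char *oldWord, char *newWord)
{
    if (strchr(line, '<') == NULL)
        return 0;

    /* %63s limits each word to WORD_MAX - 1 characters plus '\0'.
     * sscanf returns the number of fields it managed to assign. */
    return sscanf(line, "%63s < %63s", oldWord, newWord) == 2;
}
```

Called on each line returned by fgets(), it reports the replacement pair whenever the line contains one and leaves other lines alone.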

"Pointer being freed was not allocated" happen on mac but not on window7

I am doing an exercise from a book: translating the words in a sentence into pig latin. The code works fine on Windows 7, but when I compile and run it on a Mac, the error appears.
After some testing, I found that the error comes from the loop below. I don't understand the cause. I am using dynamic memory for all the pointers, and I have also added a null-pointer check.
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
Full source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define inputSize 81
void getSentence(char sentence [], int size);
int countWord(char sentence[]);
char ***parseSentence(char sentence[], int *count);
char *translate(char *world);
char *translateSentence(char ***words, int count);
int main(void){
/* Local definition*/
char sentence[inputSize];
int wordsCnt;
char ***head;
char *result;
getSentence(sentence, inputSize);
head = parseSentence(sentence, &wordsCnt);
result = translateSentence(head, wordsCnt);
printf("\nFinish the translation: \n");
printf("%s", result);
return 0;
}
void getSentence(char sentence [81], int size){
char *input = (char *)malloc(size);
int length;
printf("Input the sentence to big latin : ");
fflush(stdout);
fgets(input, size, stdin);
// do not copy the return character at inedx of length - 1
// add back delimater
length = strlen(input);
strncpy(sentence, input, length-1);
sentence[length-1]='\0';
free(input);
}
int countWord(char sentence[]){
int count=0;
/*Copy string for counting */
int length = strlen(sentence);
char *temp = (char *)malloc(length+1);
strcpy(temp, sentence);
/* Counting */
char *pToken = strtok(temp, " ");
char *last = NULL;
assert(pToken == temp);
while (pToken){
count++;
pToken = strtok(NULL, " ");
}
free(temp);
return count;
}
char ***parseSentence(char sentence[], int *count){
// parse the sentence into string tokens
// save string tokens as a array
// and assign the first one element to the head
char *pToken;
char ***words;
char *pW;
int noWords = countWord(sentence);
*count = noWords;
/* Initiaze array */
int i;
words = (char ***)calloc(noWords+1, sizeof(char **));
for (i = 0; i< noWords; i++){
words[i] = (char **)malloc(sizeof(char *));
}
/* Parse string */
// first element
pToken = strtok(sentence, " ");
if (pToken){
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**words = pW;
/***words = pToken;*/
// other elements
for (i=1; i<noWords; i++){
pToken = strtok(NULL, " ");
pW = (char *)malloc(strlen(pToken)+1);
strcpy(pW, pToken);
**(words + i) = pW;
/***(words + i) = pToken;*/
}
}
/* Loop control */
words[noWords] = NULL;
return words;
}
/* Translate a world into big latin */
char *translate(char *word){
int length = strlen(word);
char *bigLatin = (char *)malloc(length+3);
/* translate the word into pig latin */
static char *vowel = "AEIOUaeiou";
char *matchLetter;
matchLetter = strchr(vowel, *word);
// consonant
if (matchLetter == NULL){
// copy the letter except the head
// length = lenght of string without delimiter
// cat the head and add ay
// this will copy the delimater,
strncpy(bigLatin, word+1, length);
strncat(bigLatin, word, 1);
strcat(bigLatin, "ay");
}
// vowel
else {
// just append "ay"
strcpy(bigLatin, word);
strcat(bigLatin, "ay");
}
return bigLatin;
}
char *translateSentence(char ***words, int count){
char *bigLatinSentence;
int length = 0;
char *bigLatinWord;
/* calculate the sum of the length of the words */
char ***walker = words;
while (*walker){
length += strlen(**walker);
walker++;
}
/* allocate space for return string */
// one space between 2 words
// numbers of space required =
// length of words
// + (no. of words * of a spaces (1) -1 )
// + delimater
// + (no. of words * ay (2) )
int lengthOfResult = length + count + (count * 2);
bigLatinSentence = (char *)malloc(lengthOfResult);
// trick to initialize the first memory
strcpy(bigLatinSentence, "");
/* Translate each word */
int i;
char *w;
for (i=0; i<count; i++){
w = translate(**(words + i));
strcat(bigLatinSentence, w);
strcat(bigLatinSentence, " ");
assert(w != **(words + i));
free(w);
}
/* free memory of big latin words */
walker = words;
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
return bigLatinSentence;
}

Your code is unnecessarily complicated, because you have set things up such that:
n: the number of words
words: points to allocated memory that can hold n+1 char ** values in sequence
words[i] (0 <= i && i < n): points to allocated memory that can hold one char * in sequence
words[n]: NULL
words[i][0]: points to allocated memory for a word (as before, 0 <= i < n)
Since each words[i] points to stuff-in-sequence, there is a words[i][j] for some valid integer j ... but the allowed value for j is always 0, as there is only one char * malloc()ed there. So you could eliminate this level of indirection entirely, and just have char **words.
That's not the problem, though. The freeing loop starts with walker identical to words, so it first attempts to free words[0][0] (which is fine and works), then attempts to free words[0] (which is fine and works), then attempts to free words (which is fine and works but means you can no longer access any other words[i] for any value of i—i.e., a "storage leak"). Then it increments walker, making it more or less equivalent to &words[1]; but words has already been free()d.
Instead of using walker here, I'd use a loop with some integer i:
for (i = 0; words[i] != NULL; i++) {
free(words[i][0]);
free(words[i]);
}
free(words);
I'd also recommend removing all the casts on malloc() and calloc() return values. If you get compiler warnings after doing this, they usually mean one of two things:
you've forgotten to #include <stdlib.h>, or
you're invoking a C++ compiler on your C code.
The latter sometimes works but is a recipe for misery: good C code is bad C++ code and good C++ code is not C code. :-)
Edit: PS: I missed the off-by-one lengthOfResult that @David RF caught.
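To illustrate the point about dropping the extra level of indirection, here is a sketch of the parsing step with a plain char ** that is NULL-terminated, together with the matching cleanup. The names split_words and free_words are hypothetical, and the single-pass strspn/strcspn splitting replaces the question's count-then-strtok approach purely for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated array of heap-allocated words. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);   /* free the array itself last, exactly once */
}

/* Splits sentence on spaces into a NULL-terminated array of heap copies
 * and stores the word count in *count. Returns NULL on allocation failure.
 * split_words is a hypothetical name for illustration. */
char **split_words(const char *sentence, int *count)
{
    /* A string of length L holds at most (L + 1) / 2 words; the +2
     * covers rounding and the terminating NULL entry. */
    size_t cap = strlen(sentence) / 2 + 2;
    char **words = calloc(cap, sizeof *words);   /* all entries start NULL */
    if (words == NULL)
        return NULL;

    int n = 0;
    const char *p = sentence;
    while (*p != '\0') {
        p += strspn(p, " ");             /* skip leading spaces */
        size_t len = strcspn(p, " ");    /* length of the next word */
        if (len == 0)
            break;
        words[n] = malloc(len + 1);
        if (words[n] == NULL) {
            free_words(words);
            return NULL;
        }
        memcpy(words[n], p, len);
        words[n][len] = '\0';
        n++;
        p += len;
    }
    *count = n;
    return words;
}
```

Each word is then simply words[i] instead of words[i][0], and the cleanup is one loop followed by a single free(words).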

int lengthOfResult = length + count + (count * 2);
must be
int lengthOfResult = length + count + (count * 2) + 1; /* + 1 for final '\0' */
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
/* free(walker); Don't do this, you still need walker */
walker++;
}
free(words); /* Now */
And you have a leak:
int main(void)
{
...
free(result); /* You have to free the return of translateSentence() */
return 0;
}

In this code:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
free(walker);
walker++;
}
You need to check that **walker is not NULL before freeing it.
Also, when you compute the amount of memory needed for the returned string, you are one byte short, because you copy each word PLUS A SPACE (including a space after the last word) PLUS THE TERMINATING \0. In other words, when you build the result in bigLatinSentence, you write past the end of your allocation. Sometimes you get away with that, and sometimes you don't...

Wow, so I was intrigued by this, and it took me a while to figure out.
Now that I figured it out, I feel dumb.
What I noticed from running under gdb is that the thing failed on the second run through the loop on the line
free(walker);
Now why would that be so? This is where I feel dumb for not seeing it right away. The first time that line runs, it frees the whole array of char ** pointers at words (which is what walker points to on the first pass). On the second pass, that line tries to free a pointer into memory that has already been freed.
So it should be:
while (walker != NULL && *walker != NULL){
free(**walker);
free(*walker);
walker++;
}
free(words);
Edit:
I also want to note that you don't have to cast from void * in C.
So when you call malloc, you don't need the (char *) in there.
