Counting how many times a word is in a file - c

I'm trying to write a program that gets a string, and counts how many times that string is found in a specific file.
The file is currently: hello my name hello is oria
I like hello to program
And the word I'm counting is hello.
This is my code:
int num_of_words(FILE* stream,char* str)
{
int count=0,i=0,length;
char c;
rewind(stream);
length=strlen(str);
do
{
c=fgetc(stream);
while(c==*(str+i))
{
c=fgetc(stream);
i++;
if(i==length)
{
count++;
i=0;
}
}
i=0;
}while(c!=EOF);
return count;
}
The idea is that there is an index called i, which advances only when the letters match. If i reaches the length of the string, we have found all the letters in succession, and I increase count by one.
For some reason, it always returns zero.

I. while (!eof) is wrong.
II. Don't reinvent the wheel - there's the strstr() function in the C standard library which helps you find a substring in another string.
I'd rather read the file into a buffer then use the following function. If the file is not too large, this should not be a problem.
int count(const char *haystack, const char *needle)
{
int n = 0;
const char *p = haystack;
size_t len = strlen(needle);
if (len == 0) return 0; /* an empty needle would match forever */
while ((p = strstr(p, needle)) != NULL) {
n++;
p += len;
}
return n;
}
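A minimal sketch of the "read the file into a buffer, then search it" idea, assuming the file is plain text without embedded 0 bytes and small enough to fit in memory. The names count_occurrences and read_all are hypothetical helpers made up for this illustration; matches are counted without overlap, as in the function above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in haystack. */
int count_occurrences(const char *haystack, const char *needle)
{
    size_t len = strlen(needle);
    int n = 0;

    if (len == 0)
        return 0;               /* an empty needle would never advance */
    for (const char *p = haystack; (p = strstr(p, needle)) != NULL; p += len)
        n++;
    return n;
}

/* Hypothetical helper: read the rest of stream into a malloc'ed,
   0-terminated buffer. Returns NULL if memory runs out; the caller
   frees the result. */
char *read_all(FILE *stream)
{
    size_t cap = 1024, used = 0, got;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* always keep one byte free for the terminating 0 */
    while ((got = fread(buf + used, 1, cap - used - 1, stream)) > 0) {
        used += got;
        if (cap - used <= 1) {  /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[used] = '\0';
    return buf;
}
```

With the file from the question, read_all(stream) followed by count_occurrences(text, "hello") should give 3; remember to free the buffer afterwards.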

char c; should probably be int c; to accommodate for the type of data that fgetc returns. Otherwise, when char is an unsigned type c can never equal EOF and your loop will never end.
Before your code increments i once, it reads two characters from the file, so the second letter of the file is the first one ever compared against the second letter of the word. I'd move the c=fgetc(stream); in the inner loop to the end of that loop, e.g.:
while(c==*(str+i))
{
i++;
if(i==length)
{
count++;
i=0;
break;
}
c=fgetc(stream);
}
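For comparison, here is a sketch of a complete character-by-character counter. It is not the code from the question: it swaps the index i for a small sliding window holding the last strlen(str) characters read, so that a partial match followed by a real one (for example "hhello") is not missed. It assumes non-overlapping matches are wanted, and the -1 return value for an allocation failure is an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of str in stream, reading one
   character at a time. Returns -1 if memory cannot be allocated. */
int num_of_words(FILE *stream, const char *str)
{
    size_t length = strlen(str);
    size_t filled = 0;          /* characters currently in the window */
    int count = 0;
    int c;                      /* int, so EOF is distinguishable */
    char *window;

    if (length == 0)
        return 0;
    window = malloc(length);
    if (window == NULL)
        return -1;

    rewind(stream);
    while ((c = fgetc(stream)) != EOF) {
        if (filled == length) {
            /* window full: drop the oldest character */
            memmove(window, window + 1, length - 1);
            filled--;
        }
        window[filled++] = (char)c;
        if (filled == length && memcmp(window, str, length) == 0) {
            count++;
            filled = 0;         /* start afresh: matches do not overlap */
        }
    }
    free(window);
    return count;
}
```

The memmove makes this O(file length x word length), which is fine for short words; a circular buffer or strstr on a loaded buffer would avoid the shifting.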

Related

sscanf cycle segmentation fault

I haven't programmed for a couple years and I have a question with sscanf:
I want to separate a string into several strings using sscanf, but sscanf gives me a segmentation fault when called in a loop. Why is that? How can I use sscanf in a loop without this happening?
Example:
int main() {
char str[100];
char mat[100][100]; int i = 0;
strcpy(str, "higuys\nilovestackoverflow\n2234\nhaha");
while (sscanf(str, "%s", mat[i]) == 1) i++;
}
int sscanf(const char *str, const char *format, ...);
while(sscanf(str,"%s", mat[i]) == 1) i++;
Since str is constant in the prototype it cannot be changed by sscanf (unless sscanf is very broken :)), so it successfully repeats over and over, returning 1 all the time...
So i increases, and at some point you're hitting a memory boundary and the system stops your harmful program.
If you want to read a multi-line string, use a loop with strtok for instance, that will go through your string and yield lines.
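The strtok suggestion can be sketched as follows. The function name split_lines and the two size macros are assumptions made for this illustration (the sizes mirror the mat[100][100] array from the question). Note that strtok writes into the string it splits, so the input must be a modifiable array, not a string literal.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINES 100   /* sizes taken from the question's mat[100][100] */
#define MAX_LEN   100

/* Split text into lines, copying each into mat; returns the number stored.
   strtok overwrites the '\n' separators in text with 0 bytes. */
int split_lines(char *text, char mat[][MAX_LEN], int max_lines)
{
    int i = 0;

    for (char *line = strtok(text, "\n");
         line != NULL && i < max_lines;
         line = strtok(NULL, "\n")) {
        snprintf(mat[i], MAX_LEN, "%s", line);  /* truncates over-long lines */
        i++;
    }
    return i;
}
```

Called on the string from the question, this should store four lines; empty lines are skipped because strtok treats consecutive separators as one.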
Note: my previous answer assumed, correctly, that the earlier version of the question had a typo, an extra ; in the middle:
while(sscanf(str,"%s", mat[i]) == 1); i++;
is always successful since str is the input and doesn't change (unlike when you're reading from a file using fscanf or fgets).
So it was just an infinite loop in that case.
sscanf stops at \n, stores the word higuys into the array mat[i] and returns 1. The loop condition is true, i gets incremented and the process goes on for the next element of mat as the destination with the same source string... every element of mat receives the same higuys string, and the loop continues, causing a buffer overflow, invoking undefined behavior and ultimately crashing.
Here is how to modify your code to make it work:
#include <stdio.h>
int main(void) {
const char *str = "higuys\nilovestackoverflow\n2234\nhaha";
char mat[100][100];
int i = 0, n = 0;
/* parse the multiline string */
while (i < 100 && sscanf(str, "%99s%n", mat[i], &n) == 1) { /* stay inside mat and inside each row */
str += n;
i++;
}
/* output the array */
for (int j = 0; j < i; j++) {
printf("mat[%d] = %s\n", j, mat[j]);
}
return 0;
}

How do I count occurrences of a list of strings and output them to a new file?

I have been given three '.txt' files.
The first is a list of words.
The second is a document to search.
The third is a blank document that will have my output written to it.
I'm supposed to take each word in the first file, search the second file and print the number of occurrences in the third file as "wordX = numOccurences."
I've got a good function that will return the wordCount, and it returns it correctly for the first word, but then I get a zero for all the remaining words.
I've tried to dereference everything, and I think I've come to a standstill. There's something wrong with the "pointer talk."
I have yet to start outputting the words to a new file, but that printf statement should be a print to file statement in append mode. Easy enough.
Here is the working wordCount function - it works if I just give it a single word, like "testing," but if I give it an array I want to iterate through, it just returns 0.
int countWord(char* filePath, char* word){ //Not mine. This is a working prototype function from SO, returns word count of particular word
FILE *fp;
int count = 0;
int ch, len;
if(NULL==(fp=fopen(filePath, "r")))
return -1;
len = strlen(word);
for(;;){
int i;
if(EOF==(ch=fgetc(fp))) break;
if((char)ch != *word) continue;
for(i=1;i<len;++i){
if(EOF==(ch = fgetc(fp))) goto end;
if((char)ch != word[i]){
fseek(fp, 1-i, SEEK_CUR);
goto next;
}
}
++count;
next: ;
}
end:
fclose(fp);
return count;
}
This is my part of the program, trying to call the function while the loop gets all the words from the first file. The loop IS grabbing the words, because it prints them, but wordCount isn't accepting anything beyond the first word.
int main(){
FILE *ptr_file;
char words[100];
ptr_file = fopen("searchWords.txt", "r");
if(!ptr_file)
return -1;
while( fgets(words, 100, ptr_file)!=NULL )
{
int wordCount = 0;
char key[100] = &*words;
wordCount = countWord("document.txt", words);
printf("%s = %d\n", words, wordCount);
}
fclose(ptr_file);
return 0;
}
fgets reads \n too. That is the problem. To quote:
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
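To solve this, strip the trailing newline before using the word:
while( fgets(words, 100, ptr_file)!=NULL )
{
size_t len = strlen(words);
if (len > 0 && words[len-1] == '\n') // the last line may have no newline
words[len-1] = '\0';
/* ... rest of the loop body as before ... */
}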
An immediate problem: fgets doesn't strip end-of-line from the string, so whatever you pass to countWord has an embedded newline.
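One common way to strip the line ending is strcspn, which returns the index of the first character from a given set, or the length of the string if none is present. The sketch below assumes you also want to drop a '\r' from files with Windows line endings; the helper name chomp is made up for this example.

```c
#include <string.h>

/* Cut s at its first '\r' or '\n', in place.
   Safe when there is no line ending: strcspn then returns strlen(s),
   so the existing terminator is simply overwritten with itself. */
void chomp(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}
```

In the loop from the question, calling chomp(words) right after fgets succeeds means countWord receives the bare word.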

Loop crashing in C

I'm very new to C and I'm still learning the basics. I'm creating an application that reads in a text file and breaks down the words individually. My intention will be to count the amount of times each word occurs.
Anyway, the last do-while loop in the code below executes fine, and then crashes. This loop prints memory address to this word (pointer) and then prints the word. It accomplishes this fine, and then crashes on the last iteration. My intention is to push this memory address into a singly linked list, albeit once it's stopped crashing.
Also, just a quick mention regarding the array sizes below: I have yet to figure out how to set the correct size needed to hold the word character array, because you must define the size before the array is filled, and I don't know how to do this. That is why I've set them to 1024.
#include<stdio.h>
#include<string.h>
int main (int argc, char **argv) {
FILE * pFile;
int c;
int n = 0;
char *wp;
char wordArray[1024];
char delims[] = " "; // delims spaces in the word array.
char *result = NULL;
result = strtok(wordArray, delims);
char holder[1024];
pFile=fopen (argv[1],"r");
if (pFile == NULL) perror ("Error opening file");
else {
do {
c = fgetc (pFile);
wordArray[n] = c;
n++;
} while (c != EOF);
n = 0;
fclose (pFile);
do {
result = strtok(NULL, delims);
holder[n] = *result; // holder stores the value of 'result', which should be a word.
wp = &holder[n]; // wp points to the address of 'holder' which holds the 'result'.
n++;
printf("Pointer value = %d\n", wp); // Prints the address of holder.
printf("Result is \"%s\"\n", result); // Prints the 'result' which is a word from the array.
//sl_push_front(&wp); // Push address onto stack.
} while (result != NULL);
}
return 0;
}
Please ignore the bad program structure, as I mentioned, I'm new to this!
Thanks
As others have pointed out, your second loop attempts to dereference result before you check for it being NULL. Restructure your code as follows:
result = strtok( wordArray, delims ); // do this *after* you have read data into
// wordArray
while( result != NULL )
{
holder[n] = *result;
...
result = strtok( NULL, delims );
}
Although...
You're attempting to read the entire contents of the file into memory before breaking it up into words; that's not going to work for files bigger than the size of your buffer (currently 1K). If I may make a suggestion, change your code such that you're reading individual words as you go. Here's an example that breaks the input stream up into words delimited by whitespace (blanks, newlines, tabs, etc.) and punctuation (period, comma, etc.):
#include <stdio.h>
#include <ctype.h>
int main(int argc, char **argv)
{
char buffer[1024];
int c;
size_t n = 0;
FILE *input = stdin;
if( argc > 1 )
{
input = fopen( argv[1], "r");
if (!input)
input = stdin;
}
while(( c = fgetc(input)) != EOF )
{
if (isspace(c) || ispunct(c))
{
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
n = 0;
}
}
else
{
if (n < sizeof buffer - 1) buffer[n++] = c; // never write past the end of buffer
}
}
if (n > 0)
{
buffer[n] = 0;
printf("read word %s\n", buffer);
}
fclose(input);
return 0;
}
No warranties express or implied (having pounded this out before 7:00 a.m.). But it should give you a flavor of how to parse a file as you go. If nothing else, it avoids using strtok, which is not the greatest of tools for parsing input. You should be able to adapt this general structure to your code. For best results, you should abstract that out into its own function:
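int getNextWord(FILE *stream, char *buf, size_t bufsize)
{
int c;
size_t n = 0;
if (bufsize == 0)
return 0;
/* stop one short of bufsize to leave room for the terminating 0 */
while (n + 1 < bufsize && (c = fgetc(stream)) != EOF)
{
if (isspace(c) || ispunct(c))
{
if (n > 0)
break; /* delimiter after a word: the word is complete */
/* otherwise skip leading delimiters */
}
else
{
buf[n++] = (char)c;
}
}
buf[n] = 0;
/* 1 if a word was read, 0 at end of input */
if (n == 0)
return 0;
else
return 1;
}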
and you would call it like
void foo(void)
{
char word[SOME_SIZE];
...
while (getNextWord(inFile, word, sizeof word))
{
do_something_with(word);
}
...
}
If result can be NULL in your do...while loop (that is the condition that ends the loop), how do you expect this line
holder[n] = *result;
to work? It looks to me like this is the reason your program crashes.
Change do while loop to while
use
while (condition)
{
}
instead of
do {
}while(condition)
It is crashing because you are trying to dereference a NULL pointer (result) in the do-while loop.
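Putting the two points together (read the data first, then test result before every use), a sketch might look like this. The function names read_text and print_words are assumptions for this illustration, not part of the question's code; the read is bounded so the array cannot overflow, and EOF is never stored in it.

```c
#include <stdio.h>
#include <string.h>

/* Read at most bufsize-1 characters from fp into buf and 0-terminate it.
   Returns the number of characters stored. */
size_t read_text(FILE *fp, char *buf, size_t bufsize)
{
    size_t n = 0;
    int c;

    if (bufsize == 0)
        return 0;
    while (n + 1 < bufsize && (c = fgetc(fp)) != EOF)
        buf[n++] = (char)c;     /* EOF itself is never stored */
    buf[n] = '\0';
    return n;
}

/* Print each word in text and return the word count.
   strtok modifies text, so it must be a writable array. */
int print_words(char *text)
{
    const char delims[] = " \n";
    int count = 0;
    char *result = strtok(text, delims);

    while (result != NULL) {    /* tested before every use of result */
        printf("Result is \"%s\"\n", result);
        count++;
        result = strtok(NULL, delims);
    }
    return count;
}
```

In main, this would be used as read_text(pFile, wordArray, sizeof wordArray) followed by print_words(wordArray). Text beyond the buffer size is silently ignored here, which is the limitation of the read-everything-first design.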
I work mostly with Objective-C and was just looking at your question for fun, but I may have a solution.
Before setting n=0; after your first do-while loop, create another variable called totalWords and set it equal to n, totalWords can be declared anywhere within the file (except within one of the do-while loops), but can be defined at the top to the else block since its lifetime is short:
totalWords = n;
then you can set n back to zero:
n = 0;
Your conditional for the final do-while loop should then say:
...
} while (n <= ++totalWords);
The logic behind the application will thus say, count the words in the file (there are n words, which is the totalWords in the file). When program prints the results to the console, it will run the second do-while loop, which will run until n is one result past the value of totalWords (this ensures that you print the final word).
Alternately, it is better practice and clearer for other programmers to use a loop and a half:
do {
result = strtok(NULL, delims);
if (result == NULL) break; // No more words: leave before result is dereferenced
holder[n] = *result;
wp = &holder[n];
printf("Pointer value = %p\n", (void *)wp); // %p is the conversion for pointers
printf("Result is \"%s\"\n", result);
//sl_push_front(&wp); // Push address onto stack.
if (n == totalWords) break; // This forces the program to exit the do-while after we have printed the last word
n++; // We only need to increment if we have not reached the last word
// if our logic is bad, we will enter an infinite loop, which will tell us while testing that our logic is bad.
} while (1); // 1 rather than true, which would need <stdbool.h>

Calculating length of a string in C

int strlength(const char *myStr){
//variable used for str length counter
int strlength = 0;
//loop through string until end is reached. Each iteration adds one to the string length
while (myStr[strlength] != '\0'){
putchar(myStr[strlength]);
strlength++;
}
return strlength;
}
Why will this not work as intended? I just want to find the length of a string.
From a comment on another answer:
I am using fgets to read in a string, and I have checked to make sure that the string that was typed was stored correctly
In that case, there is a trailing newline stored, so your function computes
strlength("hello\n")
The code is correct, you just didn't pass it the input you believed to pass.
More reliable version:
size_t strlength(const char * myStr)
{
return strlen(myStr);
}
You can try this also:-
int string_length(char *s)
{
int c = 0;
while(*(s+c))
c++;
return c;
}
Note that this still counts a trailing \n if the string came from fgets(); remove the newline first if you do not want it included.
while (myStr[strlength] != '\0'){
putchar(myStr[strlength]);
strlength++; //When mysStr[strlength] == \0, this line would have already incremented by 1
}
Quick fix, valid only when the string still ends with the \n stored by fgets:
return (strlength-1);//Returns the length without the trailing newline.
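A better approach:
size_t my_strlen(const char *s) // named my_strlen so it does not clash with the standard strlen
{
const char *str=s;
while(*str)
{
str++;
}
return (size_t)(str-s); // end minus start; s-str would be negative
}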

strlen inconsistent with zero length string

I'm creating a DataStage parallel routine, which is a C or C++ function that is called from within IBM (formerly Ascential) DataStage. It is failing if one of the strings passed in is zero length. If I put this at the very first line of the function:
return strlen(str);
then it returns 0 for the calls that pass in empty values into str. If I put this at the first line, however...
if (strlen(str)==0) {return 0;}
then it does not return and goes into an infinite loop
I'm baffled - it works fine in a test harness, but not in DataStage.
Maybe there is something odd about the way DataStage passes empty strings to C routines?
int pxStrFirstCharList(char *str, char *chars )
{
if (strlen(str)==0) {return 0;}
if (strlen(chars)==0) {return 0;}
int i = 0;
//Start search
while (str[i]) //for the complete input string
{
if (strchr(chars, str[i]))
{
return i+1;
}
++i;
}
return 0;
}
There is a built-in function for what you are doing: strcspn. It takes two strings and searches the first one for the first occurrence of any of the characters of the second string, returning the index of that character (or the length of the first string if there is none).
I suggest using that rather than rolling your own...
http://www.cplusplus.com/reference/clibrary/cstring/strcspn/
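A sketch of how strcspn could replace the hand-written loop while keeping the question's return convention (1-based position, 0 for "not found"). The function name firstCharInList and the decision to return 0 for NULL arguments are assumptions of this example.

```c
#include <string.h>

/* Returns the 1-based position of the first character of str that also
   appears in chars, or 0 if there is none (or either pointer is NULL). */
int firstCharInList(const char *str, const char *chars)
{
    size_t pos;

    if (str == NULL || chars == NULL)
        return 0;
    pos = strcspn(str, chars);          /* length of the prefix with no match */
    return str[pos] != '\0' ? (int)pos + 1 : 0;
}
```

The explicit strlen checks become unnecessary: for an empty str or an empty chars, strcspn stops at the terminator and the function returns 0.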
How about this?
int pxStrFirstCharList(char *str, char *chars )
{
if (str && chars && (0 != strlen(str)) && (0 != strlen(chars)))
{
int i = 0;
//Start search
while (str[i]) //for the complete input string
{
if (strchr(chars, str[i]))
{
return i+1;
}
++i;
}
}
return 0;
}
Also, I don't quite get the point of the while loop ... (and no, I don't mean that this could be written as for). What I mean is that on one hand you are doing a search (strchr) that itself will be implemented as a loop and still you have some outer loop. Could it be that you actually wanted to have chars in its place, i.e.:
int pxStrFirstCharList(char *str, char *chars )
{
if (str && chars && (0 != strlen(str)) && (0 != strlen(chars)))
{
int i = 0;
//Start search
while (chars[i]) //for the complete input string
{
if (strchr(str, chars[i]))
{
return i+1;
}
++i;
}
}
return 0;
}
...? That is, look for each of the characters within chars inside the string denoted by str ...
If NULL is not explicitly part of the game, at least during development phase, it's always a good idea to add a precondition check on pointers received by a function:
int pxStrFirstCharList(char *str, char *chars )
{
if (!str)
return -1;
if (!chars)
return -2;
....
(The negative values -1 and -2 then tell the caller that something went wrong.)
Or doing it in a more relaxed way, silently accepting NULL pointer strings as ""-string:
int pxStrFirstCharList(char *str, char *chars )
{
if (!str)
return 0;
if (!chars)
return 0;
...
If you are the only one using this API, you could #ifndef BUILD_RELEASE these checks away for a release build once everything is tested and stable.
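As a sketch of that last idea, the checks can be wrapped in the preprocessor conditional so they exist only in development builds. BUILD_RELEASE is the macro name used above and would be defined by your own build system; the function name checkedFirstCharList is hypothetical, and the search loop is the one from the question.

```c
#include <string.h>

/* Development builds return -1 / -2 for NULL arguments;
   builds compiled with -DBUILD_RELEASE skip the checks entirely. */
int checkedFirstCharList(const char *str, const char *chars)
{
#ifndef BUILD_RELEASE
    if (!str)
        return -1;
    if (!chars)
        return -2;
#endif
    for (int i = 0; str[i] != '\0'; ++i) {
        if (strchr(chars, str[i]))
            return i + 1;       /* 1-based position of the first match */
    }
    return 0;                   /* no character of str occurs in chars */
}
```

Keep in mind that in a release build a NULL argument is then undefined behavior again, so this only makes sense once every caller is known to pass valid strings.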
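I guess the issue is what strlen is given rather than strlen itself. strlen only works on a null-terminated string; on an array that holds no terminator it reads past the end, which is undefined behavior. For example,
char s1[0]; // zero-length array (a compiler extension): no room even for '\0'
char *s2="a";
printf("%zu\n", sizeof(s1));// 0
printf("%zu\n", strlen(s1));// undefined behavior; printed 3 here
printf("%zu %s\n", sizeof(s2), s2);// 8 a (the size of the pointer, not of the string)
printf("%zu %s\n", strlen(s2), s2);// 1 a
You can get a weird answer from strlen in such a case, and you can check its source code in detail (https://code.woboq.org/userspace/glibc/string/strlen.c.html). In a nutshell, sizeof reports the size of the object (array or pointer), not the length of the string, so it is not a replacement for strlen; instead, make sure that what you pass to strlen is a valid, null-terminated string.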

Resources