C program to count total words in an input file - c

The input file contains a completely empty line at line 2 and an unnecessary white space after the final full stop of the text. With this input file I am getting 48 words while I was supposed to get 46 words.
My input file contains:
"Opening from A Tale of Two Cities by Charles Darwin
It was the best of times, it was the worst of times. It was the age
of wisdom, it was the age of foolishness. It was the epoch of
belief, it was the epoch of incredulity. "
Here's how I tried:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define max_story_words 1000
#define max_word_length 80
int main (int argc, char **argv)
{
char story[max_story_words][max_word_length] = {{0}};
char line[max_story_words] = {0};
char *p;
char ch = 0;
char *punct="\n ,!.:;?-";
int num_words = 1;
int i = 0;
FILE *file_story = fopen ("TwoCitiesStory.txt", "r");
if (file_story==NULL) {
printf("Unable to open story file '%s'\n","TwoCitiesStory.txt");
return (EXIT_FAILURE);
}
/* count words */
while ((ch = fgetc (file_story)) != EOF) {
if (ch == ' ' || ch == '\n')
num_words++;
}
rewind (file_story);
i = 0;
/* read each line in file */
while (fgets (line, max_word_length, file_story) != NULL)
{
/* tokenize line into words removing punctuation chars in punct */
for (p = strtok (line, punct); p != NULL; p = strtok (NULL, punct))
{
/* convert each char in p to lower-case with tolower */
char *c = p;
for (; *c; c++)
*c = tolower (*c);
/* copy token (word) to story[i] */
strncpy ((char *)story[i], p, strlen (p));
i++;
}
}
/* output array */
for(i = 0; i < num_words; i++)
printf ("story[%d]: %s\n", i, story[i]);
printf("\ntotal words: %d\n\n",num_words);
return (EXIT_SUCCESS);
}

Your num_words takes account of the two extra whitespaces, that's why you get 48.
You should simply print i immediately after the fgets-strtok loop, if I'm not mistaken.

Something along these lines:
/* ch should be declared as int so that EOF is detected reliably */
while ((ch = fgetc (file_story)) != EOF) {
if (ch == ' ') {
num_words++;
/* skip the rest of a run of spaces */
while( (ch = fgetc (file_story)) == ' ' && (ch != EOF) )
;
}
if (ch == '\n') {
num_words++;
/* skip the rest of a run of newlines */
while( (ch = fgetc (file_story)) == '\n' && (ch != EOF) )
;
}
}
Though I wonder why you are only taking whitespace and newline characters into account when counting new words. Two words separated by some other punctuation mark are definitely not accounted for in your code.

My suggestion is to change the words counting loop as follows:
/* count words */
num_words = 0;
int flag = 0; // set 1 when word starts and 0 when word ends
while ((ch = fgetc (file_story)) != EOF) {
if ( isalpha(ch) )
{
if( 0 == flag ) // if it is a first letter of word ...
{
num_words++; // ... add to word count
flag = 1; // and set flag to skip not first letters
}
continue;
}
if ( isspace(ch) || ispunct(ch) ) // if word separator ...
{
flag = 0; // ... reset flag
}
}
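The flag approach above can be packaged as a small function, which makes it easier to test separately from the rest of the program. This is only a sketch: count_words is a hypothetical name, and it assumes that a word is a maximal run of letters, so "don't" would count as two words.

```c
#include <ctype.h>
#include <stdio.h>

/* Count words in a stream, where a word is a maximal run of letters.
   count_words is a hypothetical helper, not part of the original code. */
static int count_words(FILE *fp)
{
    int ch;          /* int, so EOF can be told apart from any character */
    int in_word = 0; /* 1 while inside a word, 0 between words */
    int words = 0;

    while ((ch = fgetc(fp)) != EOF) {
        if (isalpha(ch)) {
            if (!in_word) {  /* first letter of a new word */
                words++;
                in_word = 1;
            }
        } else {
            in_word = 0;     /* any non-letter ends the current word */
        }
    }
    return words;
}
```

Because the count depends only on transitions into a word, empty lines and trailing spaces do not change the result.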

Related

How do I strcmp specifically for newline in C?

#include <stdio.h>
#include <string.h>
int main() {
int counter1, counter2;
char line[200] = ""; //store all words that don't need to be deleted
char deleteWord[100]; //word that needs to be deleted
char space;
char word[100];
scanf("%s", deleteWord);
while (1) {
scanf("%s", word);
if (feof(stdin))
break;
// increment counter of total words
++counter1;
if (strcmp(word, deleteWord) == 0) {
// see if the word read in == delete word
// increment counter of deleted words
++counter2;
} else
if (strcmp(word, " ") == 0) { // space is an actual space
strcat(line, word);
strcat(line, " ");
} else
if (strcmp(word, "\n")) { // space a new line \n
strcat(line, word);
strcat(line, "\n");
}
}
printf("--NEW TEXT--\n%s", line);
return 0;
}
In summary, my code is supposed to remove a user input string (one or more words) from another user input string (containing or not containing the word(s)) and produce the output. The code removes the word but it adds a newline per word for each iteration. I believe it is doing this because the expression for the second else if is always true. However, when I properly add the strcmp function for the second else if statement, the code does not produce an output at all (no compiler errors - just missing input). Why is this happening and how do I do a strcmp function for a newline?
You read the words with scanf("%s", word), which poses these problems:
all white space is ignored, so you cannot test for spaces or newlines as you try to do in the loop, and you cannot keep track of line breaks.
you should tell scanf() the maximum number of bytes to store into the destination array word, otherwise any word longer than 99 characters will cause a buffer overflow and invoke undefined behavior.
you should test the return value of scanf() instead of calling feof(), which might be true after the last word has been successfully read. You should simply write the loop as
while (scanf("%99s", word) == 1) {
// increment counter of total words
++counter1;
...
you do not test if the words fit in the line array either, causing a buffer overflow if the words kept amount to more than 199 characters including separators.
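The last point can be addressed by checking the remaining space before every append. The helper below is a minimal sketch of that idea; append_word and its parameters are hypothetical names, not part of the original program.

```c
#include <string.h>

/* Append word followed by the separator sep to line, but only if the
   result still fits in cap bytes including the terminating '\0'.
   Returns 1 on success and 0 (leaving line untouched) if it does not fit. */
static int append_word(char *line, size_t cap, const char *word, char sep)
{
    size_t used = strlen(line);
    size_t need = strlen(word) + 1;  /* the word plus one separator */

    if (used + need + 1 > cap)       /* +1 for the terminating '\0' */
        return 0;
    memcpy(line + used, word, need - 1);
    line[used + need - 1] = sep;
    line[used + need] = '\0';
    return 1;
}
```

A caller would replace each pair of strcat() calls with one call such as append_word(line, sizeof line, word, ' ') and stop, or report an error, when it returns 0.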
To delete a specific word from a stream, you could read one line at a time and delete the matching words from the line:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int main() {
char deleteWord[100]; //word that needs to be deleted
char line[1000]; //store all words that don't need to be deleted
printf("Enter the word to remove: ");
if (scanf("%99s", deleteWord) != 1)
return 1;
// read and discard the rest of the input line
int c;
while ((c = getchar()) != EOF && c != '\n')
continue;
size_t len = strlen(deleteWord);
printf("Enter the text: ");
while (fgets(line, sizeof line, stdin)) {
char *p = line;
char *q;
while ((p = strstr(p, deleteWord)) != NULL) {
if ((p == line || isspace((unsigned char)p[-1]))
&& (p[len] == '\0' || isspace((unsigned char)p[len]))) {
/* remove the word */
memmove(p, p + len, strlen(p + len) + 1);
} else {
p += len;
}
}
/* squeeze sequences of spaces as a single space */
for (p = q = line + 1; *p; p++) {
if (*p != ' ' || p[-1] != ' ')
*q++ = *p;
}
*q = '\0';
fputs(line, stdout);
}
return 0;
}

How to use fgets() function so that it only reads 12 characters per line?

I have a file that contains the following words:
Theendsherethiswillnotjaksdjlasdfjkl;asdjfklasdjfkl;asdfjl;
these
are
the
Below is my code :
int i = 0;
bool duplicateFound = false;
while(fgets(line,12,fp)){
for (int j = 0; j < i; j++){
if (strcmp(wordList[j], line) == 0){
duplicateFound = true;
printf("Duplicate Found on Line %d : %s\n", j, wordList[j]);
}
}
if (duplicateFound == false){
strcpy(wordList[i], line);
printf("%s", wordList[i]);
}
i++;*/
printf("%s", line);
}
I am using line to save each word so that I can later check it for duplicates in the array.
I want it so that the function only reads up to 12 characters on each line but it outputs the following output.
ACTUAL OUTPUT :
Theendsherethiswillnotjaksdjlasdfjkl;asdjfklasdjfkl;asdfjl;
these
are
the
EXPECTED OUTPUT:
Theendsheret
these
are
the
You could simply call fgets and then set line[12] = '\0', but that doesn't cleanly deal with input that has long lines. One option is to simply abort if fgets ever returns a partial line (e.g., if strchr(line, '\n') returns NULL).
If you want to handle long lines, you can just discard data with getchar until you see a newline. Assuming that you don't want to consider the newline to be one of the 12 characters, you could do something like:
#include <stdio.h>
#include <string.h>
int
main(void)
{
char line[13];
while( fgets(line, 13, stdin) ) {
char *c = strchr(line, '\n');
int ch;
if( c == NULL ) while( (ch = getchar()) != EOF ) {
if( ch == '\n' ) {
break;
}
} else {
*c = '\0';
}
if( printf("%s\n", line) < 0 ) {
break;
}
}
return ferror(stdout) || ferror(stdin) || fclose(stdout) || fclose(stdin);
}

how to read input string until a blank line in C?

First of all, I'm new to coding in C.
I tried to read a string of unknown size from the user until a blank line is given, then save it to a file, and after that read the file back.
I've only managed to do it until a new line is given and I don't know how to look for a blank line.
#include <stdio.h>
#include <stdlib.h>
char *input(FILE* fp, size_t size) {
char *str;
int ch;
size_t len = 0;
str = realloc(NULL, sizeof(char)*size);
if (!str)return str;
while (EOF != (ch = fgetc(fp)) && ch != '\n') {
str[len++] = ch;
if (len == size) {
str = realloc(str, sizeof(char)*(size += 16));
if (!str)return str;
}
}
str[len++] = '\0';
return realloc(str, sizeof(char)*len);
}
int main(int argc, const char * argv[]) {
char *istr;
printf("input string : ");
istr = input(stdin, 10);
//write to file
FILE *fp;
fp = fopen("1.txt", "w+");
fprintf(fp, istr);
fclose(fp);
//read file
char c;
fp = fopen("1.txt", "r");
while ((c = fgetc(fp)) != EOF) {
printf("%c", c);
}
printf("\n");
fclose(fp);
free(istr);
return 0;
}
Thanks!
I would restructure your code a little. I would change your input() function to be a function (readline()?) that reads a single line. In main() I would loop reading line by line via readline().
If the line is empty (only has a newline -- use strcmp(istr, "\n")), then free the pointer, and exit the loop. Otherwise write the line to the file and free the pointer.
If your concept of an empty line includes " \n" (prefixed spaces), then write a function is_only_spaces() that returns a true value for a string that looks like that.
While you could handle the empty line in input(), there is value in abstracting the line reading from the input termination conditions.
Why not use a flag or a counter? With a counter, you simply increment it for each character found on the current line. If a newline is found and the counter is 0, it must be a blank line. If a newline is found and the counter is not 0, it must be the end of a line, so reset the counter to 0 and continue.
Something like this:
int count = 0;
while ((ch = fgetc(fp)) != EOF)
{
if(ch == '\n')
{
if(count == 0)
{
break;
}
count = 0;
str[len++] = ch;
}
else
{
str[len++] = ch;
count++;
}
}
Another way would be to simply check if the last character in the string was a new line.
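while ((ch = fgetc(fp)) != EOF)
{
/* a newline at the very start, or right after another newline, is a blank line */
if(ch == '\n' && (len == 0 || str[len - 1] == '\n'))
{
break;
}
str[len++] = ch;
}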
A blank line is a line which contains only a newline, right ? So you can simply keep the last 2 characters you read. If they are '\n', then you have detected a blank line : the first '\n' is the end of the previous line, the second one is the end of the current line (which is a blank line).
char *input(FILE* fp, size_t size) {
char *str;
int ch, prev_ch = 0;
size_t len = 0;
str = realloc(NULL, sizeof(char)*size);
if (!str)return str;
while (EOF != (ch = fgetc(fp)) && !(ch == '\n' && prev_ch == '\n')) {
str[len++] = ch;
if (len == size) {
str = realloc(str, sizeof(char)*(size += 16));
if (!str)return str;
}
prev_ch = ch;
}
str[len++] = '\0';
return realloc(str, sizeof(char)*len);
}
Note that the loop stops only when both ch and prev_ch are '\n'; a single newline just ends the current line and is stored. prev_ch is initialized to 0 so that it has a defined value on the first iteration.
To improve this, you can keep your function that reads only a line and test if the line returned is empty (it contains only a '\n').
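The line-by-line variant suggested in these answers can be sketched with fgets(), which keeps the trailing '\n' and so makes the blank-line test a plain strcmp(). The function name copy_until_blank and the 256-byte buffer are assumptions made for this example, and it assumes that every input line fits in that buffer.

```c
#include <stdio.h>
#include <string.h>

/* Copy lines from in to out until a blank line (just "\n") or end of
   input is reached. Returns the number of lines copied.
   Assumes every line, including its newline, fits in the buffer. */
static int copy_until_blank(FILE *in, FILE *out)
{
    char line[256];
    int copied = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strcmp(line, "\n") == 0)  /* blank line: stop reading */
            break;
        fputs(line, out);
        copied++;
    }
    return copied;
}
```

Calling copy_until_blank(stdin, fp) with fp opened for writing would replace the separate input and fprintf steps, and it avoids passing user text as a format string.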

fgets and chdir acting strangely together in C

I am currently creating a simple shell for homework and I've run into a problem. Here is a snippet of code with the pieces that pertain to the problem (I may have forgotten some pieces please tell me if you see anything missing):
eatWrd returns the first word from a string, and takes that word out of the string.
wrdCount, as implied, returns the number of words in a string.
If either of these functions is needed for an answer, I can post them; just tell me. I am almost 100% positive they are not the cause of the problem.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX 100
int main(void)
{
char input[MAX];
char *argm[MAX];
memset(input, 0, sizeof(input));
memset(argm, 0, sizeof(argm));
while(1){
printf("cmd:\n");
fgets(input, MAX-1, stdin);
for(i=0;i < wrdCount(input); i++){
argm[i] = eatWrd(input);
}
argm[i] = NULL;
if (!strncmp(argm[0],"cd" , 2)){
chdir(argm[1]);
}
if (!strncmp(argm[0],"exit", 4)){
exit(0);
}
memset(input, 0, sizeof(input));
memset(argm, 0, sizeof(argm));
}
}
Anyways, this loop works for lots of other commands using execvp, (such as cat, ls, etc.), when I use cd, it works as expected, except when I try to exit the shell, it takes multiple exit calls to actually get out. (as it turns out, the number of exit calls is exactly equal to the number of times I call cd). It only takes one exit call when I don't use cd during a session. I'm not really sure what's going on, any help is appreciated, thanks.
Here is eatWrd:
char* eatWrd(char * cmd)
{
int i = 0; // i keeps track of position in cmd
int count = 0; // count keeps track of position of second word
char rest[MAX_LINE]; // rest will hold cmd without the first word
char * word = (char *) malloc(MAX_LINE); //word will hold the first word
sscanf(cmd, "%s", word); //scan the first word into word
// iterate through white spaces, then first word, then the following white spaces
while(cmd[i] == ' ' || cmd[i] == '\t'){
i++;
count++;
}
while(cmd[i] != ' ' && cmd[i] != '\t' && cmd[i] != '\n' && cmd[i] != '\0'){
i++;
count++;
}
while(cmd[i] == ' ' || cmd[i] == '\t'){
i++;
count++;
}
// copy the rest of cmd into rest
while(cmd[i] != '\n' && cmd[i] != '\0'){
rest[i-count] = cmd[i];
i++;
}
rest[i-count] = '\0';
memset(cmd, 0, MAX_LINE);
strcpy(cmd, rest); //move rest into cmd
return word; //return word
}
And here is wrdCount:
int wrdCount(char *sent)
{
char *i = sent;
int words = 0;
//keep iterating through the string,
//increasing the count if a word and white spaces are passed,
// until the string is finished.
while(1){
while(*i == ' ' || *i == '\t') i++;
if(*i == '\n' || *i == '\0') break;
words++;
while(*i != ' ' && *i != '\t' && *i != '\n' && *i != '\0') i++;
}
return words;
}
This variation on your code works for me:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#define MAX 100
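char *eatWrd(char **line) {
char *next_c = *line;
char *word_start = NULL;
/* cast to unsigned char: isspace() is undefined for negative char values */
while (isspace((unsigned char)*next_c)) next_c += 1;
if (*next_c) {
word_start = next_c;
do {
next_c += 1;
} while (*next_c && ! isspace((unsigned char)*next_c));
/* terminate the word; step past the separator only if there is one,
   so that we never move beyond the end of the string */
if (*next_c) {
*next_c = '\0';
next_c += 1;
}
}
*line = next_c;
return word_start;
}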
int main(void)
{
char input[MAX];
char *argm[MAX];
while(1) {
int word_count = 0;
char *next_input = input;
printf("cmd:\n");
if (!fgets(input, MAX, stdin)) break; /* stop at end of input */
do {
argm[word_count] = eatWrd(&next_input);
} while (argm[word_count++]);
/* The above always overcounts by one */
word_count -= 1;
if (word_count == 0) continue; /* empty line: nothing to run */
if (!strcmp(argm[0], "cd")){
if (argm[1]) chdir(argm[1]);
} else if (!strcmp(argm[0], "exit")) {
exit(0);
}
}
}
Note my variation on eatWrd(), which does not have to move any data around, and which does not require pre-parsing the string to determine how many words to expect. I suppose your implementation would be more complex, so as to handle quoting or some such, but it could absolutely follow the same general approach.
Note, too, my correction to the command-matching conditions, using !strcmp() instead of strncmp().
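If quoting does not need to be handled, the standard strtok() function can do the same in-place splitting. The sketch below is an alternative to eatWrd(), not the code from this answer; split_args is a hypothetical name, and max_args is assumed to be at least 1.

```c
#include <string.h>

/* Split line in place on whitespace into at most max_args - 1 tokens and
   terminate the array with NULL, which is the layout execvp() expects.
   Returns the number of tokens stored. */
static int split_args(char *line, char **args, int max_args)
{
    int n = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && n < max_args - 1) {
        args[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    args[n] = NULL;  /* also marks an empty command when n == 0 */
    return n;
}
```

With this, the shell loop would call split_args(input, argm, MAX) after fgets() and skip the command when the result is 0.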

Reading a text file into 2 separate arrays of characters (in C)

For a class I have to write a program to read in a text file in the format of:
T A E D Q Q
Z H P N I U
C K E W D I
V U X O F C
B P I R G K
N R T B R B
EXIT
THE
QUICK
BROWN
FOX
I'm trying to get the characters into an array of chars, each line being its own array.
I'm able to read from the file okay, and this is the code I use to parse the file:
char** getLinesInFile(char *filepath)
{
FILE *file;
const char mode = 'r';
file = fopen(filepath, &mode);
char **textInFile;
/* Reads the number of lines in the file. */
int numLines = 0;
char charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
numLines++;
}
charRead = fgetc(file);
}
fseek(file, 0L, SEEK_SET);
textInFile = (char**) malloc(sizeof(char*) * numLines);
/* Sizes the array of text lines. */
int line = 0;
int numChars = 1;
charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
line++;
numChars = 0;
}
else if(charRead != ' ')
{
numChars++;
}
charRead = fgetc(file);
}
/* Fill the array with the characters */
fseek(file, 0L, SEEK_SET);
charRead = fgetc(file);
line = 0;
int charNumber = 0;
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
line++;
charNumber = 0;
}
else if(charRead != ' ')
{
textInFile[line][charNumber] = charRead;
charNumber++;
}
charRead = fgetc(file);
}
return textInFile;
}
This is a run of my program:
Welcome to Word search!
Enter the file you would like us to parse:testFile.txt
TAEDQQ!ZHPNIU!CKEWDI!VUXOFC!BPIRGK!NRTBRB!EXIT!THE!QUICK!BROWN!FOX
Segmentation fault
What's going on? A), why are the exclamation marks there, and B) why do I get a seg fault at the end? The last thing I do in the main is iterate through the array/pointers.
1) In the first part of your program, you are miscounting the number of lines in the file. The actual number of lines in the file is 11, but your program gets 10. You need to start counting from 1, as there will always be at least one line in the file. So change
int numLines = 0;
to
int numLines = 1;
2) In the second part of the program you are miscounting the number of characters on each line. You need to keep your counter initializations the same. At the start of the segment you initialize numChars to 1. In that case you need to reset your counter to 1 after each iteration, so change:
numChars = 0;
to
numChars = 1;
This should provide enough space for all the non-space characters and for the terminating null character. Keep in mind that C strings are always null-terminated.
3) Your program also does not account for differences in line termination, but under my test environment that is not a problem -- fgetc returns only one character for the line terminator, even though the file is saved with \r\n terminators.
4) In the second part of your program, you are also not allocating memory for the very last line. This causes your segfault in the third part of your program when you try to access the unallocated space.
Note how your code only saves lines if they end in \r or \n. Guess what, EOF which technically is the line ending for the last line does not qualify. So your second loop does not save the last line into the array.
To fix this, add this after the second part:
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
5) In your program output you are seeing those weird exclamation points because you are not null-terminating your strings. So you need to add the line marked as null termination below:
if(charRead == '\n' || charRead == '\r')
{
textInFile[line][charNumber] = 0; // null termination
line++;
charNumber = 0;
}
6) Because you are checking for EOF, you have the same problem in your third loop, so you must add this before the return:
textInFile[line][charNumber] = 0; // null termination
7) I am also getting some headaches because of the whole program structure. You read the same file character by character 3 times! This is extremely slow and inefficient.
Fixed code follows below:
char** getLinesInFile(char *filepath)
{
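FILE *file;
/* fopen() expects a null-terminated mode string, not the address of a single char */
file = fopen(filepath, "r");
if (file == NULL)
return NULL;
char **textInFile;
/* Reads the number of lines in the file. */
int numLines = 1;
/* int rather than char, so that EOF can be told apart from a valid character */
int charRead = fgetc(file);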
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
numLines++;
}
charRead = fgetc(file);
}
fseek(file, 0L, SEEK_SET);
textInFile = (char**) malloc(sizeof(char*) * numLines);
/* Sizes the array of text lines. */
int line = 0;
int numChars = 1;
charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
line++;
numChars = 1;
}
else if(charRead != ' ')
{
numChars++;
}
charRead = fgetc(file);
}
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
/* Fill the array with the characters */
fseek(file, 0L, SEEK_SET);
charRead = fgetc(file);
line = 0;
int charNumber = 0;
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line][charNumber] = 0; // NULL termination
line++;
charNumber = 0;
}
else if(charRead != ' ')
{
textInFile[line][charNumber] = charRead;
charNumber++;
}
charRead = fgetc(file);
}
textInFile[line][charNumber] = 0; // null termination
fclose(file); // release the file handle before returning
return textInFile;
}
You aren't null terminating your arrays. This probably explains both problems. Be sure to allocate an extra character for the null terminator.
Do This:
if(charRead == '\n')
{
textInFile[line] = (char*) malloc(sizeof(char) * (numChars+1));
line++;
numChars = 0;
}
Then:
if(charRead == '\n')
{
textInFile[line][charNumber]='\0';
line++;
charNumber = 0;
}
Also you are reading the file 3 times! This thread has some good explanation on how to read a file efficiently.
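As an illustration of reading the file only once, the sketch below reads line by line with fgets(), strips spaces and line terminators, and grows the array of lines as needed. It is not taken from the linked thread; read_lines is a hypothetical name, and it assumes that each line is shorter than 256 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read every line of fp in a single pass, dropping spaces, '\r' and '\n'.
   Stores the number of lines in *count and returns a malloc'd array of
   malloc'd, null-terminated strings. Returns NULL with *count == 0 if an
   allocation fails or the input is empty. */
static char **read_lines(FILE *fp, size_t *count)
{
    char buf[256];
    char **lines = NULL;
    size_t n = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = 0;
        for (size_t i = 0; buf[i] != '\0'; i++)  /* compact in place */
            if (buf[i] != ' ' && buf[i] != '\n' && buf[i] != '\r')
                buf[len++] = buf[i];
        buf[len] = '\0';

        if (n == cap) {                          /* grow the array */
            size_t new_cap = cap ? cap * 2 : 8;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            lines = tmp;
            cap = new_cap;
        }
        lines[n] = malloc(len + 1);              /* +1 for the '\0' */
        if (lines[n] == NULL)
            goto fail;
        memcpy(lines[n], buf, len + 1);
        n++;
    }
    *count = n;
    return lines;

fail:
    while (n > 0)
        free(lines[--n]);
    free(lines);
    *count = 0;
    return NULL;
}
```

The caller owns the result and should free each string and then the array itself. Returning the line count also removes the need to guess where the array ends when iterating over it in main.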
