Counting the number of characters, words, lines in a text file - c

I have to check the number of words, characters, and lines in a file. I am getting the number of lines and words one more than it should actually be. What is wrong with my program?
#include <stdio.h>
int main()
{
FILE *number2;
number2 =fopen("Part2.txt", "r");
//open the file to read
if (number2 == NULL)
{
return 4;
}
char cha;
int character = 0;
int word = 0;
int lne = 0;
while ((cha = fgetc(number2)) != EOF)
// till end of file
{
//character++;
// if (cha == '\n' || cha == '\0')
//lne++;
while ((cha = fgetc(number2))!= EOF)
{
character++;
//new characters
if (cha == '\n' || cha == '\0')
lne++;
// new lines
if (cha == ' ' || cha == '\t' || cha == '\n' || cha == '\0')
word++;
//new words--- though the professor talks about only whitespace this is proper way
}
if (character > 0)
{
word++;
lne++;
}
// increasing the worlds and lines for last word and printing
printf("\n");
printf("Total number of characters = %d\n", character);
printf("Total number of words = %d\n", word-1);
printf("Total number of lines = %d\n", lne-1);
// decreasing by 1 in word and line because there is one extra line and this runs the code perfectly
fclose (number2);
//closing the file though professor didn't ask for smooth functioning of programms
return 0;
}
}
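The posted program reads one character in the outer loop before the inner loop starts counting, stores the result of fgetc in a char, and then compensates with manual -1 corrections. A minimal sketch of an alternative is shown below. It is not the asker's program: `count_stream` and `struct text_counts` are names made up for illustration, and it assumes that words are separated by whitespace (as classified by `isspace`) and that one line is counted per '\n'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper type: the three totals for one stream. */
struct text_counts {
    long chars;
    long words;
    long lines;
};

/* Count characters, whitespace-separated words, and '\n' characters
   in an already opened stream. */
struct text_counts count_stream(FILE *fp)
{
    struct text_counts tc = {0, 0, 0};
    int ch;            /* int, so EOF stays distinct from every byte value */
    int in_word = 0;   /* 1 while the previous character was part of a word */

    assert(fp != NULL);

    while ((ch = fgetc(fp)) != EOF) {
        tc.chars++;
        if (ch == '\n')
            tc.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            tc.words++;
        }
    }
    return tc;
}
```

A caller would open the file with fopen, check the result for NULL, pass the stream to `count_stream`, and print the three fields. Because a word is counted when it starts, no adjustment is needed after the loop. A last line without a trailing '\n' is not counted as a line in this sketch.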

Related

Why does only the first file-reading function execute when several functions of the same kind are called in C?

This code contains 3 file-handling functions which read from a file named "mno". But only the first function called in main() works. If the first call is commented out, only the second function works and the third does not. The same goes for the third one.
#include <stdio.h>
#include <ctype.h>
#include <unistd.h>
void countVowel(char fin[])
{
FILE *fl;
char ch;
int count = 0;
fl = fopen(fin, "r");
while (ch != EOF)
{
ch = tolower(fgetc(fl));
count += (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') ? 1 : 0;
}
fclose(fl);
printf("Number of Vowels in the file \" %s \"-> \t %d \n", fin, count);
}
void countConsonant(char fin[])
{
FILE *fl;
char ch;
int count = 0;
fl = fopen(fin, "r");
while (ch != EOF)
{
ch = tolower(fgetc(fl));
count += (!(ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') && (ch >= 'a' && ch <= 'z')) ? 1 : 0;
}
fclose(fl);
printf("Number of Consonant in the file \" %s \"-> \t %d \n", fin, count);
}
void countAlphabet(char fin[])
{
FILE *fl;
char ch;
int count = 0;
fl = fopen(fin, "r");
while (ch != EOF)
{
ch = tolower(fgetc(fl));
count += (ch >= 'a' && ch <= 'z') ? 1 : 0;
}
fclose(fl);
printf("Number of Alphabets in the file \" %s \"-> \t %d \n", fin, count);
}
int main()
{
countVowel("mno"); // output -> 10
countConsonant("mno"); // output -> 0
countAlphabet("mno"); // output -> 0
return 0;
}
Here are the contents of "mno" file ->
qwertyuiopasdfghjklzxcvbnm, QWERTYUIOPASDFGHJKLZXCVBNM, 1234567890
As others have mentioned, your handling of EOF was incorrect:
ch was uninitialized on the first loop iteration
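Doing tolower(fgetc(fl)) converts the value before it has been tested against EOF (tolower(EOF) itself returns EOF, but the assignment to a char then narrows it).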
Using char ch; instead of int ch; would allow a [legitimate] 0xFF to be seen as an EOF.
But it seems wasteful to have three separate functions to create the three different counts, because most of the time is spent in the I/O rather than in determining what type of character we're looking at. This is particularly true when the counts are so interrelated.
We can keep track of multiple types of counts easily using a struct.
Here's a refactored version that calculates all three counts in a single pass through the file:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ctype.h>
struct counts {
int vowels;
int consonants;
int alpha;
};
void
countAll(const char *fin,struct counts *count)
{
FILE *fl;
int ch;
int vowel;
count->vowels = 0;
count->consonants = 0;
count->alpha = 0;
fl = fopen(fin, "r");
if (fl == NULL) {
perror(fin);
exit(1);
}
while (1) {
ch = fgetc(fl);
// stop on EOF
if (ch == EOF)
break;
// we only care about alphabetic chars
if (! isalpha(ch))
continue;
// got one more ...
count->alpha += 1;
ch = tolower(ch);
// is current character a vowel?
vowel = (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u');
// since we know it's alphabetic, it _must_ be either a vowel or a
// consonant
if (vowel)
count->vowels += 1;
else
count->consonants += 1;
}
fclose(fl);
printf("In the file: \"%s\"\n",fin);
printf(" Number of Vowels: %d\n",count->vowels);
printf(" Number of Consonants: %d\n",count->consonants);
printf(" Number of Alphabetics: %d\n",count->alpha);
}
int
main(void)
{
struct counts count;
countAll("mno",&count);
return 0;
}
For your given input file, the program output is:
In the file: "mno"
Number of Vowels: 10
Number of Consonants: 42
Number of Alphabetics: 52
You are using ch uninitialized at while (ch != EOF). In practice, every function call after the first starts with ch equal to -1 (EOF), because you never initialize it and the previous call left -1 in that memory. You can fix it by replacing the loops like this:
int ch;
...
while ((ch = fgetc(fl)) != EOF)
{
ch = tolower(ch);
count += ...;
}
Here ch is getting initialized before you check it and later converted to lowercase.
EDIT:
Note that this only works if ch is an int, so it can handle the value of -1 (EOF) and the byte 255 is not truncated to -1.
EDIT:
At first I said ch was 0 all the time. It was -1. I am so sorry, I swapped it with the null terminator, which is usually the reason for such behavior.
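As a small sketch of the corrected loop in a reusable form, the function below counts vowels in an already opened stream. `count_vowels_stream` is a hypothetical name, and taking a `FILE *` instead of a filename is an assumption made so the function can be exercised on any stream.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count vowels in an open stream. */
int count_vowels_stream(FILE *fl)
{
    int ch;        /* must be int: fgetc returns an unsigned char value or EOF */
    int count = 0;

    assert(fl != NULL);

    while ((ch = fgetc(fl)) != EOF) {   /* assign first, then test */
        ch = tolower(ch);
        if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u')
            count++;
    }
    return count;
}
```

With `char ch`, a 0xFF byte in the file could compare equal to EOF on platforms where char is signed and end the loop early. With `int ch`, that byte is read as 255 and the loop continues.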

palindrome c program is not working for some reason

This program checks whether the entered string is a palindrome or not. It should still report a palindrome when the string contains spaces or special characters,
like messi is a palindrome of iss em
and ronald!o is a palindrome of odlanor
This is the program, and for some odd reason it gets stuck and does not work.
#include <stdio.h>
#include <string.h>
int main() {
char palstr[100], ans[100];
printf("enter the string for checking weather the string is a palindrome or not");
scanf("%[^/n]", &palstr);
int ispalin = 1, i = 0, n = 0;
int num = strlen(palstr);
printf("the total length of the string is %d", num);
while (i <= num) {
if (palstr[i] == ' ' || palstr[i] == ',' || palstr[i] == '.' ||
palstr[i] == '!' || palstr[i] == '?') {
i++;
}
palstr[n++] == palstr[i++];
}
int j = num;
i = 0;
while (i <= num) {
ans[j--] = palstr[i];
}
printf("the reverse of the string %s is %s", palstr, ans);
if (ans == palstr)
printf("the string is a palindrome");
else
printf("the string is not a palindrome");
return 0;
}
A few points to consider. First, regarding the code:
if (ans == palstr)
This is not how you compare strings in C. It compares the addresses of the two arrays, which are always different in this case.
The correct way to compare strings is:
if (strcmp(ans, palstr) == 0)
Second, you should work out the length of the string after you have removed all unwanted characters since that's the length you'll be working with. By that I mean something like:
char *src = palstr, *dst = palstr;
while (*src != '\0') {
if (*src != ' ' && *src != ',' && *src != '.' && *src != '!' && *src != '?') {
*dst++ = *src;
}
src++;
}
*dst = '\0'; // terminate the shortened string before calling strlen on it
Third, you have a bug in your while loop anyway in that, if you get two consecutive bad characters, you will only remove the first (since your if does that then blindly copies the next character regardless).
Fourth, you may want to consider just stripping out all non-alpha characters rather than that small selection:
#include <ctype.h>
if (isalpha((unsigned char)*src)) { // keep only letters
*dst++ = *src;
}
Fifth and finally, you don't really need to create a new string to check for a palindrome (though you may still need to if you want to print the string in reverse), you can just start at both ends and move inward, something like:
char *left = palstr, *right = palstr + strlen(palstr) - 1; // assumes palstr is not empty
int ispalin = 1;
while (left < right) {
if (*left++ != *right--) {
ispalin = 0;
break;
}
}
There may be other things I've missed but that should be enough to start on.
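The fourth and fifth points can be combined: skip unwanted characters while moving inward from both ends, so that no second buffer is needed. The following is only a sketch under stated assumptions. `is_palindrome` is a hypothetical name, "unwanted" is taken to mean anything that is not a letter or digit, and ignoring case is an extra assumption that the question does not ask for.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if s reads the same in both directions once everything except
   letters and digits is ignored, 0 otherwise. Ignoring case is an assumption. */
int is_palindrome(const char *s)
{
    size_t left = 0;
    size_t right;

    assert(s != NULL);
    right = strlen(s);          /* one past the last character */

    while (left < right) {
        unsigned char l = (unsigned char)s[left];
        unsigned char r = (unsigned char)s[right - 1];

        if (!isalnum(l)) {
            left++;             /* skip punctuation or space on the left */
        } else if (!isalnum(r)) {
            right--;            /* skip punctuation or space on the right */
        } else if (tolower(l) != tolower(r)) {
            return 0;           /* mismatch */
        } else {
            left++;
            right--;
        }
    }
    return 1;
}
```

Using indices with `right` one past the end avoids forming a pointer before the start of the array when the string is empty.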
Well, there are quite a few bugs in this code. I will point them out with comments.
#include <stdio.h>
#include <string.h>
int main() {
char palstr[100], ans[100];
printf("enter the string for checking weather the string is a palindrome or not\n");
scanf("%99[^\n]", palstr); // the original "%[^/n]" stops only at '/' or 'n', so pressing Enter did not end the input
int ispalin = 1, i = 0, n = 0;
int num = strlen(palstr);
printf("the total length of the string is %d\n", num);
while (i < num) { // < instead of <=
if (palstr[i] == ' ' || palstr[i] == ',' || palstr[i] == '.' ||
palstr[i] == '!' || palstr[i] == '?') {
i++;
continue; // without this, the punctuation stays in the string
}
palstr[n++] = palstr[i++]; //should be =
}
palstr[n] = '\0'; //
num = n; // the length might be changed
i = 0;
int j = num-1; // reverse
while (i < num) { //
ans[i++] = palstr[j--]; //
}
ans[i] = '\0'; //
printf("the reverse of the string %s is %s\n", palstr, ans);
//if (ans == palstr) they can never be equal
if (strcmp(ans, palstr)==0)
printf("the string is a palindrome\n");
else
printf("the string is not a palindrome\n");
return 0;
}

Why is my wc implementation giving wrong word count?

Here is a small code snippet.
while((c = fgetc(fp)) != -1)
{
cCount++; // character count
if(c == '\n') lCount++; // line count
else
{
if(c == ' ' && prevC != ' ') wCount++; // word count
}
prevC = c; // previous character equals current character. Think of it as memory.
}
Now when I run wc with the file containing this above snippet code(as is), I am getting 48 words, but when I use my program on same input data, I am getting 59 words.
How to calculate word count exactly like wc does?
You are treating anything that isn't a space as a valid word. This means that a newline followed by a space is a word, and since your input (which is your code snippet) is indented you get a bunch of extra words.
You should use isspace to check for whitespace instead of comparing the character to ' ':
while((c = fgetc(fp)) != EOF)
{
cCount++;
if (c == '\n')
lCount++;
if (isspace(c) && !isspace(prevC))
wCount++;
prevC = c;
}
There is an example of the function you want in the book "The C Programming Language" (the ANSI C edition) by Brian W. Kernighan and Dennis M. Ritchie. As the authors say, it is "a bare-bones version of the UNIX program wc". Altered to count only words, it looks like this:
#include <stdio.h>
#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */
/* nw counts words in input */
int main(void)
{
int c, nw, state;
state = OUT;
nw = 0;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else if (state == OUT) {
state = IN;
++nw;
}
}
printf("%d\n", nw);
}
Instead of checking only for spaces, you should check for all whitespace characters, such as \t, \n, and the space itself.
This will give the correct results.
You can use isspace() from <ctype.h>
Change the line
if(c == ' ' && prevC != ' ') wCount++;
to
if(isspace(c) && !isspace(prevC)) wCount++;
This would give the correct results.
Don't forget to include <ctype.h>
You can do:
int count()
{
unsigned int cCount = 0, wCount = 0, lCount = 0;
int incr_word_count = 0;
int c; // int, so that EOF can be distinguished from every character value
FILE *fp = fopen ("text", "r");
if (fp == NULL)
{
printf ("Failed to open file\n");
return -1;
}
while((c = fgetc(fp)) != EOF)
{
cCount++; // character count
if(c == '\n') lCount++; // line count
if (c == ' ' || c == '\n' || c == '\t')
incr_word_count = 0;
else if (incr_word_count == 0) {
incr_word_count = 1;
wCount++; // word count
}
}
fclose (fp);
printf ("line : %u\n", lCount);
printf ("word : %u\n", wCount);
printf ("char : %u\n", cCount);
return 0;
}

How to expect different data types in scanf()?

I'm developing a chess game in C just for practicing. At the beginning of the game, the user can type 4 things:
ROW<whitespace>COL (i.e. 2 2)
'h' for help
'q' to quit
How can I use a scanf to expect 2 integers or 1 char?
It seems most sensible to read a whole line and then decide what it contains. This rules out calling scanf directly on stdin, since it would consume the contents of the stream before you know what they are.
Try something like this :
char input[128] = {0};
unsigned int row, col;
if(fgets(input, sizeof(input), stdin))
{
if(input[0] == 'h' && input[1] == '\n' && input[2] == '\0')
{
// help
}
else if(input[0] == 'q' && input[1] == '\n' && input[2] == '\0')
{
// quit
}
else if((sscanf(input, "%u %u\n", &row, &col) == 2))
{
// row and column
}
else
{
// error
}
}
It's better to avoid using scanf at all. It usually causes more trouble than what it solves.
One possible solution is to use fgets to get the whole line and then use strcmp to see if the user typed 'h' or 'q'. If not, use sscanf to get row and column.
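A minimal sketch of that suggestion follows. The names `parse_command` and `enum command` are hypothetical, and rejecting lines with anything after the two numbers is an assumption about what the game wants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for one line of user input. */
enum command { CMD_HELP, CMD_QUIT, CMD_MOVE, CMD_ERROR };

/* Classify one line as read by fgets (the trailing '\n' is optional).
   On CMD_MOVE, *row and *col receive the two numbers. */
enum command parse_command(const char *line, int *row, int *col)
{
    char buf[128];
    char extra;

    assert(line != NULL && row != NULL && col != NULL);

    /* Work on a copy so the newline can be stripped. */
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    buf[strcspn(buf, "\n")] = '\0';

    if (strcmp(buf, "h") == 0)
        return CMD_HELP;
    if (strcmp(buf, "q") == 0)
        return CMD_QUIT;
    /* The trailing %c only converts if something follows the two numbers,
       so a return value of exactly 2 means "two integers and nothing else". */
    if (sscanf(buf, "%d %d %c", row, col, &extra) == 2)
        return CMD_MOVE;
    return CMD_ERROR;
}
```

The caller would read a line with fgets(input, sizeof input, stdin) and pass it to `parse_command`, then switch on the result. Since the whole line has already been consumed, a bad line cannot leave stray characters behind for the next read.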
This one is just using scanf
#include <stdio.h>
#include <ctype.h>
int main()
{
char c;
int row, col;
scanf("%c", &c);
if (c == 'h')
return 0;
if (c == 'q')
return 0;
if (isdigit((unsigned char)c)) { // only handles a single-digit row
row = c - '0';
scanf("%d", &col);
printf("row %d col %d", row, col);
}
return 0;
}
int row, col;
char cmd;
char *s = NULL;
size_t slen = 0; // getline() expects a size_t *, not an int *
if (getline(&s, &slen, stdin) != -1) {
if (sscanf(s, "%d %d", &row, &col) == 2) {
// use row and col
}
else if (sscanf(s, "%c", &cmd) == 1) {
// use cmd
}
else {
// error
}
}
free(s); // free the buffer once, on every path (needs <stdlib.h>)
Besides "get the whole line and then use sscanf", reading character by character until '\n' is entered is another workable approach. If the program encounters 'h' or 'q', it can perform the relevant action immediately, and you could also analyze the input stream in real time.
example:
#define ROW_IDX 0
#define COL_IDX 1
int c;
int buffer[2] = {0,0};
int buff_pos = 0;
while ((c = getchar()) != EOF) {
if (c == '\n') {
//a line was finished
/*
row = buffer[ROW_IDX];
col = buffer[COL_IDX];
*/
buff_pos = 0;
memset(buffer , 0 , sizeof(buffer));//clear the buffer after do sth...
} else if (c == 'h') {
//help
} else if (c == 'q') {
//quit
} else {
//assume the input is a valid number; you should verify that the character is between '0' and '9'
if (c == ' ') {
//meet whitespace, switch the buffer from 'row' to 'col'
++buff_pos;
} else {
buffer[buff_pos%2] *= 10;
buffer[buff_pos%2] += c - '0';
}
}
}

Two spaces between sentences

I have code which detects more than one space between words and, in that case, replaces them with a single space.
I also need to add a function which puts two spaces between sentences.
(The last symbol of a sentence is . )
For example.
If I have a file with the text:
This   is my first   program. Hello    world
the program should print:
This is my first program.  Hello world
Code:
# include <stdio.h>
# include <stdlib.h>
int main()
{
FILE *in;
char myStr[100],newStr[100];
int ch;
int j,i,k,z=0;
in=fopen("duom.txt","r");
if(in){
while(EOF != ch){
ch=fgetc(in);
myStr[z] = ch;
z++;
k=0;
for(i=0; myStr[i] != '\0'; i++) {
if(myStr[i-1] != '.' && myStr[i] == ' ' && myStr[i+1] == ' ' )
continue;
newStr[k]= myStr[i];
k++;
}
}
}
for(j=0;j<k;j++){
printf("%c",newStr[j]);
}
printf("\n");
fclose(in);
system("pause");
return 0;
}
I don't ask you to write my whole code, just give me some ideas.
Sorry for my bad english :/
This loop follows your general approach of processing the file in blocks:
Your Approach Revised:
# include <stdio.h>
# include <stdlib.h>
int main() {
FILE *in;
char myStr[100],newStr[100];
int ch;
int j,i,k,z=0;
in=fopen("duom.txt","r");
if(!(in)) { fprintf(stderr,"Error opening file!\n"); }
else { //the file was opened
int go = 1; //master loop control
while(go) { //master loop
z = 0; //set sub loop
ch = '\0';//control variables
while(z < 99 && EOF != ch){ //process file in 99 character blocks, leaving room for '\0'
ch=fgetc(in); //getting one character at a time
if(EOF == ch) { go = 0; } //break master loop
else { myStr[z++] = ch; } //or process char
}
myStr[z] = '\0'; //null terminate the string
k = 0; //reset the output index for this block
for(i=0; myStr[i] != '\0'; i++) {
//i=99='\0' <-- assumed is highest string size
//if i=0; Do you really want that leading space?
if(i== 0 && myStr[i] == ' ' ) { continue; }
//if i=98 it is the last char in the string i=99 should be '\0'
//So do you really want that trailing space?
if(i==98 && myStr[i] == ' ' ) { continue; }
//Same rational as above.
//So do you really want those trailing 2 spaces?
if(i==97 && myStr[i] == ' ' && myStr[i+1] == ' ') { continue; }
//if i=0; myStr[i-1] will likely cause a segmentation fault
if(i > 0 && myStr[i] == ' ' && myStr[i+1] == ' ' && myStr[i-1] != '.') { continue; }
newStr[k] = myStr[i];
k++;
}
for(j=0;j<k;j++){ printf("%c",newStr[j]); } //print the 99 char block
}
printf("\n"); //print a newline for good measure
fclose(in); //close file
}
return 0;
}
Note that the code will misbehave for files larger than 99 characters, because spacing comparisons are not made from the end of one 99-character block to the beginning of the next. You could handle this by not deleting the leading/trailing spaces and instead comparing the values at i=1 and i=2 with the last two characters at i=97 and i=98 of the previous block.
This is a different, better loop. It solves the block barrier issues of the other approach and uses much less memory
Better approach:
# include <stdio.h>
# include <stdlib.h>
int main() {
FILE *in;
in=fopen("duom.txt","r");
if(!(in)) { fprintf(stderr,"Error opening file!\n"); return -1; }
//the file was opened
int x; //stores current char
int y; //stores previous char
for(y='\0'; (x=fgetc(in)) != EOF; y=x) { //read in 'x' until end of file
// The following conditions cover all cases:
// is 'x' not a space? Then print 'x'
// is 'x' a space but 'y' a period? Then print two spaces
// is 'x' a space and 'y' not a period but also not a space? Then print a space
// Otherwise 'x' is part of extra spacing, do nothing
if(x != ' ') { printf("%c",x); }
else if(x == ' ' && y == '.') { printf("  "); } //two spaces after a period
else if(x == ' ' && y != '.' && y != ' ') { printf(" "); }
else { ; } //do nothing
}
printf("\n"); //print a newline for good measure
fclose(in); //close file
return 0;
}
I suggest using strtok() and concatenate the tokens together separated by the correct number of spaces. If a token ends with a period, use two spaces. Otherwise, only use one. This way you don't even need to check how many spaces are in between words.
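A rough sketch of the strtok() idea is given below. `normalize_spacing` is a hypothetical name, and the sketch assumes the text is already in a writable buffer and that only the space character separates words (newlines are left inside tokens untouched).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rebuild `text` in `out` with one space between words and two spaces
   after a word ending in '.'. `text` is modified by strtok.
   Returns 0 on success, -1 if `out` is too small. */
int normalize_spacing(char *text, char *out, size_t outsz)
{
    size_t used = 0;
    const char *sep = "";   /* nothing before the first word */
    char *tok;

    assert(text != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    for (tok = strtok(text, " "); tok != NULL; tok = strtok(NULL, " ")) {
        size_t toklen = strlen(tok);
        int n = snprintf(out + used, outsz - used, "%s%s", sep, tok);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;      /* output would be truncated */
        used += (size_t)n;
        /* Choose the separator that goes before the next word. */
        sep = (tok[toklen - 1] == '.') ? "  " : " ";
    }
    return 0;
}
```

Because strtok treats any run of spaces as one delimiter, the number of spaces in the input never has to be counted. The trade-off compared with the character-by-character loop is that the whole text must fit in memory.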
