Count the number of words, lines, and characters in a file - C

Thanks for the attention.
I wrote a piece of code in C to count the number of words, lines, and characters:
while((c = fgetc(fp)) != EOF)
{
    if((char)(c) == ' ' || (char)(c) == '\t'){
        num_word++;
        num_char++;
    }
    else if((char)(c) == '\n'){
        num_line++;
        num_word++;
        num_char++;
    }
    else{
        num_char++;
    }
}
Everything works fine except for num_word: if the input contains a blank line, it counts one word too many. For example, with this input (a line of text, a blank line, then a second line):
example for test

case
my program counts 5 words instead of 4. Any hints for solving this problem?

You increment the word count in two branches (on every space or tab, and on every \n), so every delimiter is treated as the end of a word. That fails for consecutive whitespace characters: a blank line or two spaces in a row ends a "word" that does not exist.
To solve the problem, you can either skip consecutive whitespace characters (by keeping a counter or flag for them), or count a word only when a non-whitespace character starts a new one.
The C Programming Language by K&R provides very nice examples.
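The flag approach from the answer above can be sketched as follows (K&R's word-counting program in chapter 1 uses the same idea). This is a minimal sketch, not the asker's code: a word is counted when a non-whitespace character follows whitespace or the start of the file, and whitespace is assumed to mean space, tab and newline, as in the question. The struct counts type and the count_file function are names made up for this example.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct counts {
    long chars;
    long words;
    long lines;
};

/* Count characters, words and lines in fp.
   A word starts at a non-whitespace character that follows whitespace
   (or the beginning of the stream), so runs of spaces, tabs or blank
   lines never add extra words. */
struct counts count_file(FILE *fp)
{
    struct counts n = {0, 0, 0};
    int in_word = 0;   /* flag: are we currently inside a word? */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        n.chars++;
        if (c == '\n')
            n.lines++;
        if (c == ' ' || c == '\t' || c == '\n') {
            in_word = 0;           /* whitespace ends any current word */
        } else if (!in_word) {
            in_word = 1;
            n.words++;             /* first character of a new word */
        }
    }
    return n;
}
```

For the question's input (a line of text, a blank line, another line) this reports 4 words. Like wc, it counts lines as newline characters, so a last line without a trailing newline does not add to the line count.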

Your word count is 1 more because you have 1 blank line. It will be 2 more if you have 2 blank lines and so on. This is because of the way you are counting the words on line break. The easiest thing you can do is NOT increment the word count if there are successive line breaks indicating blank lines.
char lastChar = ' ';
while((c = fgetc(fp)) != EOF)
{
    if((char)(c) == ' ' || (char)(c) == '\t'){
        num_word++;
        num_char++;
    }
    else if((char)(c) == '\n'){
        num_line++;
        if(lastChar != '\n')
            num_word++;
        num_char++;
    }
    else{
        num_char++;
    }
    lastChar = (char)c;
}

Related

Problems when trying to skip '\n' in reading txt files

I wrote a fairly small program to help format .txt files, but when I tried to read from the input file and skip unwanted '\n' characters, I actually skipped the character after each '\n' instead.
The sample file I am working on looks like this (note the blank line between abc and ab):
abcde
abc

ab
abcd
And my code looks like this:
while (!feof(fp1)) {
ch = fgetc(fp1);
if (ch != '\n') {
printf("%c",ch);
}
else {
ch = fgetc(fp1); // move to the next character
if (ch == '\n') {
printf("%c",ch);
}
}
}
The expected result is
abcdeabc
ababcd
But I actually got
abcdebc
abbcd
I guess the problem is in this line:
ch = fgetc(fp1); // move to the next character
but I just can't find a correct way to implement this idea.
Think of the flow of your code (lines numbered below):
1: while (!feof(fp1)) {
2: ch = fgetc(fp1);
3: if (ch != '\n') {
4: printf("%c",ch);
5: }
6: else {
7: ch = fgetc(fp1); // move to the next character
8: if (ch == '\n') {
9: printf("%c",ch);
10: }
11: }
12: }
When you get a newline followed by non-newline, the flow is (starting at the else line): 6, 7, 8, 10, 11, 12, 1, 2.
It's that execution of the final 2 in that sequence that effectively throws away the non-newline character that you had read at 7.
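The original look-ahead idea can also be kept, as long as the character read at line 7 is handed back to the stream instead of being overwritten at line 2. A sketch using the standard ungetc() function follows; join_lines is a made-up name. The loop also tests fgetc()'s return value instead of feof(), because feof() only becomes true after a read has already failed, so the original loop can print the EOF value as a stray character.

```c
#include <stdio.h>

/* Drop single newlines; a pair of newlines becomes one newline.
   Same look-ahead idea as the question's code, but the peeked
   character is pushed back with ungetc() instead of being lost. */
void join_lines(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {     /* stop on EOF, not feof() */
        if (ch != '\n') {
            fputc(ch, out);
            continue;
        }
        int next = fgetc(in);             /* look at the character after '\n' */
        if (next == '\n')
            fputc('\n', out);             /* blank line: keep one newline */
        else if (next != EOF)
            ungetc(next, in);             /* not a newline: give it back */
    }
}
```

With the sample file (including its blank line) this produces the expected abcdeabc / ababcd output. Longer runs behave differently from the counting approach described next: four newlines in a row are treated as two pairs and produce two newlines.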
If your intent is to basically throw away single newlines and convert sequences of newlines (two or more) to a single one(a), you can use something like the following pseudo-code:
set numNewlines to zero
while not end-file:
get thisChar
if numNewlines is one or thisChar is not newline:
output thisChar
if thisChar is newline:
increment numNewlines
else:
set numNewlines to zero
This reads the character in one place, making it less likely that you'll inadvertently skip one due to confused flow.
It also uses the newline history to decide what gets printed. It only outputs a newline on the second occurrence in a sequence of newlines, ignoring the first and any after the second.
That means that a single newline will never be echoed and any group of two or more will be transformed into one.
Some actual C code that demonstrates this(b) follows:
#include <stdio.h>
#include <stdbool.h>
int main(void) {
    // Open file.
    FILE *fp = fopen("testprog.in", "r");
    if (fp == NULL) {
        fprintf(stderr, "Cannot open input file\n");
        return 1;
    }
    // Process character by character.
    int numNewlines = 0;
    while (true) {
        // Get next character, stop if none left.
        int ch = fgetc(fp);
        if (ch == EOF) break;
        // Output only the second newline in a sequence of newlines,
        // or any non-newline.
        if (numNewlines == 1 || ch != '\n') {
            putchar(ch);
        }
        // Manage sequence information.
        if (ch == '\n') {
            ++numNewlines;
        } else {
            numNewlines = 0;
        }
    }
    // Finish up cleanly.
    fclose(fp);
    return 0;
}
(a) It's unclear from your question how you want to handle sequences of three or more newlines so I've had to make an assumption.
(b) Of course, you shouldn't use this if your intent is to learn, because:
You'll learn more if you try yourself and have to fix any issues.
Educational institutions will almost certainly check submitted code against a web search, and you'll probably be pinged for plagiarism.
I'm just providing it for completeness.

Multiple blank lines are not squeezed into one blank line (C) using I/O redirection

I am asked to squeeze two or more consecutive blank lines in the input into one blank line in the output. I have to use Cygwin to do the I/O and to test it.
Example: ./Lab < test1.txt > test2.txt
my code is:
int main(void){
format();
printf("\n");
return 0;
}
void format(){
int c;
size_t nlines = 1;
size_t nspace = 0;
int spaceCheck = ' ';
while (( c= getchar()) != EOF ){
/*TABS*/
if(c == '\t'){
c = ' ';
}
/*SPACES*/
if (c ==' '){/*changed from isspace(c) to c==' ' because isspace is true for spaces/tabs/newlines*/
/* while (isspace(c = getchar())); it counts while there is space we will put one space only */
if(nspace > 0){
continue;
}
else{
putchar(c);
nspace++;
nlines = 0;
}
}
/*NEW LINE*/
else if(c == '\n'){
if(nlines >0){
continue;
}
else{
putchar(c);
nlines++;
nspace = 0;
}
}
else{
putchar(c);
nspace = 0;
nlines = 0;
}
}
}
However my test2.txt doesn't have the result I want. Is there something wrong in my logic/code?
You provide too little code; the interesting part would be the loop around the code you posted...
What you actually have to do there is skip the output:
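FILE* file = ...;
int c, prev = 0; /* int, not char, so that EOF can be told apart from real characters */
while((c = fgetc(file)) != EOF) /* fgetc reads one character; fgets reads a whole line */
{
    if(c != '\n' || prev != '\n')
        putchar(c);
    prev = c;
}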
If we have an empty line following another one, then we encounter two subsequent newline characters, so both c and prev are equal to '\n'. That is the situation in which we do not want to output c (the subsequent newline). The inverse situation is either of the two being unequal to '\n', as you see above, and only then do you want to output your character...
Side note: prev = 0. Well, I need to initialise it to anything different from a newline; it could just as well have been 's'. Unless, of course, you want to skip an initial empty line, too; then you would have to initialise it with '\n'...
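One caveat with comparing only against the previous character: a single blank line after a line of text already produces two consecutive newlines, so the loop above drops every blank line rather than squeezing runs of them into one. To keep exactly one blank line, up to two newlines in a row have to get through. A minimal sketch, assuming the same character-by-character filtering but with explicit streams; the function name squeeze_blank_lines is illustrative.

```c
#include <stdio.h>

/* Copy in to out, squeezing any run of blank lines into a single
   blank line. A blank line shows up as a newline that directly
   follows another newline, so at most two newlines in a row are kept. */
void squeeze_blank_lines(FILE *in, FILE *out)
{
    int c;
    int newlines = 0;   /* length of the current run of '\n' characters */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            if (++newlines > 2)
                continue;          /* third and later newline: skip */
        } else {
            newlines = 0;          /* ordinary character ends the run */
        }
        fputc(c, out);
    }
}
```

Starting newlines at 1 instead of 0 would also limit blank lines at the very top of the file to one.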
Edit, referring to your modified code: Edit2 (removed references to code as it changed again)
As your modified code shows that you want to condense not only blank lines but whitespace, too, you first have to consider that there are two classes of whitespace: newlines on the one hand, and all other whitespace on the other. So you have to differentiate appropriately.
I recommend now using some kind of state machine:
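#include <ctype.h>  /* isspace */
#include <stdio.h>  /* getchar, putchar, EOF */
#define OTH 0  /* previous character: anything but whitespace */
#define WS  1  /* previous character: whitespace other than newline */
#define NL1 2  /* previous character: first newline in a row */
#define NL2 3  /* previous character: second newline in a row */
void format(void)
{
    int c;
    int state = OTH;
    while (( c= getchar()) != EOF )
    {
        // first, the new lines:
        if(c == '\n')
        {
            if(state != NL2)
            {
                putchar('\n');
                state = state == NL1 ? NL2 : NL1;
            }
        }
        // then, any other whitespace
        else if(isspace(c))
        {
            if(state != WS)
            {
                putchar(' ');
                state = WS;
            }
        }
        // finally, all remaining characters
        else
        {
            putchar(c);
            state = OTH;
        }
    }
}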
The first distinction is by the current character's own class (newline, other whitespace, or anything else); the second is by the previous character's class, which is what the current state records. A non-whitespace character is always output; a whitespace character is output only if the previous character belonged to a different class. Newlines are a little special and need two states, because we want to keep one blank line, which means two consecutive newline characters must get through...
Be aware: whitespace-only lines do not count as blank lines in the above algorithm, so they won't be eliminated (but reduced to a line containing a single space). From the code you posted, I assume this is intended...
For completeness: This is a variant removing leading and trailing whitespace entirely and counting whitespace-only lines as empty lines:
if(c == '\n')
{
    if(state != NL2)
    {
        putchar('\n');
        state = state == NL1 ? NL2 : NL1;
    }
}
else if(isspace(c))
{
    if(state == OTH)
        state = WS;
}
else
{
    if(state == WS)
    {
        putchar(' '); /* delayed single space between two words */
    }
    putchar(c);
    state = OTH;
}
The trick: only enter the whitespace state if there was a non-whitespace character before it, and do not print the space until you encounter the next non-whitespace character.
As for the newlines: if there was a normal character on the line, we are in state OTH or WS, but in neither of the two NL states. If there was only whitespace on the line, the state is not modified, so we remain in the corresponding NL state (1 or 2) and skip the line accordingly...
To dissect this:
if(c == '\n') {
nlines++;
is nlines ever reset to zero?
if(nlines > 1){
c = '\n';
And what happens on the third \n in sequence? will nlines > 1 be true? Think about it!
}
}
putchar(c);
I don't get this: you output your character unconditionally anyway, defeating the whole purpose of checking whether it's a newline.
A correct solution would set a flag when c is a newline and not output anything. Then, when c is NOT a newline (else branch), output ONE newline if your flag is set and reset the flag. I leave the code writing to you now :)
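For comparing against your own attempt, the approach just described can be sketched as follows. As described, it turns every run of newlines into a single newline, so blank lines are removed entirely rather than reduced to one; writing out a newline that is still pending at the end of input is an extra choice the description leaves open. The function name collapse_newlines is hypothetical.

```c
#include <stdio.h>

/* Collapse every run of newlines into one newline: the newline is
   held back in a flag and only written when the next ordinary
   character (or the end of input) shows that the run is over. */
void collapse_newlines(FILE *in, FILE *out)
{
    int c;
    int pending = 0;   /* flag: saw at least one '\n' not yet written */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            pending = 1;           /* remember it, output nothing yet */
        } else {
            if (pending) {
                fputc('\n', out);  /* exactly one newline for the run */
                pending = 0;
            }
            fputc(c, out);
        }
    }
    if (pending)
        fputc('\n', out);          /* keep a final newline, if any */
}
```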

C - how to count the space-separated strings in a .txt file?

I have a text file which includes thousands of strings, each separated by a space " ".
How can I count how many strings there are?
You don't need the strtok() as you only need to count the number of space characters.
while (fgets(line, sizeof line, myfile) != NULL) {
for (size_t i = 0; line[i]; i++) {
if (line[i] == ' ') totalStrings++;
}
}
If you want to consider any whitespace character, you can use the isspace() function.
You can read character by character as well without using an array:
int ch;
while ((ch=fgetc(myfile)) != EOF) {
if (ch == ' ') totalStrings++;
}
But I don't see why you want to avoid using an array as it would probably be more efficient (reading more chars at a time rather than reading one byte at a time).
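Counting spaces gives the number of separators rather than strings: a line holding n strings separated by single spaces contains only n - 1 spaces, and doubled or trailing spaces skew the count further. Counting the points where a non-whitespace character follows whitespace (or the start of the file) avoids both problems. The sketch below still reads into an array, using fread(); the in_string flag lives outside the block loop so that a string split across two reads is counted once. The name count_strings and the 4096-byte buffer are arbitrary choices for this example.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated strings in fp. Reads in blocks; a string
   is counted when its first character is seen, i.e. when a
   non-whitespace character follows whitespace or the start of input. */
size_t count_strings(FILE *fp)
{
    unsigned char buf[4096];
    size_t n, count = 0;
    int in_string = 0;   /* persists across blocks */

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (isspace(buf[i])) {
                in_string = 0;
            } else if (!in_string) {
                in_string = 1;
                count++;         /* first character of a new string */
            }
        }
    }
    return count;
}
```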
The fgets() function will read an entire line from the file (you need to know the maximum possible length of a line). Then you can use strtok() from <string.h> to parse the string and count the words.
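A sketch of the fgets() plus strtok() route, assuming every line fits in the buffer (an arbitrary 1024 bytes here; a longer line would be split by fgets(), and a string straddling the split would be counted twice). Space, tab and the newline kept by fgets() are used as delimiters; count_tokens is a made-up name.

```c
#include <stdio.h>
#include <string.h>

/* Count space/tab-separated strings line by line with fgets() and
   strtok(). Assumes every line fits in the buffer. strtok() skips runs
   of delimiters, so repeated, leading or trailing spaces do not
   produce empty tokens. */
size_t count_tokens(FILE *fp)
{
    char line[1024];
    size_t count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        for (char *tok = strtok(line, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n"))
            count++;
    }
    return count;
}
```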
Using fgetc(), you can count the spaces.
Take note that counting spaces only gives the right answer if every string, including the first one on a line, is preceded by a space (for example, when each line starts with a space). Otherwise the result is inaccurate, because the first string has no space before it and is not counted.
To solve that, we first check the first character and increment the string counter if it is a letter.
int str_count = 0;
int c;
// first char
if( isalpha(c = fgetc(myfile)) )
str_count++;
else
ungetc(c, myfile);
Then, we loop through the rest of the contents.
Checking whether a letter follows a space verifies that another string really starts after the space; otherwise a space at the end of a line would be counted as well, giving an inaccurate result.
do
{
c = fgetc(myfile);
if( c == EOF )
break;
if(isspace(c)) {
if( isalpha(c = fgetc(myfile)) ) {
str_count++;
ungetc(c, myfile);
} else if(c == '\n') { // for multiple newlines
str_count++;
}
}
} while(1);
Tested on a Lorem Ipsum generator of 1500 words:
http://pastebin.com/w6EiSHbx

Why is this program yielding the wrong output?

This program is supposed to remove all comments from C source code (here, a comment is a double slash '//' up to a newline character '\n', together with anything in between, and also anything between '/*' and '*/').
The program:
#include <stdio.h>
/* This is a multi line comment
testing */
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '/') //Possible comment
{
c = getchar();
if (c == '/') // Single line comment
while (c = getchar()) //While there is a character and is not EOF
if (c == '\n') //If a space character is found, end of comment reached, end loop
break;
else if (c == '*') //Multi line comment
{
while (c = getchar()) //While there is a character and it is not EOF
{
if (c == '*' && getchar() == '/') //If c equals '*' and the next character equals '/', end of comment reached, end loop
break;
}
}
else putchar('/'); putchar(c); //If not comment, print '/' and the character next to it
}
else putchar(c); //if not comment, print character
}
}
After I use this source code as its own input, this is the output I get:
#include <stdio.h>
* This is a multi line comment
testing *
int main() {
int c;
while ((c = getchar()) != EOF)
{
if (c == '') ////////////////
{
c = getchar();
if (c == '') ////////////////////
while (c = getchar()) /////////////////////////////////////////
if (c == '\n') ///////////////////////////////////////////////////////////////
break;
else if (c == '*') ///////////////////
{
while (c = getchar()) ////////////////////////////////////////////
{
No more beyond this point. I'm compiling it using g++ on the ubuntu terminal.
As you can see, multi lines comments had only their '/' characters removed, while single line ones, had all their characters replaced by '/'. Apart from that, any '/' characters that were NOT the beginning of a new comment were also removed, as in the line if (c == ''), which was supposed to be if (c == '/').
Does anybody know why? thanks.
C does not take notice of the way you indent your code. It only cares about its own grammar.
Look carefully at your elses and think about which if they attach to (hint: the closest open one).
There are other bugs, as well. EOF is not 0, so only the first while is correct. And what happens if the comment looks like this: /* something **/?
You have some (apparent) logic errors...
1.
while (c = getchar()) //While there is a character and is not EOF
You're assuming that EOF == 0. Why not be explicit and change the preceding line to:
while((c = getchar()) != EOF)
2.
else putchar('/'); putchar(c);
Are both of the putchars supposed to be part of the else clause? If so, you need braces {} around the two putchar statements. Also, give each putchar its own line; it not only looks nicer but it's more readable.
Conclusion
Other than what I've mentioned, your logic looks sound.
As already mentioned, the if/else matching is incorrect. One additional missing piece of functionality is that you must make the program more stateful, to keep track of whether you are inside a string or not, e.g.
printf("This is not // a comment\n");
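Putting the answers' points together (explicit EOF checks, no dangling else, handling of a comment ending in **/, and not treating // inside a string as a comment), a small state machine is easier to get right than nested getchar() loops. This is a sketch, not a complete C lexer: it handles string and character literals with backslash escapes, but ignores backslash-newline line splicing and trigraphs. Each block comment is replaced by a single space (the C standard replaces every comment with one space, which keeps a/**/b from turning into ab); the state names and the strip_comments function are illustrative.

```c
#include <stdio.h>

/* States of the comment stripper (names are illustrative). */
enum strip_state {
    CODE,           /* ordinary source text */
    SLASH,          /* just saw '/', not yet known if a comment starts */
    LINE_COMMENT,   /* inside a line comment, up to the newline */
    BLOCK_COMMENT,  /* inside a block comment */
    BLOCK_STAR,     /* inside a block comment, just saw a star */
    STRING,         /* inside a string literal */
    STRING_ESC,     /* after a backslash inside a string literal */
    CHAR_LIT,       /* inside a character constant */
    CHAR_ESC        /* after a backslash inside a character constant */
};

/* Copy C source from in to out without comments. A line comment is
   dropped up to (not including) its newline; a block comment is
   replaced by a single space. */
void strip_comments(FILE *in, FILE *out)
{
    enum strip_state s = CODE;
    int c;

    while ((c = fgetc(in)) != EOF) {
        switch (s) {
        case CODE:
            if (c == '/') {
                s = SLASH;               /* decide on the next character */
                break;
            }
            if (c == '"')
                s = STRING;
            else if (c == '\'')
                s = CHAR_LIT;
            fputc(c, out);
            break;
        case SLASH:
            if (c == '/') {
                s = LINE_COMMENT;
            } else if (c == '*') {
                s = BLOCK_COMMENT;
            } else {
                fputc('/', out);         /* a plain '/', e.g. division */
                ungetc(c, in);           /* reprocess c as ordinary code */
                s = CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {
                fputc('\n', out);        /* keep the line break */
                s = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                s = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                fputc(' ', out);         /* comment becomes one space */
                s = CODE;
            } else if (c != '*') {
                s = BLOCK_COMMENT;       /* a run of stars stays in BLOCK_STAR */
            }
            break;
        case STRING:
        case CHAR_LIT:
            fputc(c, out);
            if (c == '\\')
                s = (s == STRING) ? STRING_ESC : CHAR_ESC;
            else if (c == (s == STRING ? '"' : '\''))
                s = CODE;                /* closing quote */
            break;
        case STRING_ESC:
        case CHAR_ESC:
            fputc(c, out);               /* escaped char never ends a literal */
            s = (s == STRING_ESC) ? STRING : CHAR_LIT;
            break;
        }
    }
    if (s == SLASH)
        fputc('/', out);                 /* input ended right after a '/' */
}
```

Fed its own question's tricky cases, a character constant such as '/' and a string such as "// no" pass through unchanged, while a comment ending in **/ is removed completely.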

The C Programming Language K&R exercise 1-9

I'm new here and relatively new to programming logic in general. In an effort to develop my skills I've begun reading this fine piece of literature. I really feel that I am grasping the concepts well, but this exercise seems to have caught me off guard. I can produce the program, but some of the examples I've seen seem to introduce concepts not yet covered by the book, like the examples here. inspace seems to be serving a function that is more than just a variable created by the programmer.
#include <stdio.h>
int main(void)
{
int c;
int inspace;
inspace = 0;
while((c = getchar()) != EOF)
{
if(c == ' ')
{
if(inspace == 0)
{
inspace = 1;
putchar(c);
}
}
/* We haven't met 'else' yet, so we have to be a little clumsy */
if(c != ' ')
{
inspace = 0;
putchar(c);
}
}
return 0;
}
In the next example, pc seems to be doing something in regards to counting spaces but I'm not sure what.
I managed to create a program that completes this task but it was using only the variable c that I created, thus I understand its purpose.
The objective of this code is to copy its input to its output, printing only one space wherever there are two or more consecutive spaces ' '.
The variable inspace keeps track of whether the last character printed was a space or a non-space.
If inspace is zero, the last character printed was not a space, and
if inspace is one, the last character printed was a space.
So if inspace is zero, a space that is read can be printed; if inspace is one, the space just read is a consecutive one, so it is not printed.
Here c is the current character read (read the comments):
if(c == ' ') // a space was just read
{
    if(inspace == 0) // last char printed was a non-space, so this space can be printed
    {
        inspace = 1; // printing a space, so set inspace to 1
        putchar(c);  // print the space
    }
}
Next if:
if(c != ' ') // a non-space was read; print it unconditionally
{
    inspace = 0; // remember that a non-space was printed
    putchar(c);
}
Took me a while but this is the answer I think.
#include <stdio.h>
int main(void)
{
    int c, blank;
    blank = 0;
    while ((c=getchar()) != EOF){
        if (c == ' '){
            if (blank == 0){
                printf("%c", c);
                blank = 1;
            }
        }
        if (c != ' '){
            if (blank == 1){
                blank = 0;
            }
            printf("%c", c);
        }
    }
    return 0;
}
inspace is essentially a variable to indicate you are or are not in the "just seen a space" state. You enter this state after seeing a space, and you exit this state when you see a non-space. You print your input only if you're not in the inspace state, thus you do not print multiple adjacent spaces.
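The second example mentioned in the question (the one using pc) is not reproduced here. The name suggests a "previous character" variable, which is an alternative to the inspace flag: remember the last character read and skip a blank only when the previous character was also a blank. A sketch under that assumption, written as a function on explicit streams (the name squeeze_blanks is illustrative); since inspace is 1 exactly when the last character read was a space, it produces the same output as the inspace program.

```c
#include <stdio.h>

/* K&R exercise 1-9 with a "previous character" variable instead of a
   flag: print c unless both c and the previous character are blanks. */
void squeeze_blanks(FILE *in, FILE *out)
{
    int c;
    int pc = EOF;   /* previous character; EOF means "none yet" */

    while ((c = fgetc(in)) != EOF) {
        if (c != ' ' || pc != ' ')
            fputc(c, out);
        pc = c;
    }
}
```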
I managed to create a program that completes this task but it was using only the variable c that I created, thus I understand its purpose.
In your program, if the input is "hello    world", is that its exact output? The program you posted will output "hello world" (compressing the multiple spaces between the words down to one).
I was also having the same problem but finally got a program that works.
#include<stdio.h>
/* copy input to its output, replacing each
string of one or more blanks by a single blank */
int main()
{
int c, nspace=0;
while((c=getchar()) != EOF){
if(c==' ') ++nspace;
else{
if(nspace >= 1){
printf(" ");
putchar(c);
nspace=0;
}
else
putchar(c);
}
}
}
