Why is my implementation of wc off by one word? - c

[Solved] Writing parsing code is a trap. A line with 15 spaces will have 15 words. Blank lines will also count as a word. Back to flex and bison for me.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
FILE *fp = NULL;
int iChars =0, iWords =0, iLines =0;
int ch;
/* if there is a command line arg, then try to open it as the file
otherwise, use stdin */
fp = stdin;
if (argc == 2) {
fp = fopen(argv[1],"r");
if (fp == NULL) {
fprintf(stderr,"Unable to open file %s. Exiting.\n",argv[1]);
exit(1);
}
}
/* read until the end of file, counting chars, words, lines */
while ((ch = fgetc(fp)) != EOF) {
if (ch == '\n') {
iWords++;
iLines++;
}
if (ch == '\t' || ch == ' ') {
iWords++;
}
iChars++;
}
/* all done. If the input file was not stdin, close it*/
if (fp != stdin) {
fclose(fp);
}
printf("chars: %d,\twords: %d,\tlines: %d.\n",iChars,iWords,iLines);
}
TEST DATA foo.sh
#!/home/ojblass/source/bashcrypt/a.out
This is line 1
This is line 2
This is line 3
ojblass@linux-rjxl:~/source/bashcrypt> wc foo.sh
5 13 85 foo.sh
ojblass@linux-rjxl:~/source/bashcrypt> a.out foo.sh
chars: 85, words: 14, lines: 5.

Your algorithm is wrong. If the test file contains two blank characters in succession, the word counter is incremented twice, but it should be incremented only once.
One solution is to remember the last character read. Increment the word counter only when the character just read is a separator (blank, tab, newline, ...) and the previous character was part of a word.
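The previous-character idea can be sketched roughly as follows. This is only an illustration: the `struct counts` type and the `count_stream` name are made up for this example, and it treats any non-whitespace character (as classified by `isspace()`) as part of a word, which is an assumption on my part rather than something stated in the answer.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper type, not from the original post. */
struct counts {
    long chars;
    long words;
    long lines;
};

/* Count characters, words and lines on fp by remembering the previous
   character: a word ends when whitespace follows non-whitespace. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int prev = ' ';   /* pretend the stream starts right after whitespace */
    int ch;           /* int, not char, so EOF can be told apart */

    while ((ch = fgetc(fp)) != EOF) {
        c.chars++;
        if (ch == '\n')
            c.lines++;
        if (isspace(ch) && !isspace(prev))
            c.words++;
        prev = ch;
    }
    /* A final word with no whitespace after it still counts. */
    if (!isspace(prev))
        c.words++;
    return c;
}
```

Blank lines and runs of spaces no longer add words, because whitespace is only counted when it directly follows a non-whitespace character. Call it as `count_stream(fp)` in place of the original loop.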

You are counting \n as a word even for a blank line.

Related

Loop through text file after filtering out comments in C

I am trying to filter out comments, denoted by '#', from a text file. I am having trouble looping through the entire file and printing the output to the terminal. The code removes the first line of text and the second line's comment as it should, but it does not continue past line 2 (it prints 4, 2). Any help would be appreciated. I'm definitely missing something, as I have had to learn two semesters of C in a weekend and don't yet have a full grasp of its usage.
The file being read
# this line is a full comment that might be pseudo-code or whatever
4, 2 # 4, 3
1
# 9
7
endNode
endNet
The program
#include <stdio.h>
#include <string.h>
#define BUFF_SIZE 1024
#define COMMENT_MARKER '#'
int main()
{
FILE *fp;
char buffer[BUFF_SIZE];
if ((fp = fopen("F:\\PythonProjects\\C\\text.txt", "r")) == NULL)
{
perror("Error opening file");
exit(1);
}
while (fgets(buffer, BUFF_SIZE, fp) != NULL)
{
char *comment = strchr(buffer, COMMENT_MARKER);
if (comment != NULL)
{
size_t len = strlen(comment);
memset(comment, '\0', len);
printf("%s", buffer);
}
}
fclose(fp);
}
Your current code only prints out a line if a # is found in it. It skips printing lines without any comment. And because you set everything from the first # to the end of the string to nul bytes, it won't print a newline after each line, meaning the results all run together.
You can fix these issues by moving the output after the comment removal block, and always printing out a newline. This means that in lines without comments, you have to do something about the newline at the end (if any; it could be missing because of a long line or the input file lacking one after the last line) so you don't get two newlines after each non-comment line.
Luckily, standard C provides ways to find the first occurrence of any one of a set of characters, not just a single character. You can look for either the comment character or a newline in a single pass through the line and replace it with a single nul byte; there is no need to memset() everything after it to zeros. Example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h> // Needed for exit()
#define BUFF_SIZE 1024
#define COMMENT_MARKER '#'
int main()
{
FILE *fp;
char buffer[BUFF_SIZE];
if ((fp = fopen("text.txt", "r")) == NULL)
{
perror("Error opening file");
exit(1);
}
char tokens[3] = { COMMENT_MARKER, '\n', '\0' };
while (fgets(buffer, BUFF_SIZE, fp) != NULL)
{
// Look for the first # or newline in the string
char *comment_or_nl = strpbrk(buffer, tokens);
if (comment_or_nl)
{
// and if found, replace it with a nul byte
*comment_or_nl = '\0';
}
// Then print out the possibly-truncated string (puts() adds a newline)
puts(buffer);
}
fclose(fp);
return 0;
}
Here is your code, minimally adapted to achieve the objective.
#include <stdio.h>
#include <stdlib.h> // needed for exit()
#include <string.h>
#define BUFF_SIZE 1024
#define COMMENT_MARKER '#'
int main()
{
FILE *fp;
if ((fp = fopen("F:\\PythonProjects\\C\\text.txt", "r")) == NULL)
{
perror("Error opening file");
exit(1);
}
char buffer[BUFF_SIZE]; // declare variables proximate to use
while (fgets(buffer, BUFF_SIZE, fp) != NULL)
{
char *comment = strchr(buffer, COMMENT_MARKER);
if (comment != NULL)
{
strcpy( comment, "\n" ); // Just clobber the comment section
}
printf("%s", buffer); // Always print something
}
fclose(fp);
}

Printing the first 10 lines of a file in C

I'm new to programming in C. And I'm trying to print the first 10 lines of a text file. When I run my program with a text file containing 11 lines of text, only the first line is displayed. I'm not sure why it does that, but I suspect there is something wrong in my while loop. Can someone please help me?
#include <stdio.h>
int main(int argc, char *argv[]){
FILE *myfile;
char content;
int max = 0;
// Open file
myfile = fopen(argv[1], "r");
if (myfile == NULL){
printf("Cannot open file \n");
exit(0);
}
// Read the first 10 lines from file
content = fgetc(myfile);
while (content != EOF){
max++;
if (max > 10)
break;
printf ("%c", content);
content = fgetc(myfile);
}
fclose(myfile);
return 0;
}
You have already been advised to use fgets. However, if your file has lines of unknown length, you may still want to use fgetc. Just make sure you count only newlines, not all characters:
int max = 0;
int content;
while ((content = fgetc(myfile)) != EOF && max < 10){
if (content == '\n') max++;
putchar(content);
}
fgetc() returns the next character in the file, not the next line. You probably want to use fgets() instead, which reads up to the next newline character into a buffer. Your code should probably end up with something like:
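// allocate 1K for a buffer to read (malloc() and exit() need <stdlib.h>)
char *buff = malloc(1024);
if (buff == NULL) {
exit(1);
}
// iterate through file until we have printed 10 lines or run out of data
while (max++ < 10 && fgets(buff, 1024, myfile) != NULL) {
// fgets() keeps the trailing newline, so don't print another one
printf("%s", buff);
}
free(buff);
// close your file, finish up...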
Read more about fgets() here: https://www.tutorialspoint.com/c_standard_library/c_function_fgets.htm
fgetc() reads the next character, not the next line. To read a file line by line, use fgets(), which reads up to and including the end of the line (or until the buffer is full) and stores the result in a string.
Your code should look like this:
#include <stdio.h>
#include <stdlib.h> // for exit()
int main(int argc, char *argv[])
{
FILE *myfile;
char content[200];
int max = 0;
// Open file
myfile = fopen(argv[1], "r");
if (myfile == NULL)
{
printf("Cannot open file \n");
exit(1);
}
// Read the first 10 lines from file; fgets() returns NULL at end of file
while (max < 10 && fgets(content, sizeof content, myfile) != NULL)
{
max++;
printf("%s", content);
}
fclose(myfile);
return 0;
}

How to delete blank lines from a txt file with C, in Linux - without Bash

I tried creating a C program that takes a file and prints only the lines that contain something (a space, a letter, a number, etc.), not the blank lines.
I need to run this on a virtual machine running the newest version of Ubuntu. So far I have only managed to print its contents, but not on separate lines as they are in the file.
The code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
char *name = argv[1];
FILE *f = fopen(name, "r");
char x;
while(fscanf(f, "%c" , &x) > 0)
{
printf("%c", x);
if(x == '\n')
{
printf("\n");
}
}
}
file contents:
as
d
3
results:
asd3
desired result:
as
d
3
First, you have no error checking. That makes your program difficult to use.
Second, you output every character unconditionally and then output newlines an extra time. What you want to do is output every character once, unless it's a newline right after a newline (as that would create an empty line) in which case you don't want to output it.
Here's the code fixed up:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
if (argc < 2)
{
fprintf (stderr, "An argument is required\n");
return -1;
}
char *name = argv[1];
FILE *f = fopen(name, "r");
if (f == NULL)
{
fprintf (stderr, "Unable to open file for reading\n");
return -1;
}
char x, px = '\n';
while(fscanf(f, "%c" , &x) > 0)
{
// don't output a newline after a newline
if ((x != '\n') || (px != '\n'))
printf("%c", x);
// keep track of what character was before the next one
px = x;
}
}
It really would be much easier to just read each line in and then output the line if it's non-empty.
You can use the fgets() function, which reads an entire line including the newline character (\n). After reading a line, skip printing it if its first character (line[0]) is the newline character.
Here is a code segment that does it. You still need error checking for argc and for opening the file, as done by @David Schwartz.
char line[200];
while (fgets(line, sizeof line, fp))
{
if (line[0] != '\n')
fputs(line, stdout); // never pass file data as the printf() format string
}
This should work.
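One caveat with a fixed-size buffer: when a line is longer than the buffer, fgets() returns it in several chunks, and a chunk that happens to consist of just "\n" is the end of a long line rather than a blank line. A minimal sketch that accounts for this is below; `copy_nonblank` and the buffer size are names and values chosen for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Copy in to out, dropping lines that contain only a newline.
   copy_nonblank is a hypothetical name used for illustration. */
void copy_nonblank(FILE *in, FILE *out)
{
    char buf[256];
    int at_line_start = 1;  /* is buf the first chunk of a line? */

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);

        /* Skip only when a lone newline begins a line. */
        if (!(at_line_start && buf[0] == '\n'))
            fputs(buf, out);

        /* The next chunk starts a new line only if this one ended in '\n'. */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
}
```

Something like `copy_nonblank(f, stdout);` would replace the read loop once the file has been opened and checked.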

Add same text to end of each line in file from C

I am trying to add a -1 to the end of each line of a file. For instance, file.txt is
1 4 5
2 5 9
3 5 6
but would become
1 4 5 -1
2 5 9 -1
3 5 6 -1
I am figuring out how to add text in general to a file from C, but I cannot figure out how to add the same text to each line in the file and ensure that the newline character is placed after the new last characters of each line (in this case -1).
Here is what I have tried:
FILE *f = fopen("file.txt", "w");
if (f == NULL)
{
printf("Error opening file!\n");
exit(1);
}
/* print some text */
const char *text = " -1";
fprintf(f, "%s\n", text);
Any advice greatly appreciated!
I can add -1 to each line using a text editor, by replacing "\r\n" with " -1\r\n" or similar depending on the file's eol format.
Or programmatically, create a new file like this:
#include <stdio.h>
#include <string.h>
int main()
{
FILE *fr, *fw;
char buffer[10000];
fr = fopen("file.txt","rt");
if (fr == NULL) {
printf("Error opening input file\n");
return 1;
}
fw = fopen("file1.txt","wt");
if (fw==NULL) {
printf("Error opening output file\n");
fclose (fr);
return 1;
}
while (fgets(buffer, 10000, fr) != NULL) {
buffer [ strcspn(buffer, "\r\n") ] = 0; // remove trailing newline etc
fprintf(fw, "%s -1\n", buffer);
}
fclose(fw);
fclose(fr);
return 0;
}
Output file:
1 4 5 -1
2 5 9 -1
3 5 6 -1
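Simply read each char, one at a time, and print the suffix when the end of a line is detected. Then print the character read.
void Append(FILE *inf, FILE *outf, const char *suffix) {
int prev = '\n'; // last character written; start as if at the beginning of a line
int ch;
while ((ch = fgetc(inf)) != EOF) {
if (ch == '\n') {
fputs(suffix, outf);
}
fputc(ch, outf);
prev = ch;
}
// a final line without a trailing newline still gets the suffix
if (prev != '\n') {
fputs(suffix, outf);
}
}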
// Error checking omitted
char tmp[L_tmpnam];
tmpnam(tmp);
FILE *inf = fopen("file.txt", "r");
FILE *outf = fopen(tmp, "w");
Append(inf, outf, " -1");
fclose(inf);
fclose(outf);
remove("file.txt");
rename(tmp, "file.txt");
If you are willing to use two separate files for input and output, the job is easy. The algorithm can be designed as follows:
1. Open the input file and the output file. [fopen()]
2. Define a string holding the constant value that you want to add after each line. [const char *constvalue = "-1";]
3. Read a line from the input file. [fgets()##]
4. Use fprintf() to write the data read from the input file together with the constant value. Some pseudocode may look like
fprintf(outfile, "%s %s\n", readdata, constvalue);
5. Loop while there is still data in the input file. [while (fgets(infile....) != NULL)]
6. Close both files. [fclose()]
## -> fgets() reads and stores the trailing newline \n in the supplied buffer. Remove it before writing, otherwise the constant value ends up on the next line.

Counting number of words with multiple whitespaces

I am trying to write a program that prints the number of words found in a text file. Words are defined as sequences of characters separated by any number of white space.
However, I am having a problem when there are multiple whitespaces because then it doesn't report the right number of words.
Here is my code so far:
#include <stdio.h>
int main()
{
FILE *fp;
char str;
int i=0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ')
++i;
}
printf("%d\n", i);
fclose(fp);
return(0);
}
myfile.txt is:
Let's do this! You can do it. Believe in yourself.
I'm not sure if I use fgets, fscanf, or fgetc.
Let's say I define whitespace as it is defined in the fscanf function when reading a string
It prints 14 which is not right. I'm not sure how to account for multiple whitespaces. In this case, whitespaces are any number of spaces between words.
Counting a whitespace only if it is not preceded by any other white space will do the trick.
#include <stdio.h>
int main()
{
FILE *fp;
int str; // int, not char: fgetc() returns an int so EOF can be told apart from real characters
int prevchar; //tracks the previous character
int i=0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
prevchar='x'; //initialize prevchar to anything except a space
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ' && prevchar!=' ') // update the count only if previous character encountered was not a space
++i;
prevchar=str;
}
printf("%d\n", i+1);
fclose(fp);
return(0);
}
Edit: The code assumes that words are separated by one or more spaces and does not cover all the corner cases like when sentences spread over multiple lines or when words are separated by comma and not spaces. But these cases can be covered by adding more conditions.
Just use a small state machine with two states: either you are inside a word or you are outside one.
#include <stdio.h>
int main()
{
FILE *fp;
int str; // int, not char: fgetc() returns an int so EOF can be told apart from real characters
int i=0,inside_word =0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
inside_word =0;
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ' || str == '\n' || str == '\t')
inside_word = 0;
else if(inside_word == 0){
i++;
inside_word=1;
}
}
printf("%d\n", i);
fclose(fp);
return(0);
}
The first thing that comes to mind is to add another while loop right after ++i to consume the remaining space characters.
By the way, be careful with your terminology: you are not handling whitespace in general, only space characters. \t and \n are whitespace too!
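A rough sketch of that inner-loop idea, extended to all whitespace via `isspace()` from `<ctype.h>`, might look like this. The function name `count_words` is made up for the example, and counting words directly (rather than counting separators and adding one) is my own variation on the suggestion.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whitespace-separated words on fp.
   count_words is a hypothetical name used for illustration. */
long count_words(FILE *fp)
{
    long words = 0;
    int ch = fgetc(fp);  /* int, so EOF is distinguishable */

    for (;;) {
        /* Exhaust any run of whitespace (spaces, tabs, newlines, ...). */
        while (ch != EOF && isspace(ch))
            ch = fgetc(fp);
        if (ch == EOF)
            break;

        /* A non-whitespace character starts a new word. */
        words++;

        /* Consume the rest of the word. */
        while (ch != EOF && !isspace(ch))
            ch = fgetc(fp);
    }
    return words;
}
```

Because words are counted where they start, leading and trailing whitespace do not affect the result, and no "+1" correction is needed at the end.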
How about using a regular expression such as '!\s+!' to replace each run of whitespace with a single space ' ', and then continuing with your code?
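Note that the C standard library has no regular-expression facility, so in plain C the same normalization is usually written by hand. The sketch below is a hand-rolled stand-in for the `\s+` replacement, not an actual regex; `collapse_whitespace` is a hypothetical helper name, and it works on a string already in memory (for example a line read with fgets()).

```c
#include <ctype.h>

/* Collapse every run of whitespace in s to one space, in place.
   Hypothetical helper; a hand-written stand-in for replacing \s+ with " ". */
void collapse_whitespace(char *s)
{
    char *dst = s;
    int in_space = 0;  /* was the previous character whitespace? */

    for (const char *src = s; *src != '\0'; src++) {
        if (isspace((unsigned char)*src)) {
            if (!in_space)
                *dst++ = ' ';   /* keep one space per run */
            in_space = 1;
        } else {
            *dst++ = *src;
            in_space = 0;
        }
    }
    *dst = '\0';
}
```

After this pass, counting single spaces behaves as the original code expects, although leading or trailing whitespace still leaves one space at that end of the string.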
