Counting lines, numbers, and characters in C

I'm new to C and I got an assignment today that requires that I read text in from a file, count the number of lines, characters, and words, and return it in a specific format.
Just to be clear - I need to read in this text file:
"I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing... only I will remain"
Litany Against Fear, Dune by Frank Herbert
and have it output like so:
1)"I must not fear.[4,17]
2)Fear is the mind-killer.[4,24]
3)Fear is the little-death that brings total obliteration.[8,56]
4)I will face my fear.[5,20]
5)I will permit it to pass over me and through me.[11,48]
6)And when it has gone past I will turn the inner eye to see its path.[16,68]
7)Where the fear has gone there will be nothing... only I will remain"[13,68]
8) Litany Against Fear, Dune by Frank Herbert[7,48]
Now, I've written something that accepts the file and counts the number of lines properly, but I have two major issues. First, how do I get the text from the file to appear in the output? I can't get that at all. Second, my word count doesn't work at all, and my character count is off too. Can you please help?
#include <stdio.h>
#define IN 1
#define OUT 0
void main()
{
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = 0;
int test = 0;
FILE *doesthiswork;
doesthiswork = fopen("testWords.in", "r");
state = OUT;
while ((test = fgetc(doesthiswork)) != EOF)
{
++numChars;
if ( test == '\n')
{
++numLines;
if (test == ' ' || test == '\t' || test == '\n')
{
state = OUT;
}
else if (state == OUT)
{
state = IN;
++numWords;
}
}
printf("%d) I NEED TEXT HERE. [%d %d]\n",numLines, numWords, numChars);
}
}

It would be better to use the getline() function (available on POSIX systems) to read each line from the file.
After reading a line, process it with strtok(). This hands you the words in the line one at a time, so you can count them and save the total in a variable.
Then get the number of characters in the line.
Output the line number, the number of words, and the number of characters.
Then read the next line, and so on.
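The steps above could be sketched roughly as follows. This version swaps in fgets() for getline() so that it stays within standard C, and the names count_words and report_lines, as well as the 1024-byte line limit, are assumptions made for illustration rather than anything from the question.

```c
#include <stdio.h>
#include <string.h>

/* Count whitespace-separated words in a line. strtok() modifies its
   argument, so pass a scratch copy if the text is still needed. */
int count_words(char *line)
{
    int words = 0;
    for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n"))
        ++words;
    return words;
}

/* Print each line of fp as "N)text[words,chars]" and return the number
   of lines seen. Assumes no line is longer than the buffer. */
int report_lines(FILE *fp)
{
    char line[1024], scratch[1024];
    int n = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
        strcpy(scratch, line);              /* strtok will chop this copy */
        int words = count_words(scratch);
        printf("%d)%s[%d,%d]\n", ++n, line, words, (int)strlen(line));
    }
    return n;
}
```

Calling report_lines() on a stream opened with fopen() prints one formatted row per input line; the character count here is simply the length of the line without its newline.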

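How do I get the text from the file to appear in the output?
Store each character in a buffer as you read it, then print the buffer when the line ends.
My word count doesn't work at all, and my character count is off too.
The tests are in the wrong order: the whitespace check is nested inside the newline check, so numWords is never incremented.
Fix it like this: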
#include <stdio.h>
#define IN 1
#define OUT 0
int main(){
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = OUT;
int test;
char buffer[1024];
int buff_pos = 0;
FILE *doesthiswork;
doesthiswork = fopen("data.txt", "r");
if(doesthiswork == NULL){//could not open the file
perror("data.txt");
return 1;
}
while((test = fgetc(doesthiswork)) != EOF) {
++numChars;
buffer[buff_pos++] = test;
if(test == ' ' || test == '\t' || test == '\n'){
state = OUT;
if(test == '\n') {
++numLines;
--numChars;//no count newline
buffer[--buff_pos] = '\0';//rewrite newline
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
buff_pos = 0;
numWords = numChars = 0;
}
} else {
if(state == OUT){
state = IN;
++numWords;
}
}
}
fclose(doesthiswork);
if(buff_pos != 0){//Input remains in the buffer.
++numLines;
buffer[buff_pos] = '\0';
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
}
return 0;
}

Related

C Programming , getc() , stdin , file redirection

My assignment is to redirect a text file and do all sorts of operations on it. Everything is working, except that I have a little problem.
The main function that reads input is getline1():
char* getline1(){
char *LinePtr = (char*)malloc(sizeof(char*)*LINE);
int i = 0;
for ( ; (*(LinePtr+i) = getc(stdin)) != '\n' ; i++){}
*(LinePtr+i) = '\0';
return LinePtr;
}
It returns a pointer to a char array holding a single line.
We know that lines are separated by the '\n' character.
A previous problem I had was when I wrote the getline1() function like this:
for (int i = 0 ; Line[i] != '\n' ; i++){
Line[i] = getc(stdin);
}
Logically this looked valid to me, but getc() is a streaming function, and I saw online answers saying this will not work; I didn't quite understand why.
Anyway, the big issue is that I need to know how many lines there are in the text so I can stop reading values, or to know from the getline1() function that there is no next line left and I'm done.
Things to take into account:
1. Only <stdio.h> and <stdlib.h> may be used.
2. I'm using Linux Ubuntu and the gcc compiler.
3. The redirection goes like ./Run<input.txt
Also, I understand that stdin is a file pointer, but I didn't find a way that this can help me.
Thank you ,
Denis
You should check for EOF in addition to the newline character. You should also check that your index + 1 is always smaller than LINE, which avoids overflow and leaves space for the null terminator.
#define LINE 100
char *my_getline(void)
{
size_t i = 0;
char *str = NULL;
int c = 0;
if ((str = malloc(LINE)) == NULL)
{
fprintf(stderr,"Malloc failed");
exit(EXIT_FAILURE);
}
while (i+1 < LINE && (c = getchar()) != EOF && c != '\n') /* Saving space for \0 */
{
str[i++] = c;
}
str[i] = '\0';
return str;
}
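One limitation of a function shaped like this is that the caller cannot tell an empty line from end of input. A possible variant, sketched here under the assumption that returning NULL at end of file is acceptable, takes the stream as a parameter; the name read_line_or_null is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define LINE 100

/* Read one line (without the '\n') from fp into a fresh buffer.
   Returns NULL when EOF is hit before any character is read, so the
   caller can loop until NULL. The caller frees the result. */
char *read_line_or_null(FILE *fp)
{
    char *str = malloc(LINE);
    size_t i = 0;
    int c = EOF;

    if (str == NULL)
        return NULL;
    while (i + 1 < LINE && (c = getc(fp)) != EOF && c != '\n')
        str[i++] = (char)c;
    if (c == EOF && i == 0) {   /* nothing left to read */
        free(str);
        return NULL;
    }
    str[i] = '\0';
    return str;
}
```

With input redirected as in the question, a loop such as `while ((line = read_line_or_null(stdin)) != NULL) { ...; free(line); }` stops by itself after the last line, so the number of lines does not need to be known in advance.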
Thanks, everybody. I just made another function to count the lines; this was the only lazy option available :)
static void linecounter(){
FILE *fileptr;
int count = 0;
int chr; /* int, not char, so EOF can be told apart from real characters */
fileptr = fopen("input.txt", "r");
if (fileptr == NULL){return;}
chr = getc(fileptr);
while (chr != EOF){
if (chr == '\n'){count = count + 1;}
chr = getc(fileptr);}
fclose(fileptr);
count_lines = count;}

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits separated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (fscanf(fp1, "%d,", &num) == 1 && i < 300)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
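To make the behaviour of the "%d," format concrete, here is a small sketch that wraps the same loop in a function. The name read_int_list and the cap parameter are assumptions for illustration; the bound is tested before reading so that no value is consumed once the array is full.

```c
#include <stdio.h>

/* Read comma- or newline-separated integers from fp into list, storing
   at most cap values. Returns how many were stored. */
size_t read_int_list(FILE *fp, int *list, size_t cap)
{
    size_t n = 0;
    int num;
    /* "%d" skips leading whitespace (including '\n'); the "," in the
       format consumes a following comma if one is present. */
    while (n < cap && fscanf(fp, "%d,", &num) == 1)
        list[n++] = num;
    return n;
}
```

For the processes.txt data in the question, a call like `read_int_list(fp1, list, 300)` would fill list with the twelve values in file order.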
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be useful if you have several different types of fields that require different actions, as in a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, regardless of length
else continue
The added condition i<300 guards against writing past the end of list. If you really want to check that nothing other than commas and newlines appears between the numbers (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
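As a variation on the same idea, the digit can be pushed back with ungetc() instead of seeking, which also works on streams that cannot seek, such as redirected pipes. This is only a sketch: the function name scan_numbers and the cap parameter are hypothetical, and, like the loop above, it treats a minus sign as a separator.

```c
#include <ctype.h>
#include <stdio.h>

/* Collect every run of digits in fp as an int; any non-digit byte acts
   as a separator. Returns the count stored (at most cap). */
size_t scan_numbers(FILE *fp, int *list, size_t cap)
{
    size_t n = 0;
    int c;
    while (n < cap && (c = fgetc(fp)) != EOF) {
        if (!isdigit(c))
            continue;                 /* separator: skip it */
        if (ungetc(c, fp) == EOF)     /* push the digit back */
            break;
        if (fscanf(fp, "%d", &list[n]) != 1)
            break;                    /* conversion failed */
        ++n;
    }
    return n;
}
```

Both ungetc() and fscanf() have their return values checked here, in line with the advice above.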
My solution is to read the whole line first and then parse it with strtok_r, using a comma as the delimiter. strtok_r is a POSIX function; if you want code that is portable to any standard C environment, you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
while (tok != NULL) {
// Convert the token to a long if possible
errno = 0; // reset first: strtol only sets errno on failure (needs <errno.h>)
long num = strtol(tok, NULL, 0);
if (errno != 0) {
// Handle no value conversion
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
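Checking a strtol() conversion usually involves more than looking at errno, because a token with no digits at all does not necessarily set it. A minimal sketch of a stricter helper follows; the name parse_int is hypothetical, and it assumes base 10 rather than base 0, so that a leading zero is not read as octal.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert tok to an int. Returns 1 on success and stores the value in
   *out; returns 0 if tok has no digits, has trailing characters, or is
   out of range for int. */
int parse_int(const char *tok, int *out)
{
    char *end;
    errno = 0;                       /* strtol only sets errno on error */
    long v = strtol(tok, &end, 10);
    if (end == tok)                  /* no digits consumed */
        return 0;
    if (*end != '\0')                /* trailing characters, e.g. "12x" */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```

Inside the token loop, a call such as `if (parse_int(tok, &value)) list[numc++] = value;` would skip malformed fields instead of storing a meaningless number. Note that a stray '\r' left at the end of a line from a Windows file would count as a trailing character here.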

Working on a C program to count characters, words, sentences, lines, or all the above.

We are asked to create a program that provides a table in which the user can choose whether to count characters, words, sentences, lines, or all the above. This requires a separate function for each utility. I have the line counter working perfectly, but for some reason, the character counter function keeps returning 0. The program is incomplete, but I am getting very frustrated with the character counter.
#include <stdlib.h>
#define WHT_SPC\
(cur == ' ' || cur == '\n' || cur == '\t')
int countLines(sp1);
int wordCounter(sp1);
int characterCounter(sp1);
int sentenceCounter (sp1);
int main()
{
int lineCount = 0;
int wordCount = 0;
int characterCount= 0;
int sentenceCount = 0;
char filename[100];
FILE* sp1;
printf("Enter Filename to be read: ");
gets(filename);
sp1 = fopen(filename,"r");
lineCount = countLines(sp1);
characterCount = characterCounter(sp1);
printf("Number of Lines: %d\n",lineCount);
printf("Number of Characters: %d\n",characterCount);
fclose(sp1);
return 0;
}
int countLines(sp1)
{
int curCh;
int preCh;
int countLn = 0;
while ((curCh = fgetc(sp1)) != EOF)
{
if (curCh == '\n')
countLn++;
preCh = curCh;
}
if (preCh != '\n')
countLn++;
return countLn;
}
int characterCounter(sp1)
{
int chr;
int countCh = 0;
while ((chr = fgetc(sp1)) != EOF)
{
if (chr != 'n' && chr != ' ')
countCh++;
}
return countCh;
}
I understand the lack of comments is not ideal, but my problem is very specific. Not looking for answers just some advice to kind of point me in the right direction.
sp1 points to a structure that holds, among other things, your position in the file you are reading. After the loop in countLines() finishes, that position is at the end of the file. You will need to call rewind(sp1); before performing other read operations on the same stream, so that they start from the beginning again.
Don't hesitate to take a look at man rewind!
fgetc() advances the file position each time it is called.
Thus characterCounter(), called after countLines(), has to return 0, because the first thing it encounters is end of file.
Try using fseek(sp1, 0L, SEEK_SET) before characterCounter().
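The effect of resetting the position between passes can be seen in a small sketch. The helper names count_newlines, count_chars, and count_both are hypothetical and are not taken from the question's program.

```c
#include <stdio.h>

/* Count '\n' characters from the current position to end of file. */
long count_newlines(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            ++n;
    return n;
}

/* Count every character except newlines and spaces. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c != '\n' && c != ' ')
            ++n;
    return n;
}

/* Run both passes over the same stream, rewinding in between. */
void count_both(FILE *fp, long *lines, long *chars)
{
    *lines = count_newlines(fp);
    rewind(fp);               /* back to the start; also clears EOF */
    *chars = count_chars(fp);
}
```

Without the rewind() call, the second pass starts at end of file and reports 0, which matches the symptom described in the question.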

Why is my character count incorrect?

The following code gets the number of words:
int count = 0;
for (int i = 0; chars[i] != EOF; i++)
{
if (chars[i] == ' ')
{
count++;
}
}
My problem is, that it doesn't count the words correctly.
For example, if my file.txt has the following text in it:
spaced-out there's I'd like
It says I have 6 words, when according to MS Word I'd have 4.
spaced-out and in
Gives me a word count of 4.
spaced out and in
Gives me a word count of 6
I'm sorry if this question has been answered before, Google doesn't take into account the special characters in the search, so it is hard to find the answer to coding. I'd preferably have the words just by identifying if it's a space or not.
I tried looking for answers but no one seemed to have exactly the same problem. I know that .txt files might end lines in \r\n on Windows, but then that should be part of one word. For example:
spaced out and in\r\n
I believe it should still give me 4 words. Also when I add || chars[i] == '\n' as:
for (int i = 0; chars[i] != EOF || chars[i] == '\n'; i++)
I get even more words, 8 for the line
spaced out and in
I am doing this on a Linux-based server, but on an SSH client on Windows. The characters come from a .txt file.
Edit: Okay, here is the code, I avoided the #include when posting it.
#define BUF_SIZE 500
#define OUTPUT_MODE 0700
int main(int argc, char *argv[])
{
int input, output;
int readSize = 1, writeSize;
char chars[BUF_SIZE];
int count = 0;
input = open(argv[1], O_RDONLY);
output = creat(argv[2], OUTPUT_MODE);
while (readSize > 0)
{
readSize = read(input, chars, BUF_SIZE);
if (readSize < 0)
exit(4);
for (int i = 0; chars[i] != '\0'; i++)
{
if (chars[i] == ' ')
{
count++;
}
}
writeSize = write(output, chars, readSize);
if (writeSize <= 0)
{
close(input);
close(output);
printf("%d words\n", count);
exit(5);
}
}
}
I am writing this answer because I think I know what your confusion is. Note that you did not explain how you read the file, so I'll give an example and explain why we test != EOF, which is not a character that you read from a file.
It appears that you think EOF is a character that is stored in the file. It's not. If you just want to count words, you can do something like
int chr;
while ((chr = fgetc(file)) != EOF)
count += (chr == ' ') ? 1 : 0;
Note that chr MUST be of type int because EOF is of type int, and it's certainly not present in the file! It's returned by functions like fgetc() to indicate that there is nothing more to read. Note also that an attempt to read must be made before it is returned.
Oops, also note that my sample code will not count the last word. But that's for you to figure out.
Also, this would count each of several consecutive spaces as a "word", something that you should also work out.
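One way to handle both leftovers is to count word starts rather than spaces. The sketch below works on a buffer plus a length, which suits data filled by read(), since read() does not add a '\0' terminator and its return value says how many bytes are valid. The function name count_words_in_chunk and the in_word parameter are illustrative assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Count words in buf[0..len). A word starts wherever a non-whitespace
   byte follows whitespace (or the start of input). *in_word carries the
   state across calls so a word split between two chunks counts once;
   initialise it to 0 before the first chunk. */
size_t count_words_in_chunk(const char *buf, size_t len, int *in_word)
{
    size_t words = 0;
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)buf[i])) {
            *in_word = 0;             /* between words */
        } else if (!*in_word) {
            *in_word = 1;             /* first byte of a new word */
            ++words;
        }
    }
    return words;
}
```

Because '\r' and '\n' count as whitespace for isspace(), a Windows line ending after the last word does not add an extra word, and runs of several spaces count as a single gap.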

How to calculate number of lines in file?

I am working in C for the first time and have a question.
How can I get the number of lines in file?
FILE *in;
char c;
int lines = 1;
...
while (fscanf(in,"%c",&c) == 1) {
if (c == '\n') {
lines++;
}
}
Am I right? I actually don't know how to detect the moment when the text crosses to a new line.
OP's code functions well aside from maybe an off-by-one issue and a last line issue.
Standard C library definition
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
A line ends with a '\n' and the last line may or may not end with a '\n'.
If using the idea that the last line of a file does not require a final '\n', then the goal is to count the number of times a character is read immediately after a '\n' (or at the very start of the file).
// Let us use a wide type
// Start at 0 as the file may be empty
unsigned long long line_count = 0;
int previous = '\n';
int ch;
while ((ch = fgetc(in)) != EOF) {
if (previous == '\n') line_count++;
previous = ch;
}
printf("Line count:%llu\n", line_count);
Reading a file one character at a time may be less efficient than other means, but functionally meets OP goal.
This answer uses (ch = fgetc(in)) != EOF instead of fscanf(in,"%c",&c) == 1, which is typically "faster", but with an optimizing compiler, either may emit code of similar performance. Such details of speed can be supported with analysis or profiling. When in doubt, code for clarity.
You can use this utility function:
/*
* count the number of lines in the file called filename
*
*/
int countLines(char *filename)
{
FILE *in = fopen(filename,"r");
int ch=0;
int lines=0;
if(in == NULL){
return 0; // return lines;
}
while((ch = fgetc(in)) != EOF){
if(ch == '\n'){
lines++;
}
}
fclose(in);
return lines;
}
When counting the 'number of \n characters', you have to remember that you are counting the separators, and not the items. See 'Fencepost Error'
Your example should work, but:
if the file does not end with a \n, then you might be off by one (depending on your definition of 'a line').
depending on your definition of 'a line', you may be missing \r characters in the file (used as line endings by classic Mac OS)
it will not be very efficient or quick (calling fscanf() for every character is expensive)
The example below will ingest a buffer each time, looking for \r and \n characters. There is some logic to latch these characters, so that the following line endings should be handled correctly:
\n
\r
\r\n
#include <stdio.h>
#include <errno.h>
int main(void) {
FILE *in;
char buf[4096];
int buf_len, buf_pos;
int line_count, line_pos;
int ignore_cr, ignore_lf;
in = fopen("my_file.txt", "rb");
if (in == NULL) {
perror("fopen()");
return 1;
}
line_count = 0;
line_pos = 0;
ignore_cr = 0;
ignore_lf = 0;
/* ingest a buffer at a time */
while ((buf_len = fread(&buf, 1, sizeof(buf), in)) != 0) {
/* walk through the buffer, looking for newlines */
for (buf_pos = 0; buf_pos < buf_len; buf_pos++) {
/* look for '\n' ... */
if (buf[buf_pos] == '\n') {
/* ... unless we've already seen '\r' */
if (!ignore_lf) {
line_count += 1;
line_pos = 0;
ignore_cr = 1;
}
/* look for '\r' ... */
} else if (buf[buf_pos] == '\r') {
/* ... unless we've already seen '\n' */
if (!ignore_cr) {
line_count += 1;
line_pos = 0;
ignore_lf = 1;
}
/* on any other character, count the characters per line */
} else {
line_pos += 1;
ignore_lf = 0;
ignore_cr = 0;
}
}
}
if (line_pos > 0) {
line_count += 1;
}
fclose(in);
printf("lines: %d\n", line_count);
return 0;
}

Resources