Counting the number of lines in a .txt file in c

I have a processes.txt file that contains details about incoming processes like so,
0 4 96 30
3 2 32 40
5 1 100 20
20 3 4 30
I wanted to find the number of lines in this file. How can that be done?
I tried this code, but it always returns the number of lines as 0
char c;
int count = 0;
// fp is the file pointer
for (c = getc(fp); c != EOF; c = getc(fp))
if (c == '\n') // Increment count if this character is newline
count = count + 1;

Apart from c being declared as a char when it should be an int, your code is more or less fine. The problem is somewhere in the code you didn't show.
This works:
#include <stdio.h>
int main() {
FILE* fp = fopen("processes.txt", "r");
if (fp == NULL)
{
printf("Could not open file.");
return 1;
}
int c; // this must be an int
int count = 0;
for (c = getc(fp); c != EOF; c = getc(fp))
if (c == '\n') // Increment count if this character is newline
count = count + 1;
printf("The file has %d line(s)\n", count);
fclose(fp);
}
However if the last line of the file does not end with a \n, it is not counted.
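If an unterminated last line should count as well, one option is to remember the last character read. The following is only a sketch; count_lines is a hypothetical helper name that does not appear in the question, and it assumes the stream is already open and positioned at the start.

```c
#include <stdio.h>

/* Count the lines in an open stream. A final line that does not end
 * with '\n' is still counted. count_lines is a hypothetical name. */
long count_lines(FILE *fp)
{
    int c;              /* int, so EOF stays distinct from every byte */
    int last = '\n';    /* behave as if we start right after a newline */
    long count = 0;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            count++;
        last = c;
    }
    if (last != '\n')   /* trailing text without a newline: one more line */
        count++;
    return count;
}
```

An empty file still gives 0, because last keeps its initial '\n' value.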

Please, read How to create a minimal, complete and verifiable example.
To test your snippet, I first had to complete it so that it compiles. Your error has probably gone away with those modifications, as my run of it (on your input text) shows:
pru.c
#include <stdio.h>
int main()
{
char c;
int count = 0;
FILE *fp = stdin; // most probably your error is
// related to this initialization.
// fp is the file pointer
for (c = getc(fp); c != EOF; c = getc(fp))
if (c == '\n') // Increment count if this character is newline
count = count + 1;
printf("%d\n", count);
return 0;
}
and running it:
$ pru <<EOF
0 4 96 30
3 2 32 40
5 1 100 20
20 3 4 30
EOF
4
$ _
Which is the correct answer.
Even so, your fragment contains an error that is not visible here, as you were told in the comments on your question: c should have type int, not char. But why?
Every char value can legitimately appear in a file, so getc must be able to return all of them. To signal a special condition (the end of the data in the stream, or an error) it needs one extra value, EOF, that is not any of those byte values. A char cannot hold all of these return values, which is why fgetc(3) returns an int.
Check the documentation of fgetc(3). Your program almost works, and here is the reason:
When the program reads a character, it is returned as an int in the range 0 to 255 (with 8-bit bytes), so every byte becomes a non-negative value, while EOF is a negative int (-1 in almost every implementation). Storing the result in a char squeezes 257 possible results into 256 values, so EOF has to share a value with some real byte:
if your char type is signed (two's complement), the values 0 to 255 are typically mapped to 0 to 127 and -128 to -1, and EOF stays -1. The byte 255 also becomes -1, so a file containing that byte makes your loop stop prematurely, as if it had reached the end of the file.
if your char type is unsigned, the values 0 to 255 stay 0 to 255 and EOF becomes 255. When c is compared with EOF it is promoted back to int as 255, which never equals -1, so the loop never detects the end of the file.
In both cases the char version cannot tell EOF apart from the byte 255, and your count can be wrong (or the loop may never end). This is not the case here, but you can get a surprise with a file that contains such a byte.
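The difference between the two declarations can be sketched with two tiny functions that mimic the comparison each version of the loop performs. Both function names are hypothetical and exist only for illustration; the result of the char version depends on whether char is signed on your platform.

```c
#include <stdio.h>

/* Mimics `char c = getc(fp); ... c == EOF` for a given return value. */
int looks_like_eof_via_char(int value)
{
    char c = (char)value;   /* narrowing; implementation-defined if out of range */
    return c == EOF;        /* c is promoted back to int for the comparison */
}

/* Mimics `int c = getc(fp); ... c == EOF`. */
int looks_like_eof_via_int(int value)
{
    return value == EOF;
}
```

With a signed 8-bit char, looks_like_eof_via_char(0xFF) is typically true, so a real byte is mistaken for the end of the file. With an unsigned char, looks_like_eof_via_char(EOF) is false, so the end of the file is never seen. The int version reports EOF for EOF and nothing else.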
So your final program (corrected) would be:
#include <stdio.h>
int main()
{
int c;
int count = 0;
FILE *fp = stdin;
// fp is the file pointer
for (c = getc(fp); c != EOF; c = getc(fp))
if (c == '\n') // Increment count if this character is newline
count = count + 1;
printf("%d\n", count);
return 0;
}
Before finishing, I recommend using a while loop, a frequently used C idiom that gives a more compact form of your loop:
#include <stdio.h>
int main()
{
int c;
long count = 0;
FILE *fp = stdin; /* you probably have not initialized this
* variable in your code, but who knows,
* since you have not posted a complete
* sample */
// fp is the file pointer
while ((c = getc(fp)) != EOF)
if (c == '\n') // Increment count if this character is newline
count++; // this is another frequently used idiom :)
printf("%ld\n", count); // count is a long here, so use %ld
return 0;
}

Maybe your file is already at the end, or in an error state, when you do this? In that case you need to go back to the beginning first:
int c; // c must be int
int count = 0;
// fp is the file pointer
rewind(fp); // back to beginning, clear error
for (c = getc(fp); c != EOF; c = getc(fp))
if (c == '\n') // Increment count if this character is newline
count = count + 1;

Related

Function of (char)getchar() in C programming

My friend asked me what is (char)getchar() which he found in some online code and I googled and found 0 results of it being used, I thought the regular usage is just ch = getchar(). This is the code he found, Can anyone explain what is this function?
else if (input == 2)
{
if (notes_counter != LIST_SIZE)
{
printf("Enter header: ");
getchar();
char c = (char)getchar();
int tmp_count = 0;
while (c != '\n' && tmp_count < HEADER)
{
note[notes_counter].header[tmp_count++] = c;
c = (char)getchar();
}
note[notes_counter].header[tmp_count] = '\0';
printf("Enter content: ");
c = (char)getchar();
tmp_count = 0;
while (c != '\n' && tmp_count < CONTENT)
{
note[notes_counter].content[tmp_count++] = c;
c = (char)getchar();
}
note[notes_counter].content[tmp_count] = '\0';
printf("\n");
notes_counter++;
}
}

(char)getchar() is a mistake. Never use it.
getchar returns an int that is either an unsigned char value of a character that was read or is the value of EOF, which is negative. If you convert it to char, you lose the distinction between EOF and some character that maps to the same char value.
The result of getchar should always be assigned to an int object, not a char object, so that these values are preserved, and the result should be tested to see if it is EOF before the program assumes a character has been read. Since the program uses c to store the result of getchar, c should be declared as int c, not char c.
It is possible a compiler issued a warning for c = getchar(); because that assignment implicitly converts an int to a char, which can lose information as mentioned above. (This warning is not always issued by a compiler; it may depend on warning switches used.) The correct solution for that warning is to change c to an int, not to insert a cast to char.
About the conversion: The C standard allows char to be either signed or unsigned. If it is unsigned, then (char) getchar() will convert an EOF returned by getchar() to some non-negative value, which will be the same value as one of the character values. If it is signed, then (char) getchar() will convert some of the unsigned char character values to char in an implementation-defined way, and some of those conversions may produce the same value as EOF.

The code is a typical example of incorrect usage of the getchar() function.
getchar(), and more generally getc(fp) and fgetc(fp) return a byte from the stream as a positive value between 0 and UCHAR_MAX or the special negative value EOF upon error or end of file.
Storing this value into a variable of type char loses information. It makes testing for EOF
unreliable if type char is signed: if EOF has the value (-1) it cannot be distinguished from a valid byte value 255, which most likely gets converted to -1 when stored to a char variable on CPUs with 8-bit bytes
impossible on architectures where type char is unsigned by default, on which all char values are different from EOF.
In this program, the variables receiving the getchar() return value should have type int.
Note also that EOF is not tested in the code fragment, causing invalid input strings such as long sequences of ÿÿÿÿÿÿÿÿ at end of file.
Here is a modified version:
else if (input == 2)
{
if (notes_counter != LIST_SIZE)
{
int c;
// consume the rest of the input line left pending by `scanf()`
// this should be performed earlier in the function
while ((c = getchar()) != EOF && c != '\n')
continue;
printf("Enter header: ");
int tmp_count = 0;
while ((c = getchar()) != EOF && c != '\n') {
if (tmp_count + 1 < HEADER)
note[notes_counter].header[tmp_count++] = c;
}
note[notes_counter].header[tmp_count] = '\0';
printf("Enter content: ");
tmp_count = 0;
while ((c = getchar()) != EOF && c != '\n') {
if (tmp_count + 1 < CONTENT)
note[notes_counter].content[tmp_count++] = c;
}
note[notes_counter].content[tmp_count] = '\0';
printf("\n");
notes_counter++;
}
}
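The header and content loops above do the same job, so they could be folded into one helper. This is a minimal sketch, not part of the original program: read_field is a hypothetical name, and it assumes that characters beyond the buffer capacity should simply be discarded.

```c
#include <stdio.h>

/* Read one line from `in` into dst (capacity cap, including the '\0').
 * Characters that do not fit are discarded. Returns the number of
 * characters stored, or -1 if EOF was hit before anything was read.
 * read_field is a hypothetical helper name. */
int read_field(FILE *in, char *dst, size_t cap)
{
    int c;               /* int, so EOF can be told apart from any byte */
    size_t n = 0;
    int got_any = 0;

    while ((c = getc(in)) != EOF && c != '\n') {
        got_any = 1;
        if (n + 1 < cap)             /* keep room for the terminator */
            dst[n++] = (char)c;
    }
    if (cap > 0)
        dst[n] = '\0';
    if (c == EOF && !got_any)
        return -1;
    return (int)n;
}
```

A call such as read_field(stdin, note[notes_counter].header, HEADER) would then replace each loop, and the -1 return lets the caller stop when input runs out.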

find and save an integer from a textfile in C

I'm very new to programming and C. I have textfile with some random text and an integer that i want to find and save. The textfile looks something like this (I only want to save 23 in this case, not 56):
# this is a comment
23
this is some random text
56
And this is the code I have:
int *num = malloc(sizeof(int));
*num = fgetc(f);
while(!feof(f)){
*num = fgetc(f);
if(isdigit(*num)){
break;
}
}
if(!isdigit(*num)){
printf("Error: no number found.\n");
free(num);
}
else{
printf("%c\n", *num);
}
I'm kinda stuck right now and my program only prints out the number 2 :/
Very thankful for help.

As @pbn said, you're better off using sscanf.
But if you really, really want, you can do it your way, by reading one character at a time. You'll then need to "build" the integer yourself: convert each digit character to its integer value, keep track of the number you have so far, and multiply that number by 10 for every new digit.
Something like this (not complete code, it's just to get you started):
int c;
int num = 0;
while ((c = fgetc(f)) != EOF) {
if(!isdigit(c)) {
break;
}
num = (num * 10) + (c - '0');
}
The c - '0' part converts the character representation of a digit to its integer value. In ASCII, '0' is character 48, '1' is 49 and so on.
This is assuming that on the line with numbers, you ONLY have numbers, not a mix of numerical and non-numerical characters.
Also, do not use !feof(file).
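Putting these pieces together, a character-by-character version might look like the sketch below. find_first_number is a hypothetical name, overflow is not checked, and it assumes that the text before the wanted number (such as the comment line) contains no digits.

```c
#include <ctype.h>
#include <stdio.h>

/* Skip everything up to the first digit, then accumulate that run of
 * digits. Returns 1 and stores the value in *out on success, or 0 if
 * the stream contains no digit. find_first_number is a hypothetical
 * name; overflow is not checked. */
int find_first_number(FILE *f, int *out)
{
    int c;
    int num = 0;

    /* skip non-digits; test the fgetc result, not feof */
    while ((c = fgetc(f)) != EOF && !isdigit(c))
        ;
    if (c == EOF)
        return 0;

    do {
        num = num * 10 + (c - '0');   /* shift left one decimal place */
    } while ((c = fgetc(f)) != EOF && isdigit(c));

    *out = num;
    return 1;
}
```

On the sample file from the question this stores 23 and stops, so the later 56 is never read.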

One option could be using getline and sscanf functions. I assumed that text lines do not contain numbers:
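#include <stdio.h>
#include <stdlib.h>
int main() {
int value, matched = 0;
char *line = NULL;
size_t size = 0;
while(getline(&line, &size, stdin) != -1) {
// sscanf returns EOF (not 0) on an empty line, so test for exactly 1
if ((matched = (sscanf(line, "%d", &value) == 1)))
break;
}
if (matched)
printf("value: %d\n", value);
free(line); // getline allocated the buffer
return 0;
}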
This part:
while(getline(&line, &size, stdin) != -1) {
will try to read the entire stream line by line.
Next line uses sscanf return value, which is the number of input items successfully matched and assigned, to determine whether the integer value has been found. If so it stops reading the stream.

One simple fix for your program: once you find a digit, don't stop right away. Keep reading until you reach the next ' ', '\n' or the end of the file, and until then accumulate with Number = Number*10 + (*num - '0');, where Number is an int initialized to 0 (defined as a global or similar).

Trying to count the number of words in a file [closed]

I have a file named myf which has a lot of text in it and I am trying to use blank spaces as a way of counting the number of words. Basically, in the count method of my program, there is a variable int d which acts like a boolean function. Also, there is an incrementer called count.
I have a for loop which will traverse the array that's put into the argument of the method count, and will see if the pointer *p is a non letter. If it is a non letter AND d=0, d=1 and count is incremented. This way, if the next character is also a non space, since d=1, the else if statement will not be incremented again. The only way for d to reset to 0 is if a space is present, at which point, if another letter is found, it will be incremented again. Then the method count will return the variable count. Seems simple enough, but I keep getting wrong numbers.
#include <stdio.h>
#include<stdlib.h>
#include <string.h>
#include <ctype.h>
int count(char x[]) {
int d = 0;
int count = 0;
for (char *p = x; *p != EOF; *p++) {
// this will traverse file
printf("%c", *p);
// this is just to see the output of the file
if (*p == ' ' && d == 1) {
d = 0;
}
else if (*p != ' ' && d == 0) {
count++;
d = 1;
}
}
return count;
}
int main() {
char c;
int r = 0;
char l[1000];
FILE *fp = fopen("myf", "r");
while ((c = fgetc(fp)) != EOF) {
l[r] = c;
r++;
}
printf("\n %d", count(l));
}

To count the number of words, count the occurrences of a letter after a non-letter.
*p != EOF is the wrong test. EOF indicates that an input operation either 1) had no more input or 2) hit an input error. It does not signify the end of a string.
Use int to save the result from fgetc() as that returns an int in the range of unsigned char and EOF. Typically 257 different values. char is insufficient.
Small stuff: No need for an array. Let code consider ' as a letter. As the number of words could be very large, let code use a wide type like unsigned long long.
#include <ctype.h>
int isletter(int ch) {
return isalpha(ch) || ch == '\'';
}
#include <stdio.h>
int main(void) {
unsigned long long count = 0;
FILE *fp = fopen("myf", "r");
if (fp) {
int c;
int previous = ' ';
while ((c = fgetc(fp)) != EOF) {
if (!isletter(previous) && isletter(c)) count++;
previous = c;
}
fclose(fp);
}
printf("%llu\n", count);
}

Don't do this
*p != EOF
EOF is actually a negative integer and you're comparing it with a char. You should pass in how many characters you want to iterate over, i.e.
int count(char x[], int max){
then use the for loop like
int m = 0;
for ( char *p = x; m < max; p++, m++)
Note I also changed *p++ to p++. You also need to update your program to treat other whitespace as a separator too, not just spaces, i.e. this line
else if (*p != ' ' && d==0 )
What happens when it encounters a \n? Since '\n' is not a space, it is treated like a letter, so it can count an extra word or glue two words together.
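Both suggestions (an explicit length and proper whitespace handling) can be combined as in the following sketch. count_words is a hypothetical name, and it assumes that any run of non-whitespace characters is a word.

```c
#include <ctype.h>
#include <stddef.h>

/* Count whitespace-separated words in the first len bytes of x.
 * The length is passed in, so no sentinel such as EOF is needed.
 * count_words is a hypothetical name. */
size_t count_words(const char *x, size_t len)
{
    size_t count = 0;
    int in_word = 0;    /* plays the role of d in the question */

    for (size_t i = 0; i < len; i++) {
        /* the cast avoids undefined behaviour for negative char values */
        if (isspace((unsigned char)x[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;    /* first character of a new word */
        }
    }
    return count;
}
```

In main, the call would then be count(l, r), since r already holds the number of characters read from the file.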

Program runs too slowly with large input - C

The goal for this program is for it to count the number of instances that two consecutive letters are identical and print this number for every test case. The input can be up to 1,000,000 characters long (thus the size of the char array to hold the input). The website which has the coding challenge on it, however, states that the program times out at a 2s run-time. My question is, how can this program be optimized to process the data faster? Does the issue stem from the large char array?
Also: I get a compiler warning "assignment makes integer from pointer without a cast" for the line str[1000000] = "" What does this mean and how should it be handled instead?
Input:
number of test cases
strings of capital A's and B's
Output:
Number of duplicate letters next to each other for each test case, each on a new line.
Code:
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
int main() {
int n, c, a, results[10] = {};
char str[1000000];
scanf("%d", &n);
for (c = 0; c < n; c++) {
str[1000000] = "";
scanf("%s", str);
for (a = 0; a < (strlen(str)-1); a++) {
if (str[a] == str[a+1]) { results[c] += 1; }
}
}
for (c = 0; c < n; c++) {
printf("%d\n", results[c]);
}
return 0;
}

You don't need the line
str[1000000] = "";
scanf() adds a null terminator when it parses the input and writes it to str. This line is also writing beyond the end of the array, since the last element of the array is str[999999].
The reason you're getting the warning is that the type of str[1000000] is char, but a string literal is an array of char that decays to char*, so the assignment tries to store a pointer in a char.
To speed up the program, take the call to strlen() out of the loop.
size_t len = strlen(str)-1;
for (a = 0; a < len; a++) {
...
}
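The reason this matters: with strlen() in the loop condition the whole string is rescanned on every iteration, which makes the loop quadratic in the input length. A sketch that avoids strlen() altogether by stopping at the terminator is shown below; count_adjacent_pairs is a hypothetical name.

```c
#include <stddef.h>

/* Count positions i where s[i] == s[i + 1], in a single pass.
 * The loop stops at the terminating '\0', so strlen() is never called.
 * count_adjacent_pairs is a hypothetical name. */
size_t count_adjacent_pairs(const char *s)
{
    size_t pairs = 0;

    if (s[0] == '\0')   /* empty string: nothing to compare */
        return 0;
    for (size_t i = 0; s[i + 1] != '\0'; i++) {
        if (s[i] == s[i + 1])
            pairs++;
    }
    return pairs;
}
```

This also sidesteps the strlen(str)-1 underflow that would occur if the string were empty.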

str[1000000] = "";
This does not do what you think it does, and you're overflowing the buffer, which results in undefined behaviour. Valid indices run from 0 up to, but not including, sizeof(str). So you either add one to the 1000000 when declaring the array or use 999999 as the index instead. To get rid of the compiler warning and produce cleaner code use:
str[1000000] = '\0';
Or
str[999999] = '\0';
Depending on what you did to fix it.
As to optimizing, you should look at the assembly and go from there.

count the number of instances that two consecutive letters are identical and print this number for every test case
For efficiency, code needs a new approach as suggested by @john bollinger & @molbdnilo
void ReportPairs(const char *str, size_t n) {
int previous = EOF;
unsigned long repeat = 0;
for (size_t i=0; i<n; i++) {
int ch = (unsigned char) str[i];
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
}
char *testcase1 = "test1122a33";
ReportPairs(testcase1, strlen(testcase1));
or directly from input and "each test case, each on a new line."
int ReportPairs2(FILE *inf) {
int previous = EOF;
unsigned long repeat = 0;
int ch;
while ((ch = fgetc(inf)) != '\n') {
if (ch == EOF) return ch;
if (isalpha(ch) && ch == previous) {
repeat++;
}
previous = ch;
}
printf("Pair count %lu\n", repeat);
return ch;
}
while (ReportPairs2(stdin) != EOF);
Unclear how OP wants to count "AAAA" as 2 or 3. This code counts it as 3.
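If the other reading is wanted, where "AAAA" counts as 2 because each letter may belong to only one pair, the scan can skip past a pair once it is found. This is a sketch under that assumption; count_disjoint_pairs is a hypothetical name, and unlike ReportPairs it does not restrict itself to letters.

```c
#include <stddef.h>

/* Count non-overlapping pairs of identical adjacent characters.
 * After a match, both characters are consumed, so "AAAA" gives 2.
 * count_disjoint_pairs is a hypothetical name. */
unsigned long count_disjoint_pairs(const char *s)
{
    unsigned long pairs = 0;
    size_t i = 0;

    while (s[i] != '\0' && s[i + 1] != '\0') {
        if (s[i] == s[i + 1]) {
            pairs++;
            i += 2;     /* skip both members of the pair */
        } else {
            i++;
        }
    }
    return pairs;
}
```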

One way to dramatically improve the run-time for your code is to limit the number of times you read from stdin (basically, process input in bigger chunks). You can do this in a number of ways, but probably one of the most efficient would be with fread. Even reading in 8-byte chunks can provide a big improvement over reading a character at a time. One example of such an implementation, considering capital letters [A-Z] only, would be:
#include <stdio.h>
#define RSIZE 8
int main (void) {
char qword[RSIZE] = {0};
char last = 0;
size_t i = 0;
size_t nchr = 0;
size_t dcount = 0;
/* read up to 8-bytes at a time */
while ((nchr = fread (qword, sizeof *qword, RSIZE, stdin)))
{ /* compare each byte to byte before */
for (i = 1; i < nchr && qword[i] && qword[i] != '\n'; i++)
{ /* if not [A-Z] continue, else compare */
if (qword[i-1] < 'A' || qword[i-1] > 'Z') continue;
if (i == 1 && last == qword[i-1]) dcount++;
if (qword[i-1] == qword[i]) dcount++;
}
last = qword[i-1]; /* save last for comparison w/next */
}
printf ("\n sequential duplicated characters [A-Z] : %zu\n\n",
dcount);
return 0;
}
Output/Time with 868789 chars
$ time ./bin/find_dup_digits <dat/d434839c-d-input-d4340a6.txt
sequential duplicated characters [A-Z] : 434893
real 0m0.024s
user 0m0.017s
sys 0m0.005s
Note: the string was actually a string of '0's and '1's run with a modified test of if (qword[i-1] < '0' || qword[i-1] > '9') continue; rather than the test for [A-Z]...continue, but your results with 'A's and 'B's should be virtually identical. 1000000 would still be significantly under .1 seconds. You can play with the RSIZE value to see if there is any benefit to reading a larger (suggested 'power of 2') size of characters. (note: this counts AAAA as 3) Hope this helps.

Putting numbers separated by a space into an array

I want to have a user enter numbers separated by a space and then store each value as an element of an array. Currently I have:
while ((c = getchar()) != '\n')
{
if (c != ' ')
arr[i++] = c - '0';
}
but, of course, this stores one digit per element.
If the user was to type:
10 567 92 3
I was wanting the value 10 to be stored in arr[0], and then 567 in arr[1] etc.
Should I be using scanf instead somehow?

There are several approaches, depending on how robust you want the code to be.
The most straightforward is to use scanf with the %d conversion specifier:
while (scanf("%d", &a[i++]) == 1)
/* empty loop */ ;
The %d conversion specifier tells scanf to skip over any leading whitespace and read up to the next non-digit character. The return value is the number of successful conversions and assignments. Since we're reading a single integer value, the return value should be 1 on success.
As written, this has a number of pitfalls. First, suppose your user enters more numbers than your array is sized to hold; if you're lucky you'll get an access violation immediately. If you're not, you'll wind up clobbering something important that will cause problems later (buffer overflows are a common malware exploit).
So you at least want to add code to make sure you don't go past the end of your array:
while (i < ARRAY_SIZE && scanf("%d", &a[i++]) == 1)
/* empty loop */;
Good so far. But now suppose your user fatfingers a non-numeric character in their input, like 12 3r5 67. As written, the loop will assign 12 to a[0], 3 to a[1], then it will see the r in the input stream, return 0 and exit without saving anything to a[2]. Here's where a subtle bug creeps in -- even though nothing gets assigned to a[2], the expression i++ still gets evaluated, so you'll think you assigned something to a[2] even though it contains a garbage value. So you might want to hold off on incrementing i until you know you had a successful read:
while (i < ARRAY_SIZE && scanf("%d", &a[i]) == 1)
i++;
Ideally, you'd like to reject 3r5 altogether. We can read the character immediately following the number and make sure it's whitespace; if it's not, we reject the input:
#include <ctype.h>
...
int tmp;
char follow;
int count;
...
while (i < ARRAY_SIZE && (count = scanf("%d%c", &tmp, &follow)) > 0)
{
if (count == 2 && isspace(follow) || count == 1)
{
a[i++] = tmp;
}
else
{
printf ("Bad character detected: %c\n", follow);
break;
}
}
If we get two successful conversions, we make sure follow is a whitespace character - if it isn't, we print an error and exit the loop. If we get 1 successful conversion, that means there were no characters following the input number (meaning we hit EOF after the numeric input).
Alternately, we can read each input value as text and use strtol to do the conversion, which also allows you to catch the same kind of problem (my preferred method):
#include <ctype.h>
#include <stdlib.h>
...
char buf[INT_DIGITS + 3]; // account for sign character, newline, and 0 terminator
...
while(i < ARRAY_SIZE && fgets(buf, sizeof buf, stdin) != NULL)
{
char *follow; // note that follow is a pointer to char in this case
int val = (int) strtol(buf, &follow, 10);
if (isspace(*follow) || *follow == 0)
{
a[i++] = val;
}
else
{
printf("%s is not a valid integer string; exiting...\n", buf);
break;
}
}
BUT WAIT THERE'S MORE!
Suppose your user is one of those twisted QA types who likes to throw obnoxious input at your code "just to see what happens" and enters a number like 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 which is obviously too large to fit into any of the standard integer types. Believe it or not, scanf("%d", &val) will not yak on this, and will wind up storing something to val, but again it's an input you'd probably like to reject outright.
If you only allow one value per line, this becomes relatively easy to guard against; fgets will store a newline character in the target buffer if there's room, so if we don't see a newline character in the input buffer then the user typed something that's longer than we're prepared to handle:
#include <string.h>
...
while (i < ARRAY_SIZE && fgets(buf, sizeof buf, stdin) != NULL)
{
char *newline = strchr(buf, '\n');
if (!newline)
{
printf("Input value too long\n");
/**
* Read until we see a newline or EOF to clear out the input stream
*/
while (!newline && fgets(buf, sizeof buf, stdin) != NULL)
newline = strchr(buf, '\n');
break;
}
...
}
If you want to allow multiple values per line such as '10 20 30', then this gets a bit harder. We could go back to reading individual characters from the input, and doing a sanity check on each (warning, untested):
...
while (i < ARRAY_SIZE)
{
size_t j = 0;
int c;
while (j < sizeof buf - 1 && (c = getchar()) != EOF && isdigit(c))
buf[j++] = c;
buf[j] = 0;
if (isdigit(c))
{
printf("Input too long to handle\n");
while ((c = getchar()) != EOF && c != '\n') // clear out input stream
/* empty loop */ ;
break;
}
else if (!isspace(c))
{
if (isgraph(c))
printf("Non-digit character %c seen in numeric input\n", c);
else
printf("Non-digit character %o seen in numeric input\n", c);
while ((c = getchar()) != EOF && c != '\n') // clear out input stream
/* empty loop */ ;
break;
}
else
a[i++] = (int) strtol(buf, NULL, 10); // no need for follow pointer,
// since we've already checked
// for non-digit characters.
}
Welcome to the wonderfully whacked-up world of interactive input in C.
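Another way to handle several values per line is to read the whole line with fgets first and then walk through it with strtol, which reports both where each number ends and whether it overflowed. The sketch below shows only the parsing half. parse_int_line is a hypothetical name, and it assumes the line has already been read into a string.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse whitespace-separated decimal integers from one line of text.
 * Stores at most max values in a. Returns the number stored, or -1 if
 * a token is malformed (e.g. "3r5") or does not fit in an int.
 * parse_int_line is a hypothetical name. */
int parse_int_line(const char *line, int *a, int max)
{
    int n = 0;
    const char *p = line;

    while (n < max) {
        char *end;
        long v;

        while (isspace((unsigned char)*p))   /* skip separators */
            p++;
        if (*p == '\0')
            break;                            /* nothing left on the line */

        errno = 0;
        v = strtol(p, &end, 10);
        if (end == p)
            return -1;                        /* no digits at all */
        if (*end != '\0' && !isspace((unsigned char)*end))
            return -1;                        /* junk glued to the number */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;                        /* too large for an int */

        a[n++] = (int)v;
        p = end;                              /* continue after this number */
    }
    return n;
}
```

Note that values beyond max are silently ignored here; a caller that cares could check whether anything other than whitespace remains.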

Small change to your code: only increment i when you read the space:
while ((c = getchar()) != '\n')
{
if (c != ' ')
arr[i] = arr[i] * 10 + c - '0';
else
i++;
}
Of course, it's better to use scanf:
while (scanf("%d", &a[i++]) == 1);
providing that you have enough space in the array. Also, be careful that the while above ends with ;, everything is done inside the loop condition.
As a matter of fact, every return value should be checked.

scanf returns the number of items successfully scanned.
Give this code a try:
#include <stdio.h>
int main()
{
int arr[500];
int i = 0;
int sc = 0; //scanned items
int n = 3; // no of integers to be scanned from the single line in stdin
while( sc<n )
{
if (scanf("%d",&arr[i]) != 1) // stop on bad input or EOF instead of looping forever
break;
i++;
sc++;
}
}
