How to calculate number of lines in file? - c

I work in C-language at first time and have some question.
How can I get the number of lines in file?
FILE *in;
char c;
int lines = 1;
...
while (fscanf(in,"%c",&c) == 1) {
if (c == '\n') {
lines++;
}
}
Am I right? I actually don't know how to get the moment , when string cross to the new line.

OP's code functions well aside from maybe an off-by-one issue and a last line issue.
Standard C library definition
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
A line ends with a '\n' and the last line may or may not end with a '\n'.
If using the idea that the last line of a file does not require a final '\n, then the goal is to count the number of occurrences that a character is read after a '\n'.
// Let us use a wide type
// Start at 0 as the file may be empty
unsigned long long line_count = 0;
int previous = '\n';
int ch;
while ((ch = fgetc(in)) != EOF) {
if (previous == '\n') line_count++;
previous = ch;
}
printf("Line count:%llu\n", line_count);
Reading a file one character at a time may be less efficient than other means, but functionally meets OP goal.
This answer uses (ch = fgetc(in)) != EOF instead of fscanf(in,"%c",&c) == 1 which is typically ""faster", but with an optimizing compiler, either may emit similar performance code. Such details of speed can be supported with analysis or profiling. When in doubt, code for clarity.

Can use this utility function
/*
* count the number of lines in the file called filename
*
*/
int countLines(char *filename)
{
FILE *in = fopen(filename,"r");
int ch=0;
int lines=0;
if(in == NULL){
return 0; // return lines;
}
while((ch = fgetc(in)) != EOF){
if(ch == '\n'){
lines++;
}
}
fclose(in);
return lines;
}

When counting the 'number of \n characters', you have to remember that you are counting the separators, and not the items. See 'Fencepost Error'
Your example should work, but:
if the file does not end with a \n, then you might be off-by-one (depending on your definition of 'a line').
depending on your definition of 'a line' you may be missing \r characters in the file (typically used by Macs)
it will not be very efficient or quick (calling scanf() is expensive)
The example below will ingest a buffer each time, looking for \r and \n characters. There is some logic to latch these characters, so that the following line endings should be handled correctly:
\n
\r
\r\n
#include <stdio.h>
#include <errno.h>
int main(void) {
FILE *in;
char buf[4096];
int buf_len, buf_pos;
int line_count, line_pos;
int ignore_cr, ignore_lf;
in = fopen("my_file.txt", "rb");
if (in == NULL) {
perror("fopen()");
return 1;
}
line_count = 0;
line_pos = 0;
ignore_cr = 0;
ignore_lf = 0;
/* ingest a buffer at a time */
while ((buf_len = fread(&buf, 1, sizeof(buf), in)) != 0) {
/* walk through the buffer, looking for newlines */
for (buf_pos = 0; buf_pos < buf_len; buf_pos++) {
/* look for '\n' ... */
if (buf[buf_pos] == '\n') {
/* ... unless we've already seen '\r' */
if (!ignore_lf) {
line_count += 1;
line_pos = 0;
ignore_cr = 1;
}
/* look for '\r' ... */
} else if (buf[buf_pos] == '\r') {
/* ... unless we've already seen '\n' */
if (!ignore_cr) {
line_count += 1;
line_pos = 0;
ignore_lf = 1;
}
/* on any other character, count the characters per line */
} else {
line_pos += 1;
ignore_lf = 0;
ignore_cr = 0;
}
}
}
if (line_pos > 0) {
line_count += 1;
}
fclose(in);
printf("lines: %d\n", line_count);
return 0;
}

Related

Is there any cross-platform approach for dealing with new line characters in text files in C language? [duplicate]

This question already has answers here:
Removing trailing newline character from fgets() input
(14 answers)
Closed 7 years ago.
I'm reading stdin and there are sometimes unix-style and sometimes windows-style newlines.
How to consume either type of newline?
Assuming you know there will be a newline, the solution is to consume one character, and then decide:
10 - LF ... Unix style newline
13 - CR ... Windows style newline
If it's 13, you have to consume one more character (10)
const char x = fgetc(stdin); // Consume LF or CR
if (x == 13) fgetc(stdin); // consume LF
There are a few more newline conventions than that. In particular, all four involving CR \r and LF \n -- \n, \r, \r\n, and \n\r -- are actually encoutered in the wild.
For reading text input, possibly interactively, and supporting all of those four newline encodings at the same time, I recommend using a helper function something like the following:
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>
size_t get_line(char **const lineptr, size_t *const sizeptr, char *const lastptr, FILE *const in)
{
char *line;
size_t size, have;
int c;
if (!lineptr || !sizeptr || !in) {
errno = EINVAL; /* Invalid parameters! */
return 0;
}
if (*lineptr) {
line = *lineptr;
size = *sizeptr;
} else {
line = NULL;
size = 0;
}
have = 0;
if (lastptr) {
if (*lastptr == '\n') {
c = getc(in);
if (c != '\r' && c != EOF)
ungetc(c, in);
} else
if (*lastptr == '\r') {
c = getc(in);
if (c != '\n' && c != EOF)
ungetc(c, in);
}
*lastptr = '\0';
}
while (1) {
if (have + 2 >= size) {
/* Reallocation policy; my personal quirk here.
* You can replace this with e.g. have + 128,
* or (have + 2)*3/2 or whatever you prefer. */
size = (have | 127) + 129;
line = realloc(line, size);
if (!line) {
errno = ENOMEM; /* Out of memory */
return 0;
}
*lineptr = line;
*sizeptr = size;
}
c = getc(in);
if (c == EOF) {
if (lastptr)
*lastptr = '\0';
break;
} else
if (c == '\n') {
if (lastptr)
*lastptr = c;
else {
c = getc(in);
if (c != EOF && c != '\r')
ungetc(c, in);
}
break;
} else
if (c == '\r') {
if (lastptr)
*lastptr = c;
else {
c = getc(in);
if (c != EOF && c != '\n')
ungetc(c, in);
}
break;
}
if (iscntrl(c) && !isspace(c))
continue;
line[have++] = c;
}
if (ferror(in)) {
errno = EIO; /* I/O error */
return 0;
}
line[have] = '\0';
errno = 0; /* No errors, even if have were 0 */
return have;
}
int main(void)
{
char *data = NULL;
size_t size = 0;
size_t len;
char last = '\0';
setlocale(LC_ALL, "");
while (1) {
len = get_line(&data, &size, &last, stdin);
if (errno) {
fprintf(stderr, "Error reading standard input: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (!len && feof(stdin))
break;
printf("Read %lu characters: '%s'\n", (unsigned long)len, data);
}
free(data);
data = NULL;
size = 0;
return EXIT_SUCCESS;
}
Except for the errno constants I used (EINVAL, ENOMEM, and EIO), the above code is C89, and should be portable.
The get_line() function dynamically reallocates the line buffer to be long enough when necessary. For interactive inputs, you must accept a newline at the first newline-ish character you encounter (as trying to read the second character would block, if the first character happens to be the only newline character). If specified, the one-character state at lastptr is used to detect and handle correctly any two-character newlines at the start of the next line read. If not specified, the function will attempt to consume the entire newline as part of the current line (which is okay for non-interactive inputs, especially files).
The newline is not stored or counted in the line length. For added ease of use, the function also skips non-whitespace control characters. Especially embedded nul characters (\0) often cause headaches, so having the function skip those altogether is often a robust approach.
As a final touch, the function always sets errno -- to zero if no error occurred, nonzero error code otherwise --, including ferror() cases, so detecting error conditions is trivial.
The above code snippet includes a main(), which reads and displays input lines, using the current locale for the meaning of "non-whitespace control character" (!isspace(c) && iscntrl(c)).
Although this is definitely not the fastest mechanism to read input, it is not that slow, and it is a very robust one.
Questions?

Removing punctuation and capitalizing in C

I'm writing a program for school that asks to read text from a file, capitalizes everything, and removes the punctuation and spaces. The file "Congress.txt" contains
(Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.)
It reads in correctly but what I have so far to remove the punctuation, spaces, and capitalize causes some major problems with junk characters. My code so far is:
void processFile(char line[]) {
FILE *fp;
int i = 0;
char c;
if (!(fp = fopen("congress.txt", "r"))) {
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END);
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) {
fscanf(fp, "%c", &line[i]);
if (line[i] == ' ')
i++;
else if (ispunct((unsigned char)line[i]))
i++;
else if (islower((unsigned char)line[i])) {
line[i] = toupper((unsigned char)line[i]);
i++;
}
printf("%c", line[i]);
fprintf(csis, "%c", line[i]);
}
fclose(fp);
}
I don't know if it's an issue but I have MAX defined as 272 because that's what the text file is including punctuation and spaces.
My output I am getting is:
C╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Press any key to continue . . .
The fundamental algorithm needs to be along the lines of:
while next character is not EOF
if it is alphabetic
save the upper case version of it in the string
null terminate the string
which translates into C as:
int c;
int i = 0;
while ((c = getc(fp)) != EOF)
{
if (isalpha(c))
line[i++] = toupper(c);
}
line[i] = '\0';
This code doesn't need the (unsigned char) cast with the functions from <ctype.h> because c is guaranteed to contain either EOF (in which case it doesn't get into the body of the loop) or the value of a character converted to unsigned char anyway. You only have to worry about the cast when you use char c (as in the code in the question) and try to write toupper(c) or isalpha(c). The problem is that plain char can be a signed type, so some characters, notoriously ÿ (y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), will appear as a negative value, and that breaks the requirements on the inputs to the <ctype.h> functions. This code will attempt to case-convert characters that are already upper-case, but that's probably cheaper than a second test.
What else you do in the way of printing, etc is up to you. The csis file stream is a global scope variable; that's a bit (tr)icky. You should probably terminate the output printing with a newline.
The code shown is vulnerable to buffer overflow. If the length of line is MAX, then you can modify the loop condition to:
while (i < MAX - 1 && (c = getc(fp)) != EOF)
If, as would be a better design, you change the function signature to:
void processFile(int size, char line[]) {
and assert that the size is strictly positive:
assert(size > 0);
and then the loop condition changes to:
while (i < size - 1 && (c = getc(fp)) != EOF)
Obviously, you change the call too:
char line[4096];
processFile(sizeof(line), line);
in the posted code, there is no intermediate processing,
so the following code ignores the 'line[]' input parameter
void processFile()
{
FILE *fp = NULL;
if (!(fp = fopen("congress.txt", "r")))
{
printf("File could not be opened for input.\n");
exit(1);
}
// implied else, fopen successful
unsigned int c; // must be integer so EOF (-1) can be recognized
while( EOF != (c =(unsigned)fgetc(fp) ) )
{
if( (isalpha(c) || isblank(c) ) && !ispunct(c) ) // a...z or A...Z or space
{
// note toupper has no effect on upper case characters
// note toupper has no effect on a space
printf("%c", toupper(c));
fprintf(csis, "%c", toupper(c));
}
}
printf( "\n" );
fclose(fp);
} // end function: processFile
Okay so what I did was created a second character array. My first array read in the entire file. I created a second array which would only take in alphabetical characters from the first array then make them uppercase. My correct and completed function for that part of my homework is as follows:
void processFile(char line[], char newline[]) {
FILE *fp;
int i = 0;
int j = 0;
if (!(fp = fopen("congress.txt", "r"))) { //checks file open
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END); //idk what they do but they make it not crash
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) { //reads the file into the first array
fscanf(fp, "%c", &line[i]);
}
for (i = 0; i < MAX; ++i) {
if (isalpha(line[i])){ //if it's an alphabetical character
newline[j] = line[i]; //read into new array
newline[j] = toupper(newline[j]); //makes that letter capitalized
j++;
}
}
fclose(fp);
}
Just make sure that after creating the new array, it will be smaller than your defined MAX. To make it easy I just counted the now missing punctuation and spaces (which was 50) so for future "for" loops it was:
for (i = 0; i < MAX - 50; ++i)

Jumping to next line with fscanf()

I have two files .csv and I need to read the whole file but it have to be filed by field. I mean, csv files are files with data separated by comma, so I cant use fgets.
I need to read all the data but I don't know how to jump to the next line.
Here is what I've done so far:
int main()
{
FILE *arq_file;
arq_file = fopen("file.csv", "r");
if(arq_file == NULL){
printf("Not possible to read the file.");
exit(0);
}
while( !feof(arq_file) ){
fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);
}
fclose(arq_file);
return 0;
}
It will get in a infinity loop because it never gets the next line.
How could I reach the line below the one I just read?
Update: File 01 Example
1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999
File 02 Example
1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830
I think an example is better than giving you hints, this is a combination of fgets() + strtok(), there are other functions that could work for example strchr(), though it's easier this way and since I just wanted to point you in the right direction, well I did it like this
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int
main(void)
{
FILE *file;
char buffer[256];
char *pointer;
size_t line;
file = fopen("data.dat", "r");
if (file == NULL)
{
perror("fopen()");
return -1;
}
line = 0;
while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
{
size_t field;
char *token;
field = 0;
while ((token = strtok(pointer, ",")) != NULL)
{
printf("line %zu, field %zu -> %s\n", line, field, token);
field += 1;
pointer = NULL;
}
line += 1;
}
return 0;
}
I think it's very clear how the code works and I hope you can understand.
If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.
It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.
Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.
char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;
while ((c = getc(arq_file)) != EOF)
{
if (curfld == bufend)
…process overlong field…
else if (c == ',' || c == '\n')
{
*curfld = '\0';
process(buffer);
curfld = buffer;
}
else
*curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
*curfld = '\0';
process(buffer);
}
However, if you want to go with higher level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings). Or use POSIX
getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.
char *buffer = 0;
size_t buflen = 0;
while (getline(&buffer, &buflen, arq_file) != -1)
{
char *posn = buffer;
char *epos;
char *token;
while ((token = strtok_r(posn, ",\n", &epos)) != 0)
{
process(token);
posn = 0;
}
/* Do anything special for end of line */
}
free(buffer);
If you think you must use scanf(), then you need to use something like:
char buffer[4096];
char c;
while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
process(buffer);
The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then reads the next character (which must, therefore, either be comma or newline — or conceivably EOF, but that causes problems) into c. If the last character in the file is neither comma nor newline, then you will skip the last field.

Counting lines, numbers, and characters in C

I'm new to C and I got an assignment today that requires that I read text in from a file, count the number of lines, characters, and words, and return it in a specific format.
Just to be clear - I need to read in this text file:
"I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing... only I will remain"
Litany Against Fear, Dune by Frank Herbert
and have it output like so:
1)"I must not fear.[4,17]
2)Fear is the mind-killer.[4,24]
3)Fear is the little-death that brings total obliteration.[8,56]
4)I will face my fear.[5,20]
5)I will permit it to pass over me and through me.[11,48]
6)And when it has gone past I will turn the inner eye to see its path.[16,68]
7)Where the fear has gone there will be nothing... only I will remain"[13,68]
8) Litany Against Fear, Dune by Frank Herbert[7,48]
Now, I've written something that will accept the file, it counts the number of lines properly, but I have 2 major issues - 1. How do I get the text from the file to appear in the output? I can't get that at all. My word count doesn't work at all, and my character count is off too. Can you please help?
#include <stdio.h>
#define IN 1
#define OUT 0
void main()
{
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = 0;
int test = 0;
FILE *doesthiswork;
doesthiswork = fopen("testWords.in", "r");
state = OUT;
while ((test = fgetc(doesthiswork)) != EOF)
{
++numChars;
if ( test == '\n')
{
++numLines;
if (test == ' ' || test == '\t' || test == '\n')
{
state = OUT;
}
else if (state == OUT)
{
state = IN;
++numWords;
}
}
printf("%d) I NEED TEXT HERE. [%d %d]\n",numLines, numWords, numChars);
}
}
It will be better if you use getline() function to read each line from the file.
And after reading the line process it using strtok() function. With this you will get the number of words in the line and save it in a variable.
Then process each variable and get the number of characters.
Output the line number, number of words and the number of characters.
Then read another line and so on.
How do I get the text from the file to appear in the output?
It should be stored there by preparing a buffer.
My word count doesn't work at all, and my character count is off too.
Order in which the test is wrong.
fix like this:
#include <stdio.h>
#define IN 1
#define OUT 0
int main(){
int numChars = 0;
int numWords = 0;
int numLines = 0;
int state = OUT;
int test;
char buffer[1024];
int buff_pos = 0;
FILE *doesthiswork;
doesthiswork = fopen("data.txt", "r");
state = OUT;
while((test = fgetc(doesthiswork)) != EOF) {
++numChars;
buffer[buff_pos++] = test;
if(test == ' ' || test == '\t' || test == '\n'){
state = OUT;
if(test == '\n') {
++numLines;
--numChars;//no count newline
buffer[--buff_pos] = '\0';//rewrite newline
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
buff_pos = 0;
numWords = numChars = 0;
}
} else {
if(state == OUT){
state = IN;
++numWords;
}
}
}
fclose(doesthiswork);
if(buff_pos != 0){//Input remains in the buffer.
++numLines;
buffer[buff_pos] = '\0';
printf("%d)%s[%d,%d]\n", numLines, buffer, numWords, numChars);
}
return 0;
}

What is the easiest way to count the newlines in an ASCII file?

Which is the fastest way to get the lines of an ASCII file?
Normally you read files in C using fgets. You can also use scanf("%[^\n]"), but quite a few people reading the code are likely to find that confusing and foreign.
Edit: on the other hand, if you really do just want to count lines, a slightly modified version of the scanf approach can work quite nicely:
while (EOF != (scanf("%*[^\n]"), scanf("%*c")))
++lines;
The advantage of this is that with the '*' in each conversion, scanf reads and matches the input, but does nothing with the result. That means we don't have to waste memory on a large buffer to hold the content of a line that we don't care about (and still take a chance of getting a line that's even larger than that, so our count ends up wrong unless we got to even more work to figure out whether the input we read ended with a newline).
Unfortunately, we do have to break up the scanf into two pieces like this. scanf stops scanning when a conversion fails, and if the input contains a blank line (two consecutive newlines) we expect the first conversion to fail. Even if that fails, however, we want the second conversion to happen, to read the next newline and move on to the next line. Therefore, we attempt the first conversion to "eat" the content of the line, and then do the %c conversion to read the newline (the part we really care about). We continue doing both until the second call to scanf returns EOF (which will normally be at the end of the file, though it can also happen in case of something like a read error).
Edit2: Of course, there is another possibility that's (at least arguably) simpler and easier to understand:
int ch;
while (EOF != (ch=getchar()))
if (ch=='\n')
++lines;
The only part of this that some people find counterintuitive is that ch must be defined as an int, not a char for the code to work correctly.
Here's a solution based on fgetc() which will work for lines of any length and doesn't require you to allocate a buffer.
#include <stdio.h>
int main()
{
FILE *fp = stdin; /* or use fopen to open a file */
int c; /* Nb. int (not char) for the EOF */
unsigned long newline_count = 0;
/* count the newline characters */
while ( (c=fgetc(fp)) != EOF ) {
if ( c == '\n' )
newline_count++;
}
printf("%lu newline characters\n", newline_count);
return 0;
}
Maybe I'm missing something, but why not simply:
#include <stdio.h>
int main(void) {
int n = 0;
int c;
while ((c = getchar()) != EOF) {
if (c == '\n')
++n;
}
printf("%d\n", n);
}
if you want to count partial lines (i.e. [^\n]EOF):
#include <stdio.h>
int main(void) {
int n = 0;
int pc = EOF;
int c;
while ((c = getchar()) != EOF) {
if (c == '\n')
++n;
pc = c;
}
if (pc != EOF && pc != '\n')
++n;
printf("%d\n", n);
}
Common, why You compare all characters? It is very slow. In 10MB file it is ~3s.
Under solution is faster.
unsigned long count_lines_of_file(char *file_patch) {
FILE *fp = fopen(file_patch, "r");
unsigned long line_count = 0;
if(fp == NULL){
return 0;
}
while ( fgetline(fp) )
line_count++;
fclose(fp);
return line_count;
}
What about this?
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 4096
int main(int argc, char** argv)
{
int count;
int bytes;
FILE* f;
char buffer[BUFFER_SIZE + 1];
char* ptr;
if (argc != 2 || !(f = fopen(argv[1], "r")))
{
return -1;
}
count = 0;
while(!feof(f))
{
bytes = fread(buffer, sizeof(char), BUFFER_SIZE, f);
if (bytes <= 0)
{
return -1;
}
buffer[bytes] = '\0';
for (ptr = buffer; ptr; ptr = strchr(ptr, '\n'))
{
++count;
++ptr;
}
}
fclose(f);
printf("%d\n", count - 1);
return 0;
}

Resources