Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
//do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
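As a rough illustration of the "fgets in a loop with realloc" idea, here is a minimal sketch. The name read_line and the starting capacity of 128 are assumptions made for this example, not part of any standard API, and the sketch assumes lines stay well below INT_MAX bytes and contain no embedded '\0' characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp.
 * Returns a malloc'd string (newline kept, if present) that the caller
 * must free, or NULL at end-of-file with nothing read, or on error. */
char *read_line(FILE *fp)
{
    size_t cap = 128;               /* arbitrary starting capacity */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* Each fgets call appends the next chunk of the current line. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;            /* the newline arrived: line complete */

        /* No newline yet: grow the buffer and keep reading. */
        char *tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len == 0 || ferror(fp)) {   /* nothing read, or a read error */
        free(line);
        return NULL;
    }
    return line;                    /* last line, no trailing newline */
}
```

A caller would loop with something like while ((line = read_line(file)) != NULL) { ...; free(line); }. On POSIX systems getline() does the same job without any hand-written code.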
From the standard, §7.19.7.2:
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, it is better to allocate the buffer with malloc and, if needed, grow it with realloc.
I am learning fgets from the manpage. I did some tests on fgets to make sure I understand it. One of the tests I did results in behaviour contrary to what is specified in the man page. The man page says:
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
But it doesn't "read in at most one less than size characters from stream". As demonstrated by the following program:
#include<stdio.h>
#include<stdlib.h>
int main(){
FILE *fp;
fp=fopen("sample", "r");
char *s=calloc(50, sizeof(char));
while(fgets(s,2,fp)!=NULL) printf("%s",s);
}
The sample file:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The output of the compiled binary:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The expected output according to the man page:
t
j
a
4
5
Is the man page wrong, or am I missing something?
Is the man page wrong or am i missing something?
I won't say that the man page is wrong but it could be more clear.
There are 3 things that may stop fgets from reading from the stream.
The buffer is full (i.e. only room left for the termination character)
A newline character was read from the stream
End-Of-File occurred
The quoted man page only mentions two of those conditions clearly.
Reading stops after an EOF or a newline.
That is, #2 and #3 are mentioned very explicitly, while #1 is (kind of) derived from
reads in at most one less than size characters from stream
Here is another description from https://man7.org/linux/man-pages/man3/fgets.3p.html
... read bytes from stream into the array pointed to by s until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered.
where the 3 cases are clearly mentioned.
But yes... you are missing something. Once the buffer gets full, the rest of the current line is not read, and it is not discarded either. It stays in the stream and is available for the next read. So nothing is lost. You just need more fgets calls to read all the data.
As suggested in a number of comments (e.g. Fe2O3 and Lundin) you can see this if you change the print statement so that it includes a delimiter of some kind. For instance (from Lundin):
printf("|%s|",s);
This will make clear exactly what you got from the individual fgets calls.
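To make the chunking visible without editing the original program, here is a small sketch. The function name dump_chunks and the 64-byte internal buffer are assumptions made for this example; it reads a stream with a caller-chosen fgets size and wraps every chunk in '|'.

```c
#include <stdio.h>

/* Hypothetical helper: reads `in` with fgets using a buffer of `size`
 * bytes and writes every returned chunk to `out` wrapped in '|'.
 * Returns the number of successful fgets calls, or -1 if `size` is
 * outside the range this sketch supports. */
int dump_chunks(FILE *in, FILE *out, int size)
{
    char buf[64];
    int calls = 0;

    if (size < 2 || size > (int)sizeof buf)
        return -1;

    while (fgets(buf, size, in) != NULL) {
        fprintf(out, "|%s|", buf);  /* delimiters show chunk boundaries */
        calls++;
    }
    return calls;
}
```

Calling dump_chunks(fp, stdout, 2) on the sample file puts every character, including each newline, between its own pair of bars, which shows that each call returned exactly one character.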
The quote you provided states clearly:
If a newline is read, it is stored into the buffer.
But when does the call fgets(s,2,fp) actually read a newline character while reading, for example, this line?
thiis is line no. 1
The line contains only one new line character at its end.
Each call reads a single character, so the line is consumed character by character, and each character is followed by the terminating zero character '\0'.
So the read strings look like
{ 't', '\0' }
{ 'h', '\0' }
{ 'i', '\0' }
// ...
{ '1', '\0' }
{ '\n', '\0' }
If you have a call of fgets like this
fgets(s,n,fp)
then at most n-1 characters are read from the input stream. One character is reserved for the terminating zero character '\0' to build a string.
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
I have a program in C, which reads in a file with buffers:
char buffer[65536];
while(fgets(buffer,65536,inputFile)){...}
Now I have some questions:
What happens if the filesize is smaller than buffersize?
Does fgets() overwrite the full buffer, so if previous buffer (buffer1) was full, and the next one (buffer2) for example size 10 isn't, will the buffer still have the chars of buffer1 in it for example at buffer2[20] = buffer1[20] because it didn't overwrite?
How to know how many chars are inside the buffer, so you can backward loop the buffer
To answer your questions, you first need to understand fgets(3). It reads one line at a time: fgets() returns once a newline or EOF is reached, or once the buffer is full, and a '\0' is appended to terminate the string. My answers to your questions follow.
Normally fgets() reads one line at a time, not the entire file. In your code the buffer allows lines of up to 65535 characters plus the terminating '\0', which is far more than a typical text line needs. To read a whole text file, you normally call fgets() in a for or while loop until it returns NULL, which happens at EOF or on a read error.
If it reads a line, which is longer than the length of buffer, it will read only buffer length - 1 chars, then append '\0'.
strlen(3) will give you the length of the string.
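As a sketch of how strlen can drive a backward loop over the buffer, consider the following. The helper name reverse_chunks is an assumption made for this example; it writes each chunk returned by fgets backwards and ignores everything after the terminator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out` with every chunk returned by
 * fgets written backwards. Returns the total number of characters read. */
size_t reverse_chunks(FILE *in, FILE *out)
{
    char buffer[65536];
    size_t total = 0;

    while (fgets(buffer, sizeof buffer, in) != NULL) {
        /* Only the first len bytes belong to this read; anything after
         * the terminator may be left over from earlier calls. */
        size_t len = strlen(buffer);
        total += len;

        /* Walk from the last character read down to the first. */
        for (size_t i = len; i > 0; i--)
            fputc(buffer[i - 1], out);
    }
    return total;
}
```

Counting down with i > 0 and indexing buffer[i - 1] avoids the wrap-around that an unsigned loop of the form i >= 0 would cause.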
As the comments suggest, the man page can help here, or Google... From cppreference:
Reads at most count - 1 characters from the given file stream and stores them in the character array pointed to by str. Parsing stops if end-of-file occurs or a newline character is found, in which case str will contain that newline character. If no errors occur, writes a null character at the position immediately after the last character written to str.
and also:
Return value: str on success, null pointer on failure.
Basically, count is the maximum number of characters read plus one (for the null terminator). If a line is longer than that, a single call reads only part of it. In any case the buffer has a null terminator, and you can keep reading until fgets returns a null pointer at EOF.
Let's say i have a file containing a string of length 10 :
abcdefghij
And read my text file line by line using fgets() with a buffer of size 4.
I will have to call fgets() in a loop to make sure the whole line is read.
But, at say the first call, it will read the first 3 characters (buffer size -1) right ? Will it also append a null terminating character at the last position of my 4 char buffer even if the real end of my string wasn't reached, or will fill it with 4 characters and no null terminating characters at the end of my buffer ? which would make a call to strlen() impossible ?
Thank you :)
It is in the documentation.
Verbatim from man fgets (italics by me):
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
Verbatim from the HP-UX 11 man page (italics by me):
fgets()
Reads characters from the stream into the array pointed
to by s, until n-1 characters are read, a new-line
character is read and transferred to s, or an end-of-
file condition is encountered. The string is then
terminated with a null character.
From the POSIX specs:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Last but not least, from MSDN:
The fgets function reads a string from the input stream argument and stores it in str. fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The result stored in str is appended with a null character. The newline character, if read, is included in the string.
Looking at the man page of fgets you can see :
"The newline, if any, is retained. If any characters are read and
there is no error, a `\0' character is appended to end the string"
(http://www.manpagez.com/man/3/fgets/)
From C standard draft (2010) n1547.pdf, 7.21.7.2.2:
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
I used fseek (moves a pointer in the file).
Program looked like (pseudo-code):
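capacity = default length value;
remember the line start with ftell;
read string[capacity];
while string length == capacity - 1 and string does not end with '\n'
{
capacity = capacity * 2;
resize string to capacity;
move back to the line start with fseek;
read string[capacity];
}
I used malloc and realloc to resize the string (which is a pointer).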
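One possible C rendering of this seek-back-and-retry idea is sketched below. The name read_line_seek and the starting capacity of 64 are assumptions made for this example, and the approach only works on seekable streams such as regular files, not on pipes or terminals.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one whole line by retrying with a larger
 * buffer. Returns a malloc'd string the caller must free, or NULL at
 * end-of-file or on error. */
char *read_line_seek(FILE *fp)
{
    size_t cap = 64;                /* arbitrary starting capacity */
    char *line = NULL;
    long start = ftell(fp);         /* remember where the line begins */
    if (start < 0)
        return NULL;

    for (;;) {
        char *tmp = realloc(line, cap);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;

        if (fgets(line, (int)cap, fp) == NULL) {
            free(line);
            return NULL;
        }

        size_t len = strlen(line);
        /* Done if the buffer was not filled (EOF) or the newline arrived. */
        if (len + 1 < cap || line[len - 1] == '\n')
            return line;

        /* Buffer full without a newline: seek back and retry, twice as big. */
        if (fseek(fp, start, SEEK_SET) != 0) {
            free(line);
            return NULL;
        }
        cap *= 2;
    }
}
```

The price of this design is that the start of a long line is read again on every retry; appending to the buffer instead of seeking back avoids that and also works on non-seekable streams.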
I want to know what will be in the rest of buf after fgets() has been executed. For example:
char buf[100];
fgets(buf, sizeof(buf), fp);
If one line has just 10 characters + '\n', then what will be in the rest of buf (from buf[12] to buf[99])?
If fgets() is executed twice, will the second input cover the first input in buf?
When fgets reads data it changes one element of the buffer at a time (simplifying assumption) until it reaches the limit or it finds a terminator in the input. All other elements in the buffer remain the same as before calling fgets (thus, they might have random data or they might leak previously read info).
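A quick way to observe this is to pre-fill the buffer with a known byte and count how much of it survives the call. The helper name untouched_bytes and the '#' fill character are assumptions made for this sketch. The standard only describes what fgets writes, so treat the result as what common implementations do rather than a documented guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills buf with '#', reads one line into it with
 * fgets, and returns how many bytes after the string still hold '#'. */
size_t untouched_bytes(FILE *fp, char *buf, size_t size)
{
    memset(buf, '#', size);
    if (fgets(buf, (int)size, fp) == NULL)
        return size;                /* nothing was read into buf */

    size_t used = strlen(buf) + 1;  /* characters read plus the '\0' */
    size_t count = 0;
    for (size_t i = used; i < size; i++)
        if (buf[i] == '#')
            count++;
    return count;
}
```

With a 100-byte buffer and a line of 10 characters plus '\n', the string and its terminator occupy buf[0] to buf[11], so the bytes examined are buf[12] to buf[99].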
#define SIZE 100
...
char buf[SIZE];
fgets reads at most SIZE - 1 characters from the given file stream and stores them in buf. On success, the resulting string is always null-terminated. Reading stops if end-of-file occurs or a newline character is found, in which case buf will contain that newline character. The rest of buf remains unchanged and contains whatever it held earlier, which may be random data or anything left in memory. This is backed by the C standard quoted below:
From the C11 standard:
7.21.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Description
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after
a new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
Returns
The fgets function returns s if successful. If end-of-file is
encountered and no characters have been read into the array, the
contents of the array remain unchanged and a null pointer is
returned. If a read error occurs during the operation, the array
contents are indeterminate and a null pointer is returned.
Emphasis mine :)
Now the second question:
If fgets() is executed twice, will the second input cover the first input in buf?
If by "cover" you mean overwrite, then yes: the second call writes from the start of buf again and overwrites the first input, but only as many bytes as it reads plus the terminating '\0'. You can run this yourself and see.
I am trying to read strings from a file that has each string on a new line but I think it reads a newline character once instead of a string and I don't know why. If I'm going about reading strings the wrong way please correct me.
i=0;
F1 = fopen("alg.txt", "r");
F2 = fopen("tul.txt", "w");
if(!feof(F1)) {
do{ //start scanning file
fgets(inimene[i].Enimi, 20, F1);
fgets(inimene[i].Pnimi, 20, F1);
fgets(inimene[i].Kood, 12, F1);
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s",i,inimene[i].Enimi,inimene[i].Pnimi,inimene[i].Kood);
i++;}
while(!feof(F1));};
/*finish getting structs*/
The printf is there to let me see what was read into what and here is the result
i=0
Enimi=peter
Pnimi=pupkin
Kood=223456iatb i=1
Enimi=
Pnimi=masha
Kaad=gubkina
i=2
Enimi=234567iasb
Pnimi=sasha
Kood=dudkina
As you can see, after the first struct is read there is a blank (a newline?) once, and then everything is shifted. I suppose I could read a dummy string to absorb that extra blank and then nothing would be shifted, but that doesn't help me understand the problem and avoid it in the future.
Edit 1: I know that it stops at a newline character but still reads it. I'm wondering why it doesn't read the newline as part of the third string and instead passes it on to the fourth string, rather than giving the fourth string the fourth line of the source, and why this happens just once.
The file is formatted like this by the way
peter
pupkin
223456iatb
masha
gubkina
234567iasb
sasha
dudkina
123456iasb
fgets stops reading when it reads a newline, but the newline is considered a valid character and is included in the returned string.
If you want to remove it, you'll need to trim it yourself:
length = strlen(str);
if (length > 0 && str[length - 1] == '\n')
str[length - 1] = '\0';
Where str is the string into which you read the data from the file, and length is of type size_t.
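An alternative that needs no length bookkeeping is strcspn, which returns the index of the first '\n' or, if there is none, the index of the terminator. The wrapper name strip_newline below is an assumption made for this sketch.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\n', if any.
 * Safe for empty strings and for chunks that contain no newline. */
void strip_newline(char *str)
{
    /* strcspn gives the length of the prefix that has no '\n'. */
    str[strcspn(str, "\n")] = '\0';
}
```

When the string has no newline, the assignment simply rewrites the existing terminator, so no special case is needed.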
To answer the edit to the question: the reason the newline is not read during the third read is because you are not reading enough characters. You give fgets a limit of 12 characters, which means it can actually read a maximum of 11 characters since it has to add the null terminator to the end.
The line you read is 11 characters long before the newline, counting the trailing space. Note that there is a space at the end of that line when you output it:
Kood=223456iatb i=1
^
As already stated, if there's enough room in the buffer, then fgets() reads the data including the newline into the buffer and null terminates the line. If there isn't enough room in the buffer before coming across the newline, fgets() copies what it can (the length of the buffer minus one byte) and null terminates the string. The library resumes reading from where fgets() left off on the next iteration.
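If the goal is one field per line regardless of how long the line is, the leftover part can be drained explicitly. The helper below is a sketch under that assumption; the name read_field and its return convention are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into dst (at most size - 1
 * characters), strips the newline, and throws away whatever part of the
 * line did not fit. Returns 1 on success, 0 at end-of-file or on error. */
int read_field(char *dst, int size, FILE *fp)
{
    if (fgets(dst, size, fp) == NULL)
        return 0;

    size_t len = strcspn(dst, "\n");
    if (dst[len] == '\n') {
        dst[len] = '\0';            /* the whole line fitted */
    } else {
        int c;                      /* drain the rest of the long line */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

With this helper, a 12-byte Kood buffer that receives an 11-character line no longer leaves the newline behind for the next read, so the following fields stay aligned.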
Don't mess with buffers smaller than 2 bytes long.
Note that gets() removes the newline (but does not protect you from buffer overflows, so do not use it). gets() was removed from the C standard in C11; it will be a long time before it is removed from C libraries (it just becomes a non-standard - or ex-standard - additional function available for abuse).
Your code should check each of the fgets() function calls:
while (fgets(inimene[i].Enimi, 20, F1) != 0 &&
fgets(inimene[i].Pnimi, 20, F1) != 0 &&
fgets(inimene[i].Kood, 12, F1) != 0)
{
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s", i, inimene[i].Enimi, inimene[i].Pnimi, inimene[i].Kood);
i++;
}
There are places for do/while loops; they are not used very often, though.
The fgets function reads the newline character as part of the string.
From the description of fgets:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
If Enimi/Pnimi/Kood are arrays, not pointers:
while( fgets(inimene[i].Enimi,sizeof inimene[i].Enimi,F1) &&
fgets(inimene[i].Pnimi,sizeof inimene[i].Pnimi,F1) &&
fgets(inimene[i].Kood,sizeof inimene[i].Kood,F1) )
{
if( strchr(inimene[i].Enimi,'\n') ) *strchr(inimene[i].Enimi,'\n')=0;
if( strchr(inimene[i].Pnimi,'\n') ) *strchr(inimene[i].Pnimi,'\n')=0;
if( strchr(inimene[i].Kood,'\n') ) *strchr(inimene[i].Kood,'\n')=0;
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s", i, inimene[i].Enimi, inimene[i].Pnimi,inimene[i].Kood);
i++;
}