I have a program in C, which reads in a file with buffers:
char buffer[65536];
while(fgets(buffer,65536,inputFile)){...}
Now I have some questions:
What happens if the filesize is smaller than buffersize?
Does fgets() overwrite the full buffer? If the previous read (buffer1) filled the buffer and the next one (buffer2), for example of size 10, doesn't, will the buffer still contain chars from buffer1, for example buffer2[20] == buffer1[20], because they were not overwritten?
How do I know how many chars are in the buffer, so that I can loop over it backwards?
To answer your questions, you first need to understand fgets(3). It reads one line at a time: fgets() returns after a newline or EOF is reached, and a '\0' is appended to terminate the string. My answers to your questions follow.
Normally fgets() reads one line at a time, not the entire file. In your code, a line can be at most 65535 characters long, and the '\0' makes 65536, which is more than a typical text line needs. To read a text file, you normally call fgets() in a for () or while () loop until EOF is reached (fgets() returns NULL).
If it reads a line that is longer than the buffer, it reads only (buffer length - 1) chars and then appends '\0'.
strlen(3) will give you the length of the string.
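As a minimal sketch of the strlen() point, here is one way to find out how many chars a single fgets() call stored and then walk the buffer backwards. The helper name read_reversed and its out parameter are made up for this illustration; they are not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one chunk with fgets() and writes its
 * characters in reverse order to out (which must hold at least size bytes).
 * Returns the number of characters copied, or -1 at EOF or on error. */
int read_reversed(FILE *fp, char *buf, int size, char *out)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;                      /* nothing was read */

    size_t len = strlen(buf);           /* characters actually stored */
    if (len > 0 && buf[len - 1] == '\n')
        len--;                          /* ignore the trailing newline */

    size_t j = 0;
    for (size_t i = len; i > 0; i--)    /* walk the buffer backwards */
        out[j++] = buf[i - 1];
    out[j] = '\0';
    return (int)j;
}
```

Only the first strlen(buf) chars belong to the current read, so the backward loop never looks at anything a previous call may have left further along in the buffer.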
As the comments suggest, the man page can help here, or Google. From cppreference:
Reads at most count - 1 characters from the given file stream and stores them in the character array pointed to by str. Parsing stops if end-of-file occurs or a newline character is found, in which case str will contain that newline character. If no errors occur, writes a null character at the position immediately after the last character written to str.
and also:
Return value: str on success, null pointer on failure.
Basically, the count you pass is the maximum number of chars read, plus 1 for the terminator. If a line is longer than that, a single call returns only part of it. In any case the buffer is null-terminated, and you can keep reading until EOF.
I am learning fgets from the manpage. I did some tests on fgets to make sure I understand it. One of the tests I did results in behaviour contrary to what is specified in the man page. The man page says:
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
But it doesn't "read in at most one less than size characters from stream". As demonstrated by the following program:
#include<stdio.h>
#include<stdlib.h>
int main(){
FILE *fp;
fp=fopen("sample", "r");
char *s=calloc(50, sizeof(char));
while(fgets(s,2,fp)!=NULL) printf("%s",s);
}
The sample file:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The output of the compiled binary:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The expected output according to the man page:
t
j
a
4
5
Is the man page wrong, or am I missing something?
Is the man page wrong or am I missing something?
I won't say that the man page is wrong, but it could be clearer.
There are 3 things that may stop fgets from reading from the stream.
The buffer is full (i.e. only room left for the termination character)
A newline character was read from the stream
End-of-file occurred
The quoted man page only mentions two of those conditions clearly.
Reading stops after an EOF or a newline.
That is, #2 and #3 are mentioned very explicitly, while #1 is (kind of) derived from
reads in at most one less than size characters from stream
Here is another description from https://man7.org/linux/man-pages/man3/fgets.3p.html
... read bytes from stream into the array pointed to by s until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered.
where the 3 cases are clearly mentioned.
But yes... you are missing something. Once the buffer gets full, the rest of the current line is neither read nor discarded. It stays in the stream and is available for the next read, so nothing is lost. You just need more fgets calls to read all the data.
As suggested in a number of comments (e.g. Fe2O3 and Lundin) you can see this if you change the print statement so that it includes a delimiter of some kind. For instance (from Lundin):
printf("|%s|",s);
This will make clear exactly what you got from the individual fgets calls.
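Here is a small sketch of that delimiter idea that collects the output in a string instead of printing it, so the individual reads are easy to inspect. The helper name dump_chunks and its parameters are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: calls fgets(s, size, in) until it returns NULL and
 * appends each result, wrapped in '|', to out. Returns the number of calls
 * that succeeded. Assumes 2 <= size <= 50 and outsize >= 1. */
int dump_chunks(FILE *in, int size, char *out, size_t outsize)
{
    char s[50];
    int calls = 0;
    size_t used = 0;

    out[0] = '\0';
    while (fgets(s, size, in) != NULL) {
        int n = snprintf(out + used, outsize - used, "|%s|", s);
        if (n < 0 || (size_t)n >= outsize - used)
            break;                      /* out is full; stop early */
        used += (size_t)n;
        calls++;
    }
    return calls;
}
```

With size 2 every call yields exactly one character (the newline included), so a file holding "ab" followed by a newline takes three calls. With a size larger than the line, the same file takes one call.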
The provided quote clearly states:
If a newline is read, it is stored into the buffer.
Where do you see the call fgets(s,2,fp) reading the newline character, for example when it reads this line?
thiis is line no. 1
The line contains only one newline character, at its end.
This call reads one character at a time, and each character is followed by the terminating zero character '\0'.
So the read strings look like
{ 't', '\0' }
{ 'h', '\0' }
{ 'i', '\0' }
// ...
{ '1', '\0' }
{ '\n', '\0' }
If you have a call of fgets like that
fgets(s,n,fp)
then at most n-1 characters are read from the input stream. One character is reserved for the terminating zero character '\0' to build a string.
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
//do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
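To illustrate the last-character check, here is a rough sketch that counts logical lines even when they are longer than the buffer. The names ends_with_newline and count_lines are made up for this example, and the 8-byte buffer is deliberately tiny so that long lines get split.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the string stored by fgets() ends with '\n', i.e. it is a
 * whole line or the final piece of one; returns 0 otherwise. */
int ends_with_newline(const char *buf)
{
    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}

/* Hypothetical helper: counts logical lines in fp. A final line without a
 * trailing '\n' still counts as a line. */
long count_lines(FILE *fp)
{
    char buffer[8];     /* deliberately tiny so long lines get split */
    long lines = 0;
    int in_line = 0;    /* 1 while we are in the middle of a line */

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (ends_with_newline(buffer)) {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;    /* buffer filled up, or last line lacks '\n' */
        }
    }
    return lines + in_line;
}
```

A piece that does not end in '\n' means either that the buffer was too small or that the file ended without a newline; in both cases the next fgets() call settles which one it was.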
From the standards §7.19.7.2,
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So yes, fgets will read the rest of the line in the next iteration if it doesn't encounter a newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, it is better to allocate the buffer with malloc and, if needed, grow it with realloc.
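A minimal sketch of the malloc/realloc approach, assuming lines contain no embedded '\0' bytes and stay well below INT_MAX in length. The function name read_full_line is invented here; on POSIX systems getline() already does this job.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one full line of any length from fp.
 * Returns a malloc'd string (the caller frees it) that includes the '\n'
 * if one was read, or NULL at EOF, on a read error before any character
 * was stored, or on allocation failure. */
char *read_full_line(FILE *fp)
{
    size_t cap = 16;            /* small start so growth is exercised */
    size_t len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* cap - len is always at least 2 here, so fgets can store a char. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;        /* complete line */

        /* No newline yet: double the buffer and keep reading. */
        char *bigger = realloc(line, cap * 2);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }

    if (len == 0) {             /* EOF before any character */
        free(line);
        return NULL;
    }
    return line;                /* last line without '\n' */
}
```

Each fgets() call appends at line + len, so the pieces of a long line end up contiguous and the caller sees a single string.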
Let's say I have a file containing a string of length 10:
abcdefghij
I read my text file line by line using fgets() with a buffer of size 4.
I will have to call fgets() in a loop to make sure the whole line is read.
But on, say, the first call, it will read the first 3 characters (buffer size - 1), right? Will it also append a null terminating character at the last position of my 4-char buffer even if the real end of my string wasn't reached, or will it fill the buffer with 4 characters and no null terminator, which would make a call to strlen() impossible?
Thank you :)
It is in the documentation.
Verbatim from man fgets (italics by me):
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
Verbatim from the HP-UX 11 man page (italics by me):
fgets()
Reads characters from the stream into the array pointed
to by s, until n-1 characters are read, a new-line
character is read and transferred to s, or an end-of-
file condition is encountered. The string is then
terminated with a null character.
From the POSIX specs:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Last but not least, from MSDN:
The fgets function reads a string from the input stream argument and stores it in str. fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The result stored in str is appended with a null character. The newline character, if read, is included in the string.
Looking at the man page of fgets you can see :
"The newline, if any, is retained. If any characters are read and
there is no error, a `\0' character is appended to end the string"
(http://www.manpagez.com/man/3/fgets/)
From C standard draft (2010) n1547.pdf, 7.21.7.2.2:
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
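To see what the quotes above mean for the 4-byte buffer in the question, here is a short sketch that keeps every fgets() result. The helper name read_chunks and the CHUNK macro are assumptions made for this example.

```c
#include <stdio.h>

#define CHUNK 4     /* buffer size from the question */

/* Hypothetical helper: stores each fgets() result in chunks[] and returns
 * how many calls succeeded (at most max_chunks). */
int read_chunks(FILE *fp, char chunks[][CHUNK], int max_chunks)
{
    int n = 0;
    while (n < max_chunks && fgets(chunks[n], CHUNK, fp) != NULL)
        n++;
    return n;
}
```

For a file containing abcdefghij with no trailing newline, this gives four strings: "abc", "def", "ghi" and "j". Each one is null-terminated, so strlen() is safe on every piece.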
I used fseek (moves a pointer in the file).
The program looked like this (pseudo-code):
read string[default length value];
while string length == default length value
{
resize string;
move back to the line start with fseek;
read string;
}
I used malloc and realloc to resize the string (which is a pointer).
I would like to know what will be in the rest of buf after fgets() has been executed. For example:
char buf[100];
fgets(buf, sizeof(buf), fp);
If a line has just 10 characters + '\n', what will be in the rest of buf (from buf[12] to buf[99])?
If execute fgets() twice, will the second input cover the first input to buf?
When fgets reads data it changes one element of the buffer at a time (simplifying assumption) until it reaches the limit or it finds a terminator in the input. All other elements in the buffer remain the same as before calling fgets (thus, they might have random data or they might leak previously read info).
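A quick way to check this yourself is to pre-fill the buffer with a known byte and look at what survives the call. This is a sketch that relies on how typical implementations behave; the helper name first_untouched is made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills buf with 'X', reads one line into it, and
 * returns the index of the first byte after the terminating '\0',
 * or 0 if fgets() returned NULL. */
size_t first_untouched(FILE *fp, char *buf, size_t size)
{
    memset(buf, 'X', size);
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    return strlen(buf) + 1;     /* one past the terminating '\0' */
}
```

For a line of 10 characters plus '\n' and a 100-byte buffer, the function returns 12, buf[11] holds the '\0', and buf[12] through buf[99] should still hold 'X'. A second, shorter read into the same buffer rewrites only the leading bytes, so the tail of the first line remains visible after the new terminator.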
#define SIZE 100
...
char buf[SIZE];
fgets reads at most SIZE - 1 characters from the given file stream and stores them in buf. On success, the resulting string is always null-terminated. Parsing stops if end-of-file occurs or a newline character is found, in which case buf will contain that newline character. The rest of buf remains unchanged and contains whatever it held earlier; it may be random data or anything left in memory. This is consistent with the C standard quoted below:
From the C11 standard:
7.21.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Description
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after
a new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
Returns
The fgets function returns s if successful. If end-of-file is
encountered and no characters have been read into the array, the
contents of the array remain unchanged and a null pointer is
returned. If a read error occurs during the operation, the array
contents are indeterminate and a null pointer is returned.
Emphasis mine :)
Now the second question:
If execute fgets() twice, will the second input cover the first input to buf?
If by "cover" you mean overwrite, then yes, the second input overwrites the first, at least for the part of the buffer that the second read reaches. You can run this yourself and see.
Assuming the FILE* is valid, consider:
char buf[128];
if(fgets(buf,sizeof buf,myFile) != NULL) {
strlen(buf) == 0; //can this ever be true ? In what cases ?
}
Yes. Besides passing 1 (as noted by Ignacio), fgets doesn't do any special handling for embedded nulls. So if the next character in the FILE * is NUL, strlen will be 0. This is one of the reasons why I prefer the POSIX getline function. It returns the number of characters read so embedded nulls are not a problem.
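The embedded-NUL case is easy to reproduce. In this sketch the helper name line_strlen is hypothetical, and the input is assumed to come from a stream whose first byte is '\0'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line and returns what strlen() reports
 * for it, or -1 if fgets() returned NULL. */
long line_strlen(FILE *fp)
{
    char buf[128];
    if (fgets(buf, sizeof buf, fp) == NULL)
        return -1;
    return (long)strlen(buf);
}
```

If the stream holds the five bytes '\0', 'a', 'b', 'c', '\n', fgets() succeeds and stores all of them, but strlen() stops at the first byte and reports 0.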
From the fgets(3) man page:
DESCRIPTION
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an
EOF or a newline. If a newline is read, it is stored into the buffer.
A '\0' is stored after the last character in the buffer.
...
RETURN VALUE
...
gets() and fgets() return s on success, and NULL on error or when end
of file occurs while no characters have been read.
From that, it can be inferred that a size of 1 will cause it to read an empty string. Experimentation here confirms that.
Incidentally, a size of 0 appears to not modify the buffer at all, not even putting in a \0.
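The size-1 experiment can be repeated with a few lines of C. This sketch only reports what the implementation at hand does (glibc behaves as described above); other C libraries may differ, and the helper name size_one_gives_empty is made up.

```c
#include <stdio.h>

/* Hypothetical helper: calls fgets() with size 1 on a buffer whose first
 * byte is 'X'. Returns 1 if the call succeeded and stored an empty string,
 * 0 otherwise. */
int size_one_gives_empty(FILE *fp)
{
    char buf[4] = "XXX";
    char *r = fgets(buf, 1, fp);
    return r == buf && buf[0] == '\0';
}
```

With room for the terminator only, nothing needs to be consumed from the stream, so the next read should still see the first character of the file.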