I want to know what will be in the remaining part of buf after fgets() has been executed. For example:
char buf[100];
fgets(buf, sizeof(buf), fp);
If one line has just 10 characters + '\n', then what will be in the remaining part of buf (from buf[12] to buf[99])?
If I execute fgets() twice, will the second input cover the first input in buf?
When fgets reads data it changes one element of the buffer at a time (simplifying assumption) until it reaches the limit or it finds a terminator in the input. All other elements in the buffer remain the same as before calling fgets (thus, they might have random data or they might leak previously read info).
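One way to check this claim is to fill the buffer with a known byte before calling fgets and inspect it afterwards. The sketch below does that; the helper name bytes_touched_by_fgets and the use of tmpfile() as a stand-in for fp are assumptions made for illustration, and the "untouched tail" result is what common implementations do rather than something spelled out in the standard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills buf with '#', reads one line of `text` through
   fgets(), and returns how many bytes fgets() stored (0 on failure). */
size_t bytes_touched_by_fgets(const char *text, char *buf, size_t size)
{
    assert(text != NULL && buf != NULL && size >= 2);

    FILE *fp = tmpfile();               /* scratch stream standing in for fp */
    if (fp == NULL)
        return 0;
    fputs(text, fp);
    rewind(fp);

    memset(buf, '#', size);             /* known "old" contents */
    char *ok = fgets(buf, (int)size, fp);
    fclose(fp);
    if (ok == NULL)
        return 0;
    return strlen(buf) + 1;             /* characters read plus the '\0' */
}
```

For a 100-byte buffer and a line of 10 characters plus '\n', the helper should report 12 stored bytes, with buf[11] holding the '\0' and buf[12] through buf[99] still holding '#'.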
#define SIZE 100
...
char buf[SIZE];
fgets reads at most SIZE - 1 characters from the given file stream and stores them in buf. The resulting character string is always null-terminated. Parsing stops if end-of-file occurs or a newline character is found, in which case buf will contain that newline character. The rest of the buffer remains unchanged and contains whatever it held earlier, which may be random data or anything left in memory. This is backed by the C standard below:
From the C11 standard:-
7.21.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Description
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after
a new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
Returns
The fgets function returns s if successful. If end-of-file is
encountered and no characters have been read into the array, the
contents of the array remain unchanged and a null pointer is
returned. If a read error occurs during the operation, the array
contents are indeterminate and a null pointer is returned.
Emphasis mine :)
Now the second question :-
If execute fgets() twice, will the second input cover the first input to buf?
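If by "cover" you mean overwrite, then yes: the second call writes into buf starting at buf[0] again, so it overwrites the first input. It only overwrites as many elements as it stores, though (the characters read plus the terminating null), so if the second line is shorter, the tail of the first line is still there after the new terminator. You can run this yourself and see.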
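A small experiment along these lines, as a sketch only: the helper read_two_lines is a made-up name, and tmpfile() is used here in place of a real input file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the first two lines of `text` into the same
   buffer with two fgets() calls. Returns 1 on success, 0 on failure. */
int read_two_lines(const char *text, char *buf, size_t size)
{
    assert(text != NULL && buf != NULL && size >= 2);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;
    fputs(text, fp);
    rewind(fp);

    int ok = fgets(buf, (int)size, fp) != NULL    /* first line */
          && fgets(buf, (int)size, fp) != NULL;   /* second line, same buffer */
    fclose(fp);
    return ok;
}
```

With the input "hello world\nhi\n", buf reads as "hi\n" afterwards, and the bytes after its terminator should still spell "o world\n", left over from the first call.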
I am learning fgets from the manpage. I did some tests on fgets to make sure I understand it. One of the tests I did results in behaviour contrary to what is specified in the man page. The man page says:
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
But it doesn't "read in at most one less than size characters from stream". As demonstrated by the following program:
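#include <stdio.h>
#include <stdlib.h>
int main(void) {
    FILE *fp = fopen("sample", "r");
    char *s = calloc(50, sizeof(char));
    if (fp == NULL || s == NULL) return 1;  /* bail out if setup failed */
    /* size 2: each call stores one character plus the '\0' */
    while (fgets(s, 2, fp) != NULL) printf("%s", s);
    fclose(fp);
    free(s);
}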
The sample file:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The output of the compiled binary:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The expected output according to the man page:
t
j
a
4
5
Is the man page wrong, or am I missing something?
Is the man page wrong or am I missing something?
I won't say that the man page is wrong, but it could be clearer.
There are 3 things that may stop fgets from reading from the stream.
The buffer is full (i.e. only room left for the termination character)
A newline character was read from the stream
End-Of-File occurred
The quoted man page only mentions two of those conditions clearly.
Reading stops after an EOF or a newline.
That is, #2 and #3 are mentioned very explicitly, while #1 is (kind of) derived from
reads in at most one less than size characters from stream
Here is another description from https://man7.org/linux/man-pages/man3/fgets.3p.html
... read bytes from stream into the array pointed to by s until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered.
where the 3 cases are clearly mentioned.
But yes... you are missing something. Once the buffer gets full, the rest of the current line is neither read nor discarded. It stays in the stream and is available for the next read, so nothing is lost. You just need more fgets calls to read all the data.
As suggested in a number of comments (e.g. Fe2O3 and Lundin) you can see this if you change the print statement so that it includes a delimiter of some kind. For instance (from Lundin):
printf("|%s|",s);
This will make clear exactly what you got from the individual fgets calls.
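To see the effect without a sample file, the delimiter idea can be wrapped in a function. This is only a sketch: trace_fgets is a hypothetical name, the 64-byte scratch buffer is an arbitrary choice, and tmpfile() stands in for the real file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads `text` with fgets(s, n, fp) and appends each
   returned piece to `out`, wrapped in '|' delimiters. Returns the number of
   successful fgets() calls, or -1 on failure. */
int trace_fgets(const char *text, int n, char *out, size_t out_size)
{
    char s[64];
    int calls = 0;

    assert(text != NULL && out != NULL);
    if (n < 2 || n > (int)sizeof s || out_size == 0)
        return -1;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);

    out[0] = '\0';
    while (fgets(s, n, fp) != NULL) {
        size_t used = strlen(out);
        if (used + strlen(s) + 3 > out_size)    /* two '|' plus the '\0' */
            break;
        snprintf(out + used, out_size - used, "|%s|", s);
        calls++;
    }
    fclose(fp);
    return calls;
}
```

For the text "ab\ncd\n", a size of 2 takes six calls (one character each, newlines included), while a size of 50 takes two calls, one per line. Printed back to back, the six one-character pieces look exactly like the original file, which is why the original program's output was misleading.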
The provided quote clearly says:
If a newline is read, it is stored into the buffer.
Where do you see that this call fgets(s,2,fp) reads the new line character for example when reading this line?
thiis is line no. 1
The line contains only one new line character at its end.
Each call reads only one character from the line, and that character is followed by the terminating zero character '\0'.
So the read strings look like
{ 't', '\0' }
{ 'h', '\0' }
{ 'i', '\0' }
// ...
{ '1', '\0' }
{ '\n', '\0' }
If you have a call of fgets like that
fgets(s,n,fp)
then at most n-1 characters are read from the input stream. One character is reserved for the terminating zero character '\0' to build a string.
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array
I have a program in C, which reads in a file with buffers:
char buffer[65536];
while(fgets(buffer,65536,inputFile)){...}
Now I have some questions:
What happens if the filesize is smaller than buffersize?
Does fgets() overwrite the full buffer, so if previous buffer (buffer1) was full, and the next one (buffer2) for example size 10 isn't, will the buffer still have the chars of buffer1 in it for example at buffer2[20] = buffer1[20] because it didn't overwrite?
How to know how many chars are inside the buffer, so you can backward loop the buffer
To answer your questions, you must first understand fgets(3). It reads one line at a time: fgets() returns after a newline or EOF is reached, or once the buffer is full, and a '\0' is appended to terminate the string. My answers to your questions are as follows:
Normally fgets() reads one line at a time, not the entire file. In your code, each call reads at most 65535 characters plus the terminating '\0', which makes 65536; that is unusually large for a line buffer. To read a whole text file, you normally put fgets() in a for () or while () loop and continue until it returns NULL (EOF or an error).
If a line is longer than the buffer, fgets() reads only buffer length - 1 chars and then appends '\0'; the rest of the line is returned by the following calls.
strlen(3) will give you the length of the string.
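As an illustration of the last point, here is a sketch that uses strlen() after each fgets() call and then walks the buffer backward. The helper reverse_lines and its 256-byte line buffer are assumptions for the example, and it assumes the input contains no embedded '\0' bytes; lines longer than 255 characters would be reversed piece by piece.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads `text` line by line and writes each line to
   `out` with its characters in reverse order (the newline stays at the end).
   Returns the number of lines read, or -1 on failure. */
int reverse_lines(const char *text, char *out, size_t out_size)
{
    char buffer[256];
    size_t used = 0;
    int lines = 0;

    assert(text != NULL && out != NULL);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        size_t len = strlen(buffer);            /* chars stored by this call */
        int had_newline = len > 0 && buffer[len - 1] == '\n';
        if (had_newline)
            len--;                              /* do not reverse the '\n' */
        if (used + len + 2 > out_size)          /* room for '\n' and '\0' */
            break;
        for (size_t i = len; i > 0; i--)        /* backward loop */
            out[used++] = buffer[i - 1];
        if (had_newline)
            out[used++] = '\n';
        lines++;
    }
    fclose(fp);
    if (out_size > 0)
        out[used] = '\0';
    return lines;
}
```

Because the backward loop is bounded by strlen(), it never touches stale bytes left in the buffer by an earlier, longer line.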
As the comments suggest, the man page can help here, or Google. From cppreference:
Reads at most count - 1 characters from the given file stream and stores them in the character array pointed to by str. Parsing stops if end-of-file occurs or a newline character is found, in which case str will contain that newline character. If no errors occur, writes a null character at the position immediately after the last character written to str.
and also:
Return value: str on success, null pointer on failure.
Basically, the count argument is the maximum number of chars to read plus one for the terminator. If a line is longer than that, each call gives you only part of it. In any case the buffer has a null terminator and you can keep reading until EOF.
May seem like a silly question for most of you, but I'm still trying to determine the final answer. Some hours ago I decided to replace all the scanf() functions in my project with the fgets() in order to get a more robust code.
I learned that the fgets() automatically ends the inserted input string with the '\n' and the NUL characters but..
let's say I have something like this:
char user[16];
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator).
The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
does the '\0' get stored in the following block of memory?
(no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
As a complement to 5gon12eder answer. I assume you have something like :
char user[16];
fgets(user, 16, stdin);
and your input is abcdefghijklmno\n , that is 15 characters and a newline.
fgets will put in user the 15 (16-1) first characters of the input followed by a null and you will effectively get "abcdefghijklmno", which is what you want
But ... the \n still remains in the stream buffer and is actually available for the next read (be it a fgets or anything else) on the same FILE. More exactly, until you do another fgets you cannot know whether there were other characters following the o.
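The leftover newline can be made visible with a second fgets call on the same stream. In this sketch the helper read_username_twice is a hypothetical name and tmpfile() plays the role of stdin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: feeds `input` to two consecutive fgets() calls that
   both use a size of 16, storing the results in `first` and `second`.
   Returns 1 if the first call succeeded, 0 otherwise. */
int read_username_twice(const char *input, char first[16], char second[16])
{
    assert(input != NULL && first != NULL && second != NULL);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;
    fputs(input, fp);
    rewind(fp);

    first[0] = second[0] = '\0';
    int ok = fgets(first, 16, fp) != NULL;
    if (ok)
        fgets(second, 16, fp);      /* picks up whatever the first call left */
    fclose(fp);
    return ok;
}
```

For the input abcdefghijklmno followed by a newline, first receives the 15 letters and second receives just "\n".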
As @5gon12eder suggests, use:
char user[16];
fgets(user, sizeof user, stdin);
// Function prototype for reference
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Now for details:
The '\n' and the '\0' are not automatically appended. Only the '\0' is automatically appended. fgets() will stop reading once it gets a '\n', but will stop for other reasons too including a full buffer. In those cases, there is no '\n' before the '\0'.
fgets() does not read a C string, but reads a line. The input stream is typically in text mode and then end-of-line translations occur. On some systems, '\r', '\n' pair will translate to '\n'. On others, it will not. Usually the files being read match this translation, but exceptions occur. In binary mode, no translations occur.
fgets() reads in '\0' and continues reading. Thus strlen(buf) does not always reflect the true number of char read. There may be a foolproof method to determine the true number of char read when '\0' are in the middle, but it is likely easier to code with fread() or fgetc().
On EOF condition (and no data read) or IO error, fgets() returns NULL. When an I/O error occurs, the contents of the buffer is not defined.
Pedantic issue: The C standard uses a type of int as the size of the buffer but often code passes a variable of type size_t. A size n less than 1 or more than INT_MAX can be a problem. A size of 1 should do nothing more than fill the buf[0] = '\0', but some systems behave differently especially if the EOF condition is near or passed. But as long as 2 <= n <= INT_MAX, a terminating '\0' can be expected. Note: fgets() may return NULL when the size is too small.
Code typically likes to delete the terminating '\n' with something that could cause trouble. Suggest:
char buf[80];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
// IMO potential UB and undesired behavior
// buf[strlen(buf)-1] = '\0';
// Suggested end-of-line deleter
size_t len = strlen(buf);
if (len > 0 && buf[len - 1] == '\n') buf[--len] = '\0';
Robust code checks the return value from fgets(). The following approach has shortcomings: 1) If an I/O error occurred, the buffer contents are not defined, so checking the buffer contents will not provide reliable results. 2) A '\0' may have been the first char read even though the file is not in the EOF condition.
// Following is weak code.
buf[0] = '\0';
fgets(buf, sizeof buf, stdin);
if (strlen(buf) == 0) Handle_EOF();
// Robust, but too much for code snippets
if (fgets(buf, sizeof buf, stdin) == NULL) {
if (ferror(stdin)) Handle_IOError();
else if (feof(stdin)) Handle_EOF();
else if (sizeof buf <= 1) Handle_too_small_buffer(); // pedantic check
else Hmmmmmmm();
}
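The points above can be folded into one small wrapper. This is a sketch, not a standard API: the enum line_status and the function read_line are names invented here, and it assumes the input has no embedded '\0' bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
enum line_status { LINE_OK, LINE_TRUNCATED, LINE_EOF, LINE_ERROR };

/* Reads one line into buf, strips the trailing '\n' if there is one, and
   reports what happened. LINE_TRUNCATED means no newline was stored: either
   the buffer filled up first or the last line of the file had no newline. */
enum line_status read_line(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL);
    if (size < 2)
        return LINE_ERROR;                  /* too small to hold any data */

    if (fgets(buf, size, fp) == NULL)       /* decide from the return value */
        return ferror(fp) ? LINE_ERROR : LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';                /* safe end-of-line deleter */
        return LINE_OK;
    }
    return LINE_TRUNCATED;
}
```

The caller decides what to do with LINE_TRUNCATED: keep calling to collect the rest of the line, or treat an over-long line as invalid input.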
Documentation of fgets from the C99 Standard (N1256)
7.19.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Coming to your post, you said:
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator). The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
For such a case, the newline character is not read until the next call to fgets or any other call to read from the stream.
does the '\0' get stored in the following block of memory? (no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
The terminating null character is always set. In your case, the 16-th character will be the terminating null character.
From the man page of fgets:
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
I think that is pretty clear, isn't it?
Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
//do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
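A minimal sketch of such a home-grown reader, using only fgets and realloc. The name read_whole_line and the starting capacity of 16 are assumptions for illustration; the sketch also assumes lines contain no embedded '\0' and stay below INT_MAX bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the next line of fp in a malloc'd string
   (newline included if the line had one), or NULL on EOF, read error or
   allocation failure. The caller frees the result. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 16, len = 0;               /* small start to exercise growth */
    assert(fp != NULL);

    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* Each call continues where the previous piece ended. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;                    /* got the whole line */
        if (len + 1 == cap) {               /* buffer full: grow and go on */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
    }
    if (len > 0 && !ferror(fp))
        return line;                        /* last line had no '\n' */
    free(line);
    return NULL;
}
```

The '\n' check is the same "did I get a whole line" test described above; where getline() is available it does this bookkeeping for you.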
From the standards §7.19.7.2,
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, it is better to allocate the buffer with malloc and, if needed, grow it with realloc.
Let's say I have a file containing a string of length 10:
abcdefghij
And read my text file line by line using fgets() with a buffer of size 4.
I will have to call fgets() in a loop to make sure the whole line is read.
But at, say, the first call, it will read the first 3 characters (buffer size - 1), right? Will it also append a null terminating character at the last position of my 4 char buffer even if the real end of my string wasn't reached, or will it fill the buffer with 4 characters and no null terminating character at the end, which would make a call to strlen() impossible?
Thank you :)
It is in the documentation.
Verbatim from man fgets (italics by me):
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
Verbatim from the HP-UX 11 man page (italics by me):
fgets()
Reads characters from the stream into the array pointed
to by s, until n-1 characters are read, a new-line
character is read and transferred to s, or an end-of-
file condition is encountered. The string is then
terminated with a null character.
From the POSIX specs:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
Last but not least, from MSDN:
The fgets function reads a string from the input stream argument and stores it in str. fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The result stored in str is appended with a null character. The newline character, if read, is included in the string.
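The quoted documentation can be checked against the 10-character example from the question. In this sketch the helper piece_lengths is a made-up name and tmpfile() stands in for the text file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads `text` with a 4-byte buffer and records the
   strlen() of every piece fgets() returns. Returns the number of pieces
   (at most `max`), or -1 on failure. */
int piece_lengths(const char *text, size_t *lengths, int max)
{
    char buf[4];
    int pieces = 0;

    assert(text != NULL && lengths != NULL);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);

    while (pieces < max && fgets(buf, sizeof buf, fp) != NULL)
        lengths[pieces++] = strlen(buf);    /* buf is terminated every time */
    fclose(fp);
    return pieces;
}
```

For "abcdefghij" this yields four pieces of lengths 3, 3, 3 and 1, so strlen() is usable after every successful call.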
Looking at the man page of fgets you can see :
"The newline, if any, is retained. If any characters are read and
there is no error, a `\0' character is appended to end the string"
(http://www.manpagez.com/man/3/fgets/)
From C standard draft (2010) n1547.pdf, 7.21.7.2.2:
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
I used fseek (which moves the read position in the file).
The program looked like this (pseudo-code):
read string[default length value];
while string filled the whole buffer and does not end with '\n'
{
resize string;
move back to the line start with fseek;
read string;
}
I used malloc and realloc to resize the string (which is a pointer).
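One possible C rendering of this re-read approach follows. It is a sketch under assumptions, not the poster's actual code: read_line_with_fseek is a hypothetical name, the starting capacity of 16 is arbitrary, and the stream must be seekable (a regular file, not a pipe or terminal).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the next line of a seekable stream into a
   malloc'd string, re-reading from the line start with a larger buffer
   whenever the line did not fit. Returns NULL on EOF or failure. */
char *read_line_with_fseek(FILE *fp)
{
    assert(fp != NULL);
    long start = ftell(fp);                 /* remember where the line begins */
    size_t cap = 16;
    char *line = NULL;
    if (start < 0)
        return NULL;

    for (;;) {
        char *bigger = realloc(line, cap);  /* resize string */
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;

        if (fseek(fp, start, SEEK_SET) != 0 ||      /* move back */
            fgets(line, (int)cap, fp) == NULL) {    /* read string */
            free(line);
            return NULL;
        }
        size_t len = strlen(line);
        if (len < cap - 1 || line[len - 1] == '\n')
            return line;                    /* the whole line fit */
        cap *= 2;                           /* buffer was full: retry larger */
    }
}
```

Compared with appending to the buffer across fgets calls, this re-reads the start of long lines several times, which costs extra I/O but keeps the loop close to the pseudo-code.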