Reading files with fgets() - c

I have a doubt about reading files in C using fgets(). I've seen people use loops in order to do this, but I skip the loop part, doing this instead.
What's the difference between using a loop and my way?
#include <stdio.h>
#include <stdlib.h>
int main() {
FILE *file = NULL;
char string[30];
file = fopen("test.txt", "r"); //test.txt contains "Hello world!"
if (file == NULL) {
puts("ERROR");
return 1;
}
fgets(string, 30, file);
puts(string);
fclose(file);
return 0;
}
Outputs:
Hello world!

What's the difference between using a loop and my way?
OP's way has many problems.
Wrong buffer size
fgets(string, 30, file); overstates the buffer size of 10 allowing undefined behavior (UB) due to a potential buffer overflow.
Input result not checked
fgets(string, 30, file); does not check the return value of fgets().
Until the return value is checked, the contents of string are not known to be updated correctly.
Extra '\n'
puts(string); appends an extra '\n'.
The entire file is not necessarily read
A single read might not read the entire contents. Use a loop.
Alternative: read until fgets() returns NULL.
while (fgets(string, sizeof string, file)) {
fputs(string, stdout);
}

From the man page:
The fgets() function shall read bytes from stream into the array pointed to by s until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. A null byte shall be written immediately after the last bytes read into the array.
According to this, fgets stops reading when it encounters a newline, an EOF condition, an input error, or when it has read n - 1 characters. So your approach only reads one line from the file. That's well and good if you need to read only a single line.
To read a whole file line by line, fgets is called in a loop until an EOF condition is reached. Another way would be to read the whole file into a buffer with fread, and then parse it.
Or read it character by character by calling getc in a loop.
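As a rough sketch of the fread alternative mentioned above, the helper below reads a whole stream into one heap buffer that can then be parsed. The name slurp_stream and the 4096-byte starting capacity are assumptions made for illustration, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything left in fp into a malloc'ed,
   null-terminated buffer and stores the byte count in *out_len.
   Returns NULL on a read or allocation failure; the caller frees the result. */
char *slurp_stream(FILE *fp, size_t *out_len)
{
    assert(fp != NULL && out_len != NULL);
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);
    if (data == NULL)
        return NULL;

    for (;;) {
        /* Keep one byte in reserve for the terminating '\0'. */
        size_t got = fread(data + len, 1, cap - len - 1, fp);
        len += got;
        if (len < cap - 1)
            break;                      /* short read: end of file or error */
        char *grown = realloc(data, cap * 2);
        if (grown == NULL) {
            free(data);
            return NULL;
        }
        data = grown;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(data);
        return NULL;
    }
    data[len] = '\0';
    *out_len = len;
    return data;
}
```

A caller would open the file, pass the FILE pointer to slurp_stream, and free the returned buffer when done. The length is returned separately because the data may itself contain '\0' bytes.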
EDIT: In your code, fgets is told the buffer holds 30 bytes, so it may store up to 29 characters plus the terminating null, whereas you allocated only 10 bytes for the buffer. Writing past the end of the array is undefined behaviour. Pass sizeof (string) as the size instead.
"Hello World!" can not fit in a buffer you allocated 10 bytes for.
RETURN VALUE:
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer, and shall set errno to indicate the error.
You didn't check the return value of fgets.

Related

How to get fscanf to stop if it hits a newline? [duplicate]

I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.
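To make the "fgets() then sscanf()" idea concrete, here is a minimal sketch. It assumes, purely for illustration, that each line has the form "name value"; sum_values and that line format are hypothetical and not from the question.

```c
#include <assert.h>
#include <stdio.h>

#define LINESZ 1024

/* Hypothetical helper: sums the integer in lines of the form "name value".
   Lines that do not match are counted in *skipped instead of stopping the loop. */
long sum_values(FILE *fin, int *skipped)
{
    assert(fin != NULL && skipped != NULL);
    char buff[LINESZ];
    char name[64];
    long total = 0;
    int value;

    *skipped = 0;
    while (fgets(buff, sizeof buff, fin) != NULL) {
        /* %63s bounds the write into name; sscanf returns the number of
           fields it converted, so 2 means the whole pattern matched. */
        if (sscanf(buff, "%63s %d", name, &value) == 2)
            total += value;
        else
            (*skipped)++;
    }
    return total;
}
```

Because a failed sscanf only affects the line already held in buff, the stream position stays well defined and the next fgets simply starts at the next line.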
If you want to read a file line by line (here, line separator == '\n'), just do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(4096);
if (buffer == NULL) {
fclose(fp);
return -1;
}
// Read lines until fscanf stops converting; looping on !feof(fp)
// would spin forever if the file started with an empty line
while ((ret = fscanf(fp, "%4095[^\n]\n", buffer)) == 1)
{
// Print line
fprintf(stdout, "%s\n", buffer);
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)
If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.
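Since the field width has to appear literally in the format string, one common workaround is to build the format with preprocessor stringification. The sketch below assumes a limit of 27 as in the example above; MAX_LINE, STR and scan_line are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE 27                 /* illustrative limit, matching the 27 above */
#define STR_(x) #x
#define STR(x) STR_(x)              /* expands MAX_LINE before stringifying it */

/* Hypothetical helper: reads one line (without its newline) into dst,
   which must hold MAX_LINE + 1 bytes. Returns 1 if a line was read,
   0 at end of file. Empty lines come back as "". Characters beyond
   MAX_LINE on a long line are discarded. */
int scan_line(FILE *f, char dst[MAX_LINE + 1])
{
    assert(f != NULL && dst != NULL);
    dst[0] = '\0';
    /* After macro expansion the format is "%27[^\n]". */
    int rc = fscanf(f, "%" STR(MAX_LINE) "[^\n]", dst);
    if (rc == EOF)
        return 0;
    /* Consume the rest of the line, including the newline the scanset left behind. */
    int c;
    while ((c = getc(f)) != EOF && c != '\n')
        ;
    return 1;
}
```

Changing MAX_LINE updates both the buffer size and the width in the format, so the two cannot drift apart.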
Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.
It looks to me like you're trying to use regex operators in your fscanf string. The [^\n\r] part is not a regex: %[ is fscanf's scanset conversion, which here matches a run of one or more characters that are not \n or \r. The trailing s is not part of that conversion; it is a literal s that fscanf then tries to match, which is part of why your code doesn't work as expected.
Furthermore, fscanf() doesn't return EOF if the item doesn't match. Rather, it returns the number of items successfully assigned. EOF is only returned if the end of the stream or an error is reached before any conversion. So what's happening in your case is that the first call reads the first line and stops at the newline, which it leaves in the stream. Every later call sees that newline immediately, matches nothing, and returns 0, which is not EOF, so the loop keeps running with the first line still in cLine.
Finally, note that the %s scanf format operator skips leading whitespace and only captures to the next whitespace character, so it never stores \n or \r, but it also stops at spaces and so reads words rather than whole lines.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reads past the end of file or if the file handle has an error. You need to distinguish a valid return of zero, in which case there is no new content in the buffer cLine, from a successful read.
You do have a problem when a failure to match occurs because it is difficult to predict where the file handle is now pointing in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error that in many cases can be exploited by an attacker to run arbitrary code of the attacker's choosing.
Unless you specifically requested that f be opened in binary mode, line ending translation will happen in the library, so you will generally never see CR characters when reading native text files.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting at cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.
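The points above can be folded into one small wrapper. This is only a sketch under assumed conventions: read_line and its return codes are hypothetical, chosen here to show how the retained newline separates a complete line from a truncated one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a library function.
   Returns  1 if a complete line was read (newline stripped),
            2 if the buffer filled before a newline was seen,
            0 at end of file with nothing read,
           -1 on a read error. */
int read_line(FILE *f, char *buf, int size)
{
    assert(f != NULL && buf != NULL && size >= 2);
    if (fgets(buf, size, f) == NULL)
        return ferror(f) ? -1 : 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No newline: either the line is longer than the buffer, or this is
       the last line of a file that does not end in a newline. */
    return feof(f) ? 1 : 2;
}
```

When the function returns 2, the next call continues with the rest of the same line, so a caller can either concatenate the pieces or skip them.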
I think the problem with this code is that when you read with %[^\n\r]s, you read up to the '\n' or '\r' but do not consume the '\n' or '\r' itself.
So you need to consume that character before calling fscanf again in the loop.
Do something like this:
do {
    if (fscanf(f, "%[^\n\r]", cLine) == 1) {
        /* Do something here */
    }
} while (fgetc(f) != EOF);

does the fgets() function append the \n\0 characters exceeding the maximum length?

May seem like a silly question for most of you, but I'm still trying to determine the final answer. Some hours ago I decided to replace all the scanf() functions in my project with the fgets() in order to get a more robust code.
I learned that the fgets() automatically ends the inserted input string with the '\n' and the NUL characters but..
let's say I have something like this:
char user[16];
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator).
The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
does the '\0' get stored in the following block of memory?
(no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
As a complement to 5gon12eder answer. I assume you have something like :
char user[16];
fgets(user, 16, stdin);
and your input is abcdefghijklmno\n , that is 15 characters and a newline.
fgets will put in user the 15 (16-1) first characters of the input followed by a null and you will effectively get "abcdefghijklmno", which is what you want.
But ... the \n still remains in the stream buffer and is actually available for the next read (be it a fgets or anything else) on the same FILE. More exactly, until you do another fgets you cannot know whether there were other characters following the o.
As @5gon12eder suggests, use:
char user[16];
fgets(user, sizeof user, stdin);
// Function prototype for reference
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Now for details:
The '\n' and the '\0' are not automatically appended. Only the '\0' is automatically appended. fgets() will stop reading once it gets a '\n', but will stop for other reasons too including a full buffer. In those cases, there is no '\n' before the '\0'.
fgets() does not read a C string, but reads a line. The input stream is typically in text mode and then end-of-line translations occur. On some systems, '\r', '\n' pair will translate to '\n'. On others, it will not. Usually the files being read match this translation, but exceptions occur. In binary mode, no translations occur.
fgets() does not stop at a '\0' in the input: it stores it and continues reading. Thus strlen(buf) does not always reflect the true number of char read. There may be a foolproof method to determine the true number of char read when '\0' are in the middle, but it is likely easier to code with fread() or fgetc().
On EOF condition (and no data read) or IO error, fgets() returns NULL. When an I/O error occurs, the contents of the buffer is not defined.
Pedantic issue: The C standard uses a type of int as the size of the buffer but often code passes a variable of type size_t. A size n less than 1 or more than INT_MAX can be a problem. A size of 1 should do nothing more than fill the buf[0] = '\0', but some systems behave differently especially if the EOF condition is near or passed. But as long as 2 <= n <= INT_MAX, a terminating '\0' can be expected. Note: fgets() may return NULL when the size is too small.
Code typically likes to delete the terminating '\n' with something that could cause trouble. Suggest:
char buf[80];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
// IMO potential UB and undesired behavior
// buf[strlen(buf)-1] = '\0';
// Suggested end-of-line deleter
size_t len = strlen(buf);
if (len > 0 && buf[len - 1] == '\n') buf[--len] = '\0';
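Another widely used way to drop the newline, shown here as a sketch, relies on strcspn(), which returns the index of the first '\n' or the string length if there is none. The wrapper name chomp is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: cuts the string at its first '\n', if any,
   and returns the resulting length. Safe on an empty string. */
size_t chomp(char *s)
{
    assert(s != NULL);
    size_t len = strcspn(s, "\n");  /* index of first '\n', or strlen(s) if none */
    s[len] = '\0';
    return len;
}
```

Unlike buf[strlen(buf)-1] = '\0', this never indexes before the start of the array and never removes a character that is not a newline.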
Robust code checks the return value from fgets(). The following approach has shortcomings: 1) If an I/O error occurred, the buffer contents are not defined, so checking the buffer contents will not provide reliable results. 2) A '\0' may have been the first char read while the file is not in the EOF condition.
// Following is weak code.
buf[0] = '\0';
fgets(buf, sizeof buf, stdin);
if (strlen(buf) == 0) Handle_EOF();
// Robust, but too much for code snippets
if (fgets(buf, sizeof buf, stdin) == NULL) {
if (ferror(stdin)) Handle_IOError();
else if (feof(stdin)) Handle_EOF();
else if (sizeof buf <= 1) Handle_too_small_buffer(); // pedantic check
else Hmmmmmmm();
}
Documentation of fgets from the C99 Standard (N1256)
7.19.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Coming to your post, you said:
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator). The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
For such a case, the newline character is not read until the next call to fgets or any other call to read from the stream.
does the '\0' get stored in the following block of memory? (no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
The terminating null character is always set. In your case, the 16-th character will be the terminating null character.
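One possible way to handle the newline that stays behind after a 15-character name is sketched below. read_username and its return codes are hypothetical; the sketch assumes that a name filling the buffer exactly is acceptable and that anything longer should be rejected and discarded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into user (newline removed).
   Returns 1 on success, 0 if the line had more than size - 1 characters
   (the excess is discarded), and -1 at end of file or on error. */
int read_username(FILE *in, char *user, int size)
{
    assert(in != NULL && user != NULL && size >= 2);
    if (fgets(user, size, in) == NULL)
        return -1;

    size_t len = strlen(user);
    if (len > 0 && user[len - 1] == '\n') {
        user[len - 1] = '\0';       /* the whole line fit, newline included */
        return 1;
    }

    /* The newline is still in the stream. Consume up to and including it,
       noting whether anything other than the newline was left over. */
    int extra = 0, c;
    while ((c = getc(in)) != EOF && c != '\n')
        extra = 1;
    return extra ? 0 : 1;
}
```

With char user[16], a 15-character name is stored in full, the leftover newline is consumed, and the next read starts cleanly on the following line.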
From the man page of fgets:
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
I think that is pretty clear, isn't it?

Reading line by line in C

Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
//do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
From the standards §7.19.7.2,
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So, yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, then it is better to go with malloc and, if needed, reallocing the memory as per your needs.
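A minimal sketch of the fgets-plus-realloc approach follows. The name read_long_line and the 128-byte starting capacity are assumptions, and the sketch further assumes lines stay well below INT_MAX bytes, since fgets takes its size as an int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the next line of fp, of any length, in a
   malloc'ed buffer (newline retained, as with fgets). Returns NULL at end
   of file, on a read error, or if memory runs out. The caller frees it. */
char *read_long_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 128, len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* Each fgets call appends at line + len, into the space still free. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;                    /* got the whole line */
        if (len < cap - 1)
            return line;                    /* last line, no trailing newline */
        char *grown = realloc(line, cap * 2);   /* buffer full: double it */
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        cap *= 2;
    }
    if (len > 0 && !ferror(fp))
        return line;                        /* data read before hitting end of file */
    free(line);
    return NULL;
}
```

A caller loops until the function returns NULL and frees each line after use. On POSIX systems getline() does the same job without custom code.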

Why does this code not specify which element of the array is accessed?

#include <stdio.h>
#include <stdlib.h>
FILE *fptr;
main()
{
char fileLine[100];
fptr = fopen("C:\\Users\\user\\Desktop\\Summary.h", "r");
if (fptr != 0){
while (!feof(fptr)){
fgets(fileLine, 100, fptr); // << not specified like fileLine[1] ?
if (!feof(fptr)){
puts(fileLine); // The same thing ?
}
}
}
else
{
printf("\nErorr opening file.\n");
}
fclose(fptr);
return 0;
}
What confuses me is this: why is no array element specified, and how does the array hold the lines?
char fileLine[100];
This is not an array of lines, it's an array of characters. One char represents one character (or more precisely one byte). The declaration char fileLine[100] makes it an array of 100 characters. C doesn't have distinct types for strings and for arrays of characters: a string (such as the content of a line) is just an array of characters, with a null byte after the last character.
At each run through the loop, fileLine contains the line that is read by fgets. That string is printed out by puts. Each call to fgets overwrites the line that was previously stored in the string.
Note that since fgets retains the newline character that terminates each line, and puts adds a newline after printing the string, you will get double-spaced output. If a line is more than 99 characters long (strictly speaking, again, more than 99 bytes long), you'll get a line break after each block of 99 characters.
If you wanted to store all the lines, you'd need an array of strings, i.e. an array of arrays of characters.
char fileLines[42][100];
int i = 0;
while (!feof(fptr)) {
fgets(fileLines[i], 100, fptr);
++i;
}
/* i-1 lines have been read, from fileLines[0] to fileLines[i-2] */
The way you're using feof is quite awkward there. feof tells you whether the last attempt to read reached the end of the file, not whether the next attempt to read would reach the end of the file. For example, here, after the last line has been read, feof() is false (because the program doesn't know yet that this is the last line, it has to attempt to read more); then fgets runs again, and returns NULL because it couldn't read anything. Nonetheless i is incremented; and after that feof() returns true, which terminates the loop. Thus i ends up being one plus the number of lines read.
While you can fix this here by decrementing i, the way that actually works even in real-life programs, and that also makes more sense, is to test the result of fgets. You know that you've reached the end of the file because fgets is unable to read a line.
char fileLines[42][100];
int i = 0;
while (fgets(fileLines[i], 100, fptr)) {
++i;
}
/* i lines have been read, from fileLines[0] to fileLines[i-1] */
(This is a toy example, real-life code would need dynamic memory management and error checks for long lines, too many lines, and read errors.)
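As a small step toward those checks, the sketch below adds only the "too many lines" guard; load_lines and LINE_LEN are illustrative names, and long lines are still split across rows as described earlier.

```c
#include <assert.h>
#include <stdio.h>

#define LINE_LEN 100

/* Hypothetical helper: fills lines[] from fp and returns how many rows
   were stored, never more than max_lines. */
int load_lines(FILE *fp, char lines[][LINE_LEN], int max_lines)
{
    assert(fp != NULL && lines != NULL && max_lines >= 0);
    int i = 0;
    /* Test the bound first so fgets is never handed a row past the end. */
    while (i < max_lines && fgets(lines[i], LINE_LEN, fp) != NULL)
        ++i;
    return i;
}
```

If the function returns max_lines, the file may contain more lines than were stored, and the caller can decide whether to keep reading into another array or report the overflow.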
The array of characters that is fileLine is treated as a string.

Can fgets ever read an empty string?

Assuming the FILE* is valid, consider:
char buf[128];
if(fgets(buf,sizeof buf,myFile) != NULL) {
strlen(buf) == 0; //can this ever be true ? In what cases ?
}
Yes. Besides passing 1 (as noted by Ignacio), fgets doesn't do any special handling for embedded nulls. So if the next character in the FILE * is NUL, strlen will be 0. This is one of the reasons why I prefer the POSIX getline function. It returns the number of characters read so embedded nulls are not a problem.
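Where getline is not available, a length-returning reader can be approximated in standard C with fgetc. The following is a sketch only; read_line_counted is a hypothetical name and it uses a fixed-size buffer like fgets does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: like fgets, but returns the number of bytes stored
   (not counting the terminating '\0'), so embedded '\0' bytes are counted.
   Returns 0 only when nothing could be read (end of file or error). */
size_t read_line_counted(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size >= 2);
    size_t n = 0;
    int c;
    while (n + 1 < size && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}
```

Because the return value is a count rather than a pointer, a line that begins with a '\0' byte yields a nonzero result even though strlen(buf) would be 0.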
From the fgets(3) man page:
DESCRIPTION
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an
EOF or a newline. If a newline is read, it is stored into the buffer.
A '\0' is stored after the last character in the buffer.
...
RETURN VALUE
...
gets() and fgets() return s on success, and NULL on error or when end
of file occurs while no characters have been read.
From that, it can be inferred that a size of 1 will cause it to read an empty string. Experimentation here confirms that.
Incidentally, a size of 0 appears to not modify the buffer at all, not even putting in a \0.

Resources