Using fgets to read strings from file in C

I am trying to read strings from a file that has each string on a new line but I think it reads a newline character once instead of a string and I don't know why. If I'm going about reading strings the wrong way please correct me.
i=0;
F1 = fopen("alg.txt", "r");
F2 = fopen("tul.txt", "w");
if(!feof(F1)) {
do{ //start scanning file
fgets(inimene[i].Enimi, 20, F1);
fgets(inimene[i].Pnimi, 20, F1);
fgets(inimene[i].Kood, 12, F1);
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s",i,inimene[i].Enimi,inimene[i].Pnimi,inimene[i].Kood);
i++;}
while(!feof(F1));};
/*finish getting structs*/
The printf is there to let me see what was read into what and here is the result
i=0
Enimi=peter
Pnimi=pupkin
Kood=223456iatb i=1
Enimi=
Pnimi=masha
Kaad=gubkina
i=2
Enimi=234567iasb
Pnimi=sasha
Kood=dudkina
As you can see, after the first struct is read there is a blank (a newline?) once and then everything is shifted. I suppose I could read a dummy string to absorb that extra blank and then nothing would be shifted, but that doesn't help me understand the problem and avoid it in the future.
Edit 1: I know that fgets stops at a newline character but still reads it. What I'm wondering is why it doesn't read the newline during the third read and instead passes it on to the fourth read, rather than giving the fourth string the fourth line of the source, and why this happens just once.
The file is formatted like this by the way
peter
pupkin
223456iatb
masha
gubkina
234567iasb
sasha
dudkina
123456iasb

fgets stops reading when it reads a newline, but the newline is considered a valid character and is included in the returned string.
If you want to remove it, you'll need to trim it yourself:
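size_t length = strlen(str);
if (length > 0 && str[length - 1] == '\n')
    str[length - 1] = '\0';
Where str is the string into which you read the data from the file. The length > 0 check guards against an empty string, for which str[length - 1] would index before the start of the array.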
To answer the edit to the question: the reason the newline is not read during the third read is because you are not reading enough characters. You give fgets a limit of 12 characters, which means it can actually read a maximum of 11 characters since it has to add the null terminator to the end.
The line you read is 11 characters in length before the newline. Note that there is a space at the end of that line when you output it:
Kood=223456iatb i=1
               ^
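To see the effect in isolation, here is a minimal sketch. The helper name read_two_fields is made up for illustration; it performs the same two consecutive fgets calls that the question makes for Kood (limit 12) and the next record's Enimi (limit 20).

```c
#include <stdio.h>

/* Hypothetical helper: two back-to-back reads, mirroring Kood (12 bytes)
   followed by the next record's Enimi (20 bytes).
   Returns 1 if both reads succeed, 0 otherwise. */
int read_two_fields(FILE *fp, char kood[12], char enimi[20])
{
    /* Stores at most 11 characters. If the line has 11 characters before
       its newline, the newline is left behind in the stream. */
    if (fgets(kood, 12, fp) == NULL)
        return 0;

    /* This call then sees only the leftover "\n" and returns at once. */
    if (fgets(enimi, 20, fp) == NULL)
        return 0;

    return 1;
}
```

Fed the lines "223456iatb " (note the trailing space) and "masha", the first buffer ends up holding the 11 characters without a newline and the second holds just "\n", which is the empty Enimi seen in the output.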

As already stated, if there's enough room in the buffer, then fgets() reads the data including the newline into the buffer and null terminates the line. If there isn't enough room in the buffer before coming across the newline, fgets() copies what it can (the length of the buffer minus one byte) and null terminates the string. The library resumes reading from where fgets() left off on the next iteration.
Don't mess with buffers smaller than 2 bytes long.
Note that gets() removes the newline (but does not protect you from buffer overflows, so do not use it). gets() was removed from the C standard in C11; many C libraries still provide it as a non-standard (ex-standard) function available for abuse.
Your code should check each of the fgets() function calls:
while (fgets(inimene[i].Enimi, 20, F1) != 0 &&
fgets(inimene[i].Pnimi, 20, F1) != 0 &&
fgets(inimene[i].Kood, 12, F1) != 0)
{
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s", i, inimene[i].Enimi, inimene[i].Pnimi, inimene[i].Kood);
i++;
}
There are places for do/while loops; they are not used very often, though.

The fgets function reads the newline character as part of the string.
From the description of fgets:
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.

If Enimi/Pnimi/Kood are arrays, not pointers (so that sizeof gives the buffer size):
while( fgets(inimene[i].Enimi,sizeof inimene[i].Enimi,F1) &&
fgets(inimene[i].Pnimi,sizeof inimene[i].Pnimi,F1) &&
fgets(inimene[i].Kood,sizeof inimene[i].Kood,F1) )
{
/* strcspn returns the index of the first '\n', or the string length if there is none */
inimene[i].Enimi[strcspn(inimene[i].Enimi, "\n")] = '\0';
inimene[i].Pnimi[strcspn(inimene[i].Pnimi, "\n")] = '\0';
inimene[i].Kood[strcspn(inimene[i].Kood, "\n")] = '\0';
printf("i=%d\nEnimi=%s\nPnimi=%s\nKaad=%s\n", i, inimene[i].Enimi, inimene[i].Pnimi, inimene[i].Kood);
i++;
}
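Putting the answers together, a self-contained version might look like the sketch below. The struct layout, the field sizes (chosen so that each line fits together with its newline and terminator), and the names read_field and read_people are assumptions for illustration, not the asker's actual definitions.

```c
#include <stdio.h>
#include <string.h>

/* Field sizes are assumptions: each leaves room for the text,
   the '\n' that fgets keeps, and the terminating '\0'. */
struct person {
    char Enimi[32];
    char Pnimi[32];
    char Kood[16];
};

/* Read one line into buf and cut it at the first '\n' (if any).
   Returns 1 on success, 0 on EOF or read error. */
static int read_field(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Fill up to max records; returns the number of complete records read.
   A record cut short by EOF is not counted. */
size_t read_people(FILE *fp, struct person *out, size_t max)
{
    size_t i = 0;

    while (i < max &&
           read_field(fp, out[i].Enimi, sizeof out[i].Enimi) &&
           read_field(fp, out[i].Pnimi, sizeof out[i].Pnimi) &&
           read_field(fp, out[i].Kood, sizeof out[i].Kood))
        i++;
    return i;
}
```

Because every read is checked, the loop stops cleanly at the end of the file without a feof() test, and no record is shifted as long as each line fits in its field.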

Related

C fgets buffersize

I have a program in C, which reads in a file with buffers:
char buffer[65536];
while(fgets(buffer,65536,inputFile)){...}
Now I have some questions:
What happens if the filesize is smaller than buffersize?
Does fgets() overwrite the full buffer, so if previous buffer (buffer1) was full, and the next one (buffer2) for example size 10 isn't, will the buffer still have the chars of buffer1 in it for example at buffer2[20] = buffer1[20] because it didn't overwrite?
How to know how many chars are inside the buffer, so you can backward loop the buffer
To answer your questions, you first need to understand fgets(3). It reads one line at a time: fgets() returns once a newline has been read or EOF is reached, and a '\0' is appended to terminate the string. My answers are as follows.
Normally fgets() takes one line at a time, not the entire file, so a file smaller than the buffer is not a problem. In your code the maximum line length is 65535 characters plus the '\0', which makes 65536; that is far longer than most text lines. To read a text file, you normally call fgets() in a for () or while () loop until it returns NULL, which it does at EOF or on a read error.
If a line is longer than the buffer, fgets() reads only buffer length - 1 chars, then appends '\0'. The rest of the line is returned by the following calls.
strlen(3) will give you the length of the string.
As the comments suggest, the man page can help here, or Google. From cppreference:
Reads at most count - 1 characters from the given file stream and stores them in the character array pointed to by str. Parsing stops if end-of-file occurs or a newline character is found, in which case str will contain that newline character. If no errors occur, writes a null character at the position immediately after the last character written to str.
and also:
Return value: str on success, null pointer on failure.
Basically, the count you pass is the maximum number of chars read plus one for the terminator. A single call does not gobble the whole file, only up to one line of it. In any case the buffer has a null terminator, and you can keep calling fgets until it returns a null pointer at EOF.
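The questions about stale data and counting characters can be checked with a small sketch. The helper name read_line_length is hypothetical. The C standard only describes the characters that fgets stores and the '\0' written after them, so the statement that bytes past the terminator keep their old values is an observation about typical implementations, not a guarantee.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line and report how many characters
   fgets stored in buf (the '\n' counts, the '\0' does not).
   Returns -1 on EOF or error. */
long read_line_length(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;
    return (long)strlen(buf);
}
```

If buf was previously filled with other data and the line read is "hi", then buf[0] through buf[2] hold 'h', 'i', '\n', buf[3] holds '\0', and on typical implementations buf[4] onward still hold the old bytes. Only the part up to the terminator is meaningful, so loop backward from index length - 1.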

does the fgets() function append the \n\0 characters exceeding the maximum length?

May seem like a silly question for most of you, but I'm still trying to determine the final answer. Some hours ago I decided to replace all the scanf() functions in my project with the fgets() in order to get a more robust code.
I learned that the fgets() automatically ends the inserted input string with the '\n' and the NUL characters but..
let's say I have something like this:
char user[16];
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator).
The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
does the '\0' get stored in the following block of memory?
(no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
As a complement to 5gon12eder's answer, I assume you have something like:
char user[16];
fgets(user, 16, stdin);
and your input is abcdefghijklmno\n , that is 15 characters and a newline.
fgets will put in user the first 15 (16-1) characters of the input followed by a null, and you will effectively get "abcdefghijklmno", which is what you want.
But the \n still remains in the stream buffer and is available to the next read (be it an fgets or anything else) on the same FILE. More exactly, until you do another read you cannot know whether other characters followed the o.
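One way to deal with the leftover input is to drain the rest of the line whenever no newline was stored. The sketch below is only an illustration; the name read_line_discard_rest and its return convention are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf, strip the '\n', and throw
   away whatever part of the line did not fit.
   Returns the number of discarded characters (not counting the '\n'),
   or -1 if nothing could be read (EOF or error). */
long read_line_discard_rest(FILE *fp, char *buf, int size)
{
    long discarded = 0;
    size_t len;
    int c;

    if (fgets(buf, size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* the whole line fit, newline included */
        return 0;
    }

    /* No newline stored: consume the rest of the line from the stream. */
    while ((c = getc(fp)) != EOF && c != '\n')
        discarded++;
    return discarded;
}
```

With char user[16] and the input abcdefghijklmno\n, the call stores the 15 characters, consumes the pending newline, and returns 0. A return value greater than 0 means the user typed more than fits, which the caller can report.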
As @5gon12eder suggests, use:
char user[16];
fgets(user, sizeof user, stdin);
// Function prototype for reference
#include <stdio.h>
char *fgets(char * restrict s, int n, FILE * restrict stream);
Now for details:
The '\n' and the '\0' are not automatically appended. Only the '\0' is automatically appended. fgets() will stop reading once it gets a '\n', but will stop for other reasons too including a full buffer. In those cases, there is no '\n' before the '\0'.
fgets() does not read a C string, but reads a line. The input stream is typically in text mode and then end-of-line translations occur. On some systems, '\r', '\n' pair will translate to '\n'. On others, it will not. Usually the files being read match this translation, but exceptions occur. In binary mode, no translations occur.
fgets() reads in '\0' like any other character and continues reading. Thus strlen(buf) does not always reflect the true number of char read. There may be a foolproof method to determine the true number of char read when '\0' are in the middle, but it is likely easier to code with fread() or fgetc().
On an EOF condition (with no data read) or an I/O error, fgets() returns NULL. When an I/O error occurs, the contents of the buffer are indeterminate.
Pedantic issue: The C standard uses a type of int as the size of the buffer but often code passes a variable of type size_t. A size n less than 1 or more than INT_MAX can be a problem. A size of 1 should do nothing more than fill the buf[0] = '\0', but some systems behave differently especially if the EOF condition is near or passed. But as long as 2 <= n <= INT_MAX, a terminating '\0' can be expected. Note: fgets() may return NULL when the size is too small.
Code typically likes to delete the terminating '\n' with something that could cause trouble. Suggest:
char buf[80];
if (fgets(buf, sizeof buf, stdin) == NULL) Handle_IOError_or_EOF();
// IMO potential UB and undesired behavior
// buf[strlen(buf)-1] = '\0';
// Suggested end-of-line deleter
size_t len = strlen(buf);
if (len > 0 && buf[len - 1] == '\n') buf[--len] = '\0';
Robust code checks the return value from fgets(). The following approach, which inspects the buffer instead, has shortcomings: 1) if an I/O error occurred, the buffer contents are not defined, so checking them will not provide reliable results; 2) a '\0' may have been the first char read while the file is not in the EOF condition.
// Following is weak code.
buf[0] = '\0';
fgets(buf, sizeof buf, stdin);
if (strlen(buf) == 0) Handle_EOF();
// Robust, but too much for code snippets
if (fgets(buf, sizeof buf, stdin) == NULL) {
if (ferror(stdin)) Handle_IOError();
else if (feof(stdin)) Handle_EOF();
else if (sizeof buf <= 1) Handle_too_small_buffer(); // pedantic check
else Hmmmmmmm();
}
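For the embedded '\0' case mentioned above, an fgetc()-based reader can report the true count. This is a minimal sketch under the assumption that a byte count is all that is needed; the name read_line_counted is hypothetical.

```c
#include <stdio.h>

/* Hypothetical fgets-like reader built on fgetc. Stores at most size-1
   characters, stops after '\n', terminates with '\0' when size > 0, and
   returns the number of characters stored, counting any embedded '\0'. */
size_t read_line_counted(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}
```

A return value of 0 is ambiguous between EOF, a read error, and a buffer of size 1, so the caller would still consult feof() and ferror() in that case.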
Documentation of fgets from the C99 Standard (N1256)
7.19.7.2 The fgets function
Synopsis
#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Coming to your post, you said:
An array of 16 char which stores a username (15 characters max, I reserve the last one for the NUL terminator). The question is: if I insert a 15 characters strings, then the '\n' would end up in the last cell of the array, but what about the NUL terminator?
For such a case, the newline character is not read until the next call to fgets or any other call to read from the stream.
does the '\0' get stored in the following block of memory? (no segmentation fault when calling the printf() function implies that the inserted string is actually NUL terminated, right?).
The terminating null character is always set. In your case, the 16-th character will be the terminating null character.
From the man page of fgets:
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
I think that is pretty clear, isn't it?

Reading line by line in C

Currently to read a file line by line in C I am using:
char buffer[1024];
while(fgets(buffer, sizeof(buffer), file) != NULL) {
//do something with each line that is now stored in buffer
}
However there is no guarantee in the file that the line will be shorter than 1024. What will happen if a line is longer than 1024? Will the rest of the line be read in the next iteration of the while loop?
And how can I read line by line without a maximum length?
Yes, the rest of the line will be read in the next iteration.
You can detect whether or not you read a whole line by inspecting the last character of the string (i.e. the one before the null terminator) to see if it is '\n' or not -- fgets passes '\n' through to you.
There is no Standard C function which will read a line whilst dynamically allocating enough memory for it, however there is a POSIX function getline() which does that. You could write your own that uses fgets or otherwise to do the reading, in a loop with realloc, of course.
From the C standard, §7.19.7.2:
char *fgets(char * restrict s, int n, FILE * restrict stream);
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into the
array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
From MSDN,
fgets reads characters from the current stream position to and including the first newline character, to the end of the stream, or until the number of characters read is equal to n – 1, whichever comes first. The newline character, if read, is included in the string.
So yes, fgets will read the rest of the line in the next iteration if it doesn't encounter the newline character within the first sizeof(buffer)-1 characters.
If you want to read the whole line in one shot, it is better to allocate the buffer with malloc and grow it with realloc as needed.
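The malloc/realloc approach could be sketched as follows. The function name read_full_line, the starting capacity, and the doubling strategy are assumptions for illustration; on POSIX systems getline() already does this job.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd line without its trailing '\n',
   or NULL on EOF, read error, or allocation failure. The caller frees it.
   Limitation: lines are assumed to be shorter than INT_MAX characters. */
char *read_full_line(FILE *fp)
{
    size_t cap = 128;   /* arbitrary starting capacity */
    size_t len = 0;
    char *line = malloc(cap);
    char *grown;

    if (line == NULL)
        return NULL;

    /* Each call appends at line + len, continuing the same line. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            return line;
        }
        /* No newline yet: make room and let fgets carry on. */
        grown = realloc(line, cap * 2);
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        cap *= 2;
    }

    if (len > 0 && !ferror(fp))
        return line;    /* last line of the file had no trailing newline */
    free(line);
    return NULL;
}
```

A typical caller loops with while ((line = read_full_line(file)) != NULL), processes the line, and then calls free(line).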

C - fgets() - length of newline char

I am trying to read 1 line and I am not sure how newline char is represented. Should I consider it as 2 chars or 1 char, when reading it from file by fgets() ? For example, I have a line of 15 chars + new line in file. So how should I safely allocate string and read that line?
At first, I tried this:
char buf[16];
fgets(buf, 16, f);
It read the line correctly without newline char and I assume that buf[15] holds the null character.
However, when I want to read and store the newline char, it doesn't work as I thought. As far as I know, '\n' should be considered as one char and take just one byte, so to read it, I just need to read one more char.
But when i try this
char buf[17];
fgets(buf, 17, f);
it does exactly the same thing as the previous example: there is no newline char stored in my string (and I am not sure where the null char is stored in this case)
To read entire line with newline I need to do this
char buf[18];
fgets(buf, 18, f);
OR this (it works, but I am not sure if it's safe)
char buf[17];
fgets(buf, 18, f);
So the question is, why do I need to allocate and read 18 chars, when the line has only 15 chars + newline?
You need to provide buffer space for the 15-chars of text, up to 2 characters for the new line (to handle Windows line termination of \r\n), and one more for the null termination. So that's 18.
Like you did here:
char buf[18]; fgets(buf, 18, f);
The num parameter to fgets tells the call the size of your buffer it's writing to.
I am trying to read 1 line and I am not sure how newline char is represented.
In text mode, a newline is '\n', and that's true on any conforming C implementation. I wouldn't use fgets on anything but a text mode stream (I don't know, and I don't want to know, how it works in binary mode on an implementation using \r as the end-of-line marker, or worse, an out-of-band end-of-line marker; I wouldn't be surprised if it looked for a \n, never found one, and so tried to read until the end of the file).
You should allocate space for the maximal line length, including the newline, plus the terminating NUL, and more importantly you must never lie to fgets about the length of the buffer. You can check whether the buffer was long enough: if it wasn't, the newline won't be present.
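The check described above can be sketched like this. It assumes, as the symptoms in the question suggest, that the file stores \r\n line endings and that the stream does not translate them; the helper name read_line_checked is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if a complete line was stored, 0 if the
   buffer was too small to reach the '\n' (or the file ended without one),
   and -1 on EOF or error. Line terminators ('\n', or '\r' followed by
   '\n') are removed from buf. */
int read_line_checked(FILE *fp, char *buf, int size)
{
    int complete;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    complete = strchr(buf, '\n') != NULL;
    buf[strcspn(buf, "\r\n")] = '\0';
    return complete;
}
```

For a line of 15 characters followed by \r\n, a 17-byte buffer holds the text and the '\r' but not the '\n', so the function returns 0; an 18-byte buffer holds everything and it returns 1. In both cases buf ends up with the 15 characters of text.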
The matter is about the escape sequence that lets you test for a newline. On disk a Windows line ending is two bytes, 0x0d 0x0a, but the C escape code \n is a single character, so when you use strncmp you must give a length of one:
if(strncmp(&buff[i], "\n", 1) == 0)
which would not work with a length of two, because the second character compared would be the '\0' that ends the literal "\n", and that matches only if the newline is the last character of buff.

fscanf not scanning file

I am trying to scan a file using fscanf and put the string into an char array of size 20 as follows:
char buf[20];
fscanf(fp, "%s", buf);
The file fp currently contains: 1 + 23.
I am setting a pointer to the first element in buf as follows:
char *p;
p = buf;
Printing buf, printf("%s", buf) yields only 1. Trying to increment p and printing prints out rubbish as well (p++; printf("%c", *p)).
What am I doing wrong with fscanf here? Why isn't it reading the whole string from the file?
fscanf (and related functions) with the format string "%s" will read as many characters as it can up to the next whitespace character. In this case it finds the first character (1) and stores it, then hits a space (' ') and therefore stops reading.
If you'd like to read the whole line at once, consider using fgets. It is also safer to use, since you need to specify the size of your destination buffer as one of its arguments.
fgets will read at most length-of-buffer minus 1 characters (the last byte is reserved for the trailing null byte); it stops after reading that many characters, after reading a newline, or at the end of the file.
fgets (buf, 20, fp);
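The difference between the two calls can be seen side by side in a short sketch. The helper names read_token and read_whole_line are made up for illustration; the width 19 in the format string is an addition that keeps fscanf from overrunning the 20-byte buffer.

```c
#include <stdio.h>
#include <string.h>

/* Reads one whitespace-delimited token. The width 19 leaves room for
   the '\0' in a 20-byte buffer. Returns 1 on success, 0 otherwise. */
int read_token(FILE *fp, char buf[20])
{
    return fscanf(fp, "%19s", buf) == 1;
}

/* Reads the rest of the current line, spaces included, and removes the
   '\n' that fgets stores. Returns 1 on success, 0 on EOF or error. */
int read_whole_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

On a file containing 1 + 23, read_token stores "1" while read_whole_line stores "1 + 23".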
