Reading input from a file in C

I came across the following question:
If a file contains the line "I am a boy\r\n", what will str contain after this line is read into the array str using fgets()?
[A]. "I am a boy\r\n\0"
[B]. "I am a boy\r\0"
[C]. "I am a boy\n\0"
[D]. "I am a boy"
The answer has been given as option c with the explanation
Declaration: char *fgets(char *s, int n, FILE *stream);
fgets reads characters from stream into the string s. It stops when it reads either n - 1 characters or a newline character, whichever comes first.
However, I couldn't understand how \r (carriage return) would influence fgets. I mean, shouldn't "I am a boy" be read first, and then, on encountering \r, the cursor be set to the initial position, so that the "I" of "I am a boy" is overwritten by \n and the space following "I" is overwritten by \0?
Any help is deeply appreciated.
P.s: My claim is based on the explanation given on this link: https://www.quora.com/What-exactly-is-r-in-the-C-language

First, every time you see a multiple choice quiz on some programming website, I recommend you close the tab and do something productive instead such as watching videos of kittens. Because the questions seem to be just some variants of
Which of these is the first letter of the alphabet (only one is right)
A
a
6
a
the letter a
all of the above.
Carriage returns and line feeds do not affect the input read by a C program in that way. Each byte is simply stored after the previous ones; nothing is overwritten. Otherwise, this is a very badly phrased question, as the answer could be any of A, B, C or D, or maybe none of them. Saying that C is the only one that is right is wrong.
First question is what it means if "the file contains \r"? Here I assume that the author meant that the file contains the 10 characters I am a boy followed by ASCII 13 and ASCII 10 (carriage return and line feed).
In C there are two translation modes for reading files, text mode and binary mode. On POSIX systems (all those operating systems with X in their name, except for Windows eXcePtion) these are equal - the text mode is ignored. So when you read the line into a buffer with fgets on POSIX, it will look for that line feed and store all characters as is, including the \r, so the buffer will have the following sequence of bytes: I am a boy\r\n\0. Therefore A could be true.
But on Windows, the text mode translates the carriage return and the linefeed to one newline character with ASCII value 10 in memory, so what you will have is I am a boy\n\0. Therefore C could be true. If your file was opened in binary mode, you'll still have I am a boy\r\n\0 - so how'd you claim that C is the only one that can be true?
If the string that you'd read with fgets would be I am a boy\r\n (POSIX or binary mode) but you told fgets your buffer has space for only 12 characters, then you'd get 11 characters of the input and the terminating \0, and therefore you'd have I am a boy\r\0. The line feed character would remain in the stream. Therefore B could be true. B cannot be true if you indicated that the buffer has more space.
Finally any of these array contents does contain the string I am a boy, therefore D would be true in all of the cases above.
And if your buffer didn't have enough space for 10 characters and the terminator then you'd have some prefix of the contents, such as I am a bo followed by \0 which means that none of these was true.

Related

fgets doesn't read in at most one less than size characters from stream

I am learning fgets from the manpage. I did some tests on fgets to make sure I understand it. One of the tests I did results in behaviour contrary to what is specified in the man page. The man page says:
char *fgets(char s[restrict .size], int size, FILE *restrict stream);
fgets() reads in at most one less than size characters from stream and
stores them into the buffer pointed to by s. Reading stops after an EOF
or a newline. If a newline is read, it is stored into the buffer. A
terminating null byte ('\0') is stored after the last character in the
buffer.
But it doesn't "read in at most one less than size characters from stream". As demonstrated by the following program:
#include<stdio.h>
#include<stdlib.h>
int main(){
FILE *fp;
fp=fopen("sample", "r");
char *s=calloc(50, sizeof(char));
while(fgets(s,2,fp)!=NULL) printf("%s",s);
}
The sample file:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The output of the compiled binary:
thiis is line no. 1
joke joke 2 joke joke
arch linux btw 3
4th line
5th line
The expected output according to the man page:
t
j
a
4
5
Is the man page wrong, or am I missing something?

Is the man page wrong or am i missing something?
I won't say that the man page is wrong but it could be more clear.
There are 3 things that may stop fgets from reading from the stream.
The buffer is full (i.e. only room left for the termination character)
A newline character was read from the stream
End-Of-File occurred
The quoted man page only mentions two of those conditions clearly.
Reading stops after an EOF or a newline.
That is, #2 and #3 are mentioned very explicitly, while #1 is (kind of) derived from
reads in at most one less than size characters from stream
Here is another description from https://man7.org/linux/man-pages/man3/fgets.3p.html
... read bytes from stream into the array pointed to by s until n-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered.
where the 3 cases are clearly mentioned.
But yes... you are missing something. Once the buffer gets full, the rest of the current line is neither read nor discarded. It stays in the stream and is available for the next read, so nothing is lost. You just need more fgets calls to read all the data.
As suggested in a number of comments (e.g. Fe2O3 and Lundin) you can see this if you change the print statement so that it includes a delimiter of some kind. For instance (from Lundin):
printf("|%s|",s);
This will make clear exactly what you got from the individual fgets calls.

The provided quote states clearly:
If a newline is read, it is stored into the buffer.
Where do you see that this call fgets(s,2,fp) reads the new line character for example when reading this line?
thiis is line no. 1
The line contains only one new line character at its end.
This call reads the line one character at a time, and each character is followed by the terminating zero character '\0'.
So the read strings look like
{ 't', '\0' }
{ 'h', '\0' }
{ 'i', '\0' }
// ...
{ '1', '\0' }
{ '\n', '\0' }
If you have a call of fgets like that
fgets(s,n,fp)
then at most n-1 characters are read from the input stream. One character is reserved for the terminating zero character '\0' to build a string.
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array

Input stream reads and push backs by fscanf and scanf

Regarding fscanf (and I assume similarly for scanf), C17 7.21.6.2.9 states the following:
"An input item is read from the stream... An input item is defined as
the longest sequence of input characters which does not exceed any
specified field width and which is, or is a prefix of, a matching
input sequence. The first character, if any, after the input item
remains unread..."
Before reading this I had always assumed that the first character after the input item was read too, then pushed back. For example, if the input was 5X and the conversion specification was %d, both the 5 and the X would be read but the X would be pushed back. However, the quote above seems to indicate that each successive character in the input stream is being "peeked" at before it is read, so the X would never be read in the first place and a push back would never be necessary. However, footnote 289 states that fscanf pushes back at most one input character onto the input stream. So I guess my question is about what all of this really means. Does "read" mean to remove a character from the stream or could it also mean to "peek" at a character without removing it?

Input stream can push back at least 1 character.
Scanning "5X" with "%d" results in "5" being read and converted to an int 5, then saved. The "X" is read, but pushed back.
Trouble occurs with input like "-a" as the "-" is read and so is "a". C guarantees a successful push-back of "a", but whether "-" is successfully pushed back depends on the implementation.
#include <stdio.h>

int main(void) {
    int i;
    scanf("%d", &i); // Enter -a
    printf("%c\n", getchar());
}
My output: -, not a as expected with only 1 push back. YMMV.
This is one of the reasons that it is better to read a line of user input with fgets() into a string and then parse the string, than to use (f)scanf().

The pushback is not always necessary. For example, if the conversion specification is %3d and the code reads three decimal digits successfully, it doesn't need to read anything more and there is no pushback.
The pushback is always the character that was read, so beyond recording where to read next, the input buffer doesn't need to change. (Using ungetc(), you can unget (push back) a character other than the one that was read.)
Reading a character means logically removing it from the stream. If it isn't a usable character, it is pushed back, so the effect is the same as peeking.

Why does fgets() store a \0 after the last character in a buffer?

I've been doing a bit of reading through the Linux programmer's manual, looking up various functions and trying to get a deeper understanding of what they are and how they work.
Looking at fgets() I read "A '\0' is stored after the last character in the buffer".
I've read through What does \0 stand for? and have a pretty solid understanding of what \0 symbolizes (a null character right ?). But what I'm struggling to grasp is its relevance to fgets(), I don't really understand why it "needs" to end with a null character.

As you already said, you are probably aware that \0 constitutes the end of all strings in C. As per the C standard, everything that is a string needs to be \0 terminated.
Since fgets() makes a string, that string, of course, will be properly null terminated.
Do note that for all string functions in C, any string you use or generate with them must be terminated with a \0 character.

Because otherwise you do not know how long the resulting string is.
One of the arguments to fgets is the maximum number of characters to read, but it's just that: a maximum. If you ask for 512 characters, but there are only 8 available in the stream, you will only get 8 characters … and a null character ('\0') in the 9th slot to mark the logical end of the C-string.
Arguably, fgets could instead have been designed to return the number of characters read, but then for most purposes you'd only have to add the null byte yourself manually, and the function would have to find a way to signify an error other than returning a null pointer.

From C standards:
The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
This makes sure that the created string cannot cause a buffer overflow (the characters never go beyond the provided storage).

As all the people before me said, fgets reads bytes from a file and makes them into a standard C string, which is null-terminated. The termination with the \0 byte reflects the fact that this function is text-oriented.
If you don't want to use null-termination for the data read from the file, it's not a string (not text), and also the end-of-line byte \n has no significance. In this case, you can use fread.
So C has two functions to read from file: fgets for text and fread for non-text (binary data).
BTW if the input file has a genuine zero-valued byte, fgets will do an uncomfortable thing: it will continue reading until it reads an end-of-line byte \n, and the output "string" will have two (or more) null-terminations. This doesn't make any sense as text, so it's another example of fgets being text-oriented and unsuitable for arbitrary data.

What is the char in C for the int value 10? Where I can look up this?

I have a char array which I fill with fgets(). But it contains a char which is counted by the function strlen(). I decided to print the int value of this char to see where the problem is.
As a char I can see nothing. I thought it was whitespace, but I'm not sure. I would like someone to tell me what it is and explain why it is there.
printf("%d", (int) input[6]); // prints the value 10

The value 10 is the ASCII value for the newline character (LF, or linefeed). Closely related is character 13, which is CR, or carriage return, which, on Windows systems, often precedes the LF character. I would suggest getting a copy of the ASCII table (they're all over the web) and referencing it from time to time.
Character 10 can be represented by '\n' in C code, as well as '\012' and '\x0a'.
Character 13 (carriage return) can be represented by '\r', '\015', and '\x0d'.

It is the newline (LF (NL line feed, new line)) in ASCII. See an ASCII table for all of the values.

As already pointed out by the others, the character 10 in ASCII is LF (line feed).
If you wanted printf to output the character (not see its ordinal value), you could use the %c format specifier to pass a single character.
Example:
printf("-%c-", input[6]);
should yield:
-
-
I.e. two dashes separated by a line feed. Please keep in mind that the outcome on Windows depends on how your C runtime handles a single LF without CR as on Windows a line break is customarily represented by CRLF instead of just LF which is the standard on unixoid systems. The only exception to that rule were old Mac systems which used to use only CR to encode a line break.

Reading the string with defined number of characters from the input

So I am trying to read a defined number of characters from the input. Let's say that I want to read 30 characters and put them into a string. I managed to do this with a for loop, and I cleaned the buffer as shown below.
for(i=0;i<30;i++){
string[i]=getchar();
}
string[30]='\0';
while(c!='\n'){
c=getchar(); // c is some defined variable type char
}
And this is working for me, but I was wondering if there is another way to do this. I was researching and some of them are using sprintf() for this problem, but I didn't understand that solution. Then I found that you can use scanf with %s. And some of them use %3s when they want to read 3 characters. I tried this myself, but this command only reads the string till the first empty space. This is the code that I used:
scanf("%30s",string);
And when I run my program with this line, if I for example write: "Today is a beatiful day. It is raining, but it's okay i like rain." I thought that the first 30 characters would be saved into the string. But when I try to print this string with puts(string); it only shows "Today".
If I use scanf("%s",string) or gets(string) that would rewrite some parts of my memory if the number of characters on input is greater than 30.

You can use scanf("%30[^\n]",s)
Actually, this is how you can set which characters to accept. Here, the caret sign '^' denotes negation, i.e. this will read all characters except \n. The 30 limits the input to at most 30 characters, so s needs room for 31 chars including the terminating '\0'. So, there you are.

The API you're looking for is fgets(). The man page describes
char *fgets(char *s, int size, FILE *stream);
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.