Is trailing newline necessary in fgets? - c

When I search using the keywords 'fgets' and 'newline', there are many posts about how to remove the trailing newline character (and such removal appears to be a burden). Yet there seems to be little explanation of why fgets needs to include that newline. Also, in C++, the 'std::getline' and 'std::istream::getline' methods do not keep the newline character. So is there a reason for it?

Here is a satisfying (IMHO) explanation:
http://www.cplusplus.com/reference/cstdio/fgets/
Especially:
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.

No, it's not necessary but if present it will be included in the returned line.
The manual page says:
Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
So that's why it behaves that way.
Note that you can't assume the buffer ends with a newline. Check before removing it; otherwise you risk chopping the last character off the final line if that line had no newline.
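To illustrate the check described above, here is a minimal sketch. The helper name strip_newline is made up for this example and is not part of the standard library.

```c
#include <string.h>

/* Remove one trailing '\n' from line, if there is one.
   Returns 1 if a newline was removed, 0 otherwise.
   (strip_newline is a hypothetical helper name.) */
int strip_newline(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
        return 1;
    }
    return 0;   /* empty string, or last line without a newline */
}
```

A commonly seen one-line alternative is line[strcspn(line, "\n")] = '\0';, which also leaves the string untouched when no newline is present.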

Related

What is the difference between %s and %s%*c [duplicate]

Hi I am reading some code and this line has been used:
scanf("%s%*c",dati[i].part);
What does %s%*c do and why not just use %s?
What does %s%*c do
The %s has the same meaning as anywhere else -- skip leading whitespace and scan the next sequence of non-whitespace characters into the specified character array.
The %*c means the same thing as %c -- read the next input character, whatever it is (i.e. without skipping leading whitespace) -- except that the * within means that the result should not be assigned anywhere, and therefore that no corresponding pointer argument should be expected. Also, assignment suppression means that scanf's return value is not affected by whether that field is successfully scanned.
and why not just use %s?
We cannot say for sure why the author of the code in which you saw it used %s%*c, except for the unsatisfying "because that's what the author thought was appropriate." We have no context at all for making any other judgement.
Certainly the actual effect is to consume the next input character after the string, if any. If there is such a character then it will necessarily be a whitespace character, else it would have been scanned by the preceding %s directive. We might therefore speculate that the author's idea was to consume a trailing newline.
There are at least two problems with that:
The next character might not be a newline. For example, there might be trailing space characters before a newline, in which case the first of those space characters would be consumed, but the newline would remain in the stream. If that's a genuine problem then %*c does not reliably solve it.
In practice, it's not very useful. Most scanf directives are like %s in that they automatically skip leading whitespace, including newlines. The %*c serves only to confuse if the next directive that will be processed is any of those. Moreover, it is possible for a scanf format to explicitly express that a run of whitespace at a given position should be skipped, and it is clearer to make use of that in conjunction with the next directive to be processed if that next directive is one of those that don't automatically skip whitespace (and whitespace skipping is in fact desired).
That doesn't mean that assignment suppression generally or %*c specifically is useless, mind. It's just trying to use that technique to attempt to consume trailing newlines that is poorly conceived.
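The two points above can be checked with sscanf, which applies the same format rules to a string. This is only a sketch: the function names are invented for the example, and the 16-byte word buffer is an assumption.

```c
#include <stdio.h>

/* Scan one word with "%15s%*c" and return a pointer to the unread
   rest of input, or NULL if the word or the discarded character
   could not be read. word must have room for at least 16 bytes.
   (rest_after_word is a hypothetical helper name.) */
const char *rest_after_word(const char *input, char *word)
{
    int n = -1;   /* stays -1 if %n is never reached */

    if (sscanf(input, "%15s%*c%n", word, &n) != 1 || n < 0)
        return NULL;
    return input + n;
}

/* Scan one word, skip any run of whitespace (the space in the
   format), then read one character. Returns that character,
   or '\0' if either item could not be read. */
char next_after_word(const char *input, char *word)
{
    char c = '\0';

    if (sscanf(input, "%15s %c", word, &c) != 2)
        return '\0';
    return c;
}
```

With the input "abc  \nX", rest_after_word leaves " \nX" unread, so the newline is still pending, while next_after_word skips all of the whitespace and returns 'X'.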
The %* format specifier in a scanf call instructs the function to read data in the following format (c in your case) from the input buffer but not to store it anywhere (i.e. discard it).
In your specific case, the %*c is being used to read and discard the trailing newline character (added when the user hits the Enter key), which will otherwise remain in the input buffer, and likely upset any subsequent calls to scanf.

how to read line by line in c?

I would like to read text files line by line in c.
I saw some examples using fgets. But I don't know whether fgets reads characters until the end of the line, or whether it reads the number of characters specified (without stopping at the end of the line).
Best regards.
For future, if you're using a vim editor, try using man fgets. It'll give you some basic info on the function and its parameters. You can use this on literally any function that you're unsure about and it may help to clear some things up (although in my experience it confused things a bit more sometimes since I'm also a beginner)
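fgets reads until it has stored num-1 characters, has read a newline character, or reaches the end of the file, whichever comes first. It does not stop at a null byte in the input; it appends the terminating '\0' itself.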
One of many references located here.
fgets - char * fgets ( char * str, int num, FILE * stream );
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a
valid character by the function and included in the string copied to
str.
Many code examples out there.
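As one such example, here is a hedged sketch of a line-by-line loop. The function name count_lines and the 64-byte buffer are choices made for this illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines in stream by reading it through fgets.
   A line longer than the buffer arrives in several pieces; only the
   piece that ends in '\n' completes it.
   (count_lines is a hypothetical helper name.) */
long count_lines(FILE *stream)
{
    char buf[64];
    long lines = 0;
    int in_line = 0;   /* nonzero while a line is still unfinished */

    while (fgets(buf, sizeof buf, stream) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    if (in_line)
        lines++;       /* last line had no trailing newline */
    return lines;
}
```

A caller would typically open the file with fopen, check the result, pass the FILE pointer to count_lines, and close it with fclose.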

What's the difference between gets and scanf?

If the code is
scanf("%s\n",message)
vs
gets(message)
what's the difference? It seems that both of them read input into message.
The basic difference [in reference to your particular scenario],
scanf() with %s stops reading at the first whitespace character (including a newline) or at EOF.
gets() treats whitespace as part of the input string and stops at a newline or EOF.
However, to avoid buffer overflows and the security risks they bring, it's safer to use fgets().
Disambiguation: In the following context I'd consider "safe" if not leading to trouble when correctly used. And "unsafe" if the "unsafetyness" cannot be maneuvered around.
scanf("%s\n",message)
vs
gets(message)
What's the difference?
In terms of safety there is no difference: both read from standard input and may well overflow message if the user enters more data than message has room for.
However, scanf() can be used safely by specifying the maximum amount of data to be scanned:
char message[42];
...
scanf("%41s", message); /* Read at most one character fewer than the buffer
(message here) can hold, because one byte is needed to
store the C string's 0-terminator. */
With gets() it is not possible to specify the maximum number of characters to be read, which is why it shall not be used!
The main difference is that gets reads until EOF or \n, while scanf("%s") reads until any whitespace has been encountered. scanf also provides more formatting options, but at the same time it has worse type safety than gets.
Another big difference is that scanf is a standard C function, while gets was removed from the language in C11, since it was both superfluous and dangerous: there was no protection against buffer overruns. The very same security flaw exists with scanf when %s is used without a field width, however, so neither of those two functions should be used in production code.
You should always use fgets, the C standard itself even recommends this, see C11 K.3.5.4.1
Recommended practice
6 The fgets function allows properly-written
programs to safely process input lines too long to store in the result
array. In general this requires that callers of fgets pay attention to
the presence or absence of a new-line character in the result array.
Consider using fgets (along with any needed processing based on
new-line characters) instead of gets_s.
(emphasis mine)
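The recommended practice quoted above can be sketched as follows. The function name read_line and its return codes are assumptions made for this example, not anything defined by the standard.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size bytes) and strip its newline.
   Returns 1 for a complete line, 2 if the line did not fit and its
   tail was discarded, 0 at end of file or on a read error.
   (read_line is a hypothetical helper name.) */
int read_line(FILE *stream, char *buf, int size)
{
    int ch;
    int discarded = 0;
    size_t len;

    if (fgets(buf, size, stream) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline was stored: either the line did not fit, or the
       input ended without one. Consume the rest of the line. */
    while ((ch = getc(stream)) != EOF && ch != '\n')
        discarded = 1;

    return discarded ? 2 : 1;
}
```

Whether an over-long line should be discarded, rejected, or read into a larger buffer depends on the program; discarding is just the simplest choice to show here.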
There are several. One is that gets() will only get character string data. Another is that gets() will get only one variable at a time. scanf() on the other hand is a much, much more flexible tool. It can read multiple items of different data types.
In the particular example you have picked, there is not much of a difference.
gets - Reads characters from stdin and stores them as a string.
scanf - Reads data from stdin and stores it according to the format specified in the scanf call, such as %d, %f, %s, etc.
gets:->
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0').
BUGS:->
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
scanf:->
The scanf() function reads input from the standard input stream stdin;
BUG:->
scanf can write out of bounds when it stores into arrays and strings without a width limit.
With scanf you need to specify a format, unlike with gets. With gets you can enter characters, strings, numbers and spaces.
With scanf and %s, your input ends as soon as whitespace is encountered.
But since your example uses '%s', neither gets() nor scanf() checks that the destination is a valid pointer to an array long enough to hold the characters being stored. Hence either can easily cause a buffer overflow.
Tip: use fgets() , but that all depends on the use case
The idea that scanf cannot read whitespace is wrong. With code like this it will read whitespace as well:
#include <stdio.h>

int main(void)
{
    char name[25];
    printf("Enter your name:\n");
    /* %24[^\n] reads up to 24 characters that are not a newline, spaces included */
    if (scanf("%24[^\n]", name) == 1)
        printf("%s\n", name);
    return 0;
}
Here only a newline stops the input. That is, reading stops only when you press Enter.
So, used this way, there is basically no difference between the scanf and gets functions. It is just a trickier way of writing it.
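scanf() is a much more flexible tool, while gets() only reads one string at a time.
gets() is unsafe, for example: char str[1]; gets(str)
If you input more characters than the buffer can hold, the behavior is undefined; in practice the program may crash with SIGSEGV.
If gets is all you can use, give it a large buffer (for example one allocated with malloc), though that only makes an overflow less likely and does not prevent it.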

How is \0 incorporated into normal text files in reference to fgets

I was just wondering: when you enter text using a normal application such as TextEdit (on OS X), does the file still contain the same '\0' character at the end of each string, so that fgets() would pick up that character and stop reading?
I ask because I've created a normal text file, but fgets() keeps stopping at the end of the designated length instead of when it finds that character, so I suspect it does not actually exist when I write to a normal text file.
For Example:
How Are You
There
fgets(str, 15, stdin);
This would end up producing: TherAre You
No, in general, text files do not contain \0 characters. fgets reads the number of characters requested, or to the end of the line, whichever comes first. It's fgets itself that appends the \0. From the man page:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
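The behavior described in the man page can be observed with a small sketch like the one below. The helper name chars_stored is invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Call fgets once with a buffer of size bytes and return how many
   characters it stored, not counting the '\0' that fgets appends.
   Returns -1 at end of file or on a read error.
   (chars_stored is a hypothetical helper name.) */
int chars_stored(FILE *stream, char *buf, int size)
{
    if (fgets(buf, size, stream) == NULL)
        return -1;
    return (int)strlen(buf);
}
```

For a stream containing "How Are You\nThere\n", a call with size 15 stores the 12 characters "How Are You\n" and stops at the newline, while a following call with size 5 stores only "Ther" because the size limit is reached first. In both cases the terminating '\0' comes from fgets, not from the file.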
No, text files don't generally contain null characters (or other control characters, apart from line endings and tabs). The termination is a C "feature", i.e. a property of how the C language and environment work with strings. Text files are independent of C. The termination is added (to the in-memory buffer into which the data has been read) by the fgets() function.
If your input file does contain a null byte and you're reading with fgets() or equivalent, you have difficulty knowing whether the null in the middle of the string was simply a null in the 'text' file or indicates that the last line of the file did not end with a newline, or that the line was truncated. Clearly, if you try another read and get more data, it was not a premature EOF. If the character immediately before the null byte is a newline, then you can assume that the null byte is the end of string marker added by fgets().
Generally speaking, therefore, if the file contains null bytes, it is not a good idea to use fgets() to read the file.

Force scanf to consume newline

I have a program that reads data from stdin. This data is a sequence of bytes. If there is a byte describing a new line in it (in hex: 0x0A), scanf stops reading.
Can I mask this byte, so that scanf continues to read the whole sequence?
It is important that the memory written by scanf contains the newline byte.
Without seeing your code, I can't make a precise recommendation. But if your goal is take the input "as-is", I'll recommend read() as an alternative to scanf(). See this question for someone who had the exact opposite issue.
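scanf("%[^`]", str);
You can use something like this. Reading now continues across newlines and stops at the first ` character, so the newline bytes are stored in str. You can replace ` with any other character that does not occur in the data, or with a set of characters, any one of which will end the input. On an interactive terminal, input is usually delivered a line at a time, so the ` still has to be followed by Enter. Add a field width to avoid overflowing str.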
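If the goal is to take the bytes as-is, a standard C alternative to scanf is fread, which treats 0x0A like any other byte. The sketch below assumes a caller-supplied buffer; the function name read_bytes is made up for the example.

```c
#include <stdio.h>

/* Read up to cap raw bytes from stream into buf. Newline bytes
   (0x0A) are stored like any other byte.
   Returns the number of bytes actually read.
   (read_bytes is a hypothetical helper name.) */
size_t read_bytes(FILE *stream, unsigned char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        size_t got = fread(buf + total, 1, cap - total, stream);

        if (got == 0)
            break;   /* end of input or read error */
        total += got;
    }
    return total;
}
```

A call such as read_bytes(stdin, buf, sizeof buf) would read from standard input. On platforms that distinguish text and binary streams, stdin may need to be switched to binary mode first so that bytes are not translated.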

Resources