gets works the first time but then it gets bypassed [duplicate] - c

I have problems with my C program when I try to read / parse input.
Help?
This is a FAQ entry.
StackOverflow has many questions related to reading input in C, with answers usually focussed on the specific problem of that particular user without really painting the whole picture.
This is an attempt to cover a number of common mistakes comprehensively, so this specific family of questions can be answered simply by marking them as duplicates of this one:
Why does the last line print twice?
Why does my scanf("%d", ...) / scanf("%c", ...) fail?
Why does gets() crash?
...
The answer is marked as community wiki. Feel free to improve and (cautiously) extend.

The Beginner's C Input Primer
Text mode vs. Binary mode
Check fopen() for failure
Pitfalls
Check any functions you call for success
EOF, or "why does the last line print twice"
Do not use gets(), ever
Do not use fflush() on stdin or any other stream open for reading, ever
Do not use *scanf() for potentially malformed input
When *scanf() does not work as expected
Read, then parse
Read (part of) a line of input via fgets()
Parse the line in-memory
Clean Up
Text mode vs. Binary mode
A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.
A "text mode" stream may do a number of transformations, including (but not limited to):
removal of spaces immediately before a line-end;
changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
adding, altering, or deleting characters that are neither printing characters (isprint(c) is true), horizontal tabs, or new-lines.
It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.
Check fopen() for failure
The attempt to open a file may fail for various reasons -- lack of permissions, or file not found being the most common ones. In this case, fopen() will return a NULL pointer. Always check whether fopen returned a NULL pointer before attempting to read from or write to the file.
When fopen fails, it usually sets the global errno variable to indicate why it failed. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee to do it.) errno is a code number which can be compared against constants in errno.h, but in simple programs, usually all you need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen; if you don't do that, you will be very confused when the problem is that the filename isn't what you thought it was.
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        // alternatively, just `perror(argv[1])`
        fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    // read from fp here
    fclose(fp);
    return 0;
}
Pitfalls
Check any functions you call for success
This should be obvious. But do check the documentation of every function you call for its return value and error handling, and check for those conditions.
These errors are easy to deal with when you catch the condition early, but lead to lots of head-scratching if you do not.
EOF, or "why does the last line print twice"
The function feof() returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:
// BROKEN CODE
while (!feof(fp)) {
    fgets(buffer, BUFFER_SIZE, fp);
    printf("%s", buffer);
}
This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), the end-of-file indicator is not yet set.
It only gets set when you attempt to read past the last character!
So the code above loops once more: fgets() fails to read another line, sets the end-of-file indicator, and leaves the contents of buffer untouched, which then get printed again.
Instead, check whether fgets failed directly:
// GOOD CODE
while (fgets(buffer, BUFFER_SIZE, fp)) {
    printf("%s", buffer);
}
Do not use gets(), ever
There is no way to use this function safely. For this reason, it was removed from the language in C11.
Do not use fflush() on stdin or any other stream open for reading, ever
Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behaviour. It does have well-defined behaviour in POSIX and in MSVC, but neither makes it discard user input that has not yet been read.
Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:
int c;
do c = getchar(); while (c != EOF && c != '\n');
Do not use *scanf() for potentially malformed input
Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.
But the purpose of *scanf() is really to read bulk data that can be relied upon to be in a predefined format (such as data written by another program).
Even then *scanf() can trip the unobservant:
Using a format string that in some way can be influenced by the user is a gaping security hole.
If the input does not match the expected format, *scanf() immediately stops parsing and leaves any remaining arguments untouched (so they may still be uninitialized).
It will tell you how many assignments it has successfully done -- which is why you should check its return code (see above) -- but not where exactly it stopped parsing the input, making graceful error recovery difficult.
It skips any leading whitespace in the input, except when it does not (the [, c, and n conversions). (See next paragraph.)
It has somewhat peculiar behaviour in some corner cases.
When *scanf() does not work as expected
A frequent problem with *scanf() is when there is an unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.
Reading a number ("%d" et al.) or a string ("%s") stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline left over from the previous input is still the first pending input character: %c reads that newline instead of the character you expected, and a scan set such as %[^\n] fails to match because it would have to match zero characters.
You can skip over the newline in the input by reading it explicitly, e.g. via fgetc(), or by adding a whitespace to your *scanf() format string, e.g. " %c" instead of "%c". (A single whitespace in the format string matches any amount of whitespace in the input, including none.)
Read, then parse
We just advised against using *scanf() except when you really, positively know what you are doing. So, what should you use instead?
Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.
Read (part of) a line of input via fgets()
fgets() takes a size parameter n and stores at most n - 1 characters plus a terminating '\0', so it cannot overflow your buffer. If the input line fit into your buffer completely, the last character before that '\0' will be the newline ('\n'). If it did not all fit, you are looking at a partially-read line. (The last line of a file may also end without a newline.)
Parse the line in-memory
Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.
But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.
Beyond those, C offers a wide range of string processing functions. Since you have the input in memory and always know exactly how far you have parsed it, you can go back as many times as you like, trying to make sense of the input.
And if all else fails, you have the whole line available to print a helpful error message for the user.
Clean Up
Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.
fclose(fp);

Related

fgets() does not stop when it encounters a NUL. Under what circumstances will this be a problem?

I understand that when using fgets, the program will not stop when it encounters NUL, namely '\0'. However, when will this be a problem that needs to be addressed manually?
My main use case for fgets is reading user input (like a better version of scanf that can read whitespace). I cannot think of a situation where a user would want to terminate their input by typing '\0'.
Recall that text file input is usually lines: characters followed by a '\n' (except maybe the last line). On reading text input, a null character is not special. It is not an alternate end-of-line. It is just another non-'\n' character.
It is functions like fgets() and fscanf() that append a null character to the read buffer to denote the end of the string. Now, when code reads that string, is a given null character one that was read, or the appended one?
Whether code uses fgets(), fscanf(), getchar(), etc. is not really the issue. The issue is how code should detect null characters and how to handle them.
Reading a null character from a text stream is uncommon, but not impossible. Null characters tend to reflect a problem more often than valid text data.
Reasons null characters exist in a text file
The text file is a wide-character text file, perhaps UTF-16, in which null bytes are common. Code needs to read this file with fgetws() and related functions.
The "text" file is really binary data. Better to use fread().
The file is a text file, yet through error or malicious intent it contains null characters. It is usually best to detect this, if possible, and stop processing the file with an error message or status.
A legitimate text file that, unusually, uses null characters. fgets() is not the best tool. You likely need custom input functions or extensions such as getline().
How to detect?
fgets(): prefill buffer with non-zero input. See if the characters after the first null character are all the pre-fill value.
fscanf(): Read a line with some size like char buf[200]; fscanf(f, "%199[^\n]%n", buf, &length); and use length for input length. Additional code needed to handle end-of-line, extra-long lines, 0 length lines, etc.
fgetc(): Build user code to read/handle as needed - tends to be slow.
How to handle?
In general, error out with a message or status.
If null characters are legitimate to this code's handling of text files, code needs to handle input, not as C strings, but as a buffer and length.
Good luck.

scanf and a left character in the input stream from previous input → setvbuf?

I have a comprehension question. I haven't used C for a long time, but today I rummaged through the C language threads a bit and I came across the following sentence which I roughly cite here:
“scanf is reading a character which is left in the input stream from the previous input you type.”
And something came into my memory. In those days, I used setvbuf(stdin, NULL, _IONBF, 0); setvbuf(stdin, NULL, _IONBF, BUFSIZ); to avoid this behavior. Those lines have ensured that there are no characters left from the previous input in the input stream.
My question now is: is that a good solution? This is something I've always wanted to know.
I think what you're saying is you would make those two calls to setvbuf after a call to scanf, and that on at least one C implementation it had the effect of discarding any "left over" characters, that were not consumed by scanf, in stdin's buffer.
This is not a good way to discard "left over" characters, because the C standard says that setvbuf "may be used only after the stream has been associated with an open file and before any other operation (other than an unsuccessful call to setvbuf) is performed on the stream". (N1570 §7.21.5.6p2.) Violation of this rule should only cause setvbuf to fail, but nonetheless that means it doesn't do what you want.
The best way to do what you want is to not use scanf. Instead, write a real parser for your input and feed it either character by character (using getchar) or line by line (using getline or fgets). A real parser has principled handling of white space (including newline characters) and syntax errors, and therefore never finds itself with "left over" input.
Writing real parsers is unfortunately a book-length topic, so I will say only that sscanf is never useful in a real parser either. (Standard library functions that are useful include strsep, strchr, and the strto* family.)
There are rare circumstances (usually involving processing of interactive input) where you really do have to discard junk at the end of a line. In those circumstances, the way to do it is with this loop:
int c;
do c = getchar();
while (c != '\n' && c != EOF);

scanf: what remains on the input stream after failure

I learned C a few years ago using K&R 2nd edition ANSI C. I’ve been reviewing my notes, while I’m learning more modern C from 2 other books.
I notice that K&R never use scanf in the book, except in the one section where they introduce it. They mainly use a getline function which they write in the book and change later, once they introduce pointers. Their getline is different from the POSIX getline available with gcc, which caused me some problems until I renamed it to ggetline.
Reviewing my notes, I found this quote:
This simplification is convenient and superficially attractive, and it
works, as far as it goes. The problem is that scanf does not work well
in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline. Worse, it turns out that
scanf's error handling is inadequate for many purposes. It tells you
whether a conversion succeeded or not (more precisely, it tells you
how many conversions succeeded), but it doesn't tell you anything more
than that (unless you ask very carefully). Like atoi and atof, scanf
stops reading characters when it's processing a %d or %f input and it
finds a non-numeric character. Suppose you've prompted the user to
enter a number, and the user accidentally types the letter 'x'. scanf
might return 0, indicating that it couldn't convert a number, but the
unconvertable text (the 'x') remains on the input stream unless you
figure out some other way to remove it.
For these reasons (and several others, which I won't bother to
mention) it's generally recommended that scanf not be used for
unstructured input such as user prompts. It's much better to read
entire lines with something like getline (as we've been doing all
along) and then process the line somehow. If the line is supposed to
be a single number, you can use atoi or atof to convert it. If the
line has more complicated structure, you can use sscanf (which we'll
meet in a minute) to parse it. (It's better to use sscanf than scanf
because when sscanf fails, you have complete control over what you do
next. When scanf fails, on the other hand, you're at the mercy of
where in the input stream it has left you.)
At first I thought this quote was from K&R, but I cannot find it in the book. Then I realized that it's from lecture notes I found online, by someone who taught a course years ago using the K&R book.
I know that the K&R book is 30 years old now, so it is dated in some ways.
This quote is very old too, so I was wondering whether scanf still has this behavior or whether it has changed.
Does scanf still leave stuff in the input stream when it fails? for example above:
Suppose you've prompted the user to enter a number, and the user
accidentally types the letter 'x'. scanf might return 0, indicating
that it couldn't convert a number, but the unconvertable text (the
'x') remains on the input stream.
Is the following still true?
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline.
Has scanf changed much since the quotes above were written? Or are they still true today?
The reason I ask is that in the newer books I am reading, no one mentions these issues.
scanf() is evil - use fgets() and then parse.
The detail is not that scanf() is completely bad.
1) The format specifiers are often used in a weak manner
char buf[100];
scanf("%s", buf); // bad - no width limit
2) The return value is often not checked
scanf("%99[^\n]", buf); // what if the user entered just "\n"? Then nothing is stored in buf
puts(buf);              // ...and this prints an uninitialized array
3) When input is not as expected, it is not clear what remains in stdin.
if (scanf("%d %d %d", &i, &j, &k) != 3) {
    // OK, now what is in `stdin`?
}
you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Yes. Many scanf() calls leave a trailing '\n' in stdin, which is then read as an empty line by getline() or fgets(). scanf() is not for reading lines; getline() and fgets() are much better suited to that.
Has scanf changed much since the quotes above were written?
Only so much change can happen without breaking the existing code base. (Jonathan Leffler)
scanf() remains troublesome. scanf() is unable to accept an argument (after the format) to indicate how many characters to accept into a char * destination.
Some systems have added additional format options to help, such as the POSIX m modifier, which makes scanf() allocate the destination buffer itself.
A fundamental issue is this:
User input is evil. It is more robust to get the text input as one step, qualify input, then parse and assess its success than trying to do all this in one function.
Security
The weakness of scanf() and coder tendency to code scanf() poorly has been a gold mine for hackers.
IMO, C lacks a robust user input function set.

What's the difference between gets and scanf?

If the code is
scanf("%s\n",message)
vs
gets(message)
what's the difference? It seems that both of them read input into message.
The basic difference [in reference to your particular scenario],
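scanf() stops reading input when it encounters whitespace (a space, a tab, or a newline) or EOF.
gets() treats spaces as part of the input string and stops reading at a newline or EOF.
Note also that the \n in "%s\n" does not mean "match one newline": any whitespace in a scanf() format skips all consecutive whitespace, so with interactive input scanf() keeps waiting until the user types something that is not whitespace.
However, to avoid buffer overflows and the security risks they bring, it's safer to use fgets().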
Disambiguation: in the following, I consider something "safe" if it does not lead to trouble when used correctly, and "unsafe" if the unsafety cannot be worked around.
scanf("%s\n",message)
vs
gets(message)
What's the difference?
In terms of safety there is no difference: both read from standard input and might very well overflow message if the user enters more data than message provides memory for.
However, scanf() can be used safely by specifying the maximum amount of data to be scanned:
char message[42];
...
scanf("%41s", message); /* Read at most one less than the buffer (message here)
                           provides, as one byte is needed to store the
                           C string's '\0' terminator. */
With gets() it is not possible to specify the maximum number of characters to be read, which is why gets() must not be used!
The main difference is that gets reads until EOF or \n, while scanf("%s") reads until any whitespace has been encountered. scanf also provides more formatting options, but at the same time it has worse type safety than gets.
Another big difference is that scanf is a standard C function, while gets has been removed from the language, since it was both superfluous and dangerous: there was no protection against buffer overruns. The very same security flaw exists with scanf however, so neither of those two functions should be used in production code.
You should always use fgets; the C standard itself even recommends this, see C11 K.3.5.4.1:
Recommended practice
6 The fgets function allows properly-written
programs to safely process input lines too long to store in the result
array. In general this requires that callers of fgets pay attention to
the presence or absence of a new-line character in the result array.
Consider using fgets (along with any needed processing based on
new-line characters) instead of gets_s.
(emphasis mine)
There are several. One is that gets() will only get character string data. Another is that gets() will get only one variable at a time. scanf() on the other hand is a much, much more flexible tool. It can read multiple items of different data types.
In the particular example you have picked, there is not much of a difference.
gets - Reads characters from stdin and stores them as a string.
scanf - Reads data from stdin and stores it according to the conversions in the format string, such as %d, %f, %s, etc.
gets:->
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0').
BUGS:->
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
scanf:->
The scanf() function reads input from the standard input stream stdin;
BUG:->
Sometimes scanf causes buffer-boundary problems when dealing with arrays and strings.
With scanf you have to specify a format, unlike with gets. With gets you can enter characters, strings, numbers and spaces alike.
With scanf("%s"), your input ends as soon as whitespace is encountered.
But in your example you are using %s without a width, and neither gets() nor scanf() can check that the pointer refers to an array large enough to hold the characters being stored, so both can easily cause a buffer overflow.
Tip: use fgets(), but that all depends on the use case.
The idea that scanf cannot read whitespace is wrong. With the following code it reads whitespace too:
#include <stdio.h>
int main(void)
{
    char name[25];
    printf("Enter your name :\n");
    /* %24[^\n] reads everything up to the newline, at most 24 characters */
    if (scanf("%24[^\n]", name) == 1)
        printf("%s\n", name);
    return 0;
}
Here only a newline stops the input: scanf keeps reading until you press Enter.
So, with the right format, scanf can read a whole line just as gets does; it is just a trickier way of doing it, and unlike gets it lets you limit the length.
scanf() is a much more flexible tool, while gets() only gets one variable at a time.
gets() is unsafe. For example, with char str[1]; gets(str);
if you enter more characters than the buffer can hold, the behaviour is undefined, and the program may well end with SIGSEGV.
If you can only use gets, allocate a generous buffer with malloc (this still does not make gets safe: a heap buffer can overflow just like any other).

C: Reading a text file (with variable-length lines) line-by-line using fread()/fgets() instead of fgetc() (block I/O vs. character I/O)

Is there a getline function that uses fread (block I/O) instead of fgetc (character I/O)?
There's a performance penalty to reading a file character by character via fgetc. We think that to improve performance, we can use block reads via fread in the inner loop of getline. However, this introduces the potentially undesirable effect of reading past the end of a line. At the least, this would require the implementation of getline to keep track of the "unread" part of the file, which requires an abstraction beyond the ANSI C FILE semantics. This isn't something we want to implement ourselves!
We've profiled our application, and the slow performance is isolated to the fact that we are consuming large files character by character via fgetc. The rest of the overhead actually has a trivial cost by comparison. We're always sequentially reading every line of the file, from start to finish, and we can lock the entire file for the duration of the read. This probably makes an fread-based getline easier to implement.
So, does a getline function that uses fread (block I/O) instead of fgetc (character I/O) exist? We're pretty sure it does, but if not, how should we implement it?
Update: Found a useful article, Handling User Input in C, by Paul Hsieh. It's an fgetc-based approach, but it has an interesting discussion of the alternatives (starting with how bad gets is, then discussing fgets):
On the other hand the common retort from C programmers (even those considered experienced) is to say that fgets() should be used as an alternative. Of course, by itself, fgets() doesn't really handle user input per se. Besides having a bizarre string termination condition (upon encountering \n or EOF, but not \0) the mechanism chosen for termination when the buffer has reached capacity is to simply abruptly halt the fgets() operation and \0 terminate it. So if user input exceeds the length of the preallocated buffer, fgets() returns a partial result. To deal with this programmers have a couple choices; 1) simply deal with truncated user input (there is no way to feed back to the user that the input has been truncated, while they are providing input) 2) Simulate a growable character array and fill it in with successive calls to fgets(). The first solution, is almost always a very poor solution for variable length user input because the buffer will inevitably be too large most of the time because its trying to capture too many ordinary cases, and too small for unusual cases. The second solution is fine except that it can be complicated to implement correctly. Neither deals with fgets' odd behavior with respect to '\0'.
Exercise left to the reader: In order to determine how many bytes was really read by a call to fgets(), one might try by scanning, just as it does, for a '\n' and skip over any '\0' while not exceeding the size passed to fgets(). Explain why this is insufficient for the very last line of a stream. What weakness of ftell() prevents it from addressing this problem completely?
Exercise left to the reader: Solve the problem determining the length of the data consumed by fgets() by overwriting the entire buffer with a non-zero value between each call to fgets().
So with fgets() we are left with the choice of writing a lot of code and living with a line termination condition which is inconsistent with the rest of the C library, or having an arbitrary cut-off. If this is not good enough, then what are we left with? scanf() mixes parsing with reading in a way that cannot be separated, and fread() will read past the end of the string. In short, the C library leaves us with nothing. We are forced to roll our own based on top of fgetc() directly. So lets give it a shot.
So, does a getline function that's based on fgets (and doesn't truncate the input) exist?
Don't use fread. Use fgets. I take it this is a homework/classproject problem so I'm not providing a complete answer, but if you say it's not, I'll give more advice. It is definitely possible to provide 100% of the semantics of GNU-style getline, including embedded null bytes, using purely fgets, but it requires some clever thinking.
OK, update since this isn't homework:
memset your buffer to '\n'.
Use fgets.
Use memchr to find the first '\n'.
If no '\n' is found, the line is longer than your buffer. Enlarge the buffer, fill the new portion with '\n', and fgets into the new portion (starting at the terminating null left by the previous call), repeating as necessary.
If the character following '\n' is '\0', then fgets terminated due to reaching end of a line.
Otherwise, fgets terminated due to reaching EOF, the '\n' is left over from your memset, the previous character is the terminating null that fgets wrote, and the character before that is the last character of actual data read.
You can eliminate the memset and use strlen in place of memchr if you don't care about supporting lines with embedded nulls (either way, the null will not terminate reading; it will just be part of your read-in line).
There's also a way to do the same thing with fscanf and the "%123[^\n]" specifier (where 123 is your buffer limit), which gives you the flexibility to stop at characters other than newline (à la GNU getdelim). However, it's probably slow unless your system has a very fancy scanf implementation.
There isn't a big performance difference between fgets and fgetc/setvbuf.
Try:
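int c;
FILE *f = fopen("blah.txt", "r");
if (f == NULL) {
    perror("blah.txt");
    return 1;
}
/* setvbuf must come before any other operation on f; _IOFBF requests full
   buffering with a 4096-byte buffer. !!! check other values for the last
   parameter on your OS */
setvbuf(f, NULL, _IOFBF, 4096);
while ((c = fgetc(f)) != EOF)
{
    if (c == '\n') {
        /* end of line: process the line collected so far */
    } else {
        /* ordinary character: append it to the current line */
    }
}
fclose(f);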

Resources