scanf what remains on the input stream after failer - c

I learned C a few years ago using K&R 2nd edition ANSI C. I’ve been reviewing my notes, while I’m learning more modern C from 2 other books.
I notice that K&R never use scanf in the book, except on the one section where they introduce it. They mainly use a getline function which they write in the book, which they change later in the book, once they introduce pointers. There getline is different then gcc getline, which caused me some problems until i change the name of getline to ggetline.
Reviewing my notes i found this quote:
This simplification is convenient and superficially attractive, and it
works, as far as it goes. The problem is that scanf does not work well
in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline. Worse, it turns out that
scanf's error handling is inadequate for many purposes. It tells you
whether a conversion succeeded or not (more precisely, it tells you
how many conversions succeeded), but it doesn't tell you anything more
than that (unless you ask very carefully). Like atoi and atof, scanf
stops reading characters when it's processing a %d or %f input and it
finds a non-numeric character. Suppose you've prompted the user to
enter a number, and the user accidentally types the letter 'x'. scanf
might return 0, indicating that it couldn't convert a number, but the
unconvertable text (the 'x') remains on the input stream unless you
figure out some other way to remove it.
For these reasons (and several others, which I won't bother to
mention) it's generally recommended that scanf not be used for
unstructured input such as user prompts. It's much better to read
entire lines with something like getline (as we've been doing all
along) and then process the line somehow. If the line is supposed to
be a single number, you can use atoi or atof to convert it. If the
line has more complicated structure, you can use sscanf (which we'll
meet in a minute) to parse it. (It's better to use sscanf than scanf
because when sscanf fails, you have complete control over what you do
next. When scanf fails, on the other hand, you're at the mercy of
where in the input stream it has left you.)
At first i thought this quote was from K&R, but i cannot find it in the book. Then i realized thats it's from lecture notes i got online, for someone who taught a course years ago using K&R book.
lecture notes
I know that K&R book is 30 years old now, so it is dated is some ways.
This quote is very old, so i was wondering if scanf still has this behavior or has it changed?
Does scanf still leave stuff in the input stream when it fails? for example above:
Suppose you've prompted the user to enter a number, and the user
accidentally types the letter 'x'. scanf might return 0, indicating
that it couldn't convert a number, but the unconvertable text (the
'x') remains on the input stream.
Is the following still true?
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline.
Has scanf changed much since the quotes above were written? Or are they still true today?
The reason i ask, in the newer books i am reading, no one mentions these issues.

scanf() is evil - use fgets() and then parse.
The detail is not that scanf() is completely bad.
1) The format specifiers are often used in a weak manner
char buf[100];
scanf("%s", buf); // bad - no width limit
2) The return value is errantly not checked
scanf("%99[\n]", buf); // what if use entered `"\n"`?
puts(buf);
3) When input is not as expected, it is not clear what remains in stdin.
if (scanf("%d %d %d", &i, &j, &k) != 3) {
// OK, not what is in `stdin`?
}
you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Yes. Many scanf() calls leave a trailing '\n' in stdin that are then read as an empty line by getline(), fgets(). scanf() is not for reading lines. getline() and fgets() are much better suited to read a line.
Has scanf changed much since the quotes above were written?
Only so much change can happen without messing up the code base. #Jonathan Leffler
scanf() remains troublesome. scanf() is unable to accept an argument (after the format) to indicate how many characters to accept into a char * destination.
Some systems have added additional format options to help.
A fundamental issues is this:
User input is evil. It is more robust to get the text input as one step, qualify input, then parse and assess its success than trying to do all this in one function.
Security
The weakness of scanf() and coder tendency to code scanf() poorly has been a gold mine for hackers.
IMO, C lacks a robust user input function set.

Related

why does this part of the program skip the second input?

On entering the first inputs this part of my program terminates.
float p_x,p_y,q_x,q_y,r_x,r_y;
printf("Note :- coordinates are entered as follows (x,y) \n");
printf("Enter the coordinates of 'p' :");
scanf("(%f,%f)",&p_x,&p_y);
printf("Enter the coordinates of 'q' :");
scanf("(%f,%f)",&q_x,&q_y);
Short answer: Change the second scanf call to
scanf(" (%f,%f)", &q_x, &q_y);
Note the extra space in the format statement.
Longer explanation: One of scanf's quirks is that it only reads what it needs, and leaves everything else sitting on the input stream for next time. So when you read p_x and p_y, scanf reads the (, the value for p_x, the ,, the value for p_y, and the ), but it doesn't do anything with the invisible final \n on the line, the "newline" character that's there as a result of the Enter key you typed.
So, later, when you try to read q_x and q_y, the first thing scanf wants to see is the ( you said should be there, but the first thing scanf actually sees is that \n character left over from last time. So the second scanf call fails, reading nothing.
In a scanf format statement, a space character means "read and ignore whitespace". So if you change the second format string to " (%f,%f)", scanf will read and ignore the stray newline, and then it will be able to read the input you asked for.
The reason you don't have this problem all the time is that most (though not all) of the scanf format characters automatically skip leading whitespace as a part of doing their work. If you had, say, a simpler call to
scanf("%f", &q_x);
followed by a later call to
scanf("%f", &q_y);
it would work just fine, without any extra explicit space characters in the format strings.
General advice: You may be thinking that this is a pretty lame situation. You may have gotten the impression that scanf was supposed to be a nice, simple way to do input in your C programs, and you may be thinking, "But this is not simple! How was I supposed to know to write " (%f,%f)"?" And I would absolutely agree with you: This is a pretty lame situation!
The fact is that scanf only seems to be a nice, simple way to do input. It's actually a terribly complicated, unforgiving, nearly useless mess. It's only simple if you're reading very simple input, such as single numbers or simple strings (not containing spaces), or maybe single characters. For anything more complicated than that, there are so many quirks and foibles and exceptions and special cases that it almost never works the first time, and it's often more trouble than it's worth to even try to get it working.
So the general advice is to only try to use scanf for very simple input, during the first few weeks of your C programming career. You can read what I mean by "very simple input" in this answer to a previous question. Then, once you've gotten a few skills under your belt, or when you need to do something a little more complicated, its time to learn how to do input using better techniques than scanf.
Check return value to identify problems.
// scanf("(%f,%f)",&q_x,&q_y);
if (scanf("(%f,%f)",&q_x,&q_y) != 2) {
puts("Bad input");
}
Tolerate and consume spaces, new lines, tabs, etc. between portion of input. OP did not do this. The '\n' of the first entry was not consumed and caused the 2nd scanf() to fail. "%f" already allows optional leading white-spaces. Add a " " before the fixed formats characters: (,) so optional white-spaces will get consumed there too.
// v---v---v add spaces to format to allow any white-space in input.
if (scanf(" (%f ,%f )",&q_x,&q_y) != 2) {
puts("Bad input");
}

What's the difference between gets and scanf?

If the code is
scanf("%s\n",message)
vs
gets(message)
what's the difference?It seems that both of them get input to message.
The basic difference [in reference to your particular scenario],
scanf() ends taking input upon encountering a whitespace, newline or EOF
gets() considers a whitespace as a part of the input string and ends the input upon encountering newline or EOF.
However, to avoid buffer overflow errors and to avoid security risks, its safer to use fgets().
Disambiguation: In the following context I'd consider "safe" if not leading to trouble when correctly used. And "unsafe" if the "unsafetyness" cannot be maneuvered around.
scanf("%s\n",message)
vs
gets(message)
What's the difference?
In terms of safety there is no difference, both read in from Standard Input and might very well overflow message, if the user enters more data then messageprovides memory for.
Whereas scanf() allows you to be used safely by specifying the maximum amount of data to be scanned in:
char message[42];
...
scanf("%41s", message); /* Only read in one few then the buffer (messega here)
provides as one byte is necessary to store the
C-"string"'s 0-terminator. */
With gets() it is not possible to specify the maximum number of characters be read in, that's why the latter shall not be used!
The main difference is that gets reads until EOF or \n, while scanf("%s") reads until any whitespace has been encountered. scanf also provides more formatting options, but at the same time it has worse type safety than gets.
Another big difference is that scanf is a standard C function, while gets has been removed from the language, since it was both superfluous and dangerous: there was no protection against buffer overruns. The very same security flaw exists with scanf however, so neither of those two functions should be used in production code.
You should always use fgets, the C standard itself even recommends this, see C11 K.3.5.4.1
Recommended practice
6 The fgets function allows properly-written
programs to safely process input lines too long to store in the result
array. In general this requires that callers of fgets pay attention to
the presence or absence of a new-line character in the result array.
Consider using fgets (along with any needed processing based on
new-line characters) instead of gets_s.
(emphasis mine)
There are several. One is that gets() will only get character string data. Another is that gets() will get only one variable at a time. scanf() on the other hand is a much, much more flexible tool. It can read multiple items of different data types.
In the particular example you have picked, there is not much of a difference.
gets - Reads characters from stdin and stores them as a string.
scanf - Reads data from stdin and stores them according to the format specified int the scanf statement like %d, %f, %s, etc.
gets:->
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0').
BUGS:->
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
scanf:->
The scanf() function reads input from the standard input stream stdin;
BUG:->
Some times scanf makes boundary problems when deals with array and string concepts.
In case of scanf you need that format mentioned, unlike in gets. So in gets you enter charecters, strings, numbers and spaces.
In case of scanf , you input ends as soon as a white-space is encountered.
But then in your example you are using '%s' so, neither gets() nor scanf() that the strings are valid pointers to arrays of sufficient length to hold the characters you are sending to them. Hence can easily cause an buffer overflow.
Tip: use fgets() , but that all depends on the use case
The concept that scanf does not take white space is completely wrong. If you use this part of code it will take white white space also :
#include<stdio.h>
int main()
{
char name[25];
printf("Enter your name :\n");
scanf("%[^\n]s",name);
printf("%s",name);
return 0;
}
Where the use of new line will only stop taking input. That means if you press enter only then it will stop taking inputs.
So, there is basically no difference between scanf and gets functions. It is just a tricky way of implementation.
scanf() is much more flexible tool while gets() only gets one variable at a time.
gets() is unsafe, for example: char str[1]; gets(str)
if you input more then the length, it will end with SIGSEGV.
if only can use gets, use malloc as the base variable.

fseek(stdin,0,SEEK_SET) and rewind(stdin) REALLY do flush the input buffer "stdin".Is it OK to use them? [duplicate]

This question already has an answer here:
Can fseek(stdin,1,SEEK_SET) or rewind(stdin) be used to flush the input buffer instead of non-portable fflush(stdin)?
(1 answer)
Closed 8 years ago.
I was thinking since the start that why can't fseek(stdin,0,SEEK_SET) and rewind(stdin) flush the input buffer since it is clearly written in cplusplusreference that calling these two functions flush the buffer(Input or Output irrespective).But since the whole idea seemed new,I had put it in a clumsy question yesterday.
Can fseek(stdin,1,SEEK_SET) or rewind(stdin) be used to flush the input buffer instead of non-portable fflush(stdin)?
And I was skeptical about the answers I got which seemed to suggest I couldn't do it.Frankly,I saw no reason why not.Today I tried it myself and it works!! I mean, to deal with the problem up the newline lurking in stdin while using multiple scanf() statments, it seems like I can use fseek(stdin,0,SEEK_SET) or rewind(stdin) inplace of the non-portable and UB fflush(stdin).
Please tell me if this is a correct approach without any risk.Till now, I had been using the following code to deal with newline in stdin: while((c = getchar()) != '\n' && c != EOF);. Here's my code below:
#include <stdio.h>
int main ()
{
int a,b;
char c;
printf("Enter 2 integers\n");
scanf("%d%d",&a,&b);
printf("Enter a character\n");
//rewind(stdin); //Works if activated
fseek(stdin,0,SEEK_SET); //Works fine
scanf("%c",&c); //This scanf() is skipped without fseek() or rewind()
printf("%d,%d,%c",a,b,c);
}
In my program, if I don't use either of fseek(stdin,0,SEEK_SET) or rewind(stdin),the second scanf() is skipped and newline is always taken up as the character.The problem is solved if I use fseek(stdin,0,SEEK_SET) or rewind(stdin).
I'm not sure where you read on cplusplusreference (whatever that is) that flushing to end of line is the mandated behaviour.
The closest matches I could find, http://www.cplusplus.com/reference/cstdio/fseek/ and http://www.cplusplus.com/reference/cstdio/rewind, don't mention flushing at all, other than in reference to fflush().
In any case, there's nothing in the C standard which mandates this behaviour either. C11 7.20.9.2 fseek and 7.20.9.5 rewind (which is, after all, identical to fseek with zero offset and SEEK_SET) also make no mention of flushing.
All they state is that the file pointer is moved to the relevant position in the stream.
So, to the extent this works in your environment, all we can say is that this works in your environment. It may not work elsewhere, it may even stop working in your envirnment at an indeterminate point in the future.
If you really want robust input, you should be using a two-stage approach, fgets to retrieve a line followed by sscanf to get what you want from that line. Mixing the two paradigms of input (scanf and getchar) is frequently problematic.
A good (robust, error-checking, and clearing to end of line if needed) input function can be found here.
I tested it right ago, and I checked that fseek doesn't work on stdin. fseek() usually works on the file on the disk so that it seems to be prohibited to access to stdin by the kernel for some secure reasons. Anyway, it was so happy to see who thought like me. Tnx for good question.

why scanf is safer than getchar? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
getc Vs getchar Vs Scanf for reading a character from stdin
I know that getchar is a macro that takes character as an input from stdin . where as scanf is a function that takes any data type as an input from stdin.
As getchar is a macro thats why it helps to make program run faster ,but inspite of this it is recommended to use scanf in place of getchar. WHY?
i learned on net that scanf is safe as compared to getchar,so ,what makes scanf safe?
I would argue that both can be as safe or unsafe as you make them, but in the end I would settle for using scanf():
Since it processes more than one characters at each call, scanf() is potentially faster.
Its behavior is well-documented which means that there are well-established methods to avoid security issues, such as including hard-coded length limits in the format string, and of course never using format strings generated from user input.
scanf() is far easier to use - getchar() may be "safer" on its own, but you will have to write a lot of code around it to get some actual functionality out of it. Code that will duplicate functionality provided by scanf() and will then have to be reviewed for security implications. The scanf() implementation is likely to be peer-reviewed extensively, something that your own code will probably never be.
Code using standard functions, such as scanf() is far easier to maintain than anything based on custom libraries, especially in programming teams with significant staff turnover.
scanf is faster because it can read multiple characters at once.
However, getchar is safer because it can only read one character at a time. If input is not checked, it is relatively easy to exploit program that is using scanf.
getChar() is a function as What I know and what I have read, Although it is termed as a macro by some of the websites but it is clearly termed as a function in the official C++ Reference website.
getchar() is used when you need one and only one character to be input from the keyboard where as scanf() is used to get multiple characters. Which one is faster then other depends totally on your requirement and implementation.
As far as the safety and ease of use is concerned i would cast my vote to scanf() as I can input multiple characters and I don't have to use flush() after the use of scanf() which sometimes becomes mandatory when using getchar().

How to use sscanf correctly and safely

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers
for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If the erroneous input is supplied, the program must ask the user to input again until
the correct input is given, but the previous correct input must be kept (only incorrect
input must be cleared from the input stream).
Everything must conform to C99 standard.
The scanf family of function cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors ?
Well, let sscanf accept all inputs as %s (i.e. strings) and then program analyze them
If you must use scanf to accept the input, I think you start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf tells you how many format codes it parsed successfully. So,
int matches = scanf("%d", &array[i]);
if (matches == 0) {
/* no integer in the input stream */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input12a3. The first time through the loop, scanf would parse '12as an integer 12, leaving the remaininga3` for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than maxint, eg, "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.
The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.
scanf("%s", string) into long int_string = strtol(string, &end_pointer, base:10)

Resources