Why is scanf safer than getchar? [duplicate] - c

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
getc Vs getchar Vs Scanf for reading a character from stdin
I know that getchar is a macro that reads a character from stdin, whereas scanf is a function that can read any data type from stdin.
Since getchar is a macro, it supposedly helps the program run faster, yet despite this it is recommended to use scanf instead of getchar. WHY?
I read online that scanf is safer than getchar, so what makes scanf safe?

I would argue that both can be as safe or unsafe as you make them, but in the end I would settle for using scanf():
Since it processes more than one character per call, scanf() is potentially faster.
Its behavior is well-documented which means that there are well-established methods to avoid security issues, such as including hard-coded length limits in the format string, and of course never using format strings generated from user input.
scanf() is far easier to use - getchar() may be "safer" on its own, but you will have to write a lot of code around it to get some actual functionality out of it. Code that will duplicate functionality provided by scanf() and will then have to be reviewed for security implications. The scanf() implementation is likely to be peer-reviewed extensively, something that your own code will probably never be.
Code using standard functions such as scanf() is far easier to maintain than anything based on custom libraries, especially in programming teams with significant staff turnover.
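As a small illustration of the length-limit point above (the buffer name and size here are my own):
char name[64];

// The width 63 leaves room for the terminating '\0', and the
// return value tells us whether a conversion actually happened.
if (scanf("%63s", name) != 1) {
    // EOF or read error: name was not filled in
}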

scanf is faster because it can read multiple characters at once.
However, getchar is safer because it can only read one character at a time. If input is not checked, it is relatively easy to exploit a program that uses scanf.

getchar() is a function as far as I know and have read. Although some websites describe it as a macro, it is clearly documented as a function on the official C++ Reference website.
getchar() is used when you need one and only one character of input from the keyboard, whereas scanf() is used to get multiple characters. Which one is faster than the other depends entirely on your requirements and implementation.
As far as safety and ease of use are concerned, I would cast my vote for scanf(), as I can input multiple characters and I don't have to flush the input stream after using scanf(), which sometimes becomes mandatory when using getchar().

Related

What is the purpose of having a gets() function in c when it introduces vulnerabilities that fgets() doesn't? [duplicate]

This question already has answers here:
Why is the gets function so dangerous that it should not be used?
(13 answers)
Closed 9 months ago.
I understand why C developers don't use gets, since there are vulnerabilities that can be exploited with a buffer overflow attack, but why even have this function if it is safer to use fgets()? I suppose the question that I am asking is: are there any uses of gets() that are safe from a security standpoint?
gets was invented in the early days of the C language, about 50 years ago, before people had thought of the concept of buffer overflow attacks and when it was assumed that users of your program would be friendly. Obviously times have changed, and in retrospect its design was a mistake. It cannot be used safely and therefore should not be used at all, and current versions of the C language standard have actually removed it. See Why is the gets function so dangerous that it should not be used?
As to why it existed in the first place, most of the stdio functions have specialized versions that operate on stdin/stdout instead of on an arbitrary stream. For instance, there is fprintf which prints to an arbitrary stream, and printf which assumes stdout. Likewise there is fscanf/scanf, getc/getchar, and so on. It's mainly for convenience, so that you can save the trouble of typing stdout everywhere.
For line-based I/O, there was fputs/fgets for arbitrary streams, and puts/gets for stdout/stdin. Note that puts/gets had the special feature that they would automatically append/strip a newline character, so gets(buf) is not exactly equivalent to fgets(buf, INT_MAX, stdin).
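To make that append/strip behaviour concrete on the output side, a small sketch (mine, not from the original answer):
#include <stdio.h>

int main(void)
{
    fputs("hello", stdout);  // writes exactly "hello", no newline added
    puts("hello");           // writes "hello" and then appends a '\n'
    return 0;
}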
It's not entirely clear why they decided that fgets should take a size parameter but gets should not. There may have been an implicit assumption that stdin was more likely to be "nice" input, like terminal input or text files with reasonable line length, whereas an arbitrary file opened with fopen might be more likely to need to handle extremely long lines and such. There are certainly many such aspects of the original C language that, again with the benefit of hindsight, were not well thought out.

How to definitively solve the scanf input stream problem

Suppose I want to run the following C snippet:
scanf("%d" , &some_variable);
printf("something something\n\n");
printf("Press [enter] to continue...")
getchar(); //placed to give the user some time to read the something something
This snippet will not pause! The problem is that scanf will leave the "enter" (\n) character in the input stream [1], messing up everything that comes after it; in this context the getchar() will eat the \n and not wait for an actual new character.
Since I was told not to use fflush(stdin) (I don't really get why, though), the best solution I have been able to come up with is simply to define my own scan function at the start of my code:
void nsis(int *pointer){ // nsis, acronym of: no shenanigans integer scanf
    scanf("%d", pointer);
    getchar(); // this will clean the input stream every time the scan function is called
}
And then we simply use nsis in place of scanf. This should fly. However, it seems like a really homebrew, put-together-with-duct-tape solution. How do professional C developers handle this mess? Do they not use scanf at all? Do they simply accept working with a dirty input stream? What is the standard here?
I wasn't able to find a definite answer on this anywhere! Every source I could find mentioned a different (and sketchy) solution...
EDIT: In response to all the comments saying some version of "just don't use scanf": OK, I can do that, but what is the purpose of scanf then? Is it simply a useless, broken function that should never be used? Why is it in the libraries to begin with then?
This seems really absurd, especially considering all beginners are taught to use scanf...
[1]: The \n left behind is the one that the user typed when inputting the value of the variable some_variable, not the one present in the printf.
but what is the purpose of scanf then?
An excellent question.
Is it simply a useless broken function that should never be used?
It is almost useless. It is, arguably, quite broken. It should almost never be used.
Why is it in the libraries to begin with then?
My personal belief is that it was an experiment. It tries to be the opposite of printf. But that turned out not to be such a good idea in practice, and the function never got used very much, and pretty much fell out of favor, except for one particular use case...
This seems really absurd, especially considering all beginners are taught to use scanf...
You're absolutely right. It is really quite absurd.
There's a decent reason why all beginners are taught to use scanf, though. During week 1 of your first C programming class, you might write the little program
#include <stdio.h>
int main()
{
    int size = 5;
    for(int i = 0; i < size; i++) {
        for(int j = 0; j < size; j++)
            putchar('*');
        putchar('\n');
    }
}
to print a square. And during that first week, to make a square of a different size, you just edit the line int size = 5; and recompile.
But pretty soon — say, during week 2 — you want a way for the user to enter the size of the square, without having to recompile. You're probably not ready to muck around with argv. You're probably not ready to read a line of text using fgets and convert it back to an integer using atoi. (You're probably not even ready to seriously contemplate the vast differences between the integer 5 and the string "5" at all.) So — during week 2 of your first C programming class — scanf seems like just the ticket.
That's the "one particular use case" I was talking about. And if you only used scanf to read small integers into simple C programs during the second week of your first C programming class, things wouldn't be so bad. (You'd still have problems forgetting the &, but that would be more or less manageable.)
The problem (though this is again my personal belief) is that it doesn't stop there. Virtually every instructor of beginning C classes teaches students to use scanf. Unfortunately, few or none of those instructors ever explicitly tell students that scanf is a stopgap, to be used temporarily during that second week, and to be emphatically graduated beyond in later weeks. And, even worse, many instructors go on to assign more advanced problems, involving scanf, for which it is absolutely not a good solution, such as trying to do robust or "user friendly" input validation.
scanf's only virtue is that it seems like a nice, simple way to get small integers and other simple input from the user into your early programs. But the problem — actually a big, shuddering pile of 17 separate problems — is that scanf turns out to be vastly complicated and full of exceptions and hard to use, precisely the opposite of what you'd want in order to make things easy for beginners. scanf is only useful for beginners, and it's almost perfectly useless for beginners. It has been described as being like square training wheels on a child's bicycle.
How do professional C developers handle this mess?
Quite simply: by not using scanf at all. For one thing, very few production C programs print prompts to a line-based screen and ask users to type something followed by Return. And for those programs that do work that way, professional C developers unhesitatingly use fgets or the like to read a full line of input as text, then use other techniques to break down the line to extract the necessary information.
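A sketch of that fgets-then-convert style, assuming the input is one integer per line (the names and messages are mine):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char line[64];
    printf("Size? ");
    fflush(stdout);
    if (fgets(line, sizeof line, stdin) == NULL)
        return 1;                        // EOF or read error

    char *end;
    long size = strtol(line, &end, 10);  // parse the text we already have
    if (end == line) {
        fprintf(stderr, "that was not a number\n");
        return 1;
    }
    printf("size = %ld\n", size);
    return 0;
}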
In answer to your initial question, there's no good answer. One of the fundamental rules of scanf usage (a set of rules, by the way, that no instructor ever teaches) is that you should never try to mix scanf and getchar (or fgets) in the same program. If there were a good way to make your "Press [enter] to continue..." code work after having called scanf, we wouldn't need that rule.
If you do want to try to flush the extra newline, so that a later call to getchar might work, there are several questions here with a bunch of good answers:
scanf() leaves the newline character in the buffer
Using fflush(stdin)
How to properly flush stdin in fgets loop
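The idiom that typically comes up in those answers is a small read-and-discard loop (a sketch, not a cure-all):
// Read and discard everything up to and including the next newline.
int c;
while ((c = getchar()) != '\n' && c != EOF)
    ;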
There's one more unrelated point that ends up being pretty significant to your question. When C was invented, there was no such thing as a GUI with multiple windows. Therefore no C programmer ever had the problem of having their output disappear before they could read it. Therefore no C programmer ever felt the need to write printf("Press [enter] to continue..."); followed by getchar(). I believe (another personal belief) that it is egregiously bad behavior for any vendor of a GUI-based C compiler to rig things up so that the output disappears upon program exit. Persistent output windows ought to be the default, for the benefit of beginning C programmers, with some kind of non-default option to turn that behavior off for those who don't want it.
Is scanf broken? No it is not. It is an excellent input function when you want to parse free form input data where few errors are to be expected. Free form means here that new lines are not relevant exactly as when you read/write very long paragraphs on a normal screen. And few errors expected is common when you read from files.
The scanf family of functions has another nice point: you have the same syntax when reading from the standard input stream, a file stream or a character string. It can easily parse simple common types and provides a minimal return value that allows cautious programmers to know whether all or only part of the expected data could be decoded.
That being said, it has major drawbacks: first, being a C function, it cannot directly control whether the programmer has passed types meeting the format specifications, and second, as beginners are not consistently hit on the head when they forget to check its return value, it is really too easy to write fully broken programs using it.
But the rule is:
if input is expected to be line oriented, first use fgets to get a line and then sscanf on it, testing the return values of both (see the sketch below)
only if input is expected to be free form (irrelevant newlines) should scanf be used directly, and never without testing its return value, except for trivial tests.
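A sketch of the first rule, with hypothetical variable names, checking both return values:
char line[128];
int a, b;

if (fgets(line, sizeof line, stdin) == NULL) {
    // EOF or read error
} else if (sscanf(line, "%d %d", &a, &b) != 2) {
    // a line was read, but it did not contain two integers
} else {
    // a and b hold valid values
}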
Another drawback is that beginners expect it to be clever. It can indeed parse simple input formats, but it is only a poor man's parser: do not use it as a generic parser, because that is not what it is intended for.
Provided those rules are observed, it is a nice tool consistent with most of C language and its standard library: a simple tool to do simple things. It is up to programmers or library implementers to build richer tools.
I have been using the C language for more than 30 years and was never bitten by scanf (well, I was when I was a beginner, but I now know that I was to blame). I have simply tried, for decades, to use it only for what it can do...

scanf: what remains on the input stream after failure

I learned C a few years ago using K&R 2nd edition ANSI C. I've been reviewing my notes while learning more modern C from two other books.
I notice that K&R never use scanf in the book, except in the one section where they introduce it. They mainly use a getline function that they write in the book and change later, once they introduce pointers. Their getline is different from gcc's getline, which caused me some problems until I changed the name of getline to ggetline.
Reviewing my notes, I found this quote:
This simplification is convenient and superficially attractive, and it works, as far as it goes. The problem is that scanf does not work well in more complicated situations. In section 7.1, we said that calls to putchar and printf could be interleaved. The same is not always true of scanf: you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline. Worse, it turns out that scanf's error handling is inadequate for many purposes. It tells you whether a conversion succeeded or not (more precisely, it tells you how many conversions succeeded), but it doesn't tell you anything more than that (unless you ask very carefully). Like atoi and atof, scanf stops reading characters when it's processing a %d or %f input and it finds a non-numeric character. Suppose you've prompted the user to enter a number, and the user accidentally types the letter 'x'. scanf might return 0, indicating that it couldn't convert a number, but the unconvertable text (the 'x') remains on the input stream unless you figure out some other way to remove it.
For these reasons (and several others, which I won't bother to mention) it's generally recommended that scanf not be used for unstructured input such as user prompts. It's much better to read entire lines with something like getline (as we've been doing all along) and then process the line somehow. If the line is supposed to be a single number, you can use atoi or atof to convert it. If the line has more complicated structure, you can use sscanf (which we'll meet in a minute) to parse it. (It's better to use sscanf than scanf because when sscanf fails, you have complete control over what you do next. When scanf fails, on the other hand, you're at the mercy of where in the input stream it has left you.)
At first I thought this quote was from K&R, but I cannot find it in the book. Then I realized that it's from lecture notes I found online, from someone who taught a course years ago using the K&R book.
lecture notes
I know that the K&R book is 30 years old now, so it is dated in some ways.
This quote is very old, so I was wondering whether scanf still has this behavior or whether it has changed.
Does scanf still leave stuff in the input stream when it fails? For example, from above:
Suppose you've prompted the user to enter a number, and the user accidentally types the letter 'x'. scanf might return 0, indicating that it couldn't convert a number, but the unconvertable text (the 'x') remains on the input stream.
Is the following still true?
putchar and printf could be interleaved. The same is not always true of scanf: you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Has scanf changed much since the quotes above were written? Or are they still true today?
The reason I ask is that, in the newer books I am reading, no one mentions these issues.
scanf() is evil - use fgets() and then parse.
The point is not that scanf() is completely bad; rather, the problems are in how it tends to be used:
1) The format specifiers are often used in a weak manner
char buf[100];
scanf("%s", buf); // bad - no width limit
2) The return value is errantly not checked
scanf("%99[\n]", buf); // what if use entered `"\n"`?
puts(buf);
3) When input is not as expected, it is not clear what remains in stdin.
if (scanf("%d %d %d", &i, &j, &k) != 3) {
// OK, not what is in `stdin`?
}
you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Yes. Many scanf() calls leave a trailing '\n' in stdin that is then read as an empty line by getline() or fgets(). scanf() is not for reading lines; getline() and fgets() are much better suited to reading a line.
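A minimal illustration of that interaction (the variable names are mine):
int n;
char line[32];

if (scanf("%d", &n) == 1) {          // consumes the digits, leaves the '\n' behind
    fgets(line, sizeof line, stdin); // returns immediately with just "\n": an "empty" line
}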
Has scanf changed much since the quotes above were written?
Only so much change can happen without messing up the code base. @Jonathan Leffler
scanf() remains troublesome. scanf() is unable to accept an argument (after the format) to indicate how many characters to accept into a char * destination.
Some systems have added additional format options to help.
A fundamental issue is this:
User input is evil. It is more robust to get the text input as one step, qualify the input, then parse it and assess its success, than to try to do all of this in one function.
Security
The weaknesses of scanf() and coders' tendency to use scanf() poorly have been a gold mine for hackers.
IMO, C lacks a robust user input function set.

Are gets() and puts() technically sound to be used for strings?

My question is about gets() and puts(). Are they a perfect solution for string input and output?
gets is marked as obsolescent in C99 and has been removed in C11 because of security issues with this function. Don't use it; use fgets instead. As a historical note, gets was exploited (in fingerd) by the first massive Internet worm, the Morris worm, back in 1988.
The puts function is OK if it fits your needs.
gets() is fundamentally insecure in a really horrific way: it will write an unlimited number of characters to its argument, overflowing any buffer it is provided. As such, it should never, ever be used. Many newer compilers will issue an automatic warning if you use it. Instead, use fgets(), which takes a length argument:
char buf[...];
fgets(buf, sizeof(buf), stdin);
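One caveat worth adding here (my own note): unlike gets(), fgets() keeps the trailing newline when it fits, so code that wants gets()-like behaviour usually strips it, for example:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[128];
    if (fgets(buf, sizeof buf, stdin) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';  // drop the newline fgets keeps, if present
        puts(buf);
    }
    return 0;
}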
On the other hand, puts() is totally fine. It's equivalent to printf("%s\n", x);, and some compilers will in fact convert certain constant printf() calls to puts() as a standard optimization. Go wild.
For gets, see the man page:
BUGS
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
puts is fine, if you're just looking to write a string to stdout.

if one complains about gets(), why not do the same with scanf("%s",...)?

From man gets:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
Almost everywhere, I see scanf being used in a way that should have the same problem (buffer overflow/buffer overrun): scanf("%s", string). Does this problem exist in this case? Why are there no references to it in the scanf man page? Why does gcc not warn when compiling this with -Wall?
ps: I know that there is a way to specify in the format string the maximum length of the string with scanf:
char str[10];
scanf("%9s",str);
Edit: I am not asking you to determine whether the preceding code is right or not. My question is: if scanf("%s", string) is always wrong, why are there no warnings, and why is there nothing about it in the man page?
The answer is simply that no-one has written the code in GCC to produce that warning.
As you point out, a warning for the specific case of "%s" (with no field width) is quite appropriate.
However, bear in mind that this is only the case for scanf(), vscanf(), fscanf() and vfscanf(). This format specifier can be perfectly safe with sscanf() and vsscanf(), so the warning should not be issued in that case. This means that you cannot simply add it to the existing "scanf-style-format-string" analysis code; you will have to split that into "fscanf-style-format-string" and "sscanf-style-format-string" options.
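A sketch of the kind of case meant here (the array sizes are my own): with sscanf the source string already bounds how much "%s" can match, so an equally sized destination cannot overflow:
char line[64];
char word[64];   // as large as the source, so a plain "%s" cannot overflow it

if (fgets(line, sizeof line, stdin) != NULL && sscanf(line, "%s", word) == 1)
    puts(word);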
I'm sure if you produce a patch for the latest version of GCC it stands a good chance of being accepted (and of course, you will need to submit patches for the glibc header files too).
Using gets() is never safe. scanf() can be used safely, as you said in your question. However, determining if you're using it safely is a more difficult problem for the compiler to work out (e.g. if you're calling scanf() in a function where you pass in the buffer and a character count as arguments, it won't be able to tell); in that case, it has to assume that you know what you're doing.
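For instance, a hypothetical helper like the one described, where the buffer size only exists at run time, so the compiler has nothing to check the format against:
#include <stdio.h>

// Hypothetical wrapper: builds a width-limited format at run time.
// The buffer size is invisible to any compile-time format checker.
int read_word(char *buf, size_t size)
{
    if (size < 2)
        return 0;                                   // no room for a character plus '\0'
    char fmt[32];
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);  // e.g. "%99s" for size == 100
    return scanf(fmt, buf);
}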
When the compiler looks at the formatting string of scanf, it sees a string! That's assuming the formatting string is not entered at run-time. Some compilers like GCC have some extra functionality to analyze the formatting string if entered at compile time. That extra functionality is not comprehensive, because in some situations a run-time overhead is needed which is a NO NO for languages like C. For example, can you detect an unsafe usage without inserting some extra hidden code in this case:
char* str;
size_t size;
scanf("%z", &size);
str = malloc(size);
scanf("%9s"); // how can the compiler determine if this is a safe call?!
Of course, there are ways to write safe code with scanf if you specify the number of characters to read, and that there is enough memory to hold the string. In the case of gets, there is no way to specify the number of characters to read.
I am not sure why the man page for scanf doesn't mention the possibility of a buffer overrun, but vanilla scanf is not a secure option. A rather dated link - Link - shows this is the case. Also, check this (not gcc but informative nevertheless) - Link
It may be simply that scanf will allocate space on the heap based on how much data is read in. Since it doesn't allocate the buffer and then read until the null character is read, it doesn't risk overwriting the buffer. Instead, it reads into its own buffer until the null character is found, and presumably copies that buffer into another of the correct size at the end of the read.
